Triangularization: My Last Memento of Linear Algebra

ABSTRACT. Schur triangularization is a powerful tool in linear algebra: it implies the spectral decomposition, the Cayley-Hamilton theorem, the removal rule and the Jordan canonical form over the complex number field. With the help of algebraic closure, ordinary triangularization serves to generalize these results. The spirit of simultaneous triangularization is explored through problem-solving, and several relevant theorems from Lie algebra are recalled. Finally, we consider the triangularization of matrices over a PID. A review of complexification, algebraic closure, Gramians and finitely generated modules over a PID is given in the appendices. As the goal is to go over linear algebra through one topic, many preliminary results are recalled and some interesting digressions are inserted.

Schur Triangularization

Generalized Schur’s Theorem: Statement

THEOREM. Let \(V\) be a nonzero finite-dimensional real inner product space and \(T\) a linear operator on \(V\).

I) There exists an orthonormal basis \(\beta\) for \(V\) such that

\[[T]_{\beta}=\small \begin{pmatrix} A_1 &  &  &  &  &\\ &  \ddots&  &  & \Large{*} &\\ &  & A_q &  &  &\\ &  &  & c_1 &  &\\ &  &  &  & \ddots &\\ &  &  &  &  & c_p \end{pmatrix} \]

is a block upper triangular matrix, where \(A_j\in M_{2\times 2}(\mathbb{R})\ (j=1,\cdots,q)\) and \(c_j\in \mathbb{R}\ (j=1,\cdots,p)\).

II) Moreover, if \(T\) is normal, then there exists an orthonormal basis \(\beta\) such that

\[[T]_{\beta}= \begin{pmatrix} \left.\begin{matrix} \begin{matrix}a_1 & -b_1 \\ b_1 & a_1\end{matrix} &  & \\ & \ddots & \\ &  & \begin{matrix}a_q & -b_q \\ b_q & a_q\end{matrix}\\ \hline \end{matrix}\hspace{-0.2em}\right| & \begin{matrix} *&\cdots&* \\ *&\cdots&* \\ \vdots&&\vdots \\ *&\cdots&* \\ *&\cdots&* \end{matrix} \\ & \begin{matrix} c_1 & \cdots & * \\ & \ddots & \vdots \\ &  & c_p \end{matrix} \end{pmatrix} \]

is a block upper triangular matrix, where the multiset \(\{c_1,\cdots,c_p,a_1\pm ib_1,\cdots,a_q\pm ib_q\}\) with \(b_j\neq 0\ (j=1,\cdots,q)\) is the spectrum of the complexification of \(T\).

III) In particular, if \(T\) is orthogonal, then there exists an orthonormal basis \(\beta\) for \(V\) such that

\[[T]_{\beta}=\begin{pmatrix} \small\begin{matrix}\cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1\end{matrix} &  &  &  &  &\\ &  \ddots&  &  &  &\\ &  & \small\begin{matrix}\cos\theta_q & -\sin\theta_q \\ \sin\theta_q & \cos\theta_q\end{matrix} &  &  &\\ &  &  & \varepsilon_1 &  &\\ &  &  &  & \ddots &\\ &  &  &  &  & \varepsilon_p \end{pmatrix} \]

is a block diagonal matrix, where \(\theta_j\in \mathbb{R}\setminus\{k\pi:k\in \mathbb{Z}\}\ (j=1,\cdots,q)\) and \(\varepsilon_j=\pm 1\ (j=1,\cdots,p)\). That is to say, any orthogonal operator on a nonzero finite-dimensional real inner product space is the composite of rotations and reflections (the order does not matter), and the space can be decomposed as a direct sum of pairwise orthogonal \(1\)– or \(2\)-dimensional spaces that are invariant under the operator.

In each case above, the \(q\)-tuple of second-order blocks on the diagonal can be rearranged into any permutation, e.g., \((A_1,\cdots,A_q)\) can be rearranged as \((A_{\sigma(1)},\cdots,A_{\sigma(q)})\) for any given permutation \(\sigma\). A similar statement holds for the \(p\)-tuple of first-order blocks on the diagonal. However, if we reorder the basis \(\beta\) in I) so that the \(q\)-tuple and the \(p\)-tuple are interleaved, e.g., \(\text{diag}([T]_{\beta})=(A_1,c_1,A_2,c_2,\cdots)\), then \([T]_{\beta}\) is no longer guaranteed to be a block upper triangular matrix. The same goes for II).
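Numerically, part I) corresponds to the real Schur decomposition of a matrix. Below is a minimal sketch using scipy (the routine `scipy.linalg.schur` with `output='real'` is assumed available); note that LAPACK returns a quasi-upper-triangular factor whose \(1\times 1\) and \(2\times 2\) diagonal blocks need not be ordered as in the statement above.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))          # a generic real matrix

# Real Schur form: A = Z T Z^T with Z orthogonal and T quasi-upper-triangular
# (1x1 blocks = real eigenvalues, 2x2 blocks = complex conjugate pairs).
T, Z = schur(A, output='real')

print(np.allclose(Z @ T @ Z.T, A))       # reconstruction
print(np.allclose(Z.T @ Z, np.eye(5)))   # Z is orthogonal
print(np.round(T, 2))                    # inspect the diagonal blocks
```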

Generalized Schur’s Theorem: Proof

We focus on proving I), adding along the way the key observations needed for II); III) then follows immediately.

Proof of I). Induction on \(n:=\dim V\). The theorem is trivial when \(n=1\), so we may assume that \(n\ge 2\). Assume that the theorem is true for any real inner product space of dimension less than \(n\).

Case 1: \(T\) has an eigenvalue \(\lambda\).

Since \(\ker(T^*-\lambda I)=\text{im}(T-\lambda I)^{\perp}\), we have

\[\dim \ker(T^*-\lambda I)=n-\dim \text{im}(T-\lambda I)=\dim \ker(T-\lambda I)\ge 1 \]

and thus there exists a unit vector \(z\) such that \(T^*z=\lambda z\). Define \(W:=\text{span}(\{z\})\). Then \(W\) is \(T^*\)-invariant, and so \(W^{\perp}\) is \(T\)-invariant, with \(\dim W^{\perp}=n-1\). By induction hypothesis, there exists an orthonormal basis \(\gamma\) for \(W^{\perp}\) such that \([T_{W^{\perp}}]_{\gamma}\) is of the stated form. Define \(\beta:=\gamma\cup \{z\}\). Then \(\beta\) is an orthonormal basis for \(V\) such that

\[[T]_{\beta}=\begin{pmatrix}\left.\begin{matrix}[T_{W^{\perp}}]_{\gamma}\\O_{1\times (n-1)}\end{matrix}\ \right| \phi_{\beta}(T(z))\end{pmatrix} \]

is of the stated form, with \([T]_{\beta}(n,n)=\langle T(z),z\rangle=\langle z,T^*(z)\rangle=\langle z,\lambda z\rangle=\lambda\).

Case 2: \(T\) has no eigenvalue.

In this case, \(T^*\) has no eigenvalue either. Let \(x=x_1\otimes 1+x_2\otimes i\ (x_1,x_2\in V)\) be an eigenvector of \((T^*)_{\mathbb{C}}\) with the corresponding eigenvalue \(\overline{\lambda}=\lambda_1+i\lambda_2\ (\lambda_1,\lambda_2\in \mathbb{R})\). Clearly, \(\lambda_2\neq 0\) and \(x_1,x_2\) are linearly independent in \(V\). Also, we have

\[\begin{align*} &T^*(x_1)\otimes 1+T^*(x_2)\otimes i\\ =&(T^*)_{\mathbb{C}}(x_1\otimes 1+x_2\otimes i)\\ =&(\lambda_1+i\lambda_2)(x_1\otimes 1+x_2\otimes i)\\ =&(\lambda_1x_1-\lambda_2x_2)\otimes 1+(\lambda_2x_1+\lambda_1x_2)\otimes i \end{align*}\implies \begin{cases}T^*(x_1)=\lambda_1x_1-\lambda_2x_2\\T^*(x_2)=\lambda_2x_1+\lambda_1x_2\end{cases}\quad ① \]

Define \(W:=\text{span}(\{x_1,x_2\})\). Then \(W\) is a \(2\)-dimensional \(T^*\)-invariant subspace, and so \(W^{\perp}\) is an \((n-2)\)-dimensional \(T\)-invariant subspace. By induction hypothesis, there exists an orthonormal basis \(\gamma\) for \(W^{\perp}\) such that \([T_{W^{\perp}}]_{\gamma}\) is of the stated form. Let \(\{x’_1,x’_2\}\) be an orthonormal basis for \(W\), and define \(\beta’:=\gamma\cup\{x’_1,x’_2\}\). Then \(\beta’\) is an orthonormal basis for \(V\) and

\[[T]_{\beta’} =\begin{pmatrix} \left.\left.\begin{matrix}[T_{W^{\perp}}]_{\gamma}\\O_{2\times (n-2)}\end{matrix}\right|\phi_{\beta’}(T(x’_1))\right| \phi_{\beta’}(T(x’_2)) \end{pmatrix} \]

is of the stated form. Therefore I) is proved.

Proof of II). Now we assume that \(T\) is normal. Thanks to the identity \((T_{\mathbb{C}})^*=(T^*)_{\mathbb{C}}\), \(T_{\mathbb{C}}\) is normal as well. Therefore, we have

\[(T^*)_{\mathbb{C}}(x)=\overline{\lambda} x\implies (T_{\mathbb{C}})^*(x)=\overline{\lambda} x\iff T_{\mathbb{C}}(x)=\lambda x\implies \begin{cases}T(x_1)=\lambda_1x_1+\lambda_2x_2\\T(x_2)=-\lambda_2x_1+\lambda_1x_2\end{cases}\quad ② \]

Consequently, \(W\) is \(T\)-invariant as well and \([T_{W}]_{\{x_1,x_2\}}=\begin{pmatrix}\lambda_1 & -\lambda_2\\ \lambda_2 & \lambda_1 \end{pmatrix}\). Note that

\[\begin{align*} &\langle T(x_1),x_1 \rangle=\langle x_1,T^*(x_1) \rangle\implies 2\lambda_2 \langle x_2,x_1 \rangle=0\implies \langle x_2,x_1 \rangle=0\\ &\langle T(x_1),x_2 \rangle=\langle x_1,T^*(x_2) \rangle\implies \lambda_2\langle x_2,x_2 \rangle=\lambda_2\langle x_1,x_1 \rangle\implies \|x_1\|=\|x_2\| \end{align*} \]

Without loss of generality, we may assume that \(\|x_1\|=\|x_2\|=1\). Define \(\beta:=\gamma\cup \{x_1,x_2\}\); then \(\beta\) is an orthonormal basis for \(V\) such that

\[[T]_{\beta} =\begin{pmatrix} [T_{W^{\perp}}]_{\gamma} & O \\ O & [T_{W}]_{\{x_1,x_2\}} \end{pmatrix}\]

as desired.

Proof of III). Note that if a block upper triangular matrix \(\begin{pmatrix}A & C\\O & B\end{pmatrix}\) is unitary, then \(A^*A=I,B^*B+C^*C=I\) and \(BB^*=I\), implying that \(A,B\) are unitary and \(C=O\). Now III) follows from II) and the fact above, recalling that the eigenvalues of a unitary matrix are all of modulus one. \(\blacksquare\)

Remark. In fact, if the block upper triangular matrix \(\begin{pmatrix}A & C\\O & B\end{pmatrix}\) is normal, then we have \(A^*A=AA^*+CC^*\), and so \(\text{tr}(CC^*)=0\), implying that \(C=O\), and consequently both \(A\) and \(B\) are normal.

Corollary 1: The Spectral Theorem

From the proof of I), we see that any linear operator on a nonzero finite-dimensional complex inner product space is unitarily triangularizable. This is Schur’s theorem. Using the remark above (an upper triangular normal matrix is diagonal), it follows that a linear operator on a nonzero finite-dimensional complex inner product space is unitarily diagonalizable iff it is normal. Moreover, if a linear operator on a nonzero finite-dimensional real inner product space is self-adjoint, then it is normal and hence has the matrix representation

\[\small \begin{pmatrix} \begin{matrix}a_1 & -b_1 \\ b_1 & a_1\end{matrix} &  &  &  &  &\\ &  \ddots&  &  & \Large{*} &\\ &  & \begin{matrix}a_q & -b_q \\ b_q & a_q\end{matrix} &  &  &\\ &  &  & c_1 &  &\\ &  &  &  & \ddots &\\ &  &  &  &  & c_p \end{pmatrix} \]

with respect to some orthonormal basis. But this matrix is self-adjoint, i.e., symmetric; hence the entries above the diagonal blocks vanish and each \(2\times 2\) block satisfies \(b_j=-b_j\), i.e., \(b_j=0\), so the matrix is actually diagonal. Thus we obtain the spectral theorem from the generalized Schur’s theorem. \(\blacksquare\)
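As a quick numerical illustration of the spectral theorem (a sketch with numpy; `np.linalg.eigh` assumes a real symmetric or Hermitian input):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
S = B + B.T                                  # real symmetric, i.e. self-adjoint

w, Q = np.linalg.eigh(S)                     # w: real eigenvalues, Q: orthogonal
print(np.allclose(Q @ np.diag(w) @ Q.T, S))  # S = Q diag(w) Q^T
print(np.allclose(Q.T @ Q, np.eye(4)))
```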

Remark 1 (Schur’s Inequality). Let \(A\in M_{n\times n}(\mathbb{C})\) and \(\lambda_i\ (i=1,\cdots,n)\) be the eigenvalues of \(A\). Denote by \(\|\cdot\|_F\) the Frobenius norm. By Schur’s theorem, we have

\[\sum_{i=1}^{n}|\lambda_i|^2\le \|A\|_F^2 \]

where equality holds iff \(A\) is normal. This is Schur’s inequality. In fact, we can derive the following equality:

\[\inf\limits_{X\in GL_n(\mathbb{C})}\|X^{-1}AX\|_F^2=\sum_{i=1}^{n}|\lambda_i|^2 \]

Therefore, every normal operator has the minimal Frobenius norm in its similarity class. (Note that two similar normal operators are automatically unitarily equivalent and hence have the same Frobenius norm.) Conversely, if a matrix minimizes the Frobenius norm in its similarity class, then it must be normal.
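A minimal numerical check of Schur’s inequality and of the equality case for a normal matrix (a sketch with numpy; eigenvalues are computed by `np.linalg.eigvals`):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

eig_sum = np.sum(np.abs(np.linalg.eigvals(A)) ** 2)
fro_sq = np.linalg.norm(A, 'fro') ** 2
print(eig_sum <= fro_sq + 1e-9)              # Schur's inequality (generically strict)

H = A + A.conj().T                           # Hermitian, hence normal
print(np.isclose(np.sum(np.abs(np.linalg.eigvals(H)) ** 2),
                 np.linalg.norm(H, 'fro') ** 2))   # equality for normal matrices
```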

Remark 2 (Digression: Low-Rank Approximation). Let \(A\in M_{m\times n}(\mathbb{C})\) and \(\sigma_1\ge \cdots \ge \sigma_k\ge \cdots\ge \sigma_r> 0\) be the nonzero singular values of \(A\), where \(r=\text{rank}(A)\) and \(1\le k\le r\). Then we have

\[\inf\limits_{\text{rank}(B)\le k}\|B-A\|_{F}^2=\sum_{i=k+1}^{r}|\sigma_i|^2 \]

(Also, note that \(\|A\|_F^2=\sum_{i=1}^{r}|\sigma_i|^2\).) If \(A=U\Sigma V^*\) is an SVD such that

\[\Sigma=\small\begin{pmatrix} \left.\begin{matrix} \sigma_1 & & & & \\ & \ddots & & & \\ & & \sigma_k & &\\ & & & \ddots & \\ & & & & \sigma_r \\ \hline \end{matrix}\hspace{-0.2em}\right| & & & & \\ & & & & \\ & & & & \end{pmatrix}\]

then \(\widehat{A}=U\widehat{\Sigma}V^*\) achieves the infimum, where

\[\widehat{\Sigma}=\begin{pmatrix} \left.\small\begin{matrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_k \\\hline \end{matrix}\hspace{-0.2em}\right| & & & & & & \\ & & & & & & \\ & & & & & & \\ & & & & & & \end{pmatrix} \]

Moreover, if \(\sigma_k\neq \sigma_{k+1}\), then the minimizer is unique. This is the Eckart–Young–Mirsky theorem for the Frobenius norm. In fact, \(\widehat{A}\) also minimizes the spectral-norm error among matrices of rank at most \(k\), and

\[\inf\limits_{\text{rank}(B)\le k}\|B-A\|_2=\|\widehat{A}-A\|_2=\sigma_{k+1} \]

The proof can be found on Wikipedia.
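A short numerical sketch of the Eckart–Young–Mirsky statement via the truncated SVD (`np.linalg.svd` is assumed; indices are 0-based, so `s[k]` plays the role of \(\sigma_{k+1}\)):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
k = 2

U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_hat = (U[:, :k] * s[:k]) @ Vh[:k, :]       # truncated SVD: best rank-k approximation

print(np.isclose(np.linalg.norm(A - A_hat, 'fro') ** 2, np.sum(s[k:] ** 2)))
print(np.isclose(np.linalg.norm(A - A_hat, 2), s[k]))   # spectral-norm error = sigma_{k+1}
```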

Corollary 2: Cayley-Hamilton and Removal Rule

Proof of Cayley-Hamilton. By Schur’s theorem, it suffices to prove Cayley-Hamilton for any complex upper triangular matrix \(A=(a_{ij})_{n\times n}\).

Note that the characteristic polynomial of \(A\) is \(f(t)=\prod_{k=1}^{n}(t-a_{kk})\) and hence \(f(A)=\prod_{k=1}^{n}(A-a_{kk}I)\). We prove by induction that the first \(l\) columns of the matrix \(B_{l}=\prod_{k=1}^{l}(A-a_{kk}I)\) are all \(0\), for all \(1\le l\le n\), and then conclude that \(f(A)=B_n=O\).

The case \(l=1\) is obvious. Assume that the result is true for \(l-1\), i.e., the first \(l-1\) columns of \(B_{l-1}=\prod_{k=1}^{l-1}(A-a_{kk}I)\) are all \(0\). Then \(\forall 1\le i\le n\) and \(\forall 1\le j\le l\), we have

\[\begin{align*} B_l(i,j)&=\sum_{k=1}^{n}B_{l-1}(i,k)(A-a_{ll}I)(k,j)\\&=\underbrace{\sum_{k=1}^{l-1}B_{l-1}(i,k)(A-a_{ll}I)(k,j)}_{(1)}+\underbrace{\sum_{k=l}^{n}B_{l-1}(i,k)(A-a_{ll}I)(k,j)}_{(2)}. \end{align*} \]

Since \(\forall 1\le k\le l-1,\ B_{l-1}(i,k)=0\) (induction hypothesis), and \(\forall l\le k\le n,\ (A-a_{ll}I)(k,j)=0\) (because \(A\) is upper triangular, \(j\le l\le k\), and the diagonal case \(k=j=l\) gives \(a_{ll}-a_{ll}=0\)), both \((1)\) and \((2)\) are zero, and so \(B_{l}(i,j)=0\). Therefore, the first \(l\) columns of \(B_l\) are all \(0\). \(\blacksquare\)
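The Cayley-Hamilton theorem is also easy to check numerically. The sketch below uses `np.poly` to get the characteristic polynomial’s coefficients of a random matrix and evaluates it at the matrix by Horner’s scheme (floating-point arithmetic, so the result is only zero up to round-off):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n))

c = np.poly(A)                    # characteristic polynomial coefficients, leading term first
P = np.zeros((n, n))
for coef in c:                    # Horner's scheme with a matrix argument
    P = P @ A + coef * np.eye(n)

print(np.allclose(P, 0, atol=1e-7))   # p_A(A) = O up to round-off
```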

Thanks to Schur’s theorem, we can prove the following lemma without using the Jordan canonical form.

Lemma. Let \(A\in M_{n\times n}(\mathbb{C})\) and \(\text{Spec}(A)=\{\lambda_1,\cdots,\lambda_n\}\) (multiset). Then for any polynomial \(f\) over \(\mathbb{C}\), we have \(\text{Spec}(f(A))=\{f(\lambda_1),\cdots,f(\lambda_n)\}\). (Indeed, write \(A=UTU^*\) with \(T\) upper triangular and \(\text{diag}(T)=(\lambda_1,\cdots,\lambda_n)\); then \(f(A)=Uf(T)U^*\) and \(f(T)\) is upper triangular with diagonal \((f(\lambda_1),\cdots,f(\lambda_n))\).) \(\blacksquare\)

The next proposition serves as a preparation for Corollary 3: Jordan Canonical Form. It is our removal rule.

Proposition. Let \(F\) be any subfield of \(\mathbb{C}\). Let \(A\in M_{m\times m}(F),B\in M_{n\times n}(F)\) be two square matrices. Let \(p_A,p_B\) be the characteristic polynomials of \(A,B\). If \(\text{gcd}(p_A,p_B)=1\) over \(F\), then for any \(M\in M_{m\times n}(F)\), the matrix \(\begin{pmatrix}A & M\\O & B\end{pmatrix}\) is similar to \(\begin{pmatrix}A & O\\O & B\end{pmatrix}\) as matrices in \(M_{(m+n)\times (m+n)}(F)\). (Note that if \(F=\mathbb{C}\), then the condition is equivalent to \(\text{Spec}(A)\cap \text{Spec}(B)=\varnothing\).)

Proof. If the Sylvester equation \(AX-XB=M\) has a solution, then

\[\begin{pmatrix}I_m & X\\O & I_n\end{pmatrix}\begin{pmatrix}A & M\\O & B\end{pmatrix}\begin{pmatrix}I_m & -X\\O & I_n\end{pmatrix}=\begin{pmatrix}A & O\\O & B\end{pmatrix} \]

and thus the two matrices are similar. (In fact, the converse is also true, but much more difficult. It’s called Roth’s removal rule.) Consider the linear operator

\[\varphi:M_{m\times n}(F)\to M_{m\times n}(F)\quad X\mapsto AX-XB \]

We need to show that \(\varphi\) is surjective. It suffices to show that \(\varphi\) is injective, i.e., if \(AX=XB\), then \(X=O\). Note that \(A^2X=A(AX)=A(XB)=(AX)B=(XB)B=XB^2\), and \(A^3X=A(A^2X)=A(XB^2)=(AX)B^2=(XB)B^2=XB^3\), etc. Thus, for any polynomial \(f\) over \(F\), we have \(f(A)X=Xf(B)\). Let \(m_A,m_B\) be the minimal polynomials of \(A,B\) over \(F\). Then \(\text{gcd}(m_A,m_B)=1\) (as \(m_A\mid p_A\) and \(m_B\mid p_B\)) and \(m_B(A)X=Xm_B(B)=O\). We show that \(m_B(A)\) is invertible, which forces \(X=O\). Suppose, for contradiction, that \(0\) is an eigenvalue of \(m_B(A)\). Since the minimal polynomial of \(A\) over \(\mathbb{C}\) equals \(m_{A}\), by the lemma above there exists \(\lambda\in \mathbb{C}\) such that \(m_A(\lambda)=0\) and \(m_B(\lambda)=0\). Let \(h\) be the minimal polynomial of \(\lambda\) over \(F\); since \(m_A,m_B\in F[t]\) both vanish at \(\lambda\), we get \(h\mid m_A\) and \(h\mid m_B\), contradicting \(\text{gcd}(m_A,m_B)=1\). \(\blacksquare\)
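The removal rule is easy to test numerically: solve the Sylvester equation and conjugate. The sketch below assumes `scipy.linalg.solve_sylvester`, which solves \(AX+XB=Q\), so we pass \(-B\) to solve \(AX-XB=M\); the example matrices are built so that the spectra of \(A\) and \(B\) are (almost surely) disjoint.

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(5)
A = np.triu(rng.standard_normal((3, 3))) + 3 * np.eye(3)   # eigenvalues near +3
B = np.triu(rng.standard_normal((2, 2))) - 3 * np.eye(2)   # eigenvalues near -3
M = rng.standard_normal((3, 2))

X = solve_sylvester(A, -B, M)                # AX - XB = M

S = np.block([[np.eye(3), X], [np.zeros((2, 3)), np.eye(2)]])
T = np.block([[A, M], [np.zeros((2, 3)), B]])
D = np.block([[A, np.zeros((3, 2))], [np.zeros((2, 3)), B]])
print(np.allclose(S @ T @ np.linalg.inv(S), D))   # removal of the off-diagonal block
```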

Remark 1 (Minimal Polynomial). Let \(E/F\) be a field extension and \(A\in M_{n\times n}(F)\); then the minimal polynomial of \(A\) over \(E\) equals the minimal polynomial of \(A\) over \(F\). Here is an elegant proof: in the same fashion as complexification, \(F^n\otimes_{F} E\) is naturally a vector space over \(E\), with a canonical isomorphism

\[\phi:F^n\otimes_F E\to E^n\quad\phi(\sum_i \mathbf{e}_i\otimes \mu_i):=\sum_i \mu_i\mathbf{e}_i \]

Consider the commutative diagram


\begin{tikzcd}
F^n\otimes_F E \arrow[rr, "L_A\otimes \text{id}_E"] \arrow[d, "\phi"]  &  & F^n\otimes_F E \arrow[d, "\phi"] \\
E^n \arrow[rr, "T:=\phi\circ (L_A\otimes \text{id}_E)\circ \phi^{-1}"] &  & E^n
\end{tikzcd}
We have
\[\begin{align*} T(\sum_i\mu_i\mathbf{e}_i)&=T\circ \phi(\sum_i \mathbf{e}_i\otimes \mu_i)\\ &=\phi\circ (L_A\otimes \text{id}_E)(\sum_i \mathbf{e}_i\otimes \mu_i)\\ &=\phi(\sum_i A\mathbf{e}_i\otimes \mu_i)=\sum_i \mu_i A\mathbf{e}_i=A(\sum_i\mu_i\mathbf{e}_i) \end{align*}\]
Note that the standard basis \(\beta=\{\mathbf{e}_1,\cdots,\mathbf{e}_n\}\) is a basis for \(E^n\) and \([T]_{\beta}=A\). Therefore,
\[\begin{align*} &\hphantom{=\ }\text{the minimal polynomial of $A$ over $E$}\\ &=\text{the minimal polynomial of $T:E^n\to E^n$}\\ &=\text{the minimal polynomial of $L_A\otimes \text{id}_E:F^n\otimes_F E\to F^n\otimes_F E$}\\ &=\text{the minimal polynomial of $L_A:F^n\to F^n$}\\ &=\text{the minimal polynomial of $A$ over $F$} \end{align*}\]

Remark 2 (Alternative Proof). When \(F=\mathbb{C}\), there is an alternative proof without invoking Cayley-Hamilton. (Note that the existence of minimal polynomials of square matrices is guaranteed by Cayley-Hamilton.) By Schur’s theorem, there exist two unitary matrices \(U_1,U_2\) such that \(T_1=U_1AU_1^*,T_2=U_2BU_2^*\) are upper triangular, with the eigenvalues of \(A,B\) on their diagonals. Define \(U=\begin{pmatrix}U_1&O\\O&U_2\end{pmatrix}\). Then \(U\) is unitary and

\[U\begin{pmatrix}A & O\\O & B\end{pmatrix}U^*=\begin{pmatrix}T_1 & O\\O & T_2\end{pmatrix},\quad U\begin{pmatrix}A & M\\O & B\end{pmatrix}U^*=\begin{pmatrix}T_1 & U_1MU_2^*\\O & T_2\end{pmatrix} \]

Therefore, we may assume without loss of generality that \(A=(a_{ij})_{m\times m},B=(b_{ij})_{n\times n}\) are upper triangular with no common diagonal entries. As shown in the previous proof, it suffices to show that if \(AX=XB\), then \(X=(x_{ij})_{m\times n}\) is zero. Indeed,

\[\begin{align*} & \text{Entries of } AX=XB : \text{Equations and Consequences} \\\hline \bullet\ & (m,1): a_{mm}x_{m1}=x_{m1}b_{11}\implies x_{m1}=0\\ \bullet\ & (m,2),(m-1,1): \begin{cases} a_{m,m}x_{m2}=x_{m2}b_{22}\implies x_{m2}=0\\ a_{m-1,m-1}x_{m-1,1}=x_{m-1,1}b_{11}\implies x_{m-1,1}=0 \end{cases}\\ \bullet\ & (m,3),(m-1,2),(m-2,1): \begin{cases} a_{m,m}x_{m3}=x_{m3}b_{33}\implies x_{m3}=0\\ a_{m-1,m-1}x_{m-1,2}=x_{m-1,2}b_{22}\implies x_{m-1,2}=0\\ a_{m-2,m-2}x_{m-2,1}=x_{m-2,1}b_{11}\implies x_{m-2,1}=0 \end{cases}\\ \end{align*} \]

\[\cdots \]

Hence we are done. \(\blacksquare\)

Corollary 3: Jordan Canonical Form

Schur triangularization implies Jordan canonical form over \(\mathbb{C}\).

Proof. Given any complex square matrix \(A\), by Schur’s theorem \(A\) is unitarily equivalent to an upper triangular matrix of the form

\[\begin{pmatrix} \boxed{\small\begin{matrix}\lambda_1&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_1\end{matrix}} & \Large{*} & \cdots & \Large{*} \\ & \boxed{\small\begin{matrix}\lambda_2&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_2\end{matrix}} & \cdots & \Large{*}\\ & & \ddots & \vdots\\ & & & \boxed{\small\begin{matrix}\lambda_l&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_l\end{matrix}} \end{pmatrix} \]

where \(\lambda_1,\lambda_2,\cdots,\lambda_l\) are all the distinct eigenvalues of \(A\). By applying the removal rule inductively, we derive that the matrix above is similar to the block diagonal matrix

\[\begin{pmatrix} \boxed{\small\begin{matrix}\lambda_1&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_1\end{matrix}} & & & \\ & \hspace{-0.5em}\boxed{\small\begin{matrix}\lambda_2&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_2\end{matrix}} & &\\ & & \ddots &\\ & & & \boxed{\small\begin{matrix}\lambda_l&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_l\end{matrix}} \end{pmatrix} \]

Therefore it suffices to show that each \(\small\begin{pmatrix}\lambda_i&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_i\end{pmatrix}\) has a Jordan canonical form.

We only need to show that if \(\Lambda\in M_{n\times n}(\mathbb{C})\) is strictly upper triangular, then it has a Jordan canonical form. Denote \(L_{\Lambda}:\mathbb{C}^n\to \mathbb{C}^n\) by \(T\). Clearly, \(T\) is nilpotent. Denote by \(k\) the nilpotency index of \(T\). Since \(k=1\) implies \(T=O\), we may assume that \(k\ge 2\). Let \(\gamma_j\) be any basis for \(\ker(T^j)\ (j=1,\cdots,k-1)\). Note that

\[0=\ker(T^0)\subset \ker(T^1)\subset \cdots\subset \ker(T^{k-1})\subset \ker(T^k)=\mathbb{C}^n \]

We construct a Jordan canonical basis for \(T\):

Step 1 Extend \(\gamma_{k-1}\) to a basis for \(\ker(T^k)\): \(\gamma_{k-1}\cup \beta_k\). Then \(\gamma_{k-2}\cup T^1\beta_{k}\) is linearly independent. Indeed, let \(\gamma_{k-1}=\{w_1,\cdots,w_p\},\gamma_{k-2}=\{w’_1,\cdots,w’_q\}\) and \(\beta_k=\{v_1,\cdots,v_m\}\), then

\[\begin{align*} \sum_{i=1}^{m}\widetilde{k}_iT(v_i)+\sum_{i=1}^{q}k’_iw’_i=0 &\implies T^{k-1}(\sum_{i=1}^{m}\widetilde{k}_iv_i)=0\\ &\implies \sum_{i=1}^{m}\widetilde{k}_iv_i=\sum_{i=1}^{p}k_iw_i\text{ for some $(k_1,\cdots,k_p)$}\\ &\implies \text{$\widetilde{k}_i=0$ for all $i$, and further $k’_i=0$ for all $i$} \end{align*} \]

This argument also works in the following steps.

Step 2 Extend \(\gamma_{k-2}\cup T^1\beta_{k}\) to a basis for \(\ker(T^{k-1})\): \(\gamma_{k-2}\cup T^1\beta_k\cup \beta_{k-1}\). Then \(\gamma_{k-3}\cup T^2\beta_k\cup T^1\beta_{k-1}\) is linearly independent.

Step 3 Extend \(\gamma_{k-3}\cup T^2\beta_k\cup T^1\beta_{k-1}\) to a basis for \(\ker(T^{k-2})\): \(\gamma_{k-3}\cup T^2\beta_k\cup T^1\beta_{k-1}\cup\beta_{k-2}\). Then \(\gamma_{k-4}\cup T^3\beta_k\cup T^2\beta_{k-1}\cup T^1\beta_{k-2}\) is linearly independent.

\(\cdots\)

Step k-1 Extend \(\gamma_1\cup T^{k-2}\beta_{k}\cup T^{k-3}\beta_{k-1}\cup\cdots\cup T^1\beta_3\) to a basis for \(\ker(T^2)\): \(\gamma_1\cup T^{k-2}\beta_{k}\cup T^{k-3}\beta_{k-1}\cup\cdots\cup T^1\beta_3\cup \beta_2\). Then \(T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\) is linearly independent.

Step k Extend \(T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\) to a basis for \(\ker(T^1)\): \(T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\cup \beta_1\).

Since \(\gamma_1\) is an arbitrary basis for \(\ker(T^1)\), by substituting \(\gamma_1=T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\cup \beta_1\) into Step k-1, we conclude that the union of

\[\begin{matrix} T^{k-2}\beta_{k} & T^{k-3}\beta_{k-1} & \cdots & \beta_2 & \\ T^{k-1}\beta_k & T^{k-2}\beta_{k-1} & \cdots & T^1\beta_2 & \beta_1 \end{matrix} \]

is a basis for \(\ker(T^2)\). Repeating this procedure inductively, we see that the union of

\[\begin{matrix} \beta_k & & & & \\ T\beta_k & \beta_{k-1} & & & \\ \vdots & \vdots & \ddots & & \\ T^{k-2}\beta_{k} & T^{k-3}\beta_{k-1} & \cdots & \beta_2 & \\ T^{k-1}\beta_k & T^{k-2}\beta_{k-1} & \cdots & T^1\beta_2 & \beta_1 \end{matrix} \]

is a basis for \(\ker(T^k)=\mathbb{C}^n\). Moreover,

\[\begin{align*} \#(\beta_i)&=[\dim\ker(T^i)-\dim\ker(T^{i+1})]-[\dim\ker(T^{i-1})-\dim\ker(T^i)]\\ &=2\dim\ker(T^i)-\dim\ker(T^{i+1})-\dim\ker(T^{i-1}) \end{align*} \]

for all \(i\).

Let \(\beta_i=\{v_{i,1},\cdots,v_{i,n_i}\}\ (i=1,\cdots,k)\). Then

\[\beta:=\bigcup_{i=1}^{k}\bigcup_{j=1}^{n_i}\{T^{i-1}(v_{i,j}),\cdots,T(v_{i,j}),v_{i,j}\} \]

is an ordered basis for \(\mathbb{C}^n\) such that

\[[T]_{\beta}=\text{diag}\{\underbrace{N_1,\cdots,N_1}_{\#(\beta_1)};\underbrace{N_2,\cdots,N_2}_{\#(\beta_2)};\cdots;\underbrace{N_k,\cdots,N_k}_{\#(\beta_k)}\} \]

where

\[N_i=\small\begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ &&\ddots&\ddots& \\ &&&0&1 \\ &&&& 0 \end{pmatrix}_{i\times i} \]

for each \(i\). This completes the proof. \(\blacksquare\)
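The block-size count \(\#(\beta_i)=2\dim\ker(T^i)-\dim\ker(T^{i+1})-\dim\ker(T^{i-1})\) can be checked from the ranks of powers. Here is a small numerical sketch (exact for this integer example, but floating-point rank computations are only reliable up to tolerance):

```python
import numpy as np

# a nilpotent matrix with Jordan blocks N_3 and N_2
N = np.array([[0, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0]], dtype=float)
n = N.shape[0]

# d[i] = dim ker(N^i) = n - rank(N^i), for i = 0, ..., n+1
d = [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, i)) for i in range(n + 2)]

# number of Jordan blocks of size i: #(beta_i) = 2 d[i] - d[i+1] - d[i-1]
counts = {i: 2 * d[i] - d[i + 1] - d[i - 1] for i in range(1, n + 1)}
print(counts)   # {1: 0, 2: 1, 3: 1, 4: 0, 5: 0}
```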

Ordinary Triangularization

Statement & Proof

Many results above are proved over \(\mathbb{C}\) using Schur triangularization. Their generalizations are made possible by ordinary triangularization.

Lemma. Let \(F\) be any field. Let \(V\) be a vector space over \(F\), and \(T\) a linear operator on \(V\).

Suppose that \(W\) is a nonzero \(T\)-invariant subspace of \(V\). Then it is easy to check that

\[\overline{T}:V/W\to V/W\quad\overline{T}(v+W):=T(v)+W \]

is a well-defined linear operator, and it is the unique linear operator such that \(\eta T=\overline{T}\eta\), where \(\eta:V\to V/W\) is the linear transformation defined by \(\eta(v):=v+W\).

To go further, assume that \(V\) is finite-dimensional and nonzero. Let \(\gamma=\{v_1,\cdots,v_k\}\) be an ordered basis for \(W\) and extend \(\gamma\) to an ordered basis \(\beta=\{v_1,\cdots,v_k,v_{k+1},\cdots,v_n\}\) for \(V\). Then \(\alpha=\{v_{k+1}+W,\cdots,v_n+W\}\) is an ordered basis for \(V/W\). Using the equation \([\eta]_{\beta}^{\alpha}[T]_{\beta}=[\overline{T}]_{\alpha}[\eta]_{\beta}^{\alpha}\), it is easy to show that

\[[T]_{\beta}= \begin{pmatrix} [T_W]_{\gamma}&*\\ O&[\overline{T}]_{\alpha}     \end{pmatrix}\]

Based on this fact, we have:
(1) The characteristic polynomials satisfy

\[p_T(\lambda)=p_{T_W}(\lambda)p_{\overline{T}}(\lambda) \]

(2) If \(T\) is diagonalizable, then its minimal polynomial \(m_T\) is a product of distinct monic linear factors. Since \(m_T(T_W)=m_T(T)|_W=0\) and \(m_T(\overline{T})\,\eta=\eta\, m_T(T)=0\) with \(\eta\) surjective, the minimal polynomials of \(T_W\) and \(\overline{T}\) both divide \(m_T\), and hence both \(T_W\) and \(\overline{T}\) are diagonalizable.
(3) If both \(T_W\) and \(\overline{T}\) are diagonalizable and \(\text{gcd}(p_{T_W},p_{\overline{T}})=1\) over \(F\), then by the removal rule \([T]_{\beta}\) is similar to \(\begin{pmatrix} [T_W]_{\gamma}&O\\ O&[\overline{T}]_{\alpha} \end{pmatrix}\), and hence \(T\) is diagonalizable as well. \(\blacksquare\)

THEOREM. Let \(F\) be any field. Let \(V\) be a nonzero finite-dimensional vector space over \(F\), and \(T\) a linear operator on \(V\). If the characteristic polynomial of \(T\) splits, then there exists a basis \(\beta\) for \(V\) such that \([T]_{\beta}\) is an upper triangular matrix.

Proof. Induction on \(n:=\dim V\). The theorem is trivial when \(n=1\), so we may assume that \(n\ge 2\). Assume the theorem is true whenever the dimension of the space is less than \(n\). Since the characteristic polynomial of \(T\) splits, \(T\) has an eigenvector \(z\). Define \(W:=\text{span}(\{z\})\); then \(W\) is \(T\)-invariant and \(\dim(V/W)=n-1\). By the lemma above, \(p_{\overline{T}}\) divides \(p_T\) and hence splits as well. By the induction hypothesis there is a basis \(\alpha\) for \(V/W\) such that \([\overline{T}]_{\alpha}\) is upper triangular; choosing representatives of \(\alpha\) in \(V\) and prepending \(z\) gives, by the lemma, a basis \(\beta\) for \(V\) with \([T]_{\beta}\) upper triangular. \(\blacksquare\)
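The inductive proof translates directly into a (naive) algorithm. The sketch below works over \(\mathbb{Q}\) with sympy and assumes the characteristic polynomial splits over \(\mathbb{Q}\), so that `A.eigenvals()` returns rational eigenvalues; it is meant as an illustration of the quotient-space recursion, not as an efficient routine.

```python
import sympy as sp

def triangularize(A):
    """Return an invertible P with P^{-1} A P upper triangular,
    assuming the characteristic polynomial of A splits over QQ."""
    n = A.rows
    if n == 1:
        return sp.eye(1)
    lam = list(A.eigenvals())[0]                       # some eigenvalue (rational by assumption)
    z = (A - lam * sp.eye(n)).nullspace()[0]           # an eigenvector spanning W
    basis = [z]                                        # extend {z} to a basis of QQ^n
    for j in range(n):
        e = sp.eye(n).col(j)
        if sp.Matrix.hstack(*(basis + [e])).rank() == len(basis) + 1:
            basis.append(e)
    Q = sp.Matrix.hstack(*basis)
    B = Q.inv() * A * Q                                # = [[lam, *], [0, A_bar]], A_bar acts on V/W
    P_bar = triangularize(B[1:, 1:])
    return Q * sp.diag(1, P_bar)

# example: conjugate an upper triangular matrix by a rational change of basis
T0 = sp.Matrix([[1, 2, 3], [0, 1, 4], [0, 0, 2]])
P0 = sp.Matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
A = P0 * T0 * P0.inv()                                 # char poly splits over QQ by construction
P = triangularize(A)
print(sp.simplify(P.inv() * A * P).is_upper)           # True
```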

Corollaries

TODO:

Simultaneous Triangularization

Problem 1. Let \(A,B\in M_{n\times n}(\mathbb{C})\).
(1) Show that if \(AB=BA\), then \(A,B\) are simultaneously triangularizable.
(2) Show that if \(\text{rank}(AB-BA)=1\), then \(A,B\) are simultaneously triangularizable.

TODO:

Problem 2. Let \(A,B\) and \(C\) be matrices in \(M_n(\mathbb{C})\) such that \(C=AB-BA, AC=CA,BC=CB\).
(1) Show that the eigenvalues of \(C\) are all zero.
(2) Let \(m_A(\lambda)\) and \(m_B(\lambda)\) be the minimal polynomials of \(A\) and \(B\), respectively, and \(k:=\min \{\deg m_A(\lambda), \deg m_B(\lambda), n-1\}\). Show that \(C^k=0\).
(3) Show that if \(n=2\), then \(C=O\).
(4) Show that there exists a common eigenvector of \(A,B\) and \(C\).
(5) Show that \(A,B\) and \(C\) are simultaneously triangularizable.

TODO:

Triangularization over PID

TODO:

Appendices

Appendix A: Complexification

“The complexification of the real vector space \(\mathbb{R}^n\) is the complex vector space \(\mathbb{C}^n\). The complexification of the real inner product space \((L^2(\Omega;\mathbb{R}),\langle u,v \rangle:=\int_{\Omega} u(x)v(x)\,\mathrm{d}x)\) is the complex inner product space \((L^2(\Omega;\mathbb{C}),\langle f,g \rangle:=\int_{\Omega} f(x)\overline{g(x)}\,\mathrm{d}x)\).”

I. Let \(V\) be a real vector space and \(T\) a linear operator on \(V\). Define the complexification of \(V\) to be the complex vector space

\[V_{\mathbb{C}}:=V\otimes_{\mathbb{R}}\mathbb{C} \]

The scalar multiplication is made possible by defining

\[\lambda (v\otimes \mu):=v\otimes (\lambda\mu)\quad (v\in V;\lambda,\mu\in \mathbb{C}) \]

where \(\otimes\) replaces \(\otimes_{\mathbb{R}}\) for brevity. Define the complexification of \(T\) to be the linear operator

\[T_{\mathbb{C}}:=T\otimes \text{id}_{\mathbb{C}} \]

Complexifications of linear transformations are defined in the same fashion.

II. Every vector in \(V_{\mathbb{C}}\) is uniquely of the form

\[v+iw:=v\otimes 1+i(w\otimes 1)=v\otimes 1+w\otimes i\quad (v,w\in V) \]

If \(\beta\) is a basis for the real vector space \(V\), then \(\beta\otimes_{\mathbb{R}} 1\) is automatically a basis for the complex vector space \(V_{\mathbb{C}}\). In particular, \(\dim_{\mathbb{C}}(V_{\mathbb{C}})=\dim_{\mathbb{R}}(V)\). It is obvious that if \(T\) is invertible, then so is \(T_{\mathbb{C}}\), and

\[(T_{\mathbb{C}})^{-1}=(T^{-1})_{\mathbb{C}} \]

Moreover, if \(V\) is nonzero and finite-dimensional, then i) thanks to the fact that \(T_{\mathbb{C}}\) has an eigenvector, there exists a \(T\)-invariant subspace of dimension \(1\) or \(2\); ii) The characteristic/minimal polynomial of \(T_{\mathbb{C}}\) is exactly the same as that of \(T\) (and hence has real coefficients).

III. If \(\langle\cdot,\cdot\rangle:V\times V\to \mathbb{R}\) is an inner product, then \(\langle\cdot,\cdot\rangle_{\mathbb{C}}:V_{\mathbb{C}}\times V_{\mathbb{C}}\to \mathbb{C}\) defined by

\[\langle v+iw,v’+iw’ \rangle_{\mathbb{C}}:=\langle v,v’ \rangle+\langle w,w’ \rangle+i\langle w,v’ \rangle-i\langle v,w’ \rangle \]

is the unique inner product on \(V_{\mathbb{C}}\) that restricts back to \(\langle \cdot,\cdot \rangle\). We claim that, with respect to this pair of inner products, if the adjoint of \(T\) exists, then so does the adjoint of \(T_{\mathbb{C}}\), and

\[(T_{\mathbb{C}})^*=(T^*)_{\mathbb{C}} \]

Indeed, for any \(v+iw,v’+iw’\in V_{\mathbb{C}}\), we have

\[\begin{align*} &\hphantom{=\ }\langle T_{\mathbb{C}}(v+iw),v’+iw’ \rangle_{\mathbb{C}}\\ &=\langle T(v)+iT(w),v’+iw’ \rangle_{\mathbb{C}}\\ &=\langle T(v),v’ \rangle+\langle T(w),w’ \rangle+i\langle T(w),v’ \rangle-i\langle T(v),w’ \rangle\\ &=\langle v,T^*(v’) \rangle+\langle w,T^*(w’) \rangle+i\langle w,T^*(v’) \rangle-i\langle v,T^*(w’) \rangle\\ &=\langle v+iw,T^*(v’)+iT^*(w’) \rangle_{\mathbb{C}}\\ &=\langle v+iw,(T^*)_{\mathbb{C}}(v’+iw’) \rangle_{\mathbb{C}} \end{align*} \]

IV. Let \(W\) be a complex vector space. Let \(\gamma\) be any basis for \(W\), then \(i\gamma\) is also a basis for \(W\). Clearly, \(\gamma\cap i\gamma=\varnothing\), and \(\gamma\cup i\gamma\) is linearly independent over \(\mathbb{R}\). Define \(W_{\mathbb{R}}\) to be the real vector space formed by all the linear combinations with real coefficients of the vectors in \(\gamma\cup i\gamma\). Since \(W_{\mathbb{R}}=\text{span}_{\mathbb{R}}(\gamma)\oplus \text{span}_{\mathbb{R}}(i\gamma)\), it is independent of the choice of \(\gamma\), and is called the realification of \(W\). They have the same underlying set. Let \(S\) be a linear operator on \(W\), then it is automatically a linear operator on \(W_{\mathbb{R}}\), denoted by \(S_{\mathbb{R}}\). If \(W\) is nonzero and finite-dimensional, then \(\dim_{\mathbb{R}}(W_{\mathbb{R}})=2\dim_{\mathbb{C}}(W)\), and \([S_{\mathbb{R}}]_{\gamma\cup i\gamma}=\begin{pmatrix}\text{Re} [S]_{\gamma} & -\text{Im}[S]_{\gamma} \\ \text{Im}[S]_{\gamma} & \text{Re} [S]_{\gamma}\end{pmatrix}\). If \(\langle \cdot,\cdot \rangle:W\times W\to \mathbb{C}\) is an inner product, then \(\langle \cdot,\cdot \rangle_{\mathbb{R}}:W_{\mathbb{R}}\times W_{\mathbb{R}}\to \mathbb{R}\) defined by \(\langle w_1,w_2 \rangle_{\mathbb{R}}:=\text{Re}\langle w_1,w_2\rangle\) is the unique inner product such that \(\langle w,w \rangle_{\mathbb{R}}=\langle w,w \rangle\) and \(\langle w,iw\rangle_{\mathbb{R}}=0\) for all \(w\in W_{\mathbb{R}}\).
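For example, take \(W=\mathbb{C}\) with \(\gamma=\{1\}\) and let \(S\) be multiplication by \(a+bi\); then \(\gamma\cup i\gamma=\{1,i\}\) and \([S_{\mathbb{R}}]_{\{1,i\}}=\begin{pmatrix}a & -b\\ b & a\end{pmatrix}\), exactly the type of \(2\times 2\) block appearing in the generalized Schur theorem.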

V. Complexification is obviously an additive functor from \(\text{Vect}_{\mathbb{R}}\) to \(\text{Vect}_{\mathbb{C}}\). By knowledge of homological algebra, it is the left adjoint functor of the forgetful functor from \(\text{Vect}_{\mathbb{C}}\) to \(\text{Vect}_{\mathbb{R}}\) (see, for example, Proposition 2.6.3 in Weibel’s Introduction to Homological Algebra). We now present some natural isomorphisms. The first is

\[(V^*)_{\mathbb{C}}=V^*\otimes \mathbb{C}\cong \text{Hom}_{\mathbb{R}}(V,\mathbb{C})\cong \text{Hom}_{\mathbb{C}}(V_{\mathbb{C}},\mathbb{C})=(V_{\mathbb{C}})^* \]

where the isomorphisms are given by

\[\varphi_1\otimes 1+\varphi_2\otimes i\leftrightarrow\underbrace{\varphi_1+i\varphi_2}_{=:\varphi}\leftrightarrow \Big(v\otimes 1\mapsto \varphi(v)\Big) \]

Given another real vector space \(U\), we have

\[(U\otimes V)_{\mathbb{C}}\cong U_{\mathbb{C}}\otimes V_{\mathbb{C}} \]

And there is a natural isomorphism

\[\begin{align*} (\text{Hom}_{\mathbb{R}}(U,V))_{\mathbb{C}}&\overset{\cong}{\rightarrow} \text{Hom}_{\mathbb{C}}(U_{\mathbb{C}},V_{\mathbb{C}})\\ f\otimes 1+g\otimes i &\mapsto f_{\mathbb{C}}+ig_{\mathbb{C}} \end{align*} \]

(For any \(h\in \text{Hom}_{\mathbb{C}}(U_{\mathbb{C}},V_{\mathbb{C}})\), there exists a unique pair of maps \(f,g:U\to V\) such that \(h(u\otimes 1)=f(u)\otimes 1+g(u)\otimes i\) for all \(u\in U\). It is easy to check that \(f,g\) are linear and \(h=f_{\mathbb{C}}+ig_{\mathbb{C}}\).)

Appendix B: Algebraic Closure

An algebraic closure of a field \(F\) is an algebraic extension of \(F\) that is algebraically closed. We will prove that every field has an algebraic closure. The proof invokes the Axiom of Choice when referring to Krull’s theorem: if \(R\) is a ring and \(I\subset R\) is a proper ideal, then there exists a maximal ideal of \(R\) containing \(I\). The basic idea of the proof goes back to Emil Artin. (Do NOT hesitate to consult the excellent notes by Keith Conrad: 1. [https://kconrad.math.uconn.edu/blurbs/galoistheory/algclosureshorter.pdf]; 2. [http://math.stanford.edu/~conrad/121Page/handouts/algclosure.pdf].)

Proof. Let \(F\) be any field. Denote by \(\{f_j:j\in J\}\) the set of all monic irreducible polynomials over \(F\). Introduce indeterminates \(u_{j,1},\cdots,u_{j,d_j}\) for each \(j\), where \(d_j:=\deg(f_j)\). Let \(R\) be the polynomial ring over \(F\) generated by these indeterminates. For each \(j\), consider the coefficients of the polynomial

\[f_j(x)-\prod_{i=1}^{d_j}(x-u_{j,i})=:\sum_{i=0}^{d_j-1}r_{j,i}x^i\in R[x] \]

Denote by \(I\) the ideal in \(R\) generated by these coefficients. We claim that \(I\) is a proper ideal, i.e., \(1\notin I\). Indeed, suppose \(1=\sum g\cdot r_{j,i}\) were a finite combination; let \(E\) be a splitting field over \(F\) of the finitely many \(f_j\) involved, and substitute for each \(u_{j,1},\cdots,u_{j,d_j}\) the roots of \(f_j\) in \(E\). Under this substitution every \(r_{j,i}\) vanishes, so the right-hand side becomes \(0\), a contradiction. By Krull’s theorem, there exists a maximal ideal \(\mathfrak{m}\) in \(R\) such that \(I\subset \mathfrak{m}\). Then the field \(K_1:=R/\mathfrak{m}\) is an algebraic extension of \(F\) (it is generated over \(F\) by the images of the \(u_{j,i}\), each of which is a root of the corresponding \(f_j\)) such that every nonconstant polynomial over \(F\) has a root in \(K_1\). By repeating this construction inductively, we may construct a sequence of fields \(\{K_n\}_{n=1}^{\infty}\) such that each \(K_{n+1}\) is an algebraic extension of \(K_n\) and every nonconstant polynomial over \(K_n\) has a root in \(K_{n+1}\). Define \(K:=\bigcup_{n=1}^{\infty}K_n\). Then \(K\) is an algebraic extension of \(F\). Moreover, any nonconstant polynomial over \(K\) has its finitely many coefficients in some \(K_n\) and hence has a root in \(K_{n+1}\subset K\), i.e., \(K\) is algebraically closed. Hence, \(K\) is an algebraic closure of \(F\). \(\blacksquare\)

In fact, the algebraic closure is essentially unique: any two algebraic closures of \(F\) are isomorphic as extensions of \(F\).

TODO:

Appendix C: Gramian Determines Shape

This appendix is a digression.

Proposition. Let \(\{v_1,\cdots,v_s\}\) and \(\{w_1,\cdots,w_s\}\) be two subsets of \(\mathbb{R}^n\). Then there exists \(A\in O(n)\) such that \(Av_i=w_i\ (1\le i\le s)\) iff \(\langle v_{i}, v_{j}\rangle=\langle w_{i}, w_{j}\rangle\ (1\le i,j\le s)\), i.e., the two Gramians are equal.

Remark 1. Similarly, if \(\{v_1,\cdots,v_s\}\) and \(\{w_1,\cdots,w_s\}\) are two subsets of \(\mathbb{C}^n\), then there exists \(A\in U(n)\) such that \(Av_i=w_i\ (1\le i\le s)\) iff \(\langle v_{i}, v_{j}\rangle=\langle w_{i}, w_{j}\rangle\ (1\le i,j\le s)\), i.e., the two Gramians are equal.

Proof. We summarize the idea of proving \((\Leftarrow)\) as follows. Given the data \(\langle v_i,v_j \rangle\ (1\le i,j\le s)\), we may focus on a maximal linearly independent subset of \(\{v_1,\cdots,v_s\}\) to study the shape formed by these \(s\) vectors in \(\mathbb{R}^n\). Suppose that \(\{v_{k_1},\cdots,v_{k_r}\}\) is a maximal linearly independent subset of \(\{v_1,\cdots,v_s\}\); then \(\{w_{k_1},\cdots,w_{k_r}\}\) is automatically a maximal linearly independent subset of \(\{w_1,\cdots,w_s\}\). Perform Gram-Schmidt on them simultaneously and extend the resulting orthonormal sets to two orthonormal bases for \(\mathbb{R}^n\). The associated change of coordinates matrix then completes the proof. (The Gram-Schmidt process can be embodied by QR decomposition.)

\((\Rightarrow)\): Obvious:

\[\begin{align*} G(w_1,\cdots,w_s)&=(w_{1}\ \cdots\ w_{s})^{T}(w_{1}\ \cdots\ w_{s})\\ &=(v_{1}\ \cdots\ v_{s})^{T}A^TA(v_{1}\ \cdots\ v_{s})\\ &=(v_{1}\ \cdots\ v_{s})^{T}(v_{1}\ \cdots\ v_{s})=G(v_1,\cdots,v_s)\end{align*} \]

\((\Leftarrow)\): Note that

\[\begin{align*}\text{rank}(v_{1}\ \cdots\ v_{s})&=\text{rank}\, G(v_1,\cdots,v_{s})\\ &=\text{rank}\, G(w_1,\cdots,w_s)=\text{rank}(w_{1}\ \cdots\ w_{s})\end{align*} \]

Denote \(r:=\text{rank}(v_{1}\ \cdots\ v_{s})\). Without loss of generality, assume that \(v_1,\cdots,v_r\) are linearly independent. We claim that \(w_1,\cdots,w_r\) are linearly independent as well. Indeed,

\[\begin{align*} \text{Null}\,(w_1\ \cdots\ w_r)&=\text{Null}\,(w_1\ \cdots\ w_r)^T(w_1\ \cdots\ w_r)\\ &=\text{Null}\,(v_1\ \cdots\ v_r)^T(v_1\ \cdots\ v_r)=\text{Null}\,(v_1\ \cdots\ v_r)=0 \end{align*} \]

Therefore, there exist \(B,C\in M_{r\times (s-r)}(\mathbb{R})\) such that

\[\begin{align*} (v_1\ \cdots\ v_s)&=(v_1\ \cdots\ v_r)(I_r\ |\ B)\\ (w_1\ \cdots\ w_s)&=(w_1\ \cdots\ w_r)(I_r\ |\ C) \end{align*} \]

Denote \(G:=G(v_1,\cdots,v_r)=G(w_1,\cdots,w_r)\). Since \(\text{rank}(G)=\text{rank}(v_1\ \cdots\ v_r)=r\), the matrix \(G\) is invertible. Thus we have

\[\begin{pmatrix} I_r\\\hline B^T \end{pmatrix}\,G\,(I_r\ |\ B)=\begin{pmatrix} I_r\\\hline C^T \end{pmatrix}\,G\,(I_r\ |\ C)\implies GB=GC\implies B=C \]

Now, it suffices to find some \(A\in O(n)\) such that

\[A(v_1\ \cdots\ v_r)=(w_1\ \cdots\ w_r) \]

By QR decomposition, we have

\[(v_1\ \cdots\ v_r)=Q_1R_1,\quad (w_1\ \cdots\ w_r)=Q_2R_2 \]

where \(R_i\in M_{r\times r}(\mathbb{R})\) is an upper triangular matrix with positive diagonal entries and \(Q_i\in M_{n\times r}(\mathbb{R})\) satisfies \(Q_i^TQ_i=I_r\) (semi-orthogonal). Thus we have

\[(Q_1R_1)^TQ_1R_1=(Q_2R_2)^TQ_2R_2\implies R_1^TR_1=R_2^TR_2 \]

By the uniqueness of the Cholesky decomposition for positive-definite matrices, we have \(R_1=R_2\). Indeed, \(R_1R_2^{-1}=(R_1^{T})^{-1}R_2^T\) is both upper triangular and lower triangular, and hence a diagonal matrix, denoted by \(D\). Thus we have

\[R_1=DR_2,\ R_2^T=R_1^TD\implies R_2=DR_1\implies R_1=D^2R_1\implies D^2=I_r \]

Since the diagonal entries of \(R_1\) and \(R_2\) are positive, \(D=I_r\) and hence \(R_1=R_2\).

Denote \(R:=R_1=R_2\). Now it suffices to find some \(A\in O(n)\) such that

\[AQ_1=Q_2 \]

Since the columns of \(Q_i\) form an orthonormal subset of \(\mathbb{R}^n\) and thus extend to an orthonormal basis for \(\mathbb{R}^n\), there exists \(\widehat{Q}_i\in O(n)\) such that \(\widehat{Q}_i=(Q_i\ |\ X_i)\) for some \(X_i\in M_{n\times (n-r)}(\mathbb{R})\). Now define \(A:=\widehat{Q}_2\widehat{Q}_1^{-1}\). Then \(A\in O(n)\) and \(AQ_1=Q_2\), as desired. \(\blacksquare\)
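A numerical sketch of the \((\Leftarrow)\) construction, using numpy and `scipy.linalg.null_space` to complete an orthonormal set to an orthonormal basis. For simplicity the \(v_i\) are taken linearly independent (so \(r=s\)), and the sample data are generated so that the two Gramians coincide exactly.

```python
import numpy as np
from scipy.linalg import null_space

def pos_qr(M):
    """Reduced QR with the diagonal of R made positive (columns of M assumed independent)."""
    Q, R = np.linalg.qr(M)
    D = np.sign(np.diag(R))
    return Q * D, D[:, None] * R

rng = np.random.default_rng(6)
n, s = 5, 3
V = rng.standard_normal((n, s))                        # v_1, ..., v_s (independent a.s.)
O_true, _ = np.linalg.qr(rng.standard_normal((n, n)))  # a hidden orthogonal map
W = O_true @ V                                         # hence the two Gramians coincide

Q1, R1 = pos_qr(V)
Q2, R2 = pos_qr(W)
print(np.allclose(R1, R2))                             # equal Gramians force R1 = R2

# complete Q1, Q2 to orthogonal matrices and set A := Q2_hat Q1_hat^{-1}
Q1_hat = np.hstack([Q1, null_space(Q1.T)])
Q2_hat = np.hstack([Q2, null_space(Q2.T)])
A = Q2_hat @ Q1_hat.T

print(np.allclose(A @ V, W), np.allclose(A.T @ A, np.eye(n)))
```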

Appendix D: Finitely Generated Modules over PID

TODO:

Original post: http://www.cnblogs.com/chaliceseven/p/16853190.html
