Mikkel Paltorp

The Eckart-Young-Mirsky Theorem

The result of the Eckart-Young-Mirsky Theorem is easily stated: it tells us that the solution to the problem of finding the best rank-\(k\) approximation of an \(n\times n\) matrix, with respect to either the spectral or the Frobenius norm, is the truncated SVD with \(k\) terms.

As a start we note that finding the best rank-\(k\) approximation of the \(n\times n\) matrix \(A\) w.r.t. a given norm \(\|\cdot \|\) can be formulated as the following minimization problem

\[ \underset{B\in\mathbb{R}^{n\times n}}{\text{minimize}} \|A - B\|, \quad \text{subject to}\quad \text{rank}(B) = k. \]

Furthermore, the rank constraint can be built directly into the problem if we reformulate the minimization as

\[ \underset{X,Y \in \mathbb{R}^{n\times k}}{\text{minimize}} \|A - XY^\top \|. \]

Note that a solution \((X,Y)\) to the above minimization problem will not be unique: if \((X,Y)\) is a solution, then \((XR,YR^{-\top})\) is also a solution for any invertible \(R\in\mathbb{R}^{k\times k}\), since \((XR)(YR^{-\top})^\top = XY^\top\).
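A quick numerical illustration of both the factorization and its non-uniqueness, sketched in Python with NumPy (the sizes \(n\) and \(k\) and the random matrices are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

# Any product B = X Y^T with X, Y in R^{n x k} has rank at most k.
X = rng.standard_normal((n, k))
Y = rng.standard_normal((n, k))
B = X @ Y.T
print(np.linalg.matrix_rank(B))  # -> 2

# Replacing (X, Y) by (X R, Y R^{-T}) leaves the product unchanged,
# since (X R)(Y R^{-T})^T = X R R^{-1} Y^T = X Y^T.
R = rng.standard_normal((k, k))  # a random k x k matrix is generically invertible
print(np.allclose((X @ R) @ (Y @ np.linalg.inv(R).T).T, B))  # -> True
```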

For the Spectral Norm

Suppose that \(A = U\Sigma V^\top \) is the singular value decomposition of \(A\). Then the best rank-\(k\) approximation of the matrix \(A\) w.r.t. the spectral norm, \(\|\cdot\|_2\), is given by

\[ A_k = \sum_{i=1}^k\sigma_iu_iv_i^\top . \]
First note that if \(A_k\) is defined as above then

\[ \|A - A_k\|_2 = \left\|\sum_{i=k+1}^n\sigma_iu_iv_i^\top \right\|_2 = \sigma_{k+1}. \]
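This is straightforward to verify numerically; a minimal NumPy sketch (with an arbitrary random \(A\) and arbitrary \(n\) and \(k\)) that builds \(A_k\) from the SVD and checks that the spectral error equals \(\sigma_{k+1}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
A = rng.standard_normal((n, n))

# Full SVD: A = U diag(s) V^T, with s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A)

# Truncated SVD: keep only the k leading singular triplets.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The spectral-norm error is exactly the first discarded singular value.
print(np.isclose(np.linalg.norm(A - A_k, ord=2), s[k]))  # -> True
```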

To verify that this really is the best approximation, we must show that \(\|A - A_k\|_2 = \sigma_{k+1} \leq \|A-B_k\|_2\) for every \(B_k = XY^\top \) with \(X,Y\in\mathbb{R}^{n\times k}\).

Since \(Y\) has only \(k\) columns, the null space of \(Y^\top \) has dimension at least \(n-k\), so it must intersect the \((k+1)\)-dimensional span of the first \(k+1\) columns of \(V\) nontrivially. Hence there exists a nontrivial linear combination \(w = v_1\gamma_1 + \dots + v_{k+1}\gamma_{k+1} = V_{k+1}\gamma\) such that \(Y^\top w = 0\). Without loss of generality we can scale \(w\) such that \(\|w\|_2=1\). Therefore

\[ \|A - B_k\|_2 \geq \|(A-B_k)w\|_2 = \|Aw\|_2 = \sqrt{\gamma_1^2\sigma_1^2 + \dots + \gamma_{k+1}^2\sigma_{k+1}^2} \geq \sigma_{k+1}, \]

which shows that the truncated SVD is the best rank-\(k\) approximation of a matrix w.r.t. the spectral norm.
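The optimality claim can also be probed numerically: no rank-\(k\) candidate \(B_k = XY^\top\) should achieve a smaller spectral error than \(\sigma_{k+1}\). A sketch along the same lines as above, with random trials:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 3
A = rng.standard_normal((n, n))
s = np.linalg.svd(A, compute_uv=False)  # singular values only

# Random rank-k candidates B_k = X Y^T; none should beat sigma_{k+1}.
for _ in range(1000):
    X = rng.standard_normal((n, k))
    Y = rng.standard_normal((n, k))
    assert np.linalg.norm(A - X @ Y.T, ord=2) >= s[k] - 1e-12
print("no candidate beat sigma_{k+1} =", s[k])
```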

Note that in the above chain of inequalities we used that \(B_k w = XY^\top w = 0\), that

\[ \|A - B_k\|_2 = \sup\{\|(A - B_k)w\|_2 : w \in \mathbb{R}^n,\ \|w\|_2 = 1\} \]

and

\[ \|Aw\|_2 = \|U\Sigma V^\top V_{k+1}\gamma\|_2 = \|\Sigma_{k+1} \gamma\|_2 = \sqrt{\gamma_1^2\sigma_1^2 + \dots + \gamma_{k+1}^2\sigma_{k+1}^2}, \]

where \(\Sigma_{k+1} = \text{diag}(\sigma_1,\dots,\sigma_{k+1})\): the product \(V^\top V_{k+1}\) consists of the first \(k+1\) columns of the identity matrix, and multiplication by the orthogonal matrix \(U\) leaves the \(2\)-norm unchanged.
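This identity is likewise easy to check numerically, e.g. with a random unit-norm coefficient vector \(\gamma\) (a sketch under the same assumptions as above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 3
A = rng.standard_normal((n, n))
U, s, Vt = np.linalg.svd(A)

# A random unit-norm gamma gives w = V_{k+1} gamma with ||w||_2 = 1.
gamma = rng.standard_normal(k + 1)
gamma /= np.linalg.norm(gamma)
w = Vt[:k + 1, :].T @ gamma

# ||A w||_2 equals sqrt(sum_i gamma_i^2 sigma_i^2), which is >= sigma_{k+1}.
lhs = np.linalg.norm(A @ w)
rhs = np.sqrt(np.sum((gamma * s[:k + 1]) ** 2))
print(np.isclose(lhs, rhs), bool(lhs >= s[k] - 1e-12))  # -> True True
```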

For the Frobenius Norm

Suppose that \( A = U\Sigma V^\top \) is the singular value decomposition of \(A\). Then the best rank-\(k\) approximation of the matrix \(A\) w.r.t. the Frobenius norm, \(\|\cdot\|_F\), is given by

\[ A_k = \sum_{i=1}^k\sigma_iu_iv_i^\top . \]
The first step is to rewrite the minimization problem as

\[ \underset{X,Y \in \mathbb{R}^{n\times k}}{\text{minimize}} \|A - XY^\top \|_F = \|\Sigma - U^\top XY^\top V\|_F, \]

where \(A = U\Sigma V^\top \) is the SVD of \(A\) and we have used the orthogonal invariance of the Frobenius norm. Since \(U^\top XY^\top V\) has rank at most \(k\) while \(\Sigma\) is diagonal with decreasing entries, no solution to the above minimization problem can do better than \(U^\top XY^\top V = \text{diag}(\sigma_1,\dots, \sigma_k,0,\dots, 0)\). This solution can be achieved by setting

\[ \Sigma_k^{1/2} = \text{diag}(\sigma_1^{1/2},\dots, \sigma_k^{1/2}), \quad X = \begin{bmatrix}u_1& \dots & u_k\end{bmatrix}\Sigma_k^{1/2}, \quad Y^\top = \Sigma_k^{1/2}\begin{bmatrix} v_1^\top \\ \vdots \\ v_k^\top \end{bmatrix}. \]

Indeed, with this choice \(XY^\top = \sum_{i=1}^k\sigma_iu_iv_i^\top = A_k\), and we can conclude that the best rank-\(k\) approximation w.r.t. the Frobenius norm is again the truncated SVD.
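To close, a NumPy sketch of this construction (same assumptions as the earlier snippets) verifying both that \(XY^\top \) reproduces \(A_k\) and that the resulting Frobenius error equals the tail \(\sqrt{\sigma_{k+1}^2 + \dots + \sigma_n^2}\), which follows from the same orthogonal invariance used above:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 8, 3
A = rng.standard_normal((n, n))
U, s, Vt = np.linalg.svd(A)

# Split Sigma_k evenly between the two factors.
sqrt_Sk = np.diag(np.sqrt(s[:k]))
X = U[:, :k] @ sqrt_Sk
Y = Vt[:k, :].T @ sqrt_Sk

# X Y^T reproduces the rank-k truncated SVD A_k ...
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.allclose(X @ Y.T, A_k))  # -> True

# ... and the Frobenius error is the tail of the singular values.
err = np.linalg.norm(A - A_k, ord="fro")
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))  # -> True
```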