In this note we derive the general form of the Sherman-Morrison-Woodbury identity. In addition, we derive an extension of the formula that is used in kernel ridge regression, often without any explanation.
We start by introducing three useful identities.
For invertible \(A\) and \(C\) one can, by repeatedly using identity 3, show that the following holds
\[\begin{aligned} (A + BCD)^{-1} BC &= (A(I + A^{-1}BCD))^{-1}BC &&\text{($A$ invertible)}\\ &= (I + A^{-1}BCD)^{-1}A^{-1}BC\\ &= A^{-1}(I + BCDA^{-1})^{-1}BC &&\text{(identity 3)}\\ &= A^{-1}B(I + CDA^{-1}B)^{-1}C &&\text{(identity 3)}\\ &= A^{-1}B(C(C^{-1} + DA^{-1}B))^{-1}C &&\text{($C$ invertible)}\\ &= A^{-1}B(C^{-1} + DA^{-1}B)^{-1}C^{-1}C\\ &= A^{-1}B(C^{-1} + DA^{-1}B)^{-1}. \end{aligned}\]
This result is used in, e.g., kernel ridge regression, as it makes it possible to change the size of the matrix being inverted. In particular, if \(\Phi \in \mathbb{R}^{d \times n}\) and \(y \in \mathbb{R}^{n}\), then taking \(A = \lambda I_d\), \(B = \Phi\), \(C = I_n\) and \(D = \Phi^\top\) we have that
\[\begin{aligned} w &= (\lambda I_d + \Phi\Phi^\top)^{-1}\Phi y\\ &= \lambda^{-1}\Phi\left(I_n + \Phi^\top \lambda^{-1} \Phi\right)^{-1}y\\ &= \Phi\left(\lambda\left(I_n + \Phi^\top \lambda^{-1} \Phi\right)\right)^{-1}y\\ &= \Phi\left(\lambda I_n + \Phi^\top \Phi\right)^{-1}y. \end{aligned}\]
From this it can be seen that we can compute \(w\) by inverting either a \(d\times d\) matrix or an \(n\times n\) matrix.
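The equality of the two expressions for \(w\) can be checked numerically. The following is a minimal sketch using NumPy; the sizes `d` and `n`, the value of `lam`, and the random data are illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative (assumed) problem sizes and ridge parameter.
rng = np.random.default_rng(0)
d, n, lam = 5, 3, 0.1
Phi = rng.standard_normal((d, n))  # feature matrix, one column per sample
y = rng.standard_normal(n)         # targets

# First form: solve a d x d linear system.
w_primal = np.linalg.solve(lam * np.eye(d) + Phi @ Phi.T, Phi @ y)

# Last form: solve an n x n linear system, then map back with Phi.
w_dual = Phi @ np.linalg.solve(lam * np.eye(n) + Phi.T @ Phi, y)

# Both routes should give the same weight vector up to rounding.
agree = np.allclose(w_primal, w_dual)
print(agree)
```

The sketch solves linear systems instead of forming explicit inverses, which is the usual numerical practice. Which form is cheaper depends on whether \(d\) or \(n\) is smaller.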
We are now ready to introduce the general Sherman-Morrison-Woodbury identity as
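One way to write a general form that needs only \(A\), and the bracketed matrices, to be invertible follows from \((A + BCD)^{-1} = A^{-1} - (A + BCD)^{-1}BCDA^{-1}\) combined with the intermediate lines of the derivation above. The display below is a sketch under those assumptions, not a quotation of a particular reference. Its second line moves \(C\) across the inverse in the same way identity 3 is used above.
\[\begin{aligned} (A + BCD)^{-1} &= A^{-1} - A^{-1}B(I + CDA^{-1}B)^{-1}CDA^{-1}\\ &= A^{-1} - A^{-1}BC(I + DA^{-1}BC)^{-1}DA^{-1}. \end{aligned}\]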
Either of the equalities might be of use, depending on the application. In particular, when \(C\) is invertible we arrive at the better-known Sherman-Morrison-Woodbury identity:
\[\begin{aligned} (A + BCD)^{-1} &= A^{-1} - A^{-1}B(C(C^{-1} + DA^{-1}B))^{-1}CDA^{-1}\\ &= A^{-1} - A^{-1}B(C^{-1} + DA^{-1}B)^{-1}C^{-1}CDA^{-1}\\ &= A^{-1} - A^{-1}B(C^{-1} + DA^{-1}B)^{-1}DA^{-1}. \end{aligned}\]
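The final identity can be sanity-checked numerically. The following NumPy sketch uses illustrative assumptions that are not part of the derivation: a diagonal \(A\) and \(C\) with positive entries and \(D = B^\top\). These choices guarantee that \(A + BCD\) and \(C^{-1} + DA^{-1}B\) are both invertible.

```python
import numpy as np

# Illustrative (assumed) sizes: A is n x n, C is k x k with k < n.
rng = np.random.default_rng(1)
n, k = 6, 2
A = np.diag(rng.uniform(1.0, 2.0, size=n))  # positive diagonal, cheap to invert
B = rng.standard_normal((n, k))
C = np.diag(rng.uniform(1.0, 2.0, size=k))  # positive diagonal, invertible
D = B.T                                     # assumption: makes A + BCD positive definite

# Left-hand side: direct inverse of the full n x n matrix.
lhs = np.linalg.inv(A + B @ C @ D)

# Right-hand side: only a k x k system has to be solved.
A_inv = np.diag(1.0 / np.diag(A))
small = np.linalg.inv(C) + D @ A_inv @ B    # C^{-1} + D A^{-1} B
rhs = A_inv - A_inv @ B @ np.linalg.solve(small, D @ A_inv)

woodbury_ok = np.allclose(lhs, rhs)
print(woodbury_ok)
```

When \(A^{-1}\) is cheap to apply and \(k\) is much smaller than \(n\), the right-hand side replaces an \(n \times n\) inversion with a \(k \times k\) solve.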