Mikkel Paltorp

General Sherman-Morrison-Woodbury Identity

In this note we derive the general form of the Sherman-Morrison-Woodury identity. In addition, we derive an extension of the formula that is used in Kernel ridge regression, often without any explanation.

Some identities

We start by introducing three useful identities.

Identity 1 The first identity comes from the standard mathematical trick of adding zero as \(I + P - P = I\). Using this it follows that \[\begin{aligned} (I + P)^{-1} &= (I + P)^{-1}(I + P - P)\\ &= I - (I + P)^{-1}P \end{aligned}\]
Identity 2 For invertible A, but possible rectangular \(B, C\), and \(D\) one can show using identity 1 that \[\begin{aligned} (A + BCD)^{-1} &= (A(I + A^{-1}BCD)^{-1})\quad\quad\quad\quad\quad\quad\quad\quad \text{Using invertible A}\\ &= (I + A^{-1}BCD)^{-1}A^{-1}\\ &= \left(I - (I + A^{-1}BCD)^{-1}A^{-1}BCD\right)A^{-1}\quad \text{Using identity 1} \end{aligned}\] This identity will be the first step in derivation of the general Sherman-Morrison-formula.
Identity 3 First notice that we have \[ P + PQP = P(I + QP) = (I + PQ)P \] Moving the parentheses to the corresponding other sides of the equation result in \[ (I + PQ)^{-1}P = P(I + QP)^{-1} \] This identity is extensively used to move matrices from one side of an inverse to the other.

For an invertible \(A\) and \(C\) one can, by repeadly using identity 3, show that the following hold

\[\begin{aligned} (A + BCD)^{-1} BC &= (A(I + A^{-1}BCD))^{-1}BC\quad\ \quad\ \ \ \text{Using Invertible A}\\ &= (I + A^{-1}BCD)^{-1}A^{-1}BC\\ &= A^{-1}(I + BCDA^{-1})^{-1}BC\quad\ \quad\ \ \ \text{Using Identity 3}\\ &= A^{-1}B(I + CDA^{-1}B)^{-1}C\quad\ \quad\ \ \ \text{Using Identity 3}\\ &= A^{-1}B(C(C^{-1} + DA^{-1}B))^{-1}C\quad \text{Using Invertible C}\\ &= A^{-1}B(C^{-1} + DA^{-1}B)^{-1}C^{-1}C\\ &= A^{-1}B(C^{-1} + DA^{-1}B)^{-1} \end{aligned}\]

This result is use in e.g. kernel ridge regression as it makes it possible to possible change the size of the matrix being inverted. In particular if \(\Phi \in \mathbb{R}^{d \times n}\) we have that

\[\begin{aligned} w &= (\lambda I_d + \Phi\Phi^\top)^{-1}\Phi y\\ &= \lambda^{-1}\Phi\left(I_n + \Phi^\top \lambda^{-1} \Phi\right)^{-1}y\\ &= \Phi\left(\lambda\left(I_n + \Phi^\top \lambda^{-1} \Phi\right)\right)^{-1}y\\ &= \Phi\left(\lambda I_n + \Phi^\top \Phi\right)^{-1}y. \end{aligned}\]

From which it can be seen that can compute \(w\) by either inverting a \(d\times d\) matrix of a \(n\times n\) matrix.

We are now ready to introduce the general Sherman-Morrison-Woodbury identity as

General Sherman-Morrison-Woodbury identity \[\begin{aligned} (A + BCD)^{-1} &= A^{-1} - (I + A^{-1}BCD)^{-1}BCDA^{-1}\quad\quad\ \text{ Using Identity 2}\\ &= A^{-1} - A^{-1}(I + BCDA^{-1})^{-1}BCDA^{-1}\quad \text{Using Identity 3}\\ &= A^{-1} - A^{-1}B(I + CDA^{-1}B)^{-1}CDA^{-1}\quad \text{Using Identity 3}\\ &= A^{-1} - A^{-1}BC(I + DA^{-1}BC)^{-1}DA^{-1}\quad \text{Using Identity 3}\\ &= A^{-1} - A^{-1}BCD(I + A^{-1}BCD)^{-1}A^{-1}\quad \text{Using Identity 3}\\ &= A^{-1} - A^{-1}BCDA^{-1}(I + BCDA^{-1})^{-1}\quad \text{Using Identity 3}\\ \end{aligned}\]

Either of the equalities might be of use, depending on the chosen application. In particular in the case of \(C\) being invertible we arrive at the more known Sherman-Morrison-Woodbury identity as

\[\begin{aligned} (A + BCD)^{-1} &= A^{-1} - A^{-1}B(C(C^{-1} + DA^{-1}B))^{-1}CDA^{-1}\\ &= A^{-1} - A^{-1}B(C^{-1} + DA^{-1}B)^{-1}C^{-1}CDA^{-1}\\ &= A^{-1} - A^{-1}B(C^{-1} + DA^{-1}B)^{-1}DA^{-1}\\ \end{aligned}\]