Intro RLS

Toward better evaluation efficiency

We have seen how the Normal Equations and Linear Least Squares method for determining best fit model parameter values can be constructed sequentially, by an updating process. However, it was noted that when it comes time to evaluate parameter values, solving the Normal Equations can require some relatively intense calculations. If you need to do this often — for example, in an adaptive real-time scenario where best parameter estimates must be pushed into service as quickly as possible — intensive calculations for the matrix inverse can consume an immense amount of processing. For such a scenario, is there a more efficient strategy?

The truth is, you don't need any of this for most model building purposes. Unfortunately, it is what you are going to see in virtually all the "scholarly literature" elsewhere. So the main point of this diversion is to show how all of those additional and probably unnecessary complications are really the same Linear Least Squares in disguise.

There are two ideas to explore.

Is there any computational advantage to calculating only incremental adjustments to the parameter values at each step, rather than fully recomputing new parameter values?
Is there a way to maintain the Normal Equations matrix in an already-inverted form, so that the numerically expensive inversion operations can be avoided?

Incremental parameter adjustments

Start from the unweighted update formula for the Normal Equations that we developed previously.

({m_{n + 1}}^{T} m_{n + 1} + {M_{n}}^{T} M_{n}) \cdot (p_{n} + Δ p) = ({m_{n + 1}}^{T} \cdot y_{n + 1} + {M_{n}}^{T} \cdot Y_{n})

If we now do a partial algebraic expansion of this, we can get the following equivalent formula in which the effects of the new and the previous terms are separated.

{m_{n + 1}}^{T} m_{n + 1} \cdot p_{n} + {M_{n}}^{T} M_{n} \cdot p_{n} + {M_{n + 1}}^{T} M_{n + 1} \cdot Δ p = {m_{n + 1}}^{T} \cdot y_{n + 1} + {M_{n}}^{T} \cdot Y_{n}

We can observe that the M_n^TM_n·p_n term on the left and the M_n^TY_n terms on the right are known and equal — this is just the Normal Equation conditions as established during the previous update. We can cancel them from each side of the equation.

{m_{n + 1}}^{T} m_{n + 1} \cdot p_{n} + {M_{n + 1}}^{T} M_{n + 1} \cdot Δ p = {m_{n + 1}}^{T} \cdot y_{n + 1}

Collect the two terms with the common m_n+1^T factor on the right side of the equation.

({M_{n + 1}}^{T} M_{n + 1}) \cdot Δ p = {m_{n + 1}}^{T} \cdot (y_{n + 1} - m_{n + 1} \cdot p_{n})

The quantity in parentheses on the right reduces to a scalar value for the case when a single new constraint is added to the Normal Equations system.
You will find expressions of this form in the scholarly papers and textbooks. If you have ever wondered where that "y minus something" term comes from in the various updating formulas, now you know.

The Matrix Inversion Lemma

Before we can study the idea of maintaining the Normal Equation matrix in inverse form, we will need to know about an obscure matrix formula, a special case of the Sherman-Morrison-Woodbury matrix inversion formula, often referred to as "the Matrix Inversion Lemma." This particular variant happens to be useful when working with variance and linear least squares problems.^[1]

The lemma is as follows:

For compatible matrices, with all matrices of full rank and matrix F square and nonsingular, the following relationship holds.

{(F + A B^{- 1} A^{T})}^{- 1} \Leftrightarrow F^{- 1} - F^{- 1} A {(B + A^{T} F^{- 1} A)}^{- 1} A^{T} F^{- 1}

It is not necessary to read through the following proof, but if you do care to trace through the steps, the main thing to notice is that it is just some simple (but careful) algebra.

If one of the expressions above really is the inverse of the other, when they are multiplied together in either order, the product must yield an identity matrix. We will form these product expressions, then follow up with algebraic manipulations to verify that the necessary reduction occurs.

Proof: the left-right product case

(F + A B^{- 1} A^{T}) \cdot (F^{- 1} - F^{- 1} A {(B + A^{T} F^{- 1} A)}^{- 1} A^{T} F^{- 1})

(I + A B^{- 1} A^{T} F^{- 1}) - (F + A B^{- 1} A^{T}) (F^{- 1} A) {(B + A^{T} F^{- 1} A)}^{- 1} (A^{T} F^{- 1})

(I + A B^{- 1} A^{T} F^{- 1}) - (A + A B^{- 1} A^{T} F^{- 1} A) {(B + A^{T} F^{- 1} A)}^{- 1} (A^{T} F^{{- 1}^{T}})

(I + A B^{- 1} A^{T} F^{- 1}) - (A B^{- 1}) (B + A^{T} F^{- 1} A) {(B + A^{T} F^{- 1} A)}^{- 1} (A^{T} F^{- 1})

(I + A B^{- 1} A^{T} F^{- 1}) - (A B^{- 1}) (A^{T} F^{- 1})

I

Proof: the right-left product case

(F^{- 1} - F^{- 1} A {(B + A^{T} F^{- 1} A)}^{- 1} A^{T} F^{- 1}) \cdot (F + A B^{- 1} A^{T})

(I + F^{- 1} A B^{- 1} A^{T}) - (F^{- 1} A {(B + A^{T} F^{- 1} A)}^{- 1}) (A^{T} F^{- 1}) \cdot (F + A B^{- 1} A^{T})

(I + F^{- 1} A B^{- 1} A^{T}) - (F^{- 1} A {(B + A^{T} F^{- 1} A)}^{- 1}) (A^{T} + A^{T} F A B^{- 1} A^{T})

(I + F^{- 1} A B^{- 1} A^{T}) - (F^{- 1} A) ({(B + A^{T} F^{- 1} A)}^{- 1}) (B + A^{T} F A) (B^{- 1} A^{T})

(I + F^{- 1} A B^{- 1} A^{T}) - (F^{- 1} A) (B^{- 1} A^{T})

I

Promise, next time we will find some relevance in this unlikely seeming theorem, and apply it to the Least Squares updating problem. Then we will do some testing to verify that it works.

Footnotes:

[1] See the Wikipedia article for more information and alternate forms of the Sherman Morrison Woodbury matrix inversion formula.

Recursive Least Squares, Part 1

Toward better evaluation efficiency

Incremental parameter adjustments

The Matrix Inversion Lemma

Proof: the left-right product case

Proof: the right-left product case