Efficient "Recursive Least Squares" Computation
Developing Models for Kalman Filters
Last time we began a search for modifications to the Normal Equations updating method that could make recalculation of the parameter values more efficient after each update. We also presented an obscure mathematical theorem, with a promise to explain the relevance.
We obtained the following formula for calculating incremental adjustments to parameter values.
$$\left(M_{n+1}^{T} M_{n+1}\right) \cdot \Delta p \;=\; m_{n+1}^{T} \cdot \left(y_{n+1} \;-\; m_{n+1} \cdot p_{n}\right)$$But so far, a numerically intensive matrix inversion is still required for each parameter update.
$$\Delta p \;=\; \left(M_{n+1}^{T} M_{n+1}\right)^{-1} m_{n+1}^{T} \cdot \left(y_{n+1} \;-\; m_{n+1} \cdot p_{n}\right)$$
Applying the Matrix Inversion Lemma
Restating the special form of the Matrix Inversion Lemma^{[1]} here:
$$\left(F + A\, B^{-1} A^{T}\right)^{-1} \;=\; F^{-1} \;-\; F^{-1} A \left(B + A^{T} F^{-1} A\right)^{-1} A^{T} F^{-1}$$To apply this specifically to the problem of updating and solving Normal Equations systems with rank-one updates, define the following changes of notation.
$$F^{-1} \;\iff\; \left(M_{n}^{T} M_{n}\right)^{-1}$$ $$B \;\iff\; I$$ $$A \;\iff\; m_{n+1}^{T}$$Inserting these expressions systematically into the Matrix Inversion Lemma formula yields the following.
$$\left(M_{n}^{T} M_{n} \;+\; m_{n+1}^{T}\, I\, m_{n+1}\right)^{-1} \;=$$ $$\left(M_{n}^{T} M_{n}\right)^{-1} \;-\; \left(M_{n}^{T} M_{n}\right)^{-1} m_{n+1}^{T} \left(I + m_{n+1} \left(M_{n}^{T} M_{n}\right)^{-1} m_{n+1}^{T}\right)^{-1} m_{n+1} \left(M_{n}^{T} M_{n}\right)^{-1}$$ The expression on the left (the first line) is the updated inverse matrix that we are trying to obtain. On the right side (the second line), the previously constructed inverse Normal Equations matrix (M_n^T M_n)^{-1} is used to calculate the updated inverse matrix.
There is some initial disappointment in seeing that matrix inversions are still involved. However:
- The (M_n^T M_n)^{-1} matrix terms are known from the previous update step and require no extra calculations; they carry forward from the previous step.
- For a rank-one update, the complicated double-parenthesized sum expression reduces to a scalar value, making its inversion trivial.
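Before simplifying further, the identity above can be checked numerically. Here is a NumPy sketch (the variable names are my own; B is taken as the 1×1 identity, matching the rank-one case):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small least-squares system: F = M^T M is symmetric positive definite.
M = rng.standard_normal((10, 3))
F = M.T @ M

# One new constraint row m_{n+1} (shape 1x3); B is the 1x1 identity.
m = rng.standard_normal((1, 3))

# Left-hand side: invert the updated matrix directly.
direct = np.linalg.inv(F + m.T @ m)

# Right-hand side: the lemma, touching only the previously known F^-1.
Finv = np.linalg.inv(F)
middle = np.linalg.inv(np.eye(1) + m @ Finv @ m.T)  # 1x1, trivial to invert
lemma = Finv - Finv @ m.T @ middle @ m @ Finv

print(np.allclose(direct, lemma))  # True
```

The only inversion on the right side that depends on the new data is the 1×1 `middle` term, which is the point of the whole exercise.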
The calculations on the right side of the expression start to look much simpler after some additional notation changes.
$$X_{1} \;\equiv\; \left(M_{n}^{T} M_{n}\right)^{-1} \cdot m_{n+1}^{T}$$
$$X_{2} \;\equiv\; m_{n+1} \cdot \left(M_{n}^{T} M_{n}\right)^{-1}$$
$$X_{3} \;\equiv\; m_{n+1} \cdot X_{1}$$
Substituting these intermediate terms into the expression above:
$$=\; \left(M_{n}^{T} M_{n}\right)^{-1} \;-\; X_{1} \left(I + X_{3}\right)^{-1} X_{2}$$When the parenthesized sum term is a scalar, this expression can be simplified even further.
$$\left(M_{n+1}^{T} M_{n+1}\right)^{-1} \;=\; \left(M_{n}^{T} M_{n}\right)^{-1} \;-\; \frac{X_{1}\, X_{2}}{1 + X_{3}}$$ That kind of expression is also something that you might have seen before. It is now clear that the expression on the right is a rank-one update applied directly to the inverse matrix we had previously. In other words, updating the inverse matrix is almost as easy as updating the original matrix. There are just a few additional preliminary calculations for the intermediate X1, X2, and X3 expressions prior to applying the update.
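For a single new row, the whole inverse update is a subtraction and a scalar division. A NumPy sketch (names mirror the X1, X2, X3 notation; the data is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 3))
P = np.linalg.inv(M.T @ M)      # previous inverse (M_n^T M_n)^-1
m = rng.standard_normal(3)      # one new constraint row m_{n+1}

X1 = P @ m                      # column intermediate
X2 = m @ P                      # row intermediate (equals X1 here: P is symmetric)
X3 = m @ X1                     # scalar

# Rank-one update applied directly to the inverse matrix.
P_new = P - np.outer(X1, X2) / (1.0 + X3)

# Compare against rebuilding and inverting from scratch.
M_new = np.vstack([M, m])
print(np.allclose(P_new, np.linalg.inv(M_new.T @ M_new)))  # True
```

Because (M^T M)^{-1} is symmetric, X2 is just the transpose of X1, which is why implementations often compute only X1.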
The strategy consisting of
- calculation of incremental parameter adjustments
- maintaining the Normal Equations matrix in an inverse form
- rank-one updates
in combination, is known as the Recursive Least Squares (RLS) method.
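One full RLS step combining the three ingredients might look like the following sketch (`rls_step` is my own name, not something from this series):

```python
import numpy as np

def rls_step(P, p, m, y):
    """One Recursive Least Squares update.

    P : current inverse Normal Equations matrix (M_n^T M_n)^-1
    p : current parameter estimate
    m : new input row;  y : new observed output
    Returns the updated (P, p) pair.
    """
    X1 = P @ m
    X3 = m @ X1                                  # scalar for a rank-one update
    P_new = P - np.outer(X1, X1) / (1.0 + X3)    # P symmetric, so X2 = X1^T
    resid = y - m @ p                            # prediction error on the new point
    p_new = p + P_new @ m * resid                # incremental parameter adjustment
    return P_new, p_new

# Start from a 4-row problem, then absorb a fifth row recursively.
rng = np.random.default_rng(2)
M = rng.standard_normal((4, 3))
Y = rng.standard_normal(4)
P = np.linalg.inv(M.T @ M)
p = P @ (M.T @ Y)                                # direct least-squares start

m_new = rng.standard_normal(3)
y_new = 0.5
P, p = rls_step(P, p, m_new, y_new)

# Agrees with the direct solution over all five rows.
p_direct, *_ = np.linalg.lstsq(np.vstack([M, m_new]),
                               np.append(Y, y_new), rcond=None)
print(np.allclose(p, p_direct))  # True
```

Note that no matrix is ever inverted inside the step; only the scalar 1 + X3 is divided through.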
Verifying the method
Do you suppose this actually works? Let's give it a try. Suppose that we want to determine the parameters for a second-order polynomial model.
y = a·1 + b·x + c·x^{2}
Though this model is not linear, it is linear in the parameters, so given the values of the powers of x as the input data, we can apply the linear least squares methods to determine the best fit to the observed output values y. We will let Octave do all of the hard calculations. Here is the set of input and output values, consisting of powers of the independent variable x.
M7 =

   1  -2   4
   1  -1   1
   1   0   0
   1   1   1
   1   2   4
   1   3   9
   1   4  16

Y7 =

   1.3800
   1.1600
   1.4100
   2.0800
   2.9800
   4.4100
   6.2400
Now construct the Normal Equations in the direct way for the best fit.
m7tm7 = M7'*M7
m7ty7 = M7'*Y7
m7inv = m7tm7^-1
fit7 = m7inv*m7ty7

m7tm7 =

     7     7    35
     7    35    91
    35    91   371

m7ty7 =

    19.660
    42.310
   160.210

m7inv =

   0.285714   0.035714  -0.035714
   0.035714   0.083333  -0.023810
  -0.035714  -0.023810   0.011905

fit7 =

   1.40643
   0.41345
   0.19774
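If Octave is not at hand, the same direct construction can be reproduced in NumPy (same data as above; the results match the printed values):

```python
import numpy as np

# Powers of x for x = -2 .. 4, and the observed outputs.
M7 = np.array([[1, -2, 4], [1, -1, 1], [1, 0, 0], [1, 1, 1],
               [1, 2, 4], [1, 3, 9], [1, 4, 16]], dtype=float)
Y7 = np.array([1.38, 1.16, 1.41, 2.08, 2.98, 4.41, 6.24])

m7inv = np.linalg.inv(M7.T @ M7)   # inverse Normal Equations matrix
fit7 = m7inv @ (M7.T @ Y7)         # best-fit parameters [a, b, c]

print(np.round(fit7, 5))           # [1.40643 0.41345 0.19774]
```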
Let's do the same thing again, but this time with one additional constraint row added in the first row of the matrix and of the right-hand side.
M8 =

    1    5   25
    1   -2    4
    1   -1    1
    1    0    0
    1    1    1
    1    2    4
    1    3    9
    1    4   16

Y8 =

   8.4900
   1.3800
   1.1600
   1.4100
   2.0800
   2.9800
   4.4100
   6.2400

m8tm8 = M8'*M8
m8ty8 = M8'*Y8
m8inv = m8tm8^-1
fit8 = m8inv*m8ty8

m8inv =

   0.2321429   0.0178571  -0.0178571
   0.0178571   0.0773810  -0.0178571
  -0.0178571  -0.0178571   0.0059524

fit8 =

   1.39732
   0.41042
   0.20077
We can see that after including the additional new constraint, the calculated best fit parameter values are a little different (as expected).
Now let's go back to the original problem and exercise the RLS updating formulas. We know what the inverse matrix was for the original problem, so we will start with that. Introduce the new constraint data. Calculate the intermediate terms, and using these, update the Normal Equations matrix in inverse form.
m8 = [1.0 5.0 25.0]
y8 = 8.49
X1 = m7inv * m8'
X3 = m8 * X1
divis = (1.0 + X3)
newm8 = m7inv - (X1 * X1') / divis

X1 =

  -0.42857
  -0.14286
   0.14286

X3 = 2.4286
divis = 3.4286

newm8 =

   0.2321429   0.0178571  -0.0178571
   0.0178571   0.0773810  -0.0178571
  -0.0178571  -0.0178571   0.0059524
A quick inspection shows that this calculation has yielded exactly the same inverted matrix results as the direct method. Next, apply the incremental update equation to determine the updated parameter values.
resid = y8 - m8*fit7
delfit8 = newm8 * m8' * resid
newfit8 = fit7 + delfit8

resid = 0.072857

delfit8 =

  -0.0091071
  -0.0030357
   0.0030357

newfit8 =

   1.39732
   0.41042
   0.20077
Once again, this is exactly the same solution that we would obtain using the direct construction method for the Normal Equations. Thus, we have verified the equivalence of the direct construction and the RLS inverse matrix updating approach.
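The whole verification can be condensed into a short NumPy script (same data; names like `newfit8` mirror the Octave session above):

```python
import numpy as np

M7 = np.array([[1, -2, 4], [1, -1, 1], [1, 0, 0], [1, 1, 1],
               [1, 2, 4], [1, 3, 9], [1, 4, 16]], dtype=float)
Y7 = np.array([1.38, 1.16, 1.41, 2.08, 2.98, 4.41, 6.24])
m7inv = np.linalg.inv(M7.T @ M7)
fit7 = m7inv @ (M7.T @ Y7)

# New constraint row and observation.
m8 = np.array([1.0, 5.0, 25.0])
y8 = 8.49

# RLS: rank-one update of the inverse, then the incremental adjustment.
X1 = m7inv @ m8
X3 = m8 @ X1
newm8 = m7inv - np.outer(X1, X1) / (1.0 + X3)
resid = y8 - m8 @ fit7
newfit8 = fit7 + newm8 @ m8 * resid

# Direct 8-row construction for comparison.
M8 = np.vstack([m8, M7])
Y8 = np.append(y8, Y7)
direct = np.linalg.inv(M8.T @ M8) @ (M8.T @ Y8)

print(np.allclose(newfit8, direct))  # True
```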
Startup details
Until you have collected enough updates, the Normal Equations matrix (M^T M) is not full rank, and therefore an inverse matrix does not exist. For RLS updating, one way to work around this limitation is to collect a few updates and invert the matrix explicitly one time to get things started.
Another very popular trick is to fudge the initial (M^T M) matrix. Instead of starting with an (M^T M) matrix of zero and trying to invert it, which will of course fail, start with an (M^T M) matrix that is not very far from zero but full rank, and invert that. Something like the following will do the job.
MTM = eye(3,3) * 1.0e-7; MTMinv = MTM^-1;
Using this trick, the inverse matrix remains well defined from the beginning, even if the first few parameter estimates generated don't mean much. This introduces a deliberate small numerical error, but one so small that it is overwhelmed by the numerical rounding errors and residual random noise coming in from the raw input/output data.
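To see the trick in action, here is a NumPy sketch (the loop and names are my own) that starts from the fudged inverse and feeds the seven constraints of the polynomial example one at a time:

```python
import numpy as np

# Fudged start: a tiny full-rank (M^T M), so its inverse exists immediately.
eps = 1.0e-7
P = np.linalg.inv(np.eye(3) * eps)   # 3 parameters -> 3x3 matrix
p = np.zeros(3)

# Feed the seven constraints from the polynomial example one at a time.
xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.38, 1.16, 1.41, 2.08, 2.98, 4.41, 6.24])
for x, y in zip(xs, ys):
    m = np.array([1.0, x, x * x])
    X1 = P @ m
    P = P - np.outer(X1, X1) / (1.0 + m @ X1)   # rank-one inverse update
    p = p + P @ m * (y - m @ p)                 # incremental adjustment

print(np.round(p, 5))   # very close to the direct fit7 [1.40643 0.41345 0.19774]
```

The tiny `eps` acts like a negligible regularization: the final parameters agree with the direct fit to well beyond the precision of the data.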
Coming next
The ability of the RLS formulation to incorporate new information efficiently, as it becomes available, explains why this method is appealing for real-time adaptive systems. Model adjustments can go into effect as soon as possible. If you have followed this series closely, you already know why this idea could lead to disaster. Next time, we will examine why this is so, and what can be done to set things right.
Footnotes:
[1] See the previous article in this series.