Adaptive RLS

This installment is more for completeness than for practicality. Do you remember the two opposite but complementary approaches for improving the performance of a dynamic model?

The Kalman approach. Assume that the model is perfect, but unknown random effects appearing in the state variables need to be suppressed.
The adaptive systems approach. Assume that the data are perfect, but unknown persistent errors in the system model parameters need corrections to calculate better predictions.

These two ideas can be applied in tandem, because incremental changes to the system generally apply over a much longer time scale than the moment-by-moment state corrections of Kalman Filters.

Constructing a system model has more in common with the adaptive systems approach. So, while on the topic of the RLS method for performing Linear Least Squares updates, let's take a moment to consider what could happen if this is done in a naive manner.

Influence of constraints in RLS methods

Remember the discussion of the sequential Least Squares updating methods, about how each new constraint added information to the Linear Least Squares constraint matrix? The consequence of this was that the Normal Equations matrix terms grow larger with each new constraint that is added, to the point that eventually new information becomes insignificant compared to all the accumulated history from the past.

Since the original Normal Equations matrix grows larger and larger, the inverse of that matrix must shrink smaller and smaller, to maintain the relationship

   (M^T·M)^-1 · (M^T·M) = I

When this fact is applied to the formula for incremental parameter updates:

Δ p = {({M_{n + 1}}^{T} M_{n + 1})}^{- 1} {m_{n + 1}}^{T} \cdot (y_{n + 1} - m_{n + 1} \cdot p_{n})

\to 0 \cdot {m_{n + 1}}^{T} \cdot (y_{n + 1} - m_{n + 1} \cdot p_{n})

\to 0

Guess what, if parameter adjustments go to zero, things aren't adaptive any more. If you think your model parameters are being adjusted, and they aren't, this could mean anything from a grand waste of time to a total disaster.

Fading memory in the RLS method

For the case of the sequential Least Squares updates, we found that when scaling the previous Normal Equations matrix by a number w == (N-1)/N prior to each update, the result after a very large number of update steps was a matrix exhibiting the equivalent of N units of influence. The influence of each past update decays away exponentially over time, allowing current new data to have relatively higher influence, and allowing parameter adjustments to continue.

Justify to yourself the claim that the corresponding adjustment in the RLS method is to scale the inverse Normal Equation matrix by 1/(w²) prior to each update.

Changes will remain slow. About 5·N updates would be required for persistent changes to become fully influential. But this is still a whole lot better than never.

Summary: RLS updates with fading memory

Notations:

K ↔ number of equivalent update contributions to preserve
w ↔ the derived "forgetting factor weight" (K-1)/K
n ↔ the number of constraints processed
y_n ↔ the new "dependent variable" (observed output) value introduced at step n.
m_n ↔ the row vector of new "independent variable" values introduced during update step n
M_n ↔ the non-square matrix of constraint equations for the Least Squares solution to attempt to satisfy, with one column for each parameter to be determined, and n rows, one for each constraint.
N_n ↔ weighted Normal Equations square matrix M^TM constructed from the Least Squares constraint matrix M after inclusion of n constraints and applying "forgetting factors" during updates.

The following equations summarize the RLS updating scheme after applying the after applying "fading memory" weighting adjustments.

X_{4} = {N_{n}}^{- 1} \cdot (\frac{1}{w^{2}})

X_{1} = X_{4} \cdot {m_{n + 1}}^{T}

X_{2} = m_{n + 1} \cdot X_{4}

X_{3} = m_{n + 1} \cdot X_{1}

{(N_{n + 1})}^{- 1} = X_{4} - X_{1} \cdot {(I + X_{3})}^{- 1} \cdot {X_{2}}^{T}

Δ p = {(N_{n + 1})}^{- 1} \cdot m_{n + 1} \cdot (y_{n + 1} - m_{n + 1} \cdot p)

No explicit numerical example of this is provided. Go ahead and try it for yourself if you wish. The point is that you will know what you need to do if you have special requirements for dynamically changing models. Most of the time, you will not have that, and you won't need this.

Adaptive "Recursive Least Squares" Applications

Influence of constraints in RLS methods

Fading memory in the RLS method

Summary: RLS updates with fading memory

Notations: