Updating and Sequential Least Squares
Developing Models for Kalman Filters
In previous installments, we covered the Linear Least Squares method for determining parameters of a linear model, starting with a simple input/output model, and working towards a dynamic state transition model. For the dynamic model, the model predicts output values based on a given set of current input values and the current internal state variables. The predictions are calculated as a sum of the input values times constant coefficient values.
Those coefficients are the model parameters. Getting good
estimates of multiple parameter values from noisy data can
require collecting thousands, or even tens or hundreds of thousands,
of input/output data combinations.
That means an independent data matrix M
with thousands of rows.
Sometimes it is no big deal: bulky or not, you just grind through the
data once and you are done. But if you have to do this crunching a lot,
it means reserving quite a lot of system resources. First, you need plenty
of memory just to retain all the data. Then, at the finish, there is a
long delay for all of the CPU cycles needed to perform the calculations.
Is there some way to use the data "on the fly" as it is collected, rather
than accumulating everything and then concentrating all the processing
at the end?
Introducing rank-one updates
Suppose that n
items of data were accumulated to
build the input data matrix M_{n}
of independent ("input")
values. The linear model
uses the "multipliers" in the parameter vector p_{n}
to
"predict" the corresponding output values Y_{n}
. If the
model were perfect, and if the data were perfect, the following conditions would
all be satisfied exactly:
$${M}_{n}\cdot {p}_{n}\phantom{\rule{1em}{0ex}}=\phantom{\rule{1em}{0ex}}{Y}_{n}$$
However, in the real world these conditions can't all be satisfied perfectly. As we saw in preceding installments, we can solve the Normal Equations to obtain an approximate solution that is "best" according to the least sum of squared errors criterion.
$${{M}_{n}}^{T}{M}_{n}\cdot {p}_{n}\phantom{\rule{1em}{0ex}}=\phantom{\rule{1em}{0ex}}{{M}_{n}}^{T}\cdot {Y}_{n}$$In most non-pathological cases, this "necessary condition" yields a unique solution that is the best-fit solution we seek.
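As a concrete illustration, here is a minimal NumPy sketch of setting up and solving the Normal Equations directly. The data and the names (`M`, `Y`, `p_true`) are made up for the example:

```python
import numpy as np

# Synthetic example: k = 2 parameters, n = 100 noisy observations.
rng = np.random.default_rng(0)
M = rng.normal(size=(100, 2))                  # independent ("input") data matrix
p_true = np.array([1.5, -0.7])                 # parameters we hope to recover
Y = M @ p_true + 0.01 * rng.normal(size=100)   # noisy output values

# Normal Equations: (M^T M) p = M^T Y
A = M.T @ M
b = M.T @ Y
p = np.linalg.solve(A, b)                      # best-fit parameter estimate
print(p)                                       # close to p_true
```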
Now suppose that this is only a partial solution; the solution we
actually want includes all of the previous constraint data, plus one
more input/output relationship. We can modify the original problem by including
the data terms for the new constraint as the new top row of the data
matrix. All of the rows previously present are still there,
covered by the M_{n}
matrix terms. Similarly, a new output term y_{n+1}
is added to the top of the output data vector, with the previous
output terms Y_{n}
still present below.
The new Normal Equations formulation to solve for this adjusted best-fit problem is:
$${{M}_{n+1}}^{T}{M}_{n+1}\cdot {p}_{n+1}\phantom{\rule{1em}{0ex}}=\phantom{\rule{1em}{0ex}}{{M}_{n+1}}^{T}\cdot {Y}_{n+1}$$ In this solution, we can expect the new information to result in
a new solution p_{n+1}
differing from the previous best
estimate by the small amount Δp
.
If we substitute the notations from above, we can see the roles that previous data and new data play in producing the updated solution.
$$\left({{m}_{n+1}}^{T}{m}_{n+1}+{{M}_{n}}^{T}{M}_{n}\right)\cdot \left({p}_{n}+\Delta p\right)\phantom{\rule{0.5em}{0ex}}=\phantom{\rule{0.5em}{0ex}}\left({{m}_{n+1}}^{T}\cdot {y}_{n+1}+{{M}_{n}}^{T}\cdot {Y}_{n}\right)$$This provides a scheme to construct the Normal Equations incrementally.
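This identity is easy to check numerically. A small sketch with made-up data (the names `M_n` and `m_new` are illustrative), showing that stacking a new row onto the data matrix is equivalent to a rank-one update of the k x k product:

```python
import numpy as np

rng = np.random.default_rng(1)
M_n = rng.normal(size=(5, 3))     # accumulated data: n = 5 rows, k = 3
m_new = rng.normal(size=(1, 3))   # one new constraint row

# Stacking the new row on top of M_n ...
M_next = np.vstack([m_new, M_n])

# ... gives the same k x k product as a rank-one update of M_n^T M_n:
lhs = M_next.T @ M_next
rhs = m_new.T @ m_new + M_n.T @ M_n
print(np.allclose(lhs, rhs))      # True
```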
Let k be the number of parameters to be evaluated.

1. Start by constructing the k x k matrix M_0^T M_0 and a length-k vector M_0^T Y_0, with all matrix and vector terms set to 0.
2. Begin adding constraint rows. For the first input/output constraint condition to be added to the system, construct from the row of independent data values the peculiar matrix product m_1^T m_1, which forms a square matrix of size k x k but rank 1. This matrix is summed into the previously constructed matrix storage M_0^T M_0 to form the updated matrix M_1^T M_1. On the other side of the equation system under construction, the new data row is transposed into the column vector m_1^T, scaled by the new output value y_1, and summed with the previous right-side accumulation M_0^T Y_0 to form the updated vector M_1^T Y_1.
3. With the next row of independent values and corresponding output value, the M_1^T M_1 matrix is updated to form M_2^T M_2, and the M_1^T Y_1 vector is updated to form M_2^T Y_2.
4. Repeat the same way until the entire input/output data set is processed.
5. The value of the modified p parameter vector? No need to worry about it until the time comes to compute it, which can be after any update.

Once all of the updates are done, the Normal Equations are all set up and ready to solve.
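The steps above can be sketched as a short accumulation loop. This is an illustrative implementation, not production code; the function name and the synthetic data are invented for the example:

```python
import numpy as np

def sequential_normal_equations(rows, outputs, k):
    """Accumulate the Normal Equations one constraint row at a time.

    Only a k x k matrix and a length-k vector are carried forward;
    the full data set never needs to be stored.
    """
    A = np.zeros((k, k))      # running M^T M accumulator
    b = np.zeros(k)           # running M^T Y accumulator
    for m, y in zip(rows, outputs):
        m = np.asarray(m, dtype=float)
        A += np.outer(m, m)   # rank-one update: m^T m
        b += m * y            # right-side update: m^T y
    return np.linalg.solve(A, b)

# Example with synthetic data: recover known parameters from noisy outputs.
rng = np.random.default_rng(2)
M = rng.normal(size=(200, 3))
p_true = np.array([0.5, 2.0, -1.0])
Y = M @ p_true + 0.01 * rng.normal(size=200)
p_est = sequential_normal_equations(M, Y, k=3)
```

Because the accumulators are exact, `p_est` matches what a monolithic Normal Equations solve over the full matrix would produce.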
Some observations

The point of this scheme is that it is no longer necessary to accumulate a huge collection of input/output data. All you need to carry forward from one step to the next is one
k x k
matrix and one vector of size k. This processing is neither more nor less efficient than the calculations you would perform to set up the Normal Equations solution in the usual manner. All this does is spread the work more evenly over time.
The rank-one updates (and similar lower-rank updates) are not iterative approximations. You will get exactly the same result from building up normal equations in this sequential manner or from one large monolithic matrix construction.
You probably should not discard all of the original input/output data after updating the Normal Equations. You will need some of that data to verify your fitting results.
Some extensions

Instead of a single new constraint with a single output value added at each update, the updating process is equally valid when groups of updates are accumulated and incorporated together as one batch-mode update.
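A batch-mode update of this kind might look as follows in NumPy (the names are illustrative): a group of ten new rows contributes a rank-10 update in a single step, while the accumulators keep the same fixed size.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 3
A = np.zeros((k, k))   # running M^T M accumulator
b = np.zeros(k)        # running M^T Y accumulator

# Incorporate ten new constraints at once instead of one at a time.
M_batch = rng.normal(size=(10, k))
Y_batch = rng.normal(size=10)
A += M_batch.T @ M_batch   # rank-10 update in a single step
b += M_batch.T @ Y_batch

# The accumulators stay k x k and length k regardless of batch size.
print(A.shape, b.shape)
```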
The method is easily adapted for the case of more than one output variable. Go ahead and try this!
For next time
When constructing the Normal Equations one constraint row at a time, the single most time-consuming operation is the matrix inverse you must calculate to evaluate parameters. This is not a big deal if done rarely. But if you need updated parameter values frequently (in the worst case, as soon as possible after each new update), some modifications to be discussed later can streamline the calculations.