# Quantifying Variation

Developing Models for Kalman Filters

Through the series on State Observers, we dealt with
hidden random variations in state variables in an entirely qualitative
manner. Observer designs were chosen so that the tracking of
system outputs *looked* good. Whatever that means! To some
extent, you don't care what it means. You do the cut-and-try analysis,
you get an observer that works *well enough*, and if the
results are well within the tolerances you need, you are
allowed to stop right there.

However, that situation is not completely satisfying. How can
you be sure that you didn't miss something really important? Is there
a way to bypass the *cut-and-try* experimentation, and
go directly to a *best observer design* (whatever that means)?
The objectives of the observer design are to make it
insensitive to noise (whatever that means) while producing the best
tracking of the real system state (ditto). In general, these
goals are incompatible.

To quantify all of this, we need a means to represent and
analyze variability. In classic Kalman filtering, the tool for doing
this is *variance*.

## Review of variance and covariance

For model building, we used a *correlation function* to
identify linear relationships between two data sequences for a
specified time separation. The sequences under study were typically
input data and output data after a specified delay. Correlation was
estimated by multiplying terms from each sequence, pairwise, and
averaging the product terms to estimate the statistical expected
value.

Given a vector of variables, each of which is zero mean and
random, we can select any two of these variables and perform the
same kind of correlation analysis on them. For Kalman filters, noise
sources are presumed to be *white*; that is, there is no
non-zero correlation with any other term for time shifts other
than zero. Consequently, attention can be restricted to zero
time shift. Selecting the variable pairs systematically and repeating
the analysis, the results can be collected
into a matrix, with rows/columns of the matrix indicating the first/second variable selected for analysis. This matrix is called a *covariance
matrix*. We can observe that the termwise products are commutative,
and this results in a covariance matrix that is *symmetric*
with respect to the main diagonal.^{[1]}

Like correlation, if the resulting values in the covariance matrix are distinctly positive or distinctly negative, this indicates that there is a linear relationship between the variables at each instant of time. As a special case, if the covariance is calculated for one of the terms and itself, every intermediate product term is positive. Consequently, the main diagonal terms of the covariance matrix dominate and are always positive. If two random terms are statistically independent, the corresponding covariance matrix terms are zero.
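As a quick sketch of this estimation procedure, the following assumes two synthetic zero-mean sequences (illustrative data, not from any particular system) and builds the covariance matrix entry by entry from averaged pairwise products:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two correlated zero-mean sequences: y is a scaled copy of x
# plus independent noise, so their covariance is clearly positive.
n = 100_000
x = rng.normal(0.0, 1.0, n)
y = 0.8 * x + rng.normal(0.0, 0.5, n)

# Estimate each covariance entry as the average of pairwise products
# at zero time shift (the sequences are already zero mean).
C = np.array([[np.mean(x * x), np.mean(x * y)],
              [np.mean(y * x), np.mean(y * y)]])
print(C)
```

The off-diagonal entries come out near 0.8, reflecting the built-in linear relationship, and the matrix is symmetric as described above.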

A widely-adopted practice in Kalman Filter applications is to ignore
off-diagonal terms in covariance matrices^{[2]}.

## Important properties of variance

### Covariance calculations and rank-one updates

Observe that for column vector `x`, the rank-one matrix `x x^{T}` contains the pairwise products of every term with every other term at a given time instant. Averaging these rank-one matrices over a long sequence provides an alternate formulation for estimating the covariance matrix.
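A minimal demonstration of this rank-one formulation, assuming samples drawn from a hypothetical known covariance so the estimate can be checked:

```python
import numpy as np

rng = np.random.default_rng(7)

# Zero-mean samples with a known covariance (hypothetical values).
true_cov = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
L = np.linalg.cholesky(true_cov)
n = 20_000
samples = L @ rng.normal(size=(2, n))   # each column is one vector x

# Average the rank-one matrices x x^T over the whole sequence.
C = np.zeros((2, 2))
for k in range(n):
    x = samples[:, k:k + 1]    # column vector at time instant k
    C += x @ x.T               # rank-one update
C /= n
print(C)
```

The averaged rank-one updates converge to the same covariance matrix that the pairwise-product estimates would produce.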

### Covariance and expected values

As the number of terms used to estimate variance increases toward infinity, the estimates converge to the statistical expected value, indicated by the notation `E( · )`.

### Covariance and constant vectors

For constant vector terms, the *expected values* are the same as the averages and the same as the values. For any constant vector `x`, its covariance is `x x^{T}`.

### Vector addition and covariance

Variances are *additive* for statistically independent vectors. If you have a random vector `x1` with covariance matrix `V1`, and an independent random vector `x2` with covariance matrix `V2`, the covariance for the sum vector `x1 + x2` is the matrix `V1 + V2`.
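A quick numerical check of this additivity, assuming two independent vectors with hypothetical known covariances:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Two independent zero-mean random vectors with known covariances
# (hypothetical values chosen for the demonstration).
V1 = np.array([[1.0, 0.2],
               [0.2, 0.5]])
V2 = np.array([[0.3, 0.0],
               [0.0, 0.8]])
x1 = rng.multivariate_normal([0.0, 0.0], V1, n)   # rows are samples
x2 = rng.multivariate_normal([0.0, 0.0], V2, n)

# Covariance of the sum, estimated from averaged outer products.
s = x1 + x2
C = (s.T @ s) / n
print(C)
```

The estimate lands close to `V1 + V2`, as the additive property predicts.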

### Covariances under transformations

Suppose that we know that matrix `V` is a covariance matrix characterizing the relationships between the variables in a vector `x`. Suppose that we then apply matrix `M` to transform vector `x` into some new vector `q = M x`.

Using the rank-one scheme for computing the new variance, we can determine that the covariance of the transformed vector is

cov( q ) = cov( M x ) = E( [M x] [M x]^{T} ) = M E( x x^{T} ) M^{T} = M V M^{T}
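This transformation rule can be verified numerically. The sketch below assumes a hypothetical covariance `V` and transformation `M`, and compares the empirical covariance of the transformed samples against `M V M^T`:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000

V = np.array([[1.0, 0.3],
              [0.3, 2.0]])    # covariance of x (hypothetical values)
M = np.array([[1.0, -1.0],
              [0.5,  2.0]])   # transformation matrix (hypothetical)

x = rng.multivariate_normal([0.0, 0.0], V, n)   # rows are samples of x
q = x @ M.T                                     # each row is q = M x

C_q = (q.T @ q) / n           # averaged outer products of q
print(C_q)
```

The empirical covariance of `q` matches `M V M^T` to within sampling error.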

## Covariance and dynamic system noise

When we first introduced the dynamic state transition models for linear systems, we reserved the notations `w` and `v` to represent random effects. For the `N` state variables in the dynamic model, there are `N` terms in the random noise vector `w` to represent disturbances that directly affect next-state values. These random influences can be characterized by covariance matrix `Q`. For the `M` output variables in the observation equation, there will be `M` corresponding terms in the random noise vector `v` to represent disturbances that occur during the process of observing the output values. These random influences can be characterized by covariance matrix `V`.
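To make the roles of `w`, `v`, `Q`, and `V` concrete, here is a minimal simulation sketch. The system matrices and noise covariances are made-up illustrative values, not taken from any particular plant:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative 2-state, 1-output linear system; A, C, Q, V are
# hypothetical values standing in for the reserved notations.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])        # state transition matrix
C = np.array([[1.0, 0.0]])        # observation matrix
Q = np.diag([0.01, 0.02])         # covariance of state noise w
V = np.array([[0.1]])             # covariance of observation noise v

x = np.zeros(2)
for _ in range(5):
    w = rng.multivariate_normal(np.zeros(2), Q)   # state disturbance
    x = A @ x + w                                 # next-state update
    v = rng.multivariate_normal(np.zeros(1), V)   # measurement noise
    y = C @ x + v                                 # observed output
print(x, y)
```

The vector `w` perturbs the state update directly, while `v` corrupts only the observation; their separate covariances `Q` and `V` quantify those two distinct sources of randomness.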

Since the dynamic equations are linear, the effects on state
due to inputs and the effects on state due to randomness can be
separated, at least for purposes of analysis. The subsequent
cumulative effects on the state variables are also random, and
these random effects too can be separated from the driven
response effects, and described by a reserved covariance matrix `P`. The difference is that this particular covariance, being an unobservable property of hidden variables, is extraordinarily difficult to pin down. Also, because
random effects are propagated by state transition matrix operations,
the random component persists over time, and this *state noise* is
*not white*.
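The claim that propagated state noise is not white can be demonstrated with a one-state sketch (the transition coefficient 0.95 is an arbitrary illustrative choice): the driving noise has essentially zero correlation at a time shift of one step, while the propagated state is strongly correlated with its own past.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000

# White driving noise w, propagated through a slow one-state
# system x[k+1] = a * x[k] + w[k] (a = 0.95, illustrative).
a = 0.95
w = rng.normal(0.0, 1.0, n)
x = np.zeros(n)
for k in range(1, n):
    x[k] = a * x[k - 1] + w[k - 1]

# Lag-1 correlation: near zero for white w, strongly nonzero for x.
corr_w = np.mean(w[1:] * w[:-1]) / np.var(w)
corr_x = np.mean(x[1:] * x[:-1]) / np.var(x)
print(corr_w, corr_x)
```

The persistence introduced by the state transition is exactly why characterizing `P` is harder than characterizing `Q` or `V`.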

## Coming up next

We need to follow up on the basic ideas in this installment and examine what they mean for random influences included in the dynamic state transition equations.

[1] Statisticians will be appalled at this tail-wags-the-dog way of describing variance. Variance is considered a fundamental property of statistical distributions, characterizing the "spread" of the distribution, while correlation is something that engineers use to describe a time series. But at the end of the day, they are both just averages of product-terms.

[2] This practice is known as "**CR**eative **A**ssignment of **P**arameters." The justification usually given is... well, nonexistent. And vexing problems can result, as we will see.