# Recognizing State Effects on Correlation

Developing Models for Kalman Filters

Let's review how we got to this point. We were considering how we might estimate the number of system states from input/output data. To do that, we first needed the input/output data, so we performed a stimulus-response test with a special randomized but non-white driving signal. If there is a linear relationship between the input data and the output data that follows, this will show up in the correlation curve constructed from input/output data. The calculations were massive but relatively primitive: multiply terms together, sum these, and optionally scale by the number of contributing terms to normalize the results. If there was no relationship, the correlation function would be a slightly random flat line hovering around zero.

But that is not what we saw. There were some clear patterns, produced by the system dynamics. What was not clear was how to interpret them.

## Review of impulse response sequences

The simplest kind of input stimulus that can provoke a system
to produce an output is an *impulse.* In discrete time
systems, an impulse sequence ` δ`

is
a sequence consisting of the value 1.0 at a single sample location
_{n} `n`

, and zero for all other positions in the sequence.
Larger or smaller impulses can be obtained by scaling the impulse
sequence by a positive or negative scaling factor
(which of course changes only the one nonzero term). Other impulses
can occur separately and independently at other positions.

The impulse response of the system is then the output sequence
`h`

observed when the system is hammered with
this special kind of input. Subscript _{k}`k`

indicates
the number of positions after the impulse where the response effect
is observed. Because of linearity, a larger impulse results in the
same response shape, only scaled proportionally larger or smaller.

All other input sequences can be decomposed into a sum of impulse sequences, each at a different time-position, each with a different scaling factor. Since linear systems are additive, the output sequence is a sum of the individual impulse responses, shifted according to the time when each impulse arrived, and scaled according to the magnitude of each impulse.

So far, this isn't particularly helpful. But fortunately,
linear discrete time systems theory tells us that there are
actually **very few things** that these response patterns
can be. ^{[1]}

- An impulse response can be associated with a single state variable, and in this case, the pattern is an exponential shape.
- An impulse response can be associated with a complex conjugate pair of state variables, and in this case, the pattern is an exponential times a sinusoidal function at a fixed frequency.

For a general input signal, the complete response curve for the system is then a linear weighted sum of the impulse response curves. There is one curve for each eigenvalue or pair of complex eigenvalues for the state transition matrix. Since the number of states in the model is small, the number of characteristic impulse response patterns you need to look for is small.

## Impulse response and correlation

Taking a look at the response of a linear discrete time system to a signal that consists of a sequence of impulses:

- The first output term will be the first impulse response term resulting from the immediate input impulse.
- The second response term will be the first impulse response term resulting from the newest input impulse, plus a second impulse response term resulting from the previous impulse.
- The third response term will be the first impulse response resulting from the third input impulse, plus a second response term resulting from the preceding impulse, plus a third response term resulting from the initial impulse.
- And so forth...

Mathematically, this combination can be expressed as a
* convolution.*

where the `h`

sequence is the impulse response
sequence associated with one eigenvalue of the system.

Now we want to consider what happens when a system response
in this form is subjected to a correlation analysis, in which
the system output sequence `y`

is compared
to the randomized input sequence _{j}`u`

that
produced it. _{j}

The notation is a little informal here; each summation means "summing over the full applicable range of the index variable."

We will use of the fact that the correlation and impulse response sequences are "shift invariant." That is, it doesn't make any difference if you shift the indexing on all of the terms together, as long as the number of steps separation between the terms remains the same.

We now perform some algebraic manipulations.

$$\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}n\phantom{\rule{1em}{0ex}}\equiv \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}j\phantom{\rule{1em}{0ex}}-\phantom{\rule{1em}{0ex}}i$$$$=\phantom{\rule{1em}{0ex}}\left(1/N\right)\phantom{\rule{1em}{0ex}}\sum _{i}\sum _{j}{h}_{i}\xb7{u}_{j}\xb7{u}_{j-i+k}$$ $$=\phantom{\rule{1em}{0ex}}\sum _{i}{h}_{i}\phantom{\rule{1em}{0ex}}\left(1/N\right)\phantom{\rule{1em}{0ex}}\sum _{j}\xb7{u}_{j}\xb7{u}_{j-i+k}$$ $$\phantom{\rule{1em}{0ex}}cor{r}_{k}\left(u,y\right)\phantom{\rule{1em}{0ex}}=\phantom{\rule{1em}{0ex}}\sum _{i}{h}_{i}\phantom{\rule{1em}{0ex}}{corr}_{k-i}\left(u,u\right)$$

If the input sequence had been pure white noise, that final
term `corr(u,u)`

would have been zero everywhere
except when the `i`

and `j`

index terms
were equal. Under those conditions, * the correlation function
would consist of a weighted sum of the impulse response functions.
* But we do not have that. Our input signal is * autocorrelated*
because of the smoothing filter used. So what we end up with is
that the observed correlation curve is a weighted sum of the
impulse response functions for all of the states *filtered*
by the autocorrelation characteristic of the band-limiting filter.

If it is really true that the filtering only affects first 20 terms of these response functions, and nothing of any significance is happening there anyway, we can disregard the filtering issues for purposes of counting the number of state response curves we can identify. Thus, as an approximation:

$$\phantom{\rule{1em}{0ex}}cor{r}_{k}\left(u,y\right)\phantom{\rule{1em}{0ex}}\approx \phantom{\rule{1em}{0ex}}\alpha \xb7{h}_{k}$$In other words, the cross-correlation sequence looks very much like a scaled version of the impulse response sequence. This seems like a rather unexpected result, since the concepts of impulse response and correlation seem so starkly different.

## Impulse response relationship to autocorrelation

A crosscorrelation analysis has a couple of drawbacks.
It involves the randomized test signal `u`

as one of the
sequences, so it tends to include a lot of residual noise, despite the
attempts to limit noise by signal filtering. Also, noise in
the input signal tends to disturb the phase of oscillating impulse
responses, attenuating the patterns in large data sets.

Another option is to study the autocorrelation sequence of the output. It is less clear how this has anything to do with the impulse response, because the input signal is not directly involved.

I suggest not worrying too much about the following details.
The notation is even more casual than usual, and you are forbidden
to notice mathematical inconsistencies. The analysis starts by
considering pairwise product of output values terms separated by
some number of sample positions `k`

. In the autocorrelation
analysis, a general product pair term looks like this.

To get the autocorrelation, we then take the average over a
large number of terms, allowing the number of terms in the data set
`N`

to extend to infinity.

Now we will do some arcane algebraic manipulations.

$$\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}=\phantom{\rule{1em}{0ex}}\left(1/2N\right)\phantom{\rule{1em}{0ex}}\sum _{n}\sum _{i}\sum _{j}\phantom{\rule{1em}{0ex}}{h}_{n-i}\xb7{h}_{n+k-j}{u}_{i}\xb7{u}_{j}$$ Using the same kind of argument as before, the impulse response
sequences `h`

change only gradually, so instead of
a "spread" of terms, the `u`

sequence could be an impulse function and this would make very
little difference to the results. If _{i} u_{j}` u `

was in fact
an impulse sequence, the only product terms that are not zero would
occur when ` i = j `

, which would allow simplification.

The last term is a convolution, but it is not a correlation in the manner we've usually seen because it is applied to a determinate function. This would seem to have gotten us nowhere, but now we can use the fact that the impulse response patterns are exponential.

Take the simple exponential case first. The impulse response
has no terms at negative index numbers, so we can start without
loss of generality at term `n=0`

and see what the
product pairs look like for this response pattern.

Now we can advance position `k`

by one step and
calculate the next product pair. And the next... and so forth.

We can start to see the pattern here. When we sum over all of
the product terms, we see that the terms inside the parentheses
form a convergent series, reducing to some constant `C`

.

And there you go. The approximate autocorrelation sequence has the same exponential form that the impulse response has... but possibly inverted in sign.

To avoid the mathematical grit of showing that the same thing
happens with the exponentially-modulated sine wave signals, we
will just observe that an exponentially-modulated sine wave is a
special case of an exponential function, using complex variables
instead of real-valued variables. Once the problem is
reformulated that way, the previous result applies.^{[2]}

Despite all the blowing wind here, I suggest using this as a "back-up plan." Try analyzing the input/output crosscorrelation first; if it remains too noisy, try analyzing the output auto-correlation.

## Next time

The important thing to take away from this installment
is **not** the mathematics! In fact, you can forget all of
that. The mathematics is horribly weak anyway. All of this was
only to justify the claim:

**You can look in the correlation sequence
(which you know how to calculate in a straightforward manner)
for the exponential and exponentially damped sine wave response
patterns that are characteristic of a linear system's impulse
response. **

Correlations are additive — or if you want to take the other point of view, they can be removed one at a time after patterns have been identified. If this process deflates the correlation curve to the point that it looks approximately flat-line (except near zero where the test signal filter messes things up), you have identified the effects of all relevant system states, and therefore you know how many of them you need in your model.

Next installment, we will apply an exploratory process to some test data, looking for the apparent number of state patterns, and see what happens.