State Effects

Let's review how we got to this point. We were considering how we might estimate the number of system states from input/output data. To do that, we first needed the input/output data, so we performed a stimulus-response test with a special randomized but non-white driving signal. If there is a linear relationship between the input data and the output data that follows, this will show up in the correlation curve constructed from input/output data. The calculations were massive but relatively primitive: multiply terms together, sum these, and optionally scale by the number of contributing terms to normalize the results. If there was no relationship, the correlation function would be a slightly random flat line hovering around zero.

But that is not what we saw. There were some clear patterns, produced by the system dynamics. What was not clear was how to interpret them.

Review of impulse response sequences

The simplest kind of input stimulus that can provoke a system to produce an output is an impulse. In discrete time systems, an impulse sequence δ_n is a sequence consisting of the value 1.0 at a single sample location n, and zero for all other positions in the sequence. Larger or smaller impulses can be obtained by scaling the impulse sequence by a positive or negative scaling factor (which of course changes only the one nonzero term). Other impulses can occur separately and independently at other positions.

The impulse response of the system is then the output sequence h_k observed when the system is hammered with this special kind of input. Subscript k indicates the number of positions after the impulse where the response effect is observed. Because of linearity, a larger impulse results in the same response shape, only scaled proportionally larger or smaller.

All other input sequences can be decomposed into a sum of impulse sequences, each at a different time-position, each with a different scaling factor. Since linear systems are additive, the output sequence is a sum of the individual impulse responses, shifted according to the time when each impulse arrived, and scaled according to the magnitude of each impulse.

So far, this isn't particularly helpful. But fortunately, linear discrete time systems theory tells us that there are actually very few things that these response patterns can be. ^[1]

An impulse response can be associated with a single state variable, and in this case, the pattern is an exponential shape.
An impulse response can be associated with a complex conjugate pair of state variables, and in this case, the pattern is an exponential times a sinusoidal function at a fixed frequency.

For a general input signal, the complete response curve for the system is then a linear weighted sum of the impulse response curves. There is one curve for each eigenvalue or pair of complex eigenvalues for the state transition matrix. Since the number of states in the model is small, the number of characteristic impulse response patterns you need to look for is small.

Impulse response and correlation

Taking a look at the response of a linear discrete time system to a signal that consists of a sequence of impulses:

The first output term will be the first impulse response term resulting from the immediate input impulse.
The second response term will be the first impulse response term resulting from the newest input impulse, plus a second impulse response term resulting from the previous impulse.
The third response term will be the first impulse response resulting from the third input impulse, plus a second response term resulting from the preceding impulse, plus a third response term resulting from the initial impulse.
And so forth...

Mathematically, this combination can be expressed as a convolution.

y_{n} = \sum_{j} h_{n - j} \cdot u_{j}

where the h sequence is the impulse response sequence associated with one eigenvalue of the system.

Now we want to consider what happens when a system response in this form is subjected to a correlation analysis, in which the system output sequence y_j is compared to the randomized input sequence u_j that produced it.

c o r r_{k} (y, u) \leftrightarrow (1 / N) \sum_{n} \sum_{j} h_{n - j} \cdot u_{j} \cdot u_{n + k}

The notation is a little informal here; each summation means "summing over the full applicable range of the index variable."

We will use of the fact that the correlation and impulse response sequences are "shift invariant." That is, it doesn't make any difference if you shift the indexing on all of the terms together, as long as the number of steps separation between the terms remains the same.

We now perform some algebraic manipulations.

n \equiv j - i

= (1 / N) \sum_{i} \sum_{j} h_{i} \cdot u_{j} \cdot u_{j - i + k}

= \sum_{i} h_{i} (1 / N) \sum_{j} \cdot u_{j} \cdot u_{j - i + k}

c o r r_{k} (u, y) = \sum_{i} h_{i} {c o r r}_{k - i} (u, u)

If the input sequence had been pure white noise, that final term corr(u,u) would have been zero everywhere except when the i and j index terms were equal. Under those conditions, the correlation function would consist of a weighted sum of the impulse response functions. But we do not have that. Our input signal is autocorrelated because of the smoothing filter used. So what we end up with is that the observed correlation curve is a weighted sum of the impulse response functions for all of the states filtered by the autocorrelation characteristic of the band-limiting filter.

If it is really true that the filtering only affects first 20 terms of these response functions, and nothing of any significance is happening there anyway, we can disregard the filtering issues for purposes of counting the number of state response curves we can identify. Thus, as an approximation:

c o r r_{k} (u, y) \approx α \cdot h_{k}

In other words, the cross-correlation sequence looks very much like a scaled version of the impulse response sequence. This seems like a rather unexpected result, since the concepts of impulse response and correlation seem so starkly different.

Impulse response relationship to autocorrelation

A crosscorrelation analysis has a couple of drawbacks. It involves the randomized test signal u as one of the sequences, so it tends to include a lot of residual noise, despite the attempts to limit noise by signal filtering. Also, noise in the input signal tends to disturb the phase of oscillating impulse responses, attenuating the patterns in large data sets.

Another option is to study the autocorrelation sequence of the output. It is less clear how this has anything to do with the impulse response, because the input signal is not directly involved.

I suggest not worrying too much about the following details. The notation is even more casual than usual, and you are forbidden to notice mathematical inconsistencies. The analysis starts by considering pairwise product of output values terms separated by some number of sample positions k. In the autocorrelation analysis, a general product pair term looks like this.

y_{n} \cdot y_{n + k} = (\sum_{i} h_{n - i} \cdot u_{i}) (\sum_{j} h_{n + k - j} \cdot u_{j})

To get the autocorrelation, we then take the average over a large number of terms, allowing the number of terms in the data set N to extend to infinity.

c o r r_{k} (y, y) = (1 / 2 N) \sum_{n} (\sum_{i} h_{n - i} \cdot u_{i}) (\sum_{j} h_{n + k - j} \cdot u_{j})

Now we will do some arcane algebraic manipulations.

= (1 / 2 N) \sum_{n} \sum_{i} \sum_{j} h_{n - i} \cdot h_{n + k - j} u_{i} \cdot u_{j}

Using the same kind of argument as before, the impulse response sequences h change only gradually, so instead of a "spread" of terms, the u_i u_j sequence could be an impulse function and this would make very little difference to the results. If u was in fact an impulse sequence, the only product terms that are not zero would occur when i = j , which would allow simplification.

= (1 / 2 N) \sum_{n} \sum_{i} h_{n - i} \cdot h_{n + k - i} u_{i}^{2}

= (1 / 2 N) \sum_{i} u_{i}^{2} \sum_{n} h_{n - i} \cdot h_{n + k - i}

= v a r (u) \cdot \sum_{n} h_{n - i} \cdot h_{n + k - i}

= v a r (u) \cdot \sum_{n} h_{n} \cdot h_{n + k}

The last term is a convolution, but it is not a correlation in the manner we've usually seen because it is applied to a determinate function. This would seem to have gotten us nowhere, but now we can use the fact that the impulse response patterns are exponential.

Take the simple exponential case first. The impulse response has no terms at negative index numbers, so we can start without loss of generality at term n=0 and see what the product pairs look like for this response pattern.

h_{0} = A e^{- 0 α}

h_{k} = A e^{- k α}

h_{0} h_{k} = A e^{- 0 α} A e^{- k α} = A^{2} e^{- k α}

Now we can advance position k by one step and calculate the next product pair. And the next... and so forth.

h_{1} h_{k + 1} = A e^{- 1 α} A e^{- (k + 1) α} = A^{2} e^{- k α} e^{- 2 α}

h_{2} h_{k + 2} = A e^{- 2 α} A e^{- (k + 2) α} = A^{2} e^{- k α} e^{- 4 α}

We can start to see the pattern here. When we sum over all of the product terms, we see that the terms inside the parentheses form a convergent series, reducing to some constant C.

A^{2} e^{- k α} (1 + e^{- 2 α} + e^{- 4 α} + e^{- 6 α} + \dots)

A^{2} \cdot e^{- k α} \cdot C

And there you go. The approximate autocorrelation sequence has the same exponential form that the impulse response has... but possibly inverted in sign.

To avoid the mathematical grit of showing that the same thing happens with the exponentially-modulated sine wave signals, we will just observe that an exponentially-modulated sine wave is a special case of an exponential function, using complex variables instead of real-valued variables. Once the problem is reformulated that way, the previous result applies.^[2]

Despite all the blowing wind here, I suggest using this as a "back-up plan." Try analyzing the input/output crosscorrelation first; if it remains too noisy, try analyzing the output auto-correlation.

Next time

The important thing to take away from this installment is not the mathematics! In fact, you can forget all of that. The mathematics is horribly weak anyway. All of this was only to justify the claim:

You can look in the correlation sequence (which you know how to calculate in a straightforward manner) for the exponential and exponentially damped sine wave response patterns that are characteristic of a linear system's impulse response.

Correlations are additive — or if you want to take the other point of view, they can be removed one at a time after patterns have been identified. If this process deflates the correlation curve to the point that it looks approximately flat-line (except near zero where the test signal filter messes things up), you have identified the effects of all relevant system states, and therefore you know how many of them you need in your model.

Next installment, we will apply an exploratory process to some test data, looking for the apparent number of state patterns, and see what happens.

Footnotes:

[1] For example, "Linear Systems Theory by João P. Hespanha, 2009, Princeton University Press, ISBN: 9781400831890.

[2] Is this cheating, lying, or cowardice? Check Wikipedia for an article on voodoo mathematics.

Recognizing State Effects on Correlation

Review of impulse response sequences

Impulse response and correlation

Impulse response relationship to autocorrelation

Next time