# One-Sided Derivative Estimation

## Convolution formulas

In case you missed it, there is still time to go back and scan through the series on estimating derivatives from data on this site, so that you have the full background to follow the presentation here.

## One-sided evaluation scenarios

Since the "least squares" design strategy produced some very useful derivative estimators, perhaps the success can be extended to a related problem.

Previously considered was the case where data are available in a balanced manner surrounding the point where the function's derivative is to be estimated. Estimator formulas were based on approximations that use smooth and simple function models, and the analytical derivative of the fitted model then served as the derivative estimate for the noisy data.

An important but difficult case is when the estimation process runs in real time, updated with each new function value that becomes available in a time sequence. The immediately available data consist of the new value and any "past history" values temporarily held in memory. In order to apply a balanced estimator formula, it is necessary to collect a number of values "on the future side" of the evaluation point, and buffer these in memory as well. The delay needed to collect the additional values results in a corresponding delay in delivering the derivative value estimates. That delay is unacceptable in situations where an updated result is needed "as soon as possible."

## Why one-sided estimators are harder

Derivative estimation formulas for a point centrally located in a range of data have some very useful properties. A derivative inherently has a *negative* or *odd* symmetry; if a slope is "uphill" looking in one direction, then it is necessarily "downhill" looking in the opposite direction. Given a balanced data set, the estimator formula can exhibit the same kind of odd symmetry. That dramatically simplifies the determination of a suitable formula, because half of the formula coefficients follow automatically from symmetry.

Symmetric formulas also have the special advantage that, with position measured from the evaluation point, they produce derivative estimates of exactly zero (not merely approximately zero) when applied to:

- any constant function
- any function consisting only of even-order polynomial terms
- any function consisting only of cosine waveform terms
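A quick numerical check of these claims, using the textbook five-point central difference as a stand-in for a balanced estimator (it is not one of this series' least-squares designs): because its coefficients satisfy c(-t) = -c(t), every even function of position, including constants, even powers and cosines, yields an estimate of zero up to rounding error. The helper name `estimate` is hypothetical.

```python
import math

# Textbook five-point central difference: sample positions and coefficients.
# Odd symmetry: the coefficient at -t is the negative of the one at +t.
POSITIONS = [-2, -1, 0, 1, 2]
COEFFS = [1 / 12, -8 / 12, 0.0, 8 / 12, -1 / 12]

def estimate(f):
    """Derivative estimate at position 0 from samples of f at POSITIONS."""
    return sum(c * f(t) for c, t in zip(COEFFS, POSITIONS))

# Even functions of position: the estimate is zero up to rounding error.
print(estimate(lambda t: 3.0))                  # constant
print(estimate(lambda t: t**4 - 5 * t**2))      # even-order polynomial terms
print(estimate(lambda t: math.cos(0.7 * t)))    # cosine term
# An odd function carries the slope information.
print(estimate(lambda t: t))                    # the true slope is 1
```

The symmetry does the work here: each sample at +t is paired with a sample at -t whose coefficient has the opposite sign, so equal function values cancel in pairs.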

When data are not available in a balanced manner, an estimator must enforce similar properties explicitly, while still trying to match the theoretically correct derivative response and also maintaining reasonable noise rejection. This is a much more difficult problem. As a result, the one-sided estimators tend to be less efficient and yield poorer accuracy.

## A least-squares one-sided estimator

Suppose we have a data sequence that results from a band-limited process, and we want to estimate the derivative at "position zero." That sort of sequence can be locally decomposed into a sum of sine and cosine (including constant) wave components. The frequency band of relevance is the range from 0.0 to 0.2·π. Analytically, we know that the derivative of the function `sine(wt)` is `w·cosine(wt)`, which equals `w` at location zero. We also know that the derivative of the function `cosine(wt)` is zero at location zero for any frequency. These two conditions should be satisfied at any frequency we might randomly pick from the range 0.0 to 0.2·π. In addition, as discussed, there must be constraints that rigorously force the derivative estimate to be zero when frequency `w` is zero; more-or-less zero is not good enough.

Thus, we can arbitrarily pick a frequency and generate corresponding one-sided sine wave and cosine wave data for that frequency. For each such set of *independent data values*, the output *dependent data* values need to satisfy these constraints:

- If `w` is above 0.2·π, the response to the sine wave data should be zero.
- If `w` is above 0.2·π, the response to the cosine wave data should be zero.
- If `w` is under 0.2·π, the response to the sine wave data should be the analytical derivative value for a sine wave, `w`.
- If `w` is under 0.2·π, the response to the cosine wave data should be zero.
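Written out as equations, the constraint list amounts to a constrained least-squares problem. The notation is introduced here for illustration only: $c_k$ is the coefficient applied to the sample at position $t_k = -k$ ($k = 0$ is the newest sample, at the evaluation point), $N$ is the number of terms, and $w_j$ runs over the sampled frequencies between 0 and $\pi$.

$$
d(w) = \begin{cases} w, & w < 0.2\pi \\ 0, & w \ge 0.2\pi \end{cases}
$$

$$
\min_{c_0,\ldots,c_{N-1}} \sum_{j} \left[ \left( \sum_{k=0}^{N-1} c_k \sin(w_j t_k) - d(w_j) \right)^{2} + \left( \sum_{k=0}^{N-1} c_k \cos(w_j t_k) \right)^{2} \right]
\quad \text{subject to} \quad \sum_{k=0}^{N-1} c_k = 0 .
$$

The side condition is just the cosine constraint at $w = 0$ (a constant input, since $\cos 0 = 1$), imposed exactly rather than in the least-squares sense, because more-or-less zero is not good enough.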

However, a limited number of model coefficients cannot be expected to satisfy the very large number of constraints exactly. By collecting the data for the independent values into a large matrix, and the dependent (desired output) values into a large vector, we can construct a linear least squares *Normal Equations* system [2]. Solving the equation system produces coefficients for the estimator formula. The conditioning of this problem is very poor, which means that getting any solution is tricky, and when you do get one, it is not very good.
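Since the design details are deliberately omitted here, the following is only a sketch of how the constraint list could be turned into a least-squares problem, under assumptions introduced for illustration: 13 terms, the newest sample at position 0, 200 evenly spaced frequencies from 0 to π with equal weight, and the zero-frequency condition imposed exactly by eliminating one coefficient. It solves with `numpy.linalg.lstsq`, an SVD-based solver, rather than forming the Normal Equations explicitly, because forming AᵀA squares the condition number of an already poorly conditioned problem. The function name `design_one_sided` is hypothetical, and its output should not be expected to match the coefficients quoted below.

```python
import numpy as np

def design_one_sided(n_terms=13, band=0.2 * np.pi, n_freqs=200):
    """Least-squares design of a one-sided derivative estimator (sketch).

    Coefficient k multiplies the sample at position t = -k, so index 0
    is the newest sample (the evaluation point): newest to oldest.
    """
    t = -np.arange(n_terms, dtype=float)       # sample positions 0, -1, -2, ...
    w = np.linspace(0.0, np.pi, n_freqs)       # frequency grid up to Nyquist
    sin_rows = np.sin(np.outer(w, t))          # sine-wave data, one row per frequency
    cos_rows = np.cos(np.outer(w, t))          # cosine-wave data, one row per frequency
    sin_target = np.where(w < band, w, 0.0)    # derivative of sin(wt) at t=0 is w, in band only
    cos_target = np.zeros_like(w)              # derivative of cos(wt) at t=0 is always 0
    A = np.vstack([sin_rows, cos_rows])
    b = np.concatenate([sin_target, cos_target])
    # Enforce sum(c) == 0 exactly by eliminating the last coefficient:
    # c[-1] = -sum(c[:-1]), so each row a.c becomes (a[:-1] - a[-1]).c[:-1]
    A_reduced = A[:, :-1] - A[:, -1:]
    c_free, *_ = np.linalg.lstsq(A_reduced, b, rcond=None)
    return np.append(c_free, -c_free.sum())

coeffs = design_one_sided()
print(np.round(coeffs, 6))
```

Changing `n_terms`, `band` or the frequency grid is an easy way to explore how the result depends on these choices.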

Full details of the estimator design are not provided here because, well, it isn't worth the effort given the poor results. The following is representative of what you might get.

Coefficients: (13 terms, newest to oldest) -0.232434 -0.142361 -0.032711 0.108363 0.183252 0.178972 0.127357 0.023053 -0.068244 -0.118948 -0.078936 -0.010887 0.063524

Here is a plot of the estimator frequency response. For sine waves, the estimates (in blue) should track the ideal curve (in green). The response to cosine waves (in red) should be zero.

**Frequency response of one-sided derivative estimator**

You can see that this has lots of problems.

- Tracking of the desired response curve is not asymptotic and not very accurate near frequency zero.
- Accuracy is acceptable but not great in the first 5% of the Nyquist frequency (40 points per wave cycle).
- Beyond 5% of the Nyquist frequency, excessive response levels and large phase shifts start to cause features to appear in the wrong locations.
- Response to noise and to actual derivative effects in the middle band area is excessive.
- Maybe it isn't all bad news: the rejection of high-frequency noise is reasonable.
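The curves in the plot can be recomputed from the listed coefficients: a unit sine wave sin(w·t) fed through the estimator gives the sine response (blue), and a unit cosine wave gives the cosine response (red). The sketch assumes coefficient k multiplies the sample at position t = `newest_position` - k; `responses` is a hypothetical helper name. The sign of the sine response depends on how the list is paired with the data: with the newest sample at t = 0, the low-frequency sine response of this 13-term set works out to about -0.91·w, while reversing the list gives about +0.91·w. Either way the magnitude sits roughly 9% below the ideal slope of 1, consistent with the remark that tracking near frequency zero is not very accurate.

```python
import numpy as np

# The 13-term one-sided estimator listed above (newest to oldest).
COEFFS = np.array([-0.232434, -0.142361, -0.032711, 0.108363, 0.183252,
                   0.178972, 0.127357, 0.023053, -0.068244, -0.118948,
                   -0.078936, -0.010887, 0.063524])

def responses(coeffs, w, newest_position=0):
    """Estimator output for unit sine and cosine waves at frequencies w.

    Assumes coefficient k multiplies the sample at t = newest_position - k.
    Returns (sine_response, cosine_response); the ideal in band is (w, 0).
    """
    t = newest_position - np.arange(len(coeffs), dtype=float)
    w = np.atleast_1d(np.asarray(w, dtype=float))[:, None]   # column of frequencies
    sine_resp = np.sin(w * t) @ coeffs
    cos_resp = np.cos(w * t) @ coeffs
    return sine_resp, cos_resp

w = np.linspace(0.0, np.pi, 501)             # 0 to the Nyquist frequency
s, c = responses(COEFFS, w)                  # sine (blue) and cosine (red) curves
ideal = np.where(w < 0.2 * np.pi, w, 0.0)    # ideal sine response (green)
```

Plotting is left out; `w`, `s`, `c` and `ideal` can be handed to any plotting library. For the estimators that use one or two future values, set `newest_position` to 1 or 2.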

There might be some circumstances in which this is the best you can do, and a poor estimate is better than no estimate. At the very least, you need to understand the hazards if you try using this.

## Possibilities for improvement?

The lowest-order *Central Difference Method* derivative estimator used two data values, at points that bracket the point of evaluation, to produce its primitive derivative estimate. This motivates a possible "cheat" in which a delay of just one additional time period is allowed, so that one additional "point in the future" is buffered and made available to the estimator, at the cost of one extra unit of delay in delivering the corresponding result.
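To make the cost of the "cheat" concrete, here is a minimal streaming sketch (the function name `streaming_derivative` is hypothetical). It keeps the most recent samples in a fixed-length buffer, applies coefficients ordered newest to oldest, and labels each result with the time it actually describes: with a lookahead of L future values, the estimate computed when sample n arrives belongs to time n - L, so it is delivered L steps late. As a check, it uses the two-point central difference, written newest to oldest as [0.5, 0, -0.5] with a lookahead of one.

```python
from collections import deque

def streaming_derivative(samples, coeffs, lookahead=0):
    """Apply a derivative estimator to a stream of samples (sketch).

    coeffs are ordered newest to oldest. With lookahead L, the estimate
    produced when sample n arrives belongs to time n - L.
    Yields (time_index, estimate) pairs.
    """
    window = deque(maxlen=len(coeffs))   # newest sample at window[0]
    for n, x in enumerate(samples):
        window.appendleft(x)             # oldest sample drops off the right end
        if len(window) == len(coeffs):
            estimate = sum(c * v for c, v in zip(coeffs, window))
            yield n - lookahead, estimate

# Samples of t**2; the exact derivative is 2*t.
print(list(streaming_derivative([0, 1, 4, 9, 16], [0.5, 0.0, -0.5], lookahead=1)))
# [(1, 2.0), (2, 4.0), (3, 6.0)]
```

The same function accepts the 11-term coefficients below with `lookahead=1`, or the 10-term set with `lookahead=2`.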

Here is a typical result.

Coefficients: (11 terms, newest to oldest) -0.3088999 -0.0840958 0.0739282 0.1640762 0.1527848 0.1094955 0.0075387 -0.0400537 -0.0967387 -0.0389184 0.0608832

**Frequency response of derivative estimator using one "future value"**

Compared to the true one-sided estimator:

- Pretty good accuracy from 0.0 to 10% of the Nyquist limit.
- Almost no phase distortion from 0.0 to 10% of the Nyquist limit.
- There is some information with impaired accuracy above 10% of the Nyquist limit.
- Excess response still occurs in the middle frequency band, beyond 20% of the Nyquist level.
- Slightly better efficiency.
- Somewhat poor noise attenuation (though still a big improvement over balanced Central Differences estimators).

In general, the performance is a lot better, if you can tolerate that extra step of delay in getting results.

## Doubling down on the cheat

The one-value lookahead offered some clear benefits. If a delay of one time interval is tolerable, what about a delay of two time intervals? The second extra buffered value and delay should give the almost one-sided formula an accuracy that compares well to a third-order Central Differences derivative estimator, but with greatly improved noise response.

Here is a typical design.

**Derivative estimator using past data and two "future values"**

Coefficients: (10 terms, newest to oldest) -0.185962 -0.081104 -0.015378 0.063227 0.109721 0.092016 0.061069 0.014893 -0.027744 -0.030738

Compared to the previous one-sided estimators:

- Reasonably good accuracy to about 17% of the Nyquist limit.
- Almost no phase distortion through to about 15% of the Nyquist limit.
- Very reasonable bounds on noise response in the middle band.
- Good but not great noise rejection in the high frequency bands.

**This is an entirely usable estimator, if you can stand the two periods of delay.**
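The noise comparisons in these lists can be partly quantified. For independent (white) noise of variance σ² on the input, a linear estimator with coefficients c_k produces output noise of variance σ²·Σc_k², so Σc_k² is a simple figure of merit. It only summarizes broadband noise and says nothing about where in the band any excess response sits, so it complements rather than replaces the frequency response plots. The sketch computes it for the three coefficient sets quoted in this article, plus the two-point central difference for reference; `white_noise_gain` is a hypothetical helper name.

```python
import numpy as np

# Coefficient sets quoted in this article (newest to oldest), plus the
# two-point central difference for reference.
ESTIMATORS = {
    "one-sided, 13 terms": [-0.232434, -0.142361, -0.032711, 0.108363,
                            0.183252, 0.178972, 0.127357, 0.023053,
                            -0.068244, -0.118948, -0.078936, -0.010887,
                            0.063524],
    "one future value, 11 terms": [-0.3088999, -0.0840958, 0.0739282,
                                   0.1640762, 0.1527848, 0.1094955,
                                   0.0075387, -0.0400537, -0.0967387,
                                   -0.0389184, 0.0608832],
    "two future values, 10 terms": [-0.185962, -0.081104, -0.015378,
                                    0.063227, 0.109721, 0.092016,
                                    0.061069, 0.014893, -0.027744,
                                    -0.030738],
    "central difference, 3 terms": [0.5, 0.0, -0.5],
}

def white_noise_gain(coeffs):
    """Output noise variance per unit input variance for white noise: sum of c_k**2."""
    c = np.asarray(coeffs, dtype=float)
    return float(np.dot(c, c))

for name, coeffs in ESTIMATORS.items():
    print(f"{name:30s} {white_noise_gain(coeffs):.4f}")
```

By this measure the two-lookahead design comes out near 0.07, against roughly 0.19 and 0.20 for the other two designs and 0.5 for the two-point central difference.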

## Alternative: "stateful" processing

A distinct weakness in all of the one-sided estimators is noise suppression in the high-frequency band. The estimator has so much to do when trying to maintain accuracy and zero phase shift that it can't do very much about high frequency noise attenuation. It would help if some kind of additional filtering could reduce the high-frequency noise problems, so that the derivative estimator can place more emphasis on the derivative estimation part of the problem.

This strategy worked pretty well for the case of balanced derivative estimates [1], but it was not very efficient computationally. In the next section, we will discuss why this idea does not extend well to one-sided estimates.
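For a concrete picture of "stateful" processing, here is a minimal sketch of one such arrangement, assumed for illustration only and not the design from the earlier articles: a first-order recursive smoother, y ← y + α(x - y), whose single state variable carries the influence of all past samples, followed by an FIR derivative estimator with coefficients ordered newest to oldest. The name `smoothed_derivative` and the choice of smoother are hypothetical. Note that the smoother also delays the signal: for a ramp input, its output settles to trail the input by (1 - α)/α samples, although the slope itself is preserved.

```python
from collections import deque

def smoothed_derivative(samples, coeffs, alpha):
    """Recursive smoothing prefilter followed by an FIR derivative estimator.

    coeffs are ordered newest to oldest; alpha in (0, 1] sets the smoothing.
    Yields one estimate per sample once the FIR buffer is full.
    """
    state = None                               # the prefilter's state variable
    window = deque(maxlen=len(coeffs))         # newest smoothed value at window[0]
    for x in samples:
        # Stateful step: the new output depends on every past input via `state`.
        state = x if state is None else state + alpha * (x - state)
        window.appendleft(state)
        if len(window) == len(coeffs):
            yield sum(c * v for c, v in zip(coeffs, window))

# A ramp with slope 1, smoothed and then differenced by the central difference.
est = list(smoothed_derivative(range(200), [0.5, 0.0, -0.5], alpha=0.25))
print(est[-1])  # close to 1.0, the slope of the ramp
```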

Footnotes:

[1] Find the index to the other articles in this series here. (Or see the Index Link at the top of this page.)

[2] Articles in another series on this site discuss Linear Least Squares methods for fitting models to data. The emphasis there is on fitting dynamic models, not functional models, but the basic techniques are the same.