# Least Squares Derivative Estimator

Numerical Methods for Derivative Estimation

In previous sections, we found that it was feasible to produce elegant derivative estimators that combine the noise-excluding features of a maxflat lowpass filter with the derivative-estimating capabilities of the central differences method. While these designs have excellent performance at the frequency extremes, they compromise accuracy and noise attenuation in the middle bands. Promising but insufficient designs based on a least-squares best fit of polynomial curves produced generally better accuracy across the lower frequency range, but lacked control of noise rejection properties at higher frequencies.

In this section, we will consider a completely different approach. Instead of designing with polynomials and evaluating the frequency response, we will design the filter directly according to the desired frequency response,[1] again using "least squares best fit" methods. The resulting estimators offer a more balanced tradeoff between accuracy of the estimates and noise rejection.

## The least squares strategy

We have seen previously that the frequency response imposed by a derivative filter will have the following properties.

• Odd symmetry. Differentiation maps a function with even symmetry, like a cosine wave, into one with odd symmetry, and any smooth function has a derivative response of zero at frequency zero. Both properties follow naturally when the curves used in the approximation formula have odd symmetry and pass through zero. So instead of polynomials, sine waves are used.
• For sine and cosine wave functions, the magnitude of the derivative of these functions at frequency `w` is equal to `w`.
• The filter must attempt to match this linear frequency-proportional response characteristic for frequencies from 0 through `0.2 π`.
• Responses for frequencies beyond `0.2 π` through to `π` should ideally be zero, so that high frequency noise is attenuated.
• The desired frequency response can be specified by a vector with a large number `M` of points spanning the representable frequency range from `0.0` to `π`. Only the first `0.2 * M` of these `M` terms are nonzero.
• We can try to match this response shape using a sum of a limited number `N` of sine waves. Using more sine waves gives possibilities for a more accurate fit with better noise rejection, but at a cost in computational effort when the estimator is applied.
• Only positive frequencies need to be analyzed. A corresponding fit for "negative frequencies" is implied automatically by the selection of sine waves for fitting.
• Each of the `N` sine waveforms used for the approximation is evaluated at all `M` of the grid frequencies from 0 to π.
• A perfect fit would produce `N` parameter values that, when multiplied with the `M` sets of expanded values of the full waveforms, would yield the `M` desired ideal derivative response values in `V`. In general, it is not possible to satisfy a relatively large number `M` of constraints exactly using a relatively small number `N` of adjustable parameters. This means the best that can be done is to "almost" solve them, coming "as close as possible."
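The response specification above can be sketched in code. This is an illustrative Python/NumPy version, where the values of `M` and `N` are arbitrary choices of mine, not parameters from the original design:

```python
import numpy as np

M = 400   # number of frequency grid points spanning 0 to pi (illustrative)
N = 10    # number of sine-wave basis functions (illustrative)

w = np.linspace(0.0, np.pi, M)          # frequency grid

# Ideal derivative gain: proportional to w through 0.2*pi, zero beyond
V = np.where(w <= 0.2 * np.pi, w, 0.0)

# Column k is the sine wave sin(k*w) evaluated across the full grid;
# these are the odd-symmetric fitting curves described above
MMat = np.column_stack([np.sin(k * w) for k in range(1, N + 1)])
```

Each of the `N` columns of `MMat` corresponds to one adjustable parameter; each of the `M` rows corresponds to one frequency constraint.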

## Obtaining a "least squares best fit"

The expansions of the fitting waveforms are placed into column vectors of length `M`, and the `N` column vectors are then collected to form an `M x N` matrix `MMat`.

The `M` desired response gain values are collected into column vector `V`.

The values of the filter coefficients to be determined constitute an unknown vector `F` with `N` terms. The linear system to be solved is then

``` (LLS constraint equation)
MMat · F ≈ V
```

An exact solution is not available, but the classic Normal Equations can be applied to determine the necessary conditions for the "least squares best fit" values of the parameters (the solution that minimizes the sum of the squared fitting errors).

``` (LLS normal equation)
F = (MMatᵀ MMat)⁻¹ MMatᵀ · V
```

In a well-posed problem, the best-fit solution exists and is unique.
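As a sketch of the naive fit, rebuilding the `MMat` and `V` described above with illustrative sizes of my choosing, the Normal Equations and a standard library least-squares solver give the same best-fit `F`:

```python
import numpy as np

M, N = 400, 10                                   # illustrative sizes
w = np.linspace(0.0, np.pi, M)
V = np.where(w <= 0.2 * np.pi, w, 0.0)
MMat = np.column_stack([np.sin(k * w) for k in range(1, N + 1)])

# Normal equations, exactly as written above
F_normal = np.linalg.solve(MMat.T @ MMat, MMat.T @ V)

# Equivalent, numerically safer library route
F_lstsq, *_ = np.linalg.lstsq(MMat, V, rcond=None)

# The fit is not exact: the residual MMat @ F - V is nonzero
residual = np.linalg.norm(MMat @ F_lstsq - V)
```

The residual is substantial, which is exactly the deficiency discussed next: the solver spends its limited degrees of freedom chasing parts of the target shape that do not matter.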

Let's try this — call it a "naive least squares fit" — and see what happens. Here is a typical plot of the kind of filter characteristic produced.

There is no need to provide the coefficients of this particular design, since it is deficient in many ways:

• It has poor asymptotic accuracy near frequency zero.
• Its accuracy falls off seriously, well before reaching the upper end of the "20% of Nyquist" band.
• It uses far more terms than necessary, hence is computationally inefficient.

However, it is worth observing that the noise rejection is far better than anything we have yet seen. Perhaps we can retain some of this behavior, and mitigate the poor fit in the lower frequencies.

## Weighting in least squares problems

Suppose that you arbitrarily take one of the `M` rows in the constraint equation system, and scale every term on both the left and right hand side of that equation by an arbitrary non-negative number `q`. This doesn't affect the validity of the constraint, since multiplying both sides of an equality expression with the same number yields an equivalent equality expression. However, the weighting can have a big effect on the least squares fitting problem. In the extreme case, weighting factor `q = 0` will render the constraint equation moot, because "zero times anything is still zero" and there is never any error in fitting this constraint exactly. For small but nonzero values of `q`, the constraints have some influence, but much larger matching errors are tolerated.

The most serious problem with the naive fit is that it expends too much effort trying to accurately fit the severe discontinuity at 20% of the Nyquist frequency. This particular shape is not important. So, let's apply a weighting factor of zero to the band from 20% to 40% of the Nyquist range. In effect, this says: "I don't care where the curve goes in this region, as long as it ends up low and doesn't act too wacky." Let's make this change and see what happens.
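A sketch of that change, mirroring the constraint setup described above (sizes are illustrative; the weight vector `q` is the only new ingredient):

```python
import numpy as np

M, N = 400, 10                                   # illustrative sizes
w = np.linspace(0.0, np.pi, M)
V = np.where(w <= 0.2 * np.pi, w, 0.0)
MMat = np.column_stack([np.sin(k * w) for k in range(1, N + 1)])

# Weight each constraint row: zero through the "don't care" band
# covering 20% to 40% of the Nyquist range
q = np.ones(M)
q[(w > 0.2 * np.pi) & (w < 0.4 * np.pi)] = 0.0

# Scale both sides of every row equation by its weight, then solve
F_w, *_ = np.linalg.lstsq(q[:, None] * MMat, q * V, rcond=None)

# The unweighted (naive) fit, for comparison
F_0, *_ = np.linalg.lstsq(MMat, V, rcond=None)
```

Outside the don't-care band, the weighted solution fits the remaining constraints strictly better than the naive one, because it no longer wastes effort on the discontinuity.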

This result is rather stunning. But keep in mind that this 21-term filter requires significant computational effort. Let's call this the overkill least squares derivative estimator.

```   -2.415514e-03  -3.498196e-03    3.733505e-04    8.819803e-03    1.416449e-02
5.951338e-03  -1.850546e-02   -4.727302e-02   -5.953525e-02   -4.163835e-02
0.0
4.163835e-02   5.953525e-02    4.727302e-02    1.850546e-02   -5.951338e-03
-1.416449e-02  -8.819803e-03   -3.733505e-04    3.498196e-03    2.415514e-03
```
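As a sanity check, the response of an antisymmetric filter like this one can be evaluated numerically. The sketch below uses my own reading of the coefficient table's sign and indexing conventions, so the absolute gain scale should be taken with a grain of salt, but the overall shape is clear: roughly linear growth across the low band, and very strong attenuation in the middle band.

```python
import numpy as np

# The 21-term "overkill" coefficient set from the table above
c = np.array([
    -2.415514e-03, -3.498196e-03,  3.733505e-04,  8.819803e-03,  1.416449e-02,
     5.951338e-03, -1.850546e-02, -4.727302e-02, -5.953525e-02, -4.163835e-02,
     0.0,
     4.163835e-02,  5.953525e-02,  4.727302e-02,  1.850546e-02, -5.951338e-03,
    -1.416449e-02, -8.819803e-03, -3.733505e-04,  3.498196e-03,  2.415514e-03,
])

def gain(wval, coeffs):
    """Magnitude response of an odd-length FIR filter centered at index zero."""
    k = np.arange(len(coeffs)) - (len(coeffs) - 1) // 2
    return abs(np.sum(coeffs * np.exp(-1j * wval * k)))

low, mid = gain(0.1, c), gain(0.3, c)      # both inside the low band
stop = gain(np.pi / 2, c)                  # middle of the rejection band
```

Because the coefficients are antisymmetric, the response at frequency zero is exactly zero, as a derivative estimator requires.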

Maybe you don't need that much accuracy, or that much of a low-frequency band width, or quite that much noise attenuation. Shorter and computationally more efficient filters can be obtained, if you are willing to compromise a little on the other performance characteristics. Here are some adjustment options.

• The "20% of Nyquist" band is not absolute. Small deviations can be tolerated in that band length.
• It is probably sufficient to allow the noise in the upper bands to be just small enough by applying smaller weighting factors there, allowing tighter control at other frequencies.
• Allow a slightly wider transition zone by setting a few more weighting factors to zero between 20% and 50% of the Nyquist limit.

Here are the coefficients for a 13-term design that I like.

```    0.021028   0.018163  -0.025149  -0.089145  -0.124062  -0.090751   0.0
0.090751   0.124062   0.089145   0.025149  -0.018163  -0.021028
```

Here is the response of this particular least squares design, comparing it to the best robust maxflat design I could find.

Points to notice:

• This least-squares estimator only uses 4 more terms than the maxflat estimator. (This is not intended to be an "unfair" comparison. A shorter maxflat estimator gives much worse noise attenuation, while a longer maxflat estimator has worse low band accuracy.)
• The response of the least-squares estimator is allowed to degrade a little at the 20% of Nyquist limit band edge, to keep midband gain lower.
• The least-squares estimator extends good accuracy to about 15% of the Nyquist limit, while the comparable range for the maxflat design is about 5% of Nyquist limit.
• The LS estimator is better at resolving and attenuating noise in middle frequencies between 40% and 60% of the Nyquist limit.
• The maxflat designs have better asymptotic accuracy near frequency zero than the least squares designs, even though this is not apparent from the plot. That means better performance for estimating very small derivative values, but only in a low-noise scenario.
• The LS estimator is less aggressive for attenuating noise in the extreme high frequencies, in exchange for the other benefits.

As a final comparison, let's go back to the "noisy data set" example used in previous sections of this series. If you go back and compare to all of the previous results, it will become apparent just how much benefit the least squares estimator provides.

Finally, if you know that your derivative is relatively small, and the 20% of Nyquist limit bandwidth is too wide for your needs, you can get smoother estimates with good accuracy through about 8% of the Nyquist limit using the following estimator.

```  Nominal "10% of Nyquist" Least Squares Derivative Estimator - 19 term
0.0237475   0.0195601   0.0018814  -0.0279459 -0.0626650  ...
-0.0909902  -0.1016449  -0.0879622  -0.0510464  0.0        ...
0.0510464   0.0879622   0.1016449   0.0909902   0.0626650  ...
0.0279459  -0.0018814  -0.0195601  -0.0237475
```
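One way to quantify the noise benefit: the white-noise gain of a linear filter is the square root of the sum of its squared coefficients. Here is a comparison of this 19-term estimator against plain central differences (my comparison, not a figure from the original series):

```python
import numpy as np

# The 19-term "10% of Nyquist" least squares coefficients from above
ls = np.array([
     0.0237475,  0.0195601,  0.0018814, -0.0279459, -0.0626650,
    -0.0909902, -0.1016449, -0.0879622, -0.0510464,  0.0,
     0.0510464,  0.0879622,  0.1016449,  0.0909902,  0.0626650,
     0.0279459, -0.0018814, -0.0195601, -0.0237475,
])

# Central differences: (x[n+1] - x[n-1]) / 2
cd = np.array([-0.5, 0.0, 0.5])

# RMS amplification of uncorrelated (white) noise by each estimator
noise_gain_ls = np.sqrt(np.sum(ls ** 2))
noise_gain_cd = np.sqrt(np.sum(cd ** 2))
```

By this measure, the least squares estimator amplifies white noise by well under half as much as the central differences formula.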

At this point, it is reasonable to say: "We are done. Just take the least squares estimator design and use it." However, it is possible to optimize further for a particular application. You can do this! Run this Octave (Matlab) design script. Fiddle with the parameters at the top of the script and rerun it until you find something that you like the best.

Some hints about what to look for as you adjust parameters.

• Try starting with a filter length of 15. Making the filter longer improves the attainable accuracy and noise rejection, but at the cost of more computation. A longer filter also needs more "start up" values, so it will require a slightly longer input data set to produce the same number of estimate points. A narrower low-frequency band will require more terms, because the filter will need to be more selective.
• Try starting with the low frequency response band equal to the frequency range that you want, and then tweak up and down from there. Reducing the flat response band usually allows a smoother transition and less noise in the middle frequencies in exchange for slight impairment at the end of the band.
• Try starting with an attenuation band above 35% of the Nyquist limit, and then adjust up or down from there. A smaller band lets more mid-band noise through, but promotes accuracy at the low frequencies.
• Try a weighting factor of 0.25 for the attenuation band, 1.0 for the accurate band, and then adjust these weights up or down.
• There is an additional weighting that can be applied to the low band to encourage additional low-frequency accuracy. Start it at 0, and try adjusting it to 10 or 20 after the rest of the filter characteristics look good.
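I can't reproduce the original Octave script here, but the following condensed Python equivalent follows the design procedure as the hints describe it, using the suggested starting values for filter length, band edges, and weights. Tweak the values at the top and re-run, just as suggested:

```python
import numpy as np

M = 800                      # frequency grid resolution
L = 15                       # filter length (odd), per the first hint
N = (L - 1) // 2             # number of antisymmetric coefficient pairs

w = np.linspace(0.0, np.pi, M)
V = np.where(w <= 0.20 * np.pi, w, 0.0)     # ideal derivative gain

q = np.zeros(M)                             # transition zone stays at 0
q[w <= 0.20 * np.pi] = 1.0                  # accurate band
q[w >= 0.35 * np.pi] = 0.25                 # attenuation band

# An antisymmetric pair at offsets +/-k contributes 2*h[k]*sin(k*w)
S = np.column_stack([2.0 * np.sin(k * w) for k in range(1, N + 1)])
h, *_ = np.linalg.lstsq(q[:, None] * S, q * V, rcond=None)

# Assemble the full filter: negated mirror image, zero center, then h
coeffs = np.concatenate([-h[::-1], [0.0], h])
```

The resulting 15-term filter tracks the ideal gain `w` across the low band and stays small in the attenuation band; the extra low-band weighting mentioned in the last hint can be added by raising `q` further over the lowest frequencies.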

And with that, this series comes to an end... for now.

Footnotes:

[1] The classic "Digital Filters" by R. W. Hamming covers derivative estimator filters from a frequency analysis point of view, and describes the Lanczos low-noise designs (which, by the way, don't work very well at all) as an exercise.

Contents of the "Numerical Estimation of Derivatives" section of the ridgerat-tech.us website, including this page, are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise specifically noted. For complete information about the terms of this license, see http://creativecommons.org/licenses/by-sa/4.0/. The license allows usage and derivative works for individual or commercial purposes under certain restrictions. For permissions beyond the scope of this license, see the contact information page for this site.