One Sided Derivative Estimation
Filter-based estimators
A one-sided convolution estimator buffers a history of past function values as it progresses through time. Clearly, this can apply only when the analysis is performed sequentially, not "on demand" at an arbitrary location in the data set. However, buffering past input values is not the only way of collecting past history. Perhaps it is more effective to extract and maintain that information in some kind of distilled form.
There are some well-known "digital filter" formulas that do this kind of thing for the purpose of rejecting high frequencies from an input data sequence. Perhaps that kind of "stateful" processing can be applied to derivative estimation and high-frequency filtering at the same time, producing a better estimator.
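As a concrete illustration of this kind of stateful processing, here is a minimal sketch of a one-pole low-pass filter that carries a single state value instead of a buffer of past samples. The names and the smoothing constant are illustrative choices, not taken from the filter designed later on this page.

```python
# A minimal sketch of "stateful" low-pass filtering: a first-order IIR
# (exponential smoothing) filter keeps one state value rather than a
# buffer of past samples.
def make_lowpass(alpha):
    """Return a stateful one-pole low-pass filter step function, 0 < alpha <= 1."""
    state = {"y": 0.0}

    def step(x):
        # The new output blends the new input with the previous output.
        state["y"] = alpha * x + (1.0 - alpha) * state["y"]
        return state["y"]

    return step

lp = make_lowpass(0.25)
smoothed = [lp(x) for x in [1.0, 1.0, 1.0, 1.0]]
# The output rises toward the constant input level 1.0 from below.
```

High-frequency chatter is attenuated because each output is mostly the previous output; only a fraction `alpha` of any sudden change gets through per step.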
This seems like such an obvious good idea that it is worth examining how this idea fails.
How can filtering help?
There are two ways that additional filtering could help the derivative estimator do its work.
- Help to detect and reject constant or nearly-constant patterns in the data.
- Help to reject high-frequency chatter.
Differencing filter
It isn't possible to construct central differences, at least not without introducing extra delays. However, it is easy to construct a sequence of single differences. This loses no information (except for the initial constant level, to which the derivative estimator should not respond anyway). The main advantage is that difference values go asymptotically to zero as the function's derivative goes to zero. ^{[1]}
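The claim that differencing loses no information beyond the initial level is easy to verify in a few lines of Python (the sample values here are arbitrary):

```python
# Forward single differences retain all information except the initial
# level: the first value plus the running sum of differences
# reconstructs the original sequence exactly.
xs = [3.0, 3.5, 4.5, 4.0, 4.0]
diffs = [b - a for a, b in zip(xs, xs[1:])]   # [0.5, 1.0, -0.5, 0.0]

# Reconstruct from the first value plus the cumulative differences.
recon = [xs[0]]
for d in diffs:
    recon.append(recon[-1] + d)
assert recon == xs
```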
Here is a diagram of a filter^{[2]} that can calculate and low-pass smooth the difference sequence.
The D notation indicates that the value propagates through storage in one time interval.

Differencing. Subtracting a delayed input value from the current new value produces a difference value, df, which is then available for subsequent processing.

Lowpass filtering. The difference values propagate through a number of delays. These delayed values are then available to be combined by filter processing for purposes of removing extraneous high frequencies. The b coefficients, applied to the history of differences, contribute to the filtered outputs. Those outputs are processed through a second chain of delays, so that prior filtered outputs help to predict future output values, using the a coefficients. The low-pass filtering constitutes the majority of the processing shown in the diagram.

Derivative estimation. Given the history of filtered, noise-reduced differences, a best combination of those values using the c coefficients estimates the derivative value at the point of the latest new input.
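The three stages just described (differencing, low-pass filtering, derivative estimation) can be sketched in Python as follows. This is a hedged illustration of the diagrammed structure, not the author's implementation, and the coefficient sets used in the usage note below are trivial placeholders rather than the designed filter.

```python
from collections import deque

class FilterDerivativeEstimator:
    """Sketch of the diagrammed structure: difference the input, low-pass
    it with an IIR stage (b feed-forward and a feedback coefficients),
    then combine recent filtered values with c coefficients to estimate
    the derivative at the latest input."""

    def __init__(self, b, a, c):
        self.b, self.a, self.c = b, a, c
        self.prev_x = 0.0                                 # delayed input for differencing
        self.df = deque([0.0] * len(b), maxlen=len(b))    # difference history
        self.y = deque([0.0] * len(a), maxlen=len(a))     # filtered-output history

    def step(self, x):
        d = x - self.prev_x                  # differencing stage
        self.prev_x = x
        self.df.appendleft(d)
        # Low-pass stage: b terms on the difference history, plus
        # feedback a terms on the prior filtered outputs.
        out = sum(bi * di for bi, di in zip(self.b, self.df))
        out += sum(ai * yi for ai, yi in zip(self.a, self.y))
        self.y.appendleft(out)
        # Estimation stage: combine the newest filtered values.
        return sum(ci * yi for ci, yi in zip(self.c, self.y))
```

With trivial placeholder coefficients b = [0.5, 0.5], a = [0.0], c = [1.0], feeding a unit-slope ramp through `step` settles on a derivative estimate of 1.0, as expected.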
Obtaining estimator parameters
I will skip the details of trying to design this filter. The low-pass filter is based on a canonical Butterworth filter, and the estimator coefficients are obtained by a weighted least-squares best fit process. For reference, here are the coefficients for the filter.
b coefficients:  1.282581e-03  6.412905e-03  1.282581e-02  ...  1.282581e-02  6.412905e-03  1.282581e-03
a coefficients:  2.9754221  -3.8060181  2.5452529  ...  -0.8811301  0.1254306
c coefficients:  5.085767  -5.896902  1.582392
The resulting filter was applied to a swept sine waveform sequence. The input sequence looks like the following.
This is not a frequency spectrum; it is a time sequence going from left to right. However, the frequency is arranged to go increasingly higher as time progresses.
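A swept-sine test input of this kind, together with its analytically derived "true" derivative, can be generated as follows. The length and sweep rate are illustrative choices, not taken from the experiment above.

```python
import math

# A swept sine (chirp) whose instantaneous frequency rises linearly
# with time, plus its exact analytic derivative for comparison.
N = 1000
k = 0.25 / N          # frequency sweep rate, cycles/sample^2 (illustrative)
signal = [math.sin(math.pi * k * n * n) for n in range(N)]

# d/dn sin(pi*k*n^2) = 2*pi*k*n * cos(pi*k*n^2)
true_deriv = [2.0 * math.pi * k * n * math.cos(math.pi * k * n * n)
              for n in range(N)]
```

At the end of this sequence the instantaneous frequency reaches k*N = 0.25 cycles per sample, i.e. half the Nyquist limit, so the sweep exercises the estimator from nearly-constant input up through rapid oscillation.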
Here is a plot comparing the estimated derivative values (in blue) to the analytically derived "true" derivative values (in green). The position "200" along the horizontal axis corresponds to 20% of the Nyquist limit.
This isn't very good, is it?
The one good feature is that high frequencies are nicely attenuated, the higher the frequency the better the attenuation.
Every other feature of this response is disappointing. The accuracy at low frequencies is poor. The response in the middle-band frequencies is pretty well bounded but not much of an improvement over other one-sided estimator formulas. There is a troublesome amount of phase shift even at relatively low frequencies.
It is the phase shift problem that really causes the damage. It impairs the accuracy at lower frequencies, and it causes reinforcement rather than cancellation of undesirable responses in the middle frequencies.
In short, despite the more complicated processing, this approach provides no apparent advantages. As plausible as it seems, it should be avoided.
However, there is another kind of filtering that directly addresses delays, and the next section will cover that topic.
Footnotes:
[1] If you compare to the lowest-order Central Differences Estimator formula (see the page CentralDifferences.html on this site), you can observe that a difference value is in itself an approximator (though a relatively inaccurate and noisy one) of the function derivative, but delayed by one half time interval.
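The half-interval delay mentioned in this footnote is easy to confirm numerically, using sin as an arbitrary smooth test function:

```python
import math

# The scaled single difference (f(nh) - f((n-1)h)) / h approximates the
# derivative at the half-step point (n - 1/2)h, not at nh itself.
f  = math.sin       # arbitrary smooth test function
fp = math.cos       # its exact derivative
h, n = 0.1, 5

diff = (f(n * h) - f((n - 1) * h)) / h
err_half = abs(diff - fp((n - 0.5) * h))   # error at the half-step point
err_full = abs(diff - fp(n * h))           # error at the latest sample
# err_half is far smaller than err_full, confirming the half-step delay.
```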
[2] This is one of the common "canonical" filter architectures. You can read more in the Wikipedia article Digital Filter, in the Direct Form 1 subsection.