ajcr / rolling

Computationally efficient rolling window iterators for Python

License: MIT License

Python 100.00%

rolling-windows algorithm rolling-algorithms python iterator sliding-windows efficient-algorithm rolling-hash-functions

rolling's Issues

Performance Comparisons

Hi,

thanks for the good effort you have put in this project already.

May I ask for an enhancement of the documentation?

It would be good to have an overview of the (overall) performance of the different functions.

My use case is that I have a numpy array and want to apply a rolling standard deviation window on one of the columns and put it in another column of this numpy array.

It would be good to compare the effort for this with the time it takes to do this for alternatives (like: use pandas directly).

Thanks in advance.

I've found that a sorted list implemented with standard list + bisect module will be faster for tracking median than rolling's implementation up to about N < 50,000. For example, for N of 10,000 it's 4x faster. Resources explaining why list + bisect are unbeatable at these sizes: http://www.grantjenks.com/docs/sortedcontainers/implementation.html, http://www.grantjenks.com/docs/sortedcontainers/performance-scale.html

Of course, rolling would want to perform well for N > 50,000 too. So use sortedcontainers.SortedList. Even though it doesn't beat standard list + bisect until about N of 20,000, it's still faster than rolling's implementation for all sizes. For example, for N of 100,000 using SortedList is 2x faster.
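A minimal sketch of the list + bisect approach described above, for a fixed-size window. The function name `rolling_median` is hypothetical and is not part of rolling's API; no timings are claimed for it here.

```python
from bisect import bisect_left, insort
from collections import deque


def rolling_median(iterable, window_size):
    """Yield the median of each full window, using a plain sorted list."""
    window = deque()  # values in arrival order, so the oldest can be found
    ordered = []      # the same values, kept sorted
    for value in iterable:
        window.append(value)
        insort(ordered, value)
        if len(window) > window_size:
            oldest = window.popleft()
            # Binary search locates one occurrence of the value to remove.
            del ordered[bisect_left(ordered, oldest)]
        if len(window) == window_size:
            mid = window_size // 2
            if window_size % 2:
                yield ordered[mid]
            else:
                yield (ordered[mid - 1] + ordered[mid]) / 2


print(list(rolling_median([3, 1, 4, 1, 5, 9, 2, 6], 3)))  # [3, 1, 4, 5, 5, 6]
```

Insertion and deletion are still O(k) element moves in the list, which is why a structure such as sortedcontainers.SortedList is suggested for very large windows.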

Make rolling Mean more numerically stable

Rolling Mean is currently implemented as the sum of the window divided by the size of the window.

To give better numerical stability when working with large floating point values, it should use the approach taken in Welford's algorithm (cf. the Rolling Var class).
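One way this could look, as a sketch rather than the library's actual code: the mean is updated incrementally, so a large running sum is never formed. The name `rolling_mean` is an assumption made for illustration.

```python
from collections import deque


def rolling_mean(iterable, window_size):
    """Yield the mean of each full window using incremental updates."""
    window = deque()
    mean = 0.0
    for value in iterable:
        window.append(value)
        if len(window) <= window_size:
            # Window still filling: Welford-style update of the mean.
            mean += (value - mean) / len(window)
        else:
            # Window full: swap the oldest value for the newest.
            oldest = window.popleft()
            mean += (value - oldest) / window_size
        if len(window) == window_size:
            yield mean


print(list(rolling_mean([1, 2, 3, 4, 5], 3)))  # [2.0, 3.0, 4.0]
```

Incremental updates can still drift over very long streams, so this reduces the problem rather than removing it.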

Investigate usefulness of additional rolling window algorithms

Possible algorithms to implement over a rolling window:

palindromic / longest palindrome (Manacher's algorithm)
monotonicity
convexity

Rolling window contains function

Essentially a rolling version of Python's in operator. Return True if the window contains a given string, e.g.:

>>> seq = 'rollrollingndsfw'
>>> r = rolling.Contains(seq, window_size=10, match='rolling')
>>> list(r)
[False, True, True, ...]

Could also be extended to match multiple fixed strings.
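A naive sketch of the behaviour proposed above, assuming a fixed window over a string. `rolling_contains` is a hypothetical stand-in for the suggested `rolling.Contains`; it rescans the window at every step, so it only illustrates the expected output, not an efficient algorithm.

```python
from collections import deque


def rolling_contains(seq, window_size, match):
    """Yield True for each full window of `seq` that contains `match`."""
    window = deque(maxlen=window_size)  # oldest character drops automatically
    for char in seq:
        window.append(char)
        if len(window) == window_size:
            yield match in "".join(window)


print(list(rolling_contains("rollrollingndsfw", 10, "rolling")))
# [False, True, True, True, True, False, False]
```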

Upload 0.5.0 to pypi

Hello! Latest version on pypi is 0.4.0 from Mar' 23, could you update it to the latest version?

Implement longest increasing subsequence algorithm

It would be interesting to try and implement an algorithm to find the longest increasing subsequence in a sliding window.

Such an approach is described in Albert et al., 2004:

https://pdfs.semanticscholar.org/5517/cc2a743fa07f2ae2fa6442ca77a8b419a8ee.pdf

Implement rolling skew and rolling kurtosis methods

pandas has implementations for rolling skew and rolling kurtosis. The implementations can be seen here.

Implement rolling mode

The mode is the most common observation in the window.

Need to decide how to handle cases where there are two or more equally common values - statistics module raises an error, while pandas returns each item.

I probably prefer pandas' approach here.
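A sketch of the pandas-style behaviour, where every tied value is returned. The name `rolling_mode` and the choice to return a sorted list are assumptions for illustration; the scan over the counts makes each step proportional to the number of distinct values in the window.

```python
from collections import Counter, deque


def rolling_mode(iterable, window_size):
    """Yield a sorted list of the most common value(s) in each full window."""
    window = deque()
    counts = Counter()
    for value in iterable:
        window.append(value)
        counts[value] += 1
        if len(window) > window_size:
            oldest = window.popleft()
            counts[oldest] -= 1
            if counts[oldest] == 0:
                del counts[oldest]  # keep only values still in the window
        if len(window) == window_size:
            top = max(counts.values())
            yield sorted(v for v, c in counts.items() if c == top)


print(list(rolling_mode([1, 1, 2, 2, 3], 3)))  # [[1], [2], [2]]
print(list(rolling_mode([1, 2, 1, 2], 4)))     # [[1, 2]]
```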

Rolling bitwise methods (&, |, XOR)

I have no idea if this would be practically useful, but it might be interesting.
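XOR is the easy case because it is its own inverse: XOR-ing the outgoing value a second time removes it from the accumulator. A minimal sketch, with the hypothetical name `rolling_xor`:

```python
from collections import deque


def rolling_xor(iterable, window_size):
    """Yield the XOR of all integers in each full window."""
    window = deque()
    acc = 0
    for value in iterable:
        window.append(value)
        acc ^= value
        if len(window) > window_size:
            acc ^= window.popleft()  # XOR again cancels the oldest value
        if len(window) == window_size:
            yield acc


print(list(rolling_xor([1, 2, 3, 4], 2)))  # [3, 1, 7]
```

AND and OR have no such inverse, so they would need extra bookkeeping, for example a per-bit count of how many window values have that bit set.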

Ability to add data instead of having to pass generators

It would be very convenient if there were a method such as append() for adding values as they arrive, instead of having to pass generators. In many cases, converting existing code to generators is time-consuming and can become a complex task in a multithreaded environment.
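A sketch of what such a push-style interface might look like for a rolling sum. The class `PushSum` and its `append()` method and `current_value` property are hypothetical and do not describe rolling's existing API.

```python
from collections import deque


class PushSum:
    """Hypothetical push-style rolling sum: values are fed in one at a time."""

    def __init__(self, window_size):
        self._window = deque()
        self._size = window_size
        self._total = 0

    def append(self, value):
        """Add a value, evicting the oldest one once the window is full."""
        self._window.append(value)
        self._total += value
        if len(self._window) > self._size:
            self._total -= self._window.popleft()

    @property
    def current_value(self):
        return self._total


acc = PushSum(window_size=2)
for value in [1, 2, 3]:
    acc.append(value)
print(acc.current_value)  # 5
```

This sketch is not thread-safe; callers sharing one instance across threads would need their own locking.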

Add ddof parameter for rolling variance and rolling standard deviation

ddof means 'delta degrees of freedom' (cf. NumPy).

Currently the code implicitly assumes ddof=1 (i.e. k - 1 degrees of freedom, sample variance). This should be the default, but the user should be able to set ddof with a keyword argument when a rolling iterator is instantiated if they want to.
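A sketch of how a `ddof` keyword could feed into the final division, assuming a fixed-size window and Welford-style updates of the sum of squared deviations. `rolling_var` and its internals are illustrative names, not the existing implementation.

```python
from collections import deque


def rolling_var(iterable, window_size, ddof=1):
    """Yield the variance of each full window with `ddof` delta degrees of freedom."""
    if window_size <= ddof:
        raise ValueError("window_size must be greater than ddof")
    window = deque()
    mean = 0.0
    sslm = 0.0  # sum of squared differences from the mean
    for value in iterable:
        window.append(value)
        if len(window) <= window_size:
            # Window still filling: standard Welford update.
            delta = value - mean
            mean += delta / len(window)
            sslm += delta * (value - mean)
        else:
            # Window full: replace the oldest value with the newest.
            oldest = window.popleft()
            delta = value - oldest
            new_mean = mean + delta / window_size
            sslm += delta * (value - new_mean + oldest - mean)
            mean = new_mean
        if len(window) == window_size:
            yield sslm / (window_size - ddof)


print(list(rolling_var([1, 2, 3, 4], 3)))  # sample variance, ddof=1
print(list(rolling_var([1, 2, 3, 4], 3, ddof=0)))  # population variance
```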

Add rolling quantile implementation

Implement rolling regression / least squares

For each window of numerical values, compute the equation of the line of best fit through these points.

This looks like a decent starting point: https://en.wikipedia.org/wiki/Moving_least_squares
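As a starting sketch, ordinary least squares can be fitted to each window, treating the position within the window (0, 1, ..., k - 1) as the x-coordinate. That choice of x-values and the name `rolling_linear_fit` are assumptions; the fit is recomputed from scratch for every window rather than updated incrementally.

```python
from collections import deque


def rolling_linear_fit(iterable, window_size):
    """Yield (slope, intercept) of the least-squares line for each full window."""
    if window_size < 2:
        raise ValueError("window_size must be at least 2")
    k = window_size
    mean_t = (k - 1) / 2
    # Sum of squared deviations of the positions 0..k-1; constant per window.
    ss_t = sum((t - mean_t) ** 2 for t in range(k))
    window = deque(maxlen=k)
    for value in iterable:
        window.append(value)
        if len(window) == k:
            mean_y = sum(window) / k
            s_ty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(window))
            slope = s_ty / ss_t
            yield slope, mean_y - slope * mean_t


print(list(rolling_linear_fit([1, 3, 5, 7], 3)))  # [(2.0, 1.0), (2.0, 3.0)]
```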

Implement rolling argmin/argmax

Very similar to min/max algorithms: track the index of the (first) minimum value in the rolling window.
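A sketch using the usual monotonic deque idea from rolling min/max, storing (index, value) pairs. Here the yielded index is the position in the whole input sequence, which is an assumption; a window-relative index would work equally well. `rolling_argmin` is a hypothetical name.

```python
from collections import deque


def rolling_argmin(iterable, window_size):
    """Yield the sequence index of the first minimum in each full window."""
    candidates = deque()  # (index, value) pairs with non-decreasing values
    for i, value in enumerate(iterable):
        # Strictly larger values can never be the minimum again; equal values
        # are kept so the earliest occurrence stays at the front.
        while candidates and candidates[-1][1] > value:
            candidates.pop()
        candidates.append((i, value))
        if candidates[0][0] <= i - window_size:
            candidates.popleft()  # front has slid out of the window
        if i >= window_size - 1:
            yield candidates[0][0]


print(list(rolling_argmin([3, 1, 4, 1, 5, 9, 2, 6], 3)))  # [1, 1, 3, 3, 6, 6]
```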

Summation of float values is not precise enough in the calculation of variance/std

The rolling summation of float values is not precise enough in the calculation of variance/std.

>>> import rolling
>>> list(rolling.Std([0,1,1,1], 3))
[0.5773502691896258, 7.450580596923828e-09]

The first value is correct. The second value 7.450e-09 is small, but should be 0. It is not close to 0 using the default tolerances in math.isclose() for example.

This is related to #20 which caused the variance to drop below 0 (when it should have been 0).
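The `math.isclose()` point can be checked directly. By default `abs_tol` is 0.0, so a comparison against zero only succeeds for an exact zero; the `abs_tol=1e-8` value below is an arbitrary choice for illustration.

```python
import math

residual = 7.450580596923828e-09  # second value reported above

# Default tolerances: rel_tol=1e-09, abs_tol=0.0
print(math.isclose(residual, 0.0))                # False
# An absolute tolerance is needed for comparisons against zero.
print(math.isclose(residual, 0.0, abs_tol=1e-8))  # True
```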

Rolling statistics of series of two random variables

PROD, rolling mean of X * Y: 1/window * sum(x*y)
COV, rolling covariance of X, Y, PROD - mean(X) * mean(Y)
CORR, rolling correlation of X, Y, COV / (std(X) * std(Y))
SLOPE, rolling estimation of slope of regression: y = intercept + slope * x + \epsilon, COV / var(X)
INTERCEPT, rolling estimation of intercept of regression: y = intercept + slope * x + \epsilon, ...
BETA, rolling estimation of y = \beta * x + \epsilon
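A sketch of the COV and CORR definitions above, recomputed per window for clarity rather than updated incrementally. It uses population statistics (division by the window size), as the PROD formula implies; the name `rolling_cov_corr` and the choice to return NaN when either variance is zero are assumptions.

```python
from collections import deque
from math import sqrt


def rolling_cov_corr(xs, ys, window_size):
    """Yield (cov, corr) of the paired values in each full window."""
    n = window_size
    window = deque(maxlen=n)
    for pair in zip(xs, ys):
        window.append(pair)
        if len(window) < n:
            continue
        mean_x = sum(x for x, _ in window) / n
        mean_y = sum(y for _, y in window) / n
        prod = sum(x * y for x, y in window) / n  # PROD
        cov = prod - mean_x * mean_y              # COV
        var_x = sum(x * x for x, _ in window) / n - mean_x ** 2
        var_y = sum(y * y for _, y in window) / n - mean_y ** 2
        if var_x > 0 and var_y > 0:
            corr = cov / sqrt(var_x * var_y)      # CORR
        else:
            corr = float("nan")  # undefined for a constant series
        yield cov, corr


for cov, corr in rolling_cov_corr([1, 2, 3, 4], [2, 4, 6, 8], 3):
    print(round(cov, 6), round(corr, 6))
```

The "mean of squares minus square of mean" form is simple but loses precision for large values, so a production version would likely want Welford-style updates instead.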

Add a Suffix Tree implementation

Suffix Tree representing window should support O(1) updates (append tail, delete head).

This would allow implementation of various matching/search algorithms.

Implement alternative rolling sum using Kahan summation

cf. https://en.wikipedia.org/wiki/Kahan_summation_algorithm

Also refer to CPython's implementation of math.fsum.
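A minimal sketch of a rolling sum with Kahan compensation, where removing the oldest value is treated as adding its negation. This is one possible design under that assumption, not CPython's fsum algorithm, and `rolling_sum_kahan` is a hypothetical name.

```python
from collections import deque


def rolling_sum_kahan(iterable, window_size):
    """Yield the sum of each full window using compensated (Kahan) addition."""
    window = deque()
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far

    def add(term):
        nonlocal total, comp
        y = term - comp          # apply the correction from earlier steps
        t = total + y            # low-order bits of y may be lost here
        comp = (t - total) - y   # recover what was lost
        total = t

    for value in iterable:
        window.append(value)
        add(value)
        if len(window) > window_size:
            add(-window.popleft())  # removal is addition of the negation
        if len(window) == window_size:
            yield total


print(list(rolling_sum_kahan([1.0, 2.0, 3.0, 4.0], 2)))  # [3.0, 5.0, 7.0]
```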

pypi release

Please push releases to PyPI; the last one was 0.2.0.

https://pypi.org/project/rolling/

Add type annotations

Methods/classes should have type annotations for easier integration with other codebases.

Handling of NaN

Short question, is it somehow possible to extend this to handle NaN, like numpy nanmedian?

rolling.Std ValueError: math domain error

Environment: OSX 10.15.7, python 3.7.9, rolling=0.2.0

When I run the following code,

import rolling

values = [
    138,
    136,
    137,
    137,
    135,
    136,
    135,
    135,
    135,
]
std = rolling.Std(values, window_size=3, window_type='variable')
for _ in values:
    next(std)

I got an error.

.venv/lib/python3.7/site-packages/rolling/stats.py", line 166, in current_value
    return sqrt(self._sslm / (self._obs - self.ddof))
ValueError: math domain error

This is because self._sslm is a negative value.
The value of sslm changed as follows.

0.0
2.0
2.0
0.6666666666666572
2.6666666666666288
1.9999999999999147
0.6666666666664867
0.6666666666664867
-2.2737367544323206e-13
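One possible guard, sketched here with a hypothetical helper `safe_std`, is to clamp the accumulated sum of squares at zero before taking the square root. This hides the rounding residue rather than fixing its cause, so it is a workaround and not a description of how the library resolved the issue.

```python
from math import sqrt


def safe_std(sslm, obs, ddof=1):
    """Standard deviation from a sum of squared deviations, tolerant of
    tiny negative values caused by floating-point rounding."""
    return sqrt(max(sslm, 0.0) / (obs - ddof))


print(safe_std(-2.2737367544323206e-13, obs=3))  # 0.0
print(safe_std(2.0, obs=3))                      # 1.0
```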

Add optional dependency for SortedContainers library

Rolling median uses a basic SortedList implementation (binary search/insert on a Python list).

The SortedList implementation in sortedcontainers is more advanced and will scale better for larger window sizes.

This should be added as an optional dependency (e.g. pip install rolling[extras]) and rolling median should use the third party sortedcontainers implementation if it's available:

try:
    from sortedcontainers import SortedList
except ImportError:
    from rolling.structures.sorted_list import SortedList

Some minor work is needed to make the SortedList method names compatible with the calls in the rolling median implementation (or vice versa).

'expanding' type: allow all iterator classes to be used as online accumulators

The classes should have a window_type='expanding' mode. I.e. the window grows with each new value that is added from the iterator.

There would be no need to keep track of seen values as nothing is removed from the window.

(Later Note): for some algorithms this may not be possible (e.g. median) or worthwhile (e.g. any, all). I need to think further whether it's worth it.
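For operations such as min, an expanding window needs only the current result and no record of past values. A small sketch with the hypothetical name `expanding_min`, alongside the equivalent `itertools.accumulate` call from the standard library:

```python
from itertools import accumulate


def expanding_min(iterable):
    """Yield the minimum of all values seen so far."""
    current = None
    for value in iterable:
        if current is None or value < current:
            current = value
        yield current  # nothing is ever removed, so no history is kept


data = [3, 1, 4, 1, 5]
print(list(expanding_min(data)))    # [3, 1, 1, 1, 1]
print(list(accumulate(data, min)))  # [3, 1, 1, 1, 1]
```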

Can rolling.Apply support multidimensional data?

Currently rolling.Apply can only roll a window over one-dimensional data. Will it be able to support multidimensional data in the future?

I found that this has been done in https://gist.github.com/seberg/3866040.

Exponential weighted moving window statistics

It seems that pandas does not provide iterators for its rolling/ewm functions, and your project is really nicely structured. Maybe another base class is needed for iterators of ewm statistics.
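A sketch of what an iterator for an exponentially weighted mean could look like, using the recursive form y_0 = x_0, y_t = (1 - alpha) * y_(t-1) + alpha * x_t. The name `ewm_mean` and the choice of this recursive (unadjusted) weighting are assumptions for illustration.

```python
def ewm_mean(iterable, alpha):
    """Yield the exponentially weighted mean after each new value."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    mean = None
    for value in iterable:
        if mean is None:
            mean = value  # the first observation seeds the average
        else:
            mean = (1 - alpha) * mean + alpha * value
        yield mean


print(list(ewm_mean([1, 2, 3], alpha=0.5)))  # [1, 1.5, 2.25]
```

Only the previous mean is stored, so no window of past values is needed, which is why a separate base class from the fixed-window iterators may be a natural fit.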