rolling's People

Contributors

ac-freeman, ajcr, daviddavo, davidpratt512

rolling's Issues

Performance Comparisons

Hi,

thanks for the effort you have already put into this project.

May I ask for an enhancement of the documentation?

It would be good to have an overview of the (overall) performance of the different functions.

My use case is that I have a numpy array and want to apply a rolling standard deviation window on one of the columns and put it in another column of this numpy array.

It would be good to compare the effort for this with the time it takes to do this for alternatives (like: use pandas directly).

Thanks in advance.

median implementation is slow

TL;DR: don't use skip lists.

I've found that a sorted list built on a standard list plus the bisect module is faster for tracking the median than rolling's implementation for N up to about 50,000. For example, for N of 10,000 it's 4x faster. Resources explaining why list + bisect is so hard to beat at these sizes: http://www.grantjenks.com/docs/sortedcontainers/implementation.html, http://www.grantjenks.com/docs/sortedcontainers/performance-scale.html

Of course, rolling would want to perform well for N > 50,000 too. So use sortedcontainers.SortedList. Even though it doesn't beat standard list + bisect until about N of 20,000, it's still faster than rolling's implementation for all sizes. For example, for N of 100,000 using SortedList is 2x faster.
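As a rough illustration of the list + bisect idea, here is a minimal sketch of a rolling median. The function name rolling_median is made up for this example and is not rolling's API; it only assumes the values are mutually comparable.

```python
from bisect import bisect_left, insort
from collections import deque


def rolling_median(iterable, window_size):
    """Yield the median of each full window (hypothetical helper)."""
    window = deque()  # values in arrival order
    ordered = []      # the same values, kept sorted
    for value in iterable:
        window.append(value)
        insort(ordered, value)  # binary search, then list insert
        if len(window) > window_size:
            oldest = window.popleft()
            # Locate the outgoing value by binary search and drop it.
            del ordered[bisect_left(ordered, oldest)]
        if len(window) == window_size:
            mid = window_size // 2
            if window_size % 2:
                yield ordered[mid]
            else:
                yield (ordered[mid - 1] + ordered[mid]) / 2


print(list(rolling_median([3, 1, 4, 1, 5, 9, 2], 3)))  # [3, 1, 4, 5, 5]
```

Insertion and deletion are still O(N) list operations, but they are memory moves on a contiguous array, which is the reason given in the linked pages for the good performance at moderate N.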

Make rolling Mean more numerically stable

Rolling Mean is currently implemented as the sum of the window divided by the size of the window.

To give better numerical stability when working with large floating point values, it should use the approach taken in Welford's algorithm (cf. the Rolling Var class).
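One possible shape for this, as a sketch only: keep the mean itself as the running state and nudge it by a small delta on each update, rather than dividing a large running sum. The name rolling_mean is hypothetical, and whether this matches the update used in the Rolling Var class is an assumption.

```python
from collections import deque


def rolling_mean(iterable, window_size):
    """Yield the mean of each full window using incremental updates."""
    window = deque()
    mean = 0.0
    for value in iterable:
        window.append(value)
        if len(window) <= window_size:
            # Window still filling: Welford-style running mean.
            mean += (value - mean) / len(window)
        else:
            oldest = window.popleft()
            # Window full: one value leaves and one enters, so the mean
            # shifts by their difference divided by the window size.
            mean += (value - oldest) / window_size
        if len(window) == window_size:
            yield mean


print(list(rolling_mean([1, 2, 3, 4], 2)))  # [1.5, 2.5, 3.5]
```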

Rolling window contains function

Essentially a rolling version of Python's in operator. Return True if the window contains a given string, e.g.:

>>> seq = 'rollrollingndsfw'
>>> r = rolling.Contains(seq, window_size=10, match='rolling')
>>> list(r)
[False, True, True, ...]

Could also be extended to match multiple fixed strings.
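A naive sketch of the intended behaviour, for reference. rolling_contains is a hypothetical stand-in for the proposed rolling.Contains, and it rebuilds the window string on every step, so it is O(window_size) per item rather than an efficient implementation.

```python
from collections import deque


def rolling_contains(seq, window_size, match):
    """Yield True for each full window of `seq` that contains `match`."""
    window = deque(maxlen=window_size)  # oldest character drops off
    for char in seq:
        window.append(char)
        if len(window) == window_size:
            yield match in "".join(window)


result = list(rolling_contains("rollrollingndsfw", 10, "rolling"))
print(result)  # [False, True, True, True, True, False, False]
```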

Upload 0.5.0 to pypi

Hello! The latest version on PyPI is 0.4.0 from Mar '23. Could you update it to the latest version?

Implement rolling mode

The mode is the most common observation in the window.

Need to decide how to handle cases where there are two or more equally common values: the statistics module raises an error (in Python versions before 3.8; later versions return the first mode encountered), while pandas returns each item.

I probably prefer pandas' approach here.
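A minimal sketch of the pandas-style option, where every tied value is returned. The name rolling_mode and the choice to return a sorted list are assumptions for illustration; finding the top count scans all distinct values in the window, so this is not tuned for speed.

```python
from collections import Counter, deque


def rolling_mode(iterable, window_size):
    """Yield a sorted list of the most common value(s) in each full window."""
    window = deque()
    counts = Counter()
    for value in iterable:
        window.append(value)
        counts[value] += 1
        if len(window) > window_size:
            oldest = window.popleft()
            counts[oldest] -= 1
            if counts[oldest] == 0:
                del counts[oldest]  # keep only values still in the window
        if len(window) == window_size:
            top = max(counts.values())
            yield sorted(v for v, c in counts.items() if c == top)


print(list(rolling_mode([1, 1, 2, 2, 3], 3)))  # [[1], [2], [2]]
```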

Ability to add data instead of having to pass generators

It would be very convenient if there were a method such as append() to add values one at a time instead of having to use generators. In many cases, converting existing code into generator form is time-consuming and can become a complex task in a multithreaded environment.
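To make the request concrete, here is a sketch of what a push-style interface could look like, using a rolling sum. The class name PushSum and its append() method are hypothetical and not part of rolling; thread safety is not handled, so callers in a multithreaded program would still need their own lock.

```python
from collections import deque


class PushSum:
    """Hypothetical push-style rolling sum; not part of rolling's API."""

    def __init__(self, window_size):
        self._window = deque()
        self._window_size = window_size
        self._total = 0

    def append(self, value):
        """Add one value and return the sum of the current window."""
        self._window.append(value)
        self._total += value
        if len(self._window) > self._window_size:
            self._total -= self._window.popleft()
        return self._total


acc = PushSum(window_size=2)
print([acc.append(v) for v in [1, 2, 3, 4]])  # [1, 3, 5, 7]
```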

Add ddof parameter for rolling variance and rolling standard deviation

ddof means 'delta degrees of freedom' (cf. NumPy).

Currently the code implicitly assumes ddof=1 (i.e. k - 1 degrees of freedom, sample variance). This should be the default, but the user should be able to set ddof with a keyword argument when a rolling iterator is instantiated if they want to.
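For reference, the effect of ddof on a single window can be sketched as below. window_variance is a hypothetical helper that recomputes from scratch; it only shows the divisor k - ddof, not how the rolling classes would update incrementally.

```python
def window_variance(window, ddof=1):
    """Variance of one window with a configurable ddof (illustrative only)."""
    k = len(window)
    if k - ddof <= 0:
        raise ValueError("window must contain more than ddof observations")
    mean = sum(window) / k
    return sum((x - mean) ** 2 for x in window) / (k - ddof)


print(window_variance([1, 2, 3, 4]))          # sample variance, ddof=1
print(window_variance([1, 2, 3, 4], ddof=0))  # population variance: 1.25
```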

Summation of float values is not precise enough in the calculation of variance/std

The rolling summation of float values is not precise enough in the calculation of variance/std.

>>> import rolling
>>> list(rolling.Std([0,1,1,1], 3))
[0.5773502691896258, 7.450580596923828e-09]

The first value is correct. The second value 7.450e-09 is small, but should be 0. It is not close to 0 using the default tolerances in math.isclose() for example.

This is related to #20 which caused the variance to drop below 0 (when it should have been 0).
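The math.isclose() point can be checked directly. With the defaults (rel_tol=1e-09, abs_tol=0.0) no non-zero value compares close to 0, so an absolute tolerance has to be supplied; the 1e-8 used here is an arbitrary value chosen for the example.

```python
import math

residual = 7.450580596923828e-09  # second value from the output above

# Default tolerances are purely relative, so this comparison fails.
default_close = math.isclose(residual, 0.0)
# An explicit absolute tolerance (arbitrary choice) makes it pass.
tolerant_close = math.isclose(residual, 0.0, abs_tol=1e-8)

print(default_close, tolerant_close)  # False True
```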

Rolling statistics of series of two random variables

  • PROD, rolling mean of X * Y: 1/window * sum(x*y)
  • COV, rolling covariance of X, Y: PROD - mean(X) * mean(Y)
  • CORR, rolling correlation of X, Y: COV / (std(X) * std(Y))
  • SLOPE, rolling estimate of the slope of the regression y = intercept + slope * x + \epsilon: COV / var(X)
  • INTERCEPT, rolling estimate of the intercept of the regression y = intercept + slope * x + \epsilon: ...
  • BETA, rolling estimation of y = \beta * x + \epsilon
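The first four formulas above can be sketched for a single window as follows. This assumes population (ddof=0) moments throughout so that COV, std and var are consistent with PROD; window_stats is a hypothetical helper, and INTERCEPT and BETA are left out because the list does not spell them out.

```python
from math import sqrt


def window_stats(xs, ys):
    """PROD, COV, CORR and SLOPE for one window (population moments)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    prod = sum(x * y for x, y in zip(xs, ys)) / n
    cov = prod - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return {
        "prod": prod,
        "cov": cov,
        "corr": cov / sqrt(var_x * var_y),
        "slope": cov / var_x,
    }


stats = window_stats([1, 2, 3, 4], [2, 4, 6, 8])
print(stats)
```

A rolling version would keep running sums of x, y, x*y, x*x and y*y and update them as values enter and leave the window; note that this sum-of-products form is subject to the same cancellation problems discussed in the variance issues.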

Add a Suffix Tree implementation

A suffix tree representing the window should support O(1) updates (append tail, delete head).

This would allow implementation of various matching/search algorithms.

Add type annotations

Methods/classes should have type annotations for easier integration with other codebases.

Handling of NaN

Short question: is it somehow possible to extend this to handle NaN, like NumPy's nanmedian?
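A simple way to picture the requested behaviour, as a sketch: drop NaN values from each window before taking the median. rolling_nanmedian is a hypothetical name, it recomputes the median per window rather than updating incrementally, and returning NaN for an all-NaN window is an assumption made here.

```python
import math
from collections import deque
from statistics import median


def rolling_nanmedian(iterable, window_size):
    """Yield the median of the non-NaN values in each full window."""
    window = deque(maxlen=window_size)
    for value in iterable:
        window.append(value)
        if len(window) == window_size:
            finite = [v for v in window if not math.isnan(v)]
            # Assumption: a window with no usable values yields NaN.
            yield median(finite) if finite else math.nan


print(list(rolling_nanmedian([1.0, math.nan, 3.0, 5.0], 3)))  # [2.0, 4.0]
```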

rolling.Std ValueError: math domain error

Environment: OSX 10.15.7, python 3.7.9, rolling=0.2.0

When I run the following code,

import rolling

values = [
    138,
    136,
    137,
    137,
    135,
    136,
    135,
    135,
    135,
]
std = rolling.Std(values, window_size=3, window_type='variable')
for _ in values:
    next(std)

I got an error.

.venv/lib/python3.7/site-packages/rolling/stats.py", line 166, in current_value
    return sqrt(self._sslm / (self._obs - self.ddof))
ValueError: math domain error

This is because self._sslm is a negative value.
The value of sslm changed as follows.

0.0
2.0
2.0
0.6666666666666572
2.6666666666666288
1.9999999999999147
0.6666666666664867
0.6666666666664867
-2.2737367544323206e-13
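One possible guard, sketched under assumptions: clamp the accumulated sum of squares at zero before dividing and taking the square root. The parameter names mirror self._sslm, self._obs and self.ddof from the traceback, but safe_std itself is a hypothetical function, not the fix applied in rolling.

```python
from math import sqrt


def safe_std(sslm, obs, ddof=1):
    """Standard deviation that tolerates a tiny negative sum of squares."""
    # Rounding error can push the accumulator slightly below zero,
    # so treat any negative value as exactly zero.
    variance = max(sslm, 0.0) / (obs - ddof)
    return sqrt(variance)


print(safe_std(-2.2737367544323206e-13, obs=3))  # 0.0
print(safe_std(2.0, obs=3))                      # 1.0
```

Clamping hides the symptom but not the drift itself, so the accumulated error in the running update would still be worth reducing separately.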

Add optional dependency for SortedContainers library

Rolling median uses a basic SortedList implementation (binary search/insert on a Python list).

The SortedList implementation in sortedcontainers is more advanced and will scale better for larger window sizes.

This should be added as an optional dependency (e.g. pip install rolling[extras]) and rolling median should use the third-party sortedcontainers implementation if it's available:

try:
    from sortedcontainers import SortedList
except ImportError:
    from rolling.structures.sorted_list import SortedList

Some minor work is needed to make the SortedList method names compatible with the calls in the rolling median implementation (or vice versa).

'expanding' type: allow all iterator classes to be used as online accumulators

The classes should have a window_type='expanding' mode. I.e. the window grows with each new value that is added from the iterator.

There would be no need to keep track of seen values as nothing is removed from the window.

(Later Note): for some algorithms this may not be possible (e.g. median) or worthwhile (e.g. any, all). I need to think further whether it's worth it.
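For the statistics where this does work, the expanding mode reduces to a plain online accumulator. A minimal sketch for the mean follows; expanding_mean is a hypothetical name and not the proposed window_type='expanding' interface itself.

```python
def expanding_mean(iterable):
    """Yield the mean of all values seen so far; no history is stored."""
    count = 0
    mean = 0.0
    for value in iterable:
        count += 1
        # Incremental update: only the count and current mean are kept.
        mean += (value - mean) / count
        yield mean


print(list(expanding_mean([1, 2, 3, 4])))  # [1.0, 1.5, 2.0, 2.5]
```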

Exponential weighted moving window statistics

It seems that pandas does not provide iterators for its rolling/ewm functions, and your project is really nicely structured. Maybe another base class is needed for iterators of ewm statistics.
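As a starting point, an exponentially weighted mean fits the iterator style naturally because it needs only one value of state. The sketch below uses the recursive form y_t = alpha * x_t + (1 - alpha) * y_{t-1} with y_0 = x_0, which to my understanding corresponds to pandas' ewm(alpha=..., adjust=False).mean(); the name ewm_mean is hypothetical.

```python
def ewm_mean(iterable, alpha):
    """Yield the exponentially weighted mean after each value."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    mean = None
    for value in iterable:
        if mean is None:
            mean = value  # the first observation seeds the average
        else:
            mean = alpha * value + (1 - alpha) * mean
        yield mean


print(list(ewm_mean([1, 2, 3], alpha=0.5)))  # [1, 1.5, 2.25]
```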
