
ruptures's Introduction

Welcome to ruptures


ruptures is a Python library for offline change point detection. This package provides methods for the analysis and segmentation of non-stationary signals. Implemented algorithms include exact and approximate detection for various parametric and non-parametric models. ruptures focuses on ease of use by providing a well-documented and consistent interface. In addition, thanks to its modular structure, different algorithms and models can be connected and extended within this package.

How to cite. If you use ruptures in a scientific publication, we would appreciate citations to the following paper:

  • C. Truong, L. Oudre, N. Vayatis. Selective review of offline change point detection methods. Signal Processing, 167:107299, 2020. [journal] [pdf]

Basic usage

(Please refer to the documentation for more advanced use.)

The following snippet creates a noisy piecewise constant signal, performs a penalized kernel change point detection and displays the results (alternating colors mark true regimes and dashed lines mark estimated change points).

import matplotlib.pyplot as plt
import ruptures as rpt

# generate signal
n_samples, dim, sigma = 1000, 3, 4
n_bkps = 4  # number of breakpoints
signal, bkps = rpt.pw_constant(n_samples, dim, n_bkps, noise_std=sigma)

# detection
algo = rpt.Pelt(model="rbf").fit(signal)
result = algo.predict(pen=10)

# display
rpt.display(signal, bkps, result)
plt.show()

General information

Contact

For questions concerning this package, its use, and bugs, use the issue page of the ruptures repository. For other inquiries, you can contact me here.

Dependencies and install

Installation instructions can be found here.

Changelog

See the changelog for a history of notable changes to ruptures.

Thanks to all our contributors

License

This project is under the BSD license.

BSD 2-Clause License

Copyright (c) 2017-2022, ENS Paris-Saclay, CNRS
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

ruptures's People

Contributors

channsoden, deepcharles, earthgecko, gjaeger, guillaumegilles98, guillaumewrobel, julia-shenshina, kmsquire, laurenzse, lucas-prates, multimeric, oboulant, odidev, pbregener, pre-commit-ci[bot], probberechts, shanks847, swicech, theovincent


ruptures's Issues

C segmentation fault on some inputs

Minimal example:

In [1]: import numpy as np
   ...: import ruptures as rpt
   ...: signal = np.array([720.1, 720.1, 1800.2, 360.0, 9361.0])
   ...: algo = rpt.KernelCPD(kernel="rbf", min_size=10).fit(signal)
   ...: cps = algo.predict(pen=10)

[1] 63503 segmentation fault (core dumped) ipython

I realized this is probably because the signal is shorter than min_size (longer signals don't cause the crash), but in that case I would expect a meaningful Python error that I can try/except. The C-level errors cause really unexpected behavior, for example a killed kernel in Jupyter notebooks.

ruptures==1.1.2
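
Until the library raises a proper Python exception here, a minimal user-side guard could look like the sketch below (the 2 * min_size threshold is an assumption: a split needs room for at least two admissible segments).

import numpy as np
import ruptures as rpt

signal = np.array([720.1, 720.1, 1800.2, 360.0, 9361.0])
min_size = 10

# Hypothetical guard: refuse signals with no room for two segments of min_size
if signal.shape[0] < 2 * min_size:
    raise ValueError(
        f"signal has {signal.shape[0]} samples, too short for min_size={min_size}"
    )
algo = rpt.KernelCPD(kernel="rbf", min_size=min_size).fit(signal)
cps = algo.predict(pen=10)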

Request for documentation to annotate changepoint

Hey Ruptures Team,
Great work with the package! I am a beginner and I can't find good documentation to help me annotate the change points (dates) in my Pelt plot using ruptures.
PS: the top 4 ranked change points are what I need.

Currently I use the following code:
algo = rpt.Pelt(model = 'rbf').fit(points)
bkps = algo.predict(pen = 10)
plt.title('Change Point Detection: Pelt Search Method')
rpt.show.display(points, bkps, figsize=(17, 6))
plt.savefig('1rpt_plot.png',dpi=80)

which gives me a plot like this
[plot image]

There are 3 problems here:
a) the x axis is not showing dates,
b) I have no idea which change points are the top 4, and
c) I don't know how to annotate them.

Could you provide a small code example? My dataset looks like this toy dataset. It would really help me!

Date, Value
2016-01-03 , 286
2016-01-04 , 83
2016-01-05 , 112
2016-01-06 , 286
2016-01-06 , 286
2016-01-07 , 379
2016-01-08 , 286
2016-01-09 , 120
2016-01-08 , 85
2016-01-09 , 300

Thanks :)
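
One way to approach a) and c) is to keep the data in a pandas DataFrame with a DatetimeIndex and map the integer breakpoints back to dates after detection. Below is a minimal sketch; the file name and column names are assumptions based on the toy dataset above, and note that Pelt does not directly expose a ranking of change points (question b).

import pandas as pd
import matplotlib.pyplot as plt
import ruptures as rpt

df = pd.read_csv('data.csv', parse_dates=['Date'], index_col='Date')  # hypothetical file
points = df['Value'].values

algo = rpt.Pelt(model='rbf').fit(points)
bkps = algo.predict(pen=10)

fig, axes = rpt.display(points, bkps, figsize=(17, 6))
# The last breakpoint is always len(points); skip it when annotating
for bkp in bkps[:-1]:
    axes[0].annotate(df.index[bkp].date().isoformat(), xy=(bkp, points.max()), rotation=90)
plt.title('Change Point Detection: Pelt Search Method')
plt.show()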

Estimating confidence that a breakpoint occurs in a 1D array

I have a few hundred 1-D timeseries, each ~120 points. Each timeseries may or may not have one or more breakpoints caused by changes in instrumentation, with the timing and nature of the instrumentation change differing from timeseries to timeseries and not known in advance for any of them. I am interested in estimating some measure of confidence for each timeseries that at least one breakpoint exists for that timeseries (in addition to when that breakpoint occurs).

What would you recommend using for this? This sounds to me like a statistical test based on BinSeg with n=1 breakpoints, but I'm new to breakpoints overall and to the ruptures package, so it's not obvious to me whether that's conceptually correct, nor how to do it with ruptures. Apologies if I'm moving too quickly and thus missing something clear in the docs.

[Question] Complex penalties

I note in the paper you talk about different penalties in section 6.1. However, from looking through the library, it seems that ruptures only supports a fixed linear penalty (i.e. beta). Am I right to assume that it doesn't work with more complex penalties such as AIC?

Further, if I wanted to implement a method where you would normally calculate a p-value for splitting (i.e. a likelihood ratio test following a chi-squared distribution), is the idea that the error() method just returns the test statistic without testing for significance (i.e. the raw likelihood ratio), and that the penalty constant implies a p-value? I suppose this makes segmentation fast and flexible, but highly dependent on the choice of penalty.

Multi-Model Change Point Detection

Hi!

I am hoping to use ruptures for a project where I have a multi-dimensional or multivariate signal (I'm not 100% sure on the terminology) and where I expect some of the dimensions to follow different models (e.g. different costs). From what I've read of the survey and of the current ruptures implementation, it seems this hasn't been done yet.

I have some ideas for how to do it and have started implementing it. Is this something you would be interested in having a pull request for? Do you have any suggestions of existing work in this area I may have overlooked?

Thanks for the package, by the way; it's great!

Cost Function Examples and Documentation

I'm trying my hand at a few of the cost functions; I'm interested in both changing variance and mean shifts. I've been reading through the selective review paper and was interested in trying cost function 3, which I believe is CostNormal in ruptures. Would it be possible to expand the documentation for this cost function? For instance, it's unclear to me whether the second sigma term from equation C3 of the review paper is included in the CostNormal function.

return value of predict in pelt

Hi,

I intend to detect change points in a univariate time sequence of length 49 using the PELT search method. The values returned by predict are: [15, 20, 49].
The value of partition.keys() is [(0, 15), (15, 20), (20, 49)].

bkps = sorted(e for s, e in partition.keys())

I want to find the data points which have been detected as change points, and I am not sure how to use this result. Are these indexes or positions? They contain both 0 and 49.
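
Based on the partition shown above, the returned values are segment end indexes in Python slice convention: each value is the first index of the next regime, and the last value is always the length of the signal. A small illustration:

# predict() returned [15, 20, 49] for a signal of length 49
bkps = [15, 20, 49]
# Pair consecutive boundaries to recover the segments (matches partition.keys())
segments = list(zip([0] + bkps[:-1], bkps))
# segments == [(0, 15), (15, 20), (20, 49)]; signal[start:end] selects each regime
# The detected change points proper are bkps[:-1] == [15, 20]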

Normal cost function docs question

Hi guys,
Love your work!
I'd just like to clarify something with the Normal cost function as mentioned in the User guide here
It says
"This cost function detects changes in the mean and covariance matrix of a sequence of multivariate Gaussian random variables"

But I can't see how the function detects changes in the mean. This snippet:

cov = np.cov(sub.T)
_, val = slogdet(cov)
return val * (end - start)

computes the covariance matrix of the segment and then the log of its determinant.
I'm just wondering: where is the change in mean taken into account?

EDIT:
I can see Lavielle 2006 uses the same formula and mentions it detects changes in mean:
"For the detection of changes in the mean vector and/or the covariance matrix of a multivariate
sequence of random variables, this contrast also reduces to"
[formula image]

Thanks,
Rowan

Online algorithms?

Hi (bonjour),

I am a bit of a newbie in change point detection algorithms; I wanted to know if it is possible to use ruptures for online detection. So far I have the (probably false) impression that it focuses only on offline detection.

Am I right?

All the best,

Julien

Unable to install using pip

Hi,
I am trying to pip install ruptures on Python 3.9.1 but I'm getting the error below. Any thoughts?

Thanks,
Rahul

C:\Users\arago>python -m pip install -U ruptures
Collecting ruptures
  Using cached ruptures-1.1.3.tar.gz (235 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Requirement already satisfied: scipy in c:\users\arago\appdata\local\programs\python\python39\lib\site-packages (from ruptures) (1.6.0)
Requirement already satisfied: numpy in c:\users\arago\appdata\local\programs\python\python39\lib\site-packages (from ruptures) (1.19.5)
Building wheels for collected packages: ruptures
  Building wheel for ruptures (PEP 517) ... error
  ERROR: Command errored out with exit status 1:
   command: 'C:\Users\arago\AppData\Local\Programs\Python\Python39\python.exe' 'C:\Users\arago\AppData\Local\Programs\Python\Python39\lib\site-packages\pip\_vendor\pep517\_in_process.py' build_wheel 'C:\Users\arago\AppData\Local\Temp\tmpnmpua21i'
       cwd: C:\Users\arago\AppData\Local\Temp\pip-install-blzouu8e\ruptures_789a0e7cc16c4cd2b847b39dc0a34b3d
  Complete output (89 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.9
  creating build\lib.win-amd64-3.9\ruptures
  copying src\ruptures\base.py -> build\lib.win-amd64-3.9\ruptures
  copying src\ruptures\exceptions.py -> build\lib.win-amd64-3.9\ruptures
  copying src\ruptures\version.py -> build\lib.win-amd64-3.9\ruptures
  copying src\ruptures\__init__.py -> build\lib.win-amd64-3.9\ruptures
  creating build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costautoregressive.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costclinear.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costcosine.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costl1.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costl2.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costlinear.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costml.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costnormal.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costrank.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\costrbf.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\factory.py -> build\lib.win-amd64-3.9\ruptures\costs
  copying src\ruptures\costs\__init__.py -> build\lib.win-amd64-3.9\ruptures\costs
  creating build\lib.win-amd64-3.9\ruptures\datasets
  copying src\ruptures\datasets\pw_constant.py -> build\lib.win-amd64-3.9\ruptures\datasets
  copying src\ruptures\datasets\pw_linear.py -> build\lib.win-amd64-3.9\ruptures\datasets
  copying src\ruptures\datasets\pw_normal.py -> build\lib.win-amd64-3.9\ruptures\datasets
  copying src\ruptures\datasets\pw_wavy.py -> build\lib.win-amd64-3.9\ruptures\datasets
  copying src\ruptures\datasets\__init__.py -> build\lib.win-amd64-3.9\ruptures\datasets
  creating build\lib.win-amd64-3.9\ruptures\detection
  copying src\ruptures\detection\binseg.py -> build\lib.win-amd64-3.9\ruptures\detection
  copying src\ruptures\detection\bottomup.py -> build\lib.win-amd64-3.9\ruptures\detection
  copying src\ruptures\detection\dynp.py -> build\lib.win-amd64-3.9\ruptures\detection
  copying src\ruptures\detection\kernelcpd.py -> build\lib.win-amd64-3.9\ruptures\detection
  copying src\ruptures\detection\pelt.py -> build\lib.win-amd64-3.9\ruptures\detection
  copying src\ruptures\detection\sanity_check.py -> build\lib.win-amd64-3.9\ruptures\detection
  copying src\ruptures\detection\window.py -> build\lib.win-amd64-3.9\ruptures\detection
  copying src\ruptures\detection\__init__.py -> build\lib.win-amd64-3.9\ruptures\detection
  creating build\lib.win-amd64-3.9\ruptures\metrics
  copying src\ruptures\metrics\hamming.py -> build\lib.win-amd64-3.9\ruptures\metrics
  copying src\ruptures\metrics\hausdorff.py -> build\lib.win-amd64-3.9\ruptures\metrics
  copying src\ruptures\metrics\precisionrecall.py -> build\lib.win-amd64-3.9\ruptures\metrics
  copying src\ruptures\metrics\randindex.py -> build\lib.win-amd64-3.9\ruptures\metrics
  copying src\ruptures\metrics\sanity_check.py -> build\lib.win-amd64-3.9\ruptures\metrics
  copying src\ruptures\metrics\timeerror.py -> build\lib.win-amd64-3.9\ruptures\metrics
  copying src\ruptures\metrics\__init__.py -> build\lib.win-amd64-3.9\ruptures\metrics
  creating build\lib.win-amd64-3.9\ruptures\show
  copying src\ruptures\show\display.py -> build\lib.win-amd64-3.9\ruptures\show
  copying src\ruptures\show\__init__.py -> build\lib.win-amd64-3.9\ruptures\show
  creating build\lib.win-amd64-3.9\ruptures\utils
  copying src\ruptures\utils\bnode.py -> build\lib.win-amd64-3.9\ruptures\utils
  copying src\ruptures\utils\drawbkps.py -> build\lib.win-amd64-3.9\ruptures\utils
  copying src\ruptures\utils\utils.py -> build\lib.win-amd64-3.9\ruptures\utils
  copying src\ruptures\utils\__init__.py -> build\lib.win-amd64-3.9\ruptures\utils
  creating build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\detection\_detection\__init__.py -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  creating build\lib.win-amd64-3.9\ruptures\utils\_utils
  copying src\ruptures\utils\_utils\__init__.py -> build\lib.win-amd64-3.9\ruptures\utils\_utils
  running egg_info
  writing src\ruptures.egg-info\PKG-INFO
  writing dependency_links to src\ruptures.egg-info\dependency_links.txt
  writing requirements to src\ruptures.egg-info\requires.txt
  writing top-level names to src\ruptures.egg-info\top_level.txt
  reading manifest file 'src\ruptures.egg-info\SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  warning: no previously-included files found matching 'CHANGELOG.md'
  warning: no previously-included files found matching 'CONTRIBUTING.md'
  warning: no previously-included files found matching 'mkdocs.yml'
  warning: no previously-included files found matching 'mkdocs_macros.py'
  warning: no previously-included files matching '__pycache__' found anywhere in distribution
  warning: no previously-included files matching '.*' found anywhere in distribution
  writing manifest file 'src\ruptures.egg-info\SOURCES.txt'
  copying src\ruptures\detection\_detection\ekcpd.c -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\detection\_detection\ekcpd.pxd -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\detection\_detection\ekcpd.pyx -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\detection\_detection\ekcpd_computation.c -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\detection\_detection\ekcpd_computation.h -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\detection\_detection\ekcpd_pelt_computation.c -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\detection\_detection\ekcpd_pelt_computation.h -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\detection\_detection\kernels.c -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\detection\_detection\kernels.h -> build\lib.win-amd64-3.9\ruptures\detection\_detection
  copying src\ruptures\utils\_utils\convert_path_matrix.c -> build\lib.win-amd64-3.9\ruptures\utils\_utils
  copying src\ruptures\utils\_utils\convert_path_matrix.pxd -> build\lib.win-amd64-3.9\ruptures\utils\_utils
  copying src\ruptures\utils\_utils\convert_path_matrix.pyx -> build\lib.win-amd64-3.9\ruptures\utils\_utils
  copying src\ruptures\utils\_utils\convert_path_matrix_c.c -> build\lib.win-amd64-3.9\ruptures\utils\_utils
  copying src\ruptures\utils\_utils\convert_path_matrix_c.h -> build\lib.win-amd64-3.9\ruptures\utils\_utils
  running build_ext
  building 'ruptures.detection._detection.ekcpd' extension
  error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
  ----------------------------------------
  ERROR: Failed building wheel for ruptures
Failed to build ruptures
ERROR: Could not build wheels for ruptures which use PEP 517 and cannot be installed directly

Error if input data is not float64 for algos implemented in C

To replicate :

import numpy as np
import ruptures as rpt

np_array = np.random.random((1000,5))
print(np_array.dtype)
np_array32 = np_array.astype(np.float32, copy=False)
print(np_array32.dtype)

algo = rpt.KernelCPD(kernel = "linear").fit(np_array)
algo.predict(n_bkps=3) # Works fine
algo = rpt.KernelCPD(kernel = "linear").fit(np_array32)
algo.predict(n_bkps=3) # Returns an error

Error :

File "src/ruptures/detection/_detection/ekcpd.pyx", line 8, in ruptures.detection._detection.ekcpd.ekcpd_L2
    cpdef ekcpd_L2(double[:,:] signal, int n_bkps, int min_size):
ValueError: Buffer dtype mismatch, expected 'double' but got 'float'

Possible explanation:

In the typed Python code and/or (to be checked) in the C code, we use double (which might be platform dependent). If the incoming data is not float64 for some reason (data explicitly encoded on 32/16 bits, or an architecture that only supports float32), then it raises an error.

Possible fixes:

  • In KernelCPD, cast the signal at .fit() time to enforce float64: self.cost.fit(signal.astype(np.double)) (see the sketch below).
  • Maybe instead of using double, use bitness-explicit types (in C and Cython) and cast if necessary.
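
A user-side sketch of the first workaround, casting before fit:

import numpy as np
import ruptures as rpt

np_array32 = np.random.random((1000, 5)).astype(np.float32)

# Cast back to float64 (C double) before fitting to avoid the dtype mismatch
algo = rpt.KernelCPD(kernel="linear").fit(np_array32.astype(np.float64))
algo.predict(n_bkps=3)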

import problem - incompatibility with numpy

I am using python 3.7.6 and have installed the ruptures module in my conda environment:
(mytfenv) rohitpro@rohits-mbp-64 Phi29MATLAB-master % python -m pip install ruptures
Requirement already satisfied: ruptures in /Users/rohitpro/opt/anaconda3/envs/mytfenv/lib/python3.7/site-packages (1.1.3)
Requirement already satisfied: scipy in /Users/rohitpro/opt/anaconda3/envs/mytfenv/lib/python3.7/site-packages (from ruptures) (1.6.1)
Requirement already satisfied: numpy in /Users/rohitpro/opt/anaconda3/envs/mytfenv/lib/python3.7/site-packages (from ruptures) (1.20.1)

When I try to import the ruptures module in a jupyter notebook, I get a nasty attribute error (showing only the last line here):

import ruptures as rpt
AttributeError: module 'numpy.linalg.lapack_lite' has no attribute '_ilp64'

I tried uninstalling and reinstalling numpy and ruptures but the problem didn't resolve. How can I solve this issue?

Example of vectorizing ruptures with xarray.apply_ufunc

In case others come across a similar need (whether or not it makes sense to implement this in ruptures proper), here's a way of using xarray.apply_ufunc to run ruptures over xarray.DataArrays.

This particular example is for detecting a single breakpoint; note that if you want to return multiple, the apply_ufunc call signature will have to be modified, because then each call is returning an array rather than a scalar.

import ruptures as rpt
import xarray as xr

def detect_breakpoint(arr, dim, rpt_class=rpt.Binseg, model="l2", n_bkps=1):
    """Use xr.apply_ufunc to broadcast breakpoint detections from ruptures"""

    def _detect_bp(arr):
        """Wrapper to use in apply_ufunc."""
        return rpt_class(model=model).fit(arr).predict(n_bkps=n_bkps)[0]

    inds_bp = xr.apply_ufunc(
        _detect_bp,
        arr,
        input_core_dims=[[dim]],
        vectorize=True,
        dask="parallelized",
    )
    return arr[dim][inds_bp]

(FYI This came up when attempting to detect changepoints at each point of a lat-lon gridded rainfall dataset.)
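
A toy usage sketch of the helper above (the grid shape and jump location are made up):

import numpy as np
import xarray as xr

# 2x2 grid of series, each with a jump at t=50
base = np.concatenate([np.zeros(50), 5 * np.ones(50)])
noisy = np.broadcast_to(base, (2, 2, 100)) + 0.1 * np.random.randn(2, 2, 100)
arr = xr.DataArray(noisy, dims=("lat", "lon", "time"), coords={"time": np.arange(100)})

print(detect_breakpoint(arr, dim="time"))  # expect ~50 at every grid point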

Segmentation error on KernelCPD with larger min_size than data

import ruptures.detection.kernelcpd as ruptures
import numpy as np

t = [1.15801045, 4.55289317, 3.53014419, 3.55136236, 3.91430448, 3.97625801]
t = np.array(t)

algo = ruptures.KernelCPD(kernel="linear", min_size=10).fit(t)
index = algo.predict(pen=30000)

When min_size is too close to the length of the data, Python crashes with various random errors:

  • munmap_chunk(): invalid pointer
  • core dumped
  • free(): invalid next size (fast)

Reported on ruptures 1.1.3

Allow other plot **kwargs for show.display to be set

In ruptures.show.display() there is no avenue to pass along other general plotting options. While a **kwargs argument is accepted, it only sets specific recognized elements (which isn't clear in the docs). I would propose letting the subplot call consume all the extra args. I think this would be easier than having to use the returned fig or ax after the fact.

before:
fig, axarr = plt.subplots(n_features, figsize=figsize, sharex=True)
after:
fig, axarr = plt.subplots(n_features, figsize=figsize, sharex=True, **kwargs)

Of course this would change the way **kwargs is currently used for setting linewidth, alpha, etc., but maybe those could be parsed out implicitly or moved to a separate input parameter.

Nonparametric Cost Function Implementation

I am trying to implement (3.2) in https://arxiv.org/abs/1602.01254 using the custom cost function class.

I tested the cost function on a toy example and it failed to pick up an obvious change point, leading me to believe the implementation is incorrect.

import ruptures as rpt
from math import log, exp
from ruptures.base import BaseCost
import numpy as np


def F(t, sub):
    """Empirical CDF of `sub` evaluated at t."""
    indicator_sum = 0
    for i in range(len(sub)):
        if sub[i] <= t:
            indicator_sum += 1
    return indicator_sum / len(sub)


def Lnp(t, sub):
    """Nonparametric log-likelihood contribution at quantile t."""
    if F(t, sub) == 1 or F(t, sub) == 0:
        return 0
    return len(sub) * (F(t, sub) * log(F(t, sub)) + (1 - F(t, sub)) * log(1 - F(t, sub)))


def quantiles(data, n, K, gamma):
    ts = np.zeros(shape=K)
    for i in range(K):
        ts[i] = np.quantile(data, 1 / (1 + (2 * n - 1) * exp(gamma * (2 * (i + 1) - 1))))
    return ts


class MyCost(BaseCost):
    """Custom nonparametric cost."""

    # The 2 following attributes must be specified for compatibility.
    model = ""
    min_size = 7

    def fit(self, signal):
        """Set the internal parameters."""
        self.signal = signal
        self.n = len(signal)
        self.K = int(np.ceil(4 * log(self.n)))
        self.gamma = -log(2 * self.n - 1) / self.K
        return self

    def error(self, start, end):
        """Return the approximation cost on the segment [start:end].

        Args:
            start (int): start of the segment
            end (int): end of the segment

        Returns:
            float: segment cost
        """
        sub = self.signal[start:end]
        ts = quantiles(self.signal, self.n, self.K, self.gamma)
        c = 0
        for i in range(self.K):
            c += Lnp(ts[i], sub)
        return (2 * log(2 * self.n - 1) / self.K) * c


signal = np.concatenate(
    (np.random.normal(loc=0, scale=1, size=50), np.random.normal(loc=50, scale=1, size=50))
)
algo = rpt.Pelt(custom_cost=MyCost()).fit(signal)
result = algo.predict(pen=0)
# display
rpt.display(signal, result)

Too many changepoints returned for approximate search methods?

Thanks for making all these changepoint detection methods available in Python!

When I try the examples from the docs for the approximate search method, e.g. Window().predict(), the number of breakpoints returned is one more than the number specified. The final breakpoint is always the index of the last observation in the timeseries. For example, if I run

import numpy as np
import matplotlib.pylab as plt
import ruptures as rpt
# creation of data
n, dim = 500, 3  # number of samples, dimension
n_bkps, sigma = 3, 5  # number of change points, noise standard deviation
signal, bkps = rpt.pw_constant(n, dim, n_bkps, noise_std=sigma)

model = "l2"  # "l1", "rbf", "linear", "normal", "ar"
algo = rpt.Window(width=40, model=model).fit(signal)
my_bkps = algo.predict(n_bkps=3)
my_bkps

my_bkps will actually consist of four values, [120, 250, 375, 500], rather than the three specified with n_bkps=3. Here 500 is the length of the signal. Is this last returned value a real change point, or is it there for some other reason (e.g. a helper value for drawing the filled areas with show.display())?
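
The last value is indeed not a detected change: predict() always appends the number of samples, so that consecutive values delimit segments (which is also what show.display() uses to draw the filled areas). To keep exactly the n_bkps detected change points, drop it:

change_points = my_bkps[:-1]  # [120, 250, 375], the three detected breakpoints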

Layman Question - Predict for mean-shift and variance-shift?

Hi,
Layman here - apologies if this is a stupid Q!
I've read through your paper, and a few others.
I'm trying to fit to data (2 signals/dimensions) that has changepoints that may be any of: mean-shift + variance-shift, mean-shift, variance-shift. I do not know the number of changepoints.

I'm getting some good accuracy on all the mean shift changepoints using Pelt and BinSeg with L2 cost function. This cost function is also great because the BIC value works for penalty and there's no supervision required.

However, using L2 doesn't really catch the variance-shift points; instead Rank, rbf (slow) and L1 catch all the changes quite well. But the penalty for these cannot be calculated, as far as I can tell, so they need supervision.

I've noticed, looking at the R changepoint (cpt) package guide, that it offers the methods var, mean, and meanvar, which I assume do what they say on the tin.
It's not immediately obvious which ruptures cost function to select to effectively catch these meanvar (mean + variance) shifts; do you have any recommendations, please?
Thanks
R

Speed up L2 computation

I just want to suggest that you could speed up the computation of the L2 cost by precomputing cumulative sums, so that signal[start:end].sum() becomes equivalent to cumsum[end] - cumsum[start]: you do lookups instead of actually summing. If you do the same for the squares and use the formula Var(X) = E[X^2] - E[X]^2, that is enough to get the sample variance.

Here is a sample implementation for the 1D case that ran about 4 times faster in my tests on an 80000-sample signal (35 s down to 9 s):

import numpy as np
from ruptures.base import BaseCost


class CostL2Fast(BaseCost):
    # The 2 following attributes must be specified for compatibility.
    model = "fastl2"
    min_size = 5

    def fit(self, signal):
        """Set the internal parameters."""
        self.signal = signal
        # Prepend 0 so that cs[end] - cs[start] is the sum over signal[start:end]
        self.cs = np.r_[[0], signal.cumsum()]
        self.cs2 = np.r_[[0], np.power(signal, 2).cumsum()]
        return self

    def error(self, start, end):
        """Return the approximation cost on the segment [start:end].

        Args:
            start (int): start of the segment
            end (int): end of the segment

        Returns:
            float: segment cost
        """
        n = end - start
        # Variance-based cost from the cumulative sums of the values and the
        # squared values: sum((x - mean)^2) = sum(x^2) - sum(x)^2 / n
        return (self.cs2[end] - self.cs2[start]) - (self.cs[end] - self.cs[start]) ** 2 / n
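
A hypothetical way to plug the class above into a detector via the custom_cost parameter (pw_constant and Pelt are standard ruptures APIs; the penalty value is arbitrary):

import ruptures as rpt

# 1D piecewise-constant test signal
signal, true_bkps = rpt.pw_constant(10000, 1, 4, noise_std=2)
algo = rpt.Pelt(custom_cost=CostL2Fast(), min_size=5).fit(signal.ravel())
print(algo.predict(pen=50), true_bkps)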

Detect only increasing trends or changes

Is there a way in the current scheme to detect only positive changes?

Currently, I am using the window method.

def rupture_changepoint(points):
    points.values.reshape((points.shape[0],1))
    points = points.dropna()
    model = "l2"
    algo = rpt.Window(width = 10, model=model).fit(points.values)
    my_bkps = algo.predict(pen=60)
    print(my_bkps)
    fig, (ax,) = rpt.display(points, my_bkps, my_bkps, figsize=(10,6))
    plt.show()

Got the break points as [30, 40, 50, 100, 121]


Here, I don't want the decreasing trend to appear as a breakpoint; I am only interested in the [40, 50] change.

Please point me to the specific files that need to be changed if it's not straightforward.

Thank you,

Is it possible to calculate delta time?

Hi,

I apologize in advance if this is a silly question, but I am attempting to calculate, in hours, the rise and fall times in daily load profiles of building electricity demand data. For example, this is a snip from a white paper on analyzing such data.

[load profile snip]

I think I can get sort of close using the Binary Segmentation search method as shown in this Gist.

Can I calculate, in hours, the delta time between each change point? Any tips greatly appreciated; thank you for creating a cool repo!
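
For reference, a pandas-based sketch (assuming df has a DatetimeIndex and bkps comes from algo.predict(); this mirrors the approach in the "calculate summary statistics per change point" issue further down):

import pandas as pd  # df and bkps are assumed to exist already

# The [:-1] drops the final breakpoint, which is always len(signal)
bkps_timestamps = df.iloc[[0] + bkps[:-1] + [-1]].index
durations = bkps_timestamps[1:] - bkps_timestamps[:-1]
hours = durations.total_seconds() / 3600  # delta time per segment, in hours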

Implementing c_rank cost function

Thanks for your great work with this library.

I notice that table 2 in the paper says that the Lévy-Leduc paper (citation 28) is implemented in ruptures, but it doesn't seem that a c_rank cost function is implemented. Is this correct?

Are you interested in adding this cost function to ruptures? Would you accept a PR adding it?

Implementing PCA based cost function

Thank you for making this amazing library.

I work in the manufacturing industry and I have found that PCA (principal component analysis) based segmentation algorithms are useful for sensor data in this field.

PCA based segmentation is realized by two types of cost function: the Q statistic and the T2 statistic.
The Q statistic is the so-called reconstruction error of PCA, and T2 is Hotelling's T-squared.
Here is the original paper (pp. 12-15); it says:

The Q reconstruction error can be used to segment the time-series according to the direct change of the correlation between the variables, while the Hotelling's T2 statistics can be utilized to segment the time-series based on the drift of the center of the operating region.

Indeed, ruptures already has a mahalanobis implementation, but it computes the cost function using global structure, i.e. the inverse covariance of the entire signal. The difference is that the paper's method computes the inverse covariance matrix of each segment's signal, on a subspace, at every iteration.

I'd like to ask whether you are interested in adding these cost functions to ruptures and would accept a PR.
Thank you in advance!
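
For context, here is a minimal sketch of what a Q-statistic cost could look like through the BaseCost interface. This is an illustration rather than the paper's exact estimator; n_components and min_size are arbitrary choices.

import numpy as np
from ruptures.base import BaseCost


class CostPcaQ(BaseCost):
    """Sketch: PCA reconstruction error (Q statistic) of each segment."""

    model = "pca_q"
    min_size = 10

    def __init__(self, n_components=1):
        self.n_components = n_components
        self.signal = None

    def fit(self, signal):
        self.signal = signal
        return self

    def error(self, start, end):
        sub = self.signal[start:end]
        centered = sub - sub.mean(axis=0)
        # Segment-wise SVD; the energy outside the leading components is
        # the PCA reconstruction error (Q statistic) of this segment
        _, svals, _ = np.linalg.svd(centered, full_matrices=False)
        return float((svals[self.n_components:] ** 2).sum())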

Can't install on MacOS Big Sur (11.0)

I get the following error:

  Using cached ruptures-1.1.1.tar.gz (229 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /Users/useruser/Desktop/scrooge/pytrends-master/.venv/bin/python3 /Users/useruser/Desktop/scrooge/pytrends-master/.venv/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/tmpku27q7pu
       cwd: /private/var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/pip-install-isdzeylx/ruptures
  Complete output (24 lines):
  Traceback (most recent call last):
    File "/Users/useruser/Desktop/scrooge/pytrends-master/.venv/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
      main()
    File "/Users/useruser/Desktop/scrooge/pytrends-master/.venv/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/Users/useruser/Desktop/scrooge/pytrends-master/.venv/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/private/var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/pip-build-env-fjmx14v2/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/private/var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/pip-build-env-fjmx14v2/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires
      self.run_setup()
    File "/private/var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/pip-build-env-fjmx14v2/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 253, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/private/var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/pip-build-env-fjmx14v2/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 145, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 73, in <module>
      ext_modules=cythonize(
    File "/private/var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/pip-build-env-fjmx14v2/overlay/lib/python3.9/site-packages/Cython/Build/Dependencies.py", line 965, in cythonize
      module_list, module_metadata = create_extension_list(
    File "/private/var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/pip-build-env-fjmx14v2/overlay/lib/python3.9/site-packages/Cython/Build/Dependencies.py", line 815, in create_extension_list
      for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
    File "/private/var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/pip-build-env-fjmx14v2/overlay/lib/python3.9/site-packages/Cython/Build/Dependencies.py", line 114, in nonempty
      raise ValueError(error_msg)
  ValueError: 'ruptures/detection/_detection/ekcpd.pyx' doesn't match any files
  ----------------------------------------
ERROR: Command errored out with exit status 1: /Users/useruser/Desktop/scrooge/pytrends-master/.venv/bin/python3 /Users/useruser/Desktop/scrooge/pytrends-master/.venv/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /var/folders/b9/zxn96m217ml71fhycn_xsvx00000gn/T/tmpku27q7pu Check the logs for full command output.

Compare Two Plots

Hello there,

I wanted to use ruptures as a trend analysis tool to compare forecasted and historical data, using the historical data as a template for accuracy. Is it possible to compare two plots/objects and get correlation metrics?

Mac C library makes KernelCPD non-deterministic, but it is deterministic in a Linux container

Repro case:

import ruptures as rpt
import numpy as np

new_list = [-0.0155, 0.0194, 0.0289, 0.0071, -0.0059, -0.0102, 0.0046, 0.0218, 0.0153, 0.0491, 0.016, 0.0365, 0.0388, 0.0516, 0.0222, 0.0019, -0.0418, 0.0, -0.0262, 0.0468, 0.0, 0.0311, 0.0341, -0.0, 0.0569, 0.0206, 0.0336, 0.0615]
trend_error = np.asarray(new_list)

results = set()
for _ in range(10000):
    change_points = (
        rpt.KernelCPD(kernel="rbf", min_size=7)
        .fit(trend_error)
        .predict(pen=1.0)
    )
    results.add(len(change_points))

print(results)

When running this on a Mac I get two different values in results: 2 and 4.
But when running it in a Linux container it consistently yields just 2.

When running rpt.Pelt(model="rbf", jump=1, min_size=7) I repeatedly get a deterministic result (which matches the 2-change-point case).

Small window sizes throw errors

Hello!
I am using ruptures to detect change points in morphologic time series, e.g. ts.txt
In my original application, the CPD method works fine. Now I am testing the influence of reduced temporal resolution, and therefore a simultaneously reduced window size, in the change point detection. However, this throws an error when width < 10. It is not clear to me why this should not work. Could this be a bug where some dependent value is calculated too small? Shouldn't the calculation of the median deviation work down to window sizes >= 3?

My code using ts.txt is, e.g.:

import numpy as np
import ruptures as rpt
import matplotlib.pyplot as plt

ts1d = np.loadtxt('ts.txt')

winsize=10 # smallest value that works
# winsize=6 # throws an error

algo = rpt.Window(width=winsize, model='l1', min_size=1).fit(ts1d)
breakpoints = algo.predict(pen=1.0)

rpt.show.display(ts1d, [], breakpoints,figsize=plt.figaspect(0.5))
plt.show()

If I reduce the width (winsize), the error is:

  File "C:\Python36\lib\site-packages\ruptures\detection\window.py", line 262, in predict
    bkps = self._seg(n_bkps=n_bkps, pen=pen, epsilon=epsilon)
  File "C:\Python36\lib\site-packages\ruptures\detection\window.py", line 175, in _seg
    mode="wrap")
  File "C:\Python36\lib\site-packages\scipy\signal\_peak_finding.py", line 177, in argrelmax
    return argrelextrema(data, np.greater, axis, order, mode)
  File "C:\Python36\lib\site-packages\scipy\signal\_peak_finding.py", line 232, in argrelextrema
    axis, order, mode)
  File "C:\Python36\lib\site-packages\scipy\signal\_peak_finding.py", line 58, in _boolrelextrema
    raise ValueError('Order must be an int >= 1')
ValueError: Order must be an int >= 1

I would appreciate any hint as to what might be the issue here. And thanks for the useful change point detection package!

Warn when missing values in the signal

Hi,

First, thanks for this very great lib !

Just a small suggestion: it would be very helpful to warn when there are NaN/null values in the signal.

I had missing values in my signal; I got results, but really unexpected change points were detected. I realized after a couple of hours of exploration that it was due to the missing values in my signal!

Antoine
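
Until such a warning exists, a one-line user-side check catches this early (a sketch; signal is assumed to be a NumPy array):

import numpy as np

if np.isnan(signal).any():
    raise ValueError("signal contains NaN values; impute or drop them before detection")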

Circular Binary segmentation

Hi Charles, and thank you for this great tool. Currently, we do not have a robust implementation of CBS in Python. Do you think this is a feasible thing to implement in ruptures?

Thanks a lot,
Roham

Weird behaviour correction when median(K) = 0

During some tests, we experienced a weird behaviour with PELT (RBF cost function). When median(K) is equal to 0, the segmentation is clearly wrong: each segment has an equal length (5). To reproduce this, you can run the following code:

import ruptures
import numpy as np
curve = [0, 528, 503, 0, 0, 0, 541, 542, 0, 542, 0, 0, 0, 542,
         500, 530, 0, 0, 515, 0, 536, 0, 0, 539, 518, 0, 0, 530, 0, 0,
         503, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0]
curve = np.array(curve)
algo = ruptures.Pelt(model="rbf", min_size=1).fit(curve)
seg = algo.predict(pen=1)
ruptures.display(curve, seg)

This leads to the pathological segmentation described above, with equal-length segments.

Errors management for .predict() method of ruptures.detection.Window class

As I understand it, the predict method of the ruptures.detection.Window class will fail if the number of samples in the signal is lower than 1.5 x the width parameter of the __init__ method.

Python will raise the following error

~/Documents/env/lib/python3.6/site-packages/ruptures/detection/window.py in _seg(self, n_bkps, pen, epsilon)
    177         peak_inds_arr = np.take(self.inds, peak_inds_shifted)
    178         # sort according to score value
--> 179         _, peak_inds = unzip(sorted(zip(gains, peak_inds_arr)))
    180         peak_inds = list(peak_inds)
    181 

ValueError: not enough values to unpack (expected 2, got 0)

which is a bit ambiguous.

Did I miss something in the documentation? Or would a slight improvement in error management be feasible?
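
Based on the behaviour described above, a user-side pre-check might look like this (the 1.5 x width threshold is taken from this report, not from the documentation):

width = 100  # the value passed to Window(width=...); signal is assumed to exist

# Hypothetical guard mirroring the observed failure condition
if signal.shape[0] < 1.5 * width:
    raise ValueError("signal too short for the chosen window width")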

Feature request: Cost function equivalent to R changepoint library cpt_meanvar

Hi,
apologies in advance if I've missed something obvious; I'm using ruptures more as a black-box user and have only started to dabble with changepoint analysis relatively recently. Could you outline what's required to implement, or whether you might release, support for a cost function covering both mean and variance that would give similar results to R's cpt_meanvar?
I'm using PELT to examine some observations from tracing the performance of applications when they are in execution.

Thanks,
Andy

calculate summary statistics per change point

Hi Charles,

I recently opened an issue about using the Binary Segmentation algorithm on offline electrical datasets, where you helped me calculate delta time with pandas. Anyway, I am still experimenting with good results, and I was curious to ask whether it's possible to calculate (in addition to delta time) some summary statistics of the electrical demand per change point?

(i think this stuff is pretty cool)

For each month (July shown below) I can randomly sample a few days (with numpy random & datetime) and apply the Binary Segmentation algorithm. In addition to hours (delta time per change point), is it possible to retrieve the mean/standard deviation kW value per change point?


I apologize for the long-winded post here, as well as my at-best-novice Python programming skills...

This is all the code used to produce the plots above.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import ruptures as rpt
import calendar

#read CSV file
df = pd.read_csv('https://raw.githubusercontent.com/bbartling/Building-Demand-Electrical-Load-Profiles/master/School%202013_2014%20KW.csv', 
                 index_col='Date', parse_dates=True)

#remove row of data where kW read zero
df = df[(df[['kW']] != 0).all(axis=1)]

#metric for plotting
maxy = df.kW.max()

#month of july
july = df.loc[df.index.month.isin([7])]

#number of breakpoints for ruptures Binary Segmentation
n_bkps = 2

print(july.describe())

#function for algorithm
def changPoint(df, dayNum, yTickMax, plotYesorNo):
    df = df.loc[df.index.day.isin([dayNum])]
    arr = np.array(df.kW)

    
    #Define Binary Segmentation search method
    model = "l2"  
    algo = rpt.Binseg(model=model).fit(arr)
    my_bkps = algo.predict(n_bkps=n_bkps)

    # getting the timestamps of the change points
    bkps_timestamps = df.iloc[[0] + my_bkps[:-1] +[-1]].index

    # computing the durations between change points
    durations = (bkps_timestamps[1:] - bkps_timestamps[:-1])
    
    #hours calc
    d = durations.seconds/60/60
    d_f = pd.DataFrame(d)
    df2 = d_f.T
    print(df2)
    
    if plotYesorNo == 'yes':
      
        # show results
        rpt.show.display(arr, my_bkps, figsize=(17, 6))
        #plot metrics
        oneHrs = d_f.values[0][0]
        twoHrs = d_f.values[1][0]
        threeHrs = d_f.values[2][0]

        one = f'morning {round(oneHrs,1)} hours'
        two = f'high load {round(twoHrs,1)} hours'
        three = f'evening {round(threeHrs,1)} hours'

        d = df.index.day_name()[0]
        m = df.index.month_name()[0]

        title = f'Change Point Detection: Binary Segmentation Search Method {d} {m} {dayNum}'

        plt.title(title)
        plt.text(0, yTickMax/2, one)
        plt.text(0, yTickMax/2-10, two)
        plt.text(0, yTickMax/2-20, three)

        plt.ylim(0, yTickMax)
        plt.show()
        
    else:
        return df2

july.plot(figsize=(20, 10))
plt.ylim(5, maxy)

#create plots
changPoint(july, np.random.randint(low=1, high=30, size=1), maxy, 'yes')
changPoint(july, np.random.randint(low=1, high=30, size=1), maxy, 'yes')
changPoint(july, np.random.randint(low=1, high=30, size=1), maxy, 'yes')
changPoint(july, np.random.randint(low=1, high=30, size=1), maxy, 'yes')

I also put this Jupyter notebook file in the same git repo as the electrical data sets.

Thanks for any time you have to respond.
Ben
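
One pandas-based possibility, continuing the snippet above: cut the day's index at the breakpoint timestamps and aggregate. This is a sketch and assumes df and bkps_timestamps as computed inside changPoint.

import pandas as pd

# Label each row by the segment it falls into, then aggregate kW per segment
segment = pd.cut(df.index, bins=bkps_timestamps, include_lowest=True)
print(df.groupby(segment)['kW'].agg(['mean', 'std']))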

min_size

min_size is not working for me when using, for example:
algo = rpt.Window(width=10, model='l2', min_size = 40).fit(signal)
my_bkps = algo.predict(n_bkps=4)

I get segments < 40 even though min_size = 40.
I am using a 1D time series - a numpy array of a stock's close prices (array length of 796) - that was loaded from a CSV file into a pandas dataframe such that:

signal = df['close'].values

doc, Window-based change point detection - description of predict function

In the section "window-based change point detection" I noticed the following in window.py line 248:

    def predict(self, n_bkps=None, pen=None, epsilon=None):
        """Return the optimal breakpoints.
        Must be called after the fit method. The breakpoints are associated with the signal passed
        to fit().
        The stopping rule depends on the parameter passed to the function.
        Args:
            n_bkps (int): number of breakpoints to find before stopping.
            penalty (float): penalty value (>0)
            penalty (float): penalty value
        Returns:
            list: sorted list of breakpoints
        """

Should the arguments not be described as follows?

    def predict(self, n_bkps=None, pen=None, epsilon=None):
        """Return the optimal breakpoints.
        Must be called after the fit method. The breakpoints are associated with the signal passed
        to fit().
        The stopping rule depends on the parameter passed to the function.
        Args:
            n_bkps (int): number of breakpoints to find before stopping.
            pen (float): penalty value (>0)
            epsilon (float): reconstruction budget (>0)
        Returns:
            list: sorted list of breakpoints
        """

High RAM consumption

Hi all,

Thanks a lot for this library, it is great! However, I found that it takes a lot of RAM for large samples. For example, the Window CPD method requires about 7 GB of RAM for a 1D time series with 20k samples and 9 change points. And I suppose that this leads to a non-linear dependency of runtime on n_samples.

Can CostAR handle multidimensional data?

Hello, thanks a lot for your work. I have one question about using a vector autoregressive cost to detect change points. I notice in the documentation that the input signal is required to be 1D for CostAR. However, when I use multidimensional signals, no error occurs and change points are still detected. Does this mean that CostAR works as a vector autoregressive cost when the input signal is multidimensional? If not, is there any way to use a vector autoregressive cost in ruptures?

Increase performance using a cached value

Do you think that caching some values, instead of calling the cost functions all the time, could help improve the estimators' execution speed? That could be a parameter, such as 'jump'.

Story time:
Some time ago I tried to use the Pelt estimator on really large datasets (from thousands to millions of elements) to determine stagnation periods in time series data. It was kinda slow, and eventually change point detection wasn't really what I needed in my case.

In order to do what I wanted, I created a small library inspired by ruptures (I really learned a lot by reading the source code). One day I tried to launch it on big datasets and it was also slow. I got the idea of caching so as not to compute the cost function too many times; that helped me reduce the execution time (more than 100 times faster) and allowed me to use my method on really big datasets.

Here is the pull request for the caching in my library; the structure of the code is similar to ruptures.

Btw, thanks for that amazing library !
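
For what it's worth, one quick way to test the caching idea without touching library code is to memoize an existing cost instance (a sketch using functools.lru_cache; CostL2 and the custom_cost parameter are standard ruptures APIs):

from functools import lru_cache

import ruptures as rpt

signal, _ = rpt.pw_constant(5000, 1, 6, noise_std=3)

cost = rpt.costs.CostL2()
cost.error = lru_cache(maxsize=None)(cost.error)  # cache segment costs by (start, end)

# Note: clear the cache (cost.error.cache_clear()) before fitting another signal
algo = rpt.Pelt(custom_cost=cost).fit(signal)
print(algo.predict(pen=20))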

Online changepoint detection (continued)

I've read through #3. Here it was mentioned that the sliding window search methods can be used for online detection. I've vaguely read some things suggesting that PELT and the kernel based search can be used for online detection. Is that true for the ruptures implementations? If yes, is there any interest in adding some useful utilities to ruptures to support this, namely some benchmarking metrics like time to detection/false alarm and ROC curves, as well as some documentation on online detection?

Link to documentation broken ?

Hello,

I am trying to access the documentation of the ruptures library from the README link but cannot reach it.
Is the documentation still available somewhere?

Cheers,
Ilyass

pw_linear return extra dimension?

Why is this the case? It's kind of unexpected, and it's counter to the other functions, where the given signal length and number of dimensions return a dataset of that same size.

Detecting mean change points

Hello Charles,

Thanks for putting this library together. I am trying to familiarize myself with how the library works and have tried a few basic examples. I am keen on detecting mean change points, but I am not sure whether I am setting a parameter incorrectly to get the output below.

import numpy as np
import ruptures as rpt

ts = np.array(22 * [500.] + 13 * [1100.])  # Tried a few other timeseries of actual data.
algo = rpt.Dynp(model='l2').fit(ts)  # Tried changing to the l1 model and other methods too.
bkps = algo.predict(n_bkps=2)
rpt.display(ts, bkps)

[output plot]

Best,
Rahul
