adtk's People

Contributors

ajdapretnar, burakyilmaz321, roycoding, sauln, szeitlin, tailaiw, tailaiw-ghost, tellezsanti, yangyu-1

adtk's Issues

RuntimeError: the model must be trained first

First of all, thx for the great tool ^^

Here's the code that produces this RuntimeError:

tmdl.py

from adtk.detector import ThresholdAD
from adtk.detector import QuantileAD
from adtk.detector import InterQuartileRangeAD
from adtk.detector import PersistAD
from adtk.detector import LevelShiftAD
from adtk.detector import VolatilityShiftAD
from adtk.detector import SeasonalAD
from adtk.detector import AutoregressionAD

def get_detector(adname="ThresholdAD"):
    detectors = {"ThresholdAD": ThresholdAD,
                 "QuantileAD": QuantileAD,
                 "InterQuartileRangeAD": InterQuartileRangeAD,
                 "PersistAD": PersistAD,
                 "LevelShiftAD": LevelShiftAD,
                 "VolatilityShiftAD": VolatilityShiftAD,
                 "SeasonalAD": SeasonalAD,
                 "AutoregressionAD": AutoregressionAD,
                 }
    return detectors.get(adname)

# using adtk anomaly detectors
def ad_detector(dname, train_data=None, test_data=None, **kwargs):
    Ad = get_detector(dname)
    ad = Ad(**kwargs)
    train_anoms = ad.fit_detect(train_data)
    test_anoms = ad.detect(test_data)
    return train_anoms, test_anoms

I wrote these functions to help me quickly run experiments with different detectors by changing the detector's name in the main() function. At least, that was the idea.

main.py

... (functions that read the data)

s_train, s_test = split_train_test(data, mode=split_mode, n_splits=n_splits)
train_anoms, test_anoms = [], []
for train, test in zip(s_train, s_test):  # the error shows up in this for loop
    train_anom, test_anom = tmdl.ad_detector(dname='SeasonalAD',
                                             train_data=train,
                                             test_data=test.squeeze(),
                                             c=1, side='both')
    # collect the results
    train_anoms.append(train_anom)
    test_anoms.append(test_anom)

When I ran this piece of code, it reported RuntimeError: the model must be trained first.

Last but not least, when I followed the Quick Start, the machine did not complain at all.

Any help would be appreciated.

Floating Anomaly Score for VolatilityShiftAD

Hello,
I am using VolatilityShiftAD to detect anomalies in one variable. The result of calling the detect method for the VolatilityShiftAD can be nan, 0.0 or 1.0.

Is it possible to retrieve proper anomaly scores from the detector (e.g. an anomaly score of 0.5624543)? I could not find any solution for that.

Thanks for your help!
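
One hedged workaround, not an official API: if VolatilityShiftAD follows the same pattern as PersistAD / LevelShiftAD (see the DoubleRollingAggregate discussion further down this page), the continuous series exists before the threshold is applied, and you can get it by calling the transformer directly. The file name, window length, and diff mode below are illustrative assumptions:

import pandas as pd
from adtk.data import validate_series
from adtk.transformer import DoubleRollingAggregate

s = validate_series(pd.read_csv("data.csv", index_col=0, parse_dates=True).squeeze())

score = DoubleRollingAggregate(
    agg="std",            # compare rolling standard deviations on both sides
    window=30,            # assumed window length; tune to your data
    diff="abs_rel_diff",  # relative change in volatility, as a float series
).transform(s)

Thresholding score yourself then reproduces the detector's binary behavior.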

SeasonalAD misjudgment

Hey, I have a problem.
When I use SeasonalAD to judge time-series data, it tells me a node is an anomaly (True).
But when I run it again ten minutes later, it tells me the node is fine.
I'm perplexed.
So, how should I handle this?

Show different anomalies in subplots

Currently, the plotting function only supports showing the same list of anomalies in all subplots, even if different curves are shown in different subplots. We want to support showing a different list of anomalies in each subplot.

Adding a third example to DoubleRollingAggregate that shows Spikes

Hello,
The difference between PersistAD and LevelShiftAD, while explained in the documentation, could use a bit more clarification. This is just a personal opinion; please check if others feel the same if possible.
If I'm reading the doc and code correctly, both PersistAD and LevelShiftAD are implemented with DoubleRollingAggregate as the transformer, but PersistAD has one window fixed to a length of 1, while the other window can be controlled by a window variable. For LevelShiftAD, by contrast, both windows in the DoubleRollingAggregate are allowed to vary via the window variable. Also, somewhat importantly, LevelShiftAD only looks at the median, which allows it to ignore spikes. Does this sound right?

I propose one of the following changes:

  1. Leave a note in either the PersistAD section or the LevelShiftAD section that explicitly states the difference between the two implementations.
  2. This is my vote: we could add another example to the DoubleRollingAggregate section. You already have an example set up under PersistAD; we could just modify the code a bit and add it in. IMO, this would highlight the window variable in DoubleRollingAggregate and indirectly lead to a better understanding of PersistAD / LevelShiftAD (see the sketch below).

Please let me know your thoughts! I'm happy to open a PR if need be.
Thank you
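
To make the proposal concrete, here is a hedged sketch of the two configurations described above; the file name and window lengths are illustrative assumptions:

import pandas as pd
from adtk.data import validate_series
from adtk.transformer import DoubleRollingAggregate

s = validate_series(pd.read_csv("data.csv", index_col=0, parse_dates=True).squeeze())

# PersistAD-like: the right window is fixed to length 1, so a single spike
# produces a large difference against the preceding window of medians.
spike_score = DoubleRollingAggregate(
    agg="median", window=(10, 1), diff="l1"
).transform(s)

# LevelShiftAD-like: two equal-length windows of medians, so isolated
# spikes are absorbed and only sustained level changes score highly.
levelshift_score = DoubleRollingAggregate(
    agg="median", window=10, diff="l1"
).transform(s)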

Allow detectors to work on a dataframe with 1 target and multiple passive columns

Problem: I want to map back the labels found by an adtk detector onto my original dataframe. This works via a simple pd.merge as long as there's only one event at each timestamp, but I've got a use case where there are multiple events at a given timestamp, only some of which are anomalous, and then the labels don't make any sense.

Proposed solution: Could we make it easy to specify which column(s) we want to predict on, and exclude the others, when detecting on a dataframe?
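
A self-contained toy illustration of the problem described above (all names and values are made up):

import pandas as pd

# Two events share the 10:00 timestamp; only one of them is anomalous.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-01 09:00", "2020-01-01 10:00", "2020-01-01 10:00"]),
    "event": ["a", "b", "c"],
    "value": [1.0, 50.0, 1.2],
})
# Labels as a detector would return them: one flag per timestamp.
labels = pd.Series(
    [False, True],
    index=pd.to_datetime(["2020-01-01 09:00", "2020-01-01 10:00"]),
    name="is_anomaly",
).rename_axis("timestamp")

# Merging back flags *both* rows at 10:00, although only one event was anomalous.
merged = df.merge(labels.reset_index(), on="timestamp", how="left")
print(merged)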

STLDecomposition Implementation

Hello,
First of all thank you for open sourcing the package; it is excellently written.

One potential issue I've noticed: the STL Decomposition seems to be implemented using a rolling window method for detrending data, not the Loess method. Here is a code snippet from transformer_1d.py:

def _remove_trend(self, s):
    s_trend = s.rolling(
        window=(self.freq_ if self.freq_ % 2 else self.freq_ + 1),
        center=True,
    ).mean()
    return s - s_trend

Perhaps I'm missing something? The current implementation seems to be what Hyndman calls Classical Decomposition, not STL.
There is a Python library that does STL decomposition and is relatively lightweight. If what I'm describing is indeed the behavior, could I work on implementing STL with Loess?

Any feedback would be greatly appreciated,
Thank you
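
For comparison, a sketch of true Loess-based STL using statsmodels; the file name is a placeholder and the period value is an assumption for monthly data:

import pandas as pd
from statsmodels.tsa.seasonal import STL

# s: a univariate pandas Series with a regular DatetimeIndex
s = pd.read_csv("data.csv", index_col=0, parse_dates=True).squeeze()

res = STL(s, period=12).fit()  # Loess-based decomposition
residual = res.resid           # trend and seasonality removed via Loess,
                               # unlike the rolling-mean detrending above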

Default expand of `expand_events`

Function expand_events does not have default expand values. It would make sense to default them to 0, so that the API is easier to use when only one side needs expansion.
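
For illustration, a hypothetical call once the defaults exist (parameter names as in the current API; anomalies stands for a boolean label series):

import pandas as pd
from adtk.data import expand_events

# Expand only the right side; left_expand would implicitly default to 0.
expanded = expand_events(anomalies, right_expand=pd.Timedelta("5min"))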

[question] Deployment approaches for ADTK

Hi, I have a question about the deployment options I may have with ADTK. I hope it's suitable to ask here. I am enjoying the library a lot, although I'm admittedly quite new to this area. I'm looking to develop an anomaly detection pipeline for network monitoring data. I'd like to have this deployed in an 'online' way, such that the model is capable of evolving as new data arrives. I know this is possible with TensorFlow Extended (TFX), which is quickly becoming the de facto deployment tool for TF models. Is ADTK compatible with tools such as TFX? Is there a recommended deployment approach for 'online' AD with ADTK? Any advice would be really appreciated, as would pointers to other tools that might help. I believe this is a common question for using ADTK in production scenarios. Thanks.

Deprecated API in pandas

We observed some deprecation warnings from pandas v1.0. We should do a careful check to make sure ADTK is working with pandas v1.0+.

SeasonalAD fit_detect failing

Hi! I'm working with monthly data from 2006 to 2019, and I wanted to work with SeasonalAD, but it fails with "ValueError: The time steps are not constant." even after validating the series.

Code

from adtk.detector import SeasonalAD
from adtk.data import validate_series

times = validate_series(times)
seasonal_ad = SeasonalAD(c=3.0, side="both")
anomalies = seasonal_ad.fit_detect(times)

Output

ValueError                                Traceback (most recent call last)
<ipython-input-15-a29b7fb737cd> in <module>
      8     #anomalies = iqr_ad.fit_detect(times)
      9     seasonal_ad = SeasonalAD(c=3.0, side="both")
---> 10     anomalies = seasonal_ad.fit_detect(times)
     11 
     12 

~\Anaconda3\envs\venis\lib\site-packages\adtk\_detector_base.py in fit_predict(self, ts, return_list)
    245 
    246         """
--> 247         self.fit(ts)
    248         return self.detect(ts, return_list=return_list)
    249 

~\Anaconda3\envs\venis\lib\site-packages\adtk\_detector_base.py in fit(self, ts)
    150 
    151         """
--> 152         self._fit(ts)
    153 
    154     def predict(

~\Anaconda3\envs\venis\lib\site-packages\adtk\_base.py in _fit(self, ts)
    152         if isinstance(ts, pd.Series):
    153             s = ts.copy()  # type: pd.Series
--> 154             self._fit_core(s)
    155             self._fitted = 1
    156         elif isinstance(ts, pd.DataFrame):

~\Anaconda3\envs\venis\lib\site-packages\adtk\detector\_detector_1d.py in _fit_core(self, s)
   1154     def _fit_core(self, s: pd.Series) -> None:
   1155         self._sync_params()
-> 1156         self.pipe_.fit(s)
   1157         self.freq_ = self.pipe_.steps["deseasonal_residual"]["model"].freq_
   1158         self.seasonal_ = self.pipe_.steps["deseasonal_residual"][

~\Anaconda3\envs\venis\lib\site-packages\adtk\pipe\_pipe.py in fit(self, ts, skip_fit, return_intermediate)
    891                 results.update({step_name: step["model"].predict(input)})
    892             else:
--> 893                 results.update({step_name: step["model"].fit_predict(input)})
    894 
    895         # return intermediate results

~\Anaconda3\envs\venis\lib\site-packages\adtk\_transformer_base.py in fit_predict(self, ts)
     94 
     95         """
---> 96         self.fit(ts)
     97         return self.predict(ts)
     98 

~\Anaconda3\envs\venis\lib\site-packages\adtk\_transformer_base.py in fit(self, ts)
     47 
     48         """
---> 49         self._fit(ts)
     50 
     51     def predict(

~\Anaconda3\envs\venis\lib\site-packages\adtk\_base.py in _fit(self, ts)
    152         if isinstance(ts, pd.Series):
    153             s = ts.copy()  # type: pd.Series
--> 154             self._fit_core(s)
    155             self._fitted = 1
    156         elif isinstance(ts, pd.DataFrame):

~\Anaconda3\envs\venis\lib\site-packages\adtk\transformer\_transformer_1d.py in _fit_core(self, s)
    709         # get seasonal freq
    710         if self.freq is None:
--> 711             identified_freq = _identify_seasonal_period(s)
    712             if identified_freq is None:
    713                 raise Exception("Could not find significant seasonality.")

~\Anaconda3\envs\venis\lib\site-packages\adtk\transformer\_transformer_1d.py in _identify_seasonal_period(s, low_autocorr, high_autocorr)
    856     # check if the time series has uniform time step
    857     if len(np.unique(np.diff(s.index))) > 1:
--> 858         raise ValueError("The time steps are not constant. ")
    859 
    860     autocorr = acf(s, nlags=len(s), fft=False)

ValueError: The time steps are not constant.

Checks

  • Time series is a pandas Series.
  • Index is a pandas DatetimeIndex and is monotonically increasing.
  • No duplicated or missing timestamps.
  • Index inferred freq is "MS".
  • Data is float64.
  • ADTK v0.6.2
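
The check in _identify_seasonal_period (see the traceback) compares raw index deltas, and calendar months have unequal lengths, so an "MS" index fails it even though its inferred frequency is valid. A quick demonstration of that observation:

import numpy as np
import pandas as pd

idx = pd.date_range("2006-01-01", "2019-12-01", freq="MS")
print(np.unique(np.diff(idx)))  # several distinct timedeltas -> the check fails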

Question: How to simulate a "NotAggregator" in pipenet?

I've got data that I'd like to exclude from the anomaly labels if certain criteria are met (like a list of supervised labels for excluding incorrectly detected anomalies), regardless of the other detector results. I thought I'd implement it as a separate detector in a pipenet, then aggregate the results.

This would allow me to override other detectors with an aggregation, but I can't come up with a way to do so with the two existing aggregators (AndAggregator / OrAggregator), so I would need a "NotAggregator".

Does this make sense, or has anyone tried something like this in the past?

Here's an example of what I was thinking the pipenet would look like:
(Image: ADTK_example_notAggregator_pipenet)
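
A hedged workaround without a dedicated NotAggregator: a boolean label series can be negated with plain pandas, so the exclusion branch can be inverted before feeding it to AndAggregator. The data here is synthetic and the names are made up:

import pandas as pd
from adtk.aggregator import AndAggregator

idx = pd.date_range("2020-01-01", periods=5, freq="D")
main_labels = pd.Series([False, True, True, False, True], index=idx)
exclusion_labels = pd.Series([False, False, True, False, False], index=idx)

# "main AND NOT exclusion": the hit on day 3 is suppressed.
combined = AndAggregator().aggregate(
    pd.concat({"main": main_labels, "keep": ~exclusion_labels}, axis=1)
)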

Seasonal Decomposition for Multivariate time series

I fit seasonal detector and transformer to a multivariate time series following the documentation.

https://adtk.readthedocs.io/en/stable/api/transformers.html#adtk.transformer.ClassicSeasonalDecomposition

https://adtk.readthedocs.io/en/stable/notebooks/demo.html#SeasonalAD

When I fit a pandas DataFrame with several columns (a multivariate time series) to these detectors/transformers, the seasonal_ component no longer exists.

Would it be possible to get a seasonal_ DataFrame in that case?
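
Not an official API, but the _base.py traceback elsewhere on this page shows per-column fitting (self._models[col].fit(df[col])), so one could probably assemble a per-column seasonal_ frame from the private sub-models. Treat this as a fragile sketch relying on an inferred private attribute:

import pandas as pd
from adtk.transformer import ClassicSeasonalDecomposition

# df: multivariate DataFrame with a regular DatetimeIndex
transformer = ClassicSeasonalDecomposition()
transformer.fit(df)

# _models is assumed (from the traceback above) to map each column name to
# its own fitted sub-model, each carrying its own seasonal_ attribute.
seasonal = pd.DataFrame(
    {col: model.seasonal_ for col, model in transformer._models.items()}
)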

Series name or dataframe columns of the output of split_train_test

split_train_test returns a list of pandas Series or DataFrame objects with new series or column names such as "train_0" or "A_test_3", which may create difficulties for postprocessing. It would be better if the output were a dict whose keys are the split ids, with series and column names kept unchanged.

Request for a better form of pipenet step (dict instead of list)

The pipenet steps are a list of dictionaries with key-value pairs indicating the start, end and the method/model used. That means that to edit any particular step in the pipeline, one has to use the index of the list to query the step and then modify it.

One possible solution is to use the step name as the key, with the value being a dictionary with keys 'model' and 'input'. It might be more user-friendly for identifying and querying different nodes.

In the attached image, the pipenet summary could be
{'regression_residual': {'model': <value>, 'input': 'original'}, ...}

(Image: adtk_issues)

Ways to Ignore Known Outliers

Hello Arundo Team,
I'm wondering if you have any inputs on this question.
I've been using ADTK on some highly seasonal (freq=7) e-commerce data. Now, e-commerce is heavily influenced by holidays. As you can see in the picture attached, there are two large spikes around Thanksgiving and Christmas.

(Image: adtk)

The problem is that I'm not so interested in the holiday spikes, but in the smaller outliers in the data. In a way, the two large peaks "drown" out all other outlier effects (assuming something like the Quantile detector is used).

One of the solutions I've been testing is calculating weights for each holiday, then computing a "fake" number by removing the holiday effects. But such a solution seems to be outside the scope of ADTK.

Is there anything inside the ADTK package that might help address this issue? I've tried using DoubleRollingAggregate on the deseasonalized data, but it doesn't seem to help. I can also shorten the time window, but that seems like a less-than-ideal solution.

Have you encountered similar problems before? Any guidance here would be greatly appreciated.

Thank you

Edit:
Attached some data
data.xlsx
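
A sketch of the masking idea described above, with made-up dates and synthetic data: mark known holiday windows as NaN so they cannot dominate a quantile-based fit.

import numpy as np
import pandas as pd

# s: daily e-commerce series (synthetic here for illustration)
idx = pd.date_range("2019-01-01", "2019-12-31", freq="D")
s = pd.Series(np.random.rand(len(idx)), index=idx)

holidays = pd.to_datetime(["2019-11-28", "2019-12-25"])  # known spike dates
mask = pd.Series(False, index=s.index)
for day in holidays:
    mask |= (s.index >= day - pd.Timedelta("2D")) & (s.index <= day + pd.Timedelta("2D"))

s_masked = s.where(~mask)  # holiday windows become NaN, so they no longer
                           # dominate quantile-based thresholds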

What is c?

Can you please explain what exactly c is? "(float, optional) – Factor used to determine the bound of normal range based on historical interquartile range. Default: 3.0."
And is it the same for all functions that use it?
Thank you.
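
For InterQuartileRangeAD specifically, the quoted docstring means the normal range is built from the training quartiles; here is a small numeric sketch of that rule. Whether every detector uses c with exactly the same formula should be checked per detector, though the general pattern of c scaling a spread statistic is similar:

import numpy as np

train = np.random.randn(1000)            # stand-in for historical values
q1, q3 = np.percentile(train, [25, 75])
iqr = q3 - q1
c = 3.0                                   # the parameter in question
low, high = q1 - c * iqr, q3 + c * iqr    # values outside [low, high] -> anomaly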

Request for better summarization of pipenet steps

The pipe summary is a list of dictionaries. The entries in the list do not necessarily represent the order of connections, and hence it can be challenging to navigate through the list. The visualization is certainly very helpful in that regard.

However, it would be better to have a Keras-like model.summary().
(Image: adtk_issues)

Question: unequally spaced timeseries

First, big thanks for a nice library! Very useful!

I am trying to follow your Quick Start for SeasonalAD, but I am encountering a problem. My timeseries seems to be unequally spaced (e.g. 09:05, 09:15, 09:30, 09:55). Hence SeasonalAD complains:
RuntimeError: Series does not follow any known frequency (e.g. second, minute, hour, day, week, month, year, etc.
How can I overcome this?
I have tried rounding my series to 15 min, removing duplicates, and resampling:

s_train.index = s_train.index.round('15min')
s_train = s_train[~s_train.index.duplicated()]
s_train = s_train.asfreq('15min')

Obviously nothing worked. Any ideas how to solve this? I wish to retain as much granularity as possible.
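
One more thing worth trying (a guess, not a confirmed diagnosis): asfreq leaves NaNs at the slots that had no observation, and the frequency/seasonality detection may choke on them, so filling the gaps after resampling might help, at some cost to fidelity:

s_train = s_train.asfreq('15min').interpolate(method='time')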

Multivariate anomaly detection

If we have 5 data points in a data frame (d1, d2, d3, d4, d5) and one of them is an anomaly, is there any possibility of telling which data point caused the anomaly and assigning a score, like d5 was 50% responsible, d3 30%, etc.?

Thanks!

Warning/note in docstrings on NaN in results

Some models (e.g. those including moving average operation) may return series with NaN on the two ends. We should add a warning or note in the docstrings (and hence docs) to highlight this.

adtk.visualization.plot can't work

Hello,

I am new to ADTK and am trying to run the sample code below:

import pandas as pd
from adtk.visualization import plot
from adtk.data import validate_series

df = pd.read_csv("E:/Codes/python/Book1.csv", index_col="TestTime", parse_dates=True, squeeze=True)
GP1_train = df["GP1"]
GP1_train = validate_series(GP1_train)
print(GP1_train)
plot(GP1_train)

I thought it would pop up a graph, but no graph appears after the code executes, and no error is reported. I am not sure what the problem is and hope someone can help with this issue. Thanks very much for your help.
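
A likely cause (an educated guess, not a confirmed diagnosis): in a plain script, matplotlib draws the figure but never opens a window unless asked. Calling plt.show() after plot() is the usual fix:

import matplotlib.pyplot as plt
from adtk.visualization import plot

plot(GP1_train)   # draws onto a matplotlib figure
plt.show()        # actually opens the window in a non-interactive script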

runtime error

When running ClassicalDecomposition on time series data (10-second interval), there is an error:

RuntimeError: Series does not follow any known frequency (e.g. second, minute, hour, day, week, month, year, etc.

The time series data has a DatetimeIndex.

How to solve the error?

Regards

ValueError: Time series must have a monotonic time index.

Code is as follows:

from adtk.transformer import ClassicSeasonalDecomposition
s_transformed = ClassicSeasonalDecomposition().fit_transform(s).rename("Seasonal decomposition residual")
plot(pd.concat([s, s_transformed], axis=1), ts_markersize=1);

My data frame has multiple numeric (float & int) columns, with a date index.

And I am getting the following error:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      1 from adtk.transformer import ClassicSeasonalDecomposition
----> 2 s_transformed = ClassicSeasonalDecomposition().fit_transform(s).rename("Seasonal decomposition residual")
      3 plot(pd.concat([s, s_transformed], axis=1), ts_markersize=1);

~/jbooks/notebooks/lib/python3.6/site-packages/adtk/_transformer_base.py in fit_predict(self, ts)
     94
     95         """
---> 96         self.fit(ts)
     97         return self.predict(ts)
     98

~/jbooks/notebooks/lib/python3.6/site-packages/adtk/_transformer_base.py in fit(self, ts)
     47
     48         """
---> 49         self._fit(ts)
     50
     51     def predict(

~/jbooks/notebooks/lib/python3.6/site-packages/adtk/_base.py in _fit(self, ts)
    172             # fit model for each column
    173             for col in df.columns:
--> 174                 self._models[col].fit(df[col])
    175             self._fitted = 2
    176         else:

~/jbooks/notebooks/lib/python3.6/site-packages/adtk/_transformer_base.py in fit(self, ts)
     47
     48         """
---> 49         self._fit(ts)
     50
     51     def predict(

~/jbooks/notebooks/lib/python3.6/site-packages/adtk/_base.py in _fit(self, ts)
    152         if isinstance(ts, pd.Series):
    153             s = ts.copy()  # type: pd.Series
--> 154             self._fit_core(s)
    155             self._fitted = 1
    156         elif isinstance(ts, pd.DataFrame):

~/jbooks/notebooks/lib/python3.6/site-packages/adtk/transformer/_transformer_1d.py in _fit_core(self, s)
    684             s.index.is_monotonic_increasing or s.index.is_monotonic_decreasing
    685         ):
--> 686             raise ValueError("Time series must have a monotonic time index. ")
    687         # remove starting and ending nans
    688         s = s.loc[s.first_valid_index() : s[::-1].first_valid_index()].copy()

ValueError: Time series must have a monotonic time index.

Any help is highly appreciated...
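
For what it's worth, the check raising this error (visible at _transformer_1d.py lines 684-686 in the traceback) only requires a monotonic index, so sorting is the natural first thing to try; if I read the docs correctly, adtk.data.validate_series should also reorder a non-monotonic index.

s = s.sort_index()  # make the DatetimeIndex monotonically increasing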

Request for adding an API to "insert" into pipenet

Using the built-in features for pipes is great. But it would be helpful to have the ability to insert a custom step/node into the workflow. For instance, in the workflow shown in the attached image, if I want to add another pre-processing step after abs_residual, one possible way is:

  1. First, navigate the pipenet summary (via steps_),
  2. Find the appropriate index for the corresponding node,
  3. Update the corresponding connections to which the new node attaches to/from.

It would be helpful to have an API that can easily add custom nodes/models/steps to the pipenet by providing the new method, its name, and the from/to connections in the pipeline. Referring to the attached image, inserting a node could look something like this:
insert_node(method=CustomMethod(), name='custom', from='abs_residual', to='iqr_ad')

(Image: adtk_issues)

Anomaly between two series

Hi guys,
Not sure where else to ask this question, but does anybody know the best way to determine an anomaly between two series?
For example, if two thermometers are supposed to read the same data:
thermometer1: [10 10 20 20 30 30]
thermometer2: [10 10 20 20 30 30]
But over time thermometer 2 becomes inaccurate:
thermometer1: [10 10 20 20 30 30]
thermometer2: [10 10 20 25 35 40]

Does anybody know how you could pick up something such as this?
Thanks for helping a guy out!
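
One hedged way to approach this with ADTK: regress one thermometer on the other and flag large residuals. RegressionAD does residual-based detection of this kind; the choice of regressor and the value of c below are illustrative:

import pandas as pd
from sklearn.linear_model import LinearRegression
from adtk.data import validate_series
from adtk.detector import RegressionAD

idx = pd.date_range("2020-01-01", periods=6, freq="D")
df = validate_series(pd.DataFrame({
    "thermometer1": [10, 10, 20, 20, 30, 30],
    "thermometer2": [10, 10, 20, 25, 35, 40],
}, index=idx))

# Flag timestamps where thermometer2 deviates from what thermometer1 predicts.
ad = RegressionAD(regressor=LinearRegression(), target="thermometer2", c=3.0)
anomalies = ad.fit_detect(df)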

Support time delta as str in `expand_events`

Function expand_events currently only supports time deltas as pandas.Timedelta objects. This may be difficult for some users who are not advanced in pandas. It would be better if it supported str such as "1d", "5min", etc.

Usage Date Format for ADTK

Hi,

I have a multivariate time series data set which I want to use for binary classification. My data set has more than 90% zero values, therefore I thought I could use ADTK. I am expecting output like PersistAD's.

According to the data set excerpt below, the first column is "time". When I try to import my data set like this -> s = pd.read_csv('./data/price_long.csv', index_col="Time", parse_dates=True, squeeze=True) it gives an error (TypeError: Index of time series must be a pandas DatetimeIndex object.). I tried to convert to datetime but I got this error -> ValueError: time data '0' does not match format '%Y%m%d' (match).

How can I solve this problem? Is it possible to use time as it is? Thanks.

time  series1   series2   series3
0     0.708849  0.318052  159377.0  1
1     0.728374  0.305667  162063.0  0
2     0.728374  0.305667  162063.0  0
3     0.728374  0.305667  162063.0  0
4     0.728374  0.305667  162063.0  0
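
If the integer time steps simply count samples, one hedged workaround is to map them onto a synthetic fixed-frequency DatetimeIndex so that adtk's requirement is satisfied; the unit and origin below are arbitrary assumptions:

import pandas as pd

df = pd.read_csv('./data/price_long.csv')
# Treat each integer step as one second from an arbitrary origin date.
df.index = pd.to_datetime(df.pop('time'), unit='s', origin='2020-01-01')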

Let's use type hints!

We would like to start using type hints as better Python programming practice. For a first-time contributor, this is probably a nice starting point, as you will go through every part of the code base and familiarize yourself with the code structure.

To-do's:

  • Add type hints to all functions.
  • Modify docstrings accordingly, so sphinx-autodoc will automatically grab type info from type hints.
  • Add unit tests (with mypy?) for type checking.

Have fun!

Is there any plan for the next version of AD?

AD is a wonderful tool for time series anomaly detection. There are so many new methods being applied to time series anomaly detection; do the authors have plans for the next version?

Plots don't show up

Hi,
I can't seem to show the plots made by adtk.visualization, and I don't know why. Here's my code:
import pandas as pd
from adtk.detector import ThresholdAD
from adtk.data import validate_series
from adtk.visualization import plot

s = pd.read_csv('../STL/airline-passengers.csv', index_col="Month", parse_dates=True, squeeze=True)
s = validate_series(s)
threshold_ad = ThresholdAD(high=30, low=15)
anomalies = threshold_ad.fit_detect(s)
plot(s, anomaly_pred=anomalies, ts_linewidth=2, ts_markersize=3, ap_markersize=5, ap_color='red', ap_marker_on_curve=True)
Thanks for your help :)

Theoretical References and Citation

Hello,
I am working on my thesis on anomaly detection in electric grid timeseries data. I am using ADTK as one of the methods to detect outliers in the data. I wanted to know if it is possible to get some theoretical references for the methods used in the detectors, transformers, aggregators, Pipeline and Pipenet. I would also like to cite the ADTK project but could not find any citation. It would be great to have a BibTeX entry.
Thank You

Non binary result for Anomaly detector

Is there an easy way to return a non-binary result from an anomaly detector?
For example, how can I make a pipeline with QuantileAD(high=0.99, low=0.01) whose results are presented as {-1, 0, 1}?
I tried this way:

 
from adtk.detector import QuantileAD
from adtk.pipe import Pipenet
from adtk.transformer import CustomizedTransformerHD

def Waprice(df):
    return df["waprice"]

steps = {
    "waprice_1": {
        "model": CustomizedTransformerHD(transform_func=Waprice),
        "input": "original"
    },
    "waprice_2": {
        "model": CustomizedTransformerHD(transform_func=Waprice),
        "input": "original"
    },
    "waprice_neg": {
        "model": QuantileAD(low=0.01),
        "input": "waprice_1"
    },
    "waprice_pos": {
        "model": QuantileAD(high=0.99),
        "input": "waprice_2"
    },
    "final": {
        "model": # ?,
        "input": # ?
    }
}
pipeline = Pipenet(steps)

But I don't know how I can finalize waprice_neg and waprice_pos.
Thank you.
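
For the "final" step, one hedged option is to skip the in-pipeline aggregation and combine the two boolean outputs with plain pandas, since simple integer arithmetic on the two flags yields exactly {-1, 0, 1} (here s stands for the waprice series, i.e. df["waprice"]):

import pandas as pd
from adtk.detector import QuantileAD

neg = QuantileAD(low=0.01).fit_detect(s)   # True where waprice is too low
pos = QuantileAD(high=0.99).fit_detect(s)  # True where waprice is too high
signed = pos.astype(int) - neg.astype(int) # 1 = high, -1 = low, 0 = normal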

Q: Seasonal Anomaly doesn't consider months

Sorry to bother you again. I am experimenting with SeasonalAD() and it looks like it cannot detect some obvious seasonal patterns. I have traffic data for 3 years, one measurement per hour. I tried different parameters (c=5, c=10, trend=True, freq=24 (day), freq=720 (month), freq=8760 (year)), but nothing seemed to help: every time, I get anomalies for the summer months when traffic increases due to tourism. Since the increase is seasonal, I wonder why SeasonalAD() doesn't account for it.

Thanks!

(Image "temp": the plot of detected anomalies shows too many anomalies in the summer months.)

Add requirements.txt

Hi, I'm trying to build a conda-forge package out of this library but I'm having some trouble. Would you mind adding a requirements.txt like this one?
If needed, I can make a PR.

Can I save a combination of ADs as a model?

I take a time series and run the following algorithms from ADTK on the series :

  1. Seasonal AD
  2. Persist AD
  3. LevelShift AD

Now I combine the results: if an anomaly is detected by any of the algorithms, I label that instance (day/hour) as an anomaly. This task needs to be performed every day, so that I can get the anomalies for the previous day.

Consider the following example:
Let's say today is 14th March 2020, and I have a time series from 1st March 2018 at a daily granularity.
I run the above algorithms and combine the results up to 13th March 2020; I am interested in finding whether there was an anomaly on the 13th or not. Similarly, this will be a daily task where I check the 14th, then the 15th, and so on, for each day being an anomaly or not.

My question is: can I save the initial setup as a model, with the option of querying the model every day by giving it the last day's data (even if I have to give the complete time series, that is still okay)?
Can I output this as a model?

Please ask if any clarification is required.
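
A hedged sketch of such a daily workflow: adtk detectors are ordinary Python objects, so standard pickling is the obvious candidate for saving them, although I have not seen this documented officially. Model parameters and the series names are assumptions:

import pickle
import pandas as pd
from adtk.detector import SeasonalAD, PersistAD, LevelShiftAD

# s_history: the daily series up to yesterday
models = {
    "seasonal": SeasonalAD(),
    "persist": PersistAD(),
    "level_shift": LevelShiftAD(window=5),  # window value is an assumption
}
for m in models.values():
    m.fit(s_history)

with open("detectors.pkl", "wb") as f:
    pickle.dump(models, f)

# The next day: reload and query with the updated series (s_updated).
with open("detectors.pkl", "rb") as f:
    models = pickle.load(f)

labels = {name: m.detect(s_updated) for name, m in models.items()}
combined = pd.concat(labels, axis=1).any(axis=1)  # anomaly if any detector fires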
