
Comments (6)

tailaiw commented on May 24, 2024

@FGG100y That's strange. I ran the following code, which I believe is equivalent to what you described above, and it didn't raise any error.

Which version of ADTK are you using?

import numpy as np
import pandas as pd

from adtk.detector import ThresholdAD
from adtk.detector import QuantileAD
from adtk.detector import InterQuartileRangeAD
from adtk.detector import PersistAD
from adtk.detector import LevelShiftAD
from adtk.detector import VolatilityShiftAD
from adtk.detector import SeasonalAD
from adtk.detector import AutoregressionAD
from adtk.data import split_train_test


def get_detector(adname="ThresholdAD"):
    detectors = {"ThresholdAD": ThresholdAD,
                 "QuantileAD": QuantileAD,
                 "InterQuartileRangeAD": InterQuartileRangeAD,
                 "PersistAD": PersistAD,
                 "LevelShiftAD": LevelShiftAD,
                 "VolatilityShiftAD": VolatilityShiftAD,
                 "SeasonalAD": SeasonalAD,
                 "AutoregressionAD": AutoregressionAD,
                 }
    return detectors.get(adname)

def ad_detector(dname, train_data=None, test_data=None, **kwargs):
    # instantiate the named detector, fit it on the training data,
    # then apply the fitted model to the testing data
    Ad = get_detector(dname)
    ad = Ad(**kwargs)
    train_anoms = ad.fit_detect(train_data)
    test_anoms = ad.detect(test_data)
    return train_anoms, test_anoms

# synthetic daily series: 100 points of sin(t)
data = pd.Series(np.sin(np.arange(100)), index=pd.date_range(start="2020-02-02", periods=100, freq="D"))

# split the series into train/test pairs for cross validation
s_train, s_test = split_train_test(data, mode=3, n_splits=2)
train_anoms, test_anoms = [], []
for train, test in zip(s_train, s_test):
    # fit on each training fold and detect on the matching testing fold
    train_anom, test_anom = ad_detector(dname='SeasonalAD',
                                        train_data=train,
                                        test_data=test.squeeze(),
                                        c=1, side='both')
    # collect the results
    train_anoms.append(train_anom)
    test_anoms.append(test_anom)

FGG100y commented on May 24, 2024

@tailaiw Thank you for your reply. ADTK version: 0.5.2
I used the same synthetic data as you, and it reported no error, so I believe something is wrong with my data.
This is how I preprocessed the data:

import numpy as np

# replace extreme values (below the 1% or above the 99% quantile) with NaN,
# then fill the NaNs with the median;
# without filling them, adtk reported "NaNs between valid values were not allowed"
# (fname is the data file name, defined elsewhere; its last "_"-separated part is the column name)
quantiles = data.quantile([0.01, 0.99]).values.flatten()
q_high, q_low = quantiles[1], quantiles[0]
data[data[fname.split('_')[-1]] < q_low] = np.nan
data[data[fname.split('_')[-1]] > q_high] = np.nan
data = data.fillna(data.median())

The train/test splits of the time series:
[Figure: ts_data_split_mode1]

Am I missing something in the ADTK docs, or is there something wrong with the data?

FGG100y commented on May 24, 2024

@tailaiw
Here is the data I used in this case:
ts_debug.txt

tailaiw commented on May 24, 2024

@FGG100y
It looks like the problem is related to the fact that your input is a DataFrame rather than a Series. I will look into this; it is probably a bug. Thanks for catching it!

I noticed your data is univariate, so until we fix the problem, you can put your data in a Series instead of a single-column DataFrame. I replaced the synthetic data with your data (i.e., I replaced the data-generation line with data = pd.read_csv("./ts_debug.txt", parse_dates=True, squeeze=True, index_col=0)), and it returns no error. If I load the data with squeeze=False, i.e., into a DataFrame, it hits the RuntimeError you mentioned.
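The squeeze=True option of pd.read_csv was deprecated in pandas 1.4 and removed in pandas 2.0, so on newer pandas the same workaround is to read the file as usual and then call DataFrame.squeeze("columns"). A minimal sketch, assuming a two-column CSV (timestamp, value); the inline CSV below is made-up stand-in data, not the contents of ts_debug.txt:

```python
import io

import pandas as pd

# Hypothetical stand-in for ts_debug.txt: a timestamp column and one value column.
csv_text = """timestamp,value
2020-02-02,1.0
2020-02-03,2.5
2020-02-04,0.7
"""

# index_col=0 turns the timestamp column into a DatetimeIndex,
# leaving a single-column DataFrame.
frame = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(type(frame).__name__)

# squeeze("columns") collapses the one-column DataFrame into a Series,
# which is what the univariate detectors should be given here.
series = frame.squeeze("columns")
print(type(series).__name__)
print(series.name)
```

The resulting Series keeps the column name ("value") as its name and the parsed timestamps as its index.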

FGG100y commented on May 24, 2024

@tailaiw
Following your suggestion ^^, I have solved my problem.
Thanks a lot.

tailaiw commented on May 24, 2024

@FGG100y We dug into the problem you reported and found the cause:

ADTK contains univariate models and multivariate models. The models you were using are all univariate models. By design, if a univariate model is applied to a DataFrame, it treats each column of the DataFrame as an independent time series.

If a model is trained with a Series and is applied to a DataFrame, ADTK applies the model to each column independently and returns a concatenated DataFrame as output.
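To make the Series-trained case concrete, here is a minimal sketch with a hypothetical univariate model (QuantileBounds is made up for illustration and is not ADTK's implementation): it learns one pair of quantile bounds from a training Series and, when given a DataFrame, applies those same bounds to every column and concatenates the per-column results.

```python
import pandas as pd


class QuantileBounds:
    """Hypothetical univariate model: flags values outside training quantiles.

    Only mimics the Series-trained behaviour described above; not ADTK code.
    """

    def __init__(self, low=0.05, high=0.95):
        self.low, self.high = low, high

    def fit(self, s: pd.Series):
        # learn a single pair of bounds from one training series
        self.lo_, self.hi_ = s.quantile(self.low), s.quantile(self.high)
        return self

    def _detect_series(self, s: pd.Series) -> pd.Series:
        return (s < self.lo_) | (s > self.hi_)

    def detect(self, x):
        if isinstance(x, pd.Series):
            return self._detect_series(x)
        # DataFrame input: treat each column as an independent series and
        # concatenate the per-column results into one DataFrame
        return pd.concat(
            {col: self._detect_series(x[col]) for col in x.columns}, axis=1
        )


train = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
model = QuantileBounds(low=0.0, high=1.0).fit(train)  # bounds are [1.0, 5.0]
test = pd.DataFrame({"A": [0.0, 3.0], "B": [6.0, 2.0]})
print(model.detect(test))
```

Because the model holds a single set of parameters, any number of columns can be scored with it; the output has the same column names as the input.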

If a model is trained with a DataFrame (say with columns "A", "B", and "C"), ADTK trains 3 models on the backend, one per column. If the model object is then applied to a DataFrame with the same column names, ADTK matches the trained models to the columns automatically. We found this design convenient when a certain type of model is applied to a large number of time series.

If a model trained with a DataFrame is applied to a Series or to a DataFrame with different column names, ADTK throws an error because it cannot find a trained model matching each column. This is what caused the error you encountered (note that your training data was a DataFrame while your testing data was a Series, because you used the squeeze method).
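The DataFrame-trained case and the mismatch that triggered this issue can be sketched the same way. PerColumnModel below is hypothetical and not ADTK's code: fitting on a DataFrame stores one set of quantile bounds per column name, and detection looks the models up by column name, so a bare Series (no column names) or an unknown column cannot be matched. ValueError and the messages are illustrative choices; ADTK's actual exception type and wording differ.

```python
import pandas as pd


class PerColumnModel:
    """Hypothetical sketch of per-column training and matching (not ADTK code)."""

    def __init__(self, low=0.05, high=0.95):
        self.low, self.high = low, high

    def fit(self, df: pd.DataFrame):
        # one independent model (here, a pair of quantile bounds) per column
        self.models_ = {
            col: (df[col].quantile(self.low), df[col].quantile(self.high))
            for col in df.columns
        }
        return self

    def detect(self, x):
        if isinstance(x, pd.Series):
            # a bare Series has no column names to match against
            raise ValueError(
                "Model was trained on a DataFrame with columns "
                f"{sorted(self.models_)}; got a Series instead."
            )
        missing = set(x.columns) - set(self.models_)
        if missing:
            raise ValueError(f"No trained model for column(s): {sorted(missing)}")
        # score each column with the model trained on the same column name
        out = {}
        for col in x.columns:
            lo, hi = self.models_[col]
            out[col] = (x[col] < lo) | (x[col] > hi)
        return pd.DataFrame(out, index=x.index)


train = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})
model = PerColumnModel(low=0.0, high=1.0).fit(train)
print(model.detect(pd.DataFrame({"A": [0.0, 2.0], "B": [25.0, 40.0]})))

try:
    model.detect(train["A"])  # a Series, like a squeezed testing set
except ValueError as err:
    print(err)
```

Matching by name is what lets one model object serve many series at once, and it is also why the training and testing inputs must have the same shape (both DataFrames with the same columns, or both Series).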

This logic was not well tested or documented, and the error message was misleading. Thanks to your issue, we noticed the problem and fixed it. We just released patch v0.5.4 to address it. If you see anything we missed, please feel free to reopen this issue.
