Giter Club home page Giter Club logo

tsad-evaluator's Introduction

TSAD-Evaluator

This software package, publicly released under the MIT License, implements the customizable evaluation model for time series anomaly detection presented in the following paper:

"Precision and Recall for Time Series", Nesime Tatbul, Tae Jun Lee, Stan Zdonik, Mejbah Alam, Justin Gottschlich, 32nd Annual Conference on Neural Information Processing Systems (NeurIPS'18), Montreal, Canada, December 2018. (https://arxiv.org/abs/1803.03639/)

Building

cd src
make clean
make

Running

There are two alternative ways to run TSAD-Evaluator:

./evaluate {-v} [-c | -t | -n] <real_data_file> <predicted_data_file>
./evaluate {-v} [-c | -t | -n] <real_data_file> <predicted_data_file> <beta> <alpha_r> <gamma> <delta_p>
<delta_r>

Here is a description of all command line options, inputs, and parameters:

-v : Produce verbose output.
-c : Compute classical metrics.
-t : Compute time series metrics.
-n : Compute numenta-like metrics.
<real_data_file> : File with real data labels.
<predicted_data_file> : File with predicted data labels. 
<beta> : F-Score parameter (relative importance of Recall vs. Precision).
         Positive real number, Default = 1, Most common = 1.
<alpha_r> : Relative weight of existence reward for Recall.
            Real number in [0 .. 1], Default = 0, Most common = 0.
<gamma> : Customizable overlap cardinality function for Precision&Recall.
          Values = {one, reciprocal, udf_gamma}.
          Default = one, Most common = reciprocal.
<delta_p> : Customizable positional bias function for Precision.
            Values = {flat, front, middle, back, udf_delta}.
            Default = flat, Most common = flat.
<delta_r> : Customizable positional bias function for Recall.
            Values = {flat, front, middle, back, udf_delta}.
            Default = flat, Most common = {flat, front, back}.

When no parameters are specified (like in the first usage above), then the following default values are used:

./evaluate {-v} [-c | -t | -n] <real_data_file> <predicted_data_file> 1 0 one flat flat 

To produce verbose output (i.e., to print the list of all real and predicted anomaly ranges), please use the -v option.

It is important to note that the use of -v is optional, whereas the metric option (-c or -t or -n) must always be specified.

When the -c option is used, then both anomaly intervals are represented as unit-size intervals (i.e, as points), and one of the following parameter settings are expected:

./evaluate -c <real_data_file> <predicted_data_file> <beta> 0 one flat flat 
./evaluate -c <real_data_file> <predicted_data_file> <beta> 1 x x x 

In the second usage above, x means that the value of this parameter doesn't really matter.

When the -t option is used, then both anomaly intervals are represented as ranges, and parameters can be customly set as required by the application. For example:

./evaluate -t examples/simple/simple.real examples/simple/simple.pred 1 0 reciprocal flat front

When the -n option is used, then the predicted anomaly intervals are represented as unit-size intervals (i.e, as points) and the real anomaly intervals are represented as ranges, and one of the following parameter settings are expected:

./evaluate -n <real_data_file> <predicted_data_file> 1 0 one flat front
./evaluate -n <real_data_file> <predicted_data_file> 0.5 0 one flat front
./evaluate -n <real_data_file> <predicted_data_file> 2 0 one flat front

The first mimics Numenta-Standard, the second mimics Numenta-Reward-Low-FP, and the third mimics Numenta-Reward-Low-FN, respectively.

Additional Usage Notes

  • <real_data_file> and <predicted_data_file> are CSV files with 0/1 anomaly labels that correspond to each timestamp. In v1.0, these files contain only the labels, simply assuming label=1 at line=t indicates the presence of an anomaly at timestamp=t. More sophisticated input file formats can be supported in future releases. For example input files, please see: examples/*/*.real and examples/*/*.pred.

  • An anomaly range is represented as a pair of timestamps <start,end>. In v1.0, we simply assume timestamps are integers >= 0. For example, <1,5> denotes the time period from 1 to 5, inclusive on both ends.

  • A collection of anomaly ranges is represented as a vector of anomaly ranges, ordered by ascending timestamps. For example, [<1,5>,<10,12>] denotes a vector of two anomalous ranges.

  • The application developer can provide user-defined functions for <gamma>, <delta_p>, and <delta_r>, if the pre-defined choices are not sufficient for his/her purpose. In order to do this, please select <udf_gamma> and/or <udf_delta> for the corresponding command line arguments above. Furthermore, please make sure to define these custom functions in evaluator.cpp by filling in the following templates provided in this file and rebuilding afterwards:

double evaluator::udf_gamma_def(int overlap_count, e_metric m)
double evaluator::udf_delta_def(timestamp t, timestamp anomaly_length, e_metric m)

References

Contact

tsad-evaluator's People

Contributors

tatbul avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.