

TimeSeries-AnomalyDetection-with-2D-CAEs

Code Structure

The code is divided into 4 parts: Part 1: Create Encodings, Part 2: Train the Network, Part 3: Evaluate Residuals, and Part 4: Performance Assessment.

The project is designed to be used in the following way: In Part 1 the user chooses one of the 6 available encodings and runs the corresponding script to encode the training and testing data. In Part 2 the network is trained on the encoded data created in Part 1. In Part 3 the residuals (i.e. reconstruction errors) of the encoding images from the training and testing data are computed and saved. In Part 4 these errors are used to compute the decision threshold, and a ROC curve is plotted to analyze the model's performance against the ground truth.

PART 1 - Create Encodings

An individual script is available to create each of the following encodings:

  1. Gramian Angular Field (GAF)
  2. Markov Transition Field (MTF)
  3. Recurrence Plot (RP)
  4. Spectrogram (SP)
  5. Scalogram (SC)
  6. Gray Scale Images (GS)

In general, the default parameters in each script have been selected to produce encoding images of size 64x64, which is what the default network expects for training. For some encodings (SP, SC and GS), creating images of a different size is non-trivial, so make sure you understand the encoding mechanism before readjusting the parameters.
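To give an idea of how such an encoding is produced, here is a minimal sketch of generating a single 64x64 GAF image from one time-series window. It assumes the pyts library and a synthetic signal; the actual encoding scripts in this repository may use a different implementation and parameters.

import numpy as np
from pyts.image import GramianAngularField

# Hypothetical stand-in: one window of 1000 samples from a univariate signal.
window = np.sin(np.linspace(0, 20 * np.pi, 1000)).reshape(1, -1)

# image_size=64 matches the 64x64 input expected by the default network.
gaf = GramianAngularField(image_size=64, method='summation')
image = gaf.fit_transform(window)[0]   # shape (64, 64), values in [-1, 1]
print(image.shape)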

Encoding file sizes for image size 64x64:

  1. Training Data: 6.59 GB (except for GS: 5.93 GB)
  2. Testing Data: 2.34 GB (except for GS: 2.1 GB)

Creation times for each encoding, based on the default training and validation datasets. Reference machine: MacBook Pro 2018, 16 GB RAM, 2.2 GHz Intel Core i7.

  1. GAF: Training set: 8 min, Validation set: 2.5 min
  2. MTF: Training set: 12 min, Validation set: 3.5 min
  3. RP: Training set: 11 min, Validation set: 3 min
  4. SP: Training set: 1 min, Validation set: 0.3 min
  5. SC: Training set: 33 min, Validation set: 11 min
  6. GS: Training set: 0.3 min, Validation set: 0.1 min

PART 2 - Train the Network


For basic modifications, change the values of the argument parser variables.

  • Train a new model: If you want to train a completely new model, make sure that mode='new_training' and run the main script. Also make sure to choose one of the available encodings. Here is an example:

argp = parser.parse_args(
['--path_data','../Part1_Encoding',
'--mode','new_training',
'--dataset','training',
'--cycles','500000',
'--conv_kernel_size_1','4',
'--conv_stride_1','2',
'--pool_kernel_size','2',
'--pool_stride','2',
'--nr_channels_1','32',
'--bottleneck_size','160',
'--batch_size','100',
'--batch_size_testing','50',
'--performance_eval_steps','10',
'--checkpoint_save_steps','10000',
'--encoding','GAF'])

In this case a new model is trained on GAF data. The model will perform 500'000 gradient descent steps, using 100 encoding images in each iteration. The loss is computed every 10 iterations and a checkpoint of the graph is saved every 10'000 iterations. To inspect the loss or the graph, use TensorBoard by running the following command in a terminal:
tensorboard --logdir=<path to tensorboard summary>
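For orientation, the following is a minimal, self-contained sketch of how a scalar loss summary can be written for TensorBoard. It assumes the TensorFlow 1.x API (in line with the graph/checkpoint terminology used here); the loss tensor, batch and directory name are stand-ins, not the repository's actual code.

import numpy as np
import tensorflow as tf   # assumes TensorFlow 1.x

# Toy stand-in for the autoencoder: the real loss comes from the network built in the main script.
x = tf.placeholder(tf.float32, shape=[None, 64, 64, 1], name='input')
reconstruction = 0.9 * x                              # placeholder for the decoder output
loss = tf.reduce_mean(tf.square(x - reconstruction))  # reconstruction error

tf.summary.scalar('loss', loss)                       # scalar curve shown in TensorBoard
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter('./tensorboard_summary', sess.graph)
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(100, 64, 64, 1).astype(np.float32)   # batch_size = 100
    for step in range(100):
        if step % 10 == 0:                            # performance_eval_steps = 10
            summary = sess.run(merged, feed_dict={x: batch})
            writer.add_summary(summary, global_step=step)
    writer.close()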

  • Continue to train a model: If you want to continue to train a model whose checkpoint has been saved before, then just change mode='continue_training'. Note that all other parameters have to be equal to the ones saved in the checkpoint.
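As a hedged sketch of what continuing from a checkpoint typically involves in TensorFlow 1.x: restore the latest checkpoint into an identically built graph before running further training steps. The variable, directory and file names below are illustrative assumptions, not taken from the repository.

import os
import tensorflow as tf   # assumes TensorFlow 1.x

# Single stand-in variable; in the real script the Saver covers all autoencoder weights.
w = tf.get_variable('conv_kernel', shape=[4, 4, 1, 32])
saver = tf.train.Saver()
os.makedirs('./checkpoints', exist_ok=True)           # hypothetical checkpoint directory

with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('./checkpoints')
    if ckpt is not None:
        saver.restore(sess, ckpt)                     # graph parameters must match the checkpoint
    else:
        sess.run(tf.global_variables_initializer())
    # ... run further gradient descent steps here, then save a new checkpoint:
    saver.save(sess, './checkpoints/model', global_step=0)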

  • Test a model: If you want to visually inspect whether the reconstructions are good enough, change mode='testing'. Also choose whether to inspect training or testing images. The batch_size_testing value determines how many images will be displayed. Here is an example:

argp = parser.parse_args(
['--path_data','../Part1_Encoding',
'--mode','testing',
'--dataset','training',
'--cycles','500000',
'--conv_kernel_size_1','4',
'--conv_stride_1','2',
'--pool_kernel_size','2',
'--pool_stride','2',
'--nr_channels_1','32',
'--bottleneck_size','160',
'--batch_size','100',
'--batch_size_testing','50',
'--performance_eval_steps','10',
'--checkpoint_save_steps','10000',
'--encoding','GAF'])

In this case 50 random images from the training data will be reconstructed and plotted.
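A side-by-side plot of originals and reconstructions is a common way to do this visual check. The following sketch uses random stand-in arrays and matplotlib; it only illustrates the idea and is not the plotting code of the main script.

import numpy as np
import matplotlib.pyplot as plt

n = 5                                                            # kept small here; the example above uses 50
originals = np.random.rand(n, 64, 64)                            # stand-in encoding images
reconstructions = originals + 0.05 * np.random.randn(n, 64, 64)  # stand-in network output

fig, axes = plt.subplots(2, n, figsize=(2 * n, 4))
for i in range(n):
    axes[0, i].imshow(originals[i], cmap='viridis')
    axes[0, i].set_title('original')
    axes[0, i].axis('off')
    axes[1, i].imshow(reconstructions[i], cmap='viridis')
    axes[1, i].set_title('reconstruction')
    axes[1, i].axis('off')
plt.tight_layout()
plt.show()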

PART 3 - Compute the Residuals

The Residuals.py script is almost identical to the main.py script of Part 2. The goal here is to compute the residuals of the training and testing reconstructions; both are used in Part 4 to measure the model's performance. Note that the graph parameters must not change from the ones saved in the checkpoint in Part 2.

  • Compute training residuals: To compute all the residuals stemming from the training data, the script has to be run twice: once with the argument parser variable part='part 1' and once with part='part 2'. This split is required because of the limited RAM of the reference machine. For both runs set the variable dataset='training'.

  • Compute testing residuals: To compute the residuals stemming from the testing data, the script only has to be run once, with dataset='testing'.
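For orientation, one common choice for a per-image residual is the mean squared reconstruction error. The sketch below uses random stand-in arrays and a hypothetical output file name; the exact error metric and storage format used by Residuals.py may differ.

import numpy as np

# Stand-ins: in the real script these come from feeding the encodings through the trained network.
originals = np.random.rand(1000, 64, 64)
reconstructions = originals + 0.05 * np.random.randn(1000, 64, 64)

# One scalar residual per encoding image, here the mean squared error over all pixels.
residuals = np.mean((originals - reconstructions) ** 2, axis=(1, 2))

np.save('residuals_training.npy', residuals)   # hypothetical file name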

PART 4 - Model Performance Assessment

In order to assess the model's performance, make sure the ground truth is saved in a specific folder and specify the directory when calling the function:
measure_performance(..., path_ground_truth=<path to ground truth>.csv)

Run the Performance.py script to see the ROC curve.
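As a rough illustration of how residuals and ground-truth labels are typically turned into a ROC curve, here is a minimal sketch using scikit-learn and synthetic data. It is not a restatement of measure_performance or Performance.py; the label convention (1 = anomaly) and the data are assumptions.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Synthetic stand-ins: test-set residuals and ground-truth labels (1 = anomaly).
residuals = np.concatenate([0.10 * np.random.rand(900), 0.15 + 0.30 * np.random.rand(100)])
labels = np.concatenate([np.zeros(900), np.ones(100)])

fpr, tpr, thresholds = roc_curve(labels, residuals)   # residuals act as anomaly scores
print('AUC:', auc(fpr, tpr))

plt.plot(fpr, tpr)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.show()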

