
omnianomaly's Introduction

OmniAnomaly

Anomaly Detection for Multivariate Time Series through Modeling Temporal Dependence of Stochastic Variables

OmniAnomaly is a stochastic recurrent neural network model that combines a Gated Recurrent Unit (GRU) with a Variational Auto-Encoder (VAE). Its core idea is to learn the normal patterns of multivariate time series and then use the reconstruction probability to judge whether a point is anomalous.
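To make the scoring idea concrete, below is a minimal, hedged sketch of reconstruction-probability scoring; it illustrates the idea only and is not OmniAnomaly's actual code, and all names in it are hypothetical. Given the per-dimension reconstruction mean and standard deviation the VAE emits for a time step, the score is the log-probability of the observation, and low scores indicate anomalies.

# Illustrative sketch only; not the repository's API.
import numpy as np
from scipy.stats import norm

def reconstruction_score(x, mu, sigma):
    # Sum of per-dimension Gaussian log-densities; higher means more normal.
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))

x = np.array([0.10, 0.90, 0.50])       # observation at one time step
mu = np.array([0.12, 0.88, 0.50])      # reconstruction mean from the VAE
sigma = np.array([0.05, 0.05, 0.05])   # reconstruction std from the VAE
print(reconstruction_score(x, mu, sigma))  # a point far from mu scores low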

Getting Started

Clone the repo

git clone https://github.com/smallcowbaby/OmniAnomaly && cd OmniAnomaly

Get data

SMD (Server Machine Dataset) is in folder ServerMachineDataset.

You can get the public datasets (SMAP and MSL) using:

wget https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip

cd data && wget https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv

Install dependencies (with Python 3.5 or 3.6)

(virtualenv is recommended)

pip install -r requirements.txt

Preprocess the data

python data_preprocess.py <dataset>

where <dataset> is one of SMAP, MSL or SMD.
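After preprocessing, a quick sanity check can confirm that the expected output files exist. This is a hedged sketch: the processed/<dataset>_train.pkl path appears in a traceback quoted later on this page, while the _test.pkl name is an assumption.

# Hedged sanity check; 'processed/<dataset>_train.pkl' is taken from a
# traceback quoted in the issues below, and '_test.pkl' is an assumption.
import os
import pickle

dataset = 'MSL'
for split in ('train', 'test'):
    path = os.path.join('processed', '{}_{}.pkl'.format(dataset, split))
    if not os.path.exists(path):
        raise SystemExit('Missing {}; run data_preprocess.py first'.format(path))
    with open(path, 'rb') as f:
        data = pickle.load(f)
    print(path, getattr(data, 'shape', type(data)))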

Run the code

python main.py

If you want to change the default configuration, you can edit ExpConfig in main.py or override it with command-line arguments. For example:

python main.py --dataset='MSL' --max_epoch=20

Data

Dataset Information

| Dataset name | Number of entities | Number of dimensions | Training set size | Testing set size | Anomaly ratio (%) |
|---|---|---|---|---|---|
| SMAP | 55 | 25 | 135183 | 427617 | 13.13 |
| MSL | 27 | 55 | 58317 | 73729 | 10.72 |
| SMD | 28 | 38 | 708405 | 708420 | 4.16 |

SMAP and MSL

SMAP (Soil Moisture Active Passive satellite) and MSL (Mars Science Laboratory rover) are two public datasets from NASA.

For more details, see: https://github.com/khundman/telemanom

SMD

SMD (Server Machine Dataset) is a new 5-week-long dataset that we collected from a large Internet company. It contains 3 groups of entities, each named machine-<group_index>-<index>.

SMD is made up of data from 28 different machines, and the 28 subsets should be trained and tested separately. Each subset is divided into two halves of equal length, for training and testing respectively. We provide labels indicating whether a point is an anomaly, as well as the dimensions that contribute to each anomaly.

Thus SMD consists of the following parts (a loading sketch follows the list):

  • train: the first half of the dataset.
  • test: the second half of the dataset.
  • test_label: labels for the test set, denoting whether each point is an anomaly.
  • interpretation_label: the lists of dimensions that contribute to each anomaly.
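For illustration, here is a hedged sketch of loading one SMD entity's raw files. It assumes the comma-separated layout shown in the "Server Machine Dataset description" issue further down this page; the .txt file suffix is an assumption.

# Hedged sketch: load one SMD entity's train/test/test_label files.
import numpy as np

machine = 'machine-1-1'
train = np.genfromtxt('ServerMachineDataset/train/%s.txt' % machine,
                      delimiter=',', dtype=np.float32)
test = np.genfromtxt('ServerMachineDataset/test/%s.txt' % machine,
                     delimiter=',', dtype=np.float32)
labels = np.genfromtxt('ServerMachineDataset/test_label/%s.txt' % machine,
                       delimiter=',', dtype=np.int32)
print(train.shape, test.shape, labels.shape)  # expect 38 columns per row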


Processing

With the default configuration, main.py follows these steps:

  • Train the model on the training set, validating at a fixed frequency. Early stopping is applied by default.
  • Test the model on both the training set and the testing set, and save the anomaly scores in train_score.pkl and test_score.pkl.
  • Find the best F1 score on the testing set, and print the results.
  • Initialize the POT model on train_score to find the anomaly-score threshold, and use this threshold to predict on the testing set (see the sketch after this list).
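For reference, a minimal sketch of how the POT step might be invoked. The pot_eval signature is taken from a traceback quoted in the issues below; the score files are the ones saved in step 2, and the zero labels here are placeholders for the real test labels.

# Hedged sketch of the POT step; pot_eval's signature comes from a
# traceback quoted later on this page.
import pickle
import numpy as np
from omni_anomaly.eval_methods import pot_eval

with open('train_score.pkl', 'rb') as f:
    train_score = pickle.load(f)
with open('test_score.pkl', 'rb') as f:
    test_score = pickle.load(f)

y_test = np.zeros(len(test_score))  # placeholder: use the real test labels
pot_result = pot_eval(train_score, test_score,
                      y_test[-len(test_score):], level=0.07)
print(pot_result)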

Training loss

The figures below show the training loss of our model on MSL and SMAP, indicating that the model converges well on both datasets.

[Figures: training loss curves on MSL and SMAP]

omnianomaly's People

Contributors

niuchh, smallcowbaby, tsinghuasuya


omnianomaly's Issues

How I ran it using an Nvidia CUDA image

Run OmniAnomaly using an Nvidia CUDA 9 image on Ubuntu

Prepare your machine: install nvidia-docker2

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt update

sudo apt install -y nvidia-docker2

sudo systemctl restart docker

docker pull nvidia/cuda:9.0-cudnn7-devel

docker run -it --gpus all -t nvidia/cuda:9.0-cudnn7-devel bash

Create the env to run OmniAnomaly (in the container)

apt-get install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \
libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev \
libgdbm-dev libnss3-dev libedit-dev libc6-dev

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

bash Miniconda3-latest-Linux-x86_64.sh

source .bashrc

conda create -n tf12 python=3.6

conda activate tf12

apt install -y unzip git

export PYTHONIOENCODING=utf8 && export CUDA_HOME=/usr/local/cuda && export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH

ln -f -s /usr/local/cuda/lib64/stubs/libcuda.so.1 /usr/local/cuda/lib64/libcuda.so.1

Done. Follow the steps in the OmniAnomaly README.

References

  1. Docker image
  2. Use Python 3.6 in Colab
  3. How to use nvidia-docker
  4. Fix missing libcuda.so.1
  5. Install old Python on Ubuntu 22.04

A simple Question about SMD Dataset

Thanks for disclosing the SMD dataset.
I want to know why the first half of the dataset, used for training, has no labels. Does this mean that all data in the training set is normal?

Data Preprocessing

In the README you specify the command python data_process.py, but there is no file called data_process.py; the file data_preprocess.py exists, so a tiny correction is needed.

Why are different telemetry channels trained by the same model ?

First of all, thank you for your great work.

  • I am a bit confused that all channels of SMAP are merged into one dataset and then trained with one model.

  • However, the authors of the MSL and SMAP datasets fit an individual model for each channel.
    This also makes sense to me, since the pattern of a temperature time series looks different from that of a radiation time series.

Code can't run in win10/ubuntu18

Hello, while running your code we encountered a lot of problems that remain unresolved. The details are as follows:

  1. A TensorFlow version problem: the library cannot be found. I followed requirements.txt to install the dependent libraries. It looks like a TFSnippet problem? I also tried installing a different version of TFSnippet, but still couldn't fix it.

Environment information:

  • Windows 10
  • CUDA 10.0 / cuDNN 11.1
  • RTX 3070
  • conda, Python 3.6
  • requirements.txt (both the master branch and dependabot/pip/tensorflow-gpu-2.4.0)

Error message:

$ python main.py
2021-04-17 13:55:52.732294: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Traceback (most recent call last):
  File "main.py", line 13, in <module>
    from tfsnippet.examples.utils import MLResults, print_with_title
  File "C:\Users\LIU\anaconda3\envs\OmniAnomaly2\lib\site-packages\tfsnippet\__init__.py", line 4, in <module>
    from . import (dataflows, datasets, distributions, layers, ops,
  File "C:\Users\LIU\anaconda3\envs\OmniAnomaly2\lib\site-packages\tfsnippet\dataflows\__init__.py", line 1, in <module>
    from .array_flow import *
  File "C:\Users\LIU\anaconda3\envs\OmniAnomaly2\lib\site-packages\tfsnippet\dataflows\array_flow.py", line 4, in <module>
    from tfsnippet.utils import minibatch_slices_iterator
  File "C:\Users\LIU\anaconda3\envs\OmniAnomaly2\lib\site-packages\tfsnippet\utils\__init__.py", line 17, in <module>
    from .session import *
  File "C:\Users\LIU\anaconda3\envs\OmniAnomaly2\lib\site-packages\tfsnippet\utils\session.py", line 71, in <module>
    def get_variables_as_dict(scope=None, collection=tf.GraphKeys.GLOBAL_VARIABLES):
AttributeError: module 'tensorflow' has no attribute 'GraphKeys'

Of course, there were some small problems that we solved along the way.

  2. I then switched to a different graphics-card environment, hoping to solve the problem. However, new problems appeared.

Environment information:

  • Ubuntu
  • CUDA 10.2
  • RTX 2080 Ti
  • conda, Python 3.6
  • requirements.txt (master branch)

Error message:

$ python main.py 
Traceback (most recent call last):
  File "/home/jingliu/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/jingliu/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/jingliu/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/nsai/anaconda3/envs/lj_dl/lib/python3.6/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/nsai/anaconda3/envs/lj_dl/lib/python3.6/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 12, in <module>
    import tensorflow as tf
  File "/home/jingliu/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/jingliu/.local/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/jingliu/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/jingliu/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/jingliu/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/jingliu/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/nsai/anaconda3/envs/lj_dl/lib/python3.6/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/nsai/anaconda3/envs/lj_dl/lib/python3.6/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

In the dependabot/pip/tensorflow-gpu-2.4.0 branch, after executing pip install -r requirements.txt:

INFO: pip is looking at multiple versions of zhusuan to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of tfsnippet to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install -r requirements.txt (line 11), -r requirements.txt (line 13), -r requirements.txt (line 14), -r requirements.txt (line 7) and six==1.11.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested six==1.11.0
    tfsnippet 0.2.0a1 depends on six>=1.11.0
    zhusuan 0.4.0 depends on six
    fs 2.3.0 depends on six~=1.10
    tensorflow-gpu 2.4.0 depends on six~=1.15.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
  3. In short, we spent a lot of time trying to run your code, and it failed.
    Would it be convenient for you to provide a Docker image of your running environment? What hardware and CUDA setup do you use when you run it?

No such file or directory: 'processed/MSL_train.pkl'

I ran this command:
"python main.py --dataset='MSL' --max_epoch=20" and got this error:

Traceback (most recent call last):
  File "main.py", line 220, in <module>
    main()
  File "main.py", line 97, in main
    test_start=config.test_start)
  File "/media/harry/harry/ML/LSTM/OmniAnomaly/omni_anomaly/utils.py", line 59, in get_data
    f = open(os.path.join(prefix, dataset + '_train.pkl'), "rb")
IOError: [Errno 2] No such file or directory: 'processed/MSL_train.pkl

Please help me. How can I run the Server Machine Dataset?

Problem with training parameters for SMAP and MSL

Dear Author,
In your paper and the project, I can see different values for the parameter "level": SMAP = 0.07 and MSL = 0.01.
Can you tell me what other parameter settings differ between the training runs for the two datasets?
Thanks.

Groups of Entities (SMD)

Dear authors,

It is stated that the SMD (Server Machine Dataset) contains 3 groups of entities, each named machine-<group_index>-<index>.

May I know what each of the 3 groups means, and why they are grouped in this way?

Thank you.

Error while training

I got this error while training on SMAP:

26860, 207.18948957385916], -371.0)
('cur thr: ', -199.0, [0.6784833256741009, 0.9701897913384155, 0.5216469874941413, 29291, 370467, 900, 26860, 207.18948957385916], [0.6794118610225376, 0.973996607929373, 0.5216469874941413, 29291, 370585, 782,
26860, 207.18948957385916], -371.0)
('cur thr: ', -149.0, [0.6784518943728867, 0.9700612681006586, 0.5216469874941413, 29291, 370463, 904, 26860, 192.475273927267], [0.6794118610225376, 0.973996607929373, 0.5216469874941413, 29291, 370585, 782, 26860, 207.18948957385916], -371.0)
('cur thr: ', -99.0, [0.6900573719198619, 0.966784338585756, 0.536499795099553, 30125, 370332, 1035, 26026, 159.45766892637948], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
('cur thr: ', -49.0, [0.683723519711516, 0.9395354626877341, 0.5374080603128336, 30176, 369425, 1942, 25975, 111.03955584177663], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
('cur thr: ', 1.0, [0.6625702902960295, 0.7697275382366061, 0.581610300692488, 32658, 361597, 9770, 23493, 59.91650023194379], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
('cur thr: ', 51.0, [0.6276050282163165, 0.6815108512768132, 0.581610300692488, 32658, 356105, 15262, 23493, 56.472065355374006], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
('cur thr: ', 101.0, [0.23218566283936742, 0.13134183823531775, 0.9999999998219087, 56151, 0, 371367, 0, 0.0], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
('cur thr: ', 151.0, [0.23218566283936742, 0.13134183823531775, 0.9999999998219087, 56151, 0, 371367, 0, 0.0], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
('cur thr: ', 201.0, [0.23218566283936742, 0.13134183823531775, 0.9999999998219087, 56151, 0, 371367, 0, 0.0], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
('cur thr: ', 251.0, [0.23218566283936742, 0.13134183823531775, 0.9999999998219087, 56151, 0, 371367, 0, 0.0], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
('cur thr: ', 301.0, [0.23218566283936742, 0.13134183823531775, 0.9999999998219087, 56151, 0, 371367, 0, 0.0], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
('cur thr: ', 351.0, [0.23218566283936742, 0.13134183823531775, 0.9999999998219087, 56151, 0, 371367, 0, 0.0], [0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
([0.6904290381906943, 0.9682447848268425, 0.536499795099553, 30125, 370379, 988, 26026, 165.87430885704643], -106.0)
Initial threshold : 26.704884
Number of peaks : 1350
Traceback (most recent call last):
  File "main.py", line 220, in <module>
    main()
  File "main.py", line 174, in main
    pot_result = pot_eval(train_score, test_score, y_test[-len(test_score):], level=config.level)
  File "/media/harry/harry/ML/LSTM/OmniAnomaly/omni_anomaly/eval_methods.py", line 137, in pot_eval
    s.initialize(level=level, min_extrema=True)  # initialization step
  File "/media/harry/harry/ML/LSTM/OmniAnomaly/omni_anomaly/spot.py", line 207, in initialize
    g, s, l = self._grimshaw()
  File "/media/harry/harry/ML/LSTM/OmniAnomaly/omni_anomaly/spot.py", line 345, in _grimshaw
    n_points, 'regular')
TypeError: unbound method _rootsFinder() must be called with SPOT instance as first argument (got function instance instead)

What is happening?

Project fails to run on the M1 Mac processor

The M1 Mac processor ultimately relies on TensorFlow v2.x, while the packages this project is built on, such as tfsnippet, require TensorFlow v1.x. I have spent many hours trying to get it to run on my M1 processor, including attempts to make tfsnippet's TF v1.x API calls consistent with TF v2.x, but to no avail. Could you consider making your project compatible with the new Mac processors? Thanks.

How can I install tensorflow 1.12.0?

pip install tensorflow-gpu==1.12.0

ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.12.0 (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 1.15.0rc1, 1.15.0rc2, 1.15.0rc3, 1.15.0, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.1.0rc0, 2.1.0rc1, 2.1.0rc2, 2.1.0)
ERROR: No matching distribution found for tensorflow-gpu==1.12.0

It seems that version 1.12.0 is no longer provided by pip.

This command fails on Colab: pip install -r requirements.txt

Hello, the command fails when run on Colab.

This is what I get:

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/thu-ml/zhusuan.git (from -r requirements.txt (line 13))
Cloning https://github.com/thu-ml/zhusuan.git to /tmp/pip-req-build-csj2aoui
Running command git clone --filter=blob:none --quiet https://github.com/thu-ml/zhusuan.git /tmp/pip-req-build-csj2aoui
Resolved https://github.com/thu-ml/zhusuan.git to commit 4386b2a12ae4f4ed8e694e504e51d7dcdfd6f22a
Preparing metadata (setup.py) ... done
Collecting git+https://github.com/haowen-xu/[email protected] (from -r requirements.txt (line 14))
Cloning https://github.com/haowen-xu/tfsnippet.git (to revision v0.2.0-alpha1) to /tmp/pip-req-build-iwf0sus7
Running command git clone --filter=blob:none --quiet https://github.com/haowen-xu/tfsnippet.git /tmp/pip-req-build-iwf0sus7
Running command git checkout -q 7b43abdbdd29f1914dbc11b961b5d45b9de76653
Resolved https://github.com/haowen-xu/tfsnippet.git to commit 7b43abdbdd29f1914dbc11b961b5d45b9de76653
Preparing metadata (setup.py) ... done
Collecting six==1.11.0
Downloading six-1.11.0-py2.py3-none-any.whl (10 kB)
Collecting matplotlib==3.0.2
Downloading matplotlib-3.0.2.tar.gz (36.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.5/36.5 MB 9.9 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting numpy==1.15.4
Downloading numpy-1.15.4.zip (4.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.5/4.5 MB 16.0 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting pandas==0.23.4
Downloading pandas-0.23.4.tar.gz (10.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.5/10.5 MB 17.4 MB/s eta 0:00:00
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

adjust_predicts in eval_methods.py

This function currently adjusts predictions based on ground-truth labels, which for example increases the F1 score. However, during the inference stage there won't be any labels available. Should predictions be adjusted by some other method, or should they remain as they are? (A sketch of the adjustment strategy follows.)
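For context, here is a minimal sketch of the point-adjust strategy the question refers to, under its common formulation: if any point inside a ground-truth anomaly segment is flagged, the whole segment counts as detected. The names are illustrative, and since the adjustment needs labels it can only be an evaluation step, not an inference step.

# Hedged sketch of point-adjust evaluation (names are illustrative).
import numpy as np

def adjust_predicts_sketch(pred, label):
    # If any point in a true anomaly segment is predicted, mark the
    # whole segment as predicted. Requires ground-truth labels.
    pred = pred.astype(bool)
    label = label.astype(bool)
    i = 0
    while i < len(label):
        if label[i]:
            j = i
            while j < len(label) and label[j]:
                j += 1                 # [i, j) is one true anomaly segment
            if pred[i:j].any():
                pred[i:j] = True       # credit the whole segment
            i = j
        else:
            i += 1
    return pred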

Python version should be >= 3.7

According to README.md:

Install dependencies (with python 3.5, 3.6)

After installing the dependencies with pip install -r requirements.txt, I get the error: protobuf requires Python '>=3.7' but the running Python is 3.6.10.

ValueError: Input tensor enters the loop with shape (?, 3, 1), but has shape (?, 3, ?) after one iteration.

I got this error while using the SMAP dataset:

Instructions for updating:
Use keras.layers.dense instead.
Traceback (most recent call last):
  File "main.py", line 213, in <module>
    main()
  File "main.py", line 110, in main
    valid_step_freq=config.valid_step_freq)
  File "/media/harry/harry/ML/LSTM/OmniAnomaly/omni_anomaly/training.py", line 121, in __init__
    x=self._input_x, n_z=n_z)
  File "/media/harry/harry/ML/LSTM/OmniAnomaly/omni_anomaly/model.py", line 138, in get_training_loss
    chain = self.vae.chain(x, n_z=n_z, posterior_flow=self._posterior_flow)
  File "/media/harry/harry/ML/LSTM/OmniAnomaly/omni_anomaly/vae.py", line 362, in chain
    observed={'x': x}
  File "/usr/local/lib/python3.6/site-packages/tfsnippet/bayes.py", line 403, in variational_chain
    latent_axis=latent_axis,
  File "/usr/local/lib/python3.6/site-packages/tfsnippet/variational/chain.py", line 55, in __init__
    model.local_log_probs(iter(model)))
  File "/usr/local/lib/python3.6/site-packages/tfsnippet/bayes.py", line 297, in local_log_probs
    ret.append(self._stochastic_tensors[name].log_prob(name=ns))
  File "/usr/local/lib/python3.6/site-packages/tfsnippet/stochastic.py", line 156, in log_prob
    self.tensor, self.group_ndims, name=name)
  File "/media/harry/harry/ML/LSTM/OmniAnomaly/omni_anomaly/wrapper.py", line 75, in log_prob
    log_prob, _, _, _, _, _, _ = self._distribution.forward_filter(given)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_probability/python/distributions/linear_gaussian_ssm.py", line 716, in forward_filter
    initializer=initial_state)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/functional_ops.py", line 724, in scan
    maximum_iterations=n)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3556, in while_loop
    return_same_structure)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3087, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3059, in _BuildLoop
    next_vars.append(_AddNextAndBackEdge(m, v))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 677, in _AddNextAndBackEdge
    _EnforceShapeInvariant(m, v)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 620, in _EnforceShapeInvariant
    "less-specific shape." % (input_t.name, input_t.shape, n_shape))
ValueError: Input tensor 'model/trainer/loss/training_loss/VAE.chain/VariationalChain/model_log_joint/z.log_prob/forward_filter/add:0' enters the loop with shape (?, 3, 1), but has shape (?, 3, ?) after one iteration. To allow the shape to vary across iterations, use the `shape_invariants` argument of tf.while_loop to specify a less-specific shape.

Calculating Scores from Estimations

Hello,

I want to get the estimated distributions of x_t, so I'm dumping the estimated mu and sigma using the following code lines from model.py (line 191):

mu = pnet['x'].distribution.mean[:, -1, :]
sigma = pnet['x'].distribution.sigma[:, -1, :]

To verify, I calculate the score in a Python notebook using mu and sigma (with numpy imported as np and scipy.stats.norm imported as norm):

score = np.sum(norm.logpdf(x, mu, sigma), axis=1)

but my scores are actually different from the scores calculated by your code.

I couldn't really debug the problem, so I'm asking: is my approach correct?

Many thanks.

Tensor had NaN values

Has anyone encountered the "Tensor had NaN values" problem? I followed the README exactly.

What exactly does each of the 38 columns of SMD mean?

I have now obtained your Server Machine Dataset. As written in your paper, CPU load, network usage, and memory usage are recorded for each machine. But there is no information about what each column means, and I am now trying to use this dataset for analysis. Also, is each row sampled at equal time intervals? Thanks!

Code doesn't run

When I run the following:

python data_preprocess.py SMD

I get the following error

ImportError: DLL load failed: The specified module could not be found.
Failed to load the native TensorFlow runtime.

I see that other people have trouble running the code.

Does anyone know how to fix it, or know a version of the code that works?

Please support us !!

I have not seen any replies from the author. Please support us by answering the questions in these issues.
Thank you!

Data preprocessing

During data preprocessing, the datasets are transformed by data standardization. How does this standardization work?

Fitting two different scalers

The code has a major flaw: if I decide to normalize the data, a MinMaxScaler is used on the training set and the test set, but it is not the same scaler for the two. You have to fit on one of the datasets (the training set) and transform both of them with that same scaler, as in the sketch below.
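A sketch of the suggested fix, assuming scikit-learn's MinMaxScaler: fit the scaler on the training split only, then reuse the same fitted scaler to transform the test split. The arrays here are placeholders.

# Hedged sketch of the fix using scikit-learn's MinMaxScaler.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 38))  # placeholder training split
test = rng.normal(size=(500, 38))    # placeholder test split

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # fit statistics on train only
test_scaled = scaler.transform(test)        # reuse the same statistics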

Combining entity wise results into data set wise results

Dear authors,
in your paper you report results for SMAP, MSL and SMD; however, the anomaly detection is done at the entity level. Could you explain how you obtained precision, recall and F1 score for the whole datasets from the anomaly predictions you had for each of the entities composing them?
Best

SMD Dataset - Default settings same as results reported in paper?

Hi authors,

I am trying to replicate the results reported in your paper. I would like to confirm whether the default hyperparameter settings provided in the code are exactly those used in the experiments reported in the paper.

I think another issue I'm having in replicating similar POT-F1 scores is the one mentioned in #17: how do you combine the POT-F1 scores from 28 different entities into a single score? I have attached the results I obtained by running the code: omnianom_results.txt.

Thank you!

A problem about timestep of SMD datasets

Hi, I am very lucky to have read this wonderful paper, and thanks for your efforts in collecting the SMD dataset. A simple question: are the timestamps in SMD continuous? For example, the time of the first data point is 2020.1.1 1:00, the second is 2020.1.1 1:01, and the last is 700k minutes later; the test set then starts at minute 700k+1. Is this correct?

Thanks, and looking forward to your reply!

Possible leak from the test set for POT

Dear authors,

Thank you for sharing this work.

I tried random numbers drawn from a normal distribution with mean 0 and std 0.1 as model scores and evaluated them using the pot_eval function. The best-F1 score seems fine; however, the POT-F1 score seems buggy.

I attached main_random.py to reproduce the results: main_random.txt
GitHub does not allow uploading .py files, so rename main_random.txt to main_random.py and place it in the main directory. The installation is the same as for this project.

For the MSL dataset,
best-f1: 0.3184
pot-f1: 0.8987 (the score in the paper is 0.8989 in Table 3)
To reproduce, run python main_random.py --dataset='MSL'.

For the SMAP dataset,
best-f1: 0.3691
pot-f1: 0.9610 (The score in the paper is 0.8434 in Table 3)
To reproduce, run python main_random.py --dataset='SMAP'

These POT-F1 scores are higher than the ones in the paper. If I am not doing anything wrong, the results indicate that there is a leak from the test set for the POT method and its corresponding scores in the paper. Issue #15 also seems related.

Could you help explain why this happens? Thank you.

Cannot install and run the code with tf2.9.3

Hi,

I saw your update for TF 2.9.3 compatibility, but I tried

pip install -r requirements.txt

and it fails. So I removed the version pins of some packages, and it seems that tfsnippet has a problem at import time:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/tfsnippet/__init__.py", line 4, in <module>
    from . import (dataflows, datasets, distributions, evaluation, layers,
  File "/usr/local/lib/python3.8/dist-packages/tfsnippet/dataflows/__init__.py", line 1, in <module>
    from .array_flow import *
  File "/usr/local/lib/python3.8/dist-packages/tfsnippet/dataflows/array_flow.py", line 4, in <module>
    from tfsnippet.utils import minibatch_slices_iterator, generate_random_seed
  File "/usr/local/lib/python3.8/dist-packages/tfsnippet/utils/__init__.py", line 20, in <module>
    from .session import *
  File "/usr/local/lib/python3.8/dist-packages/tfsnippet/utils/session.py", line 71, in <module>
    def get_variables_as_dict(scope=None, collection=tf.GraphKeys.GLOBAL_VARIABLES):
AttributeError: module 'tensorflow' has no attribute 'GraphKeys'

Any solutions?

Interpretation_label for ServerMachineDataset

Hi,

Thank you for collecting and making ServerMachineDataset public. While reading this data, I have a small question about interpretation_label:

How does interpretation_label relate to test_label, if they are related at all? Why don't they seem to match exactly? For example, the interpretation_label for machine-1-1 indicates that 15849-16368 is an anomaly, but the test_label for machine-1-1 indicates that the values at rows 15850-16395 are anomalous.

Do the indices in interpretation_label also indicate anomaly locations in the training set?

Update to Code

I'm getting a lot of errors when I try running the code for all three datasets, particularly:

  1. TensorFlow raises a few deprecation warnings because the API has been updated. Can you update your code accordingly (tf.GraphKeys & tf.train.AdamOptimizer)? A possible workaround is sketched after this list.
  2. The file 'machine-1-1' breaks the entire code. For SMD this is because there is no folder named test_label inside ServerMachineDataset (the test_label folder that does exist is inside data and is empty). For the other two datasets, a NoneType error is raised; I'm guessing the code doesn't know which folder to reference for the files, or it references the wrong folder.
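A possible workaround for item 1, hedged: under TensorFlow 2.x the v1 compatibility module still exposes the deprecated symbols, although it does not help third-party packages such as tfsnippet that import tensorflow directly.

# Hedged workaround: run through TF1 compatibility mode so tf.GraphKeys
# and tf.train.AdamOptimizer resolve again under TensorFlow 2.x.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

print(tf.GraphKeys.GLOBAL_VARIABLES)              # 'variables'
opt = tf.train.AdamOptimizer(learning_rate=1e-3)  # deprecated v1 optimizer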

Implemented model architecture does not match the paper

In your paper (Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network), Figure 3 (b1) describes the network architecture of the qnet, which begins as follows:

-> GRU -> (GRU hidden output) -> concatenate with z_{t-1} -> Dense layer -> ...

However, the implemented code does not follow this architecture: after the RNN cell, you immediately put the RNN's hidden output through (two) linear layers:

wrapper.py (lines 113-120)

try:
    outputs, _ = rnn.static_rnn(fw_cell, x, dtype=tf.float32)
except Exception:  # Old TensorFlow version only returns outputs not states
    outputs = rnn.static_rnn(fw_cell, x, dtype=tf.float32)
outputs = tf.stack(outputs, axis=time_axis)
for i in range(hidden_dense):
    outputs = tf.layers.dense(outputs, dense_dim)
return outputs

So this code shows the first issue:

As you are not specifying an activation in the tf.layers.dense() call, these layers have linear activation, not ReLU as you claim in the paper (a one-line explicit-ReLU variant is sketched below). Also, did you use hidden_dense=2 in the experiments described in the paper, or was it 1?
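For comparison, a ReLU activation in the (now deprecated) tf.layers API has to be requested explicitly; this one-line variant of the quoted loop body is only a sketch:

# Sketch: tf.layers.dense defaults to a linear activation (activation=None),
# so ReLU must be requested explicitly.
outputs = tf.layers.dense(outputs, dense_dim, activation=tf.nn.relu)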

Second:

In the paper you say that you concatenate with z_{t-1} and after that apply a dense layer with ReLU activation, which is not done in your implementation:

recurrent_distribution.py (lines 43-46):

input_q = tf.concat([input_q_n, z_previous], axis=-1)
mu_q = self.mean_q_mlp(input_q, reuse=tf.AUTO_REUSE)  # n_sample * batch_size * z_dim

std_q = self.std_q_mlp(input_q)  # n_sample * batch_size * z_dim

The code above shows that the concatenated output is given directly to the mu and std layers, with no dense layer with ReLU activation before them.

So what your current implementation does in the qnet is more like the following:
-> GRU -> (GRU hidden output) -> (two?) linear layers -> concatenate with z_{t-1} -> get mu and std using the concatenation as input for the mu layer and std layer -> ...

What is the reason for this difference in the actual implementation and the one described in the original paper? Which architecture was used in the experiments reported in the paper?

Questions about the SMD dataset

Hello. I am a fellow researcher working on interpretable time series anomaly detection.
I have some questions about your work.

As I am working with the SMD dataset and the given interpretation label to measure anomaly interpretation performance, I have some doubts about the integrity of the dataset. First of all, the start and end timestamps in the interpretation label do not correctly match the test label. Also, there are some missing or extra interpretations in the dataset. How did you deal with these inconsistencies when conducting the experiment?

I would appreciate it if you could clarify these points for me. Again thank you for the nice work.

Question about your paper's results

Hi! Dear authors,

I have a question about your reported results.

I have tested your OmniAnomaly model on the MSL dataset and I can get 89% POT-F1, which is very close to the result reported in your paper.
But when I set the model's anomaly scores to random numbers from [0, 1], the POT-F1 can reach above 89.8833%. This is confusing, since these random "anomaly scores" are not produced by the model.

I think this issue is caused by the point-adjust approach mentioned in your paper.
Actually, I also evaluated a simple RNN with your code and settings (same data, same evaluation); its best F1 can also be above 90%.

Can you help explain this? Many thanks.

Server Machine Dataset description

Could you please share the original data of the "server machine" dataset, or describe what the raw data looks like?
I have no idea what the data represents. I opened one file in the train folder.
Here is the first line of the file:
0.032258,0.039195,0.027871,0.024390,0.000000,0.915385,0.343691,0.000000,0.020011,0.000122,0.106312,0.081081,0.027397,0.060266,0.085018,0.122516,0.000000,0.000000,0.062195,0.041221,0.043242,0.031607,0.533195,0.010224,0.011195,0.009274,0.000000,0.036625,0.000000,0.004298,0.029993,0.022131,0.000000,0.000045,0.034677,0.034747,0.000000,0.000000

Training on other datasets

Hello sir,
I would like to use OmniAnomaly for training on other datasets such as SWaT and WADI.
Is it possible to train on SWaT?
If yes, which part of the code should I change?

Thank you in advance :)
