xuhongzuo / deepod

Deep learning-based outlier/anomaly detection

Home Page: https://deepod.readthedocs.io/

License: BSD 2-Clause "Simplified" License

Python 100.00%

anomaly-detection deep-anomaly-detection outlier-detection

deepod's Introduction

Glad to see you here!

🌱 I’m a CS Ph.D. student at National University of Defense Technology, China.
🔭 My current research interests include unsupervised (self-supervised) and weakly-supervised outlier/anomaly detection and interpretation for tabular, time series, and graph data.
📫 Contact me via hongzuoxu [aT] 126 [dot] com

deepod's Issues

tuple index out of range

To reproduce:

#!/usr/bin/env python3
import numpy
from deepod.models.time_series.dif import DeepIsolationForestTS

arr = numpy.empty(shape=(3, 3))
arr.fill(1.0)

dif = DeepIsolationForestTS(device=None)
dif.fit(arr)

Error message:

Traceback (most recent call last):
  File "reproduce.py", line 10, in <module>
    dif.fit(arr)
  File "$VIRTUAL_ENV/lib/python3.9/site-packages/deepod/models/time_series/dif.py", line 74, in fit
    self.n_samples, self.n_features = X_seqs.shape[0], X_seqs.shape[2]
IndexError: tuple index out of range

I have a workaround:

dif = DeepIsolationForestTS(device=None, seq_len=min(100, arr.shape[0]))

which suggests the fix.
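One plausible reading of the traceback, sketched under assumptions: if the windows are collected in a list and converted with numpy, a series shorter than seq_len yields an empty one-dimensional array, so reading shape[2] fails. The helper sliding_windows below is hypothetical and is not DeepOD's actual windowing code.

```python
import numpy as np

def sliding_windows(x, seq_len, stride=1):
    """Hypothetical helper: stack every full window of length seq_len."""
    windows = [x[i:i + seq_len] for i in range(0, len(x) - seq_len + 1, stride)]
    return np.array(windows)

arr = np.ones((3, 3))

# seq_len longer than the series: no windows, so the array is 1-D and empty.
too_long = sliding_windows(arr, seq_len=100)

# Clamping seq_len to the series length, as in the workaround above, gives one window.
clamped = sliding_windows(arr, seq_len=min(100, arr.shape[0]))
```

In this sketch, too_long has a single axis, so any access to shape[2] raises IndexError, while clamped has the expected three axes (windows, time steps, features).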

Issue in the read_data function of testbed/utils.py

Hi, when the split is set to 50% normal, y_test is generated in such a way that the anomalies are always grouped together.

Running print(np.where(y_test==1)) shows the problem.

For example, this is the output of that command on '01-thyroid.csv':

(array([1840, 1841, 1842, 1843, 1844, 1845, 1846, 1847, 1848, 1849, 1850,
1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861,
1862, 1863, 1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871, 1872,
1873, 1874, 1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883,
1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894,
1895, 1896, 1897, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905,
1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916,
1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927,
1928, 1929, 1930, 1931, 1932]),)

Because of this grouping in the test data, we get good results. When we shuffle x_test and y_test externally and apply any algorithm, the results are poor.

Please remove this bias from the test data split by shuffling the anomalies.
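A minimal sketch of the requested shuffle, using toy arrays in place of the real split (the data and the seed are illustrative assumptions). The key point is that one shared permutation is applied to both arrays so that each row stays paired with its label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real test split: anomalies grouped at the end.
x_test = np.arange(20, dtype=float).reshape(10, 2)
y_test = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

# One shared permutation keeps each row paired with its label.
perm = rng.permutation(len(y_test))
x_shuffled, y_shuffled = x_test[perm], y_test[perm]
```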

Are there any restrictions on test data?

I use DeepSADTS for training, with 8,000 rows of training data. At test time I wanted to predict on a single row, but an error occurred.
I found that at least 100 rows are required for testing. Is there a limit on the number of rows of test data?
Any help would be appreciated, thank you.

The following is the error message:

Traceback (most recent call last):
File "/home/chris/PycharmProjects/pythonProject/numops/DeepOD/LSTM.py", line 55, in
scores = clf.decision_function(df_test)
File "/home/chris/PycharmProjects/pythonProject/numops/DeepOD/deepod/core/base_model.py", line 326, in decision_function
z, scores = self._inference()
File "/home/chris/PycharmProjects/pythonProject/numops/DeepOD/deepod/core/base_model.py", line 478, in _inference
z = torch.cat(z_lst).data.cpu().numpy()
RuntimeError: torch.cat(): expected a non-empty list of Tensors
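The 100-row minimum is consistent with window-based scoring, where an input shorter than the window length produces no windows to score. Assuming that is the cause here, one possible workaround for single-row prediction is to keep a rolling buffer of the most recent rows and score the buffer once it is full. The helper push_row and the constant SEQ_LEN are hypothetical, and the toy data only illustrates the shapes.

```python
import numpy as np

SEQ_LEN = 100  # assumed window length, based on the 100-row minimum reported above

def push_row(history, new_row, seq_len=SEQ_LEN):
    """Append new_row to history and keep only the latest seq_len rows."""
    return np.vstack([history, new_row])[-seq_len:]

history = np.zeros((99, 4))              # 99 earlier rows with 4 features (toy data)
history = push_row(history, np.ones(4))  # the newly arrived row
ready = len(history) == SEQ_LEN          # True once a full window is available
# When ready is True, clf.decision_function(history) could be called on the buffer.
```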

weakly-supervised methods learning error

When using the code below:

from deepod.models.devnet import DevNet
clf = DevNet()
clf.fit(X_train, y=semi_y) # semi_y uses 1 for known anomalies, and 0 for unlabeled data

the call clf.fit(X_train, y=semi_y) fails with a "float division by zero" error.

How can I resolve this error?
I passed a two-dimensional array as X_train and a list of zeros as y, because all of my data is normal.
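A likely cause, stated as an assumption rather than a confirmed diagnosis: the label vector contains no entry equal to 1, and any training step that normalizes by the number of labeled anomalies would then divide by zero. The sketch below checks the labels before fitting; count_known_anomalies is a hypothetical helper, not part of DeepOD.

```python
import numpy as np

def count_known_anomalies(semi_y):
    """Return how many entries are labeled 1 (known anomaly)."""
    return int(np.sum(np.asarray(semi_y) == 1))

semi_y = [0, 0, 0, 0]  # all unlabeled, as in the report above
n_anomalies = count_known_anomalies(semi_y)
if n_anomalies == 0:
    message = "no labeled anomalies: a weakly-supervised model has nothing to learn from"
else:
    message = f"{n_anomalies} labeled anomalies found"
```

If no labeled anomalies are available at all, an unsupervised detector may be a better fit for the data than a weakly-supervised one.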

DeepIsolationForest with CUDA

How can I train the DeepIsolationForest model on my GPU?

Neither this

from deepod.models.tabular import DeepIsolationForest

clf_dif = DeepIsolationForest(device = 'cuda')

nor this

import torch
from deepod.models.tabular import DeepIsolationForest

device = torch.device("cuda")

clf_dif = DeepIsolationForest(device = device)

works.
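The report does not include the error, so a reasonable first diagnostic is to confirm that PyTorch can see a GPU at all. The sketch below uses torch.cuda.is_available() and falls back to the CPU when PyTorch or CUDA is missing; pick_device is a hypothetical helper, and passing its result as device is an assumption about how the constructor is meant to be used.

```python
import importlib.util

def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
# clf_dif = DeepIsolationForest(device=device)  # as in the snippets above
```

If this returns "cpu" on a machine with a GPU, the installed PyTorch build is probably CPU-only, which would explain the failure independently of DeepOD.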

dif.py uses deprecated hidden dims syntax

The constructor for DeepIsolationForestTS contains the following line:

DeepOD/deepod/models/time_series/dif.py

Line 27 in c2c7566

rep_dim=128, hidden_dims='100,50', bias=False,

However, this emits a very large number of warnings:

Start Training...
  0%|                                                                                              | 0/50 [00:00<?, ?it/s]
$VIRTUAL_ENV/lib/python3.9/site-packages/deepod/core/networks/network_utility.py:23: UserWarning: use the first hidden num, the rest hidden numbers are deprecated
  warnings.warn('use the first hidden num, '
  2%|█▋

The fix is easy: change hidden_dims='100,50' to hidden_dims=100.

fit_auto_hyper() error

I tried to use fit_auto_hyper() by writing the code as follows, but an error occurred.

clf = TimesNet()

hyper = clf.fit_auto_hyper(train_value)

2024-04-06 08:42:46,524	WARNING worker.py:2006 -- Warning: The actor ImplicitFunc is very large (10 MiB). Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use ray.put() to put large objects in the Ray object store.
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00000 completed. Last result: 
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00001 completed. Last result: 
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00002 completed. Last result: 
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00003 completed. Last result: 
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00004 completed. Last result: 
2024-04-06 08:43:05,303	INFO tune.py:1148 -- Total run time: 19.75 seconds (19.72 seconds for the tuning loop).
2024-04-06 08:43:05,306	WARNING experiment_analysis.py:916 -- Failed to read the results for 5 trials:
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00000_0_2024-04-06_08-42-45
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00001_1_2024-04-06_08-42-45
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00002_2_2024-04-06_08-42-45
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00003_3_2024-04-06_08-42-45
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00004_4_2024-04-06_08-42-45
2024-04-06 08:43:05,306	WARNING experiment_analysis.py:783 -- Could not find best trial. Did you pass the correct `metric` parameter?

--> [278] print(f"Best trial config: {best_trial.config}")

AttributeError: 'NoneType' object has no attribute 'config'

Can you tell me what the problem is? Or is there something wrong with my code?

How to check the parameters of TimesNet?

I import TimesNet with "from deepod.models.timeseries import TimesNet".
I need to check the parameters of TimesNet, but I get "AttributeError: 'TimesNet' object has no attribute 'parameters'".
I cannot find a solution at https://deepod.readthedocs.io/en/latest/.
Could you please tell me how I can check the parameters of TimesNet?
Thank you very much.
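The error suggests that the TimesNet object is a wrapper rather than a torch.nn.Module itself. Assuming the wrapper keeps its network as an instance attribute (the attribute name is not confirmed here), a generic way to locate it is to scan the instance for attributes that expose a parameters() method. The helper and the stand-in classes below are hypothetical, so the sketch runs without DeepOD or PyTorch.

```python
def find_torch_like_attrs(obj):
    """List names of instance attributes that expose a callable parameters()."""
    return [
        name for name, value in vars(obj).items()
        if callable(getattr(value, "parameters", None))
    ]

# Stand-in classes so the sketch runs without DeepOD or PyTorch installed.
class FakeNet:
    def parameters(self):
        return iter([1.0, 2.0])

class FakeDetector:
    def __init__(self):
        self.epochs = 10
        self.net = FakeNet()

names = find_torch_like_attrs(FakeDetector())
```

On a real detector, vars(clf) also lists the hyperparameters stored on the instance, which may be what is needed if "parameters" means settings rather than weights.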

Unable to import module 'deepod.metrics.affiliation' in testbed_unsupervised_tsad.py

Issue Description:
When attempting to run the testbed_unsupervised_tsad.py script from the DeepOD library's Testbed, I encountered an ImportError related to the module 'deepod.metrics.affiliation'.

Error Traceback:

Traceback (most recent call last):
  File "E:\AD\DeepOD\testbed\testbed_unsupervised_tsad.py", line 16, in <module>
    from deepod.metrics import ts_metrics, point_adjustment
  ...
  File "D:\ruanjian\anaconda\envs\DeepOD\lib\site-packages\deepod\metrics\_anomaly_detection.py", line 3, in <module>
    from deepod.metrics.affiliation.generics import convert_vector_to_events
ModuleNotFoundError: No module named 'deepod.metrics.affiliation'

Steps to Reproduce:

Clone the DeepOD repository.
Follow the Testbed documentation for setting up the environment.
Execute the testbed_unsupervised_tsad.py script with the provided example command.

Additional Information:
DeepOD version: 0.4.1
Python version: 3.9
Operating System: Windows 11 Pro

I would appreciate any guidance or suggestions to resolve this issue and successfully run the testbed_unsupervised_tsad.py script.
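The traceback shows the import failing inside site-packages, so the installed package, not the cloned repository, appears to be missing the subpackage. A quick way to verify which modules the active environment can locate is importlib.util.find_spec. The helper has_module is hypothetical.

```python
import importlib.util

def has_module(name):
    """Return True if the dotted module name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return False

found = has_module("deepod.metrics.affiliation")
```

If found is False while the folder exists in the cloned repository, installing from the repository or running the script against the local source tree are two options worth trying.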

save model parameters

Hello,

I would like to start by thanking you for your amazing effort in this library.

I would like to suggest a feature, or collaborate in some way to add it, but first I wanted to make sure that I have not simply missed it. Is there a save_model feature, or a way to save the model weights or state_dict, so that a trained model can be deployed or retrained on another dataset?

Thanks again for your time and efforts

testbed-utils error

I found that in utils.py, the data_split function returns x_train, y_train, x_test, and y_test with mismatched shapes when the split parameter is set to 60%. This is due to the reversal of the parameter assignment order in the train_test_split method call.
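The issue does not show the offending call. Assuming the function relies on scikit-learn's train_test_split, the sketch below shows the return order for two input arrays: both splits of the first array come first, followed by both splits of the second. The toy arrays and the 60/40 ratio are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Return order: both splits of the first array, then both splits of the second.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=0
)
```

Unpacking the result as x_train, y_train, x_test, y_test instead would pair 6 feature rows with 4 feature rows treated as labels, which matches the shape mismatch described above.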

Training process

About the training process: why is there no validation dataset in deepod?
For the decision function:

clf = ...
clf.fit(X_train)
scores = clf.decision_function(X_test)

Then I use roc_curve(y_test, scores) to get the best threshold and reuse this threshold later. Is this right?
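Choosing the threshold on the same labels that are later used for evaluation makes the reported metrics optimistic. A common alternative is to pick the threshold on a separate labeled validation split. The sketch below does this with Youden's J statistic (TPR minus FPR); youden_threshold is a hypothetical helper, the scores are toy values, and both classes are assumed to be present.

```python
import numpy as np

def youden_threshold(y_true, scores):
    """Return the score threshold that maximizes TPR - FPR."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    candidates = np.unique(scores)
    # flagged[i, j] is True when sample j is flagged at threshold candidates[i].
    flagged = scores[None, :] >= candidates[:, None]
    tpr = (flagged & y_true).sum(axis=1) / y_true.sum()
    fpr = (flagged & ~y_true).sum(axis=1) / (~y_true).sum()
    return candidates[np.argmax(tpr - fpr)]

# Toy validation split: labels and anomaly scores.
val_y = [0, 0, 0, 1, 1]
val_scores = [0.1, 0.2, 0.3, 0.8, 0.9]
threshold = youden_threshold(val_y, val_scores)
```

The chosen threshold can then be applied to the scores of unseen data, with the test labels used only for the final evaluation.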

Add paper

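Zhou Xiaohui, Wang Yijie, Xu Hongzuo, Liu Mingyu. Unsupervised multivariate time series anomaly detection based on fusion learning [J/OL]. Journal of Computer Research and Development: 1-14.
I am interested in this work; will you add its code in the future?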

How can I set hyperparameters for training?

Thank you for your excellent work!

However, I've noticed that the models may not converge on some datasets when using the fit() function.

Could you explain how to set the hyperparameters, such as learning rate, epochs, etc.?

DevNet: reproducing the same results as the paper

Hi there,
First of all: thank you very much for this great repo.
I just wanted to ask how I could use your DevNet class to reproduce the results reported in the original paper.

All scores are zero except the last, which is a NaN

To reproduce:

#!/usr/bin/env python3
import numpy
from deepod.models.time_series.dif import DeepIsolationForestTS

arr = numpy.empty(shape=(33, 741))
arr.fill(1.0)

# Also reproduces the issue:
# arr = numpy.random.rand(33, 741)

dif = DeepIsolationForestTS(device=None, seq_len=min(arr.shape[0], 100), max_samples=min(arr.shape[0], 256), hidden_dims=100)
dif.fit(arr)
scores = dif.decision_function(arr)
print(scores)

Output:

$VIRTUAL_ENV/lib/python3.9/site-packages/deepod/models/tabular/dif.py:256: RuntimeWarning: invalid value encountered in divide
  scores = 2 ** (-depth_sum / (len(clf.estimators_) * _average_path_length([clf.max_samples_])))
[00:00<00:00, 1088.64it/s]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. nan]
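The pattern of zeros plus one NaN is what the scoring formula in the warning produces when its normalizer is zero. In the standard isolation forest definition, the average path length c(n) is 0 for n <= 1, so if only one sample (for instance a single window) reaches the forest, every positive depth divides to infinity and scores 0, while a zero depth gives 0/0 and scores NaN. Whether max_samples_ is actually 1 in this run is an assumption. The function below is a reimplementation for illustration, not the library's _average_path_length.

```python
import numpy as np

def average_path_length(n):
    """Standard isolation-forest normalizer c(n); 0 for n <= 1, 1 for n == 2."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1.0) + np.euler_gamma) - 2.0 * (n - 1.0) / n

n_estimators = 6                   # illustrative value
depth_sum = np.array([12.0, 0.0])  # toy depths: one positive, one zero

# Same expression as in the warning, evaluated with a single sample.
with np.errstate(divide="ignore", invalid="ignore"):
    scores = 2 ** (-depth_sum / (n_estimators * average_path_length(1)))
```

Here scores holds 0.0 for the positive depth and NaN for the zero depth, mirroring the output above.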

Do you support onnx deployment?

Can I complete model onnx deployment using only this tool?

RuntimeError, when using ICL model

RuntimeError: Trying to create tensor with negative dimension -1: [100, -1]

x_train shape [4320, 1]
x_test shape [4320, 1]

modelL = {0: DeepSVDD, 1: REPEN, 2: RDP, 3: RCA, 4: GOAD, 5: ICL}
model_name = ["SVDD", "REPEN", "RDP", "RCA", "GOAD", "ICL"]
for index in range(1, 6):
    model = modelL[index]()
    model.fit(x_train, y=None)
    score = model.decision_function(x_test)

        self.enc_f_net = MLPnet(
            n_features=n_features-kernel_size,
            n_hidden=hidden_dims,
            n_output=rep_dim,
            mid_channels=len(self.all_idx),
            batch_norm=True,
            activation=f_act,
            bias=bias,
        )

Here n_features=1 and kernel_size=2, so n_features-kernel_size evaluates to -1, which is the negative dimension in the error.
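A small guard makes this failure explicit before any network is built. The sketch below checks the same subtraction that appears in the MLPnet call; icl_encoder_input_dim is a hypothetical helper, not part of DeepOD.

```python
def icl_encoder_input_dim(n_features, kernel_size):
    """Return n_features - kernel_size, or raise if the result is not positive."""
    dim = n_features - kernel_size
    if dim <= 0:
        raise ValueError(
            f"n_features ({n_features}) must exceed kernel_size ({kernel_size})"
        )
    return dim

# The univariate case from the report: one feature, kernel_size of 2.
try:
    icl_encoder_input_dim(1, 2)
except ValueError as exc:
    error_message = str(exc)
```

With a single-feature series such as the [4320, 1] input above, the check fails, so the data would need more features than kernel_size for this model configuration.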

Project license

Hi everybody,

Could you include a license for this repository so I know if I could use your work in my personal projects?

Thank you for this amazing work!

unable to install correctly (neither "pip install deepod" nor "pip install .")

When I pip install deepod, deepod.metrics is not available. There is no deepod.models.tabular either; the models sit directly in deepod.models. So the usage examples can be run if the imports are changed accordingly.

When I try pip install . from the git folder, I cannot get a working installation. At first, I think the installation failed because I was using the latest Python version.
When I set python==3.10, it installs, but import deepod then fails with this error:

>>> import deepod
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/deepod/__init__.py", line 2, in <module>
    from . import core, models, metrics
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/deepod/core/__init__.py", line 4, in <module>
    from .base_model import BaseDeepAD
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/deepod/core/base_model.py", line 17, in <module>
    from ray import tune
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/tune/__init__.py", line 2, in <module>
    from ray.tune.tune import run_experiments, run
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/tune/tune.py", line 27, in <module>
    from ray.air import CheckpointConfig
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/air/__init__.py", line 1, in <module>
    from ray.air.checkpoint import Checkpoint
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/air/checkpoint.py", line 22, in <module>
    from ray.air._internal.remote_storage import (
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/air/_internal/remote_storage.py", line 8, in <module>
    from pkg_resources import packaging
ImportError: cannot import name 'packaging' from 'pkg_resources' (/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/pkg_resources/__init__.py)
>>>
