
deepod's Introduction

Glad to see you here!

  • 🌱 I'm a CS Ph.D. student at the National University of Defense Technology, China.

  • 🔭 My current research interests include unsupervised (self-supervised) and weakly-supervised outlier/anomaly detection and interpretation for tabular, time-series, and graph data.

  • 📫 Contact me via hongzuoxu [aT] 126 [dot] com


deepod's People

Contributors

bettyzry, crishna0401, elsheikh21, nathompson, nuhdv, real-lhj, xuhongzuo, yyysjz1997, zhouyilee

deepod's Issues

tuple index out of range

To reproduce:

#!/usr/bin/env python3
import numpy
from deepod.models.time_series.dif import DeepIsolationForestTS

arr = numpy.empty(shape=(3, 3))
arr.fill(1.0)

dif = DeepIsolationForestTS(device=None)
dif.fit(arr)

Error message:

Traceback (most recent call last):
  File "reproduce.py", line 10, in <module>
    dif.fit(arr)
  File "$VIRTUAL_ENV/lib/python3.9/site-packages/deepod/models/time_series/dif.py", line 74, in fit
    self.n_samples, self.n_features = X_seqs.shape[0], X_seqs.shape[2]
IndexError: tuple index out of range

I have a workaround:

dif = DeepIsolationForestTS(device=None, seq_len=min(100, arr.shape[0]))

which suggests the fix.
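The traceback suggests that the windowing step produced no sub-sequences: with only 3 rows and a window length of (apparently, judging from the workaround) 100, no complete window exists, so the resulting array has no third axis and `X_seqs.shape[2]` fails. The sketch below uses a hypothetical `sliding_windows` helper (not a DeepOD function) to reproduce the effect and to show why clamping `seq_len`, as in the workaround, avoids it.

```python
import numpy as np


def sliding_windows(x, seq_len, stride=1):
    """Hypothetical stand-in for a time-series windowing step.

    Returns an array of shape (n_windows, seq_len, n_features), or a 1-D
    empty array when x has fewer than seq_len rows.
    """
    starts = range(0, x.shape[0] - seq_len + 1, stride)
    return np.array([x[i:i + seq_len] for i in starts])


arr = np.ones((3, 3))

# With seq_len=100 there is no complete window: the result is 1-D and
# empty, so indexing shape[2] would raise IndexError.
empty = sliding_windows(arr, seq_len=100)
print(empty.shape)

# Clamping the window length to the series length yields one window.
seq_len = min(100, arr.shape[0])
windows = sliding_windows(arr, seq_len=seq_len)
print(windows.shape)
```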

Issue in testbed/utils.py/read_data function

Hi, when the split is set to 50% normal, y_test is generated in such a way that the anomalies are always grouped together.

Just run print(np.where(y_test==1)) and you can see the problem.

For example, this is the result of the above command when applied to '01-thyroid.csv':

(array([1840, 1841, 1842, 1843, 1844, 1845, 1846, 1847, 1848, 1849, 1850,
1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861,
1862, 1863, 1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871, 1872,
1873, 1874, 1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883,
1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894,
1895, 1896, 1897, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905,
1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916,
1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927,
1928, 1929, 1930, 1931, 1932]),)

Because of this grouping in the test data, we get good results. When we shuffle x_test and y_test externally and then apply any algorithm, the results are poor.

Please remove the bias in the test data split by shuffling the anomalies.
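A minimal sketch of the requested fix, assuming `x_test` and `y_test` are NumPy arrays: draw a single random permutation and apply it to both arrays, so that every sample keeps its label. The function name `shuffle_test_set` and the seed are illustrative and not part of the testbed.

```python
import numpy as np


def shuffle_test_set(x_test, y_test, seed=42):
    """Shuffle samples and labels with the same permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y_test))
    return x_test[perm], y_test[perm]


# Toy data: anomalies (label 1) grouped at the end, as in the report.
x_test = np.arange(20, dtype=float).reshape(10, 2)
y_test = np.array([0] * 7 + [1] * 3)

x_shuf, y_shuf = shuffle_test_set(x_test, y_test)
print(np.where(y_shuf == 1))
```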

Are there any restrictions on test data?

I use DeepSADTS for training, and my training data has 8,000 rows. At test time, I wanted to make a prediction on a single row of data, but an error occurred.
I found that at least 100 rows of data are required for testing. Is there a limit on the number of rows of test data?
Please help me, thank you.

The following is the error message:

Traceback (most recent call last):
  File "/home/chris/PycharmProjects/pythonProject/numops/DeepOD/LSTM.py", line 55, in
    scores = clf.decision_function(df_test)
  File "/home/chris/PycharmProjects/pythonProject/numops/DeepOD/deepod/core/base_model.py", line 326, in decision_function
    z, scores = self._inference()
  File "/home/chris/PycharmProjects/pythonProject/numops/DeepOD/deepod/core/base_model.py", line 478, in _inference
    z = torch.cat(z_lst).data.cpu().numpy()
RuntimeError: torch.cat(): expected a non-empty list of Tensors
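The traceback is consistent with the test input being shorter than the model's window length (the reporter observes a minimum of 100 rows): no window is formed, so the list of batch outputs passed to `torch.cat` is empty. One possible workaround, assuming the model scores sliding windows of `seq_len` rows, is to prepend recent history to the new row before calling `decision_function` and keep only the score for the new row. `with_history` is a hypothetical helper; whether the last returned score lines up with the last input row should be checked against DeepOD's own score alignment.

```python
import numpy as np


def with_history(history, new_rows, seq_len=100):
    """Prepend up to (seq_len - 1) of the most recent historical rows.

    If history is long enough, the result has at least seq_len rows, so a
    sliding-window model can form a window ending at the last new row.
    """
    n_ctx = seq_len - 1
    start = max(len(history) - n_ctx, 0)
    return np.concatenate([history[start:], new_rows], axis=0)


train = np.random.default_rng(0).normal(size=(8000, 5))
one_row = train[-1:] + 0.1  # a single new observation, shape (1, 5)

batch = with_history(train, one_row, seq_len=100)
print(batch.shape)
# Assumed usage (not run here):
# scores = clf.decision_function(batch); new_score = scores[-1]
```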

weakly-supervised methods learning error

When using the code below:

from deepod.models.devnet import DevNet
clf = DevNet()
clf.fit(X_train, y=semi_y)  # semi_y uses 1 for known anomalies, and 0 for unlabeled data

a "float division by zero" error occurs at clf.fit(X_train, y=semi_y).

How can I resolve this error? I passed a two-dimensional array as X_train and a list of zeros as y, because all of my data is normal.
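DevNet is a weakly-supervised method that learns from a small number of labeled anomalies, so a label vector containing no 1s gives it no positive examples at all. A plausible (unverified) source of the division by zero is a quantity computed over the labeled anomalies, which is empty here. A pre-flight check such as the hypothetical `check_weak_labels` below makes the problem explicit before `fit` is called; with no labeled anomalies, an unsupervised detector is the more natural fit.

```python
import numpy as np


def check_weak_labels(y):
    """Validate a weak-label vector (1 = known anomaly, 0 = unlabeled).

    Returns the number of labeled anomalies, or raises ValueError.
    """
    y = np.asarray(y)
    if not np.isin(y, [0, 1]).all():
        raise ValueError("labels must be 0 (unlabeled) or 1 (known anomaly)")
    n_pos = int((y == 1).sum())
    if n_pos == 0:
        raise ValueError(
            "no labeled anomalies: weakly-supervised models such as DevNet "
            "need at least one sample with label 1"
        )
    return n_pos


print(check_weak_labels([0, 0, 1, 0, 1]))
```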

DeepIsolationForest with CUDA

How can I train the DeepIsolationForest model on my GPU?

Neither this

from deepod.models.tabular import DeepIsolationForest

clf_dif = DeepIsolationForest(device = 'cuda')

nor this

import torch
from deepod.models.tabular import DeepIsolationForest

device = torch.device("cuda")

clf_dif = DeepIsolationForest(device = device)

works.

dif.py uses deprecated hidden dims syntax

The constructor for DeepIsolationForestTS contains the following line:

rep_dim=128, hidden_dims='100,50', bias=False,

However, this causes a huge number of warnings to be emitted:

Start Training...
  0%|          | 0/50 [00:00<?, ?it/s]
$VIRTUAL_ENV/lib/python3.9/site-packages/deepod/core/networks/network_utility.py:23: UserWarning: use the first hidden num, the rest hidden numbers are deprecated
  warnings.warn('use the first hidden num, '
  2%|█▋

The fix is simple: change hidden_dims='100,50' to hidden_dims=100.

fit_auto_hyper() error

I tried to use fit_auto_hyper() by writing the code as follows, but an error occurred.

clf = TimesNet()

hyper = clf.fit_auto_hyper(train_value)
2024-04-06 08:42:46,524	WARNING worker.py:2006 -- Warning: The actor ImplicitFunc is very large (10 MiB). Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use ray.put() to put large objects in the Ray object store.
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00000 completed. Last result: 
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00001 completed. Last result: 
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00002 completed. Last result: 
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00003 completed. Last result: 
(raylet) bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
Trial _training_ray_a3cad_00004 completed. Last result: 
2024-04-06 08:43:05,303	INFO tune.py:1148 -- Total run time: 19.75 seconds (19.72 seconds for the tuning loop).
2024-04-06 08:43:05,306	WARNING experiment_analysis.py:916 -- Failed to read the results for 5 trials:
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00000_0_2024-04-06_08-42-45
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00001_1_2024-04-06_08-42-45
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00002_2_2024-04-06_08-42-45
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00003_3_2024-04-06_08-42-45
- /root/ray_results/_training_ray_2024-04-06_08-42-45/_training_ray_a3cad_00004_4_2024-04-06_08-42-45
2024-04-06 08:43:05,306	WARNING experiment_analysis.py:783 -- Could not find best trial. Did you pass the correct `metric` parameter?
--> [278] print(f"Best trial config: {best_trial.config}")

AttributeError: 'NoneType' object has no attribute 'config'

Can you tell me what the problem is? Or is there something wrong with my code?

Unable to Import Module 'deepod.metrics.affiliation' in Testbed_unsupervised_tsad.py

Issue Description:
When attempting to run the testbed_unsupervised_tsad.py script from the DeepOD library's Testbed, I encountered an ImportError related to the module 'deepod.metrics.affiliation'.

Error Traceback:

Traceback (most recent call last):
  File "E:\AD\DeepOD\testbed\testbed_unsupervised_tsad.py", line 16, in <module>
    from deepod.metrics import ts_metrics, point_adjustment
  ...
  File "D:\ruanjian\anaconda\envs\DeepOD\lib\site-packages\deepod\metrics\_anomaly_detection.py", line 3, in <module>
    from deepod.metrics.affiliation.generics import convert_vector_to_events
ModuleNotFoundError: No module named 'deepod.metrics.affiliation'

Steps to Reproduce:

  1. Clone the DeepOD repository.
  2. Follow the Testbed documentation for setting up the environment.
  3. Execute the testbed_unsupervised_tsad.py script with the provided example command.

Additional Information:
DeepOD version: 0.4.1
Python version: 3.9
Operating System: Windows 11 Pro

I would appreciate any guidance or suggestions to resolve this issue and successfully run the testbed_unsupervised_tsad.py script.
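The traceback shows the script running from the cloned repository (`E:\AD\DeepOD\testbed`) while `deepod` itself is imported from `site-packages`, so it is the installed copy, not the clone, that lacks `deepod.metrics.affiliation`. A quick way to check which subpackages the active interpreter can see is `importlib.util.find_spec`; the helper name `module_available` below is illustrative.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located by the current interpreter.

    find_spec imports parent packages first, so a parent that is missing
    or fails to import raises ImportError, treated here as unavailable.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


for mod in ("deepod", "deepod.metrics", "deepod.metrics.affiliation"):
    print(mod, module_available(mod))
```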

save model parameters

hello,

I would like to start by thanking you for your amazing effort on this library.

I would like to suggest a feature, or collaborate in some way to add it, but first I wanted to make sure that it does not already exist. Is there a save_model feature, or a way to save the model weights or state_dict, so that a trained model can be deployed or retrained on another dataset?

Thanks again for your time and efforts
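Whether DeepOD provides a dedicated save method is not established here. As a generic fallback, a fitted detector object can often be serialized with Python's `pickle` module, assuming all of its attributes (networks, optimizers, fitted statistics) are picklable; this has not been verified for every DeepOD model. The helpers `save_detector` and `load_detector` are illustrative, and the round trip is demonstrated on a `SimpleNamespace` stand-in rather than a real model.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_detector(detector, path):
    """Serialize a fitted detector to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(detector, f)


def load_detector(path):
    """Load a detector saved by save_detector (only load trusted files)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a fitted model, used only to demonstrate the round trip.
detector = SimpleNamespace(threshold=0.7, name="stand-in")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "detector.pkl"
    save_detector(detector, path)
    restored = load_detector(path)

print(restored.threshold)
```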

testbed-utils error

I found that in utils.py, the data_split function returns x_train, y_train, x_test, and y_test with mismatched shapes when the split parameter is set to 60%. This is due to the reversal of the parameter assignment order in the train_test_split method call.
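For reference, scikit-learn's `train_test_split` returns the train and test parts of each input array in turn, that is `X_train, X_test, y_train, y_test` for two inputs. Unpacking the result as `x_train, y_train, x_test, y_test` silently swaps arrays and would produce mismatched shapes like those described. The `data_split` code itself is not shown in the report, so the example below only illustrates the correct unpacking on toy data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

# Correct order: train/test parts of X first, then train/test parts of y.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```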

Training process

  1. Regarding the training process: why is there no validation dataset in DeepOD?

  2. Regarding the decision function:

clf = ...
clf.fit(X_train)
scores = clf.decision_function(X_test)

Then I use roc_curve(y_test, scores) to get the best threshold and use this threshold for later predictions. Is this right?
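Choosing the threshold with `roc_curve` on the same `y_test` that is used to report results makes the evaluation optimistic, because the threshold is tuned on the test labels. A more careful variant, sketched below under the assumption that a small labeled validation split is available, picks the threshold that maximizes Youden's J statistic (TPR - FPR) on validation scores and then applies it unchanged to new scores. Youden's J is one common criterion, not something DeepOD prescribes; `youden_threshold` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import roc_curve


def youden_threshold(y_val, val_scores):
    """Return the score threshold that maximizes TPR - FPR on validation data."""
    fpr, tpr, thresholds = roc_curve(y_val, val_scores)
    best = np.argmax(tpr - fpr)
    return thresholds[best]


# Toy validation labels and anomaly scores (higher = more anomalous).
y_val = np.array([0, 0, 1, 1])
val_scores = np.array([0.1, 0.2, 0.8, 0.9])
threshold = youden_threshold(y_val, val_scores)

# Apply the fixed threshold to scores of unseen data.
test_scores = np.array([0.05, 0.5, 0.85])
pred = (test_scores >= threshold).astype(int)
print(threshold, pred)
```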

Add paper

å‘Øå°ę™–,ēŽ‹ę„ę“,徐éøæē„š,刘铭宇.åŸŗäŗŽčžåˆå­¦ä¹ ēš„ę— ē›‘ē£å¤šē»“ꗶ闓åŗåˆ—异åøøę£€ęµ‹[J/OL].č®”ē®—ęœŗē ”ē©¶äøŽå‘展:1-14.
I am interested with this work; will you add this code in the future?

How can I set hyperparameters for training?

Thank you for your excellent work!

However, I've noticed that the models may not converge on some datasets when using the fit() function.

Could you explain how to set the hyperparameters, such as learning rate, epochs, etc.?
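DeepOD's models appear to take training hyperparameters such as `epochs`, `lr` and `batch_size` as constructor keyword arguments; the exact names and defaults should be checked against each model's signature. When a model does not converge, a small manual grid search is one way to explore them. The sketch below only generates the configurations with a hypothetical `iter_configs` helper; the model call in the comment is an assumption and is not executed.

```python
from itertools import product


def iter_configs(grid):
    """Yield one dict per combination of the values in `grid`."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical search space; check the parameter names against the model.
grid = {"epochs": [50, 100], "lr": [1e-3, 1e-4], "batch_size": [64]}

for cfg in iter_configs(grid):
    print(cfg)
    # Assumed usage (not run here):
    # clf = DeepSVDD(**cfg); clf.fit(X_train); scores = clf.decision_function(X_val)
```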

All scores are zero except the last, which is a NaN

To reproduce:

#!/usr/bin/env python3
import numpy
from deepod.models.time_series.dif import DeepIsolationForestTS

arr = numpy.empty(shape=(33, 741))
arr.fill(1.0)

# Also reproduces the issue:
# arr = numpy.random.rand(33, 741)

dif = DeepIsolationForestTS(device=None, seq_len=min(arr.shape[0], 100), max_samples=min(arr.shape[0], 256), hidden_dims=100)
dif.fit(arr)
scores = dif.decision_function(arr)
print(scores)

Output:

$VIRTUAL_ENV/lib/python3.9/site-packages/deepod/models/tabular/dif.py:256: RuntimeWarning: invalid value encountered in divide
  scores = 2 ** (-depth_sum / (len(clf.estimators_) * _average_path_length([clf.max_samples_])))
[00:00<00:00, 1088.64it/s]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. nan]
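The warning points at the isolation-forest normalization. The score has the form s = 2^(-E[h]/c(n)), where E[h] is the mean path length and c(n) is the average path length of an unsuccessful binary-search-tree search over n samples, with c(n) = 0 for n <= 1. Here seq_len equals the number of rows (33), so there is presumably a single sub-sequence and each tree is built on n = 1 sample, which makes the denominator zero: a positive depth gives 2^(-inf) = 0 and a zero depth gives 0/0 = NaN, consistent with the output. The sketch below re-implements c(n) with the standard formula (the same convention as scikit-learn's private `_average_path_length`) purely as an illustration; it is not DeepOD code.

```python
import numpy as np


def average_path_length(n):
    """c(n): expected path length of an unsuccessful BST search on n samples."""
    if n <= 1:
        return np.float64(0.0)
    if n == 2:
        return np.float64(1.0)
    return 2.0 * (np.log(n - 1.0) + np.euler_gamma) - 2.0 * (n - 1.0) / n


def if_score(mean_depth, n):
    """Isolation-forest score 2 ** (-E[h] / c(n)); 0 or NaN appear when c(n) == 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return 2.0 ** (-np.float64(mean_depth) / average_path_length(n))


print(average_path_length(256))  # approximately 10.24
print(if_score(3.0, n=1), if_score(0.0, n=1))  # 0.0 nan
print(if_score(average_path_length(256), n=256))  # 0.5
```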

RuntimeError, when using ICL model

RuntimeError: Trying to create tensor with negative dimension -1: [100, -1]

x_train shape [4320, 1]
x_test shape [4320, 1]

from deepod.models.tabular import DeepSVDD, REPEN, RDP, RCA, GOAD, ICL

modelL = {0: DeepSVDD, 1: REPEN, 2: RDP, 3: RCA, 4: GOAD, 5: ICL}
model_name = ["SVDD", "REPEN", "RDP", "RCA", "GOAD", "ICL"]
for index in range(1, 6):
    model = modelL[index]()
    model.fit(x_train, y=None)
    score = model.decision_function(x_test)

The error is raised when ICL builds its network:

self.enc_f_net = MLPnet(
    n_features=n_features-kernel_size,
    n_hidden=hidden_dims,
    n_output=rep_dim,
    mid_channels=len(self.all_idx),
    batch_norm=True,
    activation=f_act,
    bias=bias,
)

Here n_features=1 and kernel_size=2, so n_features-kernel_size evaluates to -1.
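ICL-style models pair each contiguous window of `kernel_size` features with the remaining `n_features - kernel_size` features, so they need strictly more input features than `kernel_size`; a univariate input of shape (4320, 1) cannot satisfy this with kernel_size=2. The hypothetical helper below enumerates such (window, complement) index pairs and rejects invalid sizes up front. It illustrates the constraint and is not DeepOD's internal index construction.

```python
def feature_windows(n_features, kernel_size):
    """Return (window, complement) index pairs for contiguous feature windows.

    Each window holds `kernel_size` consecutive feature indices; the
    complement holds the remaining n_features - kernel_size indices.
    """
    if kernel_size >= n_features:
        raise ValueError(
            f"kernel_size={kernel_size} must be smaller than n_features={n_features}"
        )
    pairs = []
    for start in range(n_features - kernel_size + 1):
        window = list(range(start, start + kernel_size))
        complement = [i for i in range(n_features) if i not in window]
        pairs.append((window, complement))
    return pairs


print(feature_windows(4, 2))
```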

Project license

Hi everybody,

Could you include a license for this repository so I know if I could use your work in my personal projects?

Thank you for this amazing work!

unable to install correctly (neither "pip install deepod" nor "pip install .")

When I install with pip install deepod, deepod.metrics is not available. There is no deepod.models.tabular either; the models are directly in deepod.models, so the usage examples only run if the imports are changed accordingly.

When I run pip install . from the cloned repository, I cannot get a working installation. At first, I think the installation failed because I was using the latest Python version.
When I set python==3.10, it installs, but import deepod then fails with this error:

>>> import deepod
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/deepod/__init__.py", line 2, in <module>
    from . import core, models, metrics
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/deepod/core/__init__.py", line 4, in <module>
    from .base_model import BaseDeepAD
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/deepod/core/base_model.py", line 17, in <module>
    from ray import tune
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/tune/__init__.py", line 2, in <module>
    from ray.tune.tune import run_experiments, run
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/tune/tune.py", line 27, in <module>
    from ray.air import CheckpointConfig
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/air/__init__.py", line 1, in <module>
    from ray.air.checkpoint import Checkpoint
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/air/checkpoint.py", line 22, in <module>
    from ray.air._internal.remote_storage import (
  File "/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/ray/air/_internal/remote_storage.py", line 8, in <module>
    from pkg_resources import packaging
ImportError: cannot import name 'packaging' from 'pkg_resources' (/home/ntits/micromamba/envs/deepod_env/lib/python3.10/site-packages/pkg_resources/__init__.py)
>>> 
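The last frame shows Ray importing `packaging` from `pkg_resources`, which recent setuptools releases appear to no longer expose; this looks like an incompatibility between older Ray versions and setuptools 70 or later (the cutoff of 70 is an assumption worth checking against the Ray and setuptools changelogs). Commonly suggested workarounds are pinning setuptools below that version or upgrading Ray. The sketch below reads the installed setuptools version with `importlib.metadata`; `setuptools_may_break_ray` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(v):
    """Parse the leading integer of a version string such as '70.0.0'."""
    return int(v.split(".")[0])


def setuptools_may_break_ray(cutoff=70):
    """Return True if the installed setuptools is at or above `cutoff` (assumed)."""
    try:
        return major_version(version("setuptools")) >= cutoff
    except PackageNotFoundError:
        return False


print(setuptools_may_break_ray())
```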
