minqi824 / adbench
Official Implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.
License: BSD 2-Clause "Simplified" License
May I ask if you have considered making it into an integrated library in the future? Thanks!
Is there going to be a platform so I can evaluate my method on it?
The requirements.txt file restricts the version of PyOD to 1.0.0, but not any of the other libraries. However, the newest versions of scikit-learn and tensorflow throw errors for some models (LODA and DeepSVDD, for example). You should either restrict scikit-learn and tensorflow to previous versions or use the newest version of PyOD. This makes it very annoying to build a requirements.txt for my own project on top of ADBench. It is related to this issue: yzhao062/pyod#406.
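For anyone hitting the same wall, a sketch of pins that reflect the report above; the exact upper bounds are guesses and would need to be verified:

pyod==1.0.0
scikit-learn<1.2   # upper bound is a guess; newer releases reportedly break LODA
tensorflow<2.10    # upper bound is a guess; newer releases reportedly break DeepSVDD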
Hi,
I found that in the code, DataGenerator.generator() cannot generate data properly. The parameters:
Thank you for your assistance.
Bryan
ELKI, which can easily be invoked from the command line (an example invocation follows below), provides many additional algorithms missing from this benchmark, such as:
In other cases, it may be desirable to compare the performance of different implementations:
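For illustration, an ELKI invocation could look like the line below; the jar name and flags follow ELKI's documented KDDCLIApplication interface, but they are written from memory and should be checked against the current release:

# run LOF on a CSV file via ELKI's command-line application (flags assumed, verify against the docs)
java -jar elki-bundle.jar KDDCLIApplication -dbc.in data.csv -algorithm outlier.lof.LOF -lof.k 20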
Error when downloading the model:
PS D:\PyCharm> git clone https://github.com/Minqi824/ADBench.git
Cloning into 'ADBench'...
remote: Enumerating objects: 1074, done.
remote: Counting objects: 100% (189/189), done.
remote: Compressing objects: 100% (94/94), done.
error: RPC failed; curl 18 HTTP/2 stream 5 was reset
error: 995 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
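Not an ADBench-specific fix, but the usual workarounds for this class of clone failure are to enlarge git's HTTP buffer or to clone shallowly and deepen afterwards:

# generic git workarounds for "RPC failed / early EOF" over HTTP
git config --global http.postBuffer 524288000
git clone --depth 1 https://github.com/Minqi824/ADBench.git
cd ADBench
git fetch --unshallow   # optional: restore the full history once the shallow clone succeeds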
Hello guys!
Super amazing job! Thank you.
I have tried the first examples, but some don't run well. Could you help me, please?
Thank you so much.
CODE:
# customized model on ADBench's datasets
from adbench.run import RunPipeline
from adbench.baseline.Customized.run import Customized
# notice that you should specify the corresponding category of your customized AD algorithm
# for example, here we use Logistic Regression as the customized clf, which belongs to the supervised category
# for your own algorithm, you can achieve the same usage as the other baselines by modifying the fit.py, model.py, and run.py files in adbench/baseline/Customized
pipeline = RunPipeline(suffix='ADBench', parallel='supervise', realistic_synthetic_mode=None, noise_type=None)
results = pipeline.run(clf=Customized)
# customized model on customized dataset
import numpy as np
dataset = {}
dataset['X'] = np.random.randn(1000, 20)
dataset['y'] = np.random.choice([0, 1], 1000)
results = pipeline.run(dataset=dataset, clf=Customized)
print(results)
OUTPUT (REPEATED FOR EACH DATASET):
generating duplicate samples for dataset 39_vertebral...
current noise type: None
{'Samples': 1000, 'Features': 6, 'Anomalies': 138, 'Anomalies Ratio(%)': 13.8}
Error in model fitting. Model:Customized, Error: scikit-learn estimators should always specify their parameters in the signature of their __init__ (no varargs). <class 'adbench.baseline.Customized.model.LR'> with constructor (self, *args, **kwargs) doesn't follow this convention.
Current experiment parameters: ('39_vertebral', 1.0, 2), model: Customized, metrics: {'aucroc': nan, 'aucpr': nan}, fitting time: None, inference time: None
Python 3.10.11
pyod == 1.0.0
Mac M2, macOS Ventura 13
I found that it probably has to do with how the parameters are fed, but I really don't think this could be the solution in this case:
https://stackoverflow.com/questions/40025406/inherit-from-scikit-learns-lassocv-model
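For what it's worth, a hedged sketch of the kind of change the error message asks for: scikit-learn rejects estimators whose __init__ takes *args/**kwargs, so every parameter must be listed explicitly. The class name mirrors the LR in adbench/baseline/Customized/model.py, but the real file may differ:

from sklearn.linear_model import LogisticRegression

# hypothetical replacement for the LR class in adbench/baseline/Customized/model.py
class LR(LogisticRegression):
    # sklearn's clone()/validation machinery inspects this signature, so it must
    # list explicit keyword arguments instead of (self, *args, **kwargs)
    def __init__(self, C=1.0, max_iter=1000, random_state=None):
        super().__init__(C=C, max_iter=max_iter, random_state=random_state)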
Thank you again for your help
Hi,
I am a little new to anomaly detection, but I was curious about the right way to do cross-validation while using ADBench, since the test and train samples are already split by the DataGenerator. An easy way would be to concatenate the test and train datasets and then put them in the CV loop (see the sketch below), but is there a cleaner way?
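For reference, a minimal sketch of the concatenate-then-CV approach, assuming DataGenerator.generator() returns a dict with 'X_train', 'y_train', 'X_test', 'y_test' keys; the constructor arguments and dataset name shown are assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

from adbench.data_generator import DataGenerator

generator = DataGenerator(dataset='39_vertebral')  # dataset name is just an example
data = generator.generator(la=1.0)

# undo the pre-made split, then let the CV loop re-split
X = np.concatenate([data['X_train'], data['X_test']])
y = np.concatenate([data['y_train'], data['y_test']])

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))
print(f'mean AUC-ROC over folds: {np.mean(scores):.3f}')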
Data sets with 50% anomalies are not anomaly detection!
More data sets does not mean more meaningful results, because "garbage in, garbage out".
One of the big problems with current anomaly detection research is that we do not use good data sets to evaluate results; hence everything sometimes works by chance, and little systematic benefit is observable, because the data sets are not properly labeled as anomalies.
I am by now convinced that from most of the commonly used data sets, you cannot draw meaningful conclusions because of unsuitable labeling.
Shall we add a setup.py to ensure that all the dependencies are installed?
I had to install them manually.
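If it helps, a minimal sketch of what such a setup.py could look like; the version number and dependency pins are illustrative assumptions, not the project's actual metadata:

# hypothetical setup.py sketch; name, version, and pins are placeholders
from setuptools import setup, find_packages

setup(
    name='adbench',
    version='0.1.0',
    packages=find_packages(),
    install_requires=[
        'numpy',
        'pandas',
        'scikit-learn',
        'pyod==1.0.0',  # pinned in the project's requirements.txt
    ],
)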
Shall we avoid passing "ratio=sum(self.data['y_test']) / len(self.data['y_test'])"?
Lines 206 to 207 in f3a9e94
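If the concern is just readability, an equivalent form (an assumption about the intent here, and assuming numpy is already imported as np) would be:

ratio = np.mean(self.data['y_test'])  # same value: the mean of a 0/1 array is the anomaly ratio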
Thanks for the great job! I wonder if it's possible to provide the link/source of the dataset so we can know more about them? Thanks a lot.
I am getting errors when running synthetic dependency anomalies for multiple datasets. I found this remark in data_generator.py "# we found that copula function may occur error in some datasets". How did you overcome this issue? The dependency anomalies fail to generate.
ImportError: cannot import name 'DataGenerator' from 'data_generator' (/Users/xxxx/opt/miniconda3/envs/py3.9/lib/python3.9/site-packages/data_generator/__init__.py)
Any suggestion on how to fix it?
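One plausible cause (an assumption, not a confirmed diagnosis): the top-level data_generator package installed from PyPI is an unrelated project, while ADBench's generator ships inside the adbench package, so the import would read:

from adbench.data_generator import DataGenerator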
Great and enormous work!
Do we have a parallel computing setting to process large-scale data? In reality, big-data conditions are more common and more difficult.
This link is broken:
Line 62 in 783cf9f
I don't find any description of the ALOI dataset in the ADBench paper, only a reference link to the paper https://arxiv.org/pdf/1503.01158.pdf, and I can't find the keyword "ALOI" in that paper. Can you give more description of the ALOI dataset?