kenchi

This is a scikit-learn compatible library for anomaly detection.

Dependencies

Installation

You can install via pip

pip install kenchi

or conda.

conda install -c y_ohr_n kenchi

Algorithms

  • Outlier detection
    1. FastABOD [1]
    2. LOF [2] (scikit-learn wrapper)
    3. KNN [3, 4]
    4. OneTimeSampling [5]
    5. HBOS [6]
  • Novelty detection
    1. OCSVM [7] (scikit-learn wrapper)
    2. MiniBatchKMeans
    3. IForest [8] (scikit-learn wrapper)
    4. PCA
    5. GMM (scikit-learn wrapper)
    6. KDE [9] (scikit-learn wrapper)
    7. SparseStructureLearning [10]
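
As a rough illustration of what a distance-based detector such as KNN computes, the sketch below scores each point by the distance to its k-th nearest neighbor, the definition used in [4]. It is written in plain Python for clarity and is not kenchi's implementation; the function name `knn_outlier_scores` and the toy data are assumptions made for this example.

```python
import math


def knn_outlier_scores(points, k=2):
    """Return, for each point, the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, sorted in ascending order
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points on a unit square plus one point far away (toy data)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(points, k=2)

# The point with the largest score is the most isolated one
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, scores)
```

A larger score means the point lies farther from its neighbors. This brute-force version takes quadratic time, so it is only suitable for small data sets.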

Examples

import matplotlib.pyplot as plt
import numpy as np
from kenchi.datasets import load_pima
from kenchi.outlier_detection import (
    FastABOD, IForest, KDE, KNN, LOF, MiniBatchKMeans, OCSVM, PCA
)
from kenchi.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

np.random.seed(0)

scaler = StandardScaler()

detectors = [
    FastABOD(novelty=True, n_jobs=-1), OCSVM(),
    MiniBatchKMeans(), LOF(novelty=True, n_jobs=-1),
    KNN(novelty=True, n_jobs=-1), IForest(n_jobs=-1),
    PCA(), KDE()
]

# Load the Pima Indians diabetes dataset.
X, y = load_pima(return_X_y=True)
X_train, X_test, _, y_test = train_test_split(X, y)

# Get the current Axes instance
ax = plt.gca()

for det in detectors:
    # Fit the model according to the given training data
    pipeline = make_pipeline(scaler, det).fit(X_train)

    # Plot the Receiver Operating Characteristic (ROC) curve
    pipeline.plot_roc_curve(X_test, y_test, ax=ax)

# Display the figure
plt.show()
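
The ROC curves plotted above summarize how well each detector's anomaly scores rank anomalous samples above normal ones. As a minimal sketch of the underlying quantity, the area under the ROC curve equals the fraction of (anomalous, normal) pairs in which the anomalous sample receives the higher score. The function name `roc_auc`, the label convention (1 marks an anomaly, higher scores mean more anomalous), and the toy values are assumptions made for this example and do not come from kenchi.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # Count the pairs in which the anomalous sample outranks the normal one
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 0, 1]               # assumed convention: 1 = anomaly
scores = [0.1, 0.4, 0.35, 0.2, 0.9]    # higher = more anomalous
auc = roc_auc(y_true, scores)
print(auc)
```

A value of 1.0 means every anomaly is ranked above every normal sample, and 0.5 corresponds to a random ranking.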

References


  1. Kriegel, H.-P., Schubert, M., and Zimek, A., "Angle-based outlier detection in high-dimensional data," In Proceedings of SIGKDD, pp. 444-452, 2008.

  2. Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J., "LOF: identifying density-based local outliers," In Proceedings of SIGMOD, pp. 93-104, 2000.

  3. Angiulli, F., and Pizzuti, C., "Fast outlier detection in high dimensional spaces," In Proceedings of PKDD, pp. 15-27, 2002.

  4. Ramaswamy, S., Rastogi, R., and Shim, K., "Efficient algorithms for mining outliers from large data sets," In Proceedings of SIGMOD, pp. 427-438, 2000.

  5. Sugiyama, M., and Borgwardt, K., "Rapid distance-based outlier detection via sampling," Advances in NIPS, pp. 467-475, 2013.

  6. Goldstein, M., and Dengel, A., "Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm," KI: Poster and Demo Track, pp. 59-63, 2012.

  7. Scholkopf, B., Platt, J. C., Shawe-Taylor, J. C., Smola, A. J., and Williamson, R. C., "Estimating the Support of a High-Dimensional Distribution," Neural Computation, 13(7), pp. 1443-1471, 2001.

  8. Liu, F. T., Ting, K. M., and Zhou, Z.-H., "Isolation forest," In Proceedings of ICDM, pp. 413-422, 2008.

  9. Parzen, E., "On estimation of a probability density function and mode," Ann. Math. Statist., 33(3), pp. 1065-1076, 1962.

  10. Ide, T., Lozano, C., Abe, N., and Liu, Y., "Proximity-based anomaly detection using sparse structure learning," In Proceedings of SDM, pp. 97-108, 2009.

Contributors

y-ohr-n

kenchi's Issues

Add a fetch_aloi function

References

Kriegel, H.-P., Kroger, P., Schubert, E., and Zimek, A., "Interpreting and unifying outlier scores," In Proceedings of SDM, pp. 13-24, 2011.

Add notebook examples

  • plot the ROC curves
  • compare various methods using 2-dimensional data
  • measure execution time of various methods
  • tune hyperparameters using the Lee-Liu metric
  • use UMAP and HDBSCAN
  • use explanation models

Implement Iterative sampling

References

Wu, M., and Jermaine, C., "Outlier detection by sampling with accuracy guarantees," In Proceedings of SIGKDD, pp. 767-772, 2006.

Add a load_wilt function

References

Goix, N., "How to evaluate the quality of unsupervised anomaly detection algorithms?" In ICML Anomaly Detection Workshop, 2016.

Implement iForest

References

Liu, F. T., Ting, K. M., and Zhou, Z.-H., "Isolation forest," In Proceedings of ICDM, pp. 413-422, 2008.

Investigate SVDD

Tax, D. M. J., and Duin, R. P. W., "Support Vector Data Description," Machine Learning, 54, pp. 45-66, 2004.

Implement LoOP

References

Kriegel, H.-P., Kroger, P., Schubert, E., and Zimek, A., "LoOP: local outlier probabilities," In Proceedings of CIKM, pp. 1649-1652, 2009.

Add a fetch_adult function

References

Goix, N., "How to evaluate the quality of unsupervised anomaly detection algorithms?" In ICML Anomaly Detection Workshop, 2016.

Add a load_annthyroid function

References

Goix, N., "How to evaluate the quality of unsupervised anomaly detection algorithms?" In ICML Anomaly Detection Workshop, 2016.

Implement FBOD

References

Lazarevic, A., and Kumar, V., "Feature bagging for outlier detection," In Proceedings of SIGKDD, pp. 157-166, 2005.

Use Pipenv

Description

matplotlib and networkx are optional.

Implement AKNN

References

Angiulli, F., and Pizzuti, C., "Fast outlier detection in high dimensional spaces," In Proceedings of PKDD, pp. 15-27, 2002.

Add a load_pima function

References

Goix, N., "How to evaluate the quality of unsupervised anomaly detection algorithms?" In ICML Anomaly Detection Workshop, 2016.

Add common tests for outlier detection

Description

  • test_check_estimator
  • test_fit_predict
  • test_fit
  • test_predict
  • test_decision_function
  • test_score_samples
  • test_anomaly_score
  • test_plot_anomaly_score
  • test_plot_roc_curve
  • test_predict_notfitted
  • test_decision_function_notfitted
  • test_score_samples_notfitted
  • test_anomaly_score_notfitted
  • test_plot_anomaly_score_notfitted
  • test_plot_roc_curve_notfitted

Implement HBOS

References

Goldstein, M., and Dengel, A., "Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm," KI: Poster and Demo Track, pp. 59-63, 2012.

Add a fetch_kddcup99 function

References

Kriegel, H.-P., Kroger, P., Schubert, E., and Zimek, A., "Interpreting and unifying outlier scores," In Proceedings of SDM, pp. 13-24, 2011.
