Giter Club home page Giter Club logo

anomaly_detection_using_databricks's Introduction

This demo uses the kaggle CC dataset to show it can be applied for anomaly detection. The specific components used are

-> The DBR ML runtime 
-> DBR managed MLFlow
-> Databricks ML Serving
-> Optimized Pyspark Pandas UDFs for batch scoring.

There are a couple of examples here in this repo (as of June 1 2022).

If you are just getting started with anomaly detection, start from anomaly_detection_using_iforest_dlt folder. In there you will find the following assets. If you are already familiar with anomaly detection and you want to implement a custom anomaly detection model, head over to the next section of this document.

  1. iforest_train_register.py - quickly trains an isolation forest model and registers it to the model registry. Tweak it as you see fit for your use.
  2. json_record_generator.py - shows a way to generate dummy json files should you choose to use the demo data from kaggle.
  3. dlt_task_0_iforest_dlt_udf.py - contains code that wrap the model into a pandas udf for scoring inside DLT
  4. dlt_task_1_anomaly_detector.sql - shows a simplified end to end DLT workflow that reads the json files created in step 2 and scores that with the trained anomaly detection model registered as a udf on step 3.

To read more about DLT and help getting started with DLT - visit the following link

Scoring custom models for anomaly detection using DLT

  1. anomaly_detection_deepsvdd_dbr.py - shows how to implement a custom MLFlow model on databricks for anomaly detection.
  2. json_record_generator.py - shows a way to generate dummy json files should you choose to use the demo data from kaggle.
  3. anomaly_detector_dlt.sql - shows a part of the working dlt pipeline that must be used with the code from the step below
  4. Define_and_register_inference_udf.py- registers and uses the model inside the dlt pipeline

anomaly_detection_using_databricks's People

Contributors

avisoori-databricks avatar sathishgang-db avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.