aai-institute / tfl-training-practical-anomaly-detection

Repository of the TransferLab Practical Anomaly Detection workshop

Home Page: https://transferlab.ai/trainings/practical-anomaly-detection

License: Creative Commons Attribution Share Alike 4.0 International

Dockerfile 0.01% Shell 0.01% Python 0.08% Jupyter Notebook 69.16% CSS 0.08% JavaScript 0.06% HTML 30.58% TeX 0.01%
transferlab anomaly-detection training machine-learning-course ml-course

tfl-training-practical-anomaly-detection's Introduction

Workshop: Practical Anomaly Detection

We've uploaded the full course to YouTube.

Target Audience

Entry to mid-level data scientists / machine learning engineers / ...

Goals

This workshop introduces participants to the topic of anomaly detection. In particular, it provides initial answers to the following questions:

  1. What is an anomaly?
  2. Where can anomaly detection be applied?
  3. What methods are available?
  4. How can I evaluate the performance of an anomaly detection system?
  5. What is Extreme Value Theory (EVT)?
  6. How does EVT contribute to anomaly detection?

Successful participants will acquire the basic theoretical knowledge and the practical skills to perform anomaly detection on simple use cases using state-of-the-art methods. While mostly covering introductory material, the workshop tries to provide enough depth to be interesting for participants with some experience in anomaly detection.

Prerequisites

We try to keep the prerequisites as low as possible. However, some experience with standard machine learning methods and basic knowledge of the Python machine learning stack are highly recommended.

Setup

Instead of setting up the environment yourself, you can use the devcontainer we provide, either locally or inside a GitHub Codespace. To quickly spin up an instance with the training's content and the necessary environment, click the green "Code" button in the top right corner of the repository and select "Codespaces" rather than local development.

If you prefer to work locally, you can set up the environment as follows:

We recommend installing rise with conda (installing it with pip may cause problems). We also use the spellchecker and equation-numbering extensions.

To configure everything, activate a conda env and run

conda install -c conda-forge notebook rise jupyter_contrib_nbextensions jupyter_nbextensions_configurator
conda install -c conda-forge ffmpeg
python ./configure_spellcheck_dict.py
jupyter nbextension enable spellchecker/main
jupyter nbextension enable equation-numbering/main

Use the extension-configurator for customizing your slideshow as described here.

The hide_code extension is useful for viewing the slides in presentation mode. You can install it with the following commands:

pip install hide_code
jupyter nbextension install --py hide_code
jupyter nbextension enable --py hide_code
jupyter serverextension enable --py hide_code

Finally, clone the repository, change into the directory of the cloned repository, and run

pip install -e .

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

CC BY-SA 4.0

tfl-training-practical-anomaly-detection's People

Contributors

fariedabuzaid, mgaja42, mischapanch, turnmanh, xuzzo

tfl-training-practical-anomaly-detection's Issues

General cleanup

This issue gathers some things that popped up after a cursory glance at the repo:

Finally, please always keep in mind that we want to publish the notebooks eventually. For that, they have to be orders of magnitude better than your run-of-the-mill Medium post or Kaggle notebook. This involves not only designing good examples and doing a sound analysis but also telling a story (I am aware that this is mostly done orally during the workshop, but maybe we can move towards a playbook for the notebooks we have?), avoiding typos, being consistent in the code, etc. The more work is done now, the less will remain before publication.

Fix VAE training

VAE training yields a suboptimal model. We expect better performance and are considering the following:

  • increase the batch size while keeping the relatively high learning rate
  • add Dense layer

Fix notebook version

As of notebook version 7.x, the classic Jupyter Notebook is built on JupyterLab. As a result, some dependencies break. RISE is one example: there is a separate package, jupyterlab-rise, which differs from the rise package used here.

We therefore need to decide which notebook version to use and adapt the installation of rise accordingly.
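
One way to make that decision explicit is a small helper that maps the installed notebook major version to the matching slideshow package. This is only a minimal sketch: the function names are hypothetical, and it assumes that rise targets the classic notebook (below 7) while jupyterlab-rise targets notebook 7 and later.

```python
from importlib import metadata
from typing import Optional


def rise_package_for(notebook_version: str) -> str:
    """Return the slideshow package assumed to match a notebook version string."""
    major = int(notebook_version.split(".")[0])
    # Assumption: notebook >= 7 is JupyterLab-based and needs jupyterlab-rise.
    return "jupyterlab-rise" if major >= 7 else "rise"


def installed_notebook_version() -> Optional[str]:
    """Return the installed notebook version, or None if notebook is missing."""
    try:
        return metadata.version("notebook")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_notebook_version()
    if version is None:
        print("notebook is not installed")
    else:
        print(f"notebook {version}: install {rise_package_for(version)}")
```

Pinning the notebook version in the environment file would make this check unnecessary, but the helper can be handy while both setups are still in use.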

Fix animation in EVT Notebook

For some reason, ffmpeg is not available when running the animation if it was installed with pip. We need to find a solution; otherwise the cell does not run on JupyterHub.
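
A quick way to diagnose this before running the animation cell is to check whether an ffmpeg executable is visible on the PATH of the running kernel. The sketch below uses only the standard library; the helper names are hypothetical, and the suggested conda command is the one from the README setup.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the absolute path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_hint(path: Optional[str]) -> str:
    """Turn the lookup result into a short human-readable message."""
    if path is None:
        return "ffmpeg not found on PATH; try: conda install -c conda-forge ffmpeg"
    return f"ffmpeg found at {path}"


print(ffmpeg_hint(find_ffmpeg()))
```

If the lookup returns nothing inside the notebook but succeeds in a terminal, the kernel is probably running in a different environment than the one where ffmpeg was installed.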

Correct VAE KL derivation

The derivation of the ELBO given in the VAE example is the usual derivation where p(x) can be assumed to be constant. Add a few lines that explain the difference in the VAE setup (simultaneous minimization of KL divergence and increasing the likelihood through maximising the lower bound).
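
For reference, the standard decomposition behind this remark can be written as follows. The notation is an assumption and is not taken from the notebook: $q_\phi(z \mid x)$ denotes the encoder, $p_\theta(x \mid z)$ the decoder, and $p(z)$ the prior.

```latex
\begin{align}
\log p_\theta(x)
  &= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right]}_{\mathrm{ELBO}(\theta, \phi;\, x)}
   + \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right), \\
\mathrm{ELBO}(\theta, \phi;\, x)
  &= \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
   - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right).
\end{align}
```

In classical variational inference $\theta$ is fixed, so $\log p_\theta(x)$ is constant and maximising the ELBO over $\phi$ is equivalent to minimising the KL term in the first line. In a VAE, $\theta$ is learned as well, so $\log p_\theta(x)$ is no longer constant: maximising the ELBO over both parameter sets raises a lower bound on the likelihood and tightens that bound at the same time.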

Typo in `README.md` on installing the pkg

In the README.md it says

pip install -r requirements.txt
pip install -e src

However, it should refer not to src but to the repository root (.), as there is no setup.py in src.

Clean requirements

Many of the requirements are simply unused. This slows down environment creation and sometimes traps the resolver in a lengthy compatibility search. We should go through requirements.txt and purge everything that isn't strictly essential.
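
As a starting point for such a cleanup, one could collect the modules that the code actually imports and compare them against requirements.txt by hand. The following is a minimal sketch using the standard ast module; the function name and the example source are hypothetical. Note that import names do not always match distribution names (for example, sklearn is provided by scikit-learn), and notebook code cells would have to be extracted to plain source first.

```python
import ast


def imported_modules(source: str) -> set:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) point into the project itself.
            modules.add(node.module.split(".")[0])
    return modules


# Hypothetical source snippet used only to illustrate the helper.
example = "import numpy as np\nfrom scipy.stats import genpareto\nfrom . import utils\n"
print(sorted(imported_modules(example)))
```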

Publish pages after build

Since we don't have GH enterprise, we can't use GH pages. Either we wait until we do, or we use netlify.

Add dev container

Add a dev container to the repo so that it can easily be run locally and in Codespaces.

Test git LFS

We had issues with indexing being too slow for new repo clones with large LFS stores

  • Test this
  • Move to accsr if required? (new issue, consider bandwidth costs, etc.)

Number notebooks

The notebooks are still unnumbered (we discussed this in the past). How is someone supposed to know the order in which they must be read? This information should be included in a detailed version of the agenda, but we should also use a prefix like 00 - this and that.ipynb
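
The prefix scheme can be generated from an ordered list of notebooks, so that the agenda and the file names stay in sync. This is a small sketch; the function name and the notebook names are hypothetical placeholders.

```python
def numbered_names(ordered_names, width=2):
    """Prefix each notebook name with a zero-padded index, starting at 00."""
    return [f"{i:0{width}d} - {name}" for i, name in enumerate(ordered_names)]


# Hypothetical notebook names, listed in the intended reading order.
agenda = ["intro.ipynb", "evt.ipynb", "time_series.ipynb"]
print(numbered_names(agenda))
```

Zero-padding keeps the files sorted correctly in file browsers once there are more than ten notebooks.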

Check NYC Taxi data

There are two versions of the NYC taxi data, one is used in EVT and the other in Time Series. Check if they can be aligned to read from only one data set and make the appropriate changes in the respective data imports.

Fix KaTeX Parse Error

Currently, we get a parse error when defining new LaTeX commands with the newcommand instruction. Changing these to renewcommand fixes it.

Fix Spellcheck

build_scripts/configure_spellcheck_dict.py fails when downloading the dictionary.
This should be fixed in the thesan template first and then here as well.
