Repository of the TransferLab Practical Anomaly Detection workshop

Home Page: https://transferlab.ai/trainings/practical-anomaly-detection

License: Creative Commons Attribution Share Alike 4.0 International

Dockerfile 0.01% Shell 0.01% Python 0.08% Jupyter Notebook 69.16% CSS 0.08% JavaScript 0.06% HTML 30.58% TeX 0.01%

transferlab anomaly-detection training machine-learning-course ml-course

tfl-training-practical-anomaly-detection's Introduction

Workshop: Practical Anomaly Detection

We've uploaded the full course on .

Target Audience

Entry to mid-level data scientists / machine learning engineers / ...

Goals

This workshop introduces participants to the topic of anomaly detection. In particular, it provides initial answers to the following questions:

What is an anomaly?
Where can anomaly detection be applied?
What methods are available?
How can I evaluate the performance of an anomaly detection system?
What is Extreme Value Theory (EVT)?
How does EVT contribute to anomaly detection?
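As a rough illustration of how EVT can feed into anomaly detection, here is a minimal peaks-over-threshold (POT) sketch. It is not taken from the workshop materials. Scores above an initial high quantile t are treated as excesses, a generalized Pareto distribution (GPD) is fitted to them, and the fitted tail is inverted to find a score z_q that is exceeded only with a small probability q. The function name pot_threshold, the defaults for q and init_quantile, and the method-of-moments fit (a simple stand-in for the maximum-likelihood fit that is more commonly used) are assumptions made for illustration.

```python
import math
import random


def pot_threshold(scores, q=1e-3, init_quantile=0.98):
    """Hypothetical helper: POT anomaly threshold from a GPD tail fit.

    Fits a generalized Pareto distribution to the excesses over an initial
    threshold t (method of moments, an assumption) and returns the score z_q
    whose estimated exceedance probability is q.
    """
    data = sorted(scores)
    n = len(data)
    # Initial threshold t: an empirical high quantile of the scores.
    t = data[int(init_quantile * n)]
    excesses = [x - t for x in data if x > t]
    n_t = len(excesses)
    if n_t < 2:
        raise ValueError("need at least two excesses over the initial threshold")
    # Method-of-moments estimates of the GPD shape (xi) and scale (sigma);
    # they are only valid when xi < 1/2, i.e. the excesses have finite variance.
    mean = sum(excesses) / n_t
    var = sum((y - mean) ** 2 for y in excesses) / (n_t - 1)
    xi = 0.5 * (1.0 - mean**2 / var)
    sigma = 0.5 * mean * (1.0 + mean**2 / var)
    # Solve (n_t / n) * (1 + xi * (z - t) / sigma) ** (-1 / xi) = q for z.
    r = q * n / n_t
    if abs(xi) < 1e-9:
        # Limit xi -> 0: exponential tail.
        return t - sigma * math.log(r)
    return t + sigma / xi * (r ** (-xi) - 1.0)


# Usage: exponentially distributed "normal" scores; flag anything above z.
rng = random.Random(0)
scores = [rng.expovariate(1.0) for _ in range(10_000)]
z = pot_threshold(scores, q=1e-3)
anomalies = [s for s in scores if s > z]
print(f"threshold={z:.2f}, flagged={len(anomalies)}")
```

For Exp(1) scores the true 1 - q quantile at q = 1e-3 is ln(1000), roughly 6.9, so the estimate should land in that neighbourhood. The point of the EVT step is that z_q can be extrapolated beyond the range where the empirical quantile alone would be reliable.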

Successful participants will acquire the basic theoretical knowledge and the practical skills to perform anomaly detection on simple use cases using state-of-the-art methods. While mostly covering introductory material, the workshop tries to provide enough depth to be interesting for participants with some experience in anomaly detection.

Prerequisites

We try to keep the prerequisites as low as possible. However, some experience with standard machine learning methods and basic knowledge of the Python machine learning stack are highly recommended.

Setup

Besides setting up the environment yourself, you can use the devcontainer we provide, either locally or inside a GitHub Codespace. To quickly spin up an instance containing the training's content and the necessary environment, click the green "Code" button in the top right corner of the repository and select "Codespaces" instead of local development.

If you prefer to work locally, you can set up the environment as follows:

We recommend installing RISE with conda (installing it with pip may cause problems). We also use the spellchecker and equation-numbering extensions.

To configure everything, activate a conda env and run

conda install -c conda-forge notebook rise jupyter_contrib_nbextensions jupyter_nbextensions_configurator
conda install -c conda-forge ffmpeg
python ./configure_spellcheck_dict.py
jupyter nbextension enable spellchecker/main
jupyter nbextension enable equation-numbering/main

Use the extension-configurator for customizing your slideshow as described here.

The hide_code extension is useful for viewing the slides in presentation mode. You can install it by running the following commands:

pip install hide_code
jupyter nbextension install --py hide_code
jupyter nbextension enable --py hide_code
jupyter serverextension enable --py hide_code

Finally, clone the repository, change into its directory and run

pip install -e .

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

tfl-training-practical-anomaly-detection's Issues

Issues with the reconstruction error chapter in JupyterBook

The chapter uses a huge amount of memory and is unusably slow on my laptop.
The HTML rendered by Sphinx looks fine, however.
=> Low priority for now

Add a license file

stating that redistribution is forbidden, etc. (as discussed)

General cleanup

This issue gathers some things that popped up after a cursory glance at the repo:

Finally, please always keep in mind that we want to publish the notebooks eventually. For that, they have to be orders of magnitude better than your run-of-the-mill Medium post or Kaggle notebook. This involves not only designing good examples and doing a sound analysis but also telling a story (I am aware that this is mostly done orally during the workshop, but maybe we can move a bit toward a playbook for the notebooks we have?), avoiding typos, keeping the code consistent, etc. The more work is done now, the less will remain before publication.

Correct spelling mistakes

Notebook on EVT does not contain aai styling

The notebook is missing the presentation magic.

Fix VAE training

VAE training yields a suboptimal model. We expect better performance and are considering the following:

increase the batch size while keeping the relatively high learning rate
add Dense layer

Ignore `.devcontainer`

Fix notebook version

With notebook version 7.x, the classic Jupyter Notebook interface has been replaced by one built on JupyterLab. As a result, some dependencies will break. One example is RISE: there is a separate jupyterlab-rise package for this setup, which differs from the rise package we currently use.

Therefore, decide which notebook version to use and adapt the installation of RISE accordingly.

Fix animation in EVT Notebook

Somehow ffmpeg is not available when the animation runs if it was installed with pip. A solution should be found; otherwise the cell doesn't run on the JupyterHub (jhub).

Correct VAE KL derivation

The ELBO derivation given in the VAE example is the usual one, in which p(x) can be assumed to be constant. Add a few lines explaining the difference in the VAE setup (simultaneously minimizing the KL divergence and increasing the likelihood by maximizing the lower bound).
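The standard identity behind this point can be written out as follows. The notation (encoder q_φ(z|x), decoder p_θ(x|z), prior p(z), ELBO written as 𝓛(θ, φ; x)) is the common VAE convention and an assumption here, not copied from the notebook:

```latex
\begin{aligned}
\log p_\theta(x)
  &= \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{p_\theta(x, z)}{p_\theta(z \mid x)}\right] \\
  &= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right]}_{\mathcal{L}(\theta, \phi; x)}
   + \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right), \\
\mathcal{L}(\theta, \phi; x)
  &= \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
   - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right).
\end{aligned}
```

Since the KL term is non-negative, 𝓛 is a lower bound on log p_θ(x). In classical variational inference θ is fixed, so log p_θ(x) is constant and maximizing 𝓛 over φ only minimizes the KL divergence to the true posterior. In a VAE, θ is trained as well, so maximizing 𝓛 jointly over (θ, φ) both tightens the bound and raises the likelihood.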

Remove leftovers from the template

E.g. in docs/getting-started.rst and docs/index.rst. This looks unprofessional.

Typo in `README.md` on installing the pkg

In the README.md it says

pip install -r requirements.txt
pip install -e src

However, it should refer not to src but to the current directory (.), as there is no setup.py in src.

Clean requirements

There are a lot of requirements that are simply unused. They slow down environment creation and sometimes get it stuck in dependency-resolution hell. We should go through requirements.txt and purge everything that isn't strictly essential.

Publish pages after build

Since we don't have GitHub Enterprise, we can't use GitHub Pages. Either we wait until we do, or we use Netlify.

Add dev container

Add a dev container to the repo so that it can easily be run locally and in Codespaces.

Test git LFS

We had issues with indexing being too slow for new repo clones with large LFS stores

Test this
Move to accsr if required? (new issue, consider bandwidth costs, etc.)

Delete unused branches

There are a lot of branches hanging around; most of them can likely be deleted.

Number notebooks

The notebooks are still unnumbered (we discussed this in the past). How is someone supposed to know the order in which to read them? This information should be included in a detailed version of the agenda, but we should also use a file-name prefix like "00 - this and that.ipynb".

TypeError in last ex. of notebook 2

The following error occurs when running the last exercise of notebook 2:

TypeError: Could not convert string 'kde' to numeric

Fix:

Add refs. or remove the link.

aai-institute / tfl-training-practical-anomaly-detection