A Python package for monitoring dataset drift in production ML pipelines.
Built to run in any environment without uploading your data to external services.
See the documentation for more background on Learning Machines.
- Python 3.9
To install the latest version, run:

```shell
pip install -U learning-machines-drift
```
A simple example:
```python
from learning_machines_drift import Dataset, Display, FileBackend, Monitor, Registry
from learning_machines_drift.datasets import example_dataset

# Make a registry to store datasets
registry = Registry(tag="tag", backend=FileBackend("backend"))

# Save example reference dataset of 100 samples
registry.save_reference_dataset(Dataset(*example_dataset(100, seed=0)))

# Log example dataset with 80 samples
with registry:
    registry.log_dataset(Dataset(*example_dataset(80, seed=1)))

# Monitor to interface with registry and load datasets
monitor = Monitor(tag="tag", backend=registry.backend).load_data()

# Measure drift and display results as a table
Display().table(monitor.metrics.scipy_kolmogorov_smirnov())
```
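The metric name suggests the drift measure is based on SciPy's two-sample Kolmogorov–Smirnov test. As a standalone illustration of what that test reports (using NumPy and SciPy directly, not this package's API), here is a comparison of a reference sample against a mean-shifted one:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)  # reference feature
drifted = rng.normal(loc=0.5, scale=1.0, size=1000)    # mean-shifted feature

# ks_2samp returns the KS statistic (max distance between the two
# empirical CDFs) and a p-value; a small p-value indicates the samples
# likely come from different distributions, i.e. drift
statistic, p_value = ks_2samp(reference, drifted)
print(f"KS statistic={statistic:.3f}, p-value={p_value:.2e}")
```

With a 0.5 shift in the mean and 1000 samples per side, the test detects drift with a very small p-value.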
For a local copy:

```shell
git clone git@github.com:alan-turing-institute/learning-machines-drift
cd learning-machines-drift
```
To install:

```shell
poetry install
```
To install with dev and docs dependencies:

```shell
poetry install --with dev,docs
```
To run the tests:

```shell
poetry run pytest
```
To run pre-commit checks on all files:

```shell
poetry run pre-commit run --all-files
```
To run checks before every commit, install as a pre-commit hook:

```shell
poetry run pre-commit install
```
An overview of what else exists and why we have made something different. Existing tools are typically:

- Cloud based
- Python
- ML pipelines: end-to-end machine learning lifecycle

In contrast, learning-machines-drift offers:

- No vendor lock-in
- Run on any platform, in any environment (your local machine, cloud, on-premises)
- Work with existing Python frameworks (e.g. scikit-learn)
- Open source