learning-machines-drift's Introduction

Learning Machines

A Python package for monitoring dataset drift in production ML pipelines.

Built to run in any environment without uploading your data to external services.

Background

More background on learning machines.

Getting started

Requirements

  • Python 3.9

Install

To install the latest version, run the following:

pip install -U learning-machines-drift

Example usage

A simple example is shown below:

from learning_machines_drift import Dataset, Display, FileBackend, Monitor, Registry
from learning_machines_drift.datasets import example_dataset

# Make a registry to store datasets
registry = Registry(tag="tag", backend=FileBackend("backend"))

# Save example reference dataset of 100 samples
registry.save_reference_dataset(Dataset(*example_dataset(100, seed=0)))

# Log example dataset with 80 samples
with registry:
    registry.log_dataset(Dataset(*example_dataset(80, seed=1)))

# Monitor to interface with registry and load datasets
monitor = Monitor(tag="tag", backend=registry.backend).load_data()

# Measure drift and display results as a table
Display().table(monitor.metrics.scipy_kolmogorov_smirnov())

Development

Install

For a local copy:

git clone git@github.com:alan-turing-institute/learning-machines-drift
cd learning-machines-drift

To install:

poetry install

To install with dev and docs dependencies:

poetry install --with dev,docs

Tests

Run:

poetry run pytest

pre-commit checks

Run:

poetry run pre-commit run --all-files

To run checks before every commit, install as a pre-commit hook:

poetry run pre-commit install

Other tools

An overview of existing tools and why we have made something different:

What LM does differently

  • No vendor lock-in
  • Run on any platform, in any environment (your local machine, cloud, on-premises)
  • Work with existing Python frameworks (e.g. scikit-learn)
  • Open source

learning-machines-drift's People

Contributors

sgreenbury, myyong, jsteyn, oscartgiles


learning-machines-drift's Issues

Update SDMetrics

Current dependencies are here.

I am creating this issue because SDMetrics has been updated significantly from 0.6 to 0.8.

The plan is:

  • Create new branch my/update-sdv
  • Delete poetry.lock
  • Update pyproject.toml: change sdmetrics = "^0.6.0" to sdmetrics = "0.8.0"
  • Run poetry update

I see that the poetry.lock file has been updated and sdmetrics is now at version 0.8.

  • Run pytest.

Tests run fine. Simple example works.

Add a class which filters logged data

We have the following classes:

  • Monitor: Logs data and saves to datastore
  • Registry: Loads all the logged data and then passes to HypothesisTest which applies drift metrics.

It would be good to have a new class that sits between Registry and HypothesisTest and allows you to query and filter data in the Registry; a rough sketch is included after the examples below.

Examples might include:

  • Filter by the date the data was logged
  • Filter by the feature value

You may also want to do split-apply-combine operations, for example:

  • Split the data into weeks and compare each week to the reference dataset.
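
A minimal sketch of what such a class could look like, assuming the logged data is available as a pandas DataFrame; the Filter name, the predicate-based API and the split_by_week helper below are hypothetical and not part of the current package:

from dataclasses import dataclass
from typing import Callable, Dict

import pandas as pd


@dataclass
class Filter:
    """Hypothetical class sitting between Registry and HypothesisTest."""

    registered: pd.DataFrame  # all logged data loaded from the Registry

    def by_date(self, start: str, end: str, column: str = "logged_at") -> "Filter":
        """Keep rows logged within [start, end)."""
        mask = (self.registered[column] >= start) & (self.registered[column] < end)
        return Filter(self.registered[mask])

    def by_value(self, predicate: Callable[[pd.DataFrame], pd.Series]) -> "Filter":
        """Keep rows where a feature-value predicate holds."""
        return Filter(self.registered[predicate(self.registered)])

    def split_by_week(self, column: str = "logged_at") -> Dict[pd.Period, pd.DataFrame]:
        """Split-apply-combine: one DataFrame per calendar week of logging."""
        weeks = pd.to_datetime(self.registered[column]).dt.to_period("W")
        return {week: group for week, group in self.registered.groupby(weeks)}

Each weekly split could then be compared to the reference dataset with the existing drift metrics.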

Publish the package on PyPI

  • Identify requirements for publishing the package

Initial thoughts:

  • Complete missing docstrings ("TODO PEP257"). Do we have a docstring style we're using?
  • Formal documentation for API (e.g. sphinx)
  • New repo to remove data from previous commits
  • Distinct examples for demonstrating the package

Measuring drift in temporal/longitudinal datasets

SDMetrics implements a GMM for probabilistic modelling of continuous static data. Held-out data (registered/synthetic/etc.) can then be scored with a likelihood given the fit on the reference dataset.
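
As a rough sketch of the idea (using scikit-learn's GaussianMixture rather than the SDMetrics implementation, and synthetic data in place of a real reference dataset):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=(500, 3))   # reference dataset
registered = rng.normal(loc=0.5, scale=1.2, size=(200, 3))  # held-out / logged data

# Fit the probabilistic model on the reference data only
gmm = GaussianMixture(n_components=3, random_state=0).fit(reference)

# Mean log-likelihood of each dataset under the reference model;
# a large drop for the registered data suggests drift
print("reference :", gmm.score(reference))
print("registered:", gmm.score(registered))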

A similar approach could be considered for temporal data with probabilistic models such as HMM.

SDMetrics also implements some time series metrics.

Tool for generating tabular/graphical drift outputs from passed data

Aim: to convert the prototyping work from the cases into a flexible drift-report-generating script.

Make a script that parses an input config and CSV and outputs HTML/PDF tables and graphs reporting drift trends from the data (a minimal sketch of the iteration step follows the list below).

  • Input interface (csv, flags) and config file (e.g. toml)
  • Code for iterating through the data with some trend schema (e.g. year by year) to generate output measures
  • Identify what the report outputs should exactly look like (e.g. tables of test statistic values, graphs of p-values)

Packages to consider using: unitreport
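
A minimal sketch of the iteration step, assuming a CSV with a year column and a two-sample KS test of each year against a chosen reference year; the column names, flags and output format are placeholders rather than a settled design:

import argparse

import pandas as pd
from scipy.stats import ks_2samp


def main() -> None:
    parser = argparse.ArgumentParser(description="Report drift trends from a CSV.")
    parser.add_argument("csv", help="Input data with a 'year' column")
    parser.add_argument("--reference-year", type=int, required=True)
    parser.add_argument("--feature", required=True, help="Column to test for drift")
    args = parser.parse_args()

    df = pd.read_csv(args.csv)
    reference = df.loc[df["year"] == args.reference_year, args.feature]

    # Iterate year by year and compare each year's values to the reference year
    rows = []
    for year, group in df.groupby("year"):
        statistic, p_value = ks_2samp(reference, group[args.feature])
        rows.append({"year": year, "ks_statistic": statistic, "p_value": p_value})

    print(pd.DataFrame(rows).to_string(index=False))


if __name__ == "__main__":
    main()

The plain-text table here stands in for the eventual HTML/PDF report output (e.g. via unitreport).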

Fix python dependency to include `python=3.9`

Currently the dependency is specified as:

[tool.poetry.dependencies]
python = ">3.9,<3.10"

Installation is currently failing; the constraint should be updated to include python=3.9:

[tool.poetry.dependencies]
python = ">=3.9,<3.10"

Sketch out API for drift package

API design

This is a very rough first sketch of what the LM drift detection might look like.

The key feature of the drift detection library is that it will compare datasets seen in production to a reference dataset (normally the dataset used to train the production model).

There are a few things to keep in mind:

  • We need to support logging a single row of data at a time. In the Alzheimer's dataset, only one patient will be logged at a time.
  • We will need to persist data to storage. In the demo this will be the filesystem, but we could support databases. As such we need some level of abstraction over backends (a rough sketch follows).
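
One way to sketch that abstraction, assuming datasets are handled as pandas DataFrames; a FileBackend exists in the package, but the interface and signatures below are illustrative rather than the actual API:

from abc import ABC, abstractmethod
from pathlib import Path

import pandas as pd


class Backend(ABC):
    """Abstract storage backend; concrete classes could target disk or a database."""

    @abstractmethod
    def save(self, tag: str, name: str, data: pd.DataFrame) -> None:
        ...

    @abstractmethod
    def load(self, tag: str, name: str) -> pd.DataFrame:
        ...


class FileBackend(Backend):
    """Filesystem backend used in the demo."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def save(self, tag: str, name: str, data: pd.DataFrame) -> None:
        path = self.root / tag
        path.mkdir(parents=True, exist_ok=True)
        data.to_csv(path / f"{name}.csv", index=False)

    def load(self, tag: str, name: str) -> pd.DataFrame:
        return pd.read_csv(self.root / tag / f"{name}.csv")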

An example of using the package

A simple API design based on whylogs might look like this. This first example assumes we have a trained classifier we are using in production.

with DriftDetector(tag="test_tag", expect_features = True, expect_labels = True, expect_latent = False) as detector:

    # Normally X and Y would come from a model fit. Here we use a sample dataset
    X, Y = datasets.logistic_model()

    detector.log_features(X)
    detector.log_labels(Y)

Let's assume this stores the features and labels in some persistent storage (e.g. disk, database). The tag argument to DriftDetector is a unique tag for the model being monitored. The expect_{features | labels | latent} arguments are optional; if one is set to True, an exception will be raised if the corresponding features, labels or latent variables haven't been logged by the time the context manager exits.

We must register a reference dataset to compare drift against. This is normally the dataset used to train the model. This can be registered before model inference, in which case we could extend our output above to provide information on drift at model inference time:

register_reference(tag="prod_model", features=X_ref, labels=Y_ref)

with DriftDetector(tag="prod_model", expect_features = True, expect_labels = True, expect_latent = False) as detector:

    # Normally X and Y would come from a model fit. Here we use a sample dataset
    X, Y = datasets.logistic_model()

    detector.log_features(X)
    detector.log_labels(Y)
   
    # Write drift detection metrics to stdout
    detector.summary()

or we could register it afterwards:

register_reference(tag="prod_model", features=X_ref, labels=Y_ref)

with DriftDetector(tag="prod_model",) as detector:
     # Write drift detection metrics to stdout
    detector.summary()

Summary

We can load the data associated with a tag, ensuring the reference dataset is loaded.

with DriftDetector(tag="prod_model", expect_reference = True) as detector:
     # Write drift detection metrics to stdout
    detector.summary()

Rather than just writing to stdout, we probably want to return the results in a data structure (e.g. a dictionary or data class).
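
As a rough sketch, the return type of summary() could be a mapping from feature name to a small result dataclass; the names here are illustrative only:

from dataclasses import dataclass
from typing import Dict


@dataclass
class DriftResult:
    """Result of one hypothesis test for one feature."""

    statistic: float
    p_value: float


# summary() could then return, e.g.,
# {"age": DriftResult(statistic=0.12, p_value=0.34), ...}
DriftSummary = Dict[str, DriftResult]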

Make initial visualisation class

Make a class to plot the scores from a returned hypothesis tests dict.

For example, the function could look like:

def visualise(dict_of_variables: dict[str, dict[str, float]]) -> plt.Axes:
    ...

The plot should have the keys of the dict on the x-axis and the score values from each key's inner dict on the y-axis.
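
A minimal sketch of such a function, assuming the outer dict maps variable names to per-test scores and plotting one bar per variable for a chosen score key (the score argument is an assumption):

import matplotlib.pyplot as plt


def visualise(
    dict_of_variables: dict[str, dict[str, float]], score: str = "statistic"
) -> plt.Axes:
    """Bar plot: variable names on the x-axis, the chosen score on the y-axis."""
    names = list(dict_of_variables)
    values = [dict_of_variables[name][score] for name in names]

    fig, ax = plt.subplots()
    ax.bar(names, values)
    ax.set_xlabel("variable")
    ax.set_ylabel(score)
    return ax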
