ageML

ageML is a framework designed to study the temporal performance degradation of machine learning models. The goal of this project is to facilitate the exploration of performance degradation by providing tools that allow users to easily test how their models would evolve over time when trained and tested on different periods of their data.

Disclaimer: This project is still in its early stages, so the code interface might change in the future, and some elements might be hardcoded. However, the idea is to improve it over time, making it more user-friendly.

[Figure: temporal degradation plot of a linear regressor on the avocado sales dataset]

Features

Currently, ageML implements one test to study the "aging" process that machine learning models can experience when in production due to covariate or concept shift.

Temporal Degradation Test

Examines how models perform over time when they are trained on different samples of the same dataset. This test is based on the aging framework developed by Vela et al. (2022).

[Figure: temporal degradation test]
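
To make the idea concrete, here is a minimal sketch of this kind of simulation using only the Python standard library. It is not ageML's implementation: the function names (`fit_line`, `temporal_degradation`, `mean_error_by_age`), the toy straight-line model over the time index, and the synthetic series are all assumptions made for illustration. Each simulation picks a random training window, evaluates on the periods right after it, and then records the error on each later "production" period.

```python
import random
import statistics


def fit_line(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def temporal_degradation(series, n_train, n_test, n_prod, n_simulations, seed=0):
    """For each simulation, return (test MAE, absolute error at each model age)."""
    rng = random.Random(seed)
    span = n_train + n_test + n_prod
    results = []
    for _ in range(n_simulations):
        # Pick a random start so that train, test and production windows all fit.
        start = rng.randint(0, len(series) - span)
        train = list(range(start, start + n_train))
        test = range(start + n_train, start + n_train + n_test)
        prod = range(start + n_train + n_test, start + span)

        # Toy model: a straight line over the time index, fitted on the train window only.
        intercept, slope = fit_line(train, [series[t] for t in train])

        test_mae = statistics.fmean(abs(series[t] - (intercept + slope * t)) for t in test)
        prod_errors = [abs(series[t] - (intercept + slope * t)) for t in prod]
        results.append((test_mae, prod_errors))
    return results


def mean_error_by_age(results):
    """Average the production error across simulations for each model age."""
    n_prod = len(results[0][1])
    return [statistics.fmean(run[1][age] for run in results) for age in range(n_prod)]


# Synthetic series with a slowly changing trend, so a linear fit goes stale over time.
series = [0.01 * t * t for t in range(300)]
results = temporal_degradation(series, n_train=52, n_test=12, n_prod=24, n_simulations=10)
print(mean_error_by_age(results))
```

Averaging the error by model age across many random windows is what turns individual runs into a degradation curve. On this synthetic series the averaged error grows as the model gets older.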

WIP: Continuous Retraining Test

Simulates a fixed-schedule retraining process of a machine learning model in production.

[Figure: continuous retraining test]
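
Since this test is still a work in progress, the following is only a rough sketch of what a fixed-schedule retraining simulation could look like, written with the standard library. The names `fit_line` and `continuous_retraining`, the `retrain_every` parameter, and the toy straight-line model are hypothetical and not part of ageML. The model is refitted on the most recent `n_train` observations every `retrain_every` periods, and the prediction error is recorded at every step.

```python
import statistics


def fit_line(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def continuous_retraining(series, n_train, retrain_every):
    """Walk forward through the series, refitting on a fixed schedule.

    Returns the absolute prediction error for every period after the
    first n_train observations.
    """
    errors = []
    intercept = slope = None
    for t in range(n_train, len(series)):
        periods_in_production = t - n_train
        if periods_in_production % retrain_every == 0:
            # Refit on the most recent n_train observations (all strictly before t).
            window = list(range(t - n_train, t))
            intercept, slope = fit_line(window, [series[i] for i in window])
        errors.append(abs(series[t] - (intercept + slope * t)))
    return errors


# Synthetic series with a slowly changing trend (an assumption for illustration).
series = [0.01 * t * t for t in range(200)]
monthly = continuous_retraining(series, n_train=52, retrain_every=4)
never = continuous_retraining(series, n_train=52, retrain_every=10_000)
print(statistics.fmean(monthly), statistics.fmean(never))
```

Setting `retrain_every` larger than the series length means the model is fitted once and never refreshed, which gives a baseline for judging how much a given schedule helps.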

Installation

The package hasn't been published on PyPI yet, which means you cannot install it via the regular Python channels. Instead, you'll have to clone the repository and install it from your local copy.

git clone https://github.com/santiviquez/ageml.git
cd ageml
pip install .

Quickstart

from ageml import TemporalDegradation
from ageml.datasets import load_avocado_sales
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

data = load_avocado_sales()

experiment = TemporalDegradation(
    timestamp_column_name='inference_time',
    target_column_name='demand',
    n_train_samples=52,
    n_test_samples=12,
    n_prod_samples=24,
    n_simulations=10)

experiment.run(data, model=LinearRegression())

experiment.plot(
    freq='W',
    metric=mean_absolute_error,
    min_test_error=1e7,
    plot_name='Model Ageing Chart: Avocado Sales Prediction - LinearRegression')

results = experiment.get_results(
    freq='W',
    metric=mean_absolute_error,
    min_test_error=1e7)

print(results)

Contributing

Check out the issues page if you want to start building this with me 😊

Author

ageml's Issues

Perform hyperparameter tuning when running the TemporalDegradation test

Motivation: describe the problem to be solved
Right now, in each simulation of the TemporalDegradation test, the model is fitted with its default parameters. It would be nice if each simulation were instead fitted with optimal hyperparameters that are found automatically inside the flow. The aging experiment would then resemble a real-life scenario where only tuned models are analyzed, and we could study how a set of hyperparameters stops being optimal at some point and how performance changes as a result.

Describe the solution you'd like
Use something like optuna to automate the hyperparameter search, possibly optuna.integration.OptunaSearchCV.

The user could provide the search space in the TemporalDegradation.run() method, together with the number of trials to perform.

Example:

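import optuna
from ageml import TemporalDegradation
from sklearn.ensemble import RandomForestRegressor

experiment = TemporalDegradation(
    timestamp_column_name='inference_time',
    target_column_name='demand',
    n_train_samples=52,
    n_test_samples=12,
    n_prod_samples=24,
    n_simulations=10)

# Search space: pass step by keyword, since the third positional argument
# of IntDistribution is `log`, not `step`.
random_forest_params = {
    'n_estimators': optuna.distributions.IntDistribution(100, 400, step=1),
    'max_depth': optuna.distributions.IntDistribution(1, 13),
    'min_samples_split': optuna.distributions.IntDistribution(2, 10)}

# `data` is the DataFrame loaded in the Quickstart. The two tuning arguments
# below are the proposed additions; run() does not accept them yet.
experiment.run(
    data,
    model=RandomForestRegressor(),
    hyperparameter_distributions=random_forest_params,
    n_hyperparameter_tunning_trials=10)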
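
To illustrate where the tuning step would sit inside each simulation, here is a minimal standard-library sketch. It swaps in plain random search as a stand-in for optuna, and everything in it is hypothetical: the toy moving-average model, the function names (`tune_window`, `run_with_tuning`), and the fixed validation size. The point is only the flow: tune on a held-out tail of the training window, refit on the full window, then evaluate on the production periods.

```python
import math
import random
import statistics


def moving_average_forecast(history, window):
    """Toy model: forecast every future point as the mean of the last `window` values."""
    return statistics.fmean(history[-window:])


def mae(y_true, prediction):
    """Mean absolute error of a single constant prediction."""
    return statistics.fmean(abs(y - prediction) for y in y_true)


def tune_window(train, n_val, n_trials, low, high, rng):
    """Random search over `window`, scored on the last n_val points of the training data."""
    fit_part, val_part = train[:-n_val], train[-n_val:]
    best_window, best_score = None, float("inf")
    for _ in range(n_trials):
        window = rng.randint(low, high)
        score = mae(val_part, moving_average_forecast(fit_part, window))
        if score < best_score:
            best_window, best_score = window, score
    return best_window


def run_with_tuning(series, n_train, n_prod, n_simulations, n_trials, seed=0):
    """Run the simulations, tuning the model separately inside each one."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_simulations):
        start = rng.randint(0, len(series) - n_train - n_prod)
        train = series[start:start + n_train]
        prod = series[start + n_train:start + n_train + n_prod]
        # Tune on this simulation's training window only, never on production data.
        window = tune_window(train, n_val=12, n_trials=n_trials, low=1, high=20, rng=rng)
        prediction = moving_average_forecast(train, window)  # refit on the full window
        results.append({"window": window, "prod_mae": mae(prod, prediction)})
    return results


# Synthetic seasonal series with a mild trend (an assumption for illustration).
series = [10 * math.sin(t / 8) + 0.05 * t for t in range(300)]
for run in run_with_tuning(series, n_train=52, n_prod=24, n_simulations=5, n_trials=10):
    print(run)
```

Keeping the search inside the loop means each simulated model gets hyperparameters chosen only from data that was available at its training time, which is the property the proposal is after.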

Additional context
I tried something like this in an old commit; it might work as inspiration, but the code there is heavily hardcoded, so take it with a grain of salt.
