AutoTiM

Train, store and use machine learning models for (multivariate) time series classification.

About AutoTiM

This Python application was developed as part of the Service-Meister project. The goal of AutoTiM is to train, store and use machine learning models for (multivariate) time series classification. Both training and feature engineering are fully automated. The user simply uploads a previously processed time series dataset via the /store endpoint and can use the /train endpoint to train a set of different model architectures from which the best performing model will be persisted. Finally, the user can test or use the model with the /predict endpoint. (For more information on how to use this service, see the Tutorial)

This project is part of the collaboration between KROHNE Messtechnik GmbH and inovex GmbH within the Service-Meister research project.

Contact: https://www.inovex.de/en/contact-us/

Paper: https://doi.org/10.1007/978-3-031-34107-6_21

GitHub: https://github.com/inovex/AutoTiM

Table of Contents

1. Setup and local execution
2. Service components and architecture
2.1 Architecture overview
2.2 Endpoints
      2.2.1 Store
      2.2.2 Train
      2.2.3 Predict
3. Tutorial
4. Troubleshooting
5. Project Organization

1. Setup and local execution

To run the service locally, use autotim/autotim_execution/docker-compose.yml, which defines all the necessary services.

Change Environment Variables (Optional)

Configuration can be changed in the ./autotim/autotim_execution/.env directory. Each service has a separate .env file. New environment variables can easily be added and existing ones modified. Mandatory variables are marked accordingly.

Some Docker alternatives (e.g. Lima/nerdctl) do not support including .env files. In this case, the variables must be added directly in the docker-compose file under the environment key of the respective service.


If you want to use Google Cloud Platform in local execution for storage, you must store a gcp-service.json key. The template for this can be found at autotim/autotim_execution/.env/gcp-service.json.template.

Setup local execution

  1. Install Docker Compose (or an equivalent).

    Commercial usage of Docker may require a license. (Find a suitable alternative for your OS if you are using it commercially.)

  2. Change directory into ./autotim/autotim_execution.

  3. Build all necessary containers:

    docker-compose build

    If you want to rebuild everything from scratch, use:

    docker-compose build --no-cache

Start local execution

  1. Start all services and volumes:

    docker-compose up 

    This will start three separate Docker containers:

    • AutoTiM service to /store datasets, /train models and /predict on your new data;
    • MLflow service to log models;
    • PostgreSQL database for the model metadata.

    You can access the services under the following links.

    | Service | Address               | Note                                         |
    |---------|-----------------------|----------------------------------------------|
    | AutoTiM | http://localhost:5004 | username=admin, password=password            |
    | MLflow  | http://localhost:5000 | if it does not work, try http://0.0.0.0:5000 |
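
    To check that both services came up, you can request the addresses above from Python (a minimal sketch, assuming the default ports and credentials from the docker-compose setup):

```python
# Quick reachability check for the local services (sketch only; ports and
# credentials are the defaults from the docker-compose setup).
import requests

checks = {
    "AutoTiM": ("http://localhost:5004", ("admin", "password")),
    "MLflow": ("http://localhost:5000", None),
}

for name, (url, auth) in checks.items():
    try:
        status = requests.get(url, auth=auth, timeout=5).status_code
        print(f"{name}: HTTP {status}")
    except requests.exceptions.ConnectionError:
        print(f"{name}: not reachable at {url}")
```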

Reset local execution

Each of the Docker containers has a labeled Docker volume to persist data between sessions. To delete data and reinitialize your environment, execute these commands from the ./autotim/autotim_execution/ directory:

stop all running containers:

docker-compose down

stop all running containers and delete persisted volumes (data and models):

docker-compose down --volumes

(Extra feature) Start local execution with Google Cloud Storage (GCS)

The AutoTiM service is compatible with GCS (specifically Google Cloud Storage buckets) if a suitable infrastructure is available. Change the env variable "STORAGE" in autotim.env from "local" to "GCS" and fill in the other GCP variables in the specified file accordingly. In addition, gcp-service.json (created from gcp-service.json.template) must be set up so that the service is authorized to connect to GCP and thus to the Google bucket. It is crucial that gcp-service.json belongs to a Google service account with the rights to list buckets, create new buckets, and read from and write to them. In this case the execution remains local; only the storage of models and datasets is moved to the Google Cloud Platform (GCP).
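
As a quick sanity check of the key (a minimal sketch, assuming the google-cloud-storage client library is installed and the key is stored next to the template), you can verify that the service account is able to list buckets:

```python
# Sanity check for the GCP service-account key (sketch; requires
# `pip install google-cloud-storage`; adjust the path to where you keep the key).
from google.cloud import storage

client = storage.Client.from_service_account_json(
    "autotim/autotim_execution/.env/gcp-service.json"
)

# The service account must at least be allowed to list buckets; reading,
# writing and creating buckets are also required by the AutoTiM service.
for bucket in client.list_buckets():
    print("found bucket:", bucket.name)
```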

2. Service components and architecture

2.1 Architecture overview

Architecture overview

Our AutoTiM service provides the endpoints /store, /train and /predict. Datasets are stored in the local file system (or GCS); trained models and associated metadata are logged via MLflow, a third-party tool.

2.2 Endpoints

For each of the three endpoints of the AutoTiM-Service this section provides:

  • an architecture diagram, showcasing which components of the overall service are influenced by requests sent to the endpoint;
  • explanations for the parameters and a request example (with curl);
  • the event flow within the endpoint implementation and an explanation of the HTTP responses the endpoint returns.

2.2.1 Store

Store Workflow

The /store-endpoint is used to upload datasets that can later be used for training. This requires specifying the name of the use case for which the dataset is intended and an identifier for the dataset.

Required parameters

use_case_name (str): Name of the experiment / project
dataset_identifier (str): Name of the dataset within your project
file: csv-file containing the dataset to be stored

Example use with curl
curl -i -X POST --user <username>:<password> -F "file=@<local path to the data file>" -F "use_case_name=<use case / experiment name>" -F "dataset_identifier=<dataset name>" <URL to the /store endpoint>
Event Flow & HTTP-Responses

Store Eventflow

2.2.2 Train

Train Workflow

The /train-endpoint is used to train a model. For this, the use case as well as the dataset identifier must be specified. Optionally, the parameters below can be set. The /train-endpoint uses a dataset previously uploaded via /store, performs automatic feature engineering, and automatically trains and tunes a range of models. The best model from this training run is versioned in MLflow and compared to models previously trained on the same dataset. The best of these models is marked with the production flag. Unless another version is explicitly requested, the model with the production flag is used for predictions for the respective use-case and dataset combination.

Required parameters

use_case_name (str): Name of the experiment / project
dataset_identifier (str): Name of the dataset within your project

Optional parameters

column_id: Name of the id column, which assigns each row to a time series (default column name: id)
column_label: Name of the column containing the classification labels (default: label)
column_sort: Name of the column that contains values which allow to sort the time series, e.g. time stamps (default: time)
column_value: Name of the column that contains the actual values of the time series, e.g. sensor data (default: None)
column_kind: Name of the column that indicates the names of the different time series types, e.g. different sensors (default: None)
train_size: Proportion of the dataset to include in the train data when performing the train-test-split (default: 0.6)
recall_average: Averaging strategy used to calculate the recall and precision scores (default: micro; possible values are: micro, macro, samples, weighted, binary or None)
metric: Metric to be used for the model selection (default: accuracy; possible metrics are: accuracy, balanced_accuracy, recall_score, precision_score)
max_features: Maximum number of features used for training (default: 1000)
features_decrement: Decrement step for the number of features when a recursion error occurs. If smaller than 1, this is treated as a fraction; otherwise it is an absolute value (default: 0.9)
max_attempts: Maximum number of attempts for training when failing due to a recursion error (default: 5)
train_time: Time in minutes used for training the model (time for feature engineering is excluded). If not specified, a dynamic training time between 2 and 30 minutes is used (default: dynamic)
evaluation_identifier: Name of the dataset within your project, only used for evaluation (test dataset). If specified, data from dataset_identifier is only used for training, instead of being used for train and test.
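
To illustrate the column parameters above, the following sketch builds a long-format dataset with the default column names (id, time, label) plus hypothetical value and kind columns, as it could be saved to CSV and uploaded via /store:

```python
# Sketch of a long-format time-series dataset using the default column
# names (id, time, label); the "value" and "kind" columns are hypothetical
# examples for column_value / column_kind.
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],                  # assigns each row to a time series
    "time":  [0, 1, 2, 0, 1, 2],                  # sort order within a time series
    "kind":  ["sensor_a"] * 6,                    # name of the time-series type
    "value": [0.1, 0.4, 0.3, 1.2, 1.1, 0.9],      # actual sensor readings
    "label": [0, 0, 0, 1, 1, 1],                  # classification label per series
})
df.to_csv("my_dataset.csv", index=False)          # file to upload via /store
```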

Example use
https://<URL to the AutoTiM-Service>/train?use_case_name=<use case>&dataset_identifier=<dataset name>
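
The same call can be made from Python (a sketch with placeholder URL, credentials and parameter values; the HTTP method is assumed from the URL example above):

```python
# Trigger training via /train (sketch; assumes the endpoint accepts the
# query parameters shown in the example URL above).
import requests

AUTOTIM_URL = "http://localhost:5004"
AUTH = ("admin", "password")

response = requests.get(                  # method assumed from the URL example
    f"{AUTOTIM_URL}/train",
    auth=AUTH,
    params={
        "use_case_name": "my_use_case",
        "dataset_identifier": "my_dataset",
        "metric": "accuracy",             # optional parameters can be added here
    },
)
print(response.status_code, response.text)
```
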
Event Flow & HTTP-Responses

Train Eventflow

2.2.3 Predict

Predict Workflow

The /predict-endpoint returns predictions for one or more given data points for the selected use case. By default, the model with the production flag is used, but a specific model version can be requested if needed.

Required parameters

use_case_name (str): Name of the experiment / project
dataset_identifier (str): Name of the dataset within your project
file: csv-file containing one or more time series instances for the prediction

Optional parameters

model_version (int): Version of the model (as listed in MLflow) to be used for prediction (default: production model)

Example use with curl
curl -i -X POST --user <username>:<password> -F "file=@<local path to the data file>" -F "use_case_name=<use case / experiment name>" -F "dataset_identifier=<dataset name>" <URL to the /predict endpoint>
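
And again from Python (a minimal sketch; file name, URL and credentials are placeholders, and model_version is shown commented out because it is optional):

```python
# Request predictions via /predict (sketch; model_version is optional and
# defaults to the model flagged as "production").
import requests

AUTOTIM_URL = "http://localhost:5004"
AUTH = ("admin", "password")

with open("new_data.csv", "rb") as f:    # hypothetical file with new time series
    response = requests.post(
        f"{AUTOTIM_URL}/predict",
        auth=AUTH,
        files={"file": f},
        data={
            "use_case_name": "my_use_case",
            "dataset_identifier": "my_dataset",
            # "model_version": 3,        # optional: pin a specific model version
        },
    )
print(response.status_code, response.text)
```
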
Event Flow & HTTP-Responses

Predict Eventflow

3. Tutorial

Our tutorial shows you how to use all endpoints to train your first model with this service, using a sample dataset. You can find it in doc/tutorial.

4. Troubleshooting

To run the unit tests, run coverage run -m unittest
(the coverage package must be installed for this)

5. Project Organization

├── AUTHORS.md                 <- List of developers and maintainers.
├── CHANGELOG.md               <- Changelog to keep track of new features and fixes.
├── LICENSE.txt                <- Project license.
├── README.md                  <- The top-level README for developers.
├── doc                        <- Directory for README files.
├── autotim/autotim_execution  <- Directory for local execution.
├── autotim                    <- Actual Python code where the main functionality goes.
├── tests_autotim              <- Unit tests.
├── .pylintrc                  <- Configuration for pylint tests.
├── doc/tutorial               <- Tutorial for using the service.
├── Dockerfile                 <- Dockerfile for AutoTiM service.
└── requirements.txt           <- Configuration of used python packages.

Note

This project uses the package tsfresh for feature engineering and h2o to generate the models. For local execution, PostgreSQL 11 and MLflow containers are used.
