
temporai's Introduction


TemporAI

βš—οΈ Status: This project is still in alpha, and the API may change without warning.

📃 Overview

TemporAI is a Machine Learning-centric time-series library for medicine. Its current areas of focus are: time-to-event (survival) analysis with time-series data, treatment effects (causal inference) over time, and time-series prediction. Data preprocessing methods, including missing value imputation for static and temporal covariates, are provided. AutoML tools for hyperparameter tuning and pipeline selection are also available.

How is TemporAI unique?

  • πŸ₯ Medicine-first: Focused on use cases for medicine and healthcare, such as temporal treatment effects, survival analysis over time, imputation methods, models with built-in and post-hoc interpretability, ... See methods.
  • πŸ—οΈ Fast prototyping: A plugin design allowing for on-the-fly integration of new methods by the users.
  • πŸš€ From research to practice: Relevant novel models from research community adapted for practical use.
  • 🌍 A healthcare ecosystem vision: A range of interactive demonstration apps, new medical problem settings, interpretability tools, data-centric tools etc. are planned.

Key concepts

(Key concepts diagram)

🚀 Installation

Install with pip

From the Python Package Index (PyPI):

$ pip install temporai

Or from source:

$ git clone https://github.com/vanderschaarlab/temporai.git
$ cd temporai
$ pip install .

Install in a conda environment

While we have not yet published TemporAI on conda-forge, you can still install TemporAI in your conda environment using pip, as follows:

Create and activate conda environment as normal:

$ conda create -n <my_environment>
$ conda activate <my_environment>

Then install inside your conda environment with pip:

$ pip install temporai

💥 Sample Usage


List the available plugins
from tempor import plugin_loader

print(plugin_loader.list())
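
The call above returns a nested dict, mapping plugin categories to the plugin names within them; pretty-printing it makes the nesting easier to scan (a stdlib-only convenience sketch, not a TemporAI API):

from pprint import pprint

from tempor import plugin_loader

# Render the category -> plugin-names mapping with indentation:
pprint(plugin_loader.list())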
Use a time-to-event (survival) analysis model
from tempor import plugin_loader

# Load a time-to-event dataset:
dataset = plugin_loader.get("time_to_event.pbc", plugin_type="datasource").load()

# Initialize the model:
model = plugin_loader.get("time_to_event.dynamic_deephit")

# Train:
model.fit(dataset)

# Make risk predictions:
prediction = model.predict(dataset, horizons=[0.25, 0.50, 0.75])
Use a temporal treatment effects model
import numpy as np

from tempor import plugin_loader

# Load a dataset with temporal treatments and outcomes:
dataset = plugin_loader.get(
    "treatments.temporal.dummy_treatments",
    plugin_type="datasource",
    temporal_covariates_missing_prob=0.0,
    temporal_treatments_n_features=1,
    temporal_treatments_n_categories=2,
).load()

# Initialize the model:
model = plugin_loader.get("treatments.temporal.regression.crn_regressor", epochs=20)

# Train:
model.fit(dataset)

# Define target variable horizons for each sample:
horizons = [
    tc.time_indexes()[0][len(tc.time_indexes()[0]) // 2 :] for tc in dataset.time_series
]

# Define treatment scenarios for each sample:
treatment_scenarios = [
    [np.asarray([1] * len(h)), np.asarray([0] * len(h))] for h in horizons
]

# Predict counterfactuals:
counterfactuals = model.predict_counterfactuals(
    dataset,
    horizons=horizons,
    treatment_scenarios=treatment_scenarios,
)
Use a missing data imputer
from tempor import plugin_loader

dataset = plugin_loader.get(
    "prediction.one_off.sine", plugin_type="datasource", with_missing=True
).load()
static_data_n_missing = dataset.static.dataframe().isna().sum().sum()
temporal_data_n_missing = dataset.time_series.dataframe().isna().sum().sum()

print(static_data_n_missing, temporal_data_n_missing)
assert static_data_n_missing > 0
assert temporal_data_n_missing > 0

# Initialize the model:
model = plugin_loader.get("preprocessing.imputation.temporal.bfill")

# Train:
model.fit(dataset)

# Impute:
imputed = model.transform(dataset)
temporal_data_n_missing = imputed.time_series.dataframe().isna().sum().sum()

print(static_data_n_missing, temporal_data_n_missing)
assert temporal_data_n_missing == 0
Use a one-off classifier (prediction)
from tempor import plugin_loader

dataset = plugin_loader.get("prediction.one_off.sine", plugin_type="datasource").load()

# Initialize the model:
model = plugin_loader.get("prediction.one_off.classification.nn_classifier", n_iter=50)

# Train:
model.fit(dataset)

# Predict:
prediction = model.predict(dataset)
Use a temporal regressor (forecasting)
from tempor import plugin_loader

# Load a dataset with temporal targets.
dataset = plugin_loader.get(
    "prediction.temporal.dummy_prediction",
    plugin_type="datasource",
    temporal_covariates_missing_prob=0.0,
).load()

# Initialize the model:
model = plugin_loader.get("prediction.temporal.regression.seq2seq_regressor", epochs=10)

# Train:
model.fit(dataset)

# Predict:
prediction = model.predict(dataset, n_future_steps=5)
Benchmark models, time-to-event task
from tempor.benchmarks import benchmark_models
from tempor import plugin_loader
from tempor.methods.pipeline import pipeline

testcases = [
    (
        "pipeline1",
        pipeline(
            [
                "preprocessing.scaling.temporal.ts_minmax_scaler",
                "time_to_event.dynamic_deephit",
            ]
        )({"ts_coxph": {"n_iter": 100}}),
    ),
    (
        "plugin1",
        plugin_loader.get("time_to_event.dynamic_deephit", n_iter=100),
    ),
    (
        "plugin2",
        plugin_loader.get("time_to_event.ts_coxph", n_iter=100),
    ),
]
dataset = plugin_loader.get("time_to_event.pbc", plugin_type="datasource").load()

aggr_score, per_test_score = benchmark_models(
    task_type="time_to_event",
    tests=testcases,
    data=dataset,
    n_splits=2,
    random_state=0,
    horizons=[2.0, 4.0, 6.0],
)

print(aggr_score)
Serialization
from tempor.utils.serialization import load, save
from tempor import plugin_loader

# Initialize the model:
model = plugin_loader.get("prediction.one_off.classification.nn_classifier", n_iter=50)

buff = save(model)  # Save model to bytes.
reloaded = load(buff)  # Reload model.

# `save_to_file`, `load_from_file` also available in the serialization module.
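
A file-based sketch using those helpers (the (path, object) argument order is an assumption; check the serialization module for the exact signatures):

from tempor.utils.serialization import load_from_file, save_to_file

save_to_file("./saved_model.pkl", model)  # Save model to a file (argument order assumed).
reloaded = load_from_file("./saved_model.pkl")  # Reload the model from the file.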
AutoML - search for the best pipeline for your task
from tempor.automl.seeker import PipelineSeeker

dataset = plugin_loader.get("prediction.one_off.sine", plugin_type="datasource").load()

# Specify the AutoML pipeline seeker for the task of your choice, providing candidate methods,
# metric, preprocessing steps etc.
seeker = PipelineSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    return_top_k=3,
    num_iter=100,
    tuner_type="bayesian",
    static_imputers=["static_tabular_imputer"],
    static_scalers=[],
    temporal_imputers=["ffill", "bfill"],
    temporal_scalers=["ts_minmax_scaler"],
)

# The search will return the best pipelines.
best_pipelines, best_scores = seeker.search()  # doctest: +SKIP
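
The returned pipelines can then be used like any other TemporAI method; a brief sketch (assuming the returned objects follow the usual fit/predict interface):

# Train the top-ranked pipeline and predict, as with any other method:
best_pipeline = best_pipelines[0]
best_pipeline.fit(dataset)
prediction = best_pipeline.predict(dataset)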

📖 Tutorials

Data

User Guide

Extending TemporAI

📘 Documentation

See the full project documentation here.

Note on documentation versions:

  • If you have installed TemporAI from PyPI, you should refer to the stable documentation.
  • If you have installed TemporAI from source, you should refer to the latest documentation.

See the Install with pip section for reference.

🌍 TemporAI Ecosystem (Experimental)

We provide additional tools in the TemporAI ecosystem, which are in active development, and are currently (very) experimental. Suggestions and contributions are welcome!

These include:

🔑 Methods


Time-to-Event (survival) analysis over time

Risk estimation given event data (category: time_to_event)

  • dynamic_deephit: Dynamic-DeepHit incorporates the available longitudinal data, comprising various repeated measurements (rather than only the last available measurements), in order to issue dynamically updated survival predictions. (Paper)
  • ts_coxph: Create embeddings from the time series and use a CoxPH model for predicting the survival function.
  • ts_xgb: Create embeddings from the time series and use a SurvivalXGBoost model for predicting the survival function.

Treatment effects

One-off

Treatment effects estimation where treatments are a one-off event.

  • Regression on the outcomes (category: treatments.one_off.regression)
      • synctwin_regressor: SyncTwin is a treatment effect estimation method tailored for observational studies with longitudinal data, applied to the LIP setting: Longitudinal, Irregular and Point treatment. (Paper)

Temporal

Treatment effects estimation where treatments are temporal (time series).

  • Classification on the outcomes (category: treatments.temporal.classification)
      • crn_classifier: The Counterfactual Recurrent Network (CRN), a sequence-to-sequence model that leverages the available patient observational data to estimate treatment effects over time. (Paper)
  • Regression on the outcomes (category: treatments.temporal.regression)
      • crn_regressor: The Counterfactual Recurrent Network (CRN), a sequence-to-sequence model that leverages the available patient observational data to estimate treatment effects over time. (Paper)

Prediction

One-off

Prediction where targets are static.

  • Classification (category: prediction.one_off.classification)
      • nn_classifier: Neural-net based classifier. Supports multiple recurrent models, like RNN, LSTM, Transformer etc.
      • ode_classifier: Classifier based on ordinary differential equation (ODE) solvers.
      • cde_classifier: Classifier based on Neural Controlled Differential Equations for Irregular Time Series. (Paper)
      • laplace_ode_classifier: Classifier based on Inverse Laplace Transform (ILT) algorithms implemented in PyTorch. (Paper)
  • Regression (category: prediction.one_off.regression)
      • nn_regressor: Neural-net based regressor. Supports multiple recurrent models, like RNN, LSTM, Transformer etc.
      • ode_regressor: Regressor based on ordinary differential equation (ODE) solvers.
      • cde_regressor: Regressor based on Neural Controlled Differential Equations for Irregular Time Series. (Paper)
      • laplace_ode_regressor: Regressor based on Inverse Laplace Transform (ILT) algorithms implemented in PyTorch. (Paper)

Temporal

Prediction where targets are temporal (time series).

  • Classification (category: prediction.temporal.classification)
      • seq2seq_classifier: Seq2Seq prediction, classification.
  • Regression (category: prediction.temporal.regression)
      • seq2seq_regressor: Seq2Seq prediction, regression.

Preprocessing

Feature Encoding

  • Static data (category: preprocessing.encoding.static)
      • static_onehot_encoder: One-hot encode categorical static features.
  • Temporal data (category: preprocessing.encoding.temporal)
      • ts_onehot_encoder: One-hot encode categorical time series features.

Imputation

  • Static data (category: preprocessing.imputation.static)
      • static_tabular_imputer: Use any method from HyperImpute (HyperImpute, Mean, Median, Most-frequent, MissForest, ICE, MICE, SoftImpute, EM, Sinkhorn, GAIN, MIRACLE, MIWAE) to impute the static data. (Paper)
  • Temporal data (category: preprocessing.imputation.temporal)
      • ffill: Propagate the last valid observation forward to the next valid one.
      • bfill: Use the next valid observation to fill the gap.
      • ts_tabular_imputer: Use any method from HyperImpute (HyperImpute, Mean, Median, Most-frequent, MissForest, ICE, MICE, SoftImpute, EM, Sinkhorn, GAIN, MIRACLE, MIWAE) to impute the time series data. (Paper)

Scaling

  • Static data (category: preprocessing.scaling.static)
      • static_standard_scaler: Scale the static features using a StandardScaler.
      • static_minmax_scaler: Scale the static features using a MinMaxScaler.
  • Temporal data (category: preprocessing.scaling.temporal)
      • ts_standard_scaler: Scale the temporal features using a StandardScaler.
      • ts_minmax_scaler: Scale the temporal features using a MinMaxScaler.

🔨 Tests and Development

Install the testing dependencies using:

pip install .[testing]

The tests can be executed using:

pytest -vsx

For local development, we recommend installing the [dev] extra, which includes [testing] and some additional dependencies:

pip install .[dev]

For development and contribution to TemporAI, see the contribution guide in the project documentation.

✍️ Citing

If you use this code, please cite the associated paper:

@article{saveliev2023temporai,
  title={TemporAI: Facilitating Machine Learning Innovation in Time Domain Tasks for Medicine},
  author={Saveliev, Evgeny S and van der Schaar, Mihaela},
  journal={arXiv preprint arXiv:2301.12260},
  year={2023}
}

temporai's People

Contributors

bcebere, drshushen, julianklug

temporai's Issues

[Enhancement] Integrate jaxtyping for advanced parameter validation #120

Description

An improvement on top of pydantic would be to integrate jaxtyping, which allows for validating tensor shapes as well.
jaxtyping supports PyTorch tensors and NumPy arrays.

Example

from jaxtyping import Array, Float, PyTree

# Accepts floating-point 2D arrays with matching dimensions
def matrix_multiply(x: Float[Array, "dim1 dim2"],
                    y: Float[Array, "dim2 dim3"]
                  ) -> Float[Array, "dim1 dim3"]:
    ...

def accepts_pytree_of_ints(x: PyTree[int]):
    ...

def accepts_pytree_of_arrays(x: PyTree[Float[Array, "batch c1 c2"]]):
    ...

https://github.com/google/jaxtyping
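
Since TemporAI models are PyTorch-based, the same annotations apply to torch tensors; a minimal runtime-checked sketch (using beartype as the checker, one option jaxtyping supports; the function and dimension names here are illustrative):

import torch
from beartype import beartype
from jaxtyping import Float, jaxtyped

@jaxtyped(typechecker=beartype)
def embed(x: Float[torch.Tensor, "batch time feature"]) -> Float[torch.Tensor, "batch feature"]:
    # Shape-checked at runtime: a 3D float tensor in, a 2D float tensor out,
    # with consistent batch and feature dimensions.
    return x.mean(dim=1)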

dynamic deephit plugin does not work on a simple dataset

Hi,

I created a simple dataset according to the example in data format tutorial.

import pandas as pd

from tempor.data.dataset import TimeToEventAnalysisDataset  # Import path assumed as per the data format tutorial.

time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0", "sample_0", "sample_0", "sample_1", "sample_1", "sample_1", "sample_2", "sample_2", "sample_2"],
        "time_idx": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31, 28, 26],
        "t_feat_1": [1.1, 1.2, 1.3, 1, 2.1, 2.2, 3.1, 2.3, 2.0],
    }
)

# Set the 2-level index:
time_series_df.set_index(keys=["sample_idx", "time_idx"], drop=True, inplace=True)

# Create a static data dataframe.
static_df = pd.DataFrame(
    {
        "s_feat_0": [100, 200, 300],
        "s_feat_1": [-1.1, -1, -1.3],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Create an event dataframe.

event_df = pd.DataFrame(
    {
        "e_feat_0": [(10, True), (12, False), (13, True)],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Create a dataset of time-to-event analysis task:
data = TimeToEventAnalysisDataset(
    time_series=time_series_df,
    static=static_df,
    targets=event_df,
)

But the dynamic_deephit plugin does not work on the above dataset with the following errors.

[Screenshot of the error traceback omitted.]

It seems that the dataset was not fit-ready. I checked that I have all the components for a time-to-event dataset. Could you help with this, please?

Thanks very much,
Wenjuan

[Epic] Preprocessing plugins

Description

Preprocessing plugins, for scaling or dimensionality reduction.

Why?

Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

  • drop constant features - TODO
  • handle multicollinearity - TODO
  • drop low variance features - TODO
  • encode data

[Enhancement] Evaluation: Add more metrics for evaluating survival analysis tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating survival analysis tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

The metrics should be reported for each evaluation time horizon, and aggregated (mean, std).

Important metrics to cover here:

  • [x] c_index: The concordance index, or c-index, is a metric to evaluate the predictions made by a survival algorithm. It is defined as the proportion of concordant pairs divided by the total number of possible evaluation pairs.
  • [x] brier_score: The Brier score is a strictly proper scoring rule that measures the accuracy of probabilistic predictions.
  • aucroc: The Area Under the Receiver Operating Characteristic Curve (ROC AUC), computed from prediction scores.
  • sensitivity: Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.
  • specificity: Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.
  • PPV: The positive predictive value (PPV) is the probability that, following a positive test result, the individual truly has the disease.
  • NPV: The negative predictive value (NPV) is the probability that, following a negative test result, the individual truly does not have the disease.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
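The last four metrics above are simple functions of the confusion matrix; a minimal NumPy sketch of their definitions (an illustrative helper, not a TemporAI API):

import numpy as np

def confusion_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    # Counts of true/false positives/negatives for binary labels:
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "sensitivity": tp / (tp + fn),  # True positive rate.
        "specificity": tn / (tn + fp),  # True negative rate.
        "PPV": tp / (tp + fp),          # Positive predictive value.
        "NPV": tn / (tn + fn),          # Negative predictive value.
    }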

[Enhancement] Fix RNN contiguous memory warning (serialization)

After serializing and deserializing some models that include RNNs, the following warning is received:

UserWarning: RNN module weights are not part of single contiguous chunk of memory"

The serialization mechanism needs to be improved to fix this problem.

[AutoML] Add AutoML objective evaluation for survival analysis tasks

Feature Description

[AutoML] Add AutoML objective evaluation for classification tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.

For classification tasks, the evaluation metrics are documented here #15
The evaluation is done using the cross-validation tester documented in #20
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/risk_estimation.py

[Epic] Imputation models

Description

Add imputation plugins.

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

[Epic] Prediction models

Description

Add prediction models.

Why?

Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

[AutoML] Create pipeline from hyperparameters

Feature Description

For AutoML search, it is important to be able to sample hyperparameters, and to recreate the pipeline from those hyperparameters.

AutoPrognosis implements the following strategy:

  • For each search task, the user can select imputation, preprocessing and prediction plugins.
  • For each pipeline, the prediction plugin "drives" the whole pipeline selection.
  • To that end, we artificially extend the predictor's hyperparameters to include the imputation and preprocessing plugins. In other words, the user samples from the predictor's [imputation plugin 1, 2, 3, ...] + [preprocessing plugin 1, 2, 3, ...] + hyperparameter space. This simplifies the sampling process (see the sketch below).
  • Given a sampled preprocessing/imputation plugin and a set of hyperparameters, the user must be able to create the complete pipeline.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/core/selector.py
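
A rough sketch of that "extended hyperparameter space" idea (all names below are illustrative, not TemporAI or AutoPrognosis API):

from typing import Any, Dict

# Hypothetical candidate preprocessing plugins to choose among:
IMPUTERS = ["ffill", "bfill", "static_tabular_imputer"]
SCALERS = ["ts_minmax_scaler", "ts_standard_scaler"]

def extended_space(predictor_space: Dict[str, Any]) -> Dict[str, Any]:
    # Prepend categorical choices for the preprocessing stages to the predictor's
    # own hyperparameter space, so a single sampler drives the whole pipeline.
    return {"imputer": IMPUTERS, "scaler": SCALERS, **predictor_space}

def build_pipeline(sampled: Dict[str, Any]):
    # Given one sampled point, recreate the complete pipeline:
    # [sampled["imputer"], sampled["scaler"], predictor(**remaining params)].
    ...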

Blocked by: #27

Issues with MethodSeeker and Dynamic DeepHit when running with PBC dataset

Hi,
I get errors when trying to run dynamic_deephit with MethodSeeker on the PBC dataset.
To reproduce the errors, please do the following:

from tempor.automl.seeker import MethodSeeker  # Assumed import path; MethodSeeker lives alongside PipelineSeeker.
from tempor.methods.core._params import CategoricalParams, IntegerParams  # Exact import paths may differ across TemporAI versions.
from tempor.utils.dataloaders import PBCDataLoader
dataset = PBCDataLoader(random_state=42).load()

# Provide a custom hyperparameter space to search for each type of model.

hp_space = {
    "dynamic_deephit": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=30, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "ts_xgb": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=100, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
}

# Initialize a `MethodSeeker` and provide `override_hp_space`.
seeker = MethodSeeker(
    study_name="my_automl_study",
    task_type="time_to_event",
    estimator_names=[
        "dynamic_deephit",
        "ts_xgb",
    ],
    metric="c_index",   
    dataset=dataset,
    horizon=[1,5,9],
    return_top_k=2,
    num_iter=3,  # For the sake of speed of this example, only 3 AutoML iterations.
    tuner_type="bayesian",
    # Override hyperparameter space:
    override_hp_space=hp_space,
)

best_methods, best_scores = seeker.search()

The error is as follows:
[Screenshot of the error omitted.]

Thanks for the help!

Best wishes,
Wenjuan

[Feat] Reproducibility

Feature Description

Every plugin/pipeline or API call should support fixing the random seed.

There are methods for setting a global random seed, for example:

# stdlib
import random

# third party
import numpy as np
import torch


def enable_reproducible_results(random_state: int = 0) -> None:
    np.random.seed(random_state)
    torch.manual_seed(random_state)
    random.seed(random_state)

[Enhancement] Add AutoML objective evaluation for ensembles

Feature Description

Given a set of K optimal pipelines selected by the AutoML logic for an objective, the next step is to evaluate ensembles on top of the candidate pipelines.

For the weighted ensemble, a separate AutoML search can be executed, to evaluate various weights.
The process benchmarks all the supported ensemble setups (weighted, stacked, voting etc.), and returns the optimal solution.

depends on #8, #7, #6, #5, #13

AP references:

why not contribute this to sktime?

It would seem that this package is "cloning" a number of core aspects of sktime, including data format, base class design, etc.

It does add some novel aspects, but there aren't too many differences at the moment.
So, why develop this in complete detachment from the PyData ecosystem?

Long-term, it will be much harder to maintain if you insist on trying to build a parallel ecosystem targeted at medical doctors.

I understand the academic sensitivities of wanting to "own", but that's not really how open source works - the more you give and let go of, the more you get back, and the more successful you will be.

For instance, why not contribute this to sktime?

[Enhancement] Introduce plugin types

Introduce plugin types which can be listed separately, so that we can have clear separation between models/metrics/data sources etc. plugins.

[Bug] Catboost is required, but fails to build (macOS 13.5)

Describe the bug
Building the temporai package from pip or from github fails as catboost is required.
This is probably linked to #72, where the catboost dependency should have been removed, and to catboost/catboost#2371 (comment).

Platform:

  • macOS 13.5
  • python 3.11.5

To Reproduce
pip install temporai
or
pip install "temporai @ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3"

Expected behavior
Build should not fail, and catboost should probably not be required.

Results
temporai build fails.

Collecting catboost>=1.0.5 (from hyperimpute>=0.1.17->temporai@ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3)
  Using cached catboost-1.2.2.tar.gz (60.1 MB)
Building wheel for catboost (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for catboost (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [218 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-311
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/monoforest.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/plot_helpers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/metrics.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/version.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/text_processing.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/datasets.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/init.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/core.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/dev_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/init.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/metrics_plotter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/ipythonwidget.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/callbacks.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/catboost_evaluation.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_model.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_readers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/log_config.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_splitter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/init.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/execution_case.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_storage.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/factor_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/evaluation_result.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_models_handler.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
running build_ext
Buildling _catboost with cmake and ninja
target_platform=darwin-x86_64. Building targets _catboost with PIC
Running "cmake /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src -B /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain --log-level=VERBOSE -DCMAKE_POSITION_INDEPENDENT_CODE=On -DCATBOOST_COMPONENTS=python-package -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0 -DHAVE_CUDA=no -DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects"
-- The C compiler identification is AppleClang 14.0.3.14030022
-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- The ASM compiler identification is Clang
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /Users/jk1/opt/anaconda3/envs/treatment_effects/bin/python3.1 (found version "3.11.5") found components: Interpreter
-- CMAKE_C_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- CMAKE_CXX_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -Woverloaded-virtual -Wimport-preprocessor-directive-pedantic -Wno-undefined-var-template -Wno-return-std-move -Wno-defaulted-function-deleted -Wno-pessimizing-move -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-enum-enum-conversion -Wno-deprecated-enum-float-conversion -Wno-ambiguous-reversed-operator -Wno-deprecated-volatile -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- Conan: checking conan executable
-- Conan: Found program /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan
-- Conan: Version found Conan version 1.59.0
-- Conan executing: /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan install /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src --remote conancenter --install-folder /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 --build missing --env CONAN_CMAKE_GENERATOR=Ninja --settings build_type=Release --settings compiler=apple-clang --settings compiler.version=14.0 --settings compiler.libcxx=libc++ --settings compiler.cppstd=20 --conf tools.cmake.cmaketoolchain:generator=Ninja
Configuration:
[settings]
arch=x86_64
arch_build=x86_64
build_type=Release
compiler=apple-clang
compiler.cppstd=20
compiler.libcxx=libc++
compiler.version=14.0
os=Macos
os_build=Macos
[options]
[build_requires]
[env]
CONAN_CMAKE_GENERATOR=Ninja
[conf]
tools.cmake.cmaketoolchain:generator=Ninja

  Version ranges solved
      Version range '>=1.2.11 <2' required by 'pcre/8.45' resolved to 'zlib/1.3' in local cache
  
  conanfile.txt: Installing package
  Requirements
      libiconv/1.15 from 'conancenter' - Cache
      openssl/1.1.1t from 'conancenter' - Cache
  Packages
      libiconv/1.15:e1ef30a7ac2ff8c218173fdf49ec961a5c046a36 - Cache
      openssl/1.1.1t:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
  Build requirements
      bzip2/1.0.8 from 'conancenter' - Cache
      pcre/8.45 from 'conancenter' - Cache
      ragel/6.10 from 'conancenter' - Cache
      swig/4.0.2 from 'conancenter' - Cache
      yasm/1.3.0 from 'conancenter' - Cache
      zlib/1.3 from 'conancenter' - Cache
  Build requirements packages
      bzip2/1.0.8:b9b85a7c8f543b96385e1da9e174853f1fb08e0c - Cache
      pcre/8.45:842afe377248eac66b64b538531df2b005d57959 - Cache
      ragel/6.10:801752c0480319b8e090188c566245a78e9abcf4 - Cache
      swig/4.0.2:099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb - Cache
      yasm/1.3.0:801752c0480319b8e090188c566245a78e9abcf4 - Cache
      zlib/1.3:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
  
  Installing (downloading, building) binaries...
  bzip2/1.0.8: Already installed!
  libiconv/1.15: Already installed!
  openssl/1.1.1t: Already installed!
  ragel/6.10: Already installed!
  ragel/6.10: Appending PATH environment variable: /Users/jk1/.conan/data/ragel/6.10/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
  yasm/1.3.0: Already installed!
  yasm/1.3.0: Appending PATH environment variable: /Users/jk1/.conan/data/yasm/1.3.0/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
  zlib/1.3: Already installed!
  pcre/8.45: Already installed!
  swig/4.0.2: Already installed!
  swig/4.0.2: Appending PATH environment variable: /Users/jk1/.conan/data/swig/4.0.2/_/_/package/099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb/bin
  conanfile.txt: Applying build-requirement: ragel/6.10
  conanfile.txt: Applying build-requirement: swig/4.0.2
  conanfile.txt: Applying build-requirement: yasm/1.3.0
  conanfile.txt: Applying build-requirement: pcre/8.45
  conanfile.txt: Applying build-requirement: bzip2/1.0.8
  conanfile.txt: Applying build-requirement: zlib/1.3
  conanfile.txt: Generator cmake_find_package created Findragel.cmake
  conanfile.txt: Generator cmake_find_package created FindSWIG.cmake
  conanfile.txt: Generator cmake_find_package created Findyasm.cmake
  conanfile.txt: Generator cmake_find_package created FindIconv.cmake
  conanfile.txt: Generator cmake_find_package created FindOpenSSL.cmake
  conanfile.txt: Generator cmake_find_package created FindPCRE.cmake
  conanfile.txt: Generator cmake_find_package created FindBZip2.cmake
  conanfile.txt: Generator cmake_find_package created FindZLIB.cmake
  conanfile.txt: Generator cmake_paths created conan_paths.cmake
  conanfile.txt: Generator txt created conanbuildinfo.txt
  conanfile.txt: Aggregating env generators
  conanfile.txt: Generated conaninfo.txt
  conanfile.txt: Generated graphinfo
  conanfile.txt imports(): Copied 434 '.i' files
  conanfile.txt imports(): Copied 273 '.swg' files
  conanfile.txt imports(): Copied 1 '.swig' file: Makefile.swig
  conanfile.txt imports(): Copied 2 '.ml' files: swig.ml, swigp4.ml
  conanfile.txt imports(): Copied 1 '.pl' file: Makefile.pl
  conanfile.txt imports(): Copied 6 files
  conanfile.txt imports(): Copied 1 '.rb' file: extconf.rb
  conanfile.txt imports(): Copied 1 '.h' file: noembed.h
  conanfile.txt imports(): Copied 1 '.scm' file: common.scm
  conanfile.txt imports(): Copied 1 '.mli' file: swig.mli
  conanfile.txt imports(): Copied 1 '.hpp' file: octheaders.hpp
  CMake Error at /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
    Could NOT find Python3 (missing: Python3_INCLUDE_DIRS Python3_LIBRARIES
    Development Development.Module Development.Embed)
  Call Stack (most recent call first):
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython/Support.cmake:3166 (find_package_handle_standard_args)
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython3.cmake:485 (include)
    catboost/python-package/catboost/CMakeLists.darwin-x86_64.txt:9 (find_package)
    catboost/python-package/catboost/CMakeLists.txt:20 (include)
  
  
  -- Configuring incomplete, errors occurred!
  See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeOutput.log".
  See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeError.log".
  Traceback (most recent call last):
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
      return _build_backend().build_wheel(wheel_directory, config_settings,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 434, in build_wheel
      return self._build_with_temp_dir(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
      self.run_setup()
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 507, in run_setup
      super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in run_setup
      exec(code, locals())
    File "<string>", line 731, in <module>
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
             ^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 397, in run
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 364, in run
      self.run_command("build")
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 332, in run
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 444, in run
    File "<string>", line 462, in build_with_cmake_and_ninja
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 517, in build
      cmd_runner.run(cmake_cmd, env=build_environ)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 164, in run
      subprocess.run(cmd, check=True, **subprocess_run_kwargs)
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/subprocess.py", line 571, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['cmake', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src', '-B', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain', '--log-level=VERBOSE', '-DCMAKE_POSITION_INDEPENDENT_CODE=On', '-DCATBOOST_COMPONENTS=python-package', '-DCMAKE_OSX_DEPLOYMENT_TARGET=11.0', '-DHAVE_CUDA=no', '-DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for catboost
Successfully built temporai
Failed to build catboost
ERROR: Could not build wheels for catboost, which is required to install pyproject.toml-based projects


[Feat] Add pipeline logic

Feature Description

The library should offer the possibility to execute multiple plugins in sequence, and sample hyperparameters for all of them.

Sampling hyperparameters should be possible at the class level, so that you don't instantiate a useless object.
To that end, you can create metaclasses in Python by inheriting from the type class directly.

The pipeline wrapper should offer the following interface:

  • fit - train the pipeline
  • predict - transform(for preprocessing plugins in the pipeline) and predict
  • hyperparameter_space / sample_hyperparameters, which should call the sampling logic of each plugin.

Reference implementation https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/plugins/pipeline/__init__.py
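
A minimal sketch of that metaclass idea (illustrative names only, not the TemporAI implementation):

class PipelineMeta(type):
    # Defining hyperparameter_space on the metaclass makes it callable on the
    # pipeline *class* itself, so hyperparameters can be sampled without
    # instantiating a (useless) pipeline object.
    def hyperparameter_space(cls):
        space = {}
        for plugin_cls in cls.plugin_classes:
            space.update(plugin_cls.hyperparameter_space())
        return space

def make_pipeline(plugin_classes: list):
    # Dynamically create a pipeline class from an ordered list of plugin classes.
    return PipelineMeta("Pipeline", (), {"plugin_classes": plugin_classes})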

The feature should be covered by tests, covering:

  • a standard pipeline train and predict.
  • hyperparameter sampling and pipeline instantiation.
  • serialization

depends on https://github.com/vanderschaarlab/temporai-priv/issues/5

[Epic] Testing

Description

All the classes should have test coverage.

Type of Test

  • Unit test (e.g. checking a loop, method, or function is working as intended)
  • Integration test (e.g. checking if a certain group or set of functionality is working as intended)
  • Regression test (e.g. checking if by adding or removing a module of code allows other systems to continue to function as intended)
  • Stress test (e.g. checking to see how well a system performs under various situations, including heavy usage)
  • Performance test (e.g. checking to see how efficient a system is as performing the intended task)
  • Other...

[Evaluation] Add metrics for evaluating regression tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating regression tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

Important metrics to cover here:

  • r2" R^2(coefficient of determination) regression score function.
  • mse: Mean squared error regression loss.
  • mae: Mean absolute error regression loss.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
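
These correspond directly to scikit-learn's regression metrics; a minimal sketch:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(r2_score(y_true, y_pred))             # r2
print(mean_squared_error(y_true, y_pred))   # mse
print(mean_absolute_error(y_true, y_pred))  # mae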

time_to_event models and scaling methods do not work with categorical data

Hi, I found that the time_to_event models and scaling methods do not work with categorical data.

To reproduce it, one can go to tutorials/data/tutorial1_data_format.ipynb, use the data example (commenting out one line of the event data) and run the time-to-event models.

I have only tried the time-to-event models and scaling methods; other methods might not work on categorical data as well. According to the data format tutorial, pandas.Categorical is supported as column values?

Thanks for looking into it!
Wenjuan

Use nbsphinx for tutorials / user guide in docs

The nbsphinx extension is designed for integrating notebooks into documentation. Hence we should use that instead of the custom code in docs/pre_build.py. Ideally, we should also find a way of having the link to Colab at the top of each tutorial in the docs.
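
Enabling nbsphinx is a one-line addition to the Sphinx configuration (the surrounding extension list below is illustrative):

# docs/conf.py
extensions = [
    # ... existing extensions ...
    "nbsphinx",  # Renders the .ipynb tutorials directly into the docs.
]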

Upgrade to pydantic 2

Pydantic 2.0 is now the current version, so changes need to be made to use it rather than 1.0.

nn_regressor: Pydantic crash

Description

Pydantic imposes a limit on the number of temporai objects that can be instantiated.

Example: In the test_nn_regressor.py, the following snippet will crash

def test_hyperparam_sample():
    # `plugin` is the nn_regressor plugin under test in test_nn_regressor.py.
    for repeat in range(10000):  # pylint: disable=unused-variable
        args = plugin._cls.sample_hyperparameters()  # pylint: disable=no-member, protected-access
        plugin(**args)

with the error

>   ???
E   pydantic.error_wrappers.ValidationError: 1 validation error for _InitArgsValidator
E   __root__
E     Model parameters could not be validated as defined by `EmptyParamsDefinition`, cause: 
E   ---------------
E   RecursionError:
E   maximum recursion depth exceeded
E   ---------------
E    (type=value_error)

pydantic/main.py:342: ValidationError

Expected behaviour

Pydantic should not limit the functionality of the library.

[Enhancement] Ensemble support

Feature Description

Given a set of pipelines - or just estimators, users should be able to create ensembles.

Popular ensemble techniques

  • WeightedEnsemble: average across all scores/prediction results, maybe with weights
  • Stacking (meta ensembling): use a meta learner to learn the base classifier results
  • Majority Vote Ensemble
  • DCS: Dynamic Classifier Selection: Combination of multiple classifiers using local accuracy estimates
  • DES: Dynamic Ensemble Selection: From dynamic classifier selection to dynamic ensemble selection

Reference code in AutoPrognosis: https://github.com/vanderschaarlab/autoprognosis/tree/main/src/autoprognosis/plugins/ensemble

More about here: https://github.com/yzhao062/combo
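
For the simplest of these, a weighted ensemble just averages member predictions; a minimal NumPy sketch (illustrative, not a TemporAI API):

import numpy as np

def weighted_ensemble(predictions: list, weights: list) -> np.ndarray:
    # `predictions`: one score array per fitted model, all with the same shape;
    # `weights`: one weight per model, ideally summing to 1.
    stacked = np.stack(predictions, axis=0)
    w = np.asarray(weights).reshape(-1, *([1] * (stacked.ndim - 1)))
    return np.sum(w * stacked, axis=0)

print(weighted_ensemble([np.array([0.2, 0.8]), np.array([0.4, 0.6])], [0.7, 0.3]))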

[AutoML] Add AutoML objective evaluation for regression tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.

For regression tasks, the evaluation metrics are documented here #10
The benchmark is done using the cross-validation tester documented in #20
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/regression.py

[Evaluation] Add metrics for evaluating classification tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating classification tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

Important metrics to cover here:

  • aucroc: The Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
  • aucprc: The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
  • accuracy: Accuracy classification score.
  • f1_score (micro, macro, weighted): F1 score is a harmonic mean of the precision and recall. This version uses the "micro" average: calculate metrics globally by counting the total true positives, false negatives and false positives.
  • kappa: Computes Cohen's kappa, a score that expresses the level of agreement between two annotators on a classification problem.
  • precision (micro, macro, weighted): Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (micro) calculates metrics globally by counting the total true positives.
  • recall (micro, macro, weighted): Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (micro) calculates metrics globally by counting the total true positives.
  • mcc: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
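
All of the above are available in scikit-learn; a minimal sketch of computing them:

from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    f1_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
    roc_auc_score,
)

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
y_score = [0.1, 0.9, 0.4, 0.2, 0.8]  # Predicted probabilities for class 1.

print(roc_auc_score(y_true, y_score))                    # aucroc
print(accuracy_score(y_true, y_pred))                    # accuracy
print(f1_score(y_true, y_pred, average="micro"))         # f1_score (micro)
print(precision_score(y_true, y_pred, average="micro"))  # precision (micro)
print(recall_score(y_true, y_pred, average="micro"))     # recall (micro)
print(cohen_kappa_score(y_true, y_pred))                 # kappa
print(matthews_corrcoef(y_true, y_pred))                 # mcc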

Investigate docs API generation

  1. The way the module reference (API) is rendered in the docs is not great - needs investigation.
  2. Warnings raised during documentation building, like:
/mnt/data-fourtb/Dropbox/Programming/wsl_repos/_vds/temporai/docs/../src/tempor/data/pandera_utils.py:docstring of tempor.data.pandera_utils:1: WARNING: Inline interpreted text or phrase reference start-string without end-string.
...

Investigate and fix these.

Add tutorials (notebooks + Colab links)

Description

  • The library should have a tutorial for each major feature.
  • The tutorials should be notebooks, and should be also deployed on Colab, for easier use.

[Feat] Benchmarking tools

Feature Description

The library should offer methods for evaluating predictive models/pipelines.

For each problem type, there can be different relevant metrics, as described in the linked tasks.

The evaluation should be done using KFold (regression) / StratifiedKFold (classification, survival analysis), and a predefined random seed.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
Example implementation for time series survival
https://github.com/vanderschaarlab/synthcity/blob/main/src/synthcity/plugins/core/models/time_series_survival/benchmarks.py#L142
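
A minimal sketch of that cross-validation scheme with a predefined seed (toy data; the fit/score step is left abstract):

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(20, 3)  # Toy features.
y = np.array([0, 1] * 10)  # Toy binary labels (stratification target).

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # Fixed seed.
for train_idx, test_idx in skf.split(X, y):
    # Fit the model/pipeline on train_idx, compute the relevant metric on
    # test_idx, then aggregate the per-fold scores (mean, std).
    ...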

blocked by #9
blocked by #10
blocked by #11

[CI] Github workflows

Description

Before releasing, the library should be tested on the matrix {MacOS, Windows, Linux} x {Python 3.7, 3.8, 3.9, 3.10} for compatibility.

On each test scenario, all the unit tests should pass.

Reference workflow: https://github.com/vanderschaarlab/autoprognosis/blob/main/.github/workflows/test.yml


Additional notes:

[AutoML] Add AutoML objective evaluation for classification tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.

For classification tasks, the evaluation metrics are documented here #11
The benchmark is done using the cross-validation tester documented in #20.
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/classifiers.py
