
temporai's Introduction


TemporAI

βš—οΈ Status: This project is still in alpha, and the API may change without warning.

📃 Overview

TemporAI is a Machine Learning-centric time-series library for medicine. Its current areas of focus are: time-to-event (survival) analysis with time-series data, treatment effects (causal inference) over time, and time-series prediction. Data preprocessing methods, including missing value imputation for static and temporal covariates, are provided. AutoML tools for hyperparameter tuning and pipeline selection are also available.

How is TemporAI unique?

  • πŸ₯ Medicine-first: Focused on use cases for medicine and healthcare, such as temporal treatment effects, survival analysis over time, imputation methods, models with built-in and post-hoc interpretability, ... See methods.
  • πŸ—οΈ Fast prototyping: A plugin design allowing for on-the-fly integration of new methods by the users.
  • πŸš€ From research to practice: Relevant novel models from research community adapted for practical use.
  • 🌍 A healthcare ecosystem vision: A range of interactive demonstration apps, new medical problem settings, interpretability tools, data-centric tools etc. are planned.

Key concepts

(Key concepts diagram)

🚀 Installation

Install with pip

From the Python Package Index (PyPI):

$ pip install temporai

Or from source:

$ git clone https://github.com/vanderschaarlab/temporai.git
$ cd temporai
$ pip install .

Install in a conda environment

While we have not yet published TemporAI on conda-forge, you can still install TemporAI in your conda environment using pip, as follows:

Create and activate conda environment as normal:

$ conda create -n <my_environment>
$ conda activate <my_environment>

Then install inside your conda environment with pip:

$ pip install temporai

💥 Sample Usage


List the available plugins
from tempor import plugin_loader

print(plugin_loader.list())
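
The call above returns a nested dict, mapping plugin categories to the plugin names within them; pretty-printing it makes the nesting easier to scan (a stdlib-only convenience sketch, not a TemporAI API):

from pprint import pprint

from tempor import plugin_loader

# Render the category -> plugin-names mapping with indentation:
pprint(plugin_loader.list())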
Use a time-to-event (survival) analysis model
from tempor import plugin_loader

# Load a time-to-event dataset:
dataset = plugin_loader.get("time_to_event.pbc", plugin_type="datasource").load()

# Initialize the model:
model = plugin_loader.get("time_to_event.dynamic_deephit")

# Train:
model.fit(dataset)

# Make risk predictions:
prediction = model.predict(dataset, horizons=[0.25, 0.50, 0.75])
Use a temporal treatment effects model
import numpy as np

from tempor import plugin_loader

# Load a dataset with temporal treatments and outcomes:
dataset = plugin_loader.get(
    "treatments.temporal.dummy_treatments",
    plugin_type="datasource",
    temporal_covariates_missing_prob=0.0,
    temporal_treatments_n_features=1,
    temporal_treatments_n_categories=2,
).load()

# Initialize the model:
model = plugin_loader.get("treatments.temporal.regression.crn_regressor", epochs=20)

# Train:
model.fit(dataset)

# Define target variable horizons for each sample:
horizons = [
    tc.time_indexes()[0][len(tc.time_indexes()[0]) // 2 :] for tc in dataset.time_series
]

# Define treatment scenarios for each sample:
treatment_scenarios = [
    [np.asarray([1] * len(h)), np.asarray([0] * len(h))] for h in horizons
]

# Predict counterfactuals:
counterfactuals = model.predict_counterfactuals(
    dataset,
    horizons=horizons,
    treatment_scenarios=treatment_scenarios,
)
Use a missing data imputer
from tempor import plugin_loader

dataset = plugin_loader.get(
    "prediction.one_off.sine", plugin_type="datasource", with_missing=True
).load()
static_data_n_missing = dataset.static.dataframe().isna().sum().sum()
temporal_data_n_missing = dataset.time_series.dataframe().isna().sum().sum()

print(static_data_n_missing, temporal_data_n_missing)
assert static_data_n_missing > 0
assert temporal_data_n_missing > 0

# Initialize the model:
model = plugin_loader.get("preprocessing.imputation.temporal.bfill")

# Train:
model.fit(dataset)

# Impute:
imputed = model.transform(dataset)
temporal_data_n_missing = imputed.time_series.dataframe().isna().sum().sum()

print(static_data_n_missing, temporal_data_n_missing)
assert temporal_data_n_missing == 0
Use a one-off classifier (prediction)
from tempor import plugin_loader

dataset = plugin_loader.get("prediction.one_off.sine", plugin_type="datasource").load()

# Initialize the model:
model = plugin_loader.get("prediction.one_off.classification.nn_classifier", n_iter=50)

# Train:
model.fit(dataset)

# Predict:
prediction = model.predict(dataset)
Use a temporal regressor (forecasting)
from tempor import plugin_loader

# Load a dataset with temporal targets.
dataset = plugin_loader.get(
    "prediction.temporal.dummy_prediction",
    plugin_type="datasource",
    temporal_covariates_missing_prob=0.0,
).load()

# Initialize the model:
model = plugin_loader.get("prediction.temporal.regression.seq2seq_regressor", epochs=10)

# Train:
model.fit(dataset)

# Predict:
prediction = model.predict(dataset, n_future_steps=5)
Benchmark models, time-to-event task
from tempor.benchmarks import benchmark_models
from tempor import plugin_loader
from tempor.methods.pipeline import pipeline

testcases = [
    (
        "pipeline1",
        pipeline(
            [
                "preprocessing.scaling.temporal.ts_minmax_scaler",
                "time_to_event.dynamic_deephit",
            ]
        )({"ts_coxph": {"n_iter": 100}}),
    ),
    (
        "plugin1",
        plugin_loader.get("time_to_event.dynamic_deephit", n_iter=100),
    ),
    (
        "plugin2",
        plugin_loader.get("time_to_event.ts_coxph", n_iter=100),
    ),
]
dataset = plugin_loader.get("time_to_event.pbc", plugin_type="datasource").load()

aggr_score, per_test_score = benchmark_models(
    task_type="time_to_event",
    tests=testcases,
    data=dataset,
    n_splits=2,
    random_state=0,
    horizons=[2.0, 4.0, 6.0],
)

print(aggr_score)
Serialization
from tempor.utils.serialization import load, save
from tempor import plugin_loader

# Initialize the model:
model = plugin_loader.get("prediction.one_off.classification.nn_classifier", n_iter=50)

buff = save(model)  # Save model to bytes.
reloaded = load(buff)  # Reload model.

# `save_to_file`, `load_from_file` also available in the serialization module.
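
A file-based sketch using those helpers (the (path, object) argument order is an assumption; check the serialization module for the exact signatures):

from tempor.utils.serialization import load_from_file, save_to_file

save_to_file("./saved_model.pkl", model)  # Save model to a file (argument order assumed).
reloaded = load_from_file("./saved_model.pkl")  # Reload the model from the file.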
AutoML - search for the best pipeline for your task
from tempor.automl.seeker import PipelineSeeker

dataset = plugin_loader.get("prediction.one_off.sine", plugin_type="datasource").load()

# Specify the AutoML pipeline seeker for the task of your choice, providing candidate methods,
# metric, preprocessing steps etc.
seeker = PipelineSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    return_top_k=3,
    num_iter=100,
    tuner_type="bayesian",
    static_imputers=["static_tabular_imputer"],
    static_scalers=[],
    temporal_imputers=["ffill", "bfill"],
    temporal_scalers=["ts_minmax_scaler"],
)

# The search will return the best pipelines.
best_pipelines, best_scores = seeker.search()  # doctest: +SKIP
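
The returned pipelines can then be used like any other TemporAI method; a brief sketch (assuming the returned objects follow the usual fit/predict interface):

# Train the top-ranked pipeline and predict, as with any other method:
best_pipeline = best_pipelines[0]
best_pipeline.fit(dataset)
prediction = best_pipeline.predict(dataset)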

📖 Tutorials

Data

User Guide

Extending TemporAI

📘 Documentation

See the full project documentation here.

Note on documentation versions:

  • If you have installed TemporAI from PyPI, you should refer to the stable documentation.
  • If you have installed TemporAI from source, you should refer to the latest documentation.

See the Install with pip section for reference.

🌍 TemporAI Ecosystem (Experimental)

We provide additional tools in the TemporAI ecosystem, which are in active development, and are currently (very) experimental. Suggestions and contributions are welcome!

These include:

🔑 Methods


Time-to-Event (survival) analysis over time

Risk estimation given event data (category: time_to_event)

  • dynamic_deephit: Dynamic-DeepHit incorporates the available longitudinal data, comprising various repeated measurements (rather than only the last available measurements), in order to issue dynamically updated survival predictions. (Paper)
  • ts_coxph: Create embeddings from the time series and use a CoxPH model for predicting the survival function.
  • ts_xgb: Create embeddings from the time series and use a SurvivalXGBoost model for predicting the survival function.

Treatment effects

One-off

Treatment effects estimation where treatments are a one-off event.

  • Regression on the outcomes (category: treatments.one_off.regression)
      • synctwin_regressor: SyncTwin is a treatment effect estimation method tailored for observational studies with longitudinal data, applied to the LIP setting: Longitudinal, Irregular and Point treatment. (Paper)

Temporal

Treatment effects estimation where treatments are temporal (time series).

  • Classification on the outcomes (category: treatments.temporal.classification)
      • crn_classifier: The Counterfactual Recurrent Network (CRN), a sequence-to-sequence model that leverages the available patient observational data to estimate treatment effects over time. (Paper)
  • Regression on the outcomes (category: treatments.temporal.regression)
      • crn_regressor: The Counterfactual Recurrent Network (CRN), a sequence-to-sequence model that leverages the available patient observational data to estimate treatment effects over time. (Paper)

Prediction

One-off

Prediction where targets are static.

  • Classification (category: prediction.one_off.classification)
      • nn_classifier: Neural-net based classifier. Supports multiple recurrent models, like RNN, LSTM, Transformer etc.
      • ode_classifier: Classifier based on ordinary differential equation (ODE) solvers.
      • cde_classifier: Classifier based on Neural Controlled Differential Equations for Irregular Time Series. (Paper)
      • laplace_ode_classifier: Classifier based on Inverse Laplace Transform (ILT) algorithms implemented in PyTorch. (Paper)
  • Regression (category: prediction.one_off.regression)
      • nn_regressor: Neural-net based regressor. Supports multiple recurrent models, like RNN, LSTM, Transformer etc.
      • ode_regressor: Regressor based on ordinary differential equation (ODE) solvers.
      • cde_regressor: Regressor based on Neural Controlled Differential Equations for Irregular Time Series. (Paper)
      • laplace_ode_regressor: Regressor based on Inverse Laplace Transform (ILT) algorithms implemented in PyTorch. (Paper)

Temporal

Prediction where targets are temporal (time series).

  • Classification (category: prediction.temporal.classification)
      • seq2seq_classifier: Seq2Seq prediction, classification.
  • Regression (category: prediction.temporal.regression)
      • seq2seq_regressor: Seq2Seq prediction, regression.

Preprocessing

Feature Encoding

  • Static data (category: preprocessing.encoding.static)
      • static_onehot_encoder: One-hot encode categorical static features.
  • Temporal data (category: preprocessing.encoding.temporal)
      • ts_onehot_encoder: One-hot encode categorical time series features.

Imputation

  • Static data (category: preprocessing.imputation.static)
      • static_tabular_imputer: Use any method from HyperImpute (HyperImpute, Mean, Median, Most-frequent, MissForest, ICE, MICE, SoftImpute, EM, Sinkhorn, GAIN, MIRACLE, MIWAE) to impute the static data. (Paper)
  • Temporal data (category: preprocessing.imputation.temporal)
      • ffill: Propagate the last valid observation forward to the next valid one.
      • bfill: Use the next valid observation to fill the gap.
      • ts_tabular_imputer: Use any method from HyperImpute (HyperImpute, Mean, Median, Most-frequent, MissForest, ICE, MICE, SoftImpute, EM, Sinkhorn, GAIN, MIRACLE, MIWAE) to impute the time series data. (Paper)

Scaling

  • Static data (category: preprocessing.scaling.static)
      • static_standard_scaler: Scale the static features using a StandardScaler.
      • static_minmax_scaler: Scale the static features using a MinMaxScaler.
  • Temporal data (category: preprocessing.scaling.temporal)
      • ts_standard_scaler: Scale the temporal features using a StandardScaler.
      • ts_minmax_scaler: Scale the temporal features using a MinMaxScaler.

🔨 Tests and Development

Install the testing dependencies using:

pip install .[testing]

The tests can be executed using:

pytest -vsx

For local development, we recommend installing the [dev] extra, which includes [testing] and some additional dependencies:

pip install .[dev]

For development and contribution to TemporAI, see the contribution guide in the project documentation.

✍️ Citing

If you use this code, please cite the associated paper:

@article{saveliev2023temporai,
  title={TemporAI: Facilitating Machine Learning Innovation in Time Domain Tasks for Medicine},
  author={Saveliev, Evgeny S and van der Schaar, Mihaela},
  journal={arXiv preprint arXiv:2301.12260},
  year={2023}
}

temporai's People

Contributors

bcebere, drshushen, julianklug

temporai's Issues

[Enhancement] Integrate jaxtyping for advanced parameter validation #120

Description

An improvement on top of pydantic would be to integrate jaxtyping, which allows for validating tensor shapes as well.
jaxtyping supports PyTorch tensors and NumPy arrays.

Example

from jaxtyping import Array, Float, PyTree

# Accepts floating-point 2D arrays with matching dimensions
def matrix_multiply(x: Float[Array, "dim1 dim2"],
                    y: Float[Array, "dim2 dim3"]
                  ) -> Float[Array, "dim1 dim3"]:
    ...

def accepts_pytree_of_ints(x: PyTree[int]):
    ...

def accepts_pytree_of_arrays(x: PyTree[Float[Array, "batch c1 c2"]]):
    ...

https://github.com/google/jaxtyping
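
Since TemporAI models are PyTorch-based, the same annotations apply to torch tensors; a minimal runtime-checked sketch (using beartype as the checker, one option jaxtyping supports; the function and dimension names here are illustrative):

import torch
from beartype import beartype
from jaxtyping import Float, jaxtyped

@jaxtyped(typechecker=beartype)
def embed(x: Float[torch.Tensor, "batch time feature"]) -> Float[torch.Tensor, "batch feature"]:
    # Shape-checked at runtime: a 3D float tensor in, a 2D float tensor out,
    # with consistent batch and feature dimensions.
    return x.mean(dim=1)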

dynamic deephit plugin does not work on a simple dataset

Hi,

I created a simple dataset according to the example in data format tutorial.

import pandas as pd

from tempor.data.dataset import TimeToEventAnalysisDataset  # Import path assumed as per the data format tutorial.

time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0", "sample_0", "sample_0", "sample_1", "sample_1", "sample_1", "sample_2", "sample_2", "sample_2"],
        "time_idx": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31, 28, 26],
        "t_feat_1": [1.1, 1.2, 1.3, 1, 2.1, 2.2, 3.1, 2.3, 2.0],
    }
)

# Set the 2-level index:
time_series_df.set_index(keys=["sample_idx", "time_idx"], drop=True, inplace=True)

# Create a static data dataframe.
static_df = pd.DataFrame(
    {
        "s_feat_0": [100, 200, 300],
        "s_feat_1": [-1.1, -1, -1.3],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Create an event dataframe.

event_df = pd.DataFrame(
    {
        "e_feat_0": [(10, True), (12, False), (13, True)],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Create a dataset of time-to-event analysis task:
data = TimeToEventAnalysisDataset(
    time_series=time_series_df,
    static=static_df,
    targets=event_df,
)

But the dynamic_deephit plugin does not work on the above dataset with the following errors.

[Screenshot of the error traceback omitted.]

It seems that the dataset was not fit-ready. I checked that I have all the components for a time-to-event dataset. Could you help with this, please?

Thanks very much,
Wenjuan

[Epic] Preprocessing plugins

Description

Preprocessing plugins, for scaling or dimensionality reduction.

Why?

Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

  • drop constant features - TODO
  • handle multicollinearity - TODO
  • drop low variance features - TODO
  • encode data

[Enhancement] Evaluation: Add more metrics for evaluating survival analysis tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating survival analysis tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

The metrics should be reported for each evaluation time horizon, and aggregated (mean, std).

Important metrics to cover here:

  • [x] c_index: The concordance index, or c-index, is a metric to evaluate the predictions made by a survival algorithm. It is defined as the proportion of concordant pairs divided by the total number of possible evaluation pairs.
  • [x] brier_score: The Brier score is a strictly proper scoring rule that measures the accuracy of probabilistic predictions.
  • aucroc: The Area Under the Receiver Operating Characteristic Curve (ROC AUC), computed from prediction scores.
  • sensitivity: Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.
  • specificity: Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.
  • PPV: The positive predictive value (PPV) is the probability that, following a positive test result, the individual truly has the disease.
  • NPV: The negative predictive value (NPV) is the probability that, following a negative test result, the individual truly does not have the disease.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
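The last four metrics above are simple functions of the confusion matrix; a minimal NumPy sketch of their definitions (an illustrative helper, not a TemporAI API):

import numpy as np

def confusion_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    # Counts of true/false positives/negatives for binary labels:
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "sensitivity": tp / (tp + fn),  # True positive rate.
        "specificity": tn / (tn + fp),  # True negative rate.
        "PPV": tp / (tp + fp),          # Positive predictive value.
        "NPV": tn / (tn + fn),          # Negative predictive value.
    }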

[Enhancement] Fix RNN contiguous memory warning (serialization)

After serializing and deserializing some models that include RNNs, the following warning is received:

UserWarning: RNN module weights are not part of single contiguous chunk of memory"

The serialization mechanism needs to be improved to fix this problem.

[AutoML] Add AutoML objective evaluation for survival analysis tasks

Feature Description

[AutoML] Add AutoML objective evaluation for classification tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.

For classification tasks, the evaluation metrics are documented here #15
The evaluation is done using the cross-validation tester documented in #20
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/risk_estimation.py

[Epic] Imputation models

Description

Add imputation plugins.

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

[Epic] Prediction models

Description

Add prediction models.

Why?

Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

[AutoML] Create pipeline from hyperparameters

Feature Description

For AutoML search, it is important to be able to sample hyperparameters, and to recreate the pipeline from those hyperparameters.

AutoPrognosis implements the following strategy:

  • For each search task, the user can select imputation, preprocessing and prediction plugins.
  • For each pipeline, the prediction plugin "drives" the whole pipeline selection.
  • To that end, we artificially extend the predictor's hyperparameters to include the imputation and preprocessing plugins. In other words, the user samples from the predictor's [imputation plugin 1, 2, 3, ...] + [preprocessing plugin 1, 2, 3, ...] + hyperparameter space. This simplifies the sampling process (see the sketch below).
  • Given a sampled preprocessing/imputation plugin and a set of hyperparameters, the user must be able to create the complete pipeline.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/core/selector.py
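
A rough sketch of that "extended hyperparameter space" idea (all names below are illustrative, not TemporAI or AutoPrognosis API):

from typing import Any, Dict

# Hypothetical candidate preprocessing plugins to choose among:
IMPUTERS = ["ffill", "bfill", "static_tabular_imputer"]
SCALERS = ["ts_minmax_scaler", "ts_standard_scaler"]

def extended_space(predictor_space: Dict[str, Any]) -> Dict[str, Any]:
    # Prepend categorical choices for the preprocessing stages to the predictor's
    # own hyperparameter space, so a single sampler drives the whole pipeline.
    return {"imputer": IMPUTERS, "scaler": SCALERS, **predictor_space}

def build_pipeline(sampled: Dict[str, Any]):
    # Given one sampled point, recreate the complete pipeline:
    # [sampled["imputer"], sampled["scaler"], predictor(**remaining params)].
    ...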

Blocked by: #27

Issues with MethodSeeker and Dynamic DeepHit when running with PBC dataset

Hi,
I get errors when trying to run dynamic_deephit with MethodSeeker on the PBC dataset.
To reproduce the errors, please do the following:

from tempor.automl.seeker import MethodSeeker  # Assumed import path; MethodSeeker lives alongside PipelineSeeker.
from tempor.methods.core._params import CategoricalParams, IntegerParams  # Exact import paths may differ across TemporAI versions.
from tempor.utils.dataloaders import PBCDataLoader
dataset = PBCDataLoader(random_state=42).load()

# Provide a custom hyperparameter space to search for each type of model.

hp_space = {
    "dynamic_deephit": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=30, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "ts_xgb": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=100, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
}

# Initialize a `MethodSeeker` and provide `override_hp_space`.
seeker = MethodSeeker(
    study_name="my_automl_study",
    task_type="time_to_event",
    estimator_names=[
        "dynamic_deephit",
        "ts_xgb",
    ],
    metric="c_index",   
    dataset=dataset,
    horizon=[1,5,9],
    return_top_k=2,
    num_iter=3,  # For the sake of speed of this example, only 3 AutoML iterations.
    tuner_type="bayesian",
    # Override hyperparameter space:
    override_hp_space=hp_space,
)

best_methods, best_scores = seeker.search()

The error is as follows:
[Screenshot of the error omitted.]

Thanks for the help!

Best wishes,
Wenjuan

[Feat] Reproducibility

Feature Description

Every plugin/pipeline or API call should support fixing the random seed.

There are methods for setting a global random seed, for example:

# stdlib
import random

# third party
import numpy as np
import torch


def enable_reproducible_results(random_state: int = 0) -> None:
    np.random.seed(random_state)
    torch.manual_seed(random_state)
    random.seed(random_state)

[Enhancement] Add AutoML objective evaluation for ensembles

Feature Description

Given a set of K optimal pipelines selected by the AutoML logic for an objective, the next step is to evaluate ensembles on top of the candidate pipelines.

For the weighted ensemble, a separate AutoML search can be executed, to evaluate various weights.
The process benchmarks all the supported ensemble setups (weighted, stacked, voting etc.), and returns the optimal solution.

depends on #8, #7, #6, #5, #13

AP references:

why not contribute this to sktime?

It would seem that this package is "cloning" a number of core aspects of sktime, including data format, base class design, etc.

It does add some novel aspects, but there aren't too many differences at the moment.
So, why develop this in complete detachment from the PyData ecosystem?

Long-term, it will be much harder to maintain if you insist on trying to build a parallel ecosystem targeted at medical doctors.

I understand the academic sensitivities of wanting to "own", but that's not really how open source works - the more you give and let go of, the more you get back, and the more successful you will be.

For instance, why not contribute this to sktime?

[Enhancement] Introduce plugin types

Introduce plugin types which can be listed separately, so that we can have clear separation between models/metrics/data sources etc. plugins.

[Bug] Catboost is required, but fails to build (macOS 13.5)

Describe the bug
Building the temporai package from pip or from github fails as catboost is required.
This is probably linked to #72, where the catboost dependency should have been removed, and to catboost/catboost#2371 (comment).

Platform:

  • macOS 13.5
  • python 3.11.5

To Reproduce
pip install temporai
or
pip install "temporai @ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3"

Expected behavior
Build should not fail, and catboost should probably not be required.

Results
temporai build fails.

Collecting catboost>=1.0.5 (from hyperimpute>=0.1.17->temporai@ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3)
  Using cached catboost-1.2.2.tar.gz (60.1 MB)
Building wheel for catboost (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for catboost (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [218 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-311
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/monoforest.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/plot_helpers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/metrics.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/version.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/text_processing.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/datasets.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/init.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/core.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/dev_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/init.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/metrics_plotter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/ipythonwidget.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/callbacks.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/catboost_evaluation.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_model.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_readers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/log_config.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_splitter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/init.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/execution_case.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_storage.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/factor_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/evaluation_result.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_models_handler.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
running build_ext
Buildling _catboost with cmake and ninja
target_platform=darwin-x86_64. Building targets _catboost with PIC
Running "cmake /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src -B /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain --log-level=VERBOSE -DCMAKE_POSITION_INDEPENDENT_CODE=On -DCATBOOST_COMPONENTS=python-package -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0 -DHAVE_CUDA=no -DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects"
-- The C compiler identification is AppleClang 14.0.3.14030022
-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- The ASM compiler identification is Clang
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /Users/jk1/opt/anaconda3/envs/treatment_effects/bin/python3.1 (found version "3.11.5") found components: Interpreter
-- CMAKE_C_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- CMAKE_CXX_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -Woverloaded-virtual -Wimport-preprocessor-directive-pedantic -Wno-undefined-var-template -Wno-return-std-move -Wno-defaulted-function-deleted -Wno-pessimizing-move -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-enum-enum-conversion -Wno-deprecated-enum-float-conversion -Wno-ambiguous-reversed-operator -Wno-deprecated-volatile -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- Conan: checking conan executable
-- Conan: Found program /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan
-- Conan: Version found Conan version 1.59.0
-- Conan executing: /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan install /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src --remote conancenter --install-folder /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 --build missing --env CONAN_CMAKE_GENERATOR=Ninja --settings build_type=Release --settings compiler=apple-clang --settings compiler.version=14.0 --settings compiler.libcxx=libc++ --settings compiler.cppstd=20 --conf tools.cmake.cmaketoolchain:generator=Ninja
Configuration:
[settings]
arch=x86_64
arch_build=x86_64
build_type=Release
compiler=apple-clang
compiler.cppstd=20
compiler.libcxx=libc++
compiler.version=14.0
os=Macos
os_build=Macos
[options]
[build_requires]
[env]
CONAN_CMAKE_GENERATOR=Ninja
[conf]
tools.cmake.cmaketoolchain:generator=Ninja

  Version ranges solved
      Version range '>=1.2.11 <2' required by 'pcre/8.45' resolved to 'zlib/1.3' in local cache
  
  conanfile.txt: Installing package
  Requirements
      libiconv/1.15 from 'conancenter' - Cache
      openssl/1.1.1t from 'conancenter' - Cache
  Packages
      libiconv/1.15:e1ef30a7ac2ff8c218173fdf49ec961a5c046a36 - Cache
      openssl/1.1.1t:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
  Build requirements
      bzip2/1.0.8 from 'conancenter' - Cache
      pcre/8.45 from 'conancenter' - Cache
      ragel/6.10 from 'conancenter' - Cache
      swig/4.0.2 from 'conancenter' - Cache
      yasm/1.3.0 from 'conancenter' - Cache
      zlib/1.3 from 'conancenter' - Cache
  Build requirements packages
      bzip2/1.0.8:b9b85a7c8f543b96385e1da9e174853f1fb08e0c - Cache
      pcre/8.45:842afe377248eac66b64b538531df2b005d57959 - Cache
      ragel/6.10:801752c0480319b8e090188c566245a78e9abcf4 - Cache
      swig/4.0.2:099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb - Cache
      yasm/1.3.0:801752c0480319b8e090188c566245a78e9abcf4 - Cache
      zlib/1.3:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
  
  Installing (downloading, building) binaries...
  bzip2/1.0.8: Already installed!
  libiconv/1.15: Already installed!
  openssl/1.1.1t: Already installed!
  ragel/6.10: Already installed!
  ragel/6.10: Appending PATH environment variable: /Users/jk1/.conan/data/ragel/6.10/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
  yasm/1.3.0: Already installed!
  yasm/1.3.0: Appending PATH environment variable: /Users/jk1/.conan/data/yasm/1.3.0/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
  zlib/1.3: Already installed!
  pcre/8.45: Already installed!
  swig/4.0.2: Already installed!
  swig/4.0.2: Appending PATH environment variable: /Users/jk1/.conan/data/swig/4.0.2/_/_/package/099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb/bin
  conanfile.txt: Applying build-requirement: ragel/6.10
  conanfile.txt: Applying build-requirement: swig/4.0.2
  conanfile.txt: Applying build-requirement: yasm/1.3.0
  conanfile.txt: Applying build-requirement: pcre/8.45
  conanfile.txt: Applying build-requirement: bzip2/1.0.8
  conanfile.txt: Applying build-requirement: zlib/1.3
  conanfile.txt: Generator cmake_find_package created Findragel.cmake
  conanfile.txt: Generator cmake_find_package created FindSWIG.cmake
  conanfile.txt: Generator cmake_find_package created Findyasm.cmake
  conanfile.txt: Generator cmake_find_package created FindIconv.cmake
  conanfile.txt: Generator cmake_find_package created FindOpenSSL.cmake
  conanfile.txt: Generator cmake_find_package created FindPCRE.cmake
  conanfile.txt: Generator cmake_find_package created FindBZip2.cmake
  conanfile.txt: Generator cmake_find_package created FindZLIB.cmake
  conanfile.txt: Generator cmake_paths created conan_paths.cmake
  conanfile.txt: Generator txt created conanbuildinfo.txt
  conanfile.txt: Aggregating env generators
  conanfile.txt: Generated conaninfo.txt
  conanfile.txt: Generated graphinfo
  conanfile.txt imports(): Copied 434 '.i' files
  conanfile.txt imports(): Copied 273 '.swg' files
  conanfile.txt imports(): Copied 1 '.swig' file: Makefile.swig
  conanfile.txt imports(): Copied 2 '.ml' files: swig.ml, swigp4.ml
  conanfile.txt imports(): Copied 1 '.pl' file: Makefile.pl
  conanfile.txt imports(): Copied 6 files
  conanfile.txt imports(): Copied 1 '.rb' file: extconf.rb
  conanfile.txt imports(): Copied 1 '.h' file: noembed.h
  conanfile.txt imports(): Copied 1 '.scm' file: common.scm
  conanfile.txt imports(): Copied 1 '.mli' file: swig.mli
  conanfile.txt imports(): Copied 1 '.hpp' file: octheaders.hpp
  CMake Error at /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
    Could NOT find Python3 (missing: Python3_INCLUDE_DIRS Python3_LIBRARIES
    Development Development.Module Development.Embed)
  Call Stack (most recent call first):
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython/Support.cmake:3166 (find_package_handle_standard_args)
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython3.cmake:485 (include)
    catboost/python-package/catboost/CMakeLists.darwin-x86_64.txt:9 (find_package)
    catboost/python-package/catboost/CMakeLists.txt:20 (include)
  
  
  -- Configuring incomplete, errors occurred!
  See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeOutput.log".
  See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeError.log".
  Traceback (most recent call last):
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
      return _build_backend().build_wheel(wheel_directory, config_settings,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 434, in build_wheel
      return self._build_with_temp_dir(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
      self.run_setup()
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 507, in run_setup
      super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in run_setup
      exec(code, locals())
    File "<string>", line 731, in <module>
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
             ^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 397, in run
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 364, in run
      self.run_command("build")
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 332, in run
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 444, in run
    File "<string>", line 462, in build_with_cmake_and_ninja
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 517, in build
      cmd_runner.run(cmake_cmd, env=build_environ)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 164, in run
      subprocess.run(cmd, check=True, **subprocess_run_kwargs)
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/subprocess.py", line 571, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['cmake', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src', '-B', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain', '--log-level=VERBOSE', '-DCMAKE_POSITION_INDEPENDENT_CODE=On', '-DCATBOOST_COMPONENTS=python-package', '-DCMAKE_OSX_DEPLOYMENT_TARGET=11.0', '-DHAVE_CUDA=no', '-DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for catboost
Successfully built temporai
Failed to build catboost
ERROR: Could not build wheels for catboost, which is required to install pyproject.toml-based projects


[Feat] Add pipeline logic

Feature Description

The library should offer the possibility to execute multiple plugins in sequence, and sample hyperparameters for all of them.

Sampling hyperparameters should be possible at the class level, so that you don't instantiate a useless object.
To that end, you can create metaclasses in Python by inheriting from the type class directly.

The pipeline wrapper should offer the following interface:

  • fit - train the pipeline
  • predict - transform(for preprocessing plugins in the pipeline) and predict
  • hyperparameter_space / sample_hyperparameters, which should call the sampling logic of each plugin.

Reference implementation https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/plugins/pipeline/__init__.py
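
A minimal sketch of that metaclass idea (illustrative names only, not the TemporAI implementation):

class PipelineMeta(type):
    # Defining hyperparameter_space on the metaclass makes it callable on the
    # pipeline *class* itself, so hyperparameters can be sampled without
    # instantiating a (useless) pipeline object.
    def hyperparameter_space(cls):
        space = {}
        for plugin_cls in cls.plugin_classes:
            space.update(plugin_cls.hyperparameter_space())
        return space

def make_pipeline(plugin_classes: list):
    # Dynamically create a pipeline class from an ordered list of plugin classes.
    return PipelineMeta("Pipeline", (), {"plugin_classes": plugin_classes})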

The feature should be covered by tests, covering:

  • a standard pipeline train and predict.
  • hyperparameter sampling and pipeline instantiation.
  • serialization

depends on https://github.com/vanderschaarlab/temporai-priv/issues/5

[Epic] Testing

Description

All the classes should have test coverage.

Type of Test

  • Unit test (e.g. checking a loop, method, or function is working as intended)
  • Integration test (e.g. checking if a certain group or set of functionality is working as intended)
  • Regression test (e.g. checking if by adding or removing a module of code allows other systems to continue to function as intended)
  • Stress test (e.g. checking to see how well a system performs under various situations, including heavy usage)
  • Performance test (e.g. checking to see how efficient a system is as performing the intended task)
  • Other...

[Evaluation] Add metrics for evaluating regression tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating regression tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

Important metrics to cover here:

  • r2" R^2(coefficient of determination) regression score function.
  • mse: Mean squared error regression loss.
  • mae: Mean absolute error regression loss.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
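
These correspond directly to scikit-learn's regression metrics; a minimal sketch:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(r2_score(y_true, y_pred))             # r2
print(mean_squared_error(y_true, y_pred))   # mse
print(mean_absolute_error(y_true, y_pred))  # mae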

time_to_event models and scaling methods do not work with categorical data

Hi, I found that the time_to_event models and scaling methods do not work with categorical data.

To reproduce it, one can go to tutorials/data/tutorial1_data_format.ipynb, use the data example (commenting out one line of the event data) and run the time-to-event models.

I have only tried the time-to-event models and scaling methods; other methods might not work on categorical data as well. According to the data format tutorial, pandas.Categorical is supported as column values?

Thanks for looking into it!
Wenjuan

Use nbsphinx for tutorials / user guide in docs

The nbsphinx extension is designed for integrating notebooks into documentation. Hence we should use that instead of the custom code in docs/pre_build.py. Ideally, we should also find a way of having the link to Colab at the top of each tutorial in the docs.
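
Enabling nbsphinx is a one-line addition to the Sphinx configuration (the surrounding extension list below is illustrative):

# docs/conf.py
extensions = [
    # ... existing extensions ...
    "nbsphinx",  # Renders the .ipynb tutorials directly into the docs.
]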

Upgrade to pydantic 2

Pydantic 2.0 is now the current version, so changes need to be made to use it rather than 1.0.

nn_regressor: Pydantic crash

Description

Pydantic imposes a limit on the number of temporai objects that can be instantiated.

Example: In the test_nn_regressor.py, the following snippet will crash

def test_hyperparam_sample():
    # `plugin` is the nn_regressor plugin under test in test_nn_regressor.py.
    for repeat in range(10000):  # pylint: disable=unused-variable
        args = plugin._cls.sample_hyperparameters()  # pylint: disable=no-member, protected-access
        plugin(**args)

with the error

>   ???
E   pydantic.error_wrappers.ValidationError: 1 validation error for _InitArgsValidator
E   __root__
E     Model parameters could not be validated as defined by `EmptyParamsDefinition`, cause: 
E   ---------------
E   RecursionError:
E   maximum recursion depth exceeded
E   ---------------
E    (type=value_error)

pydantic/main.py:342: ValidationError

Expected behaviour

Pydantic should not limit the functionality of the library.

[Enhancement] Ensemble support

Feature Description

Given a set of pipelines - or just estimators, users should be able to create ensembles.

Popular ensemble techniques

  • WeightedEnsemble: average across all scores/prediction results, maybe with weights
  • Stacking (meta ensembling): use a meta learner to learn the base classifier results
  • Majority Vote Ensemble
  • DCS: Dynamic Classifier Selection: Combination of multiple classifiers using local accuracy estimates
  • DES: Dynamic Ensemble Selection: From dynamic classifier selection to dynamic ensemble selection

Reference code in AutoPrognosis: https://github.com/vanderschaarlab/autoprognosis/tree/main/src/autoprognosis/plugins/ensemble

More about here: https://github.com/yzhao062/combo
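
For the simplest of these, a weighted ensemble just averages member predictions; a minimal NumPy sketch (illustrative, not a TemporAI API):

import numpy as np

def weighted_ensemble(predictions: list, weights: list) -> np.ndarray:
    # `predictions`: one score array per fitted model, all with the same shape;
    # `weights`: one weight per model, ideally summing to 1.
    stacked = np.stack(predictions, axis=0)
    w = np.asarray(weights).reshape(-1, *([1] * (stacked.ndim - 1)))
    return np.sum(w * stacked, axis=0)

print(weighted_ensemble([np.array([0.2, 0.8]), np.array([0.4, 0.6])], [0.7, 0.3]))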

[AutoML] Add AutoML objective evaluation for regression tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.

For regression tasks, the evaluation metrics are documented here #10
The benchmark is done using the cross-validation tester documented in #20
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/regression.py

[Evaluation] Add metrics for evaluating classification tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating classification tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

Important metrics to cover here:

  • aucroc: The Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
  • aucprc: The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
  • accuracy: Accuracy classification score.
  • f1_score (micro, macro, weighted): F1 score is a harmonic mean of the precision and recall. This version uses the "micro" average: calculate metrics globally by counting the total true positives, false negatives and false positives.
  • kappa: Computes Cohen's kappa, a score that expresses the level of agreement between two annotators on a classification problem.
  • precision (micro, macro, weighted): Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (micro) calculates metrics globally by counting the total true positives.
  • recall (micro, macro, weighted): Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (micro) calculates metrics globally by counting the total true positives.
  • mcc: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
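
All of the above are available in scikit-learn; a minimal sketch of computing them:

from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    f1_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
    roc_auc_score,
)

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
y_score = [0.1, 0.9, 0.4, 0.2, 0.8]  # Predicted probabilities for class 1.

print(roc_auc_score(y_true, y_score))                    # aucroc
print(accuracy_score(y_true, y_pred))                    # accuracy
print(f1_score(y_true, y_pred, average="micro"))         # f1_score (micro)
print(precision_score(y_true, y_pred, average="micro"))  # precision (micro)
print(recall_score(y_true, y_pred, average="micro"))     # recall (micro)
print(cohen_kappa_score(y_true, y_pred))                 # kappa
print(matthews_corrcoef(y_true, y_pred))                 # mcc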

Investigate docs API generation

  1. The way the module reference (API) is rendered in the docs is not great - needs investigation.
  2. Warnings raised during documentation building, like:
/mnt/data-fourtb/Dropbox/Programming/wsl_repos/_vds/temporai/docs/../src/tempor/data/pandera_utils.py:docstring of tempor.data.pandera_utils:1: WARNING: Inline interpreted text or phrase reference start-string without end-string.
...

Investigate and fix these.

Add tutorials (notebooks + Colab links)

Description

  • The library should have a tutorial for each major feature.
  • The tutorials should be notebooks, and should be also deployed on Colab, for easier use.

[Feat] Benchmarking tools

Feature Description

The library should offer methods for evaluating predictive models/pipelines.

For each problem type, there can be different relevant metrics, as described in the linked tasks.

The evaluation should be done using KFold (regression) / StratifiedKFold (classification, survival analysis), and a predefined random seed.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
Example implementation for time series survival
https://github.com/vanderschaarlab/synthcity/blob/main/src/synthcity/plugins/core/models/time_series_survival/benchmarks.py#L142
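
A minimal sketch of that cross-validation scheme with a predefined seed (toy data; the fit/score step is left abstract):

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(20, 3)  # Toy features.
y = np.array([0, 1] * 10)  # Toy binary labels (stratification target).

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # Fixed seed.
for train_idx, test_idx in skf.split(X, y):
    # Fit the model/pipeline on train_idx, compute the relevant metric on
    # test_idx, then aggregate the per-fold scores (mean, std).
    ...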

blocked by #9
blocked by #10
blocked by #11

[CI] Github workflows

Description

Before releasing, the library should be tested on the matrix {MacOS, Windows, Linux} x {Python 3.7, 3.8, 3.9, 3.10} for compatibility.

On each test scenario, all the unit tests should pass.

Reference workflow: https://github.com/vanderschaarlab/autoprognosis/blob/main/.github/workflows/test.yml


Additional notes:

[AutoML] Add AutoML objective evaluation for classification tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.

For classification tasks, the evaluation metrics are documented here #11
The benchmark is done using the cross-validation tester documented in #20.
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/classifiers.py
