alteryx / evalml
EvalML is an AutoML library written in Python.
Home Page: https://evalml.alteryx.com
License: BSD 3-Clause "New" or "Revised" License
Callbacks were implemented in #42. We should add an explanation of potential use cases in the docs, e.g., integrating EvalML into other systems.
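For the docs, a hedged example of the external-integration use case might look like the following; the add_result_callback parameter name and the pipeline.name attribute are assumptions, not confirmed API:

import csv

def log_result(pipeline, results):
    # append each evaluated pipeline's scores to a CSV that an external
    # dashboard or monitoring system could poll
    with open("search_log.csv", "a", newline="") as f:
        csv.writer(f).writerow([pipeline.name, results])

# hypothetical wiring; the parameter name is an assumption:
# clf = evalml.AutoClassifier(add_result_callback=log_result)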
Add functionality to check if any of the features are highly correlated with the target.
It can be a standalone function
import evalml
evalml.detect_label_leakage(X, y, threshold=.85)
This returns a dictionary of features with correlations greater than or equal to the threshold. threshold could default to .9.
{
"feature_1": .9,
"feature_2: .95,
"feature_3": 1.0
}
By default, the automated pipeline search can make this check and raise a warning. It can be turned off like this:
evalml.AutoClassifier(detect_label_leakage=False)
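A minimal sketch of what the standalone check could look like, assuming numeric features and Pearson correlation; the implementation below is illustrative, not a shipped function:

import pandas as pd

def detect_label_leakage(X, y, threshold=0.9):
    # return {feature: correlation} for every feature whose absolute
    # correlation with the target meets or exceeds the threshold
    X = pd.DataFrame(X)
    y = pd.Series(y)
    correlations = {col: abs(y.corr(X[col])) for col in X.columns}
    return {col: corr for col, corr in correlations.items() if corr >= threshold}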
Currently we do not standardize input as pandas DataFrames or Series. FraudCost.decision_function breaks when passed an np.array instead.
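One possible fix, sketched: coerce inputs at the method boundary. The helper name here is hypothetical:

import numpy as np
import pandas as pd

def _standardize_input(data):
    # coerce array-like input to pandas so downstream code like
    # decision_function can rely on pandas semantics
    if isinstance(data, (pd.DataFrame, pd.Series)):
        return data
    arr = np.asarray(data)
    return pd.Series(arr) if arr.ndim == 1 else pd.DataFrame(arr)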
Design for categorical encodings.
Should it go in the pipeline? Should it be part of preprocessing?
Add a parameter that allows describe_pipeline() to score on any additional objectives.
Preliminary Tests and Error Raising.
Add to the README and as a check in PRs.
Expand testing architecture to encompass multiple tests
The assertion in test_serialization sometimes fails when the two compared models happen to produce the same score.
Let's organize and fix bugs!
We should add support for regression to evalml.preprocessing.split_data.
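A sketch of what the change could look like, assuming split_data wraps scikit-learn; the regression flag and default test size are assumptions:

from sklearn.model_selection import train_test_split

def split_data(X, y, regression=False, test_size=0.2, random_state=None):
    # stratify on y for classification; use a plain split for regression,
    # since stratifying on a continuous target is not meaningful
    stratify = None if regression else y
    return train_test_split(X, y, test_size=test_size,
                            stratify=stratify, random_state=random_state)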
There is a save_pipeline function but no save_classifier function. It'd be nice to be able to save classifiers, or to let pipelines store more info so that describe_pipeline would be possible with just the saved pipeline.
As we flesh out functionality for EvalML, it will be imperative to monitor performance across test datasets to ensure regressions do not occur as we continue to develop.
Metrics to consider:
Datasets to consider:
• https://www.openml.org/home
Right now you have to do AutoClassifier.describe_pipeline(id), but it would be more intuitive to be able to do AutoClassifier.get_pipeline(id).describe() or AutoClassifier.best_pipeline.describe().
The main issue right now is that the results of fitting and cross validating the pipeline are on the AutoClassifier object rather than the pipeline object.
As we flesh out EvalML, it will be critical to keep up with testing and documentation.
Right now all parameters are specified at one level for the whole pipeline. We should break up parameters by step of the pipeline. This could also be used to improve the output of describe_pipeline.
Currently we require users to pass in whether the classification task is multiclass or binary. However, we could infer from the provided objective that the user intended the task to be multiclass. Therefore, we should decide whether to automatically infer multiclass=True, raise an error/warning, or something else.
Currently we do not have the option to choose which scores to display for describe_pipeline(). This can be added as an additional parameter.
This can inform the user how much continuing to run the search process is improving pipeline performance. A simple implementation would just be the line, but we could also superimpose a scatter plot of all models searched.
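A hedged sketch of that plot, assuming the scores are available in evaluation order; the function and its input format are illustrative:

import matplotlib.pyplot as plt

def plot_search_progress(scores):
    # scatter every evaluated score and overlay the best score so far
    best_so_far = []
    best = float("-inf")
    for s in scores:
        best = max(best, s)
        best_so_far.append(best)
    plt.scatter(range(len(scores)), scores, alpha=0.5, label="pipeline score")
    plt.plot(best_so_far, color="C1", label="best so far")
    plt.xlabel("iteration")
    plt.ylabel("score")
    plt.legend()
    plt.show()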
Let's continue moving toward a bigger release with more features.
Test calling all the demos.load_* methods.
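A minimal sketch of such a test, assuming each loader returns an (X, y) pair; the specific loader names parametrized below are assumptions:

import pytest
from evalml import demos

@pytest.mark.parametrize("loader", [demos.load_fraud, demos.load_breast_cancer])
def test_demo_loader_returns_data(loader):
    X, y = loader()
    assert len(X) == len(y) > 0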
Automate an integration test suite, potentially with the following options:
Run EvalML on all OpenML datasets and compare results.
I think datasets can be accessed using this API: https://docs.openml.org/
Currently, max_time can only be specified in seconds. For much larger datasets, it would be a lot easier to specify the max time in minutes, hours, or days.
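One possible convention, sketched as a hypothetical helper: keep accepting a number of seconds, and additionally accept a string like "30 minutes":

def convert_to_seconds(max_time):
    # accept an int/float of seconds, or a string like "2 hours"
    if isinstance(max_time, (int, float)):
        return float(max_time)
    units = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}
    value, unit = max_time.split()
    return float(value) * units[unit.rstrip("s")]

# convert_to_seconds("30 minutes") -> 1800.0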
Currently we show a progress bar that overwrites itself. This means at the end of the search it looks something like this:
Testing LogisticRegression w/ imputation + scaling: 100%|██████████| 5/5 [00:13<00:00, 2.73s/it]
We've gotten feedback from users saying they'd like to see the history of the search as it progresses.
We should update the output to show each of the models being searched.
Right now there is no way of telling if the Auto(*) tests actually fail, as we're not raising errors when training the pipelines. To solve this we can check for NaN scores or raise errors.
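A sketch of the NaN check; the layout of the results (pipeline id mapped to a dict of scores) is an assumption:

import math

def assert_no_nan_scores(results):
    # fail loudly if any pipeline produced a missing or NaN score
    for pipeline_id, scores in results.items():
        for name, value in scores.items():
            if value is None or math.isnan(value):
                raise ValueError(f"pipeline {pipeline_id} returned NaN for {name}")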
With the addition of the additional_objectives parameter in #79, it would be helpful to include an example or two in the documentation.
Suggested location: https://evalml.featurelabs.com/en/latest/automl/pipeline_search.html
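A hedged example of the kind of snippet the docs could include; the objective names and demo loader used here are assumptions drawn from elsewhere in this list:

import evalml

X, y = evalml.demos.load_fraud()

# optimize for F1 but also track AUC and recall during the search
clf = evalml.AutoClassifier(objective="f1",
                            additional_objectives=["auc", "recall"])
clf.fit(X, y)
clf.describe_pipeline(0)  # scores for the additional objectives appear here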
Allow the user to set max_pipelines=None and only stop searching once max_time is hit. If both max_pipelines and max_time are None, raise an error.
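The guard itself is small; here's a sketch, assuming it runs wherever the search parameters are validated (the helper name is hypothetical):

def _validate_stopping_criteria(max_pipelines, max_time):
    # the search needs at least one stopping condition
    if max_pipelines is None and max_time is None:
        raise ValueError("set max_pipelines, max_time, or both; "
                         "the search cannot stop if neither is set")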
Right now we don't test each pipeline specifically. We should add tests to make sure things like feature importance work for each one.
Currently model_types uses magic strings. It would be better to use an enum.
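A sketch of the enum; the member names below mirror a few plausible model types and are assumptions:

from enum import Enum

class ModelTypes(Enum):
    RANDOM_FOREST = "random_forest"
    XGBOOST = "xgboost"
    LINEAR_MODEL = "linear_model"

# comparisons then use ModelTypes.RANDOM_FOREST instead of a raw string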
We prioritized some functionality to get the first release of EvalML out the door. Here is some functionality that we should focus on including next
Right now it is .fit(y_prob, y) but .score(y, y_prob). We should pick one argument order and use it consistently.
Supporting Categorical Encodings will be broken down to the following steps.
Functionality added in #19. It's in the API reference, but we can include it more prominently in the docs as well.
Allow the user to provide callbacks that receive the training pipelines and progress during the search process.
• add a PR bumping the version
• tag a release on github
• make sure all changes are in the changelog
• run integration tests
• use release tools to release to customers
• anything needed to get the documentation updated (potentially)
Currently users view the score of a custom objective as a raw number. Maybe it would be clearer if units helped define the score. For example, in the Fraud objective it would be 25324 Dollars Lost.
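One way units could be surfaced, sketched with a hypothetical unit attribute and formatting helper on the objective:

class FraudCost:
    # the unit attribute and format_score helper are hypothetical
    unit = "Dollars Lost"

    def format_score(self, score):
        return f"{score:.0f} {self.unit}"

# FraudCost().format_score(25324) -> "25324 Dollars Lost"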
Let's continue to revise and make EvalML better.
Improving how the search process and results are displayed to an end user