
mltb's Introduction


We decided to archive this project and migrate the most important functionality to MLTB2.

Machine Learning Tool Box

This is the machine learning tool box. A collection of useful machine learning tools intended for reuse and extension. The toolbox contains the following modules:

  • hyperopt - Hyperopt tool to save and restart evaluations
  • keras - Keras (tf.keras) callback for various metrics and various other Keras tools
  • lightgbm - metric tool functions for LightGBM
  • metrics - several metric implementations
  • plot - plot and visualisation tools
  • tools - various tools (statistical tools, among others)

Module: hyperopt

This module contains a tool function to save and restart Hyperopt evaluations. This is done by saving and loading the hyperopt.Trials objects. The usage looks like this:

from mltb.hyperopt import fmin
from hyperopt import tpe, hp, STATUS_OK


def objective(x):
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        'other_stuff': {'type': None, 'value': [0, 1, 2]},
        }


best, trials = fmin(objective,
    space=hp.uniform('x', -10, 10),
    algo=tpe.suggest,
    max_evals=100,
    filename='trials_file')

print('best:', best)
print('number of trials:', len(trials.trials))

Output of first run:

No trials file "trials_file" found. Created new trials object.
100%|██████████| 100/100 [00:00<00:00, 338.61it/s, best loss: 0.0007185087453453681]
best: {'x': 0.026805013436769026}
number of trials: 100

Output of second run:

100 evals loaded from trials file "trials_file".
100%|██████████| 100/100 [00:00<00:00, 219.65it/s, best loss: 0.00012259809712488858]
best: {'x': 0.011072402500130158}
number of trials: 200
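
Under the hood, a persistence wrapper of this kind presumably loads a saved hyperopt.Trials object if the trials file exists, increases max_evals by the number of evaluations already stored, and dumps the trials again after the run. A simplified sketch (function name hypothetical, not the actual mltb code):

import joblib
import hyperopt


def fmin_with_persistence(objective, space, algo, max_evals, filename):
    """Hedged sketch of a save/restart wrapper around hyperopt.fmin."""
    try:
        # continue from a previous run if a trials file exists
        trials = joblib.load(filename)
        max_evals += len(trials.trials)
        print('{} evals loaded from trials file "{}".'.format(len(trials.trials), filename))
    except FileNotFoundError:
        trials = hyperopt.Trials()
        print('No trials file "{}" found. Created new trials object.'.format(filename))

    best = hyperopt.fmin(objective, space=space, algo=algo,
                         max_evals=max_evals, trials=trials)

    # persist all evaluations so the next run can pick up where this one stopped
    joblib.dump(trials, filename)
    return best, trials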

Module: lightgbm

This module implements metric functions that are not included in LightGBM. At the moment these are the F1 score and accuracy score for binary and multi-class problems. The usage looks like this:

import lightgbm as lgb
import mltb.lightgbm

bst = lgb.train(param,
                train_data,
                valid_sets=[validation_data],
                early_stopping_rounds=10,
                evals_result=evals_result,
                feval=mltb.lightgbm.multi_class_f1_score_factory(num_classes, 'macro'),
               )
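
For context, a custom feval passed to lgb.train returns a (name, value, is_higher_better) tuple. A minimal sketch of a comparable metric for the binary case (not the actual mltb implementation; it assumes the built-in 'binary' objective, so preds are probabilities of the positive class) could look like this:

import numpy as np
import sklearn.metrics


def binary_f1_score_feval(preds, eval_data):
    """Hedged sketch of a binary F1 feval for lgb.train."""
    y_true = eval_data.get_label()
    y_pred = (preds > 0.5).astype(int)  # assumes preds are probabilities
    f1 = sklearn.metrics.f1_score(y_true, y_pred)
    return 'f1', f1, True  # True: higher is better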

Module: keras (for tf.keras)

BinaryClassifierMetricsCallback

This module provides custom metrics in the form of a callback. Because the callback adds these values to the internal logs dictionary, it is possible to use the EarlyStopping callback to do early stopping on these metrics.

Parameters

Parameter  | Description                                                | Type                       | Default value
---------- | ---------------------------------------------------------- | -------------------------- | --------------
val_data   | Validation input                                           | list                       |
val_label  | Validation output                                          | list                       |
pos_label  | Which index is the positive label                          | Optional[int]              | 1
metrics    | List of supported metric names or custom metric functions | List[Union[str, Callable]] | ['val_roc_auc', 'val_average_precision', 'val_f1', 'val_acc']

Available metrics

  • val_roc_auc: ROC-AUC
  • val_f1: F1 score
  • val_acc: Accuracy
  • val_average_precision: Average precision
  • val_mcc: Matthews correlation coefficient
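
For intuition about why EarlyStopping can monitor these values: a callback of this kind computes the metrics on the validation data at the end of each epoch and writes them into the logs dict that Keras passes around. A simplified sketch (hypothetical class name, not the actual mltb implementation):

import sklearn.metrics
from tensorflow.keras.callbacks import Callback


class RocAucCallbackSketch(Callback):
    def __init__(self, val_data, val_label):
        super().__init__()
        self.val_data = val_data
        self.val_label = val_label

    def on_epoch_end(self, epoch, logs=None):
        logs = logs if logs is not None else {}
        pred = self.model.predict(self.val_data).ravel()
        # values written into logs become visible to EarlyStopping(monitor='val_roc_auc')
        logs['val_roc_auc'] = sklearn.metrics.roc_auc_score(self.val_label, pred)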

The usage looks like this:

import mltb.keras
from tensorflow.keras import callbacks

bcm_callback = mltb.keras.BinaryClassifierMetricsCallback(val_data, val_labels)
es_callback = callbacks.EarlyStopping(monitor='val_roc_auc', patience=5, mode='max')

history = network.fit(train_data, train_labels,
                      epochs=1000,
                      batch_size=128,

                      # do not give validation_data here or validation will be done twice
                      # validation_data=(val_data, val_labels),

                      # always provide BinaryClassifierMetricsCallback before the EarlyStopping callback
                      callbacks=[bcm_callback, es_callback],
)

You can also define your own custom metric:

import numpy as np
import sklearn.metrics


def custom_average_recall_score(y_true, y_pred, pos_label):
    rounded_pred = np.rint(y_pred)
    return sklearn.metrics.recall_score(y_true, rounded_pred, pos_label=pos_label)


bcm_callback = mltb.keras.BinaryClassifierMetricsCallback(val_data, val_labels, metrics=[custom_average_recall_score])
es_callback = callbacks.EarlyStopping(monitor='custom_average_recall_score', patience=5, mode='max')

history = network.fit(train_data, train_labels,
                      epochs=1000,
                      batch_size=128,

                      # do not give validation_data here or validation will be done twice
                      # validation_data=(val_data, val_labels),

                      # always provide BinaryClassifierMetricsCallback before the EarlyStopping callback
                      callbacks=[bcm_callback, es_callback],
)

mltb's People

Contributors

philipmay, popojargo


mltb's Issues

boxplot bug

This does not work:

mltb.plot.boxplot(all_best_value, labels='best study values', ylabel='average-precision')

The labels argument apparently always has to be a list, even for a single label.
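
Until this is fixed, wrapping the single label in a list works as a workaround:

mltb.plot.boxplot(all_best_value, labels=['best study values'], ylabel='average-precision')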

Add PyPI Badge

[![PyPI version](https://badge.fury.io/py/mltb.svg)](https://badge.fury.io/py/mltb)


Add copyright to keras.py

Hi @popojargo
can you please add your name in a copyright line after mine in this script: https://github.com/PhilipMay/mltb/blob/master/mltb/keras.py

So it looks like this:

# Copyright (c) 2020 by Philip May
# Copyright (c) 2020 by Alexis Côté
# This software is distributed under the terms of the BSD 2-Clause License.
# For details see the LICENSE file in the root directory.

If the contribution was not private but was made in the context of your company, please add
, <company> after your name.

Many thanks
Philip

[Keras:Metrics] Use the val prefix for validation metrics

I've recently used your package and I was misled by the metric names. Usually in Keras, validation metrics are prefixed with val. Maybe it would be a better idea to use val_f1 and val_roc_auc since they are evaluated against the validation set.

Also thanks for the library!

__init__.py doesn't contain import keras.py

After pip install mltb,
I tried to import code from mltb.keras but failed.
However, the code exists in my lib folder. Did you forget to add an import of keras.py to __init__.py?
The version shows 0.1.
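
Assuming the intent is that import mltb also exposes mltb.keras, the missing line in mltb/__init__.py would presumably look like this (a sketch; an explicit import mltb.keras in user code should also work once the module is shipped):

# mltb/__init__.py -- sketch of exposing the keras submodule
from . import keras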

Save fig for plot, remove plt.show() and add doctext

def boxplot(values, labels=None, title=None, xlabel=None, ylabel=None, vert=True, savefig_filename=None):
    """Create boxplot.

    Prints one or more boxplots in a single diagram.

    Parameters
    ----------
    values : array_like of numbers for one boxplot or array_like of array_like of numbers for several
        The values to draw the boxplot for. If you want to draw
        more than one boxplot you have to give an array_like
        of array_like with numbers.
    labels : str or array_like of str, optional
        The labels of the boxplots.
    title : str, optional
        Title of the plot.
    xlabel : str, optional
        Label name of the x-axis.
    ylabel : str, optional
        Label name of the y-axis.
    vert : bool, optional
        If True (default), makes the boxes vertical. If False, everything is drawn horizontally.
    savefig_filename : str, optional
        If given, the figure is saved under this filename.
    """
    _, ax = plt.subplots()

    if title is not None:
        ax.set_title(title)

    if xlabel is not None:
        ax.set(xlabel=xlabel)

    if ylabel is not None:
        ax.set(ylabel=ylabel)

    ax.boxplot(values, labels=labels, vert=vert)

    plt.grid(b=True, axis='y', linestyle='--')

    plt.xticks(rotation=90)

    if savefig_filename is not None:
        plt.savefig(savefig_filename, bbox_inches='tight')
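
With the added savefig_filename parameter, a call could then look like this (variable names are illustrative):

boxplot([values_run_a, values_run_b],
        labels=['run a', 'run b'],
        ylabel='average-precision',
        savefig_filename='boxplot.png')  # writes the figure to disk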
   

Feature Suggestion: Transformers OMLflow Callback

I've been using the OptunaMLflow utility, and I think a great addition could be a Transformers OMLflow callback.

This callback would enable automatic MLflow logging via your wrapper function inside the internal loop of a trainer.train() call, to automatically record for instance the evaluation metrics every eval_steps, and also to optionally record the training arguments and model config as params on MLflow.

This is especially useful for trials that do not use cross-validation, and where a single model is trained for a number of epochs, allowing the eval steps to be saved as nested runs.

This implementation would be very similar to the Transformers MLflow callback, but with some custom arguments and designed to work with an OptunaMLflow trial.

The Transformers implementation is here.

If you agree this could be a useful feature for mltb, I'm happy to submit a PR; I've already put together the code for this for my use case.
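
A rough sketch of the idea, following the transformers TrainerCallback interface and assuming the OptunaMLflow wrapper exposes something like a log_metrics(metrics, step) method (its real API may differ):

from transformers import TrainerCallback


class OptunaMLflowCallback(TrainerCallback):
    """Hypothetical sketch, not the proposed implementation."""

    def __init__(self, optuna_mlflow):
        # optuna_mlflow: the OptunaMLflow wrapper instance (API assumed)
        self._om = optuna_mlflow

    def on_log(self, args, state, control, logs=None, **kwargs):
        # called inside trainer.train() whenever the trainer logs, e.g. every eval_steps
        if logs is not None:
            metrics = {k: v for k, v in logs.items() if isinstance(v, (int, float))}
            self._om.log_metrics(metrics, step=state.global_step)  # assumed wrapper method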

TypeError: average_precision_score() got an unexpected keyword argument 'pos_label'

I ran your keras_demo.py but got the error below.
I installed mltb via pip install mltb.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-fae66036345b> in <module>()
     28     class_weight={
     29         0: 1.0,
---> 30         1: 9.0
     31     },
     32 )

~/anaconda3/lib/python3.6/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1037                                         initial_epoch=initial_epoch,
   1038                                         steps_per_epoch=steps_per_epoch,
-> 1039                                         validation_steps=validation_steps)
   1040 
   1041     def evaluate(self, x=None, y=None,

~/anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py in fit_loop(model, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
    215                         for l, o in zip(out_labels, val_outs):
    216                             epoch_logs['val_' + l] = o
--> 217         callbacks.on_epoch_end(epoch, epoch_logs)
    218         if callback_model.stop_training:
    219             break

~/anaconda3/lib/python3.6/site-packages/keras/callbacks.py in on_epoch_end(self, epoch, logs)
     77         logs = logs or {}
     78         for callback in self.callbacks:
---> 79             callback.on_epoch_end(epoch, logs)
     80 
     81     def on_batch_begin(self, batch, logs=None):

~/anaconda3/lib/python3.6/site-packages/mltb/keras.py in on_epoch_end(self, batch, logs)
     35 
     36         average_precision = sklearn.metrics.average_precision_score(self.val_labels, predict_results,
---> 37                                                                     pos_label=self.pos_label)
     38         logs['average_precision'] = average_precision
     39 

TypeError: average_precision_score() got an unexpected keyword argument 'pos_label'
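
The error indicates that the installed scikit-learn predates the pos_label argument of average_precision_score. A quick way to check what the local version actually accepts:

import inspect
import sklearn
from sklearn.metrics import average_precision_score

print(sklearn.__version__)
# pos_label should show up here on a sufficiently recent scikit-learn
print(inspect.signature(average_precision_score))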

[Keras:Metrics] Optional metrics

By default, a number of metrics are added by the BinaryClassifierMetricsCallback.

I would like to propose a new parameter, "metrics", for the callback constructor.
This way, we could specify only the metrics we need.

Parameter: metrics
Order: 3
Default value: [ ] or ['val_f1','val_best_f1','val_roc_auc','val_average_precision' ]
Type: List[Union[CustomMetric, str]]

Custom metric

Metric name: function.__name__
Type: Callable[[List[float], List[float]], float]

A custom metric would follow the same Keras metric API:

def recall_m(y_true, y_pred):
    # sketch completing the stub from the issue: round predictions and compute recall
    rounded_pred = np.rint(y_pred)
    return sklearn.metrics.recall_score(y_true, rounded_pred)

Add Pickle Code to tools

    complete_history = []

    try:
        complete_history = joblib.load(filename)
        #evals_loaded_trials = len(trials.statuses())
        #max_evals += evals_loaded_trials
        #print('{} evals loaded from trials file "{}".'.format(evals_loaded_trials, filename))
    except FileNotFoundError:
        pass # do nothing

    complete_history.append(history.history)
    print('Saving history list of length {}.'.format(len(complete_history)))
    joblib.dump(complete_history, filename, compress=('gzip', 3))
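
The saved list can then be reloaded the same way, for example to inspect the last validation loss of every recorded run (a sketch; the filename is a placeholder):

import joblib

complete_history = joblib.load('history_file')  # same filename used when dumping
for i, history_dict in enumerate(complete_history):
    # each entry is one Keras History.history dict
    print(i, history_dict.get('val_loss', [None])[-1])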
