michaelgao8 / classification-metrics

:bar_chart: Utility for automated model metric figures, interactive Jupyter Notebook slides, and pdf reports
Ideally, there would be bootstrapping to generate confidence bands around the ROC curve. I initially took inspiration from here.
This requires looking into how to bootstrap at certain thresholds, which may be nontrivial. Because the original is written with R's data.table package, it can be difficult to infer the process they used.
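As a starting point, a percentile bootstrap over resampled (label, score) pairs, evaluated on a fixed FPR grid, might look like the sketch below. The function names, grid size, and interpolation approach are my own assumptions, not taken from the original R code.

```python
import numpy as np

def roc_tpr_at(y_true, scores, fpr_grid):
    """TPR evaluated at a fixed grid of FPR values, via interpolation."""
    order = np.argsort(-scores)           # highest risk first
    y = y_true[order]
    tps = np.cumsum(y)                    # true positives as threshold drops
    fps = np.cumsum(1 - y)                # false positives as threshold drops
    tpr = tps / max(tps[-1], 1)
    fpr = fps / max(fps[-1], 1)
    return np.interp(fpr_grid, fpr, tpr)

def bootstrap_roc_band(y_true, scores, n_boot=200, alpha=0.05, seed=0):
    """Pointwise percentile-bootstrap band for the ROC curve."""
    rng = np.random.default_rng(seed)
    fpr_grid = np.linspace(0, 1, 101)
    curves = np.empty((n_boot, fpr_grid.size))
    n = len(y_true)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)       # resample (label, score) pairs
        curves[b] = roc_tpr_at(y_true[idx], scores[idx], fpr_grid)
    lower = np.quantile(curves, alpha / 2, axis=0)
    upper = np.quantile(curves, 1 - alpha / 2, axis=0)
    return fpr_grid, lower, upper
```

Bootstrapping pairs sidesteps the question of resampling at fixed thresholds, at the cost of a pointwise (not simultaneous) band.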
This requires changing the plot_dict object as well as modifying the plotting code in the template.
Right now, if a user wants to write the output of the papermill call to a directory, I'm not sure what happens if the directory does not exist. The same concern applies to the -o flag in the generate_plot_data.py module.
The PPV by decile plot can be difficult to interpret because the x-axis label is unintuitive. The deciles of risk should start near 100 and go down to 0, whereas right now they are percentiles of risk (descending).
This should be reversed, or at least made clear.
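A hypothetical helper illustrating the suggested labeling, with decile labels running from 100 down to 10; the function name and return conventions are assumptions, not code from the tool.

```python
import numpy as np

def ppv_by_decile(y_true, scores):
    """PPV within each decile of predicted risk, highest-risk decile first.

    Returns decile labels (upper percentile of each bin, 100 down to 10)
    and the observed PPV (fraction of positives) in that bin.
    """
    order = np.argsort(-np.asarray(scores))   # highest risk first
    y = np.asarray(y_true)[order]
    bins = np.array_split(y, 10)              # ten roughly equal-sized bins
    labels = np.arange(100, 0, -10)           # 100, 90, ..., 10
    ppv = np.array([b.mean() if b.size else np.nan for b in bins])
    return labels, ppv
```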
Currently, the resulting .ipynb file has no title slide, which is annoying. However, it can be difficult to inject a markdown cell without opening the notebook.
Under the hood, all Jupyter notebooks are represented as a JSON structure, meaning that it is feasible to read in the JSON and insert a cell of the form
{
  "cells": [
    {
      "cell_type": "markdown"
    }
  ]
}
or something like that. This should be done after the papermill call, but before any pdf conversions.
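For instance, a minimal sketch using only the standard library; the slideshow metadata shown is the standard RISE cell metadata, but the function name and title text are placeholders.

```python
import json

def add_title_slide(nb_path, title):
    """Insert a markdown title cell at the top of a notebook, in place.

    The cell is marked as a RISE slide via its slideshow metadata.
    """
    with open(nb_path) as f:
        nb = json.load(f)
    cell = {
        "cell_type": "markdown",
        "metadata": {"slideshow": {"slide_type": "slide"}},
        "source": [f"# {title}"],
    }
    nb["cells"].insert(0, cell)
    with open(nb_path, "w") as f:
        json.dump(nb, f, indent=1)
```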
Similar to this tool: michaelgao.shinyapps.io/threshold_tuning, it would be nice to incorporate ipywidgets into the RISE-driven Jupyter notebooks. A proof of concept would be a slider that changes the output of a confusion matrix as well as other metrics (sensitivity, specificity, PPV, NPV, etc.).
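As a starting point, the metric computation could live in a plain function that a slider then drives. The function below is a hypothetical sketch; the commented-out lines show how ipywidgets.interact might wire into it inside a notebook.

```python
import numpy as np

def threshold_metrics(y_true, scores, threshold=0.5):
    """Confusion-matrix metrics at a given probability threshold."""
    y_true = np.asarray(y_true)
    pred = (np.asarray(scores) >= threshold).astype(int)
    tp = int(np.sum((pred == 1) & (y_true == 1)))
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    tn = int(np.sum((pred == 0) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# In a notebook cell, this could be driven by a slider, e.g.:
# from ipywidgets import interact, FloatSlider
# interact(lambda t: threshold_metrics(y, p, t),
#          t=FloatSlider(min=0, max=1, step=0.01, value=0.5))
```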
Currently, the cutpoints variable in generate_plot_data.py returns the cutpoints used to calculate the precision. However, the x-axis of the associated plot does not use it. For the sake of consistency, it would be useful to either plot against cutpoints directly, or to remove the np.linspace call and replace the plot call in Metrics Template.ipynb.
Currently, the process to use this tool involves:
1. Running generate_plot_data.py
2. Using papermill to inject parameters and execute the new Jupyter notebook
However, some of these steps can be consolidated and called directly from python.
For example, generate_plot_data.py currently takes in command-line parameters that can be inspected by running generate_plot_data.py -h. Since it writes intermediate data to a file, and this file is then specified again in the papermill call, papermill can potentially be called directly from python instead.
Flags for things like saving figures to a directory, converting to pdf, etc. can all be handled within a python script by adding the appropriate argparse arguments.
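A rough sketch of that consolidation, combining argparse with papermill's execute_notebook API; the flag names, file paths, and parameter names here are all hypothetical placeholders.

```python
import argparse

def build_parser():
    """CLI for a consolidated generate-and-execute entry point."""
    p = argparse.ArgumentParser(description="Generate a model metrics report")
    p.add_argument("predictions", help="file of labels and predicted risks")
    p.add_argument("-o", "--output-dir", default="output")
    p.add_argument("--to-pdf", action="store_true",
                   help="also convert the executed notebook to pdf")
    return p

def main(argv=None):
    args = build_parser().parse_args(argv)
    import papermill as pm  # lazy import: only needed when actually running
    out_nb = f"{args.output_dir}/metrics_report.ipynb"
    pm.execute_notebook(
        "Metrics Template.ipynb",                       # input template
        out_nb,                                         # executed copy
        parameters={"predictions_path": args.predictions},
    )
    # --to-pdf handling (nbconvert) would go here once pandoc/tex work
    return out_nb
```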
Currently, the conversion to a pdf file would normally work as follows:
jupyter nbconvert --to pdf <original_notebook.ipynb>
However, this functionality does not currently work because of a missing pandoc and tex installation.
To work on this, it may be useful to shell into the container, try to install the packages as referenced here (note that the docker container uses apt as its package manager), and then test the functionality.
Ultimately, this should then be incorporated in the docker container itself. To try this on a fresh image, rebuild the container and then try to convert a file to pdf.
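As a sketch, the packages could be baked into the image itself; the apt package names below follow nbconvert's documented Debian installation, but should be verified inside the container first.

```dockerfile
FROM python:3.7

# TeX toolchain for `jupyter nbconvert --to pdf`; package names follow
# nbconvert's recommended Debian/apt installation
RUN apt-get update && apt-get install -y --no-install-recommends \
        pandoc \
        texlive-xetex \
        texlive-fonts-recommended \
    && rm -rf /var/lib/apt/lists/*
```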
Before each plot, as a sub-slide, it will be useful to describe what the metric is and how it is used. The primary audience will be people who are interested in the model metrics, but may not know exactly how to interpret the results.
After discussing with Suresh, it may be useful to think about what an "ideal" (perfect) classifier would look like, and what the worst classifier is. This helps orient people spatially.
For example, an ROC curve slide might start with an explanation of what the ROC curve is, then show that the perfect classifier is the triangle formed by the left and top edges of the plot together with the diagonal, while the worst classifier lies along that diagonal.
Right now, the Dockerfile pulls from python:3.7, which is quite large. There is probably a way to pull from a smaller base image (e.g. python:3.7-slim) while still ensuring that we can install pandoc, etc.
Not critical, but would be a nice improvement
Using sklearn's built-in datasets and models, it may be useful to build a Jupyter notebook that trains a model on some data and then uses those model results to test the tool. This can also act as an example for newcomers to the tool.
Will involve changing the README to include the example and how to use it.
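A minimal sketch of such an example; the dataset, model, and output CSV layout are all assumptions here, since the input format that generate_plot_data.py expects isn't shown in these issues.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Train a toy model whose held-out labels and predicted risks can feed
# the tool; the dataset and model choice are arbitrary illustrations.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
risks = model.predict_proba(X_te)[:, 1]

# Persist labels and risks as a csv (column names assumed)
np.savetxt("example_predictions.csv",
           np.column_stack([y_te, risks]),
           delimiter=",", header="label,risk", comments="")
```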
Similar to #4, but for the calibration curve. Can be merged in a single PR if desired.