
classification-metrics's People

Contributors

michaelgao8


classification-metrics's Issues

Add Bootstrapping for the ROC curve

Ideally, there would be bootstrapping to generate confidence bands around the ROC curve. I initially took inspiration from here.

This requires looking into how to bootstrap at certain thresholds, which may be nontrivial. Because the original is written with R's data.table package, it can be difficult to infer the process they used.

This requires changing the plot_dict object as well as modifying the plotting code in the template.
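As a starting point, here is a minimal sketch of a percentile bootstrap band for the ROC curve using only NumPy. It does not bootstrap at fixed thresholds as the issue describes; instead it swaps in a simpler approach that interpolates each resampled curve onto a fixed grid of false positive rates and takes pointwise quantiles of the true positive rate. The function names, the grid size, and the way the result would be stored in plot_dict are all assumptions.

```python
import numpy as np


def roc_curve_points(y_true, y_score):
    """Return (fpr, tpr) arrays, one point per distinct score threshold."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Sort by descending score so that cumulative sums count positives
    # predicted at each successively lower threshold.
    order = np.argsort(-y_score, kind="mergesort")
    y_true = y_true[order]
    y_score = y_score[order]
    # Keep the last index of each group of tied scores.
    distinct = np.where(np.diff(y_score))[0]
    idx = np.r_[distinct, y_true.size - 1]
    tps = np.cumsum(y_true)[idx]
    fps = (idx + 1) - tps
    # Prepend the (0, 0) point; requires both classes to be present.
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def bootstrap_roc_band(y_true, y_score, n_boot=200, alpha=0.05,
                       grid_size=101, seed=0):
    """Pointwise percentile band for TPR on a fixed FPR grid."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    fpr_grid = np.linspace(0.0, 1.0, grid_size)
    n = y_true.size
    curves = []
    for _ in range(n_boot):
        sample = rng.integers(0, n, size=n)  # resample rows with replacement
        yt = y_true[sample]
        if yt.min() == yt.max():
            continue  # skip resamples that contain a single class
        fpr, tpr = roc_curve_points(yt, y_score[sample])
        curves.append(np.interp(fpr_grid, fpr, tpr))
    curves = np.asarray(curves)
    lower = np.quantile(curves, alpha / 2, axis=0)
    upper = np.quantile(curves, 1 - alpha / 2, axis=0)
    return fpr_grid, lower, upper
```

The three returned arrays could be added to plot_dict under new (hypothetical) keys and drawn in the template with a filled band between lower and upper.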

Test directory creation

Right now, if a user wants to write the output of the papermill call to a directory, I'm not sure what happens if the directory does not exist.
The same question applies to the -o flag in the generate_plot_data.py module.

  1. Test to see what happens in this instance
  2. If an error is thrown, make a change so that the directory is created (perhaps with a message to the user).
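If step 1 shows that a missing directory raises an error, one possible fix is to create the parent directory before writing. The sketch below uses only the standard library; the helper name ensure_parent_dir is hypothetical, and where it would be called in generate_plot_data.py is an assumption.

```python
from pathlib import Path


def ensure_parent_dir(output_path):
    """Create the parent directory of output_path if it is missing.

    Returns the output path as a Path so the caller can write to it.
    """
    path = Path(output_path)
    parent = path.parent
    if not parent.exists():
        # Tell the user what is happening rather than creating it silently.
        print(f"Creating missing output directory: {parent}")
        parent.mkdir(parents=True, exist_ok=True)
    return path
```

The same helper could be applied to the output notebook path before the papermill call.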

Risk Deciles are reversed on PPV by Decile plot

The PPV by decile plot can be difficult to interpret because the x-axis label is unintuitive. The deciles of risk should start near 100 and go down to 0, whereas right now they are percentiles of risk (descending).

This should be reversed, or at least made clear.
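One way to make the axis unambiguous is to label each bin with its explicit risk percentile range, highest risk first. This is only a sketch: the function name, the input arrays, and the label format are assumptions, not the tool's current code.

```python
import numpy as np


def ppv_by_risk_decile(y_true, y_score):
    """Return [(label, ppv), ...] ordered from highest to lowest risk.

    Labels are risk percentile ranges, e.g. "90-100" for the top decile.
    Assumes at least 10 observations so that no bin is empty.
    """
    y_true = np.asarray(y_true)
    # Sort outcomes from highest predicted risk to lowest.
    order = np.argsort(-np.asarray(y_score), kind="mergesort")
    groups = np.array_split(y_true[order], 10)
    rows = []
    for i, group in enumerate(groups):
        upper = 100 - 10 * i
        label = f"{upper - 10}-{upper}"
        # PPV within a bin is the fraction of true positives in that bin.
        rows.append((label, float(group.mean())))
    return rows
```

Using the labels as x-axis tick text would show the axis running from the 90-100 bin down to the 0-10 bin.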

Inject Title as a Markdown cell and take in a parameter to add the title

Currently, the resulting .ipynb file has no title slide, which is annoying. However, it can be difficult to actually inject a markdown cell without opening the notebook.

Under the hood, all Jupyter notebooks are represented as a JSON structure, meaning that it is feasible to read in the JSON and insert a cell in the form of

{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Title"]
        }
    ]
}

or something like that. This should be done after the papermill call, but before any PDF conversion.
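A rough sketch of that idea with the standard json module follows. The function name is hypothetical, and the slideshow metadata assumes the notebook is presented as slides; notebooks saved as nbformat 4.5 or later also expect an id on each cell, so one is added in that case.

```python
import json
import uuid


def inject_title_cell(notebook_path, title):
    """Insert a markdown title cell at the top of an executed notebook."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)

    title_cell = {
        "cell_type": "markdown",
        # Assumption: mark the cell as the start of a new slide.
        "metadata": {"slideshow": {"slide_type": "slide"}},
        "source": [f"# {title}"],
    }
    # nbformat 4.5+ expects every cell to carry a unique id.
    if nb.get("nbformat_minor", 0) >= 5:
        title_cell["id"] = uuid.uuid4().hex[:8]

    nb["cells"].insert(0, title_cell)

    with open(notebook_path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
```

The title itself could then be exposed as a command-line parameter and passed through to this function.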

cutpoints for precision @ k

Currently, the cutpoints variable in generate_plot_data.py returns the cutpoints to calculate the precision. However, the x-axis of the associated plot does not use this. For the sake of consistency, it would be useful either to:

  1. Remove the cutpoints key from the dictionary
  2. Change the cutpoints to a np.linspace call and replace the plot call in Metrics Template.ipynb
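Option 2 might look roughly like the sketch below, where the cutpoints are fractions of the population generated by np.linspace and the same array is returned for use as the x-axis. The function name, the number of points, and the dictionary keys other than cutpoints are assumptions.

```python
import numpy as np


def precision_at_k(y_true, y_score, n_points=20):
    """Precision among the top-k scored cases for evenly spaced cutpoints."""
    y_true = np.asarray(y_true)
    # Order outcomes from highest to lowest score.
    order = np.argsort(-np.asarray(y_score), kind="mergesort")
    sorted_true = y_true[order]
    # Cutpoints are fractions of the population, ending at 1.0 (everyone).
    cutpoints = np.linspace(1.0 / n_points, 1.0, n_points)
    # Convert each fraction to a count k, keeping at least one case.
    k = np.maximum(1, np.round(cutpoints * y_true.size).astype(int))
    precision = np.cumsum(sorted_true)[k - 1] / k
    return {"cutpoints": cutpoints, "precision": precision}
```

The plot call in Metrics Template.ipynb could then use the cutpoints array for x and the precision array for y, so the data and the axis always agree.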

Design command-line interface

Currently, the process to use this tool involves

  1. Running a Python script to generate intermediate data (generate_plot_data.py)
  2. Calling papermill to inject and execute the new Jupyter notebook
  3. Running jupyter notebook inside the container to launch the slides
  4. Potentially #2 converting to pdf

However, some of these steps can be consolidated and called directly from python.

For example, currently generate_plot_data.py takes in command line parameters that can be inspected by running generate_plot_data.py -h.

Since this writes intermediate data to a file, and this file is then specified again in the papermill call, papermill can potentially be called directly from python in the following way:

Call Papermill from python

A command-line interface with flags for things like saving figures to a directory, converting to PDF, etc. can be built within a Python script by adding the appropriate argparse arguments.
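A consolidated entry point could be sketched as follows. Only papermill's execute_notebook function and the existing -o flag come from real tools; the other flag names, the defaults, and the notebook parameter name plot_data_path are hypothetical and would need to match the template's parameters cell.

```python
import argparse


def build_parser():
    """Build the argument parser for a single consolidated command."""
    parser = argparse.ArgumentParser(
        description="Generate a classification metrics notebook."
    )
    parser.add_argument("plot_data",
                        help="path to the intermediate plot data file")
    parser.add_argument("-t", "--template", default="Metrics Template.ipynb",
                        help="notebook template to execute")
    parser.add_argument("-o", "--output", default="output.ipynb",
                        help="path of the executed notebook")
    parser.add_argument("--pdf", action="store_true",
                        help="also convert the executed notebook to PDF")
    return parser


def run(args):
    """Execute the template with papermill using the parsed arguments."""
    # Imported lazily so that --help works without papermill installed.
    import papermill as pm

    pm.execute_notebook(
        args.template,
        args.output,
        # Hypothetical parameter name; must exist in the template.
        parameters={"plot_data_path": args.plot_data},
    )


def main(argv=None):
    """Parse arguments and run; argv=None reads from sys.argv."""
    run(build_parser().parse_args(argv))
```

Calling main() from the script's entry point would replace the separate generate_plot_data.py and papermill invocations with one command.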

Add PDF conversion as an option

Conversion to a PDF file would normally work as follows:

jupyter nbconvert --to pdf <original_notebook.ipynb>

However, this functionality does not currently work because pandoc and a TeX installation are missing.

To work on this, it may be useful to shell into the container and try to install the packages as referenced here (note that the docker container uses apt as its package manager) and then test the functionality.

Ultimately, this should then be incorporated in the docker container itself. To try this on a fresh image, rebuild the container and then try to convert a file to pdf.

nbconvert pdf
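Once the dependencies are installed, the conversion could be wrapped in Python so that a missing tool produces a clear message. This sketch assumes nbconvert's default PDF path, which relies on pandoc and xelatex being on the PATH; the function names are hypothetical.

```python
import shutil
import subprocess


def missing_pdf_tools(tools=("pandoc", "xelatex")):
    """Return the names of required executables not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def nbconvert_pdf_command(notebook_path):
    """Build the nbconvert command line for PDF conversion."""
    return ["jupyter", "nbconvert", "--to", "pdf", notebook_path]


def convert_to_pdf(notebook_path):
    """Convert a notebook to PDF, failing early if tools are missing."""
    missing = missing_pdf_tools()
    if missing:
        raise RuntimeError(
            "Cannot convert to PDF; missing: " + ", ".join(missing)
        )
    # check=True raises CalledProcessError if nbconvert fails.
    subprocess.run(nbconvert_pdf_command(notebook_path), check=True)
```

Running missing_pdf_tools() inside a freshly built container is also a quick way to confirm that the image includes what it needs.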

Add documentation to the plots

Before each plot, as a sub-slide, it will be useful to describe what the metric is and how it is used. The primary audience will be people who are interested in the model metrics, but may not know exactly how to interpret the results.

After discussing with Suresh, it may be useful to think about what an "ideal" (perfect) classifier would look like, and what the worst classifier is. This helps orient people spatially.

For example, the ROC curve sub-slide might start with an explanation of what the ROC curve is, and then show that the perfect classifier forms a triangle from the left and top edges of the plot along with the diagonal, while the worst classifier just lies along that diagonal.

Create an example model with output and then use the tool

Using sklearn's built-in datasets and models, it may be useful to build a Jupyter notebook that trains a model on some data and then to use those model results to test the tool. This can also act as an example for newcomers to the tool.

This will involve changing the README to include the example and how to use it.
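A possible starting point for such an example is sketched below, using sklearn's breast cancer dataset and a logistic regression model. The CSV layout (a label column and a score column) is an assumption; the input format that generate_plot_data.py actually expects should be checked before relying on it.

```python
import csv

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_example_predictions(csv_path, seed=0):
    """Train a small model and write held-out labels and scores to CSV.

    Returns the number of held-out rows written.
    """
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    # Scale features so that logistic regression converges reliably.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    # Predicted probability of the positive class for each held-out row.
    scores = model.predict_proba(X_test)[:, 1]

    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "score"])  # hypothetical column names
        writer.writerows(zip(y_test.tolist(), scores.tolist()))
    return len(y_test)
```

The resulting file could then be fed through the tool end to end, which doubles as a smoke test and as a README walkthrough.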
