districtdatalabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.

License: Apache License 2.0

Makefile 0.07% Python 99.70% TeX 0.23%

anaconda estimator machine-learning matplotlib model-selection python scikit-learn visual-analysis visualization visualizer

yellowbrick's Issues

Grid Search Viz

Create a package for visualizing grid search results. How to generalize this?

Publish Yellowbrick to PyPI

Make sure that yellowbrick is pip installable and on PyPI.

Feature Analysis: Rank 2D

Seaborn has an example of this, but let's create a function that will immediately give the rank 2d in advance of the SPLOM.
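
A rough sketch of what such a function could compute, assuming "rank 2D" means scoring every pair of features with Pearson correlation (the issue does not fix the metric). The names `pearson` and `rank2d` are hypothetical, and the input is assumed to be a list of rows with non-constant columns:

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rank2d(X):
    """Return the feature-by-feature correlation matrix for rows X."""
    columns = list(zip(*X))  # transpose rows into feature columns
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Three features: the second doubles the first, the third runs the other way.
X = [[1, 2, 3], [2, 4, 2], [3, 6, 1]]
scores = rank2d(X)
```

The resulting matrix could then be drawn as a heatmap before deciding which pairs deserve a full scatter plot.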

Convert ROCAUC and ClassifierReport

Transform ClassifierReport and ROCAUC into EstimatorScore class that knows how to handle them

Update documentation

specifically adding the roadmap and end goal description and depiction of the yellowbrick/scikit-learn workflow (maybe something like baleen has)

Update all the docstrings

Now that we are in sphinx land we should update all the docstrings to conform to sphinx markup and also check for accuracy.

Is there a good way to mark items as an interface, and is that relevant?
Is there a good way to autodoc params from a docstring (rather than the docstring itself or the method name) for cases where params from an interface should be injected into the docs without repeating them (DRY)?

Base visualization structure

Accepts a model + data as input, and might also build a model in return. We can use this class as a stub for general functionality and subclass it for specifics.

Note that our primary API will be functions not classes. Unlike Seaborn, however, potentially we can create classes that are callable.
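
A minimal sketch of the callable-class idea, so that an instance can be used like a function. `BaseVisualization`, `ShapeSummary`, and `render` are hypothetical names for illustration, not an agreed API, and the subclass returns a dictionary instead of drawing so the example stays dependency-free:

```python
class BaseVisualization:
    """Hypothetical stub: stores styling options and is callable."""

    def __init__(self, **style):
        # Styling options (size, color, ...) are kept for later use.
        self.style = style

    def render(self, model, X, y=None):
        # Subclasses supply the specifics.
        raise NotImplementedError

    def __call__(self, model, X, y=None):
        # Calling the instance delegates to render, so it acts like a function.
        return self.render(model, X, y)


class ShapeSummary(BaseVisualization):
    def render(self, model, X, y=None):
        # Describe the inputs rather than plotting them.
        return {"model": type(model).__name__, "rows": len(X), "style": self.style}


summarize = ShapeSummary(color="blue")
result = summarize(object(), [[1, 2], [3, 4]])
```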

Add Sphinx autodoc and intersphinx

Add sphinx autodoc and intersphinx to pull docstrings into documentation.

Regressor Viz Package

Create a package for visualizing Regressors.

Single feature analysis tools (augmenting Rank1D)

Radviz, parallel coordinates, and Rank2D are already implemented. We should expand the feature visualization toolset to include single feature analysis tools:

boxplots
histograms
violinplots

Create a SingleFeatureVisualizer class whose poof method takes as input the name(s) or position(s) of one of the features. Display the appropriate visualization allowing multiple poof calls with different features. Make sure to integrate this with Rank1D.

Open questions: do we need to store the data? Or can we create all visualization and then display on demand, or is there some middle ground?

Note: similar to parallel coordinates and radviz, these need to be native implementations, emulating other libs (seaborn, pandas) as necessary.

Fix Travis tests

broke when trying to install matplotlib/scipy/numpy/sklearn etc.

Isolate Yellowbrick Dependencies

Lots of libraries are pinned to versions in the requirements.txt

Should this be the case?

Multi-Estimator Visualizer

Add exception hierarchy

Add hierarchy for exceptions.
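
One possible shape for the hierarchy, as a sketch only. The class names below are assumptions for illustration; the issue does not specify them. The root class lets callers catch everything the library raises, and mixing in a built-in type keeps existing `except TypeError` handlers working:

```python
class YellowbrickError(Exception):
    """Hypothetical root of all library-specific exceptions."""


class VisualError(YellowbrickError):
    """Something went wrong while drawing or rendering."""


class ModelError(YellowbrickError):
    """Something is wrong with the wrapped estimator."""


class YellowbrickTypeError(YellowbrickError, TypeError):
    """Wrong argument type; also catchable as a plain TypeError."""


def check_features(X):
    # Example use: reject input that is not a list of rows.
    if not isinstance(X, list):
        raise YellowbrickTypeError("X must be a list of rows")
    return X
```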

Multiplot API

Seaborn has a FacetGrid - do we need something similar for generating multiple plots?

Classifier Viz package

Create a package for visualizing classifier performance.

Colors and rcparams

We have moved the Seaborn matplotlib rc params modifications over to yellowbrick; however, I'm going to constrain this as follows:

for font/size/style etc I will use the "notebook context" as the unchangeable default and pass it back to the user to make other contextual changes (without yellowbrick help); we can bring this back later if needed.
for colors I'm going all in on the ColorPalette object, which I think is awesome.
rcmod will happen on import for the notebook style and the default colors.

Additional Datasets

Need to get some additional datasets for testing. We currently have two classification datasets and one regression one. We need:

at least one more regression dataset
at least one more classification dataset that's not a binary classification.

Convert radviz into FeatureVisualizer

Convert radviz into FeatureVisualizer. Allow the user to specify which dimensions to visualize, either in init or draw.

Draft gallery for the docs

Add in some initial images for the gallery in the documentation

Model name detection

Separate model name detection function to extract from pipelines.
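
A small sketch of what the separated function might look like. `get_model_name` is a hypothetical name; the only assumption about pipelines is that they expose a `steps` list of `(name, estimator)` pairs, as scikit-learn's `Pipeline` does. The stand-in classes exist only to keep the example free of dependencies:

```python
def get_model_name(model):
    """Return the class name of an estimator, looking inside pipelines."""
    steps = getattr(model, "steps", None)
    if steps:
        # The last step of a pipeline is the final estimator; recurse in
        # case that step is itself a pipeline.
        return get_model_name(steps[-1][1])
    return type(model).__name__


# Stand-ins for a pipeline and two estimators (illustration only).
class FakePipeline:
    def __init__(self, steps):
        self.steps = steps


class Scaler:
    pass


class Ridge:
    pass


pipeline = FakePipeline([("scale", Scaler()), ("model", Ridge())])
name = get_model_name(pipeline)
```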

Develop API (yellowbrick/scikit-learn)

Yellowbrick needs a standard API that coordinates with and/or corresponds to the Scikit-learn API so that each visualization tool accepts parameters and produces outputs in a systematic way

Visual Pipeline

Create a visual pipeline class that extends the scikit-learn pipeline class to ensure that visualizers get called correctly.

Convert prediction error and regression error plots into estimator scores

Convert prediction error and regression error plots into EstimatorScores

refactor ClassifierReport class

refactor ClassifierReport class to take multiple models as input

Fitted vs. Unfitted API

Since we are moving towards thinking about this as a fitted model API (e.g. a version for more experienced Scikit-Learn users who will naturally want to have more control over their model fitting and tuning), we need to refactor the ROCAUC and ResidualsPlot classes so that they are not doing the fitting and predicting, but instead taking the results of fit and predict (as well as the split train/test data for residuals, and the y_pred for roc-auc).
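
To illustrate the fitted-model direction, here is a sketch in which the helper never calls fit or predict and only consumes their results. The function names are hypothetical, and residuals are taken as observed minus predicted:

```python
def residuals(y_true, y_pred):
    """Residuals computed from existing predictions (observed - predicted)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [t - p for t, p in zip(y_true, y_pred)]


def residuals_by_split(y_train, y_train_pred, y_test, y_test_pred):
    """Keep train and test residuals separate so they can be drawn apart."""
    return {
        "train": residuals(y_train, y_train_pred),
        "test": residuals(y_test, y_test_pred),
    }


# The caller has already fitted a model and produced these predictions.
split = residuals_by_split([3.0, 5.0], [2.5, 5.5], [4.0], [4.0])
```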

Contributor Documentation

Create documentation about how to contribute.

Estimator Score Visualizers

Text Visualizations

Create a new module text and a base class TextVisualizer for performing text visualization.

Finish the palettes listing

This is a great project for anyone who wants a way to quickly and easily contribute to Yellowbrick! Basically what we need is a repair and extension of the palettes we've added in yellowbrick.colors.palettes to make sure everything looks great. This is a high impact task that should be relatively easy!

Check over the colors in the palettes notebook and make sure that the start of each color palette (for unordered color maps, e.g. not Paired or Sequential) has the following rough sequence: blue, green, red, maroon, yellow, cyan, and, if the palette has more than 6 colors, then black (anything else can follow).
Go to the colorbrewer page and make sure that all the palettes there are represented.
Create a sequential colormap list with mplcol.ListedColormap as follows: add a new variable called _COLOR_MAPS with named lists of the color maps, then create a new top-level variable COLOR_MAPS that converts the raw lists into ListedColormap objects (i.e. a dictionary of names to ListedColormap objects). Alternatively, simply create a function color_sequence that behaves like color_palette but returns a ListedColormap.

Happy to discuss this and any questions.

Getting Started - README

We need documentation in the README that discusses how to get started with Yellowbrick. We're hoping for an end-to-end walkthrough that starts with pip install and then discusses how to get set up for using the library in other code bases, particularly Jupyter notebooks (matplotlib inline), but also how to get started doing development on the library.

Create deployment scripts

Get pushed to PyPI

Bad link in README

In the README, the “Using YellowBrick” section has a broken link that should lead to the “quick start guide”.

Style Management

Create a scheme similar to Seaborn for creating a unified style and context system that sits on top of mpl.RCParams (so that every plot generated by our tool looks the same).

Add ClassBalance class

Add a ClassBalance class that inherits from the ClassificationScoreVisualizer class. The purpose is to help the user visualize any potential class imbalance issues.
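
The quantity such a visualizer would plot can be sketched without any plotting code. `class_balance` below is a hypothetical helper, not part of the proposed class; it returns the share of samples per label:

```python
from collections import Counter


def class_balance(y):
    """Return the fraction of samples belonging to each class label."""
    counts = Counter(y)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Three of the four samples belong to class "a", so the data is imbalanced.
balance = class_balance(["a", "a", "a", "b"])
```

A bar chart of these fractions would make a skew like the one above visible at a glance.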

Feature Selection Visualization

How do we adapt radviz, parallel coords, splom, etc. so that our tool is better and more capable than seaborn or pandas?

One idea is to perform automatic feature analysis and show which features are most visually relevant to the models. In that case we would pull the relevant code from the libraries and adapt it to our use.

Feature Visualizer

This issue replaces #7

Method for testing images and matplotlib renderings.

Need to develop a custom test suite to test images (e.g. test that images generated via visualization methods are not blank)

Release Version 0.3

See the Version 0.3 release milestone for more details.

Create Base Visualizer

Create a class called VisualizerMixin in a module called base.py at the root of yellowbrick.

The intent is that visualizers should extend Scikit-Learn's BaseEstimator, TransformerMixin and our VisualizerMixin classes - giving it the following required methods:

fit
draw
fit_draw

The idea is that fit will be passed X data and maybe y data and will prepare the data for drawing, and then draw will actually conduct the drawing.

The __init__ method should take styling arguments. So things like size, color, whether or not to save to a file, markers, line stuff, etc.

NOTE: check if the transformer mixin extends from BaseEstimator and, if it does, then also subclass VisualizerMixin from BaseEstimator to allow us to use set_params and get_params on rendering variables.
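
A dependency-free sketch of the three required methods. It deliberately leaves out the Scikit-Learn base classes, and `ColumnMeans` is a hypothetical subclass whose draw returns text rather than a figure, so treat it as an outline of the control flow only:

```python
class VisualizerMixin:
    """Sketch of the proposed mixin: fit prepares data, draw renders it."""

    def __init__(self, size=None, color=None):
        # Styling arguments are taken at construction time.
        self.size = size
        self.color = color

    def fit(self, X, y=None):
        raise NotImplementedError

    def draw(self):
        raise NotImplementedError

    def fit_draw(self, X, y=None):
        # Convenience: prepare the data, then draw in one call.
        return self.fit(X, y).draw()


class ColumnMeans(VisualizerMixin):
    def fit(self, X, y=None):
        # Prepare the data for drawing: one mean per feature column.
        self.means_ = [sum(col) / len(X) for col in zip(*X)]
        return self

    def draw(self):
        # Stand-in for real drawing: one labeled line per feature.
        return [f"{self.color or 'default'}: {m:.1f}" for m in self.means_]


lines = ColumnMeans(color="blue").fit_draw([[1, 2], [3, 4]])
```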

Best fit curves

We have linear best fit lines, but will need quadratic and exponential too.
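
As a starting point, an exponential fit can reuse a linear fit by working on log(y). The sketch below uses hypothetical function names and plain Python. It assumes all y values are positive, and it minimizes error in log space rather than in the original units. A quadratic fit is not covered here:

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by fitting a line to (x, log y)."""
    b, log_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b


slope, intercept = fit_line([0, 1, 2], [1, 3, 5])
a, b = fit_exponential([0, 1, 2], [2 * math.exp(0.5 * x) for x in [0, 1, 2]])
```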

Ensure that on import, default style loads

Make sure that the color palette is loaded in __init__.py so that our default styles always exist, but can also be overridden by the user.

Add InformativeFeatures class

Need to add a class to enable the user to evaluate the features that were most informative for a fitted model. This class will inherit from ScoreVisualizer.

Best Fit Lines

Our visualization uses best fit lines, and we need a mechanism to automatically add these to the scatter plots.

Convert parallel coordinates into FeatureVisualizer

Convert parallel coordinates into a FeatureVisualizer. Allow the user to specify which dimensions to visualize, either in init or draw.

Make example data downloader more robust

Make the UCI data downloader from the examples more robust.

Improve Parallel Coordinates

See #50 for a more detailed discussion.

There are several things that need to be done to improve the parallel coordinates visualizer:

Benchmarking, it is currently SLOW
Add a draw_instance method so that instances can be added to the figure at any time
Add DataFrame support so that the visualizer can accept either an ndarray or a DataFrame as input.
Create a subclass NormalizedParallelCoordinates that normalizes the data to the space 0 to 1 before drawing the coordinates.
Add subsampling of instances to reduce clutter and improve performance
Add fast vs. slow drawing methods for performance
Add alpha so that you can see instances through the other lines
~~Create an optimization technique that reorders the columns such that the overlap of two instances by the same class is minimized~~

There are probably several more things and there are comments in the Parallel Coordinates codebase as well.
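
Two of the items above, normalization to the range 0 to 1 and subsampling of instances, can be sketched in plain Python. The function names are hypothetical, input is assumed to be a list of numeric rows, and constant columns are mapped to 0.0 by choice:

```python
import random


def normalize_columns(X):
    """Min-max scale each column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    spans = [max(col) - low for col, low in zip(columns, lows)]
    return [
        [(v - low) / span if span else 0.0 for v, low, span in zip(row, lows, spans)]
        for row in X
    ]


def subsample(X, fraction, seed=None):
    """Pick a random fraction of the rows, keeping their original order."""
    rng = random.Random(seed)
    k = min(len(X), max(1, round(len(X) * fraction)))
    indices = sorted(rng.sample(range(len(X)), k))
    return [X[i] for i in indices]


scaled = normalize_columns([[0, 10], [5, 10], [10, 10]])
sample = subsample([[i] for i in range(10)], 0.3, seed=1)
```

Normalizing before drawing keeps features with large ranges from dominating the axes, and drawing only a sample reduces both clutter and rendering time.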

Most informative features

Create a visualization that inspects or compares the L1 vs. L2 regularizers by showing the relative weights of each feature for each norm.
