districtdatalabs / yellowbrick
Visual analysis and diagnostic tools to facilitate machine learning model selection.
Home Page: http://www.scikit-yb.org/
License: Apache License 2.0
Create a package for visualizing grid search results. How to generalize this?
Make sure that yellowbrick is pip installable and on PyPI.
Seaborn has an example of this, but let's create a function that will immediately give the rank 2d in advance of the SPLOM.
Transform ClassifierReport and ROCAUC into an EstimatorScore class that knows how to handle them.
specifically adding the roadmap and end goal description and depiction of the yellowbrick/scikit-learn workflow (maybe something like baleen has)
add cover image to README.md
Now that we are in sphinx land we should update all the docstrings to conform to sphinx markup and also check for accuracy.
Accepts as input a model + data, and might also build a model in return. We can use this class as a stub for general functionality, but subclassed for specifics.
Note that our primary API will be functions not classes. Unlike Seaborn, however, potentially we can create classes that are callable.
Add sphinx autodoc and intersphinx to pull docstrings into documentation.
Create a package for visualizing Regressors.
Radviz, parallel coordinates, and Rank2D are already implemented. We should expand the feature visualization toolset to include single feature analysis tools:
Create a SingleFeatureVisualizer class whose poof method takes as input the name(s) or position(s) of one of the features. Display the appropriate visualization, allowing multiple poof calls with different features. Make sure to integrate this with Rank1D.
Open questions: do we need to store the data? Or can we create all visualization and then display on demand, or is there some middle ground?
Note: similar to parallel coordinates and radviz, these need to be native implementations, emulating other libs (seaborn, pandas) as necessary.
broke when trying to install matplotlib/scipy/numpy/sklearn etc.
Lots of libraries are pinned to versions in the requirements.txt
Should this be the case?
Add hierarchy for exceptions.
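One possible shape for such a hierarchy is sketched below. All class names and the describe helper are placeholders chosen for illustration, not names taken from this issue; the point is only that a single root class lets callers catch every library error at once while still distinguishing drawing problems from model problems.

```python
class YellowbrickError(Exception):
    """Hypothetical root exception: catch this to handle any library error."""


class VisualError(YellowbrickError):
    """Hypothetical: a problem while drawing or rendering a figure."""


class ModelError(YellowbrickError):
    """Hypothetical: a problem with the wrapped estimator."""


class NotFittedError(ModelError):
    """Hypothetical: drawing was requested before fit was called."""


def describe(exc):
    """Return the library class names from the exception up to the root."""
    return [
        cls.__name__
        for cls in type(exc).__mro__
        if issubclass(cls, YellowbrickError)
    ]
```

With this layout, `except ModelError` would also catch NotFittedError, while `except YellowbrickError` would catch everything the library raises.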
Seaborn has a FacetGrid - do we need something similar for generating multiple plots?
Create a package for visualizing classifier performance.
We have currently moved the Seaborn matplotlib rc params modifications over to yellowbrick; however, I'm going to constrain this as follows:
ColorPalette object, which I think is awesome.
Need to get some additional datasets for testing. We currently have two classification datasets and one regression one. We need:
Convert radviz into a FeatureVisualizer. Allow the user to specify which dimensions to visualize, either in init or draw.
Add in some initial images for the gallery in the documentation
Separate model name detection function to extract from pipelines.
Yellowbrick needs a standard API that coordinates with and/or corresponds to the Scikit-learn API so that each visualization tool accepts parameters and produces outputs in a systematic way
Create a visual pipeline class that extends the scikit-learn pipeline class to ensure that visualizers get called correctly.
Convert prediction error and regression error plots into EstimatorScores
Refactor the ClassifierReport class to take multiple models as input.
Since we are moving towards thinking about this as a fitted model API (e.g. a version for more experienced Scikit-Learn users who will naturally want to have more control over their model fitting and tuning), we need to refactor the ROCAUC and ResidualsPlot classes so that they are not doing the fitting and predicting, but instead taking the results of fit and predict (as well as the split train/test data for residuals, and the y_pred for roc-auc).
Create documentation about how to contribute.
Create a new module text and a base class TextVisualizer for performing text visualization.
This is a great project for anyone who wants a way to quickly and easily contribute to Yellowbrick! Basically what we need is a repair and extension of the palettes we've added in yellowbrick.colors.palettes to make sure everything looks great. This is a high impact task that should be relatively easy!
mplcol.ListedColormap as follows: add a new variable called _COLOR_MAPS with named lists of the color maps, then create a new top level variable COLOR_MAPS that converts the raw lists into ListedColormap objects (i.e. a dictionary of names to ListedColormap objects). Alternatively, simply create a function color_sequence that behaves like color_palette but returns a ListedColormap.
Happy to discuss this and any questions.
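A minimal sketch of the _COLOR_MAPS / COLOR_MAPS idea, assuming matplotlib is installed. The two map names and their hex values are made-up examples rather than palettes from yellowbrick.colors.palettes, and this color_sequence only does a name lookup; it does not reproduce the full behavior of color_palette.

```python
from matplotlib.colors import ListedColormap

# Raw named lists of colors (illustrative values, not the real palettes).
_COLOR_MAPS = {
    "example_warm": ["#fee8c8", "#fdbb84", "#e34a33"],
    "example_cool": ["#ece7f2", "#a6bddb", "#2b8cbe"],
}

# Top-level mapping of names to ListedColormap objects built from the raw lists.
COLOR_MAPS = {
    name: ListedColormap(colors, name=name)
    for name, colors in _COLOR_MAPS.items()
}


def color_sequence(name):
    """Return the ListedColormap registered under `name` (simplified lookup)."""
    try:
        return COLOR_MAPS[name]
    except KeyError:
        raise ValueError("unknown color map: {!r}".format(name)) from None
```

Either entry point gives matplotlib a colormap object it can consume directly, e.g. `cmap=color_sequence("example_warm")`.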
We need documentation in the README on how to get started with Yellowbrick. We're hoping for an end-to-end walkthrough that starts with pip install and then discusses how to get set up for using the library in other code bases, particularly Jupyter notebooks (matplotlib inline), but also how to get started doing development on the library.
Get pushed to PyPI
In the README there is a broken link in the “Using YellowBrick” section that should lead to the “quick start guide”.
Create a scheme similar to Seaborn for creating a unified style and context system that sits on top of mpl.RCParams (so that every plot generated by our tool looks the same).
Add a ClassBalance class that inherits from the ClassificationScoreVisualizer class. The purpose is to help the user visualize any potential class imbalance issues.
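The quantity such a visualizer would plot can be sketched with the standard library alone. The function names class_balance and imbalance_ratio are hypothetical helpers for illustration, not part of the proposed class.

```python
from collections import Counter


def class_balance(y):
    """Return the fraction of samples that belong to each class label."""
    counts = Counter(y)
    if not counts:
        raise ValueError("y must contain at least one label")
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(y):
    """Return the size of the largest class divided by the smallest."""
    counts = Counter(y)
    if not counts:
        raise ValueError("y must contain at least one label")
    return max(counts.values()) / min(counts.values())
```

A bar chart of the class_balance values, one bar per label, would be the natural drawing for this visualizer.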
How do we adapt radviz, parallel coords, splom, etc. so that our tool is better or more capable than seaborn or pandas?
One idea is to perform automatic feature analysis and show which features are most visually relevant to the models. In that case we would pull the relevant code from the libraries and adapt it to our use.
This issue replaces #7
Need to develop a custom test suite to test images (e.g. test that images generated via visualization methods are not blank)
See the Version 0.3 release milestone for more details.
Create a class called VisualizerMixin in a module called base.py at the root of yellowbrick.
The intent is that visualizers should extend Scikit-Learn's BaseEstimator, TransformerMixin and our VisualizerMixin classes, giving them the following required methods:
fit
draw
fit_draw
The idea is that fit will be passed X data and maybe y data and will prepare the data for drawing, and then draw will actually conduct the drawing.
The __init__ method should take styling arguments. So things like size, color, whether or not to save to a file, markers, line stuff, etc.
NOTE: check if the transformer mixin extends from BaseEstimator and, if it does, then also subclass VisualizerMixin from BaseEstimator, to allow us use of set_params and get_params on rendering variables.
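The fit / draw / fit_draw contract could look roughly like the sketch below. It deliberately leaves out the Scikit-Learn base classes and matplotlib so that it runs on its own; the styling parameter names and the ColumnMeans subclass are assumptions made up for illustration.

```python
class VisualizerMixin:
    """Sketch of the proposed mixin: styling in __init__, data in fit."""

    def __init__(self, size=(8, 6), color="b", marker="o", save_path=None):
        # Styling arguments only; no data is accepted here.
        self.size = size
        self.color = color
        self.marker = marker
        self.save_path = save_path

    def fit(self, X, y=None):
        """Prepare the data for drawing; subclasses must implement this."""
        raise NotImplementedError

    def draw(self):
        """Render the prepared data; subclasses must implement this."""
        raise NotImplementedError

    def fit_draw(self, X, y=None):
        """Convenience wrapper: fit on the data, then draw."""
        self.fit(X, y)
        return self.draw()


class ColumnMeans(VisualizerMixin):
    """Hypothetical visualizer that 'draws' column means as text bars."""

    def fit(self, X, y=None):
        n_rows = len(X)
        self.means_ = [sum(column) / n_rows for column in zip(*X)]
        return self

    def draw(self):
        # Stand-in for real plotting: one text bar per column.
        return [
            "{} {}".format(index, self.marker * round(mean))
            for index, mean in enumerate(self.means_)
        ]
```

For example, `ColumnMeans(marker="#").fit_draw([[1, 4], [3, 2]])` fits the column means 2.0 and 3.0 and then draws one bar for each.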
We have linear best fit lines, but will need quadratic and exponential too.
Make sure that the color palette is loaded in __init__.py so that our default styles always exist but can also be overridden by the user.
Need to add a class to enable the user to evaluate the features that were most informative for a fitted model. This class will inherit from ScoreVisualizer
Our visualization uses best fit lines, and we need a mechanism to automatically add these to the scatter plots.
Convert parallel coordinates into a FeatureVisualizer. Allow the user to specify which dimensions to visualize, either in init or draw.
Make the UCI data downloader from the examples more robust.
See #50 for a more detailed discussion.
There are several things that need to be done to improve the parallel coordinates visualizer:
draw_instance method so that instances can be added to the figure at any time
DataFrame support so that the visualizer can accept either an ndarray or a DataFrame as input.
NormalizedParallelCoordinates that normalizes the data to the space 0 to 1 before drawing the coordinates.
There are probably several more things and there are comments in the Parallel Coordinates codebase as well.
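The normalization step for the last item could be sketched as a per-feature min-max rescale. This is plain Python on lists of rows rather than ndarray or DataFrame input, and the function name normalize_columns is a hypothetical helper, not something from the Parallel Coordinates codebase.

```python
def normalize_columns(rows):
    """Rescale every column of `rows` to the range 0 to 1 (min-max scaling)."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        if span == 0:
            # A constant column has no spread; map it to 0.0 by convention.
            scaled = [0.0] * len(column)
        else:
            scaled = [(value - low) / span for value in column]
        scaled_columns.append(scaled)
    # Transpose back so the result has one list per original row.
    return [list(row) for row in zip(*scaled_columns)]
```

Scaling each feature independently keeps one large-valued dimension from flattening the others when all axes share the same vertical range.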
Create a visualization that inspects or compares the L1 vs. L2 regularizers by showing the relative weights of each feature for each norm.