yellowbrick's Introduction

Yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.

What is Yellowbrick?

Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the scikit-learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines scikit-learn with matplotlib in the best tradition of the scikit-learn documentation, but to produce visualizations for your machine learning workflow!

For complete documentation on the Yellowbrick API, a gallery of available visualizers, the contributor's guide, tutorials and teaching resources, frequently asked questions, and more, please visit our documentation at www.scikit-yb.org.

Installing Yellowbrick

Yellowbrick is compatible with Python 3.4 or later and also depends on scikit-learn and matplotlib. The simplest way to install Yellowbrick and its dependencies is from PyPI with pip, Python's preferred package installer.

$ pip install yellowbrick

Note that Yellowbrick is an active project and routinely publishes new releases with more visualizers and updates. In order to upgrade Yellowbrick to the latest version, use pip as follows.

$ pip install -U yellowbrick

You can also use the -U flag to update scikit-learn, matplotlib, or any other third party utilities that work well with Yellowbrick to their latest versions.

If you're using Anaconda (recommended for Windows users), you can take advantage of the conda utility to install Yellowbrick:

conda install -c districtdatalabs yellowbrick

Using Yellowbrick

The Yellowbrick API is specifically designed to play nicely with scikit-learn. Here is an example of a typical workflow sequence with scikit-learn and Yellowbrick:

Feature Visualization

In this example, we see how Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm and then returns them ranked as a lower left triangle diagram.

from yellowbrick.features import Rank2D

# X is the feature matrix, y is the target, and features is the list of column names
visualizer = Rank2D(
    features=features, algorithm='covariance'
)
visualizer.fit(X, y)                # Fit the data to the visualizer
visualizer.transform(X)             # Transform the data
visualizer.show()                   # Finalize and render the figure
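
To make the idea of a pairwise comparison concrete, here is a minimal sketch in plain Python that scores each unordered pair of features exactly once, which is why the result fits in a lower left triangle. It is not Yellowbrick's implementation; the helper names `covariance` and `pairwise_scores` and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def covariance(a, b):
    """Sample covariance of two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def pairwise_scores(columns, metric=covariance):
    """Score every unordered pair of features exactly once.

    `columns` maps a feature name to its list of values.
    """
    return {
        (left, right): metric(columns[left], columns[right])
        for left, right in combinations(columns, 2)
    }


# Hypothetical toy data: three features with four observations each.
columns = {
    "temp": [1.0, 2.0, 3.0, 4.0],
    "humidity": [2.0, 4.0, 6.0, 8.0],
    "wind": [4.0, 3.0, 2.0, 1.0],
}
scores = pairwise_scores(columns)

# List the pairs from the strongest to the weakest relationship.
for pair, score in sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True):
    print(pair, round(score, 3))
```

Three features produce three pairs; in general n features produce n(n-1)/2 cells, one for each position below the diagonal.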

Model Visualization

In this example, we instantiate a scikit-learn classifier and then use Yellowbrick's ROCAUC class to visualize the tradeoff between the classifier's sensitivity and specificity.

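from sklearn.svm import LinearSVC
from yellowbrick.classifier import ROCAUC

# X is the feature matrix and y is the target
model = LinearSVC()
visualizer = ROCAUC(model)
visualizer.fit(X, y)                # Fit the data to the visualizer
visualizer.score(X, y)              # Evaluate the model on the data
visualizer.show()                   # Finalize and render the figure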
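
For readers who want to see what the curve measures, the following sketch computes ROC points and the area under them from a list of labels and classifier scores, using only the standard library. It illustrates the sensitivity/specificity tradeoff and is not how ROCAUC is implemented; the function names `roc_points` and `auc` and the sample data are assumptions.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time, from strictest to loosest.
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, t in zip(predicted, y_true) if p and t == 1)
        fp = sum(1 for p, t in zip(predicted, y_true) if p and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical binary labels and decision scores for four instances.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
print(points)
print(auc(points))
```

Each step down in threshold admits more true positives (higher sensitivity) at the cost of more false positives (lower specificity).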

For additional information on getting started with Yellowbrick, view the Quick Start Guide in the documentation and check out our examples notebook.

Contributing to Yellowbrick

Yellowbrick is an open source project that is supported by a community who will gratefully and humbly accept any contributions you might make to the project. Large or small, any contribution makes a big difference; and if you've never contributed to an open source project before, we hope you will start with Yellowbrick!

If you are interested in contributing, check out our contributor's guide. Beyond creating visualizers, there are many ways to contribute:

  • Submit a bug report or feature request on GitHub Issues.
  • Contribute a Jupyter notebook to our examples gallery.
  • Assist us with user testing.
  • Add to the documentation or help with our website, scikit-yb.org.
  • Write unit or integration tests for our project.
  • Answer questions on our issues, mailing list, Stack Overflow, and elsewhere.
  • Translate our documentation into another language.
  • Write a blog post, tweet, or share our project with others.
  • Teach someone how to use Yellowbrick.

As you can see, there are lots of ways to get involved and we would be very happy for you to join us! The only thing we ask is that you abide by the principles of openness, respect, and consideration of others as described in the Python Software Foundation Code of Conduct.

For more information, check out the CONTRIBUTING.md file in the root of the repository or the detailed documentation at Contributing to Yellowbrick.

Yellowbrick Datasets

Yellowbrick gives easy access to several datasets that are used for the examples in the documentation and testing. These datasets are hosted in our CDN and must be downloaded for use. Typically, when a user calls one of the data loader functions, e.g. load_bikeshare(), the data is automatically downloaded if it's not already on the user's computer. However, for development and testing, or if you know you will be working without internet access, it might be easier to simply download all the data at once.

The data downloader script can be run as follows:

$ python -m yellowbrick.download

This will download the data to the fixtures directory inside of the Yellowbrick site packages. You can specify the location of the download either as an argument to the downloader script (use --help for more details) or by setting the $YELLOWBRICK_DATA environment variable. Setting the environment variable is the preferred mechanism because it also influences how data is loaded in Yellowbrick.
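
The lookup described above can be pictured with a short sketch. This is not Yellowbrick's code: the function name `resolve_data_home`, the precedence order (explicit path first, then the environment variable, then a default), and the placeholder default are assumptions made for illustration.

```python
import os


def resolve_data_home(path=None, default="fixtures"):
    """Pick the dataset directory.

    Assumed precedence: an explicit path, then $YELLOWBRICK_DATA, then a default.
    """
    if path is not None:
        return os.path.expanduser(path)
    env_path = os.environ.get("YELLOWBRICK_DATA")
    if env_path:
        return os.path.expanduser(env_path)
    return default


print(resolve_data_home())
```

Because an environment variable is visible to every process started from the same shell, a downloader and a loader that both consult it would agree on the location without extra arguments.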

Note: Developers who have downloaded data from Yellowbrick versions earlier than v1.0 may experience some problems with the older data format. If this occurs, you can clear out your data cache as follows:

$ python -m yellowbrick.download --cleanup

This will remove old datasets and download the new ones. You can also use the --no-download flag to simply clear the cache without re-downloading data. Users who are having difficulty with datasets can also use this or they can uninstall and reinstall Yellowbrick using pip.

Citing Yellowbrick

We would be glad if you used Yellowbrick in your scientific publications! If you do, please cite us using the citation guidelines.

Affiliations

District Data Labs NumFOCUS Affiliated Project

yellowbrick's Issues

Feature Analysis: Rank 2D

Seaborn has an example of this, but let's create a function that will immediately give the rank 2d in advance of the SPLOM.

Develop API (yellowbrick/scikit-learn)

Yellowbrick needs a standard API that coordinates with and/or corresponds to the scikit-learn API so that each visualization tool accepts parameters and produces outputs in a systematic way.

Text Visualizations

Create a new module text and a base class TextVisualizer for performing text visualization.

Best fit curves

We have linear best fit lines, but will need quadratic and exponential too.
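
As one possible starting point for the exponential case, the sketch below fits y = a * exp(b * x) by ordinary least squares on log(y), using only the standard library. The function name `fit_exponential` is hypothetical, and the log transform is a simplifying choice: it minimizes error in log space and requires every y to be positive.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on log(y); requires every y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (lg - mean_log) for x, lg in zip(xs, logs))
    b = sxy / sxx                          # slope of the line in log space
    a = math.exp(mean_log - b * mean_x)    # intercept mapped back from log space
    return a, b


# Noise-free sample data generated from a = 2.0, b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))
```

A quadratic fit needs a three-parameter least-squares solve instead, which is usually delegated to a numerical library.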

Base visualization structure

Accepts as input a model + data, and might also build a model in return. We can use this class as a stub for general functionality, but subclassed for specifics.

Note that our primary API will be functions not classes. Unlike Seaborn, however, potentially we can create classes that are callable.

Update documentation

Specifically, add the roadmap, the end goal description, and a depiction of the yellowbrick/scikit-learn workflow (maybe something like Baleen has).

Fitted vs. Unfitted API

Since we are moving towards thinking about this as a fitted model API (e.g. a version for more experienced Scikit-Learn users who will naturally want to have more control over their model fitting and tuning), we need to refactor the ROCAUC and ResidualsPlot classes so that they are not doing the fitting and predicting, but instead taking the results of fit and predict (as well as the split train/test data for residuals, and the y_pred for roc-auc).
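
A minimal sketch of the "fitted" style, assuming the caller has already run fit and predict elsewhere: the helper below only consumes the true and predicted values and never touches a model. The name `residual_points` is hypothetical and is not part of Yellowbrick.

```python
def residual_points(y_true, y_pred):
    """Pair each prediction with its residual; no fitting or predicting happens here."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [(pred, actual - pred) for actual, pred in zip(y_true, y_pred)]


# The caller supplies results computed elsewhere, separately for train and test.
train_points = residual_points([3.0, 5.0], [2.5, 5.5])
test_points = residual_points([4.0], [4.0])
print(train_points, test_points)
```

Keeping the train and test results separate lets a plot color the two groups differently without the visualizer knowing how the split was made.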

Visual Pipeline

Create a visual pipeline class that extends the scikit-learn pipeline class to ensure that visualizers get called correctly.

Update all the docstrings

Now that we are in sphinx land we should update all the docstrings to conform to sphinx markup and also check for accuracy.

  • Good way to mark items as interface? Relevant?
  • Good way to autodoc params from a docstring, not the docstring itself or the method name, for cases when I want to DRYly autodoc-inject params from an interface?

Improve Parallel Coordinates

See #50 for a more detailed discussion.

There are several things that need to be done to improve the parallel coordinates visualizer:

  • Benchmarking, it is currently SLOW
  • Add a draw_instance method so that instances can be added to the figure at any time
  • Add DataFrame support so that the visualizer can accept either an ndarray or a DataFrame as input.
  • Create a subclass NormalizedParallelCoordinates that normalizes the data to the space 0 to 1 before drawing the coordinates.
  • Add subsampling of instances to reduce clutter and improve performance
  • Add fast vs. slow drawing methods for performance
  • Add alpha so that you can see instances through the other lines
  • Create an optimization technique that reorders the columns such that the overlap of two instances by the same class is minimized

There are probably several more things and there are comments in the Parallel Coordinates codebase as well.
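
Two of the items above, normalizing to the space 0 to 1 and subsampling instances, can be sketched in plain Python as follows. The function names `normalize_columns` and `subsample` are hypothetical, and a real implementation would more likely operate on an ndarray or a DataFrame.

```python
import random


def normalize_columns(rows):
    """Rescale every column of a row-major table to the range 0 to 1."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [
            (value - low) / span if span else 0.0  # a constant column maps to 0.0
            for value, low, span in zip(row, lows, spans)
        ]
        for row in rows
    ]


def subsample(rows, fraction, seed=0):
    """Keep a reproducible random fraction of the instances."""
    count = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, count)


rows = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [5.0, 50.0]]
print(normalize_columns(rows))
print(len(subsample(rows, 0.5)))
```

Normalizing first puts every axis on the same scale, and drawing only a sample reduces both clutter and the number of lines to render.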

Fix Travis tests

The Travis build broke when trying to install matplotlib/scipy/numpy/sklearn etc.

Grid Search Viz

Create a package for visualizing grid search results. How to generalize this?

Create Base Visualizer

Create a class called VisualizerMixin in a module called base.py at the root of yellowbrick.

The intent is that visualizers should extend Scikit-Learn's BaseEstimator, TransformerMixin and our VisualizerMixin classes - giving it the following required methods:

  • fit
  • draw
  • fit_draw

The idea is that fit will be passed X data and maybe y data and will prepare the data for drawing, and then draw will actually perform the drawing.

The __init__ method should take styling arguments. So things like size, color, whether or not to save to a file, markers, line stuff, etc.

NOTE: check if the transformer mixin extends from BaseEstimator and if it does, then also subclass VisualizerMixin from BaseEstimator to allow us use of set_params and get_params on rendering variables.
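
One way to picture the proposed interface is the following sketch, which leaves out the scikit-learn base classes so that it runs with the standard library alone. The styling argument names and the toy ColumnMeans subclass are assumptions, and "drawing" here returns text rather than producing a figure.

```python
class VisualizerMixin:
    """Hypothetical mixin: styling in __init__, data handling in fit, rendering in draw."""

    def __init__(self, size=(8, 6), color="b", marker="o"):
        self.size = size
        self.color = color
        self.marker = marker

    def fit(self, X, y=None):
        """Prepare the data for drawing; subclasses override this."""
        return self

    def draw(self):
        """Render the prepared data; subclasses override this."""
        raise NotImplementedError("subclasses must implement draw")

    def fit_draw(self, X, y=None):
        """Fit, then draw, in one call."""
        self.fit(X, y)
        return self.draw()


class ColumnMeans(VisualizerMixin):
    """Toy visualizer that 'draws' by returning text instead of a figure."""

    def fit(self, X, y=None):
        self.means_ = [sum(col) / len(col) for col in zip(*X)]
        return self

    def draw(self):
        return ", ".join(f"{mean:.1f}" for mean in self.means_)


print(ColumnMeans().fit_draw([[1, 2], [3, 4]]))
```

Returning self from fit mirrors the scikit-learn convention and allows calls to be chained.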

Best Fit Lines

Our visualization uses best fit lines, and we need a mechanism to automatically add these to the scatter plots.

Additional Datasets

Need to get some additional datasets for testing. We currently have two classification datasets and one regression one. We need:

  • at least one more regression dataset
  • at least one more classification dataset that's not a binary classification.

Finish the palettes listing

This is a great project for anyone who wants a way to quickly and easily contribute to Yellowbrick! Basically what we need is a repair and extension of the palettes we've added in yellowbrick.colors.palettes to make sure everything looks great. This is a high impact task that should be relatively easy!

  • Check over the colors in the palettes notebook and make sure that the start of the color palette (for unordered color maps, e.g. not Paired or Sequential) has the following rough sequence: blue, green, red, maroon, yellow, cyan and if they're longer than 6 long, then black (anything else can follow).
  • Go to the colorbrewer page and make sure that all the palettes there are represented.
  • Create a sequential colormap list with mplcol.ListedColormap as follows: add a new variable called _COLOR_MAPS with named lists of the color maps, then create a new top-level variable COLOR_MAPS that converts the raw lists into ListedColormap objects (i.e. a dictionary of names to ListedColormap objects). Alternatively, simply create a function color_sequence that behaves like color_palette but returns a ListedColormap.

Happy to discuss this and any questions.

Multiplot API

Seaborn has a FacetGrid - do we need something similar for generating multiple plots?

Most informative features

Create a visualization that inspects or compares the L1 vs. L2 regularizers by showing the relative weights of each feature for each norm.

Colors and rcparams

We have moved the Seaborn matplotlib rc params modifications over to yellowbrick; however, I'm going to constrain this as follows:

  • for font/size/style etc I will use the "notebook context" as the unchangeable default and pass it back to the user to make other contextual changes (without yellowbrick help); we can bring this back later if needed.
  • for colors I'm going all in on the ColorPalette object, which I think is awesome.
  • rcmod will happen on import for the notebook style and the default colors.

Add InformativeFeatures class

Need to add a class to enable the user to evaluate the features that were most informative for a fitted model. This class will inherit from ScoreVisualizer.

Bad link in README

In the README, the link that should lead to the "Quick Start Guide" in the "Using Yellowbrick" section is broken.

Feature Selection Visualization

How do we adapt radviz, parallel coords, splom, etc. so that our tool is better/more capable than seaborn or pandas?

One idea is to perform automatic feature analysis and show which features are most visually relevant to the models. In that case we would pull the relevant code from the libraries and adapt it to our use.

Getting Started - README

We need documentation in the README that discusses how to get started with Yellowbrick. We're hoping for an end-to-end walkthrough that starts with pip install and then discusses how to get set up for using the library in other code bases, particularly Jupyter notebooks (matplotlib inline), but also how to get started doing development on the library.

Single feature analysis tools (augmenting Rank1D)

Radviz, parallel coordinates, and Rank2D are already implemented. We should expand the feature visualization toolset to include single feature analysis tools:

  • boxplots
  • histograms
  • violinplots

Create a SingleFeatureVisualizer class whose poof method takes as input the name(s) or position(s) of one of the features. Display the appropriate visualization allowing multiple poof calls with different features. Make sure to integrate this with Rank1D.

Open questions: do we need to store the data? Or can we create all visualization and then display on demand, or is there some middle ground?

Note: similar to parallel coordinates and radviz, these need to be native implementations, emulating other libs (seaborn, pandas) as necessary.
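
As a rough illustration of selecting a feature by name or position and then summarizing it natively, the sketch below picks one column and computes equal-width histogram counts using only the standard library. The names `select_feature` and `histogram_counts` are hypothetical, and no drawing is attempted.

```python
def select_feature(X, feature, names=None):
    """Return one column of a row-major table, chosen by name or by position."""
    if isinstance(feature, str):
        if names is None:
            raise ValueError("feature names are required to select by name")
        feature = names.index(feature)
    return [row[feature] for row in X]


def histogram_counts(values, bins=4):
    """Count values in equal-width bins spanning min(values) to max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid a zero width for constant data
    counts = [0] * bins
    for value in values:
        index = min(int((value - low) / width), bins - 1)  # the maximum lands in the last bin
        counts[index] += 1
    return counts


# Hypothetical data: two features, five instances.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
names = ["a", "b"]
print(histogram_counts(select_feature(X, "a", names)))
print(select_feature(X, 1))
```

Computing the summary on demand from the stored data is one answer to the open question above; caching the counts per feature would be the middle ground.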

Style Management

Create a scheme similar to Seaborn for creating a unified style and context system that sits on top of mpl.RCParams (so that every plot generated by our tool looks the same).
