
topmodel

topmodel is a service for evaluating binary classifiers. It comes with built-in metrics and comparisons so that you don't have to build your own from scratch.

You can store your data either locally or in S3.

Metrics

Here are the graphs topmodel will give you for any binary classifier:

Precision/recall curve

[Figure: precision/recall curve]
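As a rough illustration of what this curve plots, here is a minimal sketch that computes precision and recall at each distinct score threshold. The function name `precision_recall_points` is hypothetical and is not part of topmodel; the sample rows are the ones from the TSV example further down.

```python
def precision_recall_points(actuals, scores):
    """Return (threshold, precision, recall) for each distinct score.

    An item counts as predicted positive when its score is >= threshold.
    Hypothetical helper for illustration, not topmodel's own code.
    """
    total_positives = sum(actuals)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Labels of every item the threshold would flag as positive.
        flagged = [a for a, s in zip(actuals, scores) if s >= threshold]
        true_positives = sum(flagged)
        precision = true_positives / len(flagged)
        recall = true_positives / total_positives if total_positives else 0.0
        points.append((threshold, precision, recall))
    return points


actuals = [0, 1, 1, 0]
scores = [0.2, 0.8, 0.7, 0.3]
points = precision_recall_points(actuals, scores)
for threshold, precision, recall in points:
    print(threshold, round(precision, 3), round(recall, 3))
```

Plotting recall on one axis and precision on the other for these points gives the curve.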

ROC (Receiver operating characteristic) curve

[Figure: ROC curve]
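For readers who want to see where the points on a ROC curve come from, the following sketch computes the false positive rate and true positive rate at each threshold, plus the area under the curve by the trapezoid rule. The names `roc_points` and `area_under_curve` are hypothetical, and the sketch assumes the data contains at least one positive and one negative label.

```python
def roc_points(actuals, scores):
    """Return (false_positive_rate, true_positive_rate) pairs, starting at (0, 0).

    Assumes at least one positive and one negative label.
    Hypothetical helper for illustration, not topmodel's own code.
    """
    positives = sum(1 for a in actuals if a)
    negatives = len(actuals) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actuals, scores) if s >= threshold and a)
        fp = sum(1 for a, s in zip(actuals, scores) if s >= threshold and not a)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


actuals = [0, 1, 1, 0]
scores = [0.2, 0.8, 0.7, 0.3]
curve = roc_points(actuals, scores)
auc = area_under_curve(curve)
```

In this toy data every positive outscores every negative, so the curve hugs the top-left corner.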

We also use bootstrapping to show the uncertainty on ROC curves and precision/recall curves. Here's an example:

[Figure: ROC curve with bootstrapping]
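The general idea of bootstrapping can be sketched as follows: resample the rows with replacement many times, recompute the metric on each resample, and look at the spread. This sketch applies it to the area under the ROC curve as a single summary number rather than to the full curve, and it is an assumption about the general technique, not a description of topmodel's own resampling code. The function names and parameters (`n_resamples`, `alpha`, `seed`) are hypothetical.

```python
import random


def auc(actuals, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for a, s in zip(actuals, scores) if a]
    neg = [s for a, s in zip(actuals, scores) if not a]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(actuals, scores, n_resamples=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (illustrative sketch only)."""
    rng = random.Random(seed)
    n = len(actuals)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        sample_actuals = [actuals[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(1 for a in sample_actuals if a) < n:
            estimates.append(auc(sample_actuals, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_resamples * alpha / 2)]
    upper = estimates[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


actuals = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]
lower, upper = bootstrap_auc_interval(actuals, scores)
print(lower, upper)
```

A wide interval suggests the evaluation set is too small to pin the metric down.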

Marginal precision

The idea here is that among all items with score 0.9, you expect 90% of them to be in the target group (marked 'True'). This graph compares the expected rate to the actual rate -- the closer it is to a straight line, the better.

[Figure: marginal precision]
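One way to compute the comparison described above is to bin items by score and compare each bin's mean score (the expected rate) with its fraction of 'True' items (the actual rate). This is a minimal sketch that assumes scores lie in [0, 1] and uses equal-width bins; the function name `marginal_precision` and the `n_bins` parameter are hypothetical, and topmodel's own binning may differ.

```python
def marginal_precision(actuals, scores, n_bins=10):
    """Return (expected_rate, actual_rate, count) for each non-empty score bin.

    Assumes scores are in [0, 1]. Illustrative sketch, not topmodel's own code.
    """
    bins = [[] for _ in range(n_bins)]
    for a, s in zip(actuals, scores):
        index = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[index].append((s, a))
    rows = []
    for items in bins:
        if items:
            expected = sum(s for s, _ in items) / len(items)
            actual = sum(1 for _, a in items if a) / len(items)
            rows.append((expected, actual, len(items)))
    return rows


# Ten items scored 0.9, nine of which are in the target group.
actuals = [1] * 9 + [0]
scores = [0.9] * 10
rows = marginal_precision(actuals, scores)
```

Here the single bin has an expected rate of about 0.9 and an actual rate of 0.9, so it would sit on the straight line.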

Brier decomposition

This is a set of metrics that measure, among other things, how close the marginal precision is to a straight line. Read more about decomposing the Brier score.

[Figure: Brier decomposition]
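For reference, the standard (Murphy) decomposition splits the Brier score into reliability, resolution, and uncertainty, with Brier = reliability - resolution + uncertainty. The reliability term is the one that measures distance from the straight line in the marginal precision graph. The sketch below groups items by exact score value, which makes the identity hold exactly. It is an illustration of the textbook formula, and the function name is hypothetical; it may not match how topmodel bins or reports the terms.

```python
from collections import defaultdict


def brier_decomposition(actuals, scores):
    """Return (brier, reliability, resolution, uncertainty).

    Items are grouped by exact score value. Illustrative sketch only.
    """
    n = len(actuals)
    base_rate = sum(actuals) / n
    groups = defaultdict(list)
    for a, s in zip(actuals, scores):
        groups[s].append(a)
    # Reliability: how far each score is from the observed rate at that score.
    reliability = sum(len(g) * (s - sum(g) / len(g)) ** 2
                      for s, g in groups.items()) / n
    # Resolution: how far each group's observed rate is from the overall base rate.
    resolution = sum(len(g) * (sum(g) / len(g) - base_rate) ** 2
                     for g in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    brier = sum((s - a) ** 2 for a, s in zip(actuals, scores)) / n
    return brier, reliability, resolution, uncertainty


actuals = [0, 1, 1, 0, 1, 0]
scores = [0.2, 0.8, 0.8, 0.2, 0.2, 0.8]
brier, reliability, resolution, uncertainty = brier_decomposition(actuals, scores)
```

Lower reliability and higher resolution are both better; uncertainty depends only on the data.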

Score distribution

Plots the distribution of scores for all instances alongside the distribution for only the instances labelled 'True'.

[Figure: score frequencies]
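The two distributions can be thought of as a pair of histograms over the same bins. A minimal sketch, assuming scores in [0, 1] and equal-width bins (the function name `score_histogram` and the `n_bins` parameter are hypothetical):

```python
def score_histogram(actuals, scores, n_bins=10):
    """Return (all_counts, true_counts): per-bin counts for all items and for 'True' items.

    Assumes scores are in [0, 1]. Illustrative sketch, not topmodel's own code.
    """
    all_counts = [0] * n_bins
    true_counts = [0] * n_bins
    for a, s in zip(actuals, scores):
        index = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        all_counts[index] += 1
        if a:
            true_counts[index] += 1
    return all_counts, true_counts


actuals = [0, 1, 1, 0]
scores = [0.2, 0.8, 0.7, 0.3]
all_counts, true_counts = score_histogram(actuals, scores, n_bins=5)
```

For a classifier that separates the classes well, the 'True' counts concentrate in the high-score bins.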

Using topmodel locally

topmodel comes with example data so you can try it out right away. Here's how:

  1. Create a virtualenv

  2. Install the requirements: pip install -r requirements.txt

  3. Start a topmodel server:

    ./topmodel_server.py
    
  4. topmodel should now be running at http://localhost:9191.

  5. See a page of metrics for some example data at http://localhost:9191/model/data/test/my_model_name/

You can now add new models for evaluation! (see "How to add a model to topmodel" below for more)

Using topmodel with S3

It's better to store your model data in an S3 bucket, so that you don't lose it. To get this working:

Create a config.yaml file:

cp config_example.yaml config.yaml

and fill it in with the S3 bucket you want to use and your AWS secret key and access key. topmodel will automatically find models in the bucket as long as they're named correctly (see "How to add a model to topmodel").

Then start topmodel with

./topmodel_server.py --remote

How to add a model to topmodel

  1. Create a TSV with columns 'pred_score' and 'actual'. Save it to your_model_name.tsv. The columns should be separated by tabs. In each row:

    • actual should be 0 or 1 (True/False also work)
    • pred_score should be the score the model assigned to that row.
    • See the examples in example_data/
    • For example:
    actual	pred_score
    False	0.2
    True	0.8
    True	0.7
    False	0.3
    
  2. Copy the TSV to S3 at s3://your-s3-bucket/your_model_name/scores.tsv, or locally to data/your_model_name/scores.tsv

  3. You're done! Your model should appear at http://localhost:9191/ if you reload.
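If your scores live in Python, the TSV from step 1 can be written with the standard `csv` module. This is a minimal sketch: the helper name `write_scores_tsv` is hypothetical, and the target path follows the local layout from step 2 with `your_model_name` as a placeholder.

```python
import csv
from pathlib import Path


def write_scores_tsv(rows, path):
    """Write (actual, pred_score) rows as a tab-separated file with a header.

    Hypothetical helper for illustration, not part of topmodel.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["actual", "pred_score"])
        writer.writerows(rows)
    return path


rows = [(False, 0.2), (True, 0.8), (True, 0.7), (False, 0.3)]
# Placeholder model name; replace with your own.
tsv_path = write_scores_tsv(rows, "data/your_model_name/scores.tsv")
```

For S3, the same file would then be uploaded to s3://your-s3-bucket/your_model_name/scores.tsv with whatever tool you normally use.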

Developing topmodel

We'd love for you to contribute. If you run topmodel with

./topmodel_server.py --development

it will autoreload code.

There's example data to test on in data/test.

Authors

License

Copyright 2014 Stripe, Inc

Licensed under the MIT license.

Contributors

julia-stripe, jvns, ryan-stripe, geohot
