machine-march-madness

DEPENDENCIES:
theano: http://deeplearning.net/software/theano/
numpy
scipy
(optional) matplotlib: http://matplotlib.sourceforge.net/

Note: maccam912 reports that the Theano link above was down as of Feb 28, 2012, but that Theano can still be installed with 'easy_install Theano' or downloaded from http://pypi.python.org/pypi/Theano or from the repo on GitHub.


DATA:
Right now, only the aggregate data is really being used.  To check that 
you can load the data properly, run

> python march_madness_data.py

This should output the following:
Skipped 1426 entries due to UNK
After loading simple data
2006-2007: 5125 games
2007-2008: 5248 games
2008-2009: 5332 games
2009-2010: 5363 games
After removing tournament games
2006-2007: 5047 games
2007-2008: 5161 games
2008-2009: 5237 games
2009-2010: 5269 games

That is, it loads roughly 5,000 games from each of the four past seasons.
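
As a quick sanity check on these numbers, here is a minimal sketch that computes how many games were dropped per season by the tournament-removal step. The counts are copied from the expected output above; the function name games_removed is made up for this example and is not part of march_madness_data.py.

```python
# Game counts copied from the expected output of march_madness_data.py.
loaded = {
    "2006-2007": 5125,
    "2007-2008": 5248,
    "2008-2009": 5332,
    "2009-2010": 5363,
}
after_removal = {
    "2006-2007": 5047,
    "2007-2008": 5161,
    "2008-2009": 5237,
    "2009-2010": 5269,
}


def games_removed(before, after):
    """Return, per season, how many games were dropped between two counts."""
    return {season: before[season] - after[season] for season in before}


removed = games_removed(loaded, after_removal)
for season in sorted(removed):
    print("%s: %d games removed" % (season, removed[season]))
```

Run as written, this reports 78, 87, 95, and 94 games removed for the four seasons.  If your own run of march_madness_data.py gives different differences, the data files probably differ from the ones used here.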

If you'd like to dig into the full data, look for 

def load_full_data(self):

in march_madness_data.py.


BRACKET:
We don't have data that specifically identifies which games were part
of the tournament, so we identify them programmatically.  Most of the code
is called automatically when you create a MarchMadnessData object.  To see
the results, you can run

> python bracket.py

This should output the filled-in tournament bracket for previous seasons.
It should look like this:

2008-2009
nav---nav---nav---nav---nav---nav
raa     |     |     |     |     |   
lav---lav     |     |     |     |   
bav           |     |     |     |   
              |     |     |     |   
gaj---gaj---gaj     |     |     |   
aac     |           |     |     |   
wao---wao           |     |     |   
iae                 |     |     |   
                    |     |     |   
oae---oae---oae---oae     |     |   
mbq     |     |           |     |   
max---max     |           |     |   
cbg           |           |     |   
              |           |     |   
sci---sci---sci           |     |   
scc     |                 |     |   
aar---aar                 |     |   
tad                       |     |   

...

The mapping from three-letter codes to team names is in ./data/YahooTeamCodeMapping.csv.
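
To turn the codes in the bracket into team names, something like the sketch below might work.  It ASSUMES the CSV has two columns (code, then name) and no header row, which has not been checked against the real file, so adjust the column indices if the layout differs.  The function name load_code_mapping and the sample rows are made up for illustration.

```python
import csv
import io


def load_code_mapping(fileobj):
    """Read (code, name) rows into a dict.

    Assumption: column 0 is the three-letter code and column 1 is the
    team name, with no header row.
    """
    mapping = {}
    for row in csv.reader(fileobj):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        code, name = row[0].strip(), row[1].strip()
        mapping[code] = name
    return mapping


# Placeholder rows for illustration only; the team names are invented.
sample = io.StringIO(u"nav,Example Team A\nraa,Example Team B\n")
codes = load_code_mapping(sample)
print(codes["nav"])
```

With the real file, you would pass open("./data/YahooTeamCodeMapping.csv") in place of the in-memory sample, once you have confirmed the column order.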


LEARNING:
There is also starter code for learning, but this is still in progress.

The simplest thing to try is to run

> python learn_synthetic.py

This runs learning with the simplest model on synthetic data.  The first run
takes a bit longer at startup because Theano is doing the symbolic differentiation.


You can then move on to

> python learn_real.py

This runs learning on the real data instead.  This code is not finished, but it
should provide enough structure to get you started.

In model.py, you can see three models of increasing complexity.  You can select
between these in the learn_*.py scripts.


Some TODOs for the ambitious:

- Load the full data and verify it against the aggregate data.

- Set up a proper validation/testing framework, so we can evaluate different methods
  properly.  Perhaps we want to do leave-one-out cross validation.

- Try different objective functions in the Theano models -- what do we actually
  want to optimize?

- Think about how to better include the pace of the game.

- Improve the optimization (maybe using momentum, L-BFGS, or conjugate gradients?)
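
As a starting point for the validation TODO above, here is a minimal sketch of leave-one-out cross validation at the season level: hold out one season, train on the other three, and rotate.  Splitting by season (rather than by game) is an assumption about what would be useful here, not something the existing code does.  The evaluate callback is hypothetical; it stands in for whatever fits a model and scores it.

```python
SEASONS = ["2006-2007", "2007-2008", "2008-2009", "2009-2010"]


def leave_one_season_out(seasons):
    """Yield (training_seasons, held_out_season) pairs, one per season."""
    for held_out in seasons:
        training = [s for s in seasons if s != held_out]
        yield training, held_out


def cross_validate(seasons, evaluate):
    """Average a score over all folds.

    `evaluate(training, held_out)` is a hypothetical callback that fits a
    model on the training seasons and returns a score on the held-out one.
    """
    scores = [evaluate(training, held_out)
              for training, held_out in leave_one_season_out(seasons)]
    return sum(scores) / float(len(scores))


folds = list(leave_one_season_out(SEASONS))
for training, held_out in folds:
    print("train on %s, test on %s" % (", ".join(training), held_out))
```

With only four seasons this gives four folds, which is cheap enough to rerun for each of the models in model.py.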


CONTRIBUTORS:
dtarlow, jaspersnoek
