

Workflows_Group_306

  • Authors: Frank Lu, Simardeep Kaur, Tani Barasch

About

In this project we attempt to predict NFL game winners using two classification algorithms, random forest and logistic regression, in order to test the hypothesis that ELO ratings can be used to predict game outcomes, as presented by the website FiveThirtyEight.com in their 'NFL Prediction Game'.

The website FiveThirtyEight launched a "prediction game" for NFL games this year, in which readers assign a probability to each team's winning chances for every game, competing against FiveThirtyEight's ELO-based prediction system.

ELO is a relative rating system in which performing better than the ELO rating would predict leads to an increase in rating, and performing worse than expected leads to a decrease; predictions and expectations are based on the difference between two players' or teams' ELO ratings. A more general explanation can be found here, in a video by Singingbanana on YouTube.
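
For reference, here is a minimal Python sketch of the basic ELO update. The K-factor of 20 is purely illustrative; FiveThirtyEight's NFL model layers further adjustments (home-field advantage, margin of victory, and so on) on top of this core formula:

    # Core ELO formulas (illustrative sketch only).

    def elo_expected(rating_a, rating_b):
        """Expected score of A against B: 1 for a win, 0.5 for a tie, 0 for a loss."""
        return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

    def elo_update(rating, expected, actual, k=20):
        """New rating after a game, moved toward the actual result."""
        return rating + k * (actual - expected)

    # Example: a 1600-rated team beats a 1500-rated team.
    expected = elo_expected(1600, 1500)         # ~0.64
    new_rating = elo_update(1600, expected, 1)  # ~1607.2, a gain of ~7 points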

To test whether or not ELO is a valid method for predicting NFL game outcomes, we use historic ELO ratings to predict the winners of the games in the recently concluded 2019-2020 season. On this data we train two machine learning classification algorithms, logistic regression and random forest, to test the validity of ELO ratings as a predictor; a minimal sketch of the approach appears after the data description below.

The data for this project is the same data FiveThirtyEight uses to generate their own predictions, which can be found in this GitHub repo.
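
As a rough illustration of the approach described above, here is a minimal Python sketch; it is not the project's actual script, the file path is an assumption, and the column names (season, elo1_pre, elo2_pre, score1, score2) follow FiveThirtyEight's published nfl_elo.csv but should be verified against the downloaded data:

    # Minimal sketch: predict the home team winning from the pre-game ELO gap.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Drop games that have not been played yet (no final score).
    games = pd.read_csv("data/nfl_elo.csv").dropna(subset=["score1", "score2"])
    X = (games["elo1_pre"] - games["elo2_pre"]).to_frame("elo_diff")
    y = (games["score1"] > games["score2"]).astype(int)  # 1 = home team won

    train = games["season"] < 2019  # train on history, test on the 2019 season
    for model in (LogisticRegression(), RandomForestClassifier(n_estimators=100)):
        model.fit(X[train], y[train])
        acc = accuracy_score(y[~train], model.predict(X[~train]))
        print(type(model).__name__, round(acc, 3))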

Report

The final report Markdown can be found here.

Usage

To run this analysis, clone this GitHub repo, install the dependencies listed below, and run the commands in the Scripts section from the root directory of this project in a terminal.

Dependencies

  • Python 3.7.3 and Python packages:
    • docopt==0.6.2
    • requests==2.22.0
    • pandas==0.24.2
    • scikit-learn==0.22.1
  • R version 3.6.1 and R packages:
    • knitr==1.26
    • tidyverse==1.2.1
  • Ubuntu 18.04 LTS
    • GNU Make==4.1

Scripts

To run with a Docker container:

  • First, clean all intermediate files from any previous run by running the command below:
docker run --rm -e PASSWORD="ppp" -p 8787:8787 -v $(pwd):/mnt tbarasch/g306_522 make -C /mnt clean
  • Then create the report by running the command below from the terminal:
docker run --rm -e PASSWORD="ppp" -p 8787:8787 -v $(pwd):/mnt tbarasch/g306_522 make -C /mnt all

To run on your own system:

  • Clone this repo, and run the command below from the repo root in a terminal:
make all
  • To clean all the intermediate files, run the command below:
make clean

References

de Jonge, Edwin. 2018. Docopt: Command-Line Interface Specification Language. https://CRAN.R-project.org/package=docopt.

FiveThirtyEight. 2019. “FiveThirtyEight Data Repository.” https://data.fivethirtyeight.com/.

Keleshev, Vladimir. 2014. Docopt: Command-Line Interface Description Language. https://github.com/docopt/docopt.

McKinney, Wes. 2010. “Data Structures for Statistical Computing in Python.” In Proceedings of the 9th Python in Science Conference, edited by Stéfan van der Walt and Jarrod Millman, 51–56.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. “Scikit-learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30.

R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Van Rossum, Guido, and Fred L. Drake. 2009. Python 3 Reference Manual. Scotts Valley, CA: CreateSpace.

Wickham, Hadley. 2017. Tidyverse: Easily Install and Load the 'Tidyverse'. https://CRAN.R-project.org/package=tidyverse.

Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman & Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.


Issues

Milestone 2 General Feedback

  • Good test coverage in your scripts! A few functions are still missing tests, though.
  • Your final report draft is good; the question, analysis, and results are clear. But elaborate more on future directions: which models could be more robust?

milestone 4 work

  • Makefile graph - added by Simar
  • Create a Docker image for the whole project
  • Address feedback

Visualizations Feedback!

  • Below are some thoughts on the visualizations in this report, with suggestions for improvement and notes on what was done well. You should implement these in milestone 4.
    • Great job adding figure numbers and captions! Your captions could have been a bit more informative, but overall this is very good practice. Also, note that both figures are currently called Figure 1.
    • Figure 1
        • This is a really effective plot, great job!
        • I would make the plot a bit smaller and the text size a bit bigger, so it's proportionate to the rest of your report.
    • Figure 2
        • Consider combining Figures 1 and 2 side by side, labelled A and B, so they are easier to interpret and read and take up less space.
    • I think your figures are very reasonable for your project and research question. Nice work!

Is ELO really THAT bad?

We find that both models achieve similar results, with 0.771% accuracy for the logistic regression and 0.767% accuracy for the random forest classifier. Overall this is a pretty unreliable prediction model, casting doubt over the method presented by FiveThirtyEight.

0.8%? That is pretty low... are you sure there isn't a typo in your report?

Repo structure

You can improve your repo structure (for example, you have CSV files and figures in your scripts folder).

Marmap library

After re-running your EDA script, it turns out you load the marmap library without ever using it.

Code - commenting

An ongoing point: try to write comments into your code as you go. It's pretty hard to read someone else's code if you don't know exactly what is going on, and comments can help a lot.

Peer feedback: add precision and recall

Since the objective is to find out whether ELO can be a good predictor of game results, and we have 3 classes, it would be clearer for readers if we showed precision, and perhaps recall as well.
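
For example, a minimal scikit-learn sketch of what this could look like; the label vectors are dummy stand-ins for the held-out labels and the classifier's predictions:

    # Sketch: report precision and recall alongside accuracy.
    from sklearn.metrics import classification_report

    # Dummy stand-ins; classification_report handles a third class (e.g. ties) too.
    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1]
    print(classification_report(y_true, y_pred, digits=3))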

Milestone 3 General Feedback

Great job overall! Fix all remaining issues for milestone 4, and ensure your analysis runs on all computers before submitting.

  • Missing tests and author info in eda.py

Some details about your report:

  • Your equations aren't rendering properly.
  • As discussed in the lab, elaborate on the discussion and future directions (why is the ELO rating not a good predictor?)

Peer Feedback

  • Since the data is based on football and the ELO ratings tied to it, it would be a good idea to add some information about how the ELO rating relates to the game.

Final Report Format

Always render a .pdf or .md version of your report so it can be read on GitHub.

Final Feedback

Great work overall!

Mechanics:
Your dependency graph is not in your README. Your use of issues was very basic.

Quality:
Missing a description at the top of the Dockerfile listing the authors and explaining what it does.

Vis:
Most feedback was not addressed.

Writing:
Equations still did not render properly.

Milestone 1 Feedback: minor fixes

  • Proofreading is required to correct grammar mistakes and typos.
  • Pay attention to communicating through Github issues and writing more distinct commit messages.

Communication

Use GitHub issues to communicate and close after addressing them.

Milestone 3 Preliminary feedback

  • No big issues that need to be fixed for Milestone 4, good job!
  • Remember to use issues for communication on this last milestone, and close existing issues after addressing them.
  • Elaborate on your conclusion/discussion, your future directions are still vague. Why could the suggested methods work better?
  • You used the wrong release link in your personal submission.
