nba-data-science's Introduction

Predicting Game Outcomes from the 2019 NBA Season

Date: 06/18/21

Team: Bryce Randolph, Jeannie Davis, Harsandeep Singh, Kevin Robell


Introduction

Our group wanted to find out how well we could predict the outcome of NBA games. We used an NBA games dataset that we found on Kaggle, which contains game information including statistics, teams, and dates from 2004-2020. Our hope is to create a reasonably accurate model and to look at how effective different in-game counting stats are at predicting the winner.

Selection of Data

Our data comes from the games.csv file of this NBA games dataset, restricted to the 2019 season. We made the program modular enough that other seasons can also be analyzed, but only a single season is used at a time to stay within the goals of the project and avoid having to deal with time series. We chose the 2019 season because it is the most recent full season included in the dataset. We first dropped all irrelevant predictors, including GAME_DATE_EST, GAME_STATUS_TEXT, HOME_TEAM_ID, VISITOR_TEAM_ID, SEASON, TEAM_ID_home, PTS_home, PTS_away and TEAM_ID_away. We made sure to remove the points scored by both teams because that information would make the outcome of the game obvious. Instead, we focus on the in-game counting stats from each game, which include rebounds, field goal percentage, and assists. The target we are trying to predict is the winner of the game, represented by HOME_TEAM_WINS. This is a binary column where 1 means the home team won and 0 means they lost.
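The selection step described above could look roughly like the sketch below. It is not the project's actual code: the function name select_season, the stat column names (REB_home, AST_home and so on), and the tiny demo frame standing in for games.csv are all assumptions made for illustration.

```python
import pandas as pd

# Columns the write-up lists as dropped before modeling.
DROPPED_COLUMNS = [
    "GAME_DATE_EST", "GAME_STATUS_TEXT", "HOME_TEAM_ID", "VISITOR_TEAM_ID",
    "SEASON", "TEAM_ID_home", "PTS_home", "PTS_away", "TEAM_ID_away",
]
TARGET = "HOME_TEAM_WINS"


def select_season(games: pd.DataFrame, season: int = 2019):
    """Return (X, y) for a single season: predictors and the binary target.

    Hypothetical helper; the season is a parameter so other seasons can be
    analyzed one at a time.
    """
    one_season = games[games["SEASON"] == season]
    # errors="ignore" tolerates columns that are absent from this frame.
    one_season = one_season.drop(columns=DROPPED_COLUMNS, errors="ignore")
    y = one_season[TARGET]
    X = one_season.drop(columns=[TARGET])
    return X, y


# Made-up rows standing in for games.csv (assumed column names, not real data).
demo = pd.DataFrame({
    "SEASON": [2019, 2019, 2018],
    "PTS_home": [110, 98, 105],
    "PTS_away": [102, 104, 99],
    "REB_home": [45, 38, 41],
    "REB_away": [40, 44, 39],
    "AST_home": [25, 19, 22],
    "AST_away": [21, 24, 20],
    "HOME_TEAM_WINS": [1, 0, 1],
})

X, y = select_season(demo, season=2019)
print(list(X.columns))
print(y.tolist())
```

With the real file, the frame would instead come from something like pd.read_csv("games.csv"), and the same call would return the predictors and target for whichever season is passed in.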

Methods

The machine learning methods we tested were KNN classification and linear regression. As we predicted, KNN classification gave better results in early testing, so we stuck with that method for a while. However, after more tweaking of the code, we found that linear regression was slightly better. After much testing, our final choice of machine learning method was linear regression.

Tools used:

  • NumPy, pandas, Matplotlib, and Seaborn for data analysis and visualization
  • Scikit-learn for inference
  • GitHub and Google Drive for group collaboration and version control

The Models

  • KNN Classification: We tested k values from 1 to 30 to find the value that gave us the lowest error rate.
  • Linear Regression: We predicted the home team's point differential and then converted that prediction into a win/loss prediction.
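The linear regression approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the features are synthetic stand-ins rather than real NBA stats, and the rule "predicted differential above zero means a home win" is our assumption about how the conversion might be done.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_games = 400

# Synthetic per-game features (not real NBA data) and a noisy point differential.
X = rng.normal(size=(n_games, 3))
point_diff = X @ np.array([6.0, 4.0, 3.0]) + rng.normal(scale=5.0, size=n_games)
home_team_wins = (point_diff > 0).astype(int)

X_train, X_test, diff_train, diff_test, wins_train, wins_test = train_test_split(
    X, point_diff, home_team_wins, test_size=0.25, random_state=0
)

# Regress on the point differential (home points minus away points).
reg = LinearRegression().fit(X_train, diff_train)
predicted_diff = reg.predict(X_test)

# Assumed conversion rule: a positive predicted differential is a home win.
predicted_wins = (predicted_diff > 0).astype(int)
accuracy = float((predicted_wins == wins_test).mean())
print(f"accuracy on held-out games: {accuracy:.2f}")
```

Regressing on the differential and thresholding afterwards lets the model use how lopsided each game was during training, while still being scored on the same win/loss target as the classifier.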

Results

We were able to get the error rate quite low (less than 17%) and ended up with an accuracy rate of more than 80% in predicting the winner of a given NBA game, so we met our initial goal of a reliable and accurate prediction. We did this using the stats of a game that had already occurred, so we won’t be making big money on sports betting anytime soon, but the proof of concept is there. The results are shown in the following confusion matrices.

KNN Classification Results

[Figure: KNN classification confusion matrix]

Linear Regression Results

[Figure: Linear regression confusion matrix]
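For readers unfamiliar with the format, here is a small sketch of how a confusion matrix and an error rate can be computed with scikit-learn. The labels below are made up for illustration and are not the project's results.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels: 1 = home team won, 0 = home team lost.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 0, 1]

# Rows are the true classes (0, then 1); columns are the predicted classes.
matrix = confusion_matrix(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)
error_rate = 1.0 - accuracy

print(matrix)                # [[3 1]
                             #  [1 5]]
print(round(error_rate, 2))  # 0.2
```

The diagonal holds the correct predictions (3 home losses and 5 home wins here), and the off-diagonal cells hold the two kinds of mistakes.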

Discussion

Plotting the error rate against K (the elbow method), we found that a K value of at least 28 worked best, as it produced the lowest error rate in our testing. We initially thought that KNN classification would be the best method to use, but later switched to linear regression after comparing a few confusion matrices. We also did some digging online, specifically on GitHub, and found that several others have attempted a similar project with worse results. We believe that how we used the data, specifically using one game's data at a time rather than averaging season stats and comparing them, is what led to the high degree of accuracy in our project.

[Figure: Best K value graph]
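The search over K can be sketched as below. This is an illustration on synthetic data, not the project's code or results; the feature matrix and labels are generated at random, so the best K it reports says nothing about the NBA dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in data: 4 features and a noisy binary outcome.
X = rng.normal(size=(500, 4))
signal = X @ np.array([1.0, 0.8, 0.5, 0.3])
y = (signal + rng.normal(scale=0.7, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Fit one KNN classifier per k from 1 to 30 and record its test error rate.
error_rates = {}
for k in range(1, 31):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error_rates[k] = float(np.mean(knn.predict(X_test) != y_test))

best_k = min(error_rates, key=error_rates.get)
print(f"lowest error rate {error_rates[best_k]:.3f} at k={best_k}")
```

Plotting error_rates against k (for example with Matplotlib) gives a curve like the one in the graph above, from which the point where the error stops improving can be read off.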

Summary

This NBA game winner prediction project was a success in our eyes, as we accomplished what we initially set out to do. Using predictors like rebounds, field goal percentage, and assists, we applied two supervised machine learning methods, linear regression and KNN classification, to determine the winner of a given NBA game in the included dataset. We believe that with more tuning and tinkering, we could lower the error rate further. That tuning would involve eliminating predictors that don't improve the accuracy of the models and adding polynomial features.
