Recent advances in technology have made analytics based on Machine Learning and Artificial Intelligence crucial in many areas, including sports. Large amounts of data are now collected and used with Machine Learning techniques to predict scores and future trends, inform managerial decisions and analyse player performance across various sports.
We apply these techniques to analyse and explore the Indian Premier League (IPL).
Our project has two aims:
- Predict the outcome of matches, i.e. the winner of the match, based on previous years’ data.
- Predict the score during a given interval of the game, for example overs 1-15, 1-10 or 3-8.
To accomplish this, we used algorithms such as Decision Trees, Linear Regression, Random Forest and Support Vector Machines.
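The second aim needs a target value for every interval of overs. The following is only a minimal sketch of how such a target might be computed from ball-by-ball data, not the code used in the notebooks. The function name `interval_score`, the field names `over` and `total_runs`, and the toy innings are assumptions made for illustration.

```python
from typing import Iterable, Mapping


def interval_score(deliveries: Iterable[Mapping[str, int]],
                   first_over: int, last_over: int) -> int:
    """Sum the runs scored in overs first_over..last_over (inclusive, 1-indexed)."""
    if first_over > last_over:
        raise ValueError("first_over must not exceed last_over")
    return sum(d["total_runs"] for d in deliveries
               if first_over <= d["over"] <= last_over)


# Toy innings with made-up numbers, used only to show the call.
toy_innings = [
    {"over": 1, "ball": 1, "total_runs": 4},
    {"over": 1, "ball": 2, "total_runs": 0},
    {"over": 2, "ball": 1, "total_runs": 6},
    {"over": 3, "ball": 1, "total_runs": 1},
    {"over": 4, "ball": 1, "total_runs": 2},
]

print(interval_score(toy_innings, 1, 2))  # runs in overs 1-2
print(interval_score(toy_innings, 3, 4))  # runs in overs 3-4
```

A regression model would then be fitted with these interval sums as the target.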
The file structure is as follows:
- Dataset/: This directory contains all the data files, stored weights and the embedding matrix.
  - Dataset/Embeddings/: Contains the embedding matrix.
    - emb_player_vec_dict: contains the embedding matrix and vector information.
    - emb_player_stoi_dict: contains the string-to-index mapping for player names.
    - del_to_emb_final: stores the mapping of player names from deliveries.csv to the embedding matrix.
    - emb_player_vec_dict: stores the embedding vector for each player name.
    - not_found: contains the list of players whose names differ between the two datasets.
  - Dataset/cricksheet_ipl_csv: Data collected from Cricsheet's website in .csv format.
  - Dataset/cricsheet_ipl_yaml: Data collected from Cricsheet's website in .yaml format.
  - Dataset/ipl_stats: Data scraped from the IPL's official website for season-wise player points.
  - Dataset/kaggle_data: contains matches.csv (match-wise information about every IPL season from 2008 to 2020) and deliveries.csv (delivery-wise information about every IPL match).
  - Dataset/Ball_by_ball.csv: contains a modified deliveries.csv with 7-dimensional vectors instead of player names.
- Reports/:
  - Project_Presentation.pptx: Project presentation
  - Project_Report.pdf: A detailed project report
- plots/: Folder containing plots created during Exploratory Data Analysis.
- Ball_by_Ball_Regression.ipynb: Code to perform regression on the ball-by-ball dataset. It predicts the score for each ball.
- Create_ball_by_ball_dataset.ipynb: Converts the categorical features (batsman, non-striker and bowler) in deliveries.csv into their corresponding vectors.
- Create_variable_balls_dataset.ipynb: Groups together a variable number of balls from deliveries.csv.
- EDA.ipynb: Contains the Exploratory Data Analysis performed on the collected data.
- HyperparameterTuning_Regression.ipynb: Regression-based models with hyperparameter tuning for the best model.
- Player_Embedding_Vectors.ipynb: Creates player embeddings based on the player-wise performance scraped from IPL's official website.
- match_classification.ipynb: Classification models with hyperparameter tuning for the match result prediction task.
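The conversion of player names into vectors described for Create_ball_by_ball_dataset.ipynb can be pictured roughly as below. This is a hedged sketch, not the notebook's code: the function `embed_delivery`, the in-memory shapes assumed for the name mapping, the string-to-index dictionary and the embedding matrix, and the placeholder players are all assumptions for illustration.

```python
from typing import Dict, List, Mapping, Sequence


def embed_delivery(row: Mapping[str, str],
                   name_map: Dict[str, str],
                   stoi: Dict[str, int],
                   matrix: Sequence[Sequence[float]],
                   columns: Sequence[str] = ("batsman", "non_striker", "bowler"),
                   ) -> List[float]:
    """Replace each player name in one delivery with its embedding vector."""
    features: List[float] = []
    for col in columns:
        # Translate the deliveries.csv spelling to the embedding spelling
        # (assumed role of del_to_emb_final); fall back to the name itself.
        emb_name = name_map.get(row[col], row[col])
        # Look up the row of the embedding matrix for this player.
        features.extend(matrix[stoi[emb_name]])
    return features


# Placeholder players and made-up 7-dimensional vectors.
stoi = {"Player A": 0, "Player B": 1, "Player C": 2}
name_map = {"A Player": "Player A"}  # spelling differs between the datasets
matrix = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
]
row = {"batsman": "A Player", "non_striker": "Player B", "bowler": "Player C"}

features = embed_delivery(row, name_map, stoi, matrix)
print(len(features))  # three players, seven values each
```

Concatenating the three vectors gives one fixed-length numeric row per delivery, which is the kind of input a regression model can consume directly.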
- Mudit Dhawan
- Samik Prakash
- Shivangi Dhiman