
predicting-football-match-outcome-using-machine-learning's Introduction

Predicting Football Match Outcome using Machine Learning

I used datasets from two sites for this project:
1. https://www.kaggle.com/hugomathien/soccer
2. http://football-data.co.uk/data.php

The Kaggle dataset was in SQLite format, but I was not able to upload the SQLite file, so I have uploaded CSV files for all of its tables.
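For readers who have the original SQLite file, the table-to-CSV export described above could look roughly like the following. This is a minimal sketch using only the standard library, not necessarily how the CSV files in this repository were produced; the function name `export_tables_to_csv` and the file names in the usage note are assumptions.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write every user table in an SQLite database to <out_dir>/<table>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists the schema; skip SQLite's internal tables.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )]
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            header = [col[0] for col in cursor.description]
            target = out_dir / f"{table}.csv"
            with open(target, "w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                writer.writerow(header)   # column names first
                writer.writerows(cursor)  # then every row of the table
            written.append(target)
    finally:
        conn.close()
    return written
```

A call such as `export_tables_to_csv("database.sqlite", "csv")` (both names hypothetical) would then produce one CSV file per table.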

This dataset has the tables Country, League, Match, Player, Player Attributes, Team, Team Attributes, and sequences. It covers more than 25,000 matches and 10,000 players across 11 European countries and their lead championships from 2008 to 2016, with player and team attributes sourced from EA Sports' FIFA video game series and betting odds from up to 10 providers.

I used this dataset to perform Exploratory Data Analysis.

Later I downloaded data from the football-data.co.uk website, which had even more relevant information, and used it to perform the prediction.

I applied Logistic Regression, Naive Bayes, and Support Vector Machine algorithms to the dataset; SVM gave the highest accuracy, 61.29%.
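A comparison of these three model families can be sketched with scikit-learn as below. This is only an illustration on synthetic data generated by `make_classification`, not the project's notebook: the features, the scaling step, and the choice of `GaussianNB` (picked here because the synthetic features can be negative) are all assumptions, and the scores it prints are unrelated to the 61.29% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for match features: three classes, like home/draw/away.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Naive Bayes": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Cross-validation is used here rather than a single split so that the three numbers are less sensitive to which rows happen to land in the test set.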

predicting-football-match-outcome-using-machine-learning's People

Contributors

prathameshtari

predicting-football-match-outcome-using-machine-learning's Issues

MultinomialNB Cannot accept negative Alpha

Hi,

I was trying to run your code on my machine with Python 3.7, and at the Multinomial Naive Bayes step I get the following error:

ValueError Traceback (most recent call last)
in
4 for i in range(-1000,1000,50):
5 clf1 = MultinomialNB(alpha=i)
----> 6 clf1.fit(X_train,y_train)
7 clf1.fit(X_train_2,y_train)
8 scores = cross_val_score(clf1, X_train, y_train, cv=10)

c:\users\johnm\appdata\local\programs\python\python37\lib\site-packages\sklearn\naive_bayes.py in fit(self, X, y, sample_weight)
609 dtype=np.float64)
610 self._count(X, Y)
--> 611 alpha = self._check_alpha()
612 self._update_feature_log_prob(alpha)
613 self._update_class_log_prior(class_prior=class_prior)

c:\users\johnm\appdata\local\programs\python\python37\lib\site-packages\sklearn\naive_bayes.py in _check_alpha(self)
471 if np.min(self.alpha) < 0:
472 raise ValueError('Smoothing parameter alpha = %.1e. '
--> 473 'alpha should be > 0.' % np.min(self.alpha))
474 if isinstance(self.alpha, np.ndarray):
475 if not self.alpha.shape[0] == self.feature_count_.shape[1]:

ValueError: Smoothing parameter alpha = -1.0e+03. alpha should be > 0.

It appears that MultinomialNB cannot accept a negative alpha value. How did you manage to run the code with a negative alpha?
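Since the error message states that alpha should be positive, one way around it is to sweep only positive smoothing values. The sketch below is an assumption about how such a sweep might look, not the author's fix: the data is random stand-in data, and the logarithmic grid replaces `range(-1000, 1000, 50)` from the traceback.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
# Hypothetical stand-in data: non-negative count features, three outcome classes.
X_train = rng.integers(0, 10, size=(300, 6))
y_train = rng.integers(0, 3, size=300)

# Sweep strictly positive smoothing values on a log scale: 0.001 ... 1000.
alphas = [float(a) for a in np.logspace(-3, 3, 7)]
results = {}
for alpha in alphas:
    clf = MultinomialNB(alpha=alpha)
    # Mean 10-fold cross-validated accuracy for this alpha.
    results[alpha] = cross_val_score(clf, X_train, y_train, cv=10).mean()

best_alpha = max(results, key=results.get)
print(f"best alpha: {best_alpha:g}, mean CV accuracy: {results[best_alpha]:.3f}")
```

Because the labels here are random, the accuracies are near chance; the point is only that every alpha in the grid passes the library's validation.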

Error in Code

Hi, I cannot get the proper table.Team column. The column shows all True values instead of each team name. What should I do? Please help me. I have attached a screenshot below.

Screenshot (14)

Error

In [38]:
#Extract necessary features from the data file
feature_table = df.iloc[:,:23]
print(table)

#Full Time Result(FTR), Home Shots on Target(HST), Away Shots on Target(AST), Home Corners(HC), Away Corners(AC)
feature_table = feature_table[['HomeTeam','AwayTeam','FTR','HST','AST','HC','AC']]
print(feature_table)
#Home Attacking Strength(HAS), Home Defensive Strength(HDS), Away Attacking Strength(AAS), Away Defensive Strength(ADS)
f_HAS = []
f_HDS = []
f_AAS = []
f_ADS = []
for index,row in feature_table.iterrows():
    f_HAS.append(table[table['Team'] == row['HomeTeam']]['HAS'].values[0])
    f_HDS.append(table[table['Team'] == row['HomeTeam']]['HDS'].values[0])
    f_AAS.append(table[table['Team'] == row['AwayTeam']]['AAS'].values[0])
    f_ADS.append(table[table['Team'] == row['AwayTeam']]['ADS'].values[0])

feature_table['HAS'] = f_HAS
feature_table['HDS'] = f_HDS
feature_table['AAS'] = f_AAS
feature_table['ADS'] = f_ADS
feature_table

This error appears and I cannot proceed:


IndexError Traceback (most recent call last)
in
10 f_ADS = []
11 for index,row in feature_table.iterrows():
---> 12 f_HAS.append(table_16[table_16['Team'] == row['HomeTeam']]['HAS'].values[0])
13 f_HDS.append(table_16[table_16['Team'] == row['HomeTeam']]['HDS'].values[0])
14 f_AAS.append(table_16[table_16['Team'] == row['AwayTeam']]['AAS'].values[0])
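An `IndexError` on `.values[0]` typically means the filter matched no rows, which would happen if a team name in the match data has no matching entry in the `Team` column of the strength table. Whether that is the cause here cannot be confirmed from the traceback alone. The sketch below, on small made-up frames, shows one way to list such teams and to do the lookup without failing; it assumes each team appears only once in `table`.

```python
import pandas as pd

# Hypothetical miniature versions of `table` and `feature_table`.
table = pd.DataFrame({
    "Team": ["Arsenal", "Chelsea"],
    "HAS": [1.2, 1.1], "HDS": [0.8, 0.9],
    "AAS": [1.0, 1.3], "ADS": [0.7, 1.0],
})
feature_table = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Chelsea", "Leeds"],
    "AwayTeam": ["Chelsea", "Arsenal", "Arsenal"],
})

# Teams that appear in the matches but have no row in `table`;
# these are the ones for which `.values[0]` would raise IndexError.
known = set(table["Team"])
missing = (set(feature_table["HomeTeam"]) | set(feature_table["AwayTeam"])) - known
print("missing teams:", sorted(missing))

# A lookup that yields NaN instead of failing when a team is absent.
strengths = table.set_index("Team")
feature_table["HAS"] = feature_table["HomeTeam"].map(strengths["HAS"])
feature_table["HDS"] = feature_table["HomeTeam"].map(strengths["HDS"])
feature_table["AAS"] = feature_table["AwayTeam"].map(strengths["AAS"])
feature_table["ADS"] = feature_table["AwayTeam"].map(strengths["ADS"])
print(feature_table)
```

If `missing` turns out to contain every team, the `Team` column itself is probably not holding team names, and that is worth checking with `print(table['Team'].head())` before the loop.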

Random Forest Classifier gives 100% accuracy

I applied the random forest algorithm to merged_dataset.csv. Out of 6080 rows, I used 80% for training and the remaining 20% for testing. The trained model predicted the target with 100% accuracy. I used the attribute FTR as the target.

CODE :

`from sklearn.preprocessing import LabelEncoder
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.ensemble import RandomForestClassifier

dataframe = pd.read_csv('./dataset/Merged_dataset.csv')
print(dataframe.head())

df = dataframe.apply(LabelEncoder().fit_transform)
print(df.head())

target = np.array(df['FTR'])
features= df.drop(['id','FTR','FTAG','FTHG'], axis = 1)
features = np.array(features)

#Split the data into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(features, target, test_size = 0.20, random_state = 42)

model = RandomForestClassifier()
model.fit(train_features, train_labels)

predicted_labels = model.predict(test_features)

print("actual Test labels")
print(test_labels)
print("")
print("predicted test labels")
print(predicted_labels)

#calculate accuracy
count = 0
totalCount = len(predicted_labels)
for i in range(len(test_labels)):
    if(predicted_labels[i] == test_labels[i]):
        count = count+1

print("Accuracy : "+str((count/totalCount)*100)+" %")
`

OUTPUT :

image

Isn't 100% accuracy too good to be real? If I apply Logistic Regression instead, the model's accuracy is 68%.

Can you point out what is wrong in my code, or what concept I am missing while training my model?
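A perfect score on held-out data is often a sign of target leakage, meaning some remaining feature column is derived from the match result. Whether Merged_dataset.csv contains such a column cannot be determined from this post, so the following is only a diagnostic sketch on made-up data: the column `points_home` is a hypothetical leak and the two noise columns are invented for contrast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
# Hypothetical data: `FTR` is the target, `points_home` is derived from it
# (a leak), and the other two columns are pure noise.
ftr = rng.integers(0, 3, size=n)
df = pd.DataFrame({
    "FTR": ftr,
    "points_home": np.array([0, 1, 3])[ftr],
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})

features = df.drop(columns=["FTR"])
X_train, X_test, y_train, y_test = train_test_split(
    features, df["FTR"], test_size=0.20, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Rank columns by importance; a single dominant column deserves a closer look.
importances = pd.Series(model.feature_importances_, index=features.columns)
importances = importances.sort_values(ascending=False)
print(f"accuracy: {accuracy:.3f}")
print(importances)
```

Applied to the real data, the same ranking would show which columns the forest relies on most; any column that is only known after the final whistle would then be a candidate to drop alongside `FTAG` and `FTHG`.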
