A simple data analysis of the players of World Cup 2018 - more of a pandas exercise than a serious data analysis :)
Install the Python requirements:
pip install -r requirements.txt
And start a notebook server:
jupyter notebook
Go to http://localhost:8888/notebooks/wc2016.ipynb
Get the player data from the official FIFA World Cup 2018 announcement PDF.
Export the data (tables) using tabula and store it as CSV.
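The export step could look roughly like the sketch below. It assumes the `tabula-py` wrapper (which needs a Java runtime) rather than the Tabula desktop app; the function name `export_squad_tables` and the file names in the usage line are hypothetical, and the real PDF will most likely need extra cleanup (repeated header rows, merged cells) that is not shown here.

```python
from pathlib import Path


def export_squad_tables(pdf_path, csv_path):
    """Extract every table found in the PDF and write them to a single CSV."""
    # Imported inside the function so this file can be loaded
    # even on a machine without tabula-py or Java installed.
    import pandas as pd
    import tabula  # provided by the tabula-py package

    # With multiple_tables=True, read_pdf returns a list of DataFrames,
    # one per table detected in the document.
    tables = tabula.read_pdf(str(pdf_path), pages="all", multiple_tables=True)
    if not tables:
        raise ValueError(f"no tables found in {pdf_path}")

    # Stack all tables into one frame and store it as CSV.
    players = pd.concat(tables, ignore_index=True)
    players.to_csv(csv_path, index=False)
    return Path(csv_path)
```

Usage, with placeholder file names: `export_squad_tables("squads.pdf", "players.csv")`.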
And then do a simple analysis for simple stats like:
- age
  - average age of each national team
  - max (oldest), min (youngest)
  - age distribution
- height
  - average height of each team
  - max (tallest), min (shortest)
- BMI
  - of each player (mass/height^2, with mass in kg and height in m)
  - average per team
  - max (highest), min (lowest)
  - BMI distribution
  - BMI distribution by position (GK, DF, MF, FW)
- How many players per team play in their domestic league
- Club representation: which clubs have the most players? How do the numbers change as the tournament progresses (group stage, round of 16, quarter-finals, ...)?
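A minimal pandas sketch of the stats listed above. The rows are made up for illustration and every column name (`team`, `position`, `age`, `height_cm`, `weight_kg`, `club`, `club_country`) is an assumption; the columns in the exported CSV will probably be named differently and need some parsing first.

```python
import pandas as pd

# Made-up rows for illustration only; the column names are assumptions.
players = pd.DataFrame(
    {
        "team": ["AAA", "AAA", "BBB", "BBB"],
        "position": ["GK", "FW", "DF", "MF"],
        "age": [30, 22, 27, 25],
        "height_cm": [190, 180, 185, 175],
        "weight_kg": [85.0, 72.0, 80.0, 68.0],
        "club": ["Club X", "Club Y", "Club X", "Club Z"],
        "club_country": ["AAA", "CCC", "BBB", "BBB"],
    }
)

# BMI = mass (kg) / height (m) squared.
players["bmi"] = players["weight_kg"] / (players["height_cm"] / 100) ** 2

# Per-team averages, plus the age of the oldest and youngest player.
team_stats = players.groupby("team").agg(
    avg_age=("age", "mean"),
    oldest=("age", "max"),
    youngest=("age", "min"),
    avg_height_cm=("height_cm", "mean"),
    avg_bmi=("bmi", "mean"),
)

# Summary of the BMI distribution for each position.
bmi_by_position = players.groupby("position")["bmi"].describe()

# Players per team whose club is based in the team's own country.
players["domestic"] = players["team"] == players["club_country"]
domestic_per_team = players.groupby("team")["domestic"].sum()

# Clubs ordered by the number of players they send to the tournament.
club_counts = players["club"].value_counts()
```

To follow club representation through the tournament, the same `value_counts()` call can be repeated on the rows of the teams still in the competition at each stage.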
- Possible expansions (need merging with other datasets):
  - birthday paradox (?) per match: what is the chance that at least two of the 22 players in a match share a birthday? Verify theoretical vs. observed values on the group matches (needs group match info)
  - Panini misses: which players did Panini miss or wrongly include? (needs the Panini album dataset, which can be found here (?))
  - Team power ranking according to club ranking (needs club world ranking data)
  - Team value: already here
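For the birthday paradox idea, the theoretical side can be sketched with the standard library alone. This assumes birthdays are uniformly spread over 365 days (29 February is ignored) and that a match means the 22 starting players; the function names and the `(month, day)` input format are hypothetical.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday."""
    # Multiply the chances that each new person avoids all earlier birthdays.
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def has_shared_birthday(birthdays):
    """True if any (month, day) pair occurs more than once in the list."""
    return len(set(birthdays)) < len(birthdays)


def observed_share(matches):
    """Fraction of matches in which at least two players share a birthday."""
    hits = sum(has_shared_birthday(birthdays) for birthdays in matches)
    return hits / len(matches)


theoretical = shared_birthday_probability(22)  # about 0.476
```

`observed_share` expects one list of `(month, day)` tuples per match, so once the group match line-ups are merged in, its result can be compared directly with `theoretical`.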