This project scrapes fight statistics and fighter details from ufcstats.com for all historical UFC events. The data folder contains two CSV files, fight_hist.csv and fighter_stats.csv, with data for all fights and fighters current as of 09/14/2019 (UFC Fight Night 158).
A writeup of an analysis done with this data and Neo4j can be found here: https://towardsdatascience.com/ranking-the-best-ufc-fighters-using-pagerank-and-neo4j-5385805b4515
All code used in the writeup above is in the analysis folder.
Additionally, the scripts folder contains a file called ufc_scraper.py, which has functions for the initial scrape and for updating the data as new UFC events occur. Usage examples can also be found in the notebook scrape_fights.ipynb in the scripts folder.
fight_hist = get_all_fight_stats()
fighter_details = get_fighter_details(fight_hist.fighter_url.unique())
where fight_hist is a dataframe of fight statistics and fighter_url is a column of URLs linking to individual fighter pages.
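As a small illustration of what gets passed to get_fighter_details, the sketch below builds a toy dataframe and extracts its unique fighter URLs. Only the fighter_url column name comes from this README; the fighter column, the example.com URLs, and the assumption that a fighter appears in more than one row are invented for the example.

```python
import pandas as pd

# Toy stand-in for fight_hist. Only fighter_url is a column named in this
# README; the fighter column and every value here are made up.
fight_hist = pd.DataFrame(
    {
        "fighter": ["Fighter A", "Fighter B", "Fighter A", "Fighter C"],
        "fighter_url": [
            "http://example.com/fighter/a",
            "http://example.com/fighter/b",
            "http://example.com/fighter/a",
            "http://example.com/fighter/c",
        ],
    }
)

# unique() drops repeated URLs and keeps them in order of first appearance,
# so each fighter page would only need to be requested once.
fighter_urls = fight_hist.fighter_url.unique()
print(len(fight_hist), "rows ->", len(fighter_urls), "unique URLs")
```

With the real data, the same fighter_urls array is what the call to get_fighter_details above receives.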
fight_hist_updated = update_fight_stats(fight_hist)
where fight_hist is a dataframe of already saved fight statistics. fight_hist_updated will be a dataframe of fight stats updated through the most recent UFC card.
fighter_stats_updated = update_fighter_details(fight_hist_updated.fighter_url.unique(), fighter_stats)
where fight_hist_updated is a dataframe of updated fight stats and fighter_url is a column of URLs linking to individual fighter pages. fighter_stats is a dataframe of already saved fighter details. fighter_stats_updated will be an updated dataframe of fighter details for all fighters who have appeared in the UFC.
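The two update calls can be chained into one load, update, save round trip. The following is a minimal sketch, not code from this repository: refresh_data is a hypothetical helper, the two update functions are passed in as arguments so the sketch does not depend on how ufc_scraper.py is imported, and it assumes the CSV files are stored without an index column.

```python
from pathlib import Path

import pandas as pd


def refresh_data(data_dir, update_fight_stats, update_fighter_details):
    """Reload both saved CSVs, update them, and write them back.

    Hypothetical helper, not part of ufc_scraper.py. The two callables are
    expected to behave like the functions of the same name shown above.
    """
    data_dir = Path(data_dir)
    fight_path = data_dir / "fight_hist.csv"
    fighter_path = data_dir / "fighter_stats.csv"

    # Load the previously saved data.
    fight_hist = pd.read_csv(fight_path)
    fighter_stats = pd.read_csv(fighter_path)

    # Update fights first, since the fighter update needs the new URL list.
    fight_hist_updated = update_fight_stats(fight_hist)
    fighter_stats_updated = update_fighter_details(
        fight_hist_updated.fighter_url.unique(), fighter_stats
    )

    # Assumption: the files are saved without an index column.
    fight_hist_updated.to_csv(fight_path, index=False)
    fighter_stats_updated.to_csv(fighter_path, index=False)
    return fight_hist_updated, fighter_stats_updated
```

With the functions from ufc_scraper.py imported, a call might look like refresh_data("data", update_fight_stats, update_fighter_details).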