pro-football-reference-web-scraper's Introduction

Hi there 👋

I'm Matt, and I'm a senior in Columbia College studying computer science. Before coming to Columbia, I grew up in Los Angeles, California. In my free time, I like watching football and basketball (Go Nets!), listening to fantasy football podcasts (8x champion), and playing Spikeball with my friends. I am looking to contribute to any sports or sports analytics-based projects.

pro-football-reference-web-scraper's People

Contributors

dependabot[bot], ingoldsby, mjk2244

pro-football-reference-web-scraper's Issues

players with similar names - player_game_log.py

player_game_log.py currently cannot accommodate players who have names similar to other players. The current functionality assumes that a player's game log can be found at the following url: https://www.pro-football-reference.com/players//00/gamelog//. However, players with names similar to other players may have a different url.

For example, the game log of Josh Allen (current Buffalo Bills QB) can be found at the following URL: https://www.pro-football-reference.com/players/A/AlleJo02/gamelog/2022/. "AlleJo00" belongs to a different player named Josh Allen. There are many other examples, including Damien Harris (https://www.pro-football-reference.com/players/H/HarrDa06.htm), Christian McCaffrey (https://www.pro-football-reference.com/players/M/McCaCh01.htm) and Davante Adams (https://www.pro-football-reference.com/players/A/AdamDa01.htm). As a result, the wrong pages are being scraped in cases like these.

In the case of Josh Allen, a potential solution may be to go to https://www.pro-football-reference.com/players/A/ and retrieve the href for the correct Josh Allen. Analogous steps could also be taken for other names.
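A minimal sketch of the lookup described above, using only the standard library. The helper names (PlayerLinkParser, find_player_hrefs, gamelog_url) are hypothetical and not part of the package, and the sample markup is invented for illustration; the real index page may be structured differently. The sketch only collects every href whose link text matches the name, so the caller would still need to pick the right player (for example by position).

```python
from html.parser import HTMLParser


class PlayerLinkParser(HTMLParser):
    """Collect (link text, href) pairs for <a> tags pointing at /players/."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("/players/"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def find_player_hrefs(index_html, player_name):
    """Return every player href on the index page whose link text equals player_name."""
    parser = PlayerLinkParser()
    parser.feed(index_html)
    return [href for name, href in parser.links if name == player_name]


def gamelog_url(href, season):
    """Turn a player href into a game log URL of the form shown in this issue."""
    base = href[: -len(".htm")] if href.endswith(".htm") else href
    return f"https://www.pro-football-reference.com{base}/gamelog/{season}/"


# Invented sample markup; the third entry is a placeholder, not a real player.
sample_html = """
<p><a href="/players/A/AlleJo00.htm">Josh Allen</a></p>
<p><b><a href="/players/A/AlleJo02.htm">Josh Allen</a></b></p>
<p><a href="/players/A/ExamPl00.htm">Example Player</a></p>
"""

candidates = find_player_hrefs(sample_html, "Josh Allen")
print(candidates)  # ['/players/A/AlleJo00.htm', '/players/A/AlleJo02.htm']
print(gamelog_url(candidates[1], 2022))
```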

team_game_log crashes attempting to parse games where one of the teams has 0 passing yards

Describe the bug
When a team has exactly 0 passing yards in a game, PFR reports it as an empty string in the table. This causes team_game_log to crash when it attempts to cast that empty string to an integer, with ValueError: invalid literal for int() with base 10: ''

To Reproduce
Steps to reproduce the behavior:

  1. Attempt to call get_team_game_log with a team/year in which one of the teams had 0 passing yards in a game. An example is Week 17 of the Oakland Raiders' 2003 season, in which they had 0 passing yards against the Chargers.

Expected behavior
The program should check for an empty string and treat it as 0. This may be needed for any stat: the score is reported as 0, but other stats such as turnovers appear as an empty string in PFR when the value is 0.
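A minimal sketch of that check. The helper name stat_to_int is hypothetical and not part of the package; it assumes the cell text has already been extracted from the table as a string.

```python
def stat_to_int(cell_text, default=0):
    """Convert a table cell to int, treating an empty or blank cell as `default`."""
    text = cell_text.strip()
    return int(text) if text else default


print(stat_to_int(""))     # 0
print(stat_to_int("215"))  # 215
```

Non-numeric text still raises ValueError, so unexpected table contents would not be silently turned into 0.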

Desktop (please complete the following information):

  • OS: Windows 11


Sample Code given does not even work

I tried to use the simple sample code provided just to test this out:

from pro_football_reference_web_scraper import team_game_log as t

game_log = t.get_team_game_log(team ='Kansas City Chiefs', season = 1995)
print(game_log)

And received the error:

  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pro_football_reference_web_scraper/team_game_log.py", line 172, in get_team_game_log
    return collect_data(soup, season, team)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pro_football_reference_web_scraper/team_game_log.py", line 206, in collect_data
    games = soup.find_all('tbody')[1].find_all('tr')
IndexError: list index out of range

I have not gotten any functionality from this, but I have seen good reviews online. All I have done is pip install and then try this trial code.

AttributeError: 'NoneType' object has no attribute 'find_all'

When I run the following code as displayed in the ReadMe:

from pro_football_reference_web_scraper import player_game_log as p

game_log = p.get_player_game_log('Josh Allen', 'QB', 2022)
print(game_log)

I get the following error:

AttributeError: 'NoneType' object has no attribute 'find_all'

no bye week - get_team_game_log.py

Some teams in NFL history did not have a bye week. This causes an error in the logic of get_team_game_log.py. Fix the logic to account for teams that did not have a bye week.

WR Snap % not tracked before 2012

Describe the bug
In player_game_log.py, WR game logs currently include offensive snap percentage. However, Pro Football Reference only has snap percentage from the 2012 season onward. As a result, any attempt to get a WR's game log from before the 2012 season produces an error.

To Reproduce

from pro_football_reference_web_scraper import player_game_log as p

# this code produces an error
p.get_player_game_log('Julio Jones', 'WR', 2011)

# this code works
p.get_player_game_log('Julio Jones', 'WR', 2012)

Expected behavior
We should avoid the error by only tracking snap pct. in WR game logs if the season is >=2012.
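A minimal sketch of that condition. The function name wr_columns and the constant are hypothetical; the column names are taken from the WR output shown elsewhere in this tracker, and how the package actually assembles its columns is not known here.

```python
# First season for which PFR has snap percentage, per this issue.
SNAP_PCT_FIRST_SEASON = 2012


def wr_columns(season):
    """Return the WR stat columns to collect for a given season."""
    columns = ["tgt", "rec", "rec_yds", "rec_td"]
    if season >= SNAP_PCT_FIRST_SEASON:
        columns.append("snap_pct")
    return columns


print(wr_columns(2011))  # ['tgt', 'rec', 'rec_yds', 'rec_td']
print(wr_columns(2012))  # ['tgt', 'rec', 'rec_yds', 'rec_td', 'snap_pct']
```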

Screenshots
Julio Jones, 2011:
Screen Shot 2023-05-03 at 1 10 23 PM

Julio Jones, 2012:
Screen Shot 2023-05-03 at 1 10 50 PM

Desktop (please complete the following information):

  • OS: iOS
  • Version 0.2.0

Smartphone (please complete the following information):
N/A

Additional context
N/A

change 'results' column in player_game_log.py

The 'results' column in a player's game log from player_game_log.py currently follows the format '[result] [team points]-[opposing team points]'. For example, if you retrieve Tom Brady's game log from 2022, the results column of the first row reads 'W 19-3'. This format is awkward for data analysis and should be split into 3 separate columns (result, points_for, points_allowed).
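A minimal sketch of the split, assuming the combined string always looks like 'W 19-3' (a one-letter result, a space, then two scores joined by a hyphen). The function name split_result is hypothetical and not part of the package.

```python
import re

# One-letter outcome (win, loss, or tie), whitespace, then "points_for-points_allowed".
RESULT_PATTERN = re.compile(r"^([WLT])\s+(\d+)-(\d+)$")


def split_result(result_text):
    """Split a combined result string such as 'W 19-3' into three fields."""
    match = RESULT_PATTERN.match(result_text.strip())
    if match is None:
        raise ValueError(f"unrecognized result format: {result_text!r}")
    outcome, points_for, points_allowed = match.groups()
    return {
        "result": outcome,
        "points_for": int(points_for),
        "points_allowed": int(points_allowed),
    }


print(split_result("W 19-3"))
# {'result': 'W', 'points_for': 19, 'points_allowed': 3}
```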

All Stats for a player

Can we show all stats for a player, not just some? For instance, RBs have a lot of receptions, and some WRs have rushing attempts. The current get_player_game_log makes us specify the position, which then limits the columns in the resulting dataframe.

For instance, when we pull the 2023 Week 2 log for Deebo Samuel:
p.get_player_game_log(player = 'Deebo Samuel', position = 'WR', season = 2023).loc[1]

We get the following:
date 2023-09-17
week 2
team SFO
game_location @
opp LAR
result W
team_pts 30
opp_pts 23
tgt 9
rec 6
rec_yds 63
rec_td 0
snap_pct 0.89
Name: 1, dtype: object

No rushing info at all. However, in pro-football-reference (https://www.pro-football-reference.com/players/S/SamuDe00.htm), we clearly see that Deebo actually had 5 rushing attempts, 38 rushing yards, and a rushing TD.

On the flip side, consider Christian McCaffrey's Week 4 (p.get_player_game_log(player = 'Christian McCaffrey', position = 'RB', season = 2023).loc[3]):
date 2023-10-01
week 4
team SFO
game_location
opp ARI
result W
team_pts 35
opp_pts 16
rush_att 20
rush_yds 106
rush_td 3
tgt 8
rec_yds 71
rec_td 1
Name: 3, dtype: object

Almost everything is here except the number of receptions. It'd be great to add that.

I was just hoping you could return ALL of the stats for each player, instead of just a selection. There's so much more available on PFR that would be nice to easily retrieve.

Other than that, this is a super helpful tool and very easy to use! Thank you for supporting it! :D

incorrect/misspelled team name - team_game_log.py

Some teams have changed names over time (e.g. the Houston Oilers became the Tennessee Oilers and then the Tennessee Titans). Users may not be aware of these changes and may call get_team_game_log() with a mismatched team name/year pairing. Users may also simply misspell a team's name. Raise an exception when either of these things happens.
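A minimal sketch of such a check. The TEAM_SEASONS table and the validate_team_season function are hypothetical and cover only the franchise named above; the season ranges are illustrative and should be verified against PFR before use. The misspelling suggestion uses difflib from the standard library.

```python
from difflib import get_close_matches

# Hypothetical lookup table: team name -> (first season, last season or None if current).
# Illustrative values only; verify against PFR.
TEAM_SEASONS = {
    "Houston Oilers": (1960, 1996),
    "Tennessee Oilers": (1997, 1998),
    "Tennessee Titans": (1999, None),
}


def validate_team_season(team, season):
    """Raise ValueError if the team name is unknown or was not in use that season."""
    if team not in TEAM_SEASONS:
        suggestions = get_close_matches(team, list(TEAM_SEASONS), n=1)
        hint = f" Did you mean {suggestions[0]!r}?" if suggestions else ""
        raise ValueError(f"Unknown team name {team!r}.{hint}")
    first, last = TEAM_SEASONS[team]
    if season < first or (last is not None and season > last):
        span = f"{first}-{last if last is not None else 'present'}"
        raise ValueError(f"{team!r} was not in use in {season} (valid seasons: {span}).")


validate_team_season("Tennessee Titans", 2022)  # passes silently
try:
    validate_team_season("Tennesee Titans", 2022)
except ValueError as err:
    print(err)
```

Calling the check before any request is made would let the user see a clear message instead of a parsing error from an unexpected page.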
