espnscraper's Introduction

Howdy y'all

I stay busy on GitHub by sharing datasets via the TidyTuesday project, along with various tutorials, my blog, and packages.

The Mockup Blog

This is my personal blog, you can find it at TheMockup.blog.

If you want to customize your distill blog - you can use your own custom CSS. The CSS components you can change are on the distill GitHub.

A table of some of my top posts is below:

Title | Description
Reading tables from images with magick | magick is an R package for manipulating images in R
2020 in Review | Surviving a pandemic at home
Creating and using custom ggplot2 themes | the best way to make each plot your own
Extracting JSON data from websites and public APIs with R | tidyr + jsonlite are magical
Embedding custom HTML in gt tables | HTML is basically a superpower.
Plotting Points as Images in ggplot | Trials and tribulations of the various strategies.
Functions and Themes for gt tables | Save time and effort in making beautiful tables
10+ Guidelines for Better Tables in R | Make tables people ACTUALLY want to read.
Heatmaps in ggplot2 | It's more than just a passing fad.
Building a blog with distill | I love simplicity.
Meta RMarkdown - Taxonomy and Use cases | A meta collection of all things R Markdown.
Flipping tibbles for many models | Pivoting data from wide to long to run many models at once
Bigger, nflfastR, dbplyr | Doing more with dplyr and SQL

espnscraper's People

Contributors

bmacgtpm, colinifer, jthomasmock, tonyelhabr

espnscraper's Issues

Two stray commas causing get_nfl_schedule to fail

These lines have commas at the end with nothing after them, and it was causing an error in get_nfl_schedule:

away_record = list(2, "records", 1, "summary"),

rec_leader_pos = list(3, "leaders", 1, "athlete", "position", "abbreviation"),

I updated purrr, dplyr, and tidyr and it now works, but thought I'd mention it anyway, in case others don't realize a package update will take care of it.
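For anyone wondering why a trailing comma can break a call, here is a minimal sketch. In base R, an empty final argument to list() is an error, while functions that collect their arguments through rlang (rlang::list2() is used here as a stand-in) tolerate one trailing comma. Whether this is exactly what changed between package versions for get_nfl_schedule is an assumption; the snippet only illustrates the general behaviour and needs the rlang package installed.

```r
# Base R: the empty third argument raises an error when list() is evaluated.
base_result <- tryCatch(
  list(2, "records", ),
  error = function(e) conditionMessage(e)
)

# rlang::list2() ignores a single trailing empty argument.
tidy_result <- rlang::list2(2, "records", )

print(base_result)         # the error message from base R
print(length(tidy_result)) # number of elements actually collected
```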

College QBR Data missing

Hi Tom,

I wanted to scrape some College QBR data to analyse the 2021 prospects, so I was looking for QBR going back to 2017.
Unfortunately, the data returned by get_college_qbr() seems to be missing for all years before 2020 (except week 1). I attached the code I used to test weeks 1 and 2.
I cross-checked the ESPN website, and it seems to be an issue on ESPN's side.

Any chance we can solve this problem?

Best regards,
Christian

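# Fetch QBR for one week/season; return an empty tibble if the scrape fails.
get_qbr <- function(weeks, years, silent = FALSE){
  qbr_raw <- NULL
  # Pass `silent` through so the argument actually controls error printing.
  try(qbr_raw <- espnscrapeR::get_college_qbr(season = years, week = weeks), silent = silent)
  if (is.null(qbr_raw)) return(tibble::tibble())
  qbr_raw
}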

all_qbr <- purrr::pmap_dfr(purrr::transpose(
  purrr::cross2(1:2,2017:2020)), get_qbr)
#> Scraping QBR for week 1 of 2017!
#> Scraping QBR for week 2 of 2017!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2018!
#> Scraping QBR for week 2 of 2018!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2019!
#> Scraping QBR for week 2 of 2019!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2020!
#> Scraping QBR for week 2 of 2020!
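The output above shows which weeks fail, but the failures are lost once the results are combined. A minimal base R sketch of one way to keep a log of failed season/week pairs follows. The helper name collect_with_log is hypothetical, and it assumes the fetch function takes `season` and `week` arguments, as get_college_qbr() does in the call above.

```r
# Try every (season, week) pair and record which ones fail instead of
# silently dropping them. `fetch` is any function(season, week) that
# returns a data frame.
collect_with_log <- function(seasons, weeks, fetch) {
  grid <- expand.grid(season = seasons, week = weeks, stringsAsFactors = FALSE)
  grid$ok <- FALSE
  grid$error <- NA_character_
  results <- vector("list", nrow(grid))

  for (i in seq_len(nrow(grid))) {
    out <- tryCatch(
      fetch(season = grid$season[i], week = grid$week[i]),
      error = function(e) e
    )
    if (inherits(out, "error")) {
      grid$error[i] <- conditionMessage(out)
    } else {
      grid$ok[i] <- TRUE
      results[i] <- list(out) # keeps the slot even if `out` is NULL
    }
  }

  # rbind() skips the NULL slots left by failed pairs; it assumes the
  # successful results share the same columns.
  list(data = do.call(rbind, results), log = grid)
}
```

With the package installed, something like collect_with_log(2017:2020, 1:2, espnscrapeR::get_college_qbr) would return the combined data in `$data` and one row per attempt in `$log`.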

ESPN Data inconsistent

Note: This issue isn't a code problem! It is only meant to inform users and to make the developer aware of it.

ESPN states on its Total QBR website:

To qualify, a player must play a minimum of 20 action plays

which was always my explanation when a player was missing from the data. But it gets very confusing now. I am using the 2018 playoffs as the example and haven't checked other years.

2018 Wildcard weekend had the following games (winners bold):

  1. IND @ HOU
  2. SEA @ DAL
  3. LAC @ BAL
  4. PHI @ CHI

Running

qbr_week <- get_nfl_qbr("2018", season_type = "Playoffs", week = 1) %>%
  select(short_name, team_short_name, qbr_total, qb_plays)

leads to 3 entries:
[image]

But running

qbr_all <- get_nfl_qbr("2018", season_type = "Playoffs", week = NA)%>%
  select(short_name, team_short_name, qbr_total, qb_plays)

leads to this:
[Screenshot, 2020-03-20 10:50:35]

In the total data there are not only more QBs from the wildcard weekend (Watson, Wilson, Trubisky), there is also a different Total QBR given for Lamar Jackson...
It is unclear which dataset to trust. The problem is that we can only combine the QBs who lost, because the overall dataset mixes the games of QBs who played more than one game.
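To make the comparison repeatable, here is a minimal base R sketch that lists the players present in the overall data but missing from the weekly data, and puts the two QBR values side by side for players in both. The helper names are hypothetical; the column names are the ones selected in the two calls above.

```r
# Players that appear in the overall data but not in the weekly data.
missing_from_week <- function(qbr_all, qbr_week) {
  setdiff(qbr_all$short_name, qbr_week$short_name)
}

# Side-by-side values for players present in both data sets.
# Shared columns get the suffixes "_week" and "_all".
compare_qbr <- function(qbr_all, qbr_week) {
  merge(
    qbr_week, qbr_all,
    by = c("short_name", "team_short_name"),
    suffixes = c("_week", "_all")
  )
}
```

Calling missing_from_week(qbr_all, qbr_week) and compare_qbr(qbr_all, qbr_week) on the two objects created above would show which names and values differ.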

Connection Error

Hi Tom,

I keep getting a connection error whenever I try get_college_qbr. Example below:

[image]

It happens for any combination of year or week I run. I tried updating espnscrapeR to see if that'd help but the same error keeps popping up.

Thanks,
Jerrick

get_nfl_teams() Error 403

Hi Tom,
I'm trying to use the function get_nfl_teams(), but it isn't working. It gives me the following error:
[Screenshot, 2021-05-28 21:02:59]
ESPN's API is presumably blocking the request.
Thank you

Defense type in scrape_espn_stats()

Any chance we could get a Defense stat type in the scrape_espn_stats() function? This would be particularly helpful to me because ESPN is somewhat proprietary in how they define "Stuffs" compared to NFL standard Tackle for Loss, for example. I'm interested in correlations. Thanks!

2022 Pass Win Rates Update

Currently, the 2022 season does not work with the latest version of the package, so I added it here.

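# Assumes dplyr, tidyr, and stringr are attached:
# library(dplyr); library(tidyr); library(stringr)
scrape_espn_win_rate <- function(season = 2022) {
  if (!(as.numeric(season) %in% c(2019:2022))) {
    # Message now matches the seasons actually accepted by the check above.
    stop("Data available for 2019-2022")
  }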
  pbwr_url <- "https://www.espn.com.au/nfl/story/_/id/34536376/2022-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
  pbwr_2021 <- "https://www.espn.com/nfl/story/_/id/32176833/2021-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
  pbwr_2020 <- "https://www.espn.com/nfl/story/_/id/29939464/2020-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
  pbwr_2019 <- "https://www.espn.com/nfl/story/_/id/27584726/nfl-pass-blocking-pass-rushing-rankings-2019-pbwr-prwr-leaderboard#prwrteam"
  pbwr_2018 <- "https://www.espn.com/nfl/story/_/id/25074144/nfl-pass-blocking-pass-rushing-stats-final-leaderboard-pass-block-win-rate-pass-rush-win-rate"
  stats_in <- c(
    "Pass Rush Win Rate", "Run Stop Win Rate",
    "Pass Block Win Rate", "Run Block Win Rate"
  )
  stat_2019 <- c("Pass Rush Win Rate", "Pass Block Win Rate")
  raw_html <- rvest::read_html(case_when(
    season == 2019 ~ pbwr_2019,
    season == 2020 ~ pbwr_2020,
    season == 2021 ~ pbwr_2021,
    season == 2022 ~ pbwr_url
  ))
  date_updated <- raw_html %>%
    rvest::html_node("#article-feed > article:nth-child(1) > div > div.article-body > div.article-meta > span > span") %>%
    rvest::html_text()
  raw_text <- raw_html %>%
    rvest::html_nodes("#article-feed > article:nth-child(1) > div > div.article-body > p") %>%
    rvest::html_text()
  tibble::enframe(raw_text) %>%
    filter(str_detect(value, "1. ")) %>%
    mutate(name = if_else(season == 2019, list(stat_2019),
      list(stats_in)
    )[[1]]) %>%
    mutate(value = str_split(
      value,
      "\n"
    )) %>%
    unnest_longer(value) %>%
    separate(value, into = c(
      "rank",
      "team", "win_pct"
    ), sep = "\\. |, ") %>%
    mutate(
      rank = as.integer(rank),
      win_pct = str_remove(win_pct, "%"), win_pct = as.double(win_pct),
      date_updated = date_updated, season = season
    ) %>%
    rename(
      stat = name,
      stat_rank = rank
    )
}
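The core of the function above is the step that splits each leaderboard line into rank, team, and win percentage. A minimal base R sketch of just that parsing step is shown below, using the same separator pattern as the separate() call. The helper name parse_win_rate and the "Team A" style input are hypothetical placeholders, not real ESPN text.

```r
# Split a block of "rank. team, pct%" lines into a data frame.
parse_win_rate <- function(block) {
  lines <- strsplit(block, "\n", fixed = TRUE)[[1]]
  # Same separators as above: ". " after the rank and ", " after the team.
  parts <- strsplit(lines, "\\. |, ")
  pick <- function(i) vapply(parts, `[`, character(1), i)
  data.frame(
    stat_rank = as.integer(pick(1)),
    team = pick(2),
    win_pct = as.double(sub("%", "", pick(3), fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}

# Placeholder input in the expected shape.
example_block <- "1. Team A, 55%\n2. Team B, 52.5%"
print(parse_win_rate(example_block))
```

A team name that itself contains ". " or ", " would be split in the wrong place, so the pattern relies on ESPN keeping that line format.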

`scrape_team_stats_nfl` gives error when `role="defense"`

When I try this

scrape_team_stats_nfl(season = 2022, stats = "passing", role = "defense")

I get this error

Error in `purrr::set_names()`:
! The size of `nm` (15) must be compatible with the size of `x` (11).

A similar error occurs when stats is changed to rushing, scoring, or downs. When stats = "receiving", there is a different error. In particular, this code

scrape_team_stats_nfl(season = season, stats = "receiving", role = "defense")

gives this error

Error in rvest::html_table(raw_html, fill = TRUE)[[1]] : 
  subscript out of bounds

The code works well for all 5 choices of stats when role="offense". The errors only happen when role="defense".
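The first error message suggests that a fixed vector of 15 column names is being applied to a scraped table that only has 11 columns when role = "defense". That reading is an assumption based on the message alone, not on the package source. A minimal base R sketch of a check that reports the mismatch more clearly (the helper name set_names_checked is hypothetical):

```r
# Apply column names only when the count matches; otherwise fail with a
# message that lists what was actually scraped.
set_names_checked <- function(df, nm) {
  if (length(nm) != ncol(df)) {
    stop(
      sprintf(
        "Expected %d columns but the table has %d: %s",
        length(nm), ncol(df), paste(names(df), collapse = ", ")
      ),
      call. = FALSE
    )
  }
  stats::setNames(df, nm)
}
```

Seeing the scraped column names in the error makes it easier to tell whether ESPN changed the defense table layout or the name vector is simply meant for offense.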

List of Player ID's

Hi! Is there a way to get a list of player IDs from a season or multiple seasons (i.e. all player IDs)? I would like to use get_athlete() to get player info for all athletes from a season. Thank you!

get_nfl_boxscore_players(game_id = "401220131")

Hi,

I'm trying to use the function get_nfl_boxscore_players with game_id = "401220131", but it's not working. This is the 9/13/2020 MIA vs. NE game. Other functions recognize this game_id (e.g. get_nfl_pbp(game_id = "401220131") works).

Here's the error that I am getting:
player_game = get_nfl_boxscore_players(game_id = "401220131");
Error: Problem with mutate() input ..1.
x 'list' object cannot be coerced to type 'double'
i Input ..1 is across(c(pass_yds:punt_long), ~suppressWarnings(as.double(.x))).
Run rlang::last_error() to see where the error occurred.
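This message typically appears when as.double() is applied to a list whose elements are not all of length one, for example when a stat is empty for some players. Whether that is what happens for this game is an assumption. A minimal base R sketch of a conversion that maps such elements to NA instead of failing (the helper name list_to_double is hypothetical):

```r
# as.double() on a list fails when any element is not length one.
failure <- tryCatch(
  as.double(list("12", NULL)),
  error = function(e) conditionMessage(e)
)

# Convert element by element; empty or multi-value entries become NA.
list_to_double <- function(x) {
  vapply(
    x,
    function(el) {
      if (length(el) == 1) suppressWarnings(as.double(el)) else NA_real_
    },
    numeric(1)
  )
}

print(failure)
print(list_to_double(list("12", NULL, 3.5)))
```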

Error in player name for players on multiple teams throughout the year

When running the scrape_espn_stats() function, I noticed that players who were on multiple teams have an error. For example, Kenyan Drake's name shows up as "Kenyan DrakeMIA/" with "ARI" showing up in the "team" field. Same thing for Josh Gordon's name showing up as "Josh GordonNE/" with "SEA" appearing in the "team" field.
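Until this is fixed in the package, the names can be cleaned after scraping. A minimal base R sketch, assuming the stray text is always one or more two- or three-letter uppercase team abbreviations, each followed by "/", at the end of the name (the helper name strip_team_suffix is hypothetical):

```r
# Remove trailing "ABC/" style team abbreviations from player names.
strip_team_suffix <- function(name) {
  sub("([A-Z]{2,3}/)+$", "", name)
}

strip_team_suffix(c("Kenyan DrakeMIA/", "Josh GordonNE/"))
# returns "Kenyan Drake" "Josh Gordon"
```

Names without the suffix are returned unchanged, so the helper can be applied to the whole name column.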

Some boxscores do not appear

I have tried to pull boxscore data and have been successful for the vast majority of games but games like
get_nfl_boxscore(game_id = "301223023")

do not work in RStudio.

Wrong team in ESPN API?

There are some team errors in the 2019 data. The result of qbr in the code below looks like this:
[image]

This example uses week 4, but the teams are wrong for the whole season. I don't know if those are all of them or if there are more.

qbr <- get_nfl_qbr("2019", season_type = "Regular", week = 4) %>%
  group_by(team_short_name) %>% 
  filter(n()>1) %>%
  select(short_name, team_short_name) %>%
  arrange(team_short_name)

Unable to pull 2020 College QBR

When I try to pull the 2020 College QBR numbers I get this error:

[image]

As you can see from the image, I'm able to get the 2019 numbers fine, it just seems to be an issue with 2020.

Thanks!

Intragame win probability

ESPN posts win probabilities that are updated live with each play or clock tick during games. Have you looked at scraping this, and/or is there a repo of anything interesting, e.g. time-stamped probability data, anywhere?

get_athlete() Error: Not Found (HTTP 404)

The get_athlete() function fails and returns the following: "Error: Not Found (HTTP 404)"

After browsing the source code the error is thrown after running this section of code:

  raw_get  <- base_url %>%
    glue::glue() %>%
    httr::GET()
  
  httr::stop_for_status(raw_get)

#Error: Not Found (HTTP 404).

It appears that the link for the API request is invalid, perhaps due to a change on ESPN's end. This function was working properly for me about two weeks ago, but has since stopped working.

get_college_qbr() only pulling 2020 data

Hi Tom,

Thanks for updating the package, and it's working great for 2020, but the 2020 season is the only data I can pull. If I'm reading it right, it looks like 2020 is hardcoded into the function here:

[image]

Thanks!

Installation Assistance

Hello, this may be a bad question, but how exactly do I install this? I'm having trouble understanding how to install this program on a machine (Debian VM).

Could someone explain in more detail how to install this? I tried to run the remote.... command in the terminal, and I get this output: syntax error near unexpected token `('

What am I doing wrong?
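The "syntax error near unexpected token" message comes from the shell, which suggests the R command was typed at the terminal prompt rather than inside R. R code has to be run in an R session (start one by typing R in the terminal) or passed to Rscript -e. A minimal sketch follows, assuming R is already installed on the VM and that the package is hosted on GitHub as jthomasmock/espnscrapeR; the repository path is an assumption and the wrapper name install_espnscraper is hypothetical.

```r
# Run this inside an R session, not directly in the shell.
install_espnscraper <- function(repo = "jthomasmock/espnscrapeR") {
  # remotes provides install_github(); install it first if it is missing.
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github(repo)
}

# Then call: install_espnscraper()
```

On Debian, building packages from source may also need system libraries to be installed first; the error output of install_github() usually names the missing ones.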

Thanks!
