
evolving-hockey


Scraper Walkthrough

The sc.scrape_pbp function scrapes one or more games from the NHL's publicly available data and returns a list containing the requested data.

Example:

## Dependencies
library(RCurl); library(xml2); library(rvest); library(jsonlite); library(foreach)
library(lubridate)
library(tidyverse) ## -- specifically: stringr, readr, tidyr, and dplyr

## Source scraper functions from GitHub
devtools::source_url("https://raw.githubusercontent.com/evolvingwild/evolving-hockey/master/EH_scrape_functions.R")

## Scrape games
pbp_scrape <- sc.scrape_pbp(games = c("2018020001", "2018020002"))
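
The setup above assumes every package is already installed, and devtools::source_url() also needs the devtools package itself. A minimal pre-flight check, sketched in base R with requireNamespace(); the helper name missing_packages is hypothetical and not part of the scraper:

```r
## Hypothetical helper (not part of the scraper): list required packages that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() quietly returns FALSE when a package cannot be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("RCurl", "xml2", "rvest", "jsonlite", "foreach",
              "lubridate", "tidyverse", "devtools")

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
}
```

Installation is left commented out so the check can run without modifying the library.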

Function Arguments

games:

  • a vector of full NHL game IDs (one or more may be provided)
  • example: 2018020001

scrape_type:

  • "full": all data returned
  • "event_summary": only event summary, rosters, and scratches information returned
  • "rosters": only rosters and scratches information returned
  • default is "full"

live_scrape:

  • FALSE = function adjusts incorrect player & goalie shifts
  • TRUE = function does not adjust incorrect player & goalie shifts (this should be used when scraping games that are in progress)
  • default is FALSE

verbose:

  • TRUE = print the system time for each game scraped
  • default is TRUE

sleep:

  • time to wait between games (in seconds)
  • default is 0
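
The games argument takes full 10-digit NHL game IDs. In the NHL's numbering, the first four digits are the starting year of the season, the next two are the game type (01 = preseason, 02 = regular season, 03 = playoffs), and for regular-season games the last four are the game number, so 2018020001 is the first regular-season game of 2018-19. A minimal sketch for building such IDs as character strings; make_game_ids is a hypothetical helper, not part of the scraper:

```r
## Hypothetical helper (not part of the scraper): build full NHL game IDs as strings
## season:  starting year of the season, e.g. 2018 for 2018-19
## numbers: game numbers within that season and game type
## type:    game type code, 2 = regular season
make_game_ids <- function(season, numbers, type = 2) {
  sprintf("%04d%02d%04d", as.integer(season), as.integer(type), as.integer(numbers))
}

make_game_ids(2018, 1:3)
## [1] "2018020001" "2018020002" "2018020003"
```

For example, make_game_ids(2018, 1:100) yields the same 100 IDs as games_vec in the full scrape example.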

Full Scrape Example:

## Scrape the first 100 games of the 2018-19 regular season

games_vec <- as.character(seq(2018020001, 2018020100))

pbp_scrape <- sc.scrape_pbp(games = games_vec)

## Extract each data set from the returned list
game_info_df_new <-     pbp_scrape$game_info_df       ## game information data
pbp_base_new <-         pbp_scrape$pbp_base           ## main play-by-play data
pbp_extras_new <-       pbp_scrape$pbp_extras         ## extra play-by-play data
player_shifts_new <-    pbp_scrape$player_shifts      ## full player shifts data
player_periods_new <-   pbp_scrape$player_periods     ## player TOI sums per period
roster_df_new <-        pbp_scrape$roster_df          ## roster data
scratches_df_new <-     pbp_scrape$scratches_df       ## scratches data
event_summary_df_new <- pbp_scrape$events_summary_df  ## event summary data
scrape_report <-        pbp_scrape$report             ## scrape report
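
For long game lists, one option is to scrape in smaller batches and stack the results, so a failure in one batch does not discard the batches that succeeded. A minimal base R sketch, assuming every element of the returned list is a data frame whose columns match across batches; scrape_in_chunks is a hypothetical helper, and scrape_fun is a parameter so the logic can be exercised without network access:

```r
## Hypothetical helper (not part of the scraper): scrape games in batches and row-bind results
## Assumes every element of each returned list is a data frame with the same
## columns across batches (base rbind() errors otherwise)
scrape_in_chunks <- function(games, chunk_size = 20, scrape_fun = sc.scrape_pbp, ...) {
  chunks <- split(games, ceiling(seq_along(games) / chunk_size))
  results <- list()
  for (i in seq_along(chunks)) {
    res <- tryCatch(
      scrape_fun(games = chunks[[i]], ...),  # extra arguments (e.g. sleep) pass through
      error = function(e) {
        warning("Chunk ", i, " failed: ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
    if (!is.null(res)) results[[length(results) + 1]] <- res
  }
  if (length(results) == 0) return(list())
  ## Combine element by element (pbp_base with pbp_base, roster_df with roster_df, ...)
  element_names <- names(results[[1]])
  combined <- lapply(element_names, function(nm) {
    do.call(rbind, lapply(results, `[[`, nm))
  })
  names(combined) <- element_names
  combined
}
```

Usage, under the same assumptions: pbp_all <- scrape_in_chunks(games_vec, chunk_size = 25, sleep = 1). Failed batches are reported as warnings and skipped rather than retried.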

Scrape Schedule Example:

## Get yesterday's schedule
schedule_current <- sc.scrape_schedule(start_date = Sys.Date() - 1, end_date = Sys.Date() - 1)
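
The same call can cover a longer window by widening the date range. A minimal sketch, assuming sc.scrape_schedule() accepts Date objects for both arguments (as the call above does); date_window is a hypothetical helper, not part of the scraper:

```r
## Hypothetical helper (not part of the scraper): dates covering the last n_days days, ending yesterday
date_window <- function(n_days, today = Sys.Date()) {
  today <- as.Date(today)
  list(start_date = today - n_days, end_date = today - 1)
}

## e.g. the past week (requires the sourced scraper functions):
## schedule_week <- do.call(sc.scrape_schedule, date_window(7))
```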

Contributors

  • evolvingwild
