baseballr's Introduction

baseballr

baseballr is a package written for R focused on baseball analysis. It includes functions for scraping various data from websites, such as FanGraphs.com, Baseball-Reference.com, and baseballsavant.mlb.com. It also includes functions for calculating metrics, such as wOBA, FIP, and team-level consistency over custom time frames.

You can read more about some of the functions and how to use them at its official site as well as this Hardball Times article.

Installation

You can install the CRAN version of baseballr with:

install.packages("baseballr")

You can install the released version of baseballr from GitHub with:

# You can install using the pacman package using the following code:
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load_current_gh("BillPetti/baseballr")
# Alternatively, using the devtools package:
if (!requireNamespace('devtools', quietly = TRUE)){
  install.packages('devtools')
}
devtools::install_github(repo = "BillPetti/baseballr")

For experimental functions in development, you can install the development branch:

# install.packages("devtools")
devtools::install_github("BillPetti/baseballr", ref = "development_branch")

Functionality

The package consists of two main sets of functions: data acquisition and metric calculation.

For example, if you want to see the standings for a specific MLB division on a given date, you can use the bref_standings_on_date() function. Just pass the date (in YYYY-MM-DD format) and the division you want:

library(baseballr)
library(dplyr)
bref_standings_on_date("2015-08-01", "NL East", from = FALSE)
## ── MLB Standings on Date data from baseball-reference.com ─── baseballr 1.5.0 ──

## ℹ Data updated: 2023-12-25 02:24:44 EST

## # A tibble: 5 × 8
##   Tm        W     L `W-L%` GB       RS    RA `pythW-L%`
##   <chr> <int> <int>  <dbl> <chr> <int> <int>      <dbl>
## 1 WSN      54    48  0.529 --      422   391      0.535
## 2 NYM      54    50  0.519 1.0     368   373      0.494
## 3 ATL      46    58  0.442 9.0     379   449      0.423
## 4 MIA      42    62  0.404 13.0    370   408      0.455
## 5 PHI      41    64  0.39  14.5    386   511      0.374

Right now the function works as far back as 1994, which is when both leagues split into three divisions.

You can also pull data for all hitters over a specific date range. Here are the results for all hitters from August 1st through October 3rd during the 2015 season:

data <- bref_daily_batter("2015-08-01", "2015-10-03") 
data %>%
  dplyr::glimpse()
## Rows: 764
## Columns: 30
## $ bbref_id <chr> "machama01", "duffyma01", "altuvjo01", "eatonad02", "choosh01…
## $ season   <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
## $ Name     <chr> "Manny Machado", "Matt Duffy", "José Altuve", "Adam Eaton", "…
## $ Age      <dbl> 22, 24, 25, 26, 32, 21, 27, 28, 36, 28, 29, 29, 27, 29, 27, 2…
## $ Level    <chr> "Maj-AL", "Maj-NL", "Maj-AL", "Maj-AL", "Maj-AL", "Maj-AL", "…
## $ Team     <chr> "Baltimore", "San Francisco", "Houston", "Chicago", "Texas", …
## $ G        <dbl> 59, 59, 57, 58, 58, 58, 59, 58, 59, 57, 55, 57, 57, 58, 56, 5…
## $ PA       <dbl> 266, 264, 262, 262, 260, 259, 259, 258, 257, 257, 255, 255, 2…
## $ AB       <dbl> 237, 248, 244, 230, 211, 224, 239, 235, 231, 233, 213, 218, 2…
## $ R        <dbl> 36, 33, 30, 37, 48, 35, 32, 29, 37, 27, 50, 37, 36, 25, 38, 4…
## $ H        <dbl> 66, 71, 81, 74, 71, 79, 54, 66, 75, 48, 65, 56, 61, 51, 78, 5…
## $ X1B      <dbl> 43, 54, 53, 56, 47, 51, 34, 37, 48, 30, 34, 32, 35, 33, 66, 2…
## $ X2B      <dbl> 10, 12, 19, 12, 14, 17, 6, 17, 16, 11, 13, 13, 15, 10, 7, 13,…
## $ X3B      <dbl> 0, 2, 3, 1, 1, 4, 1, 0, 2, 1, 2, 4, 0, 1, 3, 0, 4, 0, 1, 1, 0…
## $ HR       <dbl> 13, 3, 6, 5, 9, 7, 13, 12, 9, 6, 16, 7, 11, 7, 2, 20, 9, 8, 8…
## $ RBI      <dbl> 32, 30, 18, 31, 34, 32, 27, 40, 53, 21, 50, 19, 31, 39, 23, 4…
## $ BB       <dbl> 26, 15, 10, 23, 39, 18, 16, 17, 21, 21, 34, 33, 21, 39, 12, 3…
## $ IBB      <dbl> 1, 0, 1, 1, 1, 0, 0, 6, 1, 1, 0, 1, 1, 5, 0, 4, 3, 3, 7, 2, 2…
## $ uBB      <dbl> 25, 15, 9, 22, 38, 18, 16, 11, 20, 20, 34, 32, 20, 34, 12, 35…
## $ SO       <dbl> 42, 35, 28, 55, 51, 38, 68, 56, 29, 53, 46, 62, 41, 48, 27, 7…
## $ HBP      <dbl> 2, 0, 4, 5, 8, 1, 3, 5, 1, 1, 2, 3, 3, 1, 1, 6, 1, 3, 4, 1, 0…
## $ SH       <dbl> 0, 0, 1, 2, 1, 11, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, …
## $ SF       <dbl> 1, 1, 3, 2, 1, 5, 1, 1, 4, 2, 5, 1, 2, 2, 3, 0, 3, 2, 3, 4, 3…
## $ GDP      <dbl> 5, 9, 6, 1, 1, 4, 2, 2, 9, 7, 5, 1, 4, 8, 1, 2, 3, 10, 5, 4, …
## $ SB       <dbl> 6, 8, 11, 9, 2, 10, 0, 0, 0, 3, 3, 4, 5, 4, 24, 2, 1, 0, 6, 0…
## $ CS       <dbl> 4, 0, 4, 4, 0, 2, 0, 0, 0, 1, 0, 1, 3, 2, 7, 2, 3, 0, 2, 0, 0…
## $ BA       <dbl> 0.279, 0.286, 0.332, 0.322, 0.337, 0.353, 0.226, 0.281, 0.325…
## $ OBP      <dbl> 0.353, 0.326, 0.364, 0.392, 0.456, 0.395, 0.282, 0.341, 0.377…
## $ SLG      <dbl> 0.485, 0.387, 0.508, 0.448, 0.540, 0.558, 0.423, 0.506, 0.528…
## $ OPS      <dbl> 0.839, 0.713, 0.872, 0.840, 0.996, 0.953, 0.705, 0.848, 0.906…

In terms of metric calculation, the package allows the user to calculate the consistency of team scoring and run prevention for any year using team_consistency():

team_consistency(2015)
## # A tibble: 30 × 5
##    Team  Con_R Con_RA Con_R_Ptile Con_RA_Ptile
##    <chr> <dbl>  <dbl>       <dbl>        <dbl>
##  1 ARI    0.37   0.36          17           15
##  2 ATL    0.41   0.4           88           63
##  3 BAL    0.4    0.38          70           42
##  4 BOS    0.39   0.4           52           63
##  5 CHC    0.38   0.41          30           85
##  6 CHW    0.39   0.4           52           63
##  7 CIN    0.41   0.36          88           15
##  8 CLE    0.41   0.4           88           63
##  9 COL    0.35   0.34           7            3
## 10 DET    0.39   0.38          52           42
## # ℹ 20 more rows

You can also calculate wOBA per plate appearance and wOBA on contact for any set of data over any date range, provided you have the data available.

Simply pass the proper data frame to woba_plus:

data %>%
  dplyr::filter(PA > 200) %>%
  woba_plus %>%
  dplyr::arrange(desc(wOBA)) %>%
  dplyr::select(Name, Team, season, PA, wOBA, wOBA_CON) %>%
  dplyr::glimpse()
## Rows: 117
## Columns: 6
## $ Name     <chr> "Edwin Encarnación", "Bryce Harper", "David Ortiz", "Joey Vot…
## $ Team     <chr> "Toronto", "Washington", "Boston", "Cincinnati", "Baltimore",…
## $ season   <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
## $ PA       <dbl> 216, 248, 213, 251, 253, 260, 245, 255, 223, 241, 223, 259, 2…
## $ wOBA     <dbl> 0.490, 0.450, 0.449, 0.445, 0.434, 0.430, 0.430, 0.422, 0.410…
## $ wOBA_CON <dbl> 0.555, 0.529, 0.541, 0.543, 0.617, 0.495, 0.481, 0.494, 0.459…

You can also generate these wOBA-based stats, as well as FIP, for pitchers using the fip_plus() function:

bref_daily_pitcher("2015-04-05", "2015-04-30") %>% 
  fip_plus() %>% 
  dplyr::select(season, Name, IP, ERA, SO, uBB, HBP, HR, FIP, wOBA_against, wOBA_CON_against) %>%
  dplyr::arrange(dplyr::desc(IP)) %>% 
  head(10)
## ── MLB Daily Pitcher data from baseball-reference.com ─────── baseballr 1.5.0 ──

## ℹ Data updated: 2023-12-25 02:27:52 EST

## # A tibble: 10 × 11
##    season Name               IP   ERA    SO   uBB   HBP    HR   FIP wOBA_against
##     <int> <chr>           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>
##  1   2015 Johnny Cueto     37    1.95    38     4     2     3  2.62        0.21 
##  2   2015 Dallas Keuchel   37    0.73    22    11     0     0  2.84        0.169
##  3   2015 Sonny Gray       36.1  1.98    25     6     1     1  2.69        0.218
##  4   2015 Mike Leake       35.2  3.03    25     7     0     5  4.16        0.24 
##  5   2015 Félix Hernández  34.2  1.82    36     6     3     1  2.2         0.225
##  6   2015 Corey Kluber     34    4.24    36     5     2     2  2.4         0.295
##  7   2015 Jake Odorizzi    33.2  2.41    26     8     1     0  2.38        0.213
##  8   2015 Josh Collmenter  32.2  2.76    16     3     0     1  2.82        0.29 
##  9   2015 Bartolo Colón    32.2  3.31    25     1     0     4  3.29        0.28 
## 10   2015 Zack Greinke     32.2  1.93    27     7     1     2  3.01        0.24 
## # ℹ 1 more variable: wOBA_CON_against <dbl>
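
As a rough cross-check of the FIP column above, the conventional formula is (13*HR + 3*(BB + HBP) - 2*SO) / IP plus a league constant. The sketch below is a hand calculation, not the fip_plus() source: `fip_by_hand()` is a hypothetical name, and both the use of uBB for walks and the constant of 3.134 are assumptions, chosen because they reproduce the printed values when IP is used exactly as displayed.

```r
# Hypothetical hand calculation of FIP (not the package's implementation).
# Assumptions: walks are unintentional walks (uBB), the league constant is
# 3.134, and IP is used exactly as printed in the table.
fip_by_hand <- function(HR, uBB, HBP, SO, IP, constant = 3.134) {
  (13 * HR + 3 * (uBB + HBP) - 2 * SO) / IP + constant
}

# Rows taken from the output above.
round(fip_by_hand(HR = 3, uBB = 4, HBP = 2, SO = 38, IP = 37), 2)    # Johnny Cueto: 2.62
round(fip_by_hand(HR = 5, uBB = 7, HBP = 0, SO = 25, IP = 35.2), 2)  # Mike Leake: 4.16
```

Note that baseball notation writes 35.2 for 35 and 2/3 innings; converting to decimal innings before dividing would give slightly different values from the ones printed above.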

Issues

Please leave any suggestions or bugs in the Issues section.

Pull Requests

Pull requests are welcome, but I cannot guarantee that they will be accepted or accepted quickly. Please make all pull requests to the development branch for review.

Breaking Changes

Full News on Releases

Follow the SportsDataverse (@SportsDataverse) on Twitter and star this repo

Our Authors

  • Bill Petti (@BillPetti)

    @BillPetti

  • Saiem Gilani (@saiemgilani)

    @saiemgilani

Our Contributors (they’re awesome)

  • Ben Baumer (@BaumerBen)

    @beanumber

  • Ben Dilday (@BenDilday)

    @bdilday

  • Robert Frey (@RobertFrey40)

    @robert-frey

  • Camden Kay (@k_camden)

    @camdenk

Citations

To cite the baseballr R package in publications, use:

BibTeX Citation

@misc{petti_gilani_2021,
  author = {Bill Petti and Saiem Gilani},
  title = {baseballr: The SportsDataverse's R Package for Baseball Data.},
  url = {https://billpetti.github.io/baseballr/},
  year = {2021}
}

baseballr's People

Contributors

a-meyers, afeierman, apapanico, bbwieland, bdilday, beanumber, begavett, billpetti, camdenk, christianh00k, darh78, hadley, jonathan-inwt, keberwein, lawwu, markromanmiller, mmcgowan13, robert-frey, saiemgilani, sboysel, shanepiesik, travisrpetersen


baseballr's Issues

Get "Error in .[[33]] : subscript out of bounds" with fg_bat_leaders

I get this error no matter what I do:

> fg <- fg_bat_leaders(2016, 2016, 0)
Error in .[[33]] : subscript out of bounds
> fg <- fg_bat_leaders(2015, 2016, 0)
Error in .[[33]] : subscript out of bounds
> fg <- fg_bat_leaders(2015, 2016, 4)
Error in .[[33]] : subscript out of bounds

Not sure what other info you'd find helpful. Let me know and I will provide it :-)

scraping statcast causes error for dates with no games

There is a tryCatch in the scraping function, but it isn't catching errors and warnings. If you try to loop over a sequence of days, this can cause it to fail in the middle of the loop and lose the work done up to that point.

example,

date_seq = seq(as.Date("2017-07-09"), as.Date("2017-07-14"), by=1)

statcast_list = lapply(date_seq, function(d) {scrape_statcast_savant_batter_all(start_date = as.character(d), end_date = as.character(d))})

[1] "These data are from BaseballSevant and are property of MLB Advanced Media, L.P. All rights reserved."
[1] "Grabbing data, this may take a minute..."
URL read and payload aquired successfully.
[1] "These data are from BaseballSevant and are property of MLB Advanced Media, L.P. All rights reserved."
[1] "Grabbing data, this may take a minute..."
URL caused a warning. Make sure your date range is correct:
Original warning message:
incomplete final line found by readTableHeader on 'https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7CPO%7CS%7C&hfC=&hfSea=2017%7C&hfSit=&player_type=batter&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&game_date_gt=2017-07-10&game_date_lt=2017-07-10&team=&position=&hfRO=&home_road=&hfFlag=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details&'
 Error in scrape_statcast_savant_batter_all(start_date = as.character(d),  : 
  object 'payload' not found 

statcast_list
Error: object 'statcast_list' not found
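
One way to keep the work done so far, sketched under assumptions: wrap each day's call in its own tryCatch so a failing date returns NULL instead of aborting the whole lapply. `scrape_one_day()` and `scrape_days()` are hypothetical helpers, not part of baseballr, and `scraper` stands for any function that takes `start_date` and `end_date` strings.

```r
# Hypothetical wrapper: call `scraper` for one date and return NULL
# instead of stopping when that date raises an error.
scrape_one_day <- function(d, scraper) {
  tryCatch(
    scraper(start_date = as.character(d), end_date = as.character(d)),
    error = function(e) {
      message("Skipping ", format(d), ": ", conditionMessage(e))
      NULL
    }
  )
}

# Run the wrapper over a sequence of dates and keep only the days that worked.
scrape_days <- function(dates, scraper) {
  results <- lapply(dates, scrape_one_day, scraper = scraper)
  names(results) <- as.character(dates)
  Filter(Negate(is.null), results)
}

# Usage (needs network access, so it is left commented out):
# date_seq <- seq(as.Date("2017-07-09"), as.Date("2017-07-14"), by = 1)
# statcast_list <- scrape_days(date_seq, scrape_statcast_savant_batter_all)
```

The result is a named list with one element per successful date, so days without games simply drop out.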

Error in namespaceExport

I'm trying to install baseballr on R 3.2.3 (both on Windows and Linux) and I'm getting the following error:
The downloaded source packages are in
‘/tmp/RtmpRFVgnk/downloaded_packages’
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD INSTALL '/tmp/RtmpRFVgnk/devtools86a798f69fb/BillPetti-baseballr-7a96d6e'
--library='/home/martin/R/x86_64-pc-linux-gnu-library/3.2' --install-tests

* installing source package ‘baseballr’ ...
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error in namespaceExport(ns, exports) : undefined exports: aging_curves
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/home/martin/R/x86_64-pc-linux-gnu-library/3.2/baseballr’
Error: Command failed (1)

Am I missing something or is the package not yet compatible with version 3.2.3?

team_results_bref produces "Error: Column 20 must be named"

Attempted to run the example code for team_results_bref and received the following error:

nyy <- team_results_bref('NYY', 2017)
Error: Column 20 must be named

standings_on_date_bref function appears to have run as expected.

R 3.4.2/RStudio 1.1.423/macOS 10.13

Let me know if there is any additional logging that would be beneficial.

Error Installing

The bulk of the package is installing for me, but the full package isn't. I'm getting error messages at this stage.

trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/XML_3.98-1.6.tgz'
Error in download.file(url, destfile, method, mode = "wb", ...) :
cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/XML_3.98-1.6.tgz'
In addition: Warning message:
In download.file(url, destfile, method, mode = "wb", ...) :
cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/XML_3.98-1.6.tgz': HTTP status was '404 Not Found'
Warning in download.packages(x$name, destdir = dest_dir, repos = x$repos, :
download of package ‘XML’ failed
Error in download.packages(x$name, destdir = dest_dir, repos = x$repos, :
subscript out of bounds

Minor league daily player data

Would it be possible to create a new function that pulls minor league daily data for both hitters and pitchers from Fangraphs? Possibly with the ability to select from all leagues, or specify a particular level?

Thanks for creating this package, it's been a great help!

Easiest way to rebuild a Statcast database

I followed this great post in order to build my Statcast database. https://billpetti.github.io/2018-02-19-build-statcast-database-rstats/

The only additions were:

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = "statcast.sqlite3")

dbWriteTable(con, "statcast", statcast_bind)

I went to look at the Statcast data on Baseball Savant, and it looks like some of the pitch type data has changed. I haven't looked at every year yet, but it has definitely changed for 2017.

2 Questions for you.

  1. What's the easiest way to rebuild the database? At a minimum, I'd want to replace the 2017 values in my database.

  2. Can I bind 1 season at a time to the Database using the dbWriteTable function above, or would that function overwrite my existing "statcast" table in the database?
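
A possible approach to both questions, sketched with the DBI interface: delete the rows for one season, then append the fresh rows. `replace_season()` is a hypothetical helper, and `game_year` is assumed to be a column in the table; adjust it to whatever column identifies the season in your schema.

```r
library(DBI)

# Hypothetical helper: replace one season's rows in an existing table.
# Assumes the table has a `game_year` column identifying the season.
replace_season <- function(con, table, season, new_rows) {
  sql <- paste0(
    "DELETE FROM ", DBI::dbQuoteIdentifier(con, table),
    " WHERE game_year = ", as.integer(season)
  )
  DBI::dbExecute(con, sql)
  # append = TRUE adds rows to the existing table. Without append = TRUE or
  # overwrite = TRUE, dbWriteTable() stops with an error if the table exists.
  DBI::dbWriteTable(con, table, new_rows, append = TRUE)
}

# Usage (statcast_2017 is a placeholder for a freshly scraped data frame):
# con <- DBI::dbConnect(RSQLite::SQLite(), dbname = "statcast.sqlite3")
# replace_season(con, "statcast", 2017, statcast_2017)
# DBI::dbDisconnect(con)
```

Appending only works cleanly if the new rows have the same columns as the existing table.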

Thanks!

season-to-date information provided by standings_on_date_bref, daily_batter_bref, daily_pitcher_bref

Thank you, Bill, great package!

It would be extremely useful to be able to scrape the season-to-date team standings, batter stats, and pitcher stats for any given date since the beginning of the corresponding season.

So, instead of daily_batter_bref("2015-05-10", "2015-06-20") yielding the batter stats averaged across the period between the two dates, would it be possible to output the same stats from the beginning of the season through the first date (e.g., "2015-05-10"), followed by the season-to-date stats for every subsequent day up to the second date (e.g., "2015-06-20"), with a separate row of data for each date?

Thanks again for a very useful package.

pitcher data doesn't query over multiple years

R version 3.3.3

This is the version I am currently running but when trying to install baseballr it says that it is unavailable for this version. Is this just a matter of time before it will be available or do I have to downgrade to a previous version?

viz_gb_on_period error: could not find function "hcaes"

I tried running the following code and received the following error:

library(baseballr)
viz_gb_on_period("2018-03-29", "2018-04-14", "NL Central")

Error in hcaes(x = Date, y = GB, group = Team) :
could not find function "hcaes"

Since that is from the highcharter package I accessed its library directly with library(highcharter) and then it worked, so I'm assuming there's a dependency issue or missing highcharter::hcaes.

Error in standings_on_date_bref()

standings_on_date_bref() fails with an error.

My code:

library(baseballr)
standings_on_date_bref("2015-08-04", "AL East")

Error in setNames(., table_names[ind]) :
'names' attribute [1] must be the same length as the vector [0]

Running baseballr version 0.3.2 on Windows 10 with R 3.3.3.

standings_on_date_bref() throwing error

Installed on April 14, 2016, and tried to run the first example in the README: standings_on_date_bref.

Copying and pasting the exact code in the example in, I received an error:

> standings_on_date_bref("2015-08-01", "NL East", from = FALSE)
Error in function_list[[i]](value) : could not find function "html_text"

This same error threw when testing other dates as well.

I'm using R 3.2.0 on a Mac running OS 10.10.5 Yosemite.

running fg_bat_leaders results in incorrect number of dimensions

When I run this fg_bat_leaders code on the project website:

head(fg_bat_leaders(x = 2015, y = 2016, league = "all", qual = 1200, ind = 0)) %>% select(Seasons:AVG)

I get the following error:

Error in leaders[1, ] : incorrect number of dimensions

Is this a me problem?

To install baseballr I also was asked to install 'selectr'

After installing baseballr thanks to your help, I went to run your first example and it did not work. It asked for 'selectr', so I installed it and then ran your sample:

standings_on_date_bref("2015-08-01", "NL East", from = FALSE)
Error in loadNamespace(name) : there is no package called ‘selectr’

install.packages("selectr", dependencies = FALSE)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/selectr_0.3-1.zip'
Content type 'application/zip' length 159942 bytes (156 KB)
downloaded 156 KB

package ‘selectr’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpoP2z55\downloaded_packages

> standings_on_date_bref("2015-08-01", "NL East", from = FALSE)
$NL East
Tm W L W-L% GB RS RA pythW-L%
1 WSN 54 48 0.529 -- 422 391 0.535
2 NYM 54 50 0.519 1.0 368 373 0.494
3 ATL 46 58 0.442 9.0 379 449 0.423
4 MIA 42 62 0.404 13.0 370 408 0.455
5 PHI 41 64 0.390 14.5 386 511 0.374

team_results_bref Issues

The following issues arise in team_results_bref:

  1. After every 50 games, the headers appear as a row of data, which should not be included
  2. Column 4 is labelled ".1".
  3. Column 5 H_A only shows "H"
  4. Result included -wo for Walkoffs.
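
Until these are addressed in the function, the first and last items can be handled after the fact. The sketch below is illustrative only: `clean_team_results()` is a hypothetical helper, and the column names `Gm` and `Result` are assumptions about the returned data frame.

```r
# Hypothetical post-processing for a scraped game log.
# Assumes character columns `Gm` (game number) and `Result`.
clean_team_results <- function(df) {
  # Drop rows that repeat the table header inside the data.
  df <- df[df$Gm != "Gm", , drop = FALSE]
  # Record walk-offs in their own column, then strip the "-wo" suffix.
  df$walkoff <- grepl("-wo$", df$Result)
  df$Result <- sub("-wo$", "", df$Result)
  rownames(df) <- NULL
  df
}

# Toy example with one repeated header row and one walk-off.
toy <- data.frame(
  Gm = c("1", "Gm", "2"),
  Result = c("W", "Result", "L-wo"),
  stringsAsFactors = FALSE
)
clean_team_results(toy)
```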

I have tried using exports from Baseball-Reference's team page myself and have had trouble, so these issues are totally understandable! This package is awesome, thanks for helping the baseball community so much.

Reverse playerid_lookup

Idea for an enhancement: reverse functionality for playerid_lookup to return player name based on playerid.
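
A minimal sketch of what the reverse lookup could look like, assuming a lookup table with one row per player. The function name and the column names (`key_mlbam`, `name_first`, `name_last`) are hypothetical, and the toy table is built by hand rather than pulled from the package.

```r
# Hypothetical reverse lookup: return "First Last" for one or more player IDs.
# Column names are assumptions about the lookup table's layout.
playername_lookup_sketch <- function(id, lookup_table, id_col = "key_mlbam") {
  hit <- lookup_table[lookup_table[[id_col]] %in% id, , drop = FALSE]
  paste(hit$name_first, hit$name_last)
}

# Toy lookup table built by hand for illustration.
toy_lookup <- data.frame(
  key_mlbam = c(434378, 621043),
  name_first = c("Justin", "Aaron"),
  name_last = c("Verlander", "Judge"),
  stringsAsFactors = FALSE
)
playername_lookup_sketch(621043, toy_lookup)
```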

scrape_statcast_savant_all only returning most recent 40000 rows

When I use scrape_statcast_savant_batter_all(start_date = "2018-03-28", end_date = "2018-04-17") or pitcher, all I get is 40000 rows of data exactly with a date range of 2018-04-07 to 2018-04-17, so it appears I only get the most recent 40000 data points.

team_consistency error

team_consistency(2017)
Error in team_results_bref(.$Tm, .$year) : object 'col_names' not found

I'm getting the error above when trying the team_consistency function

Issue with fg_bat_leaders() html scrape

I've been trying to use the fg_bat_leaders() command, but it looks like the new Fangraphs layout broke the html scraper. I get an Error in .[[24]] : subscript out of bounds. When I tried copy and pasting the function guts to find the error I found the failure comes on the initial read_html() call. I assume something in the paste0() needs to be changed with the FG update, but I could be wrong!

Is anyone else getting this result?

SwStr%

Looking at your stat line for stat cast. Terrific script. Shouldn't SwStr% include swinging_strikes_blocked and foul_tips? I did a quick data check on baseball savant and it appears that Whiffs include both of those in addition to swinging_strikes.
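
To make the suggestion concrete, here is a hedged sketch of a swinging-strike rate that counts all three outcomes. `swstr_rate()` is a hypothetical helper, and the exact strings in `whiff_values` are assumptions about how the pitch-level `description` column is coded.

```r
# Hypothetical helper: share of pitches that ended in a swing and miss.
# The strings in `whiff_values` are assumed codings of the description column.
swstr_rate <- function(description,
                       whiff_values = c("swinging_strike",
                                        "swinging_strike_blocked",
                                        "foul_tip")) {
  mean(description %in% whiff_values)
}

# Toy example: two of the four pitches count as whiffs.
swstr_rate(c("swinging_strike", "ball", "foul_tip", "called_strike"))
```

Passing a shorter `whiff_values` vector reproduces the narrower definition, which makes the two versions easy to compare.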

velo_monthly function

Hello all,

I've written a function that plots monthly averages of release velocity for a given dataset, and updated it so that it works with the Statcast data in the baseballr package. I'd like to contribute to the package by writing functions that visualize Statcast and PITCHf/x data.

The following code generates a plot that shows how Justin Verlander's velocity changed over seasons.

library(dplyr)      # %>% and filter()
library(lubridate)  # year() and month()
library(ggvis)
library(xts)
library(baseballr)

#Using the ggvis and xts packages, this function generates a plot of the monthly average release velocity for a given data frame.

#Justin Verlander's 2013-2016 statcast data
verlander <- scrape_statcast_savant_pitcher(start_date = "2013-04-06", end_date = "2016-10-31",pitcherid =434378)

velo_monthly <- function(df,overplot=F ,fastball="both"){

#Fastball vs. Non-fastball

#Label fastball pitch types "F" and everything else "NF"
df$fastball <- factor(ifelse(df$pitch_type %in% c("FA","FF","FC","FT","FS"), "F", "NF"), levels = c("F", "NF"))

if(fastball=="NF"){
ndf <- df %>%filter(fastball=="NF")
shapes <- "cross"
} else if(fastball=="F"){
ndf <- df %>% filter(fastball=="F")
shapes <- "circle"
}else {
ndf <- df
shapes <-"diamond"
}

#Time Series

idx <- ndf$game_date
df_ <- xts(ndf[,c("pitcher","game_date","inning","fastball","pitch_type","start_speed")],order.by=idx)

#Monthly avg of velocity

mthlySumm <- apply.monthly(df_[,6],mean,na.rm=T)
mthlysum <- as.data.frame(coredata(mthlySumm))

mthdat <- as.data.frame(mthlysum[,1])
names(mthdat) <- "velo_mon"
mthdat$period <- index(mthlySumm)
mthdat$seasonYear <- year(mthdat$period)
mthdat$month <- month(mthdat$period)

#overplot over seasonYear

if(overplot==F){
ans <-mthdat %>% ggvis(~period,~velo_mon) %>%layer_points(fill=~as.factor(seasonYear),shape:=shapes) %>% group_by(seasonYear) %>% layer_smooths(stroke=~as.factor(seasonYear)) %>%add_axis("x",title=paste(df$player_name[1],min(df$game_date),max(df$game_date)),subdivide = 2) %>% add_axis("y", title="Velocity Monthly Average",subdivide=4) %>%add_legend(c("fill","stroke"), title = "Season", orient = "right")

}else{
ans<-mthdat %>% ggvis(~month,~velo_mon) %>%layer_points(fill=~as.factor(seasonYear),shape:=shapes) %>% group_by(seasonYear) %>% layer_smooths(stroke=~as.factor(seasonYear))%>%add_axis("x",title=paste(df$player_name[1],min(df$game_date),max(df$game_date)),subdivide = 2) %>% add_axis("y", title="Velocity Monthly Average",subdivide=4) %>%add_legend(c("fill","stroke"), title = "Season", orient = "right")

}

return(ans)
}

#Justin Verlander's 2013-2016 statcast data
verlander <- scrape_statcast_savant_pitcher(start_date = "2013-04-06", end_date = "2016-10-31",pitcherid =434378)

velo_monthly(verlander)

velo_monthly(verlander,fastball="NF")
velo_monthly(verlander,overplot=TRUE,fastball="F")

Issue with FG Leaders Pull

I tried running this data pull:
head(fg_bat_leaders(x = 2015, y = 2016, league = "all", qual = "y", ind = 0)) %>%
  select(Seasons:AVG)

I got the error below:
Error in select_(.data, .dots = lazyeval::lazy_dots(...)) :
object 'Seasons' not found
In addition: Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion

Problem installing

Downloading GitHub repo BillPetti/baseballr@master
from URL https://api.github.com/repos/BillPetti/baseballr/zipball/master
Error: Does not appear to be an R package (no DESCRIPTION)

`viz_gb_on_period` could not find function "hc_theme_smpl"

Hi,

Running the following code from scratch results in an error in the hc_theme_smpl() function from the highcharter package.

library(baseballr)
viz_gb_on_period("2018-03-29", "2018-04-18", "AL East")
 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 52s
# A tibble: 10 x 7
   League  Date       Team      W     L WLpct    GB
   <chr>   <date>     <chr> <int> <int> <dbl> <dbl>
 1 AL East 2018-03-29 NYY       1     0 1.00   0.  
 2 AL East 2018-03-29 TBR       1     0 1.00   0.  
 3 AL East 2018-03-29 BAL       1     0 1.00   0.  
 4 AL East 2018-03-29 BOS       0     1 0.     1.00
 5 AL East 2018-03-29 TOR       0     1 0.     1.00
 6 AL East 2018-04-18 BOS      15     2 0.882  0.  
 7 AL East 2018-04-18 TOR      12     5 0.706  3.00
 8 AL East 2018-04-18 NYY       8     8 0.500  6.50
 9 AL East 2018-04-18 TBR       5    13 0.278 10.5 
10 AL East 2018-04-18 BAL       5    13 0.278 10.5 
Error in hc_theme_smpl() : could not find function "hc_theme_smpl"

The problem is solved if we run library(highcharter), so it seems the issue is related to how the hc_theme_smpl function is imported in the baseballr package.

My R session is (after loading highcharter):

R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252  LC_CTYPE=Portuguese_Portugal.1252   
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C                        
[5] LC_TIME=Portuguese_Portugal.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] highcharter_0.5.0 bindrcpp_0.2      baseballr_0.3.3  

loaded via a namespace (and not attached):
 [1] httr_1.3.1          tidyr_0.8.0         jsonlite_1.5       
 [4] splines_3.4.4       Formula_1.2-2       assertthat_0.2.0   
 [7] TTR_0.23-3          latticeExtra_0.6-28 selectr_0.3-2      
[10] yaml_2.1.18         pillar_1.2.1        backports_1.1.2    
[13] lattice_0.20-35     glue_1.2.0          rlist_0.4.6.1      
[16] digest_0.6.15       RColorBrewer_1.1-2  checkmate_1.8.5    
[19] rvest_0.3.2         colorspace_1.3-2    htmltools_0.3.6    
[22] Matrix_1.2-12       plyr_1.8.4          psych_1.7.8        
[25] XML_3.98-1.10       pkgconfig_2.0.1     broom_0.4.3        
[28] purrr_0.2.4         scales_0.5.0        XML2R_0.0.6        
[31] htmlTable_1.11.2    tibble_1.4.2        mgcv_1.8-23        
[34] ggplot2_2.2.1       pbapply_1.3-4       nnet_7.3-12        
[37] hexbin_1.27.2       lazyeval_0.2.1      cli_1.0.0          
[40] quantmod_0.4-12     mnormt_1.5-5        crayon_1.3.4       
[43] survival_2.41-3     magrittr_1.5        nlme_3.1-131.1     
[46] MASS_7.3-49         xts_0.10-2          xml2_1.2.0         
[49] foreign_0.8-69      reldist_1.6-6       tools_3.4.4        
[52] data.table_1.10.4-3 stringr_1.3.0       munsell_0.4.3      
[55] cluster_2.0.6       compiler_3.4.4      rlang_0.2.0        
[58] grid_3.4.4          RCurl_1.95-4.10     rstudioapi_0.7     
[61] pitchRx_1.8.2       htmlwidgets_1.0     igraph_1.2.1       
[64] bitops_1.0-6        base64enc_0.1-3     gtable_0.2.0       
[67] curl_3.1            reshape2_1.4.3      R6_2.2.2           
[70] gridExtra_2.3       zoo_1.8-1           lubridate_1.7.3    
[73] knitr_1.20          dplyr_0.7.4         utf8_1.1.3         
[76] bindr_0.1.1         Hmisc_4.1-1         stringi_1.1.7      
[79] parallel_3.4.4      Rcpp_0.12.16        rpart_4.1-13       
[82] acepack_1.4.1       tidyselect_0.2.4

30,000 row limit for scrape?

There seems to be a limit of 30,000 rows when using the scrape_statcast functions. When I run:

start="2015-04-01"
stop="2015-05-01"
statcast_pitching=scrape_statcast_savant_pitcher_all(start,stop)
min(statcast_pitching$game_date)

The result is:

[1] "2015-04-24"

And I get a data set of 30,000 rows

If I change the stop date to an earlier date:

start="2015-04-01"
stop="2015-04-10"
statcast_pitching=scrape_statcast_savant_pitcher_all(start,stop)
min(statcast_pitching$game_date)

I get:

[1] "2015-04-05"

And a data set of ~17,500 rows.

And then if I try a very long window:

start="2015-04-01"
stop="2015-07-01"
statcast_pitching=scrape_statcast_savant_pitcher_all(start,stop)
min(statcast_pitching$game_date)

I get this nasty error message:

URL caused a warning. Make sure your date range is correct:
Original warning message:
incomplete final line found by readTableHeader on 'https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7CPO%7CS%7C&hfC=&hfSea=2015%7C&hfSit=&player_type=pitcher&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&game_date_gt=2015-04-01&game_date_lt=2015-07-10&team=&position=&hfRO=&home_road=&hfFlag=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details&'
Error in scrape_statcast_savant_pitcher_all(start, "2015-07-10") :
object 'payload' not found

Is this an issue with the package itself or just Savant Search? And is there an easy workaround other than scraping smaller time frames and putting them together?
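
For what it's worth, the "smaller time frames" approach can be automated. The sketch below is one possible version under assumptions: `split_date_range()` and `scrape_in_chunks()` are hypothetical helpers, `scraper` is any function that takes start and end date strings and returns a data frame, and the seven-day window is a guess at a size that stays under the row limit.

```r
# Split a date range into consecutive, non-overlapping windows of at most
# `days` days each.
split_date_range <- function(start, end, days = 7) {
  starts <- seq(as.Date(start), as.Date(end), by = days)
  ends <- pmin(starts + days - 1, as.Date(end))
  data.frame(start = starts, end = ends)
}

# Scrape each window separately and stack the results.
scrape_in_chunks <- function(start, end, scraper, days = 7) {
  windows <- split_date_range(start, end, days)
  pieces <- lapply(seq_len(nrow(windows)), function(i) {
    scraper(as.character(windows$start[i]), as.character(windows$end[i]))
  })
  do.call(rbind, pieces)
}

# Usage (needs network access, so it is left commented out):
# statcast_pitching <- scrape_in_chunks("2015-04-01", "2015-07-01",
#                                       scrape_statcast_savant_pitcher_all)
```

If any single window still comes back with exactly the maximum number of rows, that is a hint to shrink `days` further.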

scrape savant issues

Hi,

When I run this code for A.J. Pollock, it downloads his data, but also everyone else's in the date range. Three days ago it would only download the playerid specified data.

bat <- scrape_statcast_savant(start_date = "2018-03-28", end_date = paste(Sys.Date() - 1), playerid = 572041, player_type = 'batter')

Run scoring

I know this is likely infeasible, but grabbing the score-state for each pitch (number of runs for home team, number of runs for the away team) in the scrape_statcast_savant_batter_all would be awesome.

scrape_statcast_savant_pitcher() returns batting data for pitchers

Hi!

Cool package, which I read about on Exploring Baseball Data with R!

I am having one issue: For me, the function scrape_statcast_savant_pitcher() is returning batting data for pitchers, not pitching data. I believe when building the URL the relevant part of the setting should be player_type=pitcher instead of the current player_type=batter, based on my read of the function here. When I manually make that change to the function, it returns pitching data for pitchers.

Attached is my working file in case it might help. Sorry if I overlooked something obvious.
baseballr.txt

Cheers,
Eric Tassone

New functions?

Hey @BillPetti I quietly dropped this package on CRAN today. Basically, just a method to download tables from the Baseball Databank, because I got tired of Lahman always being out of date. There's no real overlap with your package but there are a couple of functions I thought you would find useful. Feel free to use them.

I'll probably start promoting the package next week sometime. Anyhoo, feel free to close this issue, just wanted to open an invitation to use any of this stuff if you want.

scrape_statcast_savant_batter_all: No unique game_id

The data frame returned does not include a game_id, so doubleheaders are mixed together. Here's an example:

tmp <- scrape_statcast_savant_batter_all("2016-05-07", "2016-05-07")
tmp1 <- tmp %>% filter(home_team == "BAL", away_team == "OAK", inning == 1, inning_topbot == "Top")

Bug report on missing games for Judge

I was looking for games for Aaron Judge in July and August of 2017 and found the data are missing. Here is the code I am running.

judge.data.miss <- scrape_statcast_savant_batter(start_date = "2017-07-15", end_date = "2017-8-15", batterid = 621043)
unique(judge.data.miss$game_date)

You can see that only two games for Aaron Judge come up during this stretch. But there should be more, as he only missed three games combined in these months.

scrape_statcast_savant_batter_all

Documentation for start_date and end_date reads "Format must be in Y-d-m format." This should be changed to match the function which appears to be YYYY-MM-DD.
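
To illustrate why the wording matters, here is a small base R sketch. `is_ymd()` is a hypothetical helper, not part of baseballr, and nothing here is taken from the package's own date handling.

```r
# Hypothetical helper (not part of baseballr): TRUE only for strings written
# strictly as YYYY-MM-DD that also name a real calendar date.
is_ymd <- function(x) {
  shaped <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  parsed <- as.Date(x, format = "%Y-%m-%d")
  shaped & !is.na(parsed)
}

checks <- is_ymd(c("2016-05-07", "2016-5-7", "2016-13-01"))

# The same string names different days under the two readings.
as_ymd <- as.Date("2016-05-07", format = "%Y-%m-%d")  # 7 May 2016
as_ydm <- as.Date("2016-05-07", format = "%Y-%d-%m")  # 5 July 2016
```

A date such as "2016-05-07" is valid under both readings, so a wrong format description can silently produce the wrong date range rather than an error.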

edge_scrape error on dates > 2017

I ran the edge scrape script for any date in 2016 with no problem, but I get this error when the date is in 2017 or 2018:

Error in function (type, msg, asError = TRUE) :
Could not resolve host: writefunction

Error in loadNamespace

Hi,

I have been trying to install baseballr using R 3.3.2 on Windows, and I get the following error:

  • installing source package 'baseballr' ...
    ** R
    ** data
    *** moving datasets to lazyload DB
    ** preparing package for lazy loading
    Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
    there is no package called 'XML2R'
    ERROR: lazy loading failed for package 'baseballr'

I have tried installing the missing package separately, but every time I do this and then try installing baseballr again I get the same error but with a different package listed as missing. Any idea how to fix this?
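
One way to avoid chasing missing packages one at a time is to check a whole list up front, as in the sketch below. `missing_packages()` is a hypothetical helper, not part of baseballr, and the package names are only the ones that show up in the install logs and reports in this thread, not a confirmed dependency list.

```r
# Hypothetical helper (not part of baseballr): which of `pkgs` cannot be loaded?
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package names taken from the install logs in this thread; this is not
# claimed to be the package's complete dependency list.
deps <- c("XML2R", "tibble", "lubridate", "reldist", "rvest", "XML", "xml2", "dplyr")
to_install <- missing_packages(deps)

# Then install everything missing in one call (needs network access):
# if (length(to_install) > 0) install.packages(to_install)
```

After that, retrying the baseballr install should at least get past the packages listed here.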

Thanks

Imported packages do not appear to be getting attached

library(baseballr)
team_results_bref("NYM", 2015)
Error in team_results_bref("NYM", 2015) : could not find function "%>%"

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] baseballr_0.0.0.9000

loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.2  

It ran OK when I loaded dplyr and rvest independently.

On a side note, you might want to set the output to a tbl_df for easier handling.
Good luck with the package.

Function viz_gb_on_period not found

Hey Bill,

I'm glad to know my PRs were merged into the master branch.
I've reinstalled the package and I'm able to run the standings_on_date_bref without problems and get the new names on the table.
Nevertheless, the function viz_gb_on_period is not available in the package. I do not know if that is related to the difference in case-sensitive names in the function.

Regards.

Daniel.

savant scrape no longer working

Trying to install and get error "there is no package called ‘baseballr’ "

Hi there, good evening.

I am a novice R user, but I always follow instructions as closely as possible.
While trying to install baseballr I encountered the following.
Please help. Thank you.

> require(devtools)

install_github("BillPetti/baseballr")
Error in curl::curl_fetch_disk(url, x$path, handle = handle) :
Couldn't resolve host name
library("devtools", lib.loc="C:/Program Files/R/R-3.2.1/library")
> install_github("BillPetti/baseballr")
Downloading GitHub repo BillPetti/baseballr@master
from URL https://api.github.com/repos/BillPetti/baseballr/zipball/master
Installing baseballr
Installing 1 package: lubridate

There is a binary version available (and will be installed) but the
source version is later:
binary source
lubridate 1.6.0 1.7.1

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/lubridate_1.6.0.zip'
Content type 'application/zip' length 654624 bytes (639 KB)
downloaded 639 KB

package ‘lubridate’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
Installing 1 package: reldist
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/reldist_1.6-6.zip'
Content type 'application/zip' length 116008 bytes (113 KB)
downloaded 113 KB

package ‘reldist’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
Installing 1 package: rvest
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/rvest_0.3.2.zip'
Content type 'application/zip' length 853411 bytes (833 KB)
downloaded 833 KB

package ‘rvest’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
Installing 1 package: XML

There is a binary version available (and will be installed) but the
source version is later:
binary source
XML 3.98-1.6 3.98-1.9

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/XML_3.98-1.6.zip'
Content type 'application/zip' length 4298226 bytes (4.1 MB)
downloaded 4.1 MB

package ‘XML’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
Installing 1 package: xml2
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/xml2_1.1.1.zip'
Content type 'application/zip' length 3488697 bytes (3.3 MB)
downloaded 3.3 MB

package ‘xml2’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
"C:/PROGRA~1/R/R-32~1.1/bin/x64/R" --no-site-file --no-environ --no-save
--no-restore --quiet CMD INSTALL
"C:/Users/Javier/AppData/Local/Temp/RtmpCeAFw9/devtools430478a12f62/BillPetti-baseballr-c1f2ddf"
--library="C:/Program Files/R/R-3.2.1/library" --install-tests

  • installing source package 'baseballr' ...
    ** R
    ** data
    *** moving datasets to lazyload DB
    ** preparing package for lazy loading
    Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
    there is no package called 'tibble'
    ERROR: lazy loading failed for package 'baseballr'
  • removing 'C:/Program Files/R/R-3.2.1/library/baseballr'
    Error: Command failed (1)
    > require(baseballr)
    Loading required package: baseballr
    Warning message:
    In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
    there is no package called ‘baseballr’

plot.subtitle only available in devel. version of ggplot

The Mookie Betts example won't run on the most current version of ggplot. The culprit seems to be the plot.subtitle argument from the imported theme, which was written with the developmental version of ggplot2.

I tripped on this while trying to write a vignette from the Mookie Betts example. This isn't a big issue, but if you plan on submitting to CRAN, it wouldn't fly in its current condition. My recommendation would be to rewrite the theme function to exclude plot.subtitle and use ** in Rmarkdown to append a subtitle to the plot.

This isn't a huge deal, feel free to close the issue if you want. I just wanted you to be aware because as it sits right now, most users won't be able to run it.

Missing PO string in savant batter all scraper

Bill, I'm not sure what happened to this edit (I thought you had added it before), but the batter_all scraper is missing the PO and S strings for the playoffs and spring training in the GT portion of the URL. Only batter_all is affected; the other 3 scrapers have all 3 strings in the URL.
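
For illustration, the GT portion with all three strings can be built in base R as sketched below. This only mirrors the `hfGT=R%7CPO%7CS%7C` fragment visible in the Savant URLs quoted in this thread; it is not the package's actual URL-building code, and reading R as the regular season is an assumption.

```r
# Game-type codes: PO (playoffs) and S (spring training) per this issue;
# R is assumed here to mean the regular season.
game_types <- c("R", "PO", "S")

# Each code is followed by a pipe, and the pipes are percent-encoded.
raw_value <- paste0(game_types, "|", collapse = "")
hfGT <- utils::URLencode(raw_value, reserved = TRUE)

# Query-string fragment to place in the search URL.
gt_param <- paste0("hfGT=", hfGT)
```

Dropping "PO" and "S" from `game_types` gives the regular-season-only fragment, which matches the behavior described for the batter_all scraper.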

error in installation

I keep getting this error message when trying to download, so I cannot use the package.

  • removing 'C:/Users/brett.zaziski/Documents/R/win-library/3.4/baseballr'
    Installation failed: Command failed (1)

Package installation issue

I am new to using R and am trying to work through this data, mess around, and see what I can do. During installation of the package I get random errors:

Installation failed: NULL : 'rcmd_safe_env' is not an exported object from 'namespace:callr'

Any tips to clear this up are appreciated, thanks.
