dgrtwo / ebbr
Empirical Bayes binomial estimation
License: Other
Sorry, this is not really an issue, but I did not know of a better way to contact you.
I'm working through your book Introduction to Empirical Bayes: Examples from Baseball Statistics, and it's a lot of fun, but I've been having a hard time extending the concepts in the book and ebbr beyond success/total analyses.
I can extrapolate the book and package to something like K% (strikeouts/batters faced) or SwStr% (swinging strikes/pitches thrown) for pitchers, but I get conceptually tripped up by more complex, composite metrics like wOBA. Can ebbr be applied to metrics like this? If not, would you be able to point me toward something in the same vein?
Also, do you have any idea of the implications of using ebb_fit_prior() fitted values in composite metrics?
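For the simple rate case mentioned above, the shrinkage step can be sketched in base R. This is only an illustration, not ebbr's implementation: the prior parameters `alpha0` and `beta0` and the pitcher counts are made-up values (in practice the prior would be estimated from the data, for example with ebb_fit_prior()). It covers success/total metrics such as K% only and does not address composite metrics like wOBA.

```r
# Hypothetical beta prior for a strikeout rate (illustrative values only)
alpha0 <- 20
beta0 <- 80

# Made-up counts for three pitchers
strikeouts <- c(5, 50, 200)
batters_faced <- c(10, 250, 900)

# Raw rate, and the beta-binomial posterior mean (the shrunken estimate)
raw_rate <- strikeouts / batters_faced
shrunk_rate <- (strikeouts + alpha0) / (batters_faced + alpha0 + beta0)

print(data.frame(strikeouts, batters_faced, raw_rate, shrunk_rate))
```

The pitcher with only 10 batters faced is pulled strongly toward the prior mean, while the pitchers with larger samples barely move.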
Hello,
I tried to use prior_subset with add_ebb_estimate in combination with method = "gamlss", as below:
eb_career_ab <- career %>%
  add_ebb_estimate(H, AB, method = "gamlss",
                   prior_subset = AB >= 500,
                   mu_predictors = ~ log10(AB))
However, this gives the following error:
Error in lm.wfit(X[onlydata, , drop = FALSE], y, w) :
incompatible dimensions
In addition: Warning message:
`data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.
As a workaround, I did succeed in fitting the prior with ebb_fit_prior on a subset of the data, followed by augment(prior, newdata = full_dataset) on the complete dataset.
Thanks.
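The workaround described in this post might look roughly like the following. It is a sketch under assumptions: `career` is rebuilt here from Lahman in the simplest way (the poster's own table may differ), and the augment() call follows the form quoted in the post rather than anything verified against a particular ebbr version.

```r
library(dplyr)
library(broom)
library(Lahman)
library(ebbr)

# Assumed stand-in for the poster's `career` table: career hits and at-bats
career <- Batting %>%
  filter(AB > 0) %>%
  group_by(playerID) %>%
  summarise(H = sum(H), AB = sum(AB))

# Fit the beta-binomial regression prior on well-sampled players only
prior <- career %>%
  filter(AB >= 500) %>%
  ebb_fit_prior(H, AB, method = "gamlss",
                mu_predictors = ~ log10(AB))

# Apply that prior to every player, including those with AB < 500
eb_career_ab <- augment(prior, newdata = career)
```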
Is there a reason this was never released to CRAN?
When debugging the error below, it seems to be caused by the call parameters <- broom::tidy(fit) in ebb_fit_prior().
# recreating chapter 11 of David Robinson's Introduction to Empirical Bayes
library(Lahman)
#> Warning: package 'Lahman' was built under R version 4.1.3
library(ebbr)
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.1.2
#> Warning: package 'ggplot2' was built under R version 4.1.2
#> Warning: package 'tibble' was built under R version 4.1.2
#> Warning: package 'tidyr' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2
#> Warning: package 'purrr' was built under R version 4.1.2
#> Warning: package 'dplyr' was built under R version 4.1.2
#> Warning: package 'stringr' was built under R version 4.1.2
#> Warning: package 'forcats' was built under R version 4.1.2
theme_set(theme_light())
# grab career batting average of non-pitchers
pitchers <-
Pitching %>%
group_by(playerID) %>%
summarise(gamesPitched = sum(G)) %>%
filter(gamesPitched > 3)
# add player names
player_names <-
People %>%
tibble %>%
select(playerID, nameFirst, nameLast, bats) %>%
unite(name, nameFirst, nameLast, sep = " ")
career_full <-
Batting %>%
filter(AB > 0) %>%
anti_join(pitchers, by = "playerID") %>%
group_by(playerID) %>%
summarise(H = sum(H), AB = sum(AB), year = mean(yearID)) %>%
inner_join(player_names, by = "playerID") %>%
filter(!is.na(bats))
career <-
career_full %>%
select(-bats, -year)
# solve this with beta-binomial regression
eb_career_ab <-
career %>%
ebb_fit_prior(H, AB, method = "gamlss",
mu_predictors = ~ log10(AB))
#> Warning in summary.gamlss(x): summary: vcov has failed, option qr is used instead
#> ******************************************************************
#> Family: c("BB", "Beta Binomial")
#>
#> Call:
#> gamlss::gamlss(formula = form, family = fam, data = tbl, sigma.predictors = sigma_predictors)
#>
#>
#> Fitting method: RS()
#>
#> ------------------------------------------------------------------
#> Mu link function: logit
#> Mu Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1.694982 0.009005 -188.23 <2e-16 ***
#> log10(AB) 0.193192 0.002768 69.79 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> ------------------------------------------------------------------
#> Sigma link function: log
#> Sigma Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -6.3316 0.0225 -281.3 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> ------------------------------------------------------------------
#> No. of observations in the fit: 10240
#> Degrees of Freedom for the fit: 3
#> Residual Deg. of Freedom: 10237
#> at cycle: 5
#>
#> Global Deviance: 74715.37
#> AIC: 74721.37
#> SBC: 74743.07
#> ******************************************************************
#> Error: $ operator is invalid for atomic vectors
Created on 2022-05-31 by the reprex package (v2.0.1)
Hi, I'm trying to write a custom function that computes ebb_fit_prior estimates for many columns in a data frame.
I'm running into a lot of issues passing variable column names to ebbr from inside a custom function. I've tried using !!as.symbol(), but some people on Reddit said this might not work, given that the package may be built on base R's non-standard evaluation (NSE).
Can you suggest a way to do this?
library(tidyverse)
library(Lahman)
library(ebbr)
career <- Batting %>%
  filter(AB > 0) %>%
  anti_join(Pitching, by = "playerID") %>%
  group_by(playerID) %>%
  summarize(H = sum(H), AB = sum(AB)) %>%
  mutate(average = H / AB)
# this works
career %>%
  ebbr::ebb_fit_prior(H, AB)
# function that I can use to make a bunch of estimates
make_eb_estimate = function(data, success, total, method = "mle"){
  fitted = data %>%
    ebb_fit_prior(x = success, n = total, method = method) %>%
    augment() %>%
    .$.fitted
}
# this does not work
career %>%
  make_eb_estimate("H", "AB")
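One way to sidestep the non-standard evaluation problem is to copy the two columns into a small data frame under fixed names, so that ebb_fit_prior() is always called with the same bare column names. The sketch below assumes this approach; `make_eb_estimate`, `eb_x`, and `eb_n` are illustrative names, and the code has not been checked against every ebbr version.

```r
library(dplyr)
library(broom)
library(Lahman)
library(ebbr)

career <- Batting %>%
  filter(AB > 0) %>%
  anti_join(Pitching, by = "playerID") %>%
  group_by(playerID) %>%
  summarize(H = sum(H), AB = sum(AB))

# Hypothetical wrapper: `success` and `total` are column names given as strings
make_eb_estimate <- function(data, success, total, method = "mle") {
  # Copy the two columns under fixed names so the bare names below always exist
  counts <- data.frame(eb_x = data[[success]], eb_n = data[[total]])
  prior <- ebb_fit_prior(counts, eb_x, eb_n, method = method)
  # Return the shrunken estimates as a plain numeric vector
  augment(prior, data = counts)$.fitted
}

career$H_shrunk <- make_eb_estimate(career, "H", "AB")
```

Because the wrapper returns a vector, it can be called once per column pair and the results attached to the original table.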
Warning messages suggest the following:
`data_frame()` was deprecated in tibble 1.1.0. Please use `tibble()` instead.
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Welcome at Sat Dec 01 13:41:05 2018
library(devtools)
install_github("dgrtwo/ebbr")
Downloading GitHub repo dgrtwo/ebbr@master
checking for file 'C:\Users\Jan\AppData\Local\Temp\RtmpchhTfy\remotes19f824eb4e27\dgrtwo-ebbr-4b9747d/DESCRIPTION' ...
checking for file 'C:\Users\Jan\AppData\Local\Temp\RtmpchhTfy\remotes19f824eb4e27\dgrtwo-ebbr-4b9747d/DESCRIPTION' ...
√ checking for file 'C:\Users\Jan\AppData\Local\Temp\RtmpchhTfy\remotes19f824eb4e27\dgrtwo-ebbr-4b9747d/DESCRIPTION'
preparing 'ebbr':
checking DESCRIPTION meta-information ...
checking DESCRIPTION meta-information ...
√ checking DESCRIPTION meta-information
checking for LF line-endings in source and make files and shell scripts
checking for empty or unneeded directories
building 'ebbr_0.1.tar.gz'
Welcome at Sat Dec 01 13:41:33 2018
Welcome at Sat Dec 01 13:41:54 2018
Goodbye at Sat Dec 01 13:41:55 2018
ERROR: loading failed for 'i386'
Using ebbr for the first time on a reasonably clean R 3.3.3 install, the calculation fails because the gamlss.data package is missing:
> trainer_sr_bbr <- trainer_sr %>%
+ ebbr::add_ebb_estimate(wins, runs, method = "gamlss", mu_predictors = ~ log10(runs))
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
there is no package called ‘gamlss.data’
The ebbr package installs fine, and I notice gamlss and gamlss.dist are Imports in the DESCRIPTION file. Should gamlss.data also be added here?
add_ebb_estimate works correctly after manually installing the gamlss.data package.
Attempting to install ebbr this morning on a Windows 7 machine, I encountered a build error related to the psych package.
> devtools::install_github("dgrtwo/ebbr")
Downloading GitHub repo dgrtwo/ebbr@master
from URL https://api.github.com/repos/dgrtwo/ebbr/zipball/master
Installing ebbr
"C:/PROGRA~1/R/R-33~1.3/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL \
"C:/Users/phillc/AppData/Local/Temp/RtmpMVzQV6/devtools1dcc44f5442d/dgrtwo-ebbr-4b9747d" \
--library="C:/Program Files/R/R-3.3.3/library" --install-tests
* installing *source* package 'ebbr' ...
** R
** tests
** preparing package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called 'psych'
ERROR: lazy loading failed for package 'ebbr'
* removing 'C:/Program Files/R/R-3.3.3/library/ebbr'
Error: Command failed (1)
This is a reasonably clean R install, with a very small collection of external libraries installed. On my main Linux machine, running R 3.3.2, which has many additional libraries installed, ebbr installed fine.
Manually installing the psych package fixes the issue, and ebbr subsequently installs without error.
I notice psych is not listed as an Import or Suggest in the DESCRIPTION file.
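Until the DESCRIPTION file lists them, the manual fix from this report and the gamlss.data report above can be scripted. The snippet below is a small sketch that installs only whichever of those two packages is missing and then installs ebbr; it assumes devtools is already installed and that both packages are available from the configured CRAN mirror.

```r
# Packages the two reports above had to install by hand
extra_deps <- c("gamlss.data", "psych")

# Keep only the ones that cannot be loaded yet
is_available <- vapply(extra_deps, requireNamespace,
                       logical(1), quietly = TRUE)
missing_deps <- extra_deps[!is_available]

if (length(missing_deps) > 0) {
  install.packages(missing_deps)
}

devtools::install_github("dgrtwo/ebbr")
```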
When all observations (k) are 0, ebbr fails with an unhelpful error message:
> ebbr::add_ebb_estimate(data.frame(k=rep(0, 10), n=sample(100, 10)), k ,n)
Error in if (!all(lower <= start & start <= upper)) { :
missing value where TRUE/FALSE needed
If at least one observation is nonzero, it works fine:
> ebbr::add_ebb_estimate(data.frame(k=c(1, rep(0, 9)), n=sample(100, 10)), k ,n)
k n .alpha1 .beta1 .fitted .raw .low .high
1 1 30 1.5202724 293.3042 0.005156533 0.03333333 3.829061e-04 0.015922746
2 0 74 0.5202724 338.3042 0.001535522 0.00000000 1.958268e-06 0.007564209
3 0 52 0.5202724 316.3042 0.001642147 0.00000000 2.094575e-06 0.008088588
4 0 97 0.5202724 361.3042 0.001437914 0.00000000 1.833525e-06 0.007084076
5 0 95 0.5202724 359.3042 0.001445906 0.00000000 1.843738e-06 0.007123393
6 0 58 0.5202724 322.3042 0.001611626 0.00000000 2.055554e-06 0.007938499
7 0 41 0.5202724 305.3042 0.001701212 0.00000000 2.170101e-06 0.008379020
8 0 39 0.5202724 303.3042 0.001712411 0.00000000 2.184422e-06 0.008434081
9 0 38 0.5202724 302.3042 0.001718066 0.00000000 2.191654e-06 0.008461884
10 0 44 0.5202724 308.3042 0.001684686 0.00000000 2.148968e-06 0.008297763
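Until the package reports this case more clearly, the input can be checked before calling add_ebb_estimate(). The helper below is hypothetical (it is not part of ebbr) and covers only the all-zero situation described in this report, plus basic length and missing-value checks.

```r
# Hypothetical helper (not part of ebbr): validate counts before fitting
check_eb_counts <- function(x, n) {
  if (length(x) != length(n)) {
    stop("`x` and `n` must have the same length")
  }
  if (anyNA(x) || anyNA(n)) {
    stop("`x` and `n` must not contain missing values")
  }
  if (all(x == 0)) {
    stop("all success counts are zero; fitting a prior is known to fail here")
  }
  invisible(TRUE)
}

# Same shape as the failing example: ten observations, all with k = 0
d <- data.frame(k = rep(0, 10), n = 10 * (1:10))
result <- tryCatch(check_eb_counts(d$k, d$n),
                   error = function(e) conditionMessage(e))
print(result)
```

Calling the helper first turns the opaque "missing value where TRUE/FALSE needed" failure into a message that names the actual problem.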
I'm running into errors using the example code in the documentation for the ebb_fit_mixture function.
First, by_row appears to be from purrrlyr, which is not loaded at the top of the example. But even with this loaded, I get the following error:
# simulate some data
set.seed(2017)
sim_data <- data_frame(cluster = 1:2,
alpha = c(30, 35),
beta = c(70, 15),
size = c(300, 700)) %>%
by_row(~ rbeta(.$size, .$alpha, .$beta)) %>%
unnest(p = .out) %>%
mutate(total = round(rlnorm(n(), 5, 2) + 1),
x = rbinom(n(), total, p))
mm <- ebb_fit_mixture(sim_data, x, total)
Error: `.x` must be a vector, not a function
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
All elements of `...` must be named.
Did you want `data = c(id, x, n)`?
I ran last_error() and got:
<error/purrr_error_bad_type>
`.x` must be a vector, not a function
Backtrace:
1. ebbr::ebb_fit_mixture(sim_data, x, total)
24. purrr:::stop_bad_type(...)
Run `rlang::last_trace()` to see the full context.
Any tips on how to proceed?
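The simulation step can be written without by_row, which removes the purrrlyr dependency. The sketch below uses purrr::pmap() and tidyr::unnest() instead. It only rebuilds `sim_data`; the backtrace above points inside ebb_fit_mixture() itself, so this is not expected to fix that error.

```r
library(dplyr)
library(purrr)
library(tidyr)

# simulate some data (same parameters as the documentation example)
set.seed(2017)
sim_data <- tibble(cluster = 1:2,
                   alpha = c(30, 35),
                   beta = c(70, 15),
                   size = c(300, 700)) %>%
  # one vector of beta draws per cluster, stored as a list column
  mutate(p = pmap(list(size, alpha, beta), rbeta)) %>%
  # expand the list column to one row per simulated observation
  unnest(p) %>%
  mutate(total = round(rlnorm(n(), 5, 2) + 1),
         x = rbinom(n(), total, p))
```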