dgrtwo / ebbr

Empirical Bayes binomial estimation

License: Other

R 100.00%

ebbr's Issues

Extending ebbr past binomial distributions

Sorry this is not really an issue but I did not know of a better way to contact you.

I'm working through your book Empirical Bayes: Examples from Baseball Statistics! and it's a huge blast and a lot of fun but I've been having a hard time extending the concepts in the book and ebbr past success/total analyses.

I can extend the book and package to something like K% (strikeouts/batters) or SwStr% (pitches swung at/pitches thrown) for pitchers, but I get conceptually tripped up on more complex, composite metrics like wOBA. Can ebbr be applied to metrics like this? If not, would you be able to point me toward something in the same vein?

Also, do you have any idea of the implications of using ebb_fit_prior() fitted values in composite metrics?

prior_subset not working in combination with method = "gamlss"

Hello,

I tried to pass prior_subset to add_ebb_estimate together with method = "gamlss", as below:

eb_career_ab <- career %>%
  add_ebb_estimate(H, AB, method = "gamlss",
                   prior_subset = AB >= 500,
                   mu_predictors = ~ log10(AB))

However, this gives the following error:

Error in lm.wfit(X[onlydata, , drop = FALSE], y, w) : 
  incompatible dimensions
In addition: Warning message:
`data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.

As a workaround, I did succeed in fitting the prior with ebb_fit_prior on a subset of the data and then calling augment(prior, newdata = full_dataset) on the complete dataset.

Thanks.

CRAN?

Is there a reason this was never released to CRAN?

ebb_fit_prior error when using beta-binomial regression

While debugging the error below, I found that it seems to be caused by the call parameters <- broom::tidy(fit) in ebb_fit_prior().

# recreating chapter 11 of David Robinson's Introduction to Empirical Bayes

library(Lahman)
#> Warning: package 'Lahman' was built under R version 4.1.3
library(ebbr)
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.1.2
#> Warning: package 'ggplot2' was built under R version 4.1.2
#> Warning: package 'tibble' was built under R version 4.1.2
#> Warning: package 'tidyr' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2
#> Warning: package 'purrr' was built under R version 4.1.2
#> Warning: package 'dplyr' was built under R version 4.1.2
#> Warning: package 'stringr' was built under R version 4.1.2
#> Warning: package 'forcats' was built under R version 4.1.2
theme_set(theme_light())

# grab career batting average of non-pitchers
pitchers <- 
  Pitching %>% 
  group_by(playerID) %>% 
  summarise(gamesPitched = sum(G)) %>% 
  filter(gamesPitched > 3)

# add player names
player_names <- 
  People %>% 
  tibble %>% 
  select(playerID, nameFirst, nameLast, bats) %>% 
  unite(name, nameFirst, nameLast, sep = " ")

career_full <- 
  Batting %>% 
  filter(AB > 0) %>% 
  anti_join(pitchers, by = "playerID") %>% 
  group_by(playerID) %>% 
  summarise(H = sum(H), AB = sum(AB), year = mean(yearID)) %>% 
  inner_join(player_names, by = "playerID") %>% 
  filter(!is.na(bats))

career <- 
  career_full %>% 
  select(-bats, -year)

# solve this with beta-binomial regression
eb_career_ab <- 
  career %>% 
  ebb_fit_prior(H, AB, method = "gamlss",
                mu_predictors = ~ log10(AB))
#> Warning in summary.gamlss(x): summary: vcov has failed, option qr is used instead
#> ******************************************************************
#> Family:  c("BB", "Beta Binomial") 
#> 
#> Call:  
#> gamlss::gamlss(formula = form, family = fam, data = tbl, sigma.predictors = sigma_predictors) 
#> 
#> 
#> Fitting method: RS() 
#> 
#> ------------------------------------------------------------------
#> Mu link function:  logit
#> Mu Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -1.694982   0.009005 -188.23   <2e-16 ***
#> log10(AB)    0.193192   0.002768   69.79   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ------------------------------------------------------------------
#> Sigma link function:  log
#> Sigma Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  -6.3316     0.0225  -281.3   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ------------------------------------------------------------------
#> No. of observations in the fit:  10240 
#> Degrees of Freedom for the fit:  3
#>       Residual Deg. of Freedom:  10237 
#>                       at cycle:  5 
#>  
#> Global Deviance:     74715.37 
#>             AIC:     74721.37 
#>             SBC:     74743.07 
#> ******************************************************************
#> Error: $ operator is invalid for atomic vectors

Created on 2022-05-31 by the reprex package (v2.0.1)

ebbr doesn't allow for custom functions

Hi, I'm trying to write a custom function that computes ebb_fit_prior estimates for many columns in a data frame.

I'm running into a lot of issues passing variable column names to ebbr from inside a custom function. I've tried using !!as.symbol(), but some people on Reddit said this might not work if the package is built on base R's NSE.

Can you suggest a way to do this?

library(tidyverse)
library(Lahman)
library(ebbr)

career <- Batting %>%
  filter(AB > 0) %>%
  anti_join(Pitching, by = "playerID") %>%
  group_by(playerID) %>%
  summarize(H = sum(H), AB = sum(AB)) %>%
  mutate(average = H / AB)

# this works
career %>%
  ebbr::ebb_fit_prior(H, AB)

# function that I can use to make a bunch of estimates
make_eb_estimate = function(data, success, total, method = "mle"){
  fitted = data %>%
    ebb_fit_prior(x = success, n = total, method = method) %>%
    augment() %>%
    .$.fitted
}

# this does not work
career %>%
  make_eb_estimate("H", "AB")
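One possible approach, sketched here in base R only and not tested against ebbr itself: if a function captures its column arguments with substitute() and eval() (an assumption about how ebb_fit_prior works), a wrapper can turn the column-name strings into symbols and splice them into the call with bquote() before evaluating it. The functions nse_ratio() and ratio_by_name() below are hypothetical stand-ins written for illustration; they are not part of ebbr.

```r
# Hypothetical stand-in for a function that captures column arguments
# with base R NSE (substitute + eval). It is NOT part of ebbr.
nse_ratio <- function(tbl, x, n) {
  x_value <- eval(substitute(x), tbl, parent.frame())
  n_value <- eval(substitute(n), tbl, parent.frame())
  sum(x_value) / sum(n_value)
}

# Wrapper that accepts column names as strings, converts them to symbols,
# and splices them into the call before it is evaluated.
ratio_by_name <- function(data, success, total) {
  call <- bquote(nse_ratio(data, .(as.name(success)), .(as.name(total))))
  eval(call)
}

# Small made-up data frame, used only to exercise the wrapper
career_toy <- data.frame(H = c(10, 20, 30), AB = c(40, 50, 60))

direct  <- nse_ratio(career_toy, H, AB)            # bare column names
by_name <- ratio_by_name(career_toy, "H", "AB")    # column names as strings
```

If ebb_fit_prior does capture its arguments this way, the same pattern would build the call as bquote(ebb_fit_prior(data, .(as.name(success)), .(as.name(total)), method = method)) inside make_eb_estimate.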

change data_frame() references to tibble()

Warning messages suggest the following:
data_frame() was deprecated in tibble 1.1.0.
Please use tibble() instead.

Cannot install on Windows 7 Home Premium under latest R (3.5.1), using latest devtools and install_github

R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Welcome at Sat Dec 01 13:41:05 2018

library(devtools)
install_github("dgrtwo/ebbr")
Downloading GitHub repo dgrtwo/ebbr@master

checking for file 'C:\Users\Jan\AppData\Local\Temp\RtmpchhTfy\remotes19f824eb4e27\dgrtwo-ebbr-4b9747d/DESCRIPTION' ...

√ checking for file 'C:\Users\Jan\AppData\Local\Temp\RtmpchhTfy\remotes19f824eb4e27\dgrtwo-ebbr-4b9747d/DESCRIPTION'

preparing 'ebbr':
checking DESCRIPTION meta-information ...

checking DESCRIPTION meta-information ...

√ checking DESCRIPTION meta-information

checking for LF line-endings in source and make files and shell scripts
checking for empty or unneeded directories
building 'ebbr_0.1.tar.gz'

Welcome at Sat Dec 01 13:41:33 2018

installing source package 'ebbr' ...
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
converting help for package 'ebbr'
finding HTML links ... done
add_ebb_estimate html
add_ebb_prop_test html
ebb_fit_mixture html
ebb_fit_prior html
ebb_mixture_tidiers html
ebb_prior_tidiers html
h html
logLik.ebb_mixture html
logLik.ebb_prior html
model.frame.ebb_prior html
print.ebb_mixture html
print.ebb_prior html
reexports html
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64

Welcome at Sat Dec 01 13:41:54 2018

Goodbye at Sat Dec 01 13:41:55 2018
ERROR: loading failed for 'i386'

removing 'C:/Program Files/R/R-2.13.1/library/ebbr'
In R CMD INSTALL
Error in i.p(...) :
(converted from warning) installation of package ‘C:/Users/Jan/AppData/Local/Temp/RtmpchhTfy/file19f8474d1c01/ebbr_0.1.tar.gz’ had non-zero exit status

Failure on Missing gamlss.data package

Using ebbr for the first time on a reasonably clean R 3.3.3 install, the calculation fails because the gamlss.data package is missing:

> trainer_sr_bbr <- trainer_sr %>%
+   ebbr::add_ebb_estimate(wins, runs, method = "gamlss", mu_predictors = ~ log10(runs))
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
  there is no package called ‘gamlss.data’

The ebbr package installs fine and I notice gamlss and gamlss.dist are Imports in the DESCRIPTION file. Should gamlss.data also be added here?

add_ebb_estimate works correctly after manually installing gamlss.data package.

Install Fails on Missing 'psych' Package

Attempting to install ebbr this morning on a Windows 7 machine, I encountered a build error caused by the missing psych package.

> devtools::install_github("dgrtwo/ebbr")
Downloading GitHub repo dgrtwo/ebbr@master
from URL https://api.github.com/repos/dgrtwo/ebbr/zipball/master
Installing ebbr
"C:/PROGRA~1/R/R-33~1.3/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL  \
  "C:/Users/phillc/AppData/Local/Temp/RtmpMVzQV6/devtools1dcc44f5442d/dgrtwo-ebbr-4b9747d"  \
  --library="C:/Program Files/R/R-3.3.3/library" --install-tests 

* installing *source* package 'ebbr' ...
** R
** tests
** preparing package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : 
  there is no package called 'psych'
ERROR: lazy loading failed for package 'ebbr'
* removing 'C:/Program Files/R/R-3.3.3/library/ebbr'
Error: Command failed (1)

This is a reasonably clean R install, with a very small collection of external libraries installed. On my main Linux machine, running R 3.3.2, which has many additional libraries installed, ebbr installed fine.

Manually installing psych package fixes the issue and ebbr installs subsequently without error.

I notice psych is not listed as an Import or Suggest in the DESCRIPTION file.
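Until the DESCRIPTION file declares these dependencies, a quick pre-install check can show which of them are absent. The sketch below uses only base R; the helper name missing_pkgs() is made up for this example, and the package names are the ones mentioned in this issue and the gamlss.data issue.

```r
# Return the subset of `pkgs` that cannot be loaded in this R library.
# requireNamespace() returns FALSE (instead of erroring) for absent packages.
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Packages reported as undeclared dependencies in these issues
to_install <- missing_pkgs(c("psych", "gamlss.data"))

# Uncomment to install whatever is missing before installing ebbr:
# if (length(to_install) > 0) install.packages(to_install)
```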

ebbr fails with unhelpful error when all observations are 0

When all observations (k) are 0, ebbr fails with an unhelpful error message:

> ebbr::add_ebb_estimate(data.frame(k=rep(0, 10), n=sample(100, 10)), k ,n)

 Error in if (!all(lower <= start & start <= upper)) { : 
  missing value where TRUE/FALSE needed

If at least one observation is greater than 0, it works fine:

> ebbr::add_ebb_estimate(data.frame(k=c(1, rep(0, 9)), n=sample(100, 10)), k ,n)
   k  n   .alpha1   .beta1     .fitted       .raw         .low       .high
1  1 30 1.5202724 293.3042 0.005156533 0.03333333 3.829061e-04 0.015922746
2  0 74 0.5202724 338.3042 0.001535522 0.00000000 1.958268e-06 0.007564209
3  0 52 0.5202724 316.3042 0.001642147 0.00000000 2.094575e-06 0.008088588
4  0 97 0.5202724 361.3042 0.001437914 0.00000000 1.833525e-06 0.007084076
5  0 95 0.5202724 359.3042 0.001445906 0.00000000 1.843738e-06 0.007123393
6  0 58 0.5202724 322.3042 0.001611626 0.00000000 2.055554e-06 0.007938499
7  0 41 0.5202724 305.3042 0.001701212 0.00000000 2.170101e-06 0.008379020
8  0 39 0.5202724 303.3042 0.001712411 0.00000000 2.184422e-06 0.008434081
9  0 38 0.5202724 302.3042 0.001718066 0.00000000 2.191654e-06 0.008461884
10 0 44 0.5202724 308.3042 0.001684686 0.00000000 2.148968e-06 0.008297763
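One plausible explanation, offered as an assumption rather than a confirmed diagnosis: if the optimizer's starting values come from method-of-moments estimates of the beta prior, then all-zero data give a mean and variance of 0, and the moment formulas evaluate to NaN. The base R sketch below only illustrates that arithmetic; beta_mm() is a hypothetical helper, not ebbr's code.

```r
# Method-of-moments estimates for a beta distribution, computed from
# observed proportions k / n. Hypothetical helper, not taken from ebbr.
beta_mm <- function(k, n) {
  p  <- k / n
  mu <- mean(p)
  vr <- var(p)
  common <- mu * (1 - mu) / vr - 1
  c(alpha = mu * common, beta = (1 - mu) * common)
}

# Totals copied from the working example above
n <- c(30, 74, 52, 97, 95, 58, 41, 39, 38, 44)

all_zero    <- beta_mm(rep(0, 10), n)        # mean 0 and variance 0 -> 0/0
one_success <- beta_mm(c(1, rep(0, 9)), n)   # small but finite estimates

# A guard like this could turn the failure into a clearer message
has_valid_start <- function(est) all(is.finite(est)) && all(est > 0)
```

Here has_valid_start(all_zero) is FALSE and has_valid_start(one_success) is TRUE, so a check of this kind before the fit could raise an informative error such as "all observations are 0".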

ebb_fit_mixture error

I'm running into errors using the example code in the documentation for the ebb_fit_mixture function.

First, by_row appears to be from purrrlyr, which is not loaded at the top of the example. But even with this loaded, I get the following error:

# simulate some data
set.seed(2017)
sim_data <- data_frame(cluster = 1:2,
                       alpha = c(30, 35),
                       beta = c(70, 15),
                       size = c(300, 700)) %>%
  by_row(~ rbeta(.$size, .$alpha, .$beta)) %>%
  unnest(p = .out) %>%
  mutate(total = round(rlnorm(n(), 5, 2) + 1),
         x = rbinom(n(), total, p))

mm <- ebb_fit_mixture(sim_data, x, total)

Error: `.x` must be a vector, not a function
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
All elements of `...` must be named.
Did you want `data = c(id, x, n)`?

I ran last_error() and got:

<error/purrr_error_bad_type>
`.x` must be a vector, not a function
Backtrace:
  1. ebbr::ebb_fit_mixture(sim_data, x, total)
 24. purrr:::stop_bad_type(...)
Run `rlang::last_trace()` to see the full context.

Any tips on how to proceed?
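For the first problem only (by_row not being available without purrrlyr), the same two-cluster simulation can be written in base R. This is a sketch of an equivalent data-generating process, not the documentation's code, and because the random draws happen in a different order it will not reproduce the exact same values. It does not address the error raised inside ebb_fit_mixture.

```r
# simulate some data without by_row(), data_frame(), or unnest()
set.seed(2017)
clusters <- data.frame(cluster = 1:2,
                       alpha = c(30, 35),
                       beta = c(70, 15),
                       size = c(300, 700))

# one row per simulated observation, labelled with its cluster
cluster <- rep(clusters$cluster, times = clusters$size)

# rbeta() is vectorised over its shape parameters, so each observation
# is drawn from the beta distribution of its own cluster
p <- rbeta(length(cluster), clusters$alpha[cluster], clusters$beta[cluster])

sim_data <- data.frame(cluster = cluster, p = p)
sim_data$total <- round(rlnorm(nrow(sim_data), 5, 2) + 1)
sim_data$x <- rbinom(nrow(sim_data), sim_data$total, sim_data$p)
```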

Automatically run CI independently of commits (e.g. monthly)
