mfasiolo / qgam Goto Github PK

View Code? Open in Web Editor NEW

30.0 30.0 7.0 11.82 MB

Additive quantile regression R package

Home Page: http://mfasiolo.github.io/qgam/

R 97.92% C 2.08%

qgam's People

Contributors

Stargazers

Watchers

Forkers

minghao2016 davidruegamer guhjy lutliu ndevln seoncheolpark waternumbers

qgam's Issues

Allow for vector-valued err parameter.

At the moment we allow only for scalar err.

Send message when multicore == TRUE and control$loss = "calFast"

We need to tell users that multicore is useless when using the sandwich calibration

Find alternative to na.omit

Currently qgam calls na.omit( data ), but this is a serious problem because data might contain a column full of NAs, which is not included in the fit. Hence excluding the corresponding rows does not make sense.

Gamlss logF calibration

At the moment gamlss calibration is messed up, because I am dealing with the intercept in the wrong way. This is collapse when a factor variable is used.

Determining lambda under heteroscedasticity

At the moment lambda = err * sqrt(2pivarHat) / (2*log(2)*exp(lsig[ii])) where varHat is a vector, under a variable scale model. The problem is that lsig is a scalar, while in the model it is actually a function of the covariates. Hence the error bound is incorrect.

The proper solution is that, under logFlss, varHat should be an input of the logFlss family. Then lambda is calculated internally, with vector-valued lsig. Importantly, the derivatives of the log-likelihood w.r.t. lsig must be corrected, because now changes in lsig cause changes in lambda, which in turn changes the log-lik.

Error when fixing smoothing parameters

I would like to fix smoothing parameters to calculate the proportion of deviance explained by different model terms (like here), but I get the following error:

set.seed(123)
library(qgam)
x1 <- runif(100)
x2 <- runif(100)
y <- x1 + x2 + rnorm(100)

m1 <- qgam(y ~ s(x1) + s(x2), data = data.frame(y = y, x1 = x1, x2 = x2), 0.1)
m2 <- qgam(y ~ s(x2), data = data.frame(y = y, x2 = x2), 0.1, argGam = list(sp = m1$sp[2]))

Error in gam(formula = y ~ s(x2), family = elf(qu = NA, co = NA, theta = NA, :
formal argument "sp" matched by multiple actual arguments

Is there another way to specify "sp" arguments for qgam?

Thank you for your time,
Adrian

Sandwich estimator

Add Bayesian sandwich in order to get better credible intervals for regression coefficients.

Fails to compile on Linux with "Mutex creation failed" error

Trying to install qgam on Manjaro Linux 17.1.12 fails:

> install.packages("qgam")
Installing package into ‘/home/joeroe/R/x86_64-pc-linux-gnu-library/3.5’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/qgam_1.2.2.tar.gz'
Content type 'application/x-gzip' length 927339 bytes (905 KB)
==================================================
downloaded 905 KB

* installing *source* package ‘qgam’ ...
** package ‘qgam’ successfully unpacked and MD5 sums checked
** libs
gcc -I"/usr/include/R/" -DNDEBUG   -D_FORTIFY_SOURCE=2   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt  -c init.c -o init.o
gcc -I"/usr/include/R/" -DNDEBUG   -D_FORTIFY_SOURCE=2   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt  -c mgcv_wrap.c -o mgcv_wrap.o
gcc -shared -L/usr/lib64/R/lib -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -o qgam.so init.o mgcv_wrap.o -L/usr/lib64/R/lib -lR
installing to /home/joeroe/R/x86_64-pc-linux-gnu-library/3.5/qgam/libs
** R
** data
** inst
** byte-compile and prepare package for lazy loading
terminate called after throwing an instance of 'std::runtime_error'
  what():  Mutex creation failed
/usr/lib64/R/bin/INSTALL: line 34: 29302 Done                    echo 'tools:::.install_packages()'
     29303 Aborted                 (core dumped) | R_DEFAULT_PACKAGES= LC_COLLATE=C "${R_HOME}/bin/R" $myArgs --slave --args ${args}
Warning in install.packages :
  installation of package ‘qgam’ had non-zero exit status

The downloaded source packages are in
	‘/tmp/RtmpzHREJU/downloaded_packages’

I assume this is related to the release of glibc 2.28-1, which also affected the later package (issues #45 and #63, now fixed). In which case it will also be a problem on Arch Linux and Alpine.

Better initialisation to avoid divergence of smoothing parameter

require(RhpcBLASctl); blas_set_num_threads(1) #optional

data("UKload")

qu <- 0.5
form <- NetDemand~s(wM,k=20,bs='cr') + s(wM_s95,k=20,bs='cr') + 
  s(Posan,bs='ad',k=30,xt=list("bs"="cc")) + Dow + s(Trend,k=4) + NetDemand.48 + Holy

# Do library(debug); mtrace(tuneLearn); 
# and then mtrace(newton) after for( ii in 1:nt ) 

library(qgam)
require(RhpcBLASctl); blas_set_num_threads(1) #optional
qu <- 0.05
tic <- proc.time()
set.seed(41241)
sigSeq <- seq(3, 8, length.out = 16)
closs1 <- tuneLearn(form = form, data = UKload, err = 0.15,
                   lsig = sigSeq, qu = qu, control = list("K" = 2))
proc.time() - tic
#check(closs1)

# At the first iteration you will see

# Initial smoothing parameters
7.571046   9.556191   2.441944 -19.327599 -12.368925   5.065998   4.854984  -2.279645 

# After one iteration lsp get to 
-1.630176   1.364197 -12.974725 -19.327599 -12.840063 -11.929365  -9.659406 -11.917420

# with gradient
143.748873878   6.825546046   0.367822900   0.005996531  -0.019127832  -0.226958093  -0.786281196  13.154424694

# Hessian: very concave along the second variable!
[,1]          [,2]         [,3]         [,4]          [,5]          [,6]          [,7]          [,8]
[1,]  2.458789e+01  8.412941e+00 -0.011584408 8.621974e-05  0.0029556391 -0.0265641062 -0.1539408394 -0.3165983797
[2,]  8.412941e+00 -5.877177e+00 -0.001159426 3.529038e-06 -0.0001722144  0.0004438109 -0.0186866835 -0.1314088233
[3,] -1.158441e-02 -1.159426e-03  1.095528270 3.340906e-03  0.4568092625  0.0038351018  0.4755544119  0.0046309245
[4,]  8.621974e-05  3.529038e-06  0.003340906 6.018966e-03  0.0047966777  0.0002868956  0.0006440296  0.0000427432
[5,]  2.955639e-03 -1.722144e-04  0.456809262 4.796678e-03  1.9886273956  0.5466783024  0.0944302443  0.0183346497
[6,] -2.656411e-02  4.438109e-04  0.003835102 2.868956e-04  0.5466783024  0.3882664949  0.3151293611  0.0581675198
[7,] -1.539408e-01 -1.868668e-02  0.475554412 6.440296e-04  0.0944302443  0.3151293611  3.0543169197  0.2696595281
[8,] -3.165984e-01 -1.314088e-01  0.004630925 4.274320e-05  0.0183346497  0.0581675198  0.2696595281  7.4095616814

# Eig values
26.7637733  7.4210911  3.1796840  0.9925382  0.3475061 -8.0462096

# The hessian is not PD hence the following code flips the negative eigenvalue.
grad <- c(143.7488739, 6.8255460, 0.3678229, 0.005996531, -0.019127832, -0.2269581, -0.7862812, 13.1544247)

grad1 <- c(143.7488739, 6.8255460, 0.3678229, -0.2269581, -0.7862812, 13.1544247) 

hess1 <- matrix(c(24.58788695,  8.4129414625, -0.011584408, -0.0265641062, -0.15394084, -0.316598380,
           8.41294146, -5.8771772964, -0.001159426,  0.0004438109, -0.01868668, -0.131408823,
           -0.01158441, -0.0011594263,  1.095528270,  0.0038351018,  0.47555441,  0.004630925,
           -0.02656411,  0.0004438109,  0.003835102,  0.3882664949,  0.31512936,  0.058167520,
           -0.15394084, -0.0186866835,  0.475554412,  0.3151293611,  3.05431692,  0.269659528,
           -0.31659838, -0.1314088233,  0.004630925,  0.0581675198,  0.26965953,  7.409561681), 6, 6)

## get the trial step ...
eh <- eigen(hess1,symmetric=TRUE)
d <- eh$values;U <- eh$vectors
## set eigen-values to their absolute value - heuristically appealing
## as it avoids very long steps being proposed for indefinte components,
## unlike setting -ve e.v.s to very small +ve constant...
ind <- d < 0
pdef <- if (sum(ind)>0) FALSE else TRUE ## is it positive definite? 
d[ind] <- -d[ind] ## see Gill Murray and Wright p107/8
low.d <- max(d)*.Machine$double.eps^.7
ind <- d < low.d
if (sum(ind)>0) pdef <- FALSE ## not certain it is positive definite
d[ind] <- low.d 
ind <- d != 0
d[ind] <- 1/d[ind]

# This produces the following step: the second component is going in the wrong direction 
# (increasing, rather than decreasing)
-drop(U%*%(d*(t(U)%*%grad1))) # (modified) Newton direction
# -6.0315446  2.2039636 -0.4772446  0.4242607  0.2126593 -2.0296309

# This moves ends up lowering (negative) REML quite a lot. Probably because the first coefficient
# update (-6.03) corresponds to precipitation, while the incriminated update (2.203) corresponds to
# smoothed precipitation. Maybe they are quite correlated and get traded off each other?

# Now the smoothing parameters are
[1]  -6.630176   3.191227 -13.370349 -19.327599 -12.840063 -11.577664  -9.483116 -13.599933

# While the old were
-1.630176   1.364197 -12.974725 -19.327599 -12.840063 -11.929365  -9.659406 -11.917420

# The gradient is now
[1]  4.024835175  0.001849281  0.146872205  0.004788172  0.031598356  0.118494166 -0.287925519  3.607569377

# and the hessian 
[,1]          [,2]         [,3]         [,4]          [,5]          [,6]          [,7]          [,8]
[1,] 3.3560882713  8.002191e-04 4.697608e-03 4.038460e-05  7.289772e-03  1.344963e-02  6.638055e-02  4.888544e-02
[2,] 0.0008002191 -1.842555e-03 6.647676e-06 3.294869e-08 -1.114174e-05 -7.135799e-05 -3.043383e-05 -3.971898e-06
[3,] 0.0046976082  6.647676e-06 5.926385e-01 3.131827e-03  4.160236e-01  3.672006e-03  3.751701e-01  3.499639e-04
[4,] 0.0000403846  3.294869e-08 3.131827e-03 4.818144e-03  5.765253e-03  3.440575e-04  8.994476e-04  9.959019e-06
[5,] 0.0072897719 -1.114174e-05 4.160236e-01 5.765253e-03  2.007663e+00  5.485817e-01  8.480139e-02  2.516710e-03
[6,] 0.0134496271 -7.135799e-05 3.672006e-03 3.440575e-04  5.485817e-01  8.691455e-01  2.940997e-01  1.705993e-02
[7,] 0.0663805549 -3.043383e-05 3.751701e-01 8.994476e-04  8.480139e-02  2.940997e-01  3.495033e+00  6.174737e-02
[8,] 0.0488854368 -3.971898e-06 3.499639e-04 9.959019e-06  2.516710e-03  1.705993e-02  6.174737e-02  3.995061e+00

# So the objective is kind of flat and negative definite across the 2nd parameter. Still the grad[2] has the 
# right sign. Now this check
uconv.ind <- uconv.ind & abs(grad)>max(abs(grad))*.001 
# gives
[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# Hence the second lsp is stuck where it is.
# Next iteration is kind of similar (similar gradient and hessian) and 2nd lsp fails the uconv.ind test and
# remains stuck.

# In the end convergence is achieved, that is this is FALSE
abs(old.score-score)>score.scale*conv.tol
# where the gradient is 
[1] 5.936495e-06 2.892744e-03 5.867093e-04 5.995174e-04 1.633661e-03 2.032578e-04 5.398356e-04 1.390044e-04
# and score.scale = 19303.56. Notice that here
score.scale <- abs(log(b$scale.est)) + abs(score)
# but log(b$scale.est) = 0, which might explain why convergence is slower when the response is normalized.
# Maybe we should set scale.est to the learning rate sigma?
# The final hessian is 

[,1]          [,2]         [,3]         [,4]          [,5]          [,6]          [,7]          [,8]
[1,] 9.906413e-01  9.528050e-05 2.017215e-03 7.067377e-06  2.865816e-02  3.390677e-02  9.211675e-02  2.808279e-02
[2,] 9.528050e-05 -2.851802e-03 2.235050e-06 3.550656e-09 -2.025883e-05 -9.484876e-05 -6.078402e-05 -5.010377e-05
[3,] 2.017215e-03  2.235050e-06 8.990583e-02 2.509880e-04  2.426490e-01  7.070839e-04  1.842829e-01  2.548583e-05
[4,] 7.067377e-06  3.550656e-09 2.509880e-04 6.004118e-04  1.184680e-03  3.313392e-05  2.237754e-04  3.348968e-07
[5,] 2.865816e-02 -2.025883e-05 2.426490e-01 1.184680e-03  2.828701e+00  4.866797e-01  1.712816e-01  6.120458e-04
[6,] 3.390677e-02 -9.484876e-05 7.070839e-04 3.313392e-05  4.866797e-01  4.255299e-01  2.428750e-01  2.172774e-03
[7,] 9.211675e-02 -6.078402e-05 1.842829e-01 2.237754e-04  1.712816e-01  2.428750e-01  4.008821e+00  1.476809e-02
[8,] 2.808279e-02 -5.010377e-05 2.548583e-05 3.348968e-07  6.120458e-04  2.172774e-03  1.476809e-02  9.706237e-01

potential conflict between doParallel / foreach and qgam

I just noticed a strange behaviour when loading the package qgam.

The following MWE works as expected:


R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.1

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

attached base packages:
[1] parallel  stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] doParallel_1.0.16 iterators_1.0.13  foreach_1.5.1     qgam_1.3.3        mgcv_1.8-36       nlme_3.1-152

####

library(doParallel)

no.cluster <- parallel::detectCores() - 2

cl <- parallel::makeForkCluster(no.cluster)
doParallel::registerDoParallel(cl)


foreach(i = seq(1,3,1)) %dopar% {
  
 say <- "hello"
 
}

However, when I load qgam, ich get the following error:

 library(qgam) 
 library(doParallel)
 
 no.cluster <- parallel::detectCores() - 2
 
 cl <- parallel::makeForkCluster(no.cluster)
 doParallel::registerDoParallel(cl)
 
 
 foreach(i = seq(1,3,1)) %dopar% {
  
  say <- "hello"
  
 }

> Error in serialize(data, node$con) : error writing to connection

It seems like, qgam opens any worker connection in the background

Bracket size in tuneLearnFast()

At the moment the function detects when the solution is close to the margins of the bracket, but it cannot be trusted 100%. It is not clear how close to the margins the brent() method can get, hence it is difficult to define a clear margin of safety (not the margin of safety is represented by aTol = 0.05).

Problems might arise if the bracket size brac gets small, so that its size is comparable to tol.

Adding k check to check.qgam

At the moment check.qgam only reports the effective degrees of freedom and
their maximum value (k'). To see whether k is big enough, gam.check performs
a test that does not quite apply to quantile regression. Maybe we can develop
a check for k under quantile regression.

small typo in vignette?

Should it be
wM_s95[i] = a*wM[i] + (1-a)*wM_s95[i-1] instead of
wM_s95[i] = a*wM[i] + (1-a)*wM_s95[i]
?

qgam/vignettes/qgam.Rmd

Line 368 in e2fde72

 - `wM_s95` exponential smooth of `wM`, that is `wM_s95[i] = a*wM[i] + (1-a)*wM_s95[i]` with `a=0.95`. 

qgam() gives error if only parametric terms included in the model

This code gives and error

`x <- rnorm(1e2)
y <- x^2 + rnorm(1e2)
qgam(y~x, qu = 0.5, data = data.frame("x" = x, "y" = y))

Error in if (sum(fix.ind)) { : argument is not interpretable as logical
`

Determining lambda for gamlss

Lambda depends on sigma, hence in logFlss we need a vector valued lambda. A vector of values
for sigma could come from gaulss fit.

Initialization of Gamlss models not working

This should be solved at mgcv level by Simon.

Parallelize tuneLearn() and tuneLearnFast()

Error when fitting qgam with only intercept included

This triggers an error:

set.seed(5125)
library(qgam)
x<-runif(100)
y<-x+rnorm(100)

qgam(y~1,data=data.frame(x=x,y=y),qu=0.1)

Memory error in PIRLS with zero weights?

This code:

library(qgam)
data("UKload")
form <- NetDemand~s(wM,k=20,bs='cr') + s(wM_s95,k=20,bs='cr') +
  s(Posan, k=30, bs = "ad") + Dow + s(Trend,k=4,bs='cr') +
  NetDemand.48 + Holy

qu <- 0.7
fit <- qgam(form = form, data = UKload, qu = qu, err = 0.05)

occasionally generates a warning. But sometimes it doesn't. Given that nothing is random here, this makes me think that there might be a memory error in the C code. This does not happen if we set err = 0.1.

Expand family in qgam

This is a request for an enhancement: It would be great if qgam would have the same range of families as family.mgcv in gam, e.g. the beta regression family to fit true % data.

More careful calculation of penalized EDF

The calculation of the optimal loss smoothness requires calculating the number of effective degrees of freedom. In .getErrParam we currently do

# Fixing dimension d to EDF of Gaussian fit. 
# First use anova to find degrees of freedom of parametric terms in equation for location
# Then find EDF of smooth terms in equation for location. 
# unique() needed for "adaptive" smooths
  anv <- anova( gFit )
  d <- sum( anv$pTerms.df[ !grepl("\\.1", rownames(anv$pTerms.table)) ] )
  d <- d + sum( unique(pen.edf(gFit)[!grepl("s\\.1|te\\.1|ti\\.1|t2\\.1", names(pen.edf(gFit)))]) )

Using unique() avoids counting the edf of the same effect multiple times when using adaptive smooths and tensor product smooth, but might be problematic when using by-factor smooths, for example.

Add ... to most function, in order to pass stuff to gam()

Better dynamic bracket size in tuneLearnFast()

The way this function shrinks the bracket size is not entirely satisfactory at the moment.
In particular, the methods often shrinks the bracket when fitting central quantiles, but the optimal
log(sigma) oscillates quite a lot when extreme quantiles are considered. Hence, using a narrow bracket
for extreme quantiles is inefficient.

In addition, the algorithm does not recognise immediately that the bracket is too narrow, because it
does not look at derivative at the limits of the bracket. What happens is that, when the bracket is too narrow, the algorithms slowly moves toward the one end of the bracket, wasting quite a lot of function evaluations in that way.

Wrong when bootstrapping lambda in Gamlss

TuneLearn and TuneLearnFast ended up using the lambda (which is a vector) used by the main fit, without taking into account that they are using different data when bootstrapping! This results in a total mess when the heteroscedasticity is strong.

Modify bootstrapping when model includes random effects

Currently bootstrapping can fail when random effects are included in the model. If a subject has very few associated observations, it might be left out the training or test set which results in an error. There should be a ways of resampling that makes sure that every subject appears in every bootstrap dataset. The same problem might occur in cross-validation.

number of items to replace is not a multiple of replacement length -> mObj issue?

Hi,

When trying to fit the below model, I get the following error message:

data <- readRDS("data.rds")
qgam(f1 ~ region + s(time,k=5,by=region) + s(speaker,bs="re") + s(time,speaker,bs="fs",k=5,m=1),data=data,qu=.5)

Error in X[, object$smooth[[k]]$first.para:object$smooth[[k]]$last.para] <- Xfrag : 
  number of items to replace is not a multiple of replacement length
In addition: Warning messages:
1: In gam.side(sm, X, tol = .Machine$double.eps^0.5) :
  model has repeated 1-d smooths of same variable.
2: In gam.side(sm, X, tol = .Machine$double.eps^0.5) :
  model has repeated 1-d smooths of same variable.
3: In gam.side(sm, X, tol = .Machine$double.eps^0.5) :
  model has repeated 1-d smooths of same variable.
4: contrasts dropped from factor region

The first three of those warnings are correct and no problem, the fourth one is suspicious because the Gaussian fit does not have any missing coefficients for region, as far as I can see. The actual error happens in predict.gam, which qgam calls from here:

qgam/R/tuneLearnFast.R

Line 251 in 7f5fc92

pMat <- predict.gam(mObj, newdata = data, type = "lpmatrix")

The error occurs when predict.gam is working on the factor smooth. At some point it is doing this:

Xfrag <- PredictMat(object$smooth[[k]], data)
X[, object$smooth[[k]]$first.para:object$smooth[[k]]$last.para] <- Xfrag

where, for some reason, Xfrag has the expected 160 columns (there are 160 speakers in my dataset), but object$smooth[[k]]$first.para:object$smooth[[k]]$last.para has only 159 elements. But running predict.gam manually on the gausFit object works without error, so this does not seem to be a bug in mgcv itself, but rather an an interaction with the fake model structure in the mObj object set up by qgam.

Data can be downloaded here: https://surfdrive.surf.nl/files/index.php/s/KXMxHGeSlNmXD3a

tuneLearnFast() crashes when "optimizer" = "efs"

This is probably a problem with the initialization of the smoothing parameters.

Prior weights under bootstrapping

Bootstrap calibration currently uses re-weighting, but does not take into account prior weights (that is the vector of weights provided by the user).

Error for small datasets - A term has fewer unique covariate combinations than specified maximum degrees of freedom

Hi Matteo,

in DHARMa, for small datasets, I have the problem that qgam may fail with

Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom

I realize that this goes back to mgcv, but in any case I was wondering: is there anything smarter one could do about this than just trying to catch the error?

Why does mgcv not fall back on a simpler spline automatically if there are not enough data points? I believe that, from a user viewpoint, it would be preferable that mgcv fits a simple line for small data, over getting an error.

Cheers,
Florian

Update tau in internal functions

In the paper I have changed all the taus to 1-tau, it makes sense to do the same in the code, to be consistent.

Stability for low n / low unique predictor values

Hi Matteo,

in DHARMa, I still have problems with the stability of qgam, for low n, or low unique values of the predictors, e.g. florianhartig/DHARMa#172

I'm not sure if the problem is in the quantiles, the gam, or the combination. The only thing I can say is that the package I used before (quantreg) was more stable in these cases (although it may be that the results were not very sensible).

In any case, maybe you can have an eye towards that during development, and / or I wonder if anything could be done. Do you think it would help / make sense to increase the penalty when the model crashes?

Best,
Florian

weights in qgam

Hi Matteo,
I am running a two-step analysis:

I estimate average marginal effects (i.e. derivatives at the response scale) from several binomial model (fitted with mgcv::gam()).
I want to estimate how these average marginal effects are affected by two predictors. The distribution of the average marginal effects has very heavy tails, so I thought quantile regression (i.e. qgam with qu = 0.5) may be a good option.

Now, I am struggling with how I can accurately propagate the uncertainty in these estimates. Option 1 would be to weight with 1/var, but I don't think this is accurate in a quantile setting. Option 2 would be to use the distribution of the posterior samples of the average marginal effects directly that I simulated to get the variances. So I tested this with qgam but then I realized that qgam doesnt downweight the repeated samples per average marginal effect when using weight = 1/nsamples. I tested this in glmmTMB, where the weights are directly acting on the likelihood, which would be what I would need for qgam too.

Any idea what I could do instead? Thanks for you help...

Here an illustrative example of what I am looking for:

# data generation ---------------------------------------------------------
set.seed(1)
dat = mgcv::gamSim(1, n = 30)
dat1 = dat[rep(1:nrow(dat), each = 10), ]

I simply repeat each observation 10 times in the second dataset.

# qgam --------------------------------------------------------------------

mod = qgam(y ~ x0 + x1 + x2
           , data = dat
           , qu = 0.5
           )
mod1 = qgam(y ~ x0 + x1 + x2
          , data = dat1
          , qu = 0.5
          , argGam = list(weights = rep(1/10, nrow(dat1)))
          )
summary(mod)
summary(mod1)

My hope was that both models result in the same effective sample size, but apperently they don't and the CI are more narrow for mod1.

# glmmTMB -----------------------------------------------------------------

mod = glmmTMB(y ~ x0 + x1 + x2
              , data = dat
)
mod1 = glmmTMB(y ~ x0 + x1 + x2
               , data = dat1
               , weights = rep(1/10,  nrow(dat1)) 
)
summary(mod)
summary(mod1)

For glmmTMB, the downweighting works and both models produce the same result.

random effects in qgam

Hi, can we add random effects to qgam? Just like s(random, bs = "re") in mgcv.

Add REML and ML version of qgam and mqgam

Given that the calibration is slow, someone might want to use REML and ML which are faster.

Error for small n (if (abs(old.score - score) > score.scale * conv.tol))

Hi Matteo,

I've been considering to switch from to qgam for the plots in DHARMa (see florianhartig/DHARMa#53), but I have the problem that for small n, I tend to get the error message

Error in if (abs(old.score - score) > score.scale * conv.tol) { : 
  missing value where TRUE/FALSE needed

Any ideas if this can be avoided?

Better parallelization for tuneLearnFast()

In tuneLearn each time a worker is called it creates a bootstrap datasets, and it uses it for all values of sigma. So each core creates only one copy of the data.

In tuneLearnFast the bootstrapped objects are created centrally and the whole list is passed to each node. So, if we suppose that we have N cores and N bootstrap datasets, the memory requirements are N times bigger in tuneLearnFast than in tuneLearn.

It would be better to export a specific bootstrapped object only to one specific core, which will have to handle it. Not sure clusterExport can be used to export objects to a specific core of a cluster.

Change parameter tau for desired quantile.

Ideally the user will specify the quantile not tau. Not clear what I should do internally (I don't want to change everything).

simulation from qgam posterior

I do not see any current method for simulating from a qgam posterior distribution. In the meantime, any recommendations for approximating such sampling? Perhaps after some exploratory work in qgam, try to run the model through gam directly? Thanks for the wonderful package.

Avoid gam.setup() in Extended Gam calibration

Most of the time is currently spent by gam.setup() which creates the model.

undesirable effect of do.call

Hi Matteo, after looking into the code, maybe the do.call("gam", list(parameters)) may have undesirable effects. Typically, the last element of a normal gam object is only the call object, but by do.call we find a list of function object, formula and the training dataset.
Are these elements necessary for other functions ?

qgam/R/tuneLearn.R

Line 119 in 721bb83

gausFit <- do.call("gam", c(list("formula" = form, "data" = data), argGam))

logFlss outer newton convergence

When using tuneLearn or tuneLearnFast with logFlss, I often get complaints like:

log(sigma) = 0.428571 : outer Newton did not converge.

This happens even for not-so-extreme quantiles (say qu = 0.4).

Notice that this are simpleWarnings, so they are seen only if gam() gets called within
a withCallingHandlers() call.

Maybe it makes sense to mtrace() mgcv:::newton and see what is happening?

Add offset to shashlss and gamlss

The loglikelhood function of these families currently throws an error is this is used.

mfasiolo / qgam Goto Github PK

qgam's People

Contributors

Stargazers

Watchers

Forkers

qgam's Issues

Recommend Projects

Recommend Topics

Recommend Org