koalaverse / sure Goto Github PK

View Code? Open in Web Editor NEW

9.0 6.0 2.0 10.33 MB

Surrogate residuals for cumulative link and general regression models in R

Home Page: https://koalaverse.github.io/sure/index.html

R 100.00%

residuals diagnostics categorical-data ordinal-regression

sure's People

Contributors

Stargazers

Watchers

Forkers

ericagoto ugurdar

sure's Issues

Faster bootstrap code

Will need to use Rcpp or vectorize the truncated distribution functions; the latter might be the simpler approach.

Use NSE in autoplot code

This will avoid R CMD CHECK warnings about undefined globals.

Jittering on the probability scale

Jittering on the response scale (jitter.scale = "response")seems to work fine, but seems problematic on the probability scale (jitter.scale = "probability").

Add tests

Will need to add extensive tests: testthat for typical unit tests, and longer tests stored in a slowtests directory that checks the package against the examples from the JASA paper.

Extracting mean response from "polr" objects

Now using:

#' @keywords internal
getMeanResponse.polr <- function(object) {
  # qfun <- getQuantileFunction(object)
  # fv <- object$fitted.values[, 1L, drop = TRUE]
  # fv[fv == 0] <- 1e-05
  # -qfun(fv)
  object$lp - object$zeta[1L]
}

Allow more than one plot

Something like

autoplot(fit, what = c("qq", "fitted"), ncol = 2)

The easiest way seems to be storing the plots in a list and then passing them to the grobs argument in grid.arrange.

plist <- list(p1, p2)
grid.arrange(grobs = plist, ncol = ncol)

Rerun code from The R Journal article

Paper was accepted, so need to make sure there are no breaking changes!

Support for Ordinal models with with BRMS

Would it be possible to add support for ordinal models fit with the bayesian BRMS package:

# List required packages 
Pkgs <- c("tidyverse", "ordinal", "brms",  "rstan")

# Load packages ====
lapply(Pkgs, require, c = T)

# Load data
data <- ordinal::wine 

# Model formula
fmla <- bf(Rating ~ temp + contact +temp:contact + 1|judge) ) 

# Compile and fit model
mod <- brm(fmla,
            data = data,
            family = cumulative("probit"),
            inits = 0,
            iter = 2000,
            warmup = 1000,
            chains = 2,
            cores = 2,
            sample_prior = TRUE,
            save_all_pars = TRUE)
        )

Residual generation fails when some parameter is aliased

I found that an aliased parameter will throw an error while residual/surrogate calculation. I tried:

library(ordinal)
data(wine)

fmm1 <- clm(rating ~ 1 + judge + temp, data = wine)
fmm2 <- clm(rating ~ 1 + judge + bottle, data = wine)
fmm3 <- clm(rating ~ 1 + judge + temp + bottle, data = wine)
fmm4 <- clm(rating ~ 1 + judge + temp + contact, data = wine)

sure::resids(fmm3)
sure::surrogate(fmm3)

Only fmm3 produces the error:
Error in X[, -1L, drop = FALSE] %*% object$beta : non-conformable arguments

I could bypass the problem as following: I changed the code in generate_residuals from
drop(X[, -1L, drop = FALSE] %*% object$beta - object$alpha[1L])
to
drop(X[, -1L, drop = FALSE] %*% object$beta[!is.na(object$beta)] - object$alpha[1L])

This way I get residual values. I don't know if this is legit or if it's better to throw an error saying 'there was one unidentified parameter (aliased) so residuals are not valid'. Can we use the residuals even if we drop this value?

As far as I understood your code you intended to just drop the value in
if(sum(object$aliased$beta) > 0) { X <- X[, !c(FALSE, object$aliased$beta), drop = FALSE] }

Sincerely Simon

Add link to paper in DESCRIPTION

Will also need to eventually update the link and reference to Dungang's JASA paper.

Consistent plotting methods

Switch to ggplot2 for "gof" objects for consistency.

Add latent method for binary GLMs

Binary GLMs when response is not a factor

heart <- data.frame(
  ck = 0:11*40 + 20,
  ha = c(2, 13, 30, 30, 21, 19, 18, 13, 19, 15, 7, 8),
  ok = c(88, 26, 8, 5, 0, 1, 1, 1, 1, 0, 0, 0)
)
mod.0 <- glm(cbind(ha,ok) ~ ck, family = binomial(link = "logit"),
             data = heart)
sure::resids(mod.0)

Please consider upgrading to R (≥ 4.0.0)

The autoplot function doesn't work for VGAM objects (i.e., vglm and vgam fits).

Add Travis CI support

@bradleyboehmke are you familiar with Travis CI? I can write the yaml file to include, but you'll need to login with the AFIT-R GitHub account and flip the switch for the packages you want supported (ideally all of them). This will make sure that the packages are re-built and checked every time we push changes and make sure nothing is broken. We should also do the same with codecov.

Fix autoplot

autoplot is not supporting object types. Need to change auto.default to individual object methods (i.e. autoplot.polr) or find another, more efficient approach.

Plotting functions

Bring in @Andy-McCarthy's plotting functions. Following base R and ggplot2 convention, these should look similar to the following:

library(MASS)
fit <- polr(y ~ x, data = d, method = "logistic")
plot(fit, which = 1:2)  # base plotting method
autoplot(fit, which = "qq")  # ggplot2 method

is it possible to apply "sure " R package to ordinal models "svyolr"?

Dear prof. Greenwell,

I am trying to use “sure” package to make diagnostics on a logit ordinal model generated with “svyolr” function (“survey” R package”), in particular i am trying to use “resids” function, but i get the error “Error in eval(predvars, data, env) : object "MR" not found”, where “MR” is the response variable (ordered factor) of the svyolr model.

I tryed to assign the class “polr” to the svyolr model (that is based on polr function) but the error persists.

I apologize if the question is trivial but I can’t get rid of the error.

Yours sincerely,

Please find below an example.

Fabio Maria Colombo

library(sure)
library(polr)
library(survey)

#Simulated dataset
RM=ordered(c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3,
3, 3, 3, 3, 4, 4, 4, 3, 3, 4, 4, 4, 4))
data=data.frame(age=rnorm(mean=7,sd=2,49),
weight=rnorm(mean=10,sd=3,49),
nose.length=rnorm(49,mean=3,sd=1.5),
head.length=rnorm(49,mean=8,sd=2),
w=abs(1+trunc(rnorm(mean=2, sd=2,49))))
#survey design
svydesign<-svydesign(id=~1,data=data,weights=~w)

#svyolr model
model=svyolr(RM~nose.length+head.length,
design=svydesign
)
#resids(sure) function
resids(model)

Error in resids.default(model) :

model should be of class "clm", "glm", "lrm", "orm", "polr", "vgam", or "vglm".

class(model)='polr' #assigns the class "polr" to model
resids(model)
#Error in eval(predvars, data, env) : object "nose.length" not found
####################################################################

New data sets

Add new data sets that follow examples 5-6 in Dungang and Zhang (2017).

Coefficient of determination?

Seems that the surrogate response values can be used to develop a coefficient of determinations (i.e., R-squared) type statistic!

Make autoplot more like ggfortify::autoplot

Put surrogate function into its own package

Package name is taken on CRAN.

Multinomial fits

Jittering (at least on the response scale) does show some promise for multinomial fits:

library(nnet)
data(df1, package = "sure")
fit <- multinom(y ~ x, data = df1)
# fit <- multinom(y ~ x + I(x ^ 2), data = df1)
y <- as.integer(df1$y)
s <- runif(length(y), min = y, max = y + 1)
prob <- fit$fitted.values
j <- seq_len(ncol(prob))  # j = 1, 2, ..., J
jmat <- matrix(rep(j, times = nrow(prob)), ncol = ncol(prob),
               byrow = TRUE)  # 1, 2, ..., J; 1, 2, ..., J; ...
es <- rowSums((jmat + 0.5) * prob)
r <- s - es
scatter.smooth(df1$x, r, lpars = list(lwd = 2, col = "red"))

Incorporate jittering technique

Will need to learn all the properties of these residuals. The residual-by-covariate plots don't look right for the quadratic example.

Write helper functions for extracting the fitted probabilities

Something like

getFittedProbs <- function(object) {
  UseMethod("getFittedProbs")
}

Add support for rms package

rms has the lrm function for fitting proportional odds ordinal logistic regression models.

Missing values

Should make use of the naresid method. Something like:

if(!is.null(object$na.action)) {
  res <- naresid(object$na.action, res)
}

Proper handling of weights

Create vignette

Incorporate bulk of R Journal paper as a vignette

Consider adding a new Fitted() generic

Idea is to make it easier for users to construct their own fitted-vs-residual plots. It's capitalizes to it does not interfere with R's built-in generic and the specific packages that have methods defined.

For example:

Fitted.polr <- function(object) {
  object$lp - object$zeta[1L]  # Xb - a1
}