Comments (7)
Thanks for the post. Can you please provide a minimal reproducible example calculating the pvalue?
Can you also link to a manuscript using the test in the way you're suggesting?
from gtsummary.
Hi Daniel! Thanks for replying!
I have the r-bloggers page as an example: https://www.r-bloggers.com/2014/06/example-2014-6-comparing-medians-and-the-wilcoxon-rank-sum-test/
But this question is based on a false premise: that the the Wilcoxon rank-sum test is used to compare medians. The premise is based on a misunderstanding of the null hypothesis of the test. The actual null hypothesis is that there is a 50% probability that a random value from one population exceeds an random value from the other population. The practical value of this is hard to see, and thus in many places, including textbooks, the null hypothesis is presented as “the two populations have equal medians”. The actual null hypothesis can be expressed as the latter median hypothesis, but only under the additional assumption that the shapes of the distributions are identical in each group.
In other words, our interpretation of the test as comparing the medians of the distributions requires the location-shift-only alternative to be the case. Since this is rarely true, and never assessed, we should probably use extreme caution in using, and especially in interpreting, the Wilcoxon rank-sum test.
I've paraphrased the code on to the following:
#Import libraries
library(tidyverse)
library(gtsummary)
library(quantreg)
#Create a reproducible dataset
set.seed(123)
y1 = as_tibble(rexp(10000))
y2 = as_tibble(rnorm(10000) + log(2))
y1$group = "Y1"
y2$group = "Y2"
df = rbind(y1,y2)
#Run wilcox.test manually
wilcox.test(value ~ group, data=df)
#Show using gtsummary
df %>%
tbl_summary(
by = group,
statistic = list(all_continuous() ~ "{median} ({p25}, {p75})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = list(all_continuous() ~ c(1,1)),
) %>%
bold_labels() %>%
add_p(pvalue_fun = ~style_pvalue(.x, digits = 2))
#Run a quantile regression, for the median (tau=0.5)
qr <- rq(value ~ group, data=df, tau = 0.5)
qr
#Calculate standard errors using bootstrapping with 100 replications, and respective p-values
set.seed(123)
summary(qr, se="boot", R=100)
from gtsummary.
Maybe I'm misunderstanding the syntax for add_p?
For example, when I attempt to replace wilcoxons test with kolmorov-smirnov, which is called by "ks.test" with identical syntax, the p-value is omitted in the output
df %>%
tbl_summary(
by = group,
statistic = list(all_continuous() ~ "{median} ({p25}, {p75})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = list(all_continuous() ~ c(1,1)),
) %>%
bold_labels() %>%
add_p(pvalue_fun = ~style_pvalue(.x, digits = 2),
test = list(all_continuous() ~ "ks.test"))
Similarily, I don't see where you could add parameters to the test-call for example, which would be needed in the case for quantreg (specifying the bootstrap estimator and number of replications)
from gtsummary.
Thanks @spiralparagon for the additional context!
Frist, regarding adding a quantile regression test: unfortunately, I don't think we will add it natively in the package as it depends on the {quantreg} package and I don't want to take on more dependencies than we already have. But that doesn't mean you can't include these results in your gtsummary tables. You can include any p-value by creating a custom function: https://www.danieldsjoberg.com/gtsummary/reference/tests.html#custom-functions-1
here's an example:
library(gtsummary)
#> #BlackLivesMatter
add_p_median_reg <- function(data, variable, by, ...) {
# build regression model
quantreg::rq(
reformulate(variable, response = by),
data = data |>
tidyr::drop_na(all_of(c(variable, by))) |>
dplyr::mutate("{by}" := factor(.data[[by]]) |> as.numeric()),
tau = 0.5
) |>
# use summary to get p-value
summary(se = "boot") |>
# format results like you'd see from `broom::tidy()`
purrr::pluck(coefficients) |>
as.data.frame() |>
tail(n = 1) |>
setNames(c("estimate", "std.error", "statistic", "p.value")) |>
dplyr::mutate(method = "Median Regression")
}
add_p_median_reg(mtcars, variable = "mpg", by = "am")
#> estimate std.error statistic p.value method
#> mpg 0.05649718 0.01887302 2.993542 0.005478198 Median Regression
add_p_median_reg(trial, variable = "age", by = "trt")
#> estimate std.error statistic p.value method
#> age 0 0.007789238 0 1 Median Regression
tbl <-
trial |>
tbl_summary(
by = trt,
missing = "no",
include = c(age, marker)
) |>
add_p(test = all_continuous() ~ add_p_median_reg)
Created on 2024-04-06 with reprex v2.1.0
Second, we have not yet included support for the ks.test
(no one has requested it before. Since it's included in the base R, we can include it. To keep things clean, if you'd like, can you request support in a separate issue?
from gtsummary.
Hi Daniel!
Thank you so much for the help. I understand not wanting to increase dependencies and I appreciate your thorough example on how to implement this despite the fact.
Strangely, I don't see the expected behaviour with my previous code sample - the output of add_p_median_reg is correct on all accounts except for the p-value where it gives a 0.
library(tidyverse)
library(gtsummary)
library(quantreg)
#Create a reproducible dataset
set.seed(123)
y1 = as_tibble(rexp(10000))
y2 = as_tibble(rnorm(10000) + log(2))
y1$group = "Y1"
y2$group = "Y2"
df = rbind(y1,y2)
qr <- rq(value ~ group, data=df, tau = 0.5)
set.seed(123)
summary(qr, se="boot", R=100)
add_p_median_reg <- function(data, variable, by, ...) {
# build regression model
quantreg::rq(
reformulate(variable, response = by),
data = data |>
tidyr::drop_na(all_of(c(variable, by))) |>
dplyr::mutate("{by}" := factor(.data[[by]]) |> as.numeric()),
tau = 0.5
) |>
# use summary to get p-value, added 200 reps
summary(se = "boot", R=200) |>
# format results like you'd see from `broom::tidy()`
purrr::pluck(coefficients) |>
as.data.frame() |>
tail(n = 1) |>
setNames(c("estimate", "std.error", "statistic", "p.value")) |>
dplyr::mutate(method = "Median Regression")
}
set.seed(123)
add_p_median_reg(df, variable = "value", by = "group")
I can't see what's wrong in the code - even running a subset of the function directly gives expected result
set.seed(123)
summary(qr, se = "boot", R=200) |>
# format results like you'd see from `broom::tidy()`
purrr::pluck(coefficients) |>
as.data.frame() |>
tail(n = 1) |>
setNames(c("estimate", "std.error", "statistic", "p.value"))
I'm comparing the output of age to trt in the trial dataset
qr <- rq(age ~ trt, data=trial, tau = 0.5)
set.seed(123)
summary(qr, se="boot", R=100)
I'll create a seperate thread for ks.test. Thank you again!
from gtsummary.
After exploring this a bit looking at the model, the problem seems to be in reformulate not giving expected behaviour to the rq function.
Attempted to use paste instead, aided with the magic of chatgpt (which also had an issue with the factor to numeric conversion, not sure how accurate it is). Caveat emptor as I'm not experienced :(
add_p_median_reg <- function(data, variable, by, ...) {
# build regression model
data <- data |>
tidyr::drop_na(all_of(c(variable, by))) |>
dplyr::mutate(!!sym(by) := factor(.data[[by]]))
# Construct the formula directly
formula_str <- paste(variable, "~", by)
formula <- as.formula(formula_str)
# Build regression model using the constructed formula
quantreg::rq(
formula,
data = data,
tau = 0.5
) |>
# use summary to get p-value
summary(se = "boot", R=200) |>
# format results like you'd see from `broom::tidy()`
purrr::pluck(coefficients) |>
as.data.frame() |>
tail(n = 1) |>
setNames(c("estimate", "std.error", "statistic", "p.value")) |>
dplyr::mutate(method = "Median Regression")
}
seems to give expected behaviour
from gtsummary.
Ahh, i had flipped the dependent and independent variables in the reformulate 😱 glad you have a solution
from gtsummary.
Related Issues (20)
- v2.0 print engines
- v2.0 select helpers
- v2.0 Custom Tidiers
- v2.0 `tbl_likert()` ?
- v2.0 vignettes and articles
- Questioning `add_stat_label()`
- Add `modify_table_styling()` checks?
- Question / Feature request: type "dichotomous" option in tbl_summary not displaying both levels HOT 3
- v2.0 `modify_footnote()` updates HOT 1
- v2.0 README
- Wording in `tbl_uregression()` documentation: univariable instead of univariate? HOT 2
- Compatibility with VGAM HOT 3
- Improve `style_number()` speed?
- Feature request: Informative caption when using `tbl_regression()` with `mice::mira` HOT 2
- v2.0 Add arguments `style_*(prefix, suffix)` ?
- Feature request: Automated suppression of small counts HOT 1
- v2.0 `tbl_stack()`, `tbl_merge()`, `tbl_strata()`, etc.
- `show_header_names()` updates HOT 1
- Feature request: Extend `tbl_merge` to all `gt` tables HOT 2
- Review `tbl_custom_summary()`
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gtsummary.