kaigu1990 / stabiot

Common Statistical Analysis for Clinical Trials in Biotech

License: GNU General Public License v3.0

R 100.00%

stabiot's Issues

Response Rate and Odds Ratio

Create the following functions:

Response rate and CI by the Clopper-Pearson exact method
Response difference and CI by the Wald asymptotic method
Odds ratio without stratification
Odds ratio from a Cochran-Mantel-Haenszel test stratified by different factors

Regarding the response rate and CI, it's a question of confidence intervals for proportions. I prefer to use the functions already available in existing packages.

DescTools::BinomCI(x = 54, n = 100, method = "clopper-pearson")

But what about the stratified Wilson method? A few packages wrap it, such as tern::s_proportion(). However, I feel it should be left out for now because it is not often used.

The response difference can be computed by DescTools::BinomDiffCI() as well, so just wrap this function inside.

DescTools::BinomDiffCI(60, 100, 45, 100, method=c("wald"))
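For QC purposes it can help to see the two formulas written out in base R. The following is a minimal sketch, not the package's implementation: rsp_rate_ci() and rsp_diff_wald() are hypothetical helper names, and the counts reuse the examples above.

```r
# Hypothetical helper: Clopper-Pearson exact CI for a single proportion,
# written with beta quantiles
rsp_rate_ci <- function(x, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  lwr <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upr <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  c(est = x / n, lwr = lwr, upr = upr)
}

# Hypothetical helper: Wald asymptotic CI for the difference of two proportions
rsp_diff_wald <- function(x1, n1, x2, n2, conf.level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z <- qnorm(1 - (1 - conf.level) / 2)
  c(est = p1 - p2, lwr = p1 - p2 - z * se, upr = p1 - p2 + z * se)
}

rate <- rsp_rate_ci(54, 100)
diff <- rsp_diff_wald(60, 100, 45, 100)
print(rate)
print(diff)
```

The Clopper-Pearson limits should agree with binom.test(54, 100)$conf.int, which gives a package-free cross-check.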

Regarding the odds ratio and CI, it depends on whether we need stratification variables. If not, it's a common odds ratio calculation that can be done with the DescTools::OddsRatio function.

DescTools::OddsRatio(matrix(c(20, 22, 30, 28), nrow = 2, byrow = TRUE), 
                     method = "wald", conf.level = 0.95)

Otherwise, we can use logistic regression to calculate the odds ratio as well. Note that confint() on a glm fit returns a profile-likelihood CI; the Wald CI comes from confint.default().

set.seed(12)
dta <- data.frame(
  rsp = sample(c(TRUE, FALSE), 100, TRUE),
  grp = factor(rep(c("A", "B"), each = 50), levels = c("B", "A")),
  strata = factor(sample(c("C", "D"), 100, TRUE))
)

fit <- glm(rsp ~ grp, data = dta, family = binomial(link = "logit"))
exp(cbind(Odds_Ratio = coef(fit), confint(fit)))[-1, , drop = FALSE]

To compute the odds ratio from the Cochran-Mantel-Haenszel test with stratification variables, I know of two approaches: the mantelhaen.test() function, and conditional logistic regression such as survival::clogit(). Since the two give different results, I'm going to compare them with SAS and consult the documentation to decide which one to wrap. As far as I know, the tern package uses survival::clogit() for the stratified CMH analysis.

set.seed(12)
dta <- data.frame(
  rsp = sample(c(TRUE, FALSE), 100, TRUE),
  grp = factor(rep(c("A", "B"), each = 50), levels = c("B", "A")),
  strata1 = factor(sample(c("C", "D"), 100, TRUE)),
  strata2 = factor(sample(c("E", "F"), 100, TRUE)),
  strata3 = factor(sample(c("G", "H"), 100, TRUE))
)

Using mantelhaen.test() function:

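library(dplyr)

# count subjects per group, response and stratum combination
df <- dta %>%
  count(grp, rsp, strata1, strata2, strata3)
tab <- xtabs(n ~ grp + rsp + strata1 + strata2 + strata3, data = df)
# collapse the three stratification factors into a single dimension of 2 * 2 * 2 strata
tb <- as.table(array(c(tab), dim = c(2, 2, 2 * 2 * 2)))
mantelhaen.test(tb, correct = FALSE)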

Using conditional logistic regression:

library(survival)
fit <- clogit(formula = rsp ~ grp + strata(strata1, strata2, strata3), data = dta)
exp(cbind(Odds_Ratio = coef(fit), confint(fit)))
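One reason the two results differ is that, for 2 x 2 x K tables, mantelhaen.test() reports the Mantel-Haenszel estimate of the common odds ratio, while clogit() reports a conditional maximum-likelihood estimate. The sketch below spells out the Mantel-Haenszel formula in base R. mh_odds_ratio() is a hypothetical helper, and the built-in UCBAdmissions table is only stand-in data.

```r
# Hypothetical helper: Mantel-Haenszel common odds ratio for a 2 x 2 x K table
mh_odds_ratio <- function(tab) {
  stopifnot(length(dim(tab)) == 3, all(dim(tab)[1:2] == 2))
  n11 <- tab[1, 1, ]
  n12 <- tab[1, 2, ]
  n21 <- tab[2, 1, ]
  n22 <- tab[2, 2, ]
  n <- n11 + n12 + n21 + n22  # total count in each stratum
  sum(n11 * n22 / n) / sum(n12 * n21 / n)
}

or_manual <- mh_odds_ratio(UCBAdmissions)
or_builtin <- unname(mantelhaen.test(UCBAdmissions, correct = FALSE)$estimate)
print(c(manual = or_manual, builtin = or_builtin))
```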

Simulation of Posterior Probability of Response

Create a function to handle these cases, as shown below:

Assuming that the sample size is 22 and the number of responders is 4, what are the response rate and the corresponding 95% credible interval?

The response rate is simply 4/22 = 18%, and qbeta(c(0.025, 0.975), shape1 = 4, shape2 = 18) gives the 95% credible interval, about (5%, 36%). No new function is needed for this.

For the true response rate, assume that the number of responders follows a binomial distribution with a weakly informative Beta(0.5, 0.5) prior. Given that the operating characteristics are based on at least 75% posterior confidence that the true rate exceeds a threshold of interest, what is the chance of confirming a true response rate of at least 15% under various assumed sample sizes and true response rates? Note that the operating characteristics come from a Bayesian model that does not consider baseline stratification factors.

This simulation logic should be wrapped into a function.
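A minimal sketch of what such a function could look like, assuming the success criterion is P(rate > 0.15 | data) >= 0.75 under the Beta(0.5, 0.5) prior. The function names and the grid of sample sizes and true rates are illustrative assumptions, not part of the issue.

```r
# Hypothetical helper: exact probability that the posterior criterion is met,
# summing binomial probabilities over all possible responder counts
oc_post_prob <- function(n, true_rate, threshold = 0.15, conf = 0.75,
                         a = 0.5, b = 0.5) {
  x <- 0:n
  # posterior P(rate > threshold | x) with a Beta(a + x, b + n - x) posterior
  post <- pbeta(threshold, a + x, b + n - x, lower.tail = FALSE)
  sum(dbinom(x, size = n, prob = true_rate)[post >= conf])
}

# Hypothetical helper: the same quantity estimated by Monte Carlo simulation
oc_post_prob_sim <- function(n, true_rate, threshold = 0.15, conf = 0.75,
                             a = 0.5, b = 0.5, nsim = 10000, seed = 123) {
  set.seed(seed)
  x <- rbinom(nsim, size = n, prob = true_rate)
  post <- pbeta(threshold, a + x, b + n - x, lower.tail = FALSE)
  mean(post >= conf)
}

# Illustrative grid of sample sizes and true response rates (assumed values)
grid <- expand.grid(n = c(20, 40, 60), true_rate = c(0.15, 0.25, 0.35))
grid$prob_exact <- mapply(oc_post_prob, grid$n, grid$true_rate)
grid$prob_sim <- mapply(oc_post_prob_sim, grid$n, grid$true_rate)
print(grid)
```

The exact version avoids simulation noise; the simulated version is easier to extend if the design later needs interim looks or stratification.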

Suppose you have 23 responders out of n=25 at the interim analysis (IA) at the Week 12 visit. What is the posterior predictive probability of having at least 53 responders at the Week 24 visit, when the total sample size becomes N=66?

This case can be computed with the extraDistr::pbbinom() function, like

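N <- 66       # total sample size at Week 24
n <- 25       # subjects at the interim analysis
nresp <- 23   # responders observed at the interim analysis
target <- 53  # responders required at Week 24
a <- 1 + nresp        # posterior shape1 under a Beta(1, 1) prior
b <- 1 + (n - nresp)  # posterior shape2
# P(X >= target - nresp) among the remaining N - n subjects;
# lower.tail = FALSE returns P(X > q), so subtract 1 from q
extraDistr::pbbinom(target - nresp - 1, size = N - n, alpha = a, beta = b, lower.tail = FALSE)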

Suppose you have observed a 40% response rate (6 out of N=15). What is the chance of confirming a target response rate of at least 45%?

This seems to be computable with the pbeta() function, as a posterior tail probability, but I'm not sure.

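n <- 15      # sample size
nresp <- 6   # observed responders
a <- 0.5     # Beta(0.5, 0.5) prior
b <- 0.5
tag <- 0.45  # target response rate
# posterior probability that the true rate exceeds the target
pbeta(tag, nresp + a, n - nresp + b, lower.tail = FALSE)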

Create a summary function for survival analysis

I need a function to summarize the results from survfit() and survdiff(). For example, if the primary endpoint is PFS, I would like to know the number of events and censored subjects in each treatment arm, along with the median PFS and its CI, the PFS rate at different time points, and the log-rank p-value. If I specify stratification factors, the output should also include the stratified results. For the Cox model, I would like the HR and its CI as well. Most importantly, these results should use the same methods as SAS, so that this R function can serve in a QC role.

Add s_get_surv() to summarize the essential information for the survival analysis.
- Add a life table if needed?
Add s_get_cox() to summarize the essential information for the Cox model.
Add print methods for both of them.

Update a few functions:

Add s_surv_rate() to analyze the survival rate.
Check the default arguments against SAS.
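As a rough illustration of the pieces such a summary would collect, here is a minimal unstratified sketch using the survival package. summarize_surv_sketch() is a hypothetical name, not the package's s_get_surv() or s_get_cox(), and survival::lung is stand-in data. The arguments conf.type = "log-log" and ties = "breslow" are set because those are the defaults of SAS PROC LIFETEST and PROC PHREG, whereas R defaults to "log" and "efron".

```r
library(survival)

# Hypothetical helper: expects columns time, status and arm in `data`
summarize_surv_sketch <- function(data, times) {
  f <- Surv(time, status) ~ arm

  # Kaplan-Meier fit with log-log CIs
  km <- survfit(f, data = data, conf.type = "log-log")
  km_tab <- summary(km)$table[, c("records", "events", "median", "0.95LCL", "0.95UCL")]

  # survival rates at the requested time points
  rate <- summary(km, times = times, extend = TRUE)
  rate_tab <- data.frame(
    arm = rate$strata, time = rate$time,
    surv = rate$surv, lower = rate$lower, upper = rate$upper
  )

  # log-rank test
  lr <- survdiff(f, data = data)
  lr_p <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

  # Cox model with Breslow handling of ties
  cox <- coxph(f, data = data, ties = "breslow")
  hr <- cbind(HR = exp(coef(cox)), exp(confint(cox)))

  list(km = km_tab, rate = rate_tab, logrank_p = lr_p, hr = hr)
}

dat <- data.frame(
  time = survival::lung$time,
  status = survival::lung$status == 2,  # 2 = death in the lung data
  arm = factor(survival::lung$sex, labels = c("male", "female"))
)
res <- summarize_surv_sketch(dat, times = c(180, 360))
print(res)
```

The number of censored subjects per arm is records minus events in the km table.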

Create Example Data Sets - Oncology or Non-Oncology

Choose which ADaM datasets will serve as examples for testing.

Manually create a suitable ADaM dataset with the admiral and admiralonco packages.
Import ADaM datasets directly from other packages.
Use random.cdisc.data to create randomized ADaM datasets.

Although I prefer the first option, there are too few responses in the RS domain. There are also some logic errors in random.cdisc.data. So it seems that I have to create a dummy dataset separately.

At the current stage, preparing adrs is the priority.
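A dummy adrs could be generated along these lines. This is only a sketch in base R: the variable names follow common ADaM conventions (USUBJID, TRTSDT, ADT, AVALC, PARAMCD), while the visit schedule, the "OVR" parameter code and the response probabilities are assumptions chosen for illustration.

```r
set.seed(2024)
n_subj <- 20   # number of dummy subjects (assumed)
n_visit <- 4   # tumor assessments per subject (assumed)

# one treatment start date per subject
trtsdt <- as.Date("2023-01-01") + sample(0:30, n_subj, replace = TRUE)

adrs <- data.frame(
  USUBJID = rep(sprintf("SUBJ-%03d", seq_len(n_subj)), each = n_visit),
  ARM = rep(sample(c("A", "B"), n_subj, replace = TRUE), each = n_visit),
  TRTSDT = rep(trtsdt, each = n_visit),
  AVISITN = rep(seq_len(n_visit), times = n_subj),
  PARAMCD = "OVR",
  stringsAsFactors = FALSE
)

# assessments roughly every 6 weeks, with a +/- 3 day visit window
adrs$ADT <- adrs$TRTSDT + adrs$AVISITN * 42 +
  sample(-3:3, nrow(adrs), replace = TRUE)

# overall response drawn at random, with no clinical logic behind it
adrs$AVALC <- sample(
  c("CR", "PR", "SD", "PD", "NE"), nrow(adrs),
  replace = TRUE, prob = c(0.1, 0.3, 0.3, 0.2, 0.1)
)

head(adrs)
```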

MMRM R package for superiority or non-inferiority trial design

How do I conduct the MMRM analysis in R for superiority or non-inferiority trial design?

A brief summary can be seen at https://www.bioinfo-scrounger.com/archives/mmrm_hypothesis/.

A summary function would be better when I want a summary result, so simply wrap the several steps into one function.

I imagine an S3 structure is more suitable. In that case, I can use a custom tidy method, tidy.mmrm, for the mmrm summary results. Besides, how about the tidymodels package?

I also want to wrap the emmeans functions, such as confint() and test(), to get the CI and hypothesis tests.

The preliminary idea is to create a function called summarize_lsmeans(), or another name such as s_mmrm_lsmeans() or s_get_lsmeans(). Since LS means can be computed from different models, such as mmrm, ANCOVA and others, the function should be defined with corresponding classes.

Afterwards, the summary function should return a class like stabiot.lsmeans so that users can call tidy() to get a tibble for downstream analysis if needed.

Thus the steps should be these:

Create a function to integrate calculation steps, including emmeans(), contrast() and test() from emmeans.
Set up an S3 class to store results.
Create a print method for the above S3 class.
Add tests.
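The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: s_lsmeans_sketch() and the class name lsmeans_sketch are hypothetical, the model is an ANCOVA fitted with lm() on simulated data rather than an mmrm fit, and the non-inferiority margin of 1 is an arbitrary example value.

```r
library(emmeans)

# Hypothetical helper: wraps emmeans(), contrast(), confint() and test()
s_lsmeans_sketch <- function(fit, specs, null = 0, side = 0, level = 0.95) {
  emm <- emmeans(fit, specs = specs)
  con <- contrast(emm, method = "revpairwise")  # later level minus earlier level
  out <- list(
    lsmeans = as.data.frame(confint(emm, level = level)),
    contrast = as.data.frame(confint(con, level = level)),
    test = as.data.frame(test(con, null = null, side = side))
  )
  class(out) <- "lsmeans_sketch"
  out
}

# Print method for the S3 class
print.lsmeans_sketch <- function(x, ...) {
  cat("LS means\n")
  print(x$lsmeans)
  cat("\nContrast with CI\n")
  print(x$contrast)
  cat("\nHypothesis test\n")
  print(x$test)
  invisible(x)
}

# Simulated two-arm data (assumed values)
set.seed(1)
dat <- data.frame(
  trt = factor(rep(c("PBO", "TRT"), each = 30)),
  base = rnorm(60, mean = 50, sd = 10)
)
dat$chg <- 2 + 3 * (dat$trt == "TRT") + 0.1 * dat$base + rnorm(60)

fit <- lm(chg ~ trt + base, data = dat)

# Non-inferiority style test: H0 is (TRT - PBO) <= -1, H1 is (TRT - PBO) > -1
res <- s_lsmeans_sketch(fit, specs = "trt", null = -1, side = ">")
print(res)
```

With null = 0 and side = 0 the same call gives the usual two-sided superiority test.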

Best Overall Response By RECIST

The BOR calculation is a very common analysis in oncology trials for solid tumors, so a function for it is necessary. It could also serve as supplementary code or for internal analysis.

BOR with or without confirmation, following the RECIST guideline (version 1.1).
Consider the days between responses.
Consider SD duration.
Possibly also consider whether to cut off responses at the first PD.

Related documents are shown below.
https://www.pharmasug.org/proceedings/2023/QT/PharmaSUG-2023-QT-047.pdf

The following is the conversion to programming logic:

Best Overall Response without confirmation

Using investigator/IRC-assessed responses after the first treatment date (or randomization date) and up to the earliest of the first 'PD' and the start of non-protocol anti-cancer therapy (including radiotherapy, surgery, etc.):
1. Set to 'CR' if there exists an observation of 'CR'.
2. Else set to 'PR' if there exists an observation of 'PR'.
3. Else set to 'SD' if there exists an observation of 'SD' after the protocol-specified time (6 weeks from the first treatment date for a non-randomized study, or from the randomization date).
4. Else set to 'PD' if there exists an observation of AVALC = 'PD'.
5. Else set to 'NE' if there exists an observation of AVALC = 'NE', or if the subject has only 'SD' before the protocol-specified time (6 weeks from the first treatment date for a non-randomized study, or from the randomization date).
6. Else set to 'NA' if the subject did not have any responses (none of criteria 1-5 above is fulfilled) or has no post-baseline record.
Use the order CR > PR > SD > PD > NE.
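The rules above can be sketched for a single subject as follows. bor_unconfirmed() is a hypothetical helper, not the package's derive_bor(). It assumes that "after 6 weeks" means at least 42 days from the reference date, and it leaves out the cut at the start of non-protocol anti-cancer therapy.

```r
# Hypothetical helper: BOR without confirmation for one subject
# avalc: character vector of responses; adt: Date vector of assessment dates
# ref_date: first treatment date or randomization date
bor_unconfirmed <- function(avalc, adt, ref_date, sd_days = 42) {
  ord <- order(adt)
  avalc <- avalc[ord]
  adt <- adt[ord]

  # keep assessments after the reference date
  keep <- adt > ref_date
  avalc <- avalc[keep]
  adt <- adt[keep]

  # cut at the first PD (the PD record itself is kept)
  first_pd <- which(avalc == "PD")[1]
  if (!is.na(first_pd)) {
    avalc <- avalc[seq_len(first_pd)]
    adt <- adt[seq_len(first_pd)]
  }
  if (length(avalc) == 0) return("NA")

  days <- as.numeric(adt - ref_date)
  if (any(avalc == "CR")) return("CR")
  if (any(avalc == "PR")) return("PR")
  if (any(avalc == "SD" & days >= sd_days)) return("SD")
  if (any(avalc == "PD")) return("PD")
  if (any(avalc %in% c("NE", "SD"))) return("NE")
  "NA"
}

ref <- as.Date("2024-01-01")
bor_unconfirmed(c("SD", "PR", "PD"), ref + c(30, 60, 90), ref)
```

In a dataset with many subjects, the helper would be applied per USUBJID, for example with split() and vapply().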

Best Overall Response when confirmation of response is required

Using investigator/IRC-assessed responses on or after the first treatment date and up to the earliest of the first 'PD', the start of non-protocol anti-cancer therapy, and the study-specific cutoff:
1. Set to 'CR' if the patient had CR on each of two tumor assessments that are >=28 days apart, with either
(i) no other tumor assessments in between, or
(ii) only CR or NE as the tumor response between these two assessments.
2. Else set to 'PR' if the subject had two tumor assessments >=28 days apart, with the first being PR and the second being PR or CR, with either
(i) no other tumor assessments in between, or
(ii) only PR or NE as the tumor response between these two assessments.
3. Else set to 'PR' if the patient had a 'CR' following a 'PR' and the two tumor assessments are >=4 weeks apart.
4. Else set to 'SD' if there is at least one response assessment of 'CR', 'PR' or 'SD' at a minimum of 6 weeks after the first treatment date, without any PD in between.
5. Else set to 'PD' if
(i) a 'PR' follows a 'CR' and the assessment date of that 'PR' is within 6*7 days from the first treatment date, in which case the 'PR' is treated as PD (the same applies to an 'SD' following a 'CR'), or
(ii) the patient does not meet the criteria of CR/PR/SD and at least one response assessment is PD.
6. Else set to 'NE' if the subject only had NE (Not Evaluable), or had CR, PR or SD but did not meet any of the above criteria.
7. Else set to 'NA' if the subject did not have any responses (none of criteria 1-6 above is fulfilled) or has no post-baseline record.
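A possible single-subject sketch of the confirmation rules is shown below. bor_confirmed() is a hypothetical helper, not the package's derive_bor(). It reads "within 6*7 days" as at most 42 days and "minimum of 6 weeks" as at least 42 days from the reference date, and it leaves out the cuts at non-protocol anti-cancer therapy and the study cutoff.

```r
# Hypothetical helper: BOR with confirmation for one subject
bor_confirmed <- function(avalc, adt, ref_date, confirm_days = 28, sd_days = 42) {
  ord <- order(adt)
  avalc <- avalc[ord]
  adt <- adt[ord]
  keep <- adt >= ref_date            # on or after the first treatment date
  avalc <- avalc[keep]
  adt <- adt[keep]
  first_pd <- which(avalc == "PD")[1]
  if (!is.na(first_pd)) {            # cut at the first PD, keeping that record
    avalc <- avalc[seq_len(first_pd)]
    adt <- adt[seq_len(first_pd)]
  }
  n <- length(avalc)
  if (n == 0) return("NA")
  days <- as.numeric(adt - ref_date)

  # TRUE if assessments i < j exist with response i in `first`, response j in
  # `second`, at least confirm_days apart, and only `between` responses in the middle
  has_pair <- function(first, second, between) {
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (j <= i || !(avalc[i] %in% first) || !(avalc[j] %in% second)) next
        far_enough <- as.numeric(adt[j] - adt[i]) >= confirm_days
        middle <- avalc[seq_len(n) > i & seq_len(n) < j]
        if (far_enough && all(middle %in% between)) return(TRUE)
      }
    }
    FALSE
  }

  non_pd <- c("CR", "PR", "SD", "NE")
  if (has_pair("CR", "CR", c("CR", "NE"))) return("CR")            # rule 1
  if (has_pair("PR", c("PR", "CR"), c("PR", "NE"))) return("PR")   # rule 2
  if (has_pair("PR", "CR", non_pd)) return("PR")                   # rule 3
  if (any(avalc %in% c("CR", "PR", "SD") & days >= sd_days)) return("SD")  # rule 4
  # rule 5 (i): PR or SD following a CR within the first sd_days days
  early_drop <- cumsum(avalc == "CR") > 0 & avalc %in% c("PR", "SD") & days <= sd_days
  if (any(early_drop) || any(avalc == "PD")) return("PD")          # rule 5
  if (any(avalc %in% non_pd)) return("NE")                         # rule 6
  "NA"                                                             # rule 7
}

ref <- as.Date("2024-01-01")
bor_confirmed(c("PR", "PR", "PD"), ref + c(42, 84, 126), ref)
```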

Compare datasets with a flexible way

In the QC process, we always need to compare two or more datasets to check for differences. The most common need is to compare values, for example whether the responses are identical.

I know that the diffdf package handles this very well, but sometimes I don't need that much information. Should I create a specific function or modify the functions of diffdf?
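If only value mismatches are of interest, a small base R helper may be enough. The sketch below uses a hypothetical compare_values() function: it joins the two datasets on key variables and lists only the cells that differ. Records present in just one dataset are dropped by the inner join and would need a separate check.

```r
# Hypothetical helper: list value mismatches between two datasets
# keys: variables identifying a record; vars: variables whose values are compared
compare_values <- function(base_data, comp_data, keys, vars) {
  merged <- merge(
    base_data[c(keys, vars)], comp_data[c(keys, vars)],
    by = keys, suffixes = c(".base", ".comp")
  )
  pieces <- lapply(vars, function(v) {
    x <- merged[[paste0(v, ".base")]]
    y <- merged[[paste0(v, ".comp")]]
    # two missing values count as equal; one missing value counts as a difference
    same <- (x == y) | (is.na(x) & is.na(y))
    differs <- is.na(same) | !same
    data.frame(
      merged[differs, keys, drop = FALSE],
      variable = rep(v, sum(differs)),
      base = as.character(x[differs]),
      comp = as.character(y[differs])
    )
  })
  do.call(rbind, pieces)
}

prod_df <- data.frame(USUBJID = c("01", "02", "03"), BOR = c("CR", "PR", "SD"))
qc_df <- data.frame(USUBJID = c("01", "02", "03"), BOR = c("CR", "SD", "SD"))
diffs <- compare_values(prod_df, qc_df, keys = "USUBJID", vars = "BOR")
print(diffs)
```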

Update derive_bor function with assert checking and adding notes

A few updates are needed:

Add assert checks for each argument and the input data.
Add notes on ref_start_window and ref_interval explaining how they are calculated.

Survival Simulation

Is there an appropriate method for survival simulation?
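One simple starting point, offered only as a sketch, is to draw exponential event times with a proportional hazard ratio between arms, add uniform accrual, and censor administratively at the end of follow-up. sim_two_arm() and all parameter values below are illustrative assumptions.

```r
library(survival)

# Hypothetical helper: simulate a two-arm trial with exponential event times
sim_two_arm <- function(n_per_arm, median_ctrl, hr, accrual, followup, seed = 1) {
  set.seed(seed)
  arm <- rep(c("ctrl", "trt"), each = n_per_arm)
  # exponential hazard from the control median, scaled by the hazard ratio
  rate <- log(2) / median_ctrl * ifelse(arm == "trt", hr, 1)
  event_time <- rexp(2 * n_per_arm, rate = rate)
  # uniform enrollment over the accrual period
  entry <- runif(2 * n_per_arm, min = 0, max = accrual)
  # administrative censoring at the end of the study
  cens_time <- accrual + followup - entry
  data.frame(
    arm = factor(arm, levels = c("ctrl", "trt")),
    time = pmin(event_time, cens_time),
    status = as.integer(event_time <= cens_time)
  )
}

sim <- sim_two_arm(n_per_arm = 100, median_ctrl = 6, hr = 0.6,
                   accrual = 12, followup = 12)
fit <- coxph(Surv(time, status) ~ arm, data = sim)
exp(coef(fit))  # estimated hazard ratio from the simulated data
```

Repeating the simulation over many seeds gives the empirical power of the log-rank or Cox test under the assumed design.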
