Giter Club home page Giter Club logo

zibr's Introduction

ZIBR (Zero-Inflated Beta Random Effect model)

R-CMD-check codecov CRAN_Status_Badge Published in Bioinformatics

Introduction

The longitudinal microbiome compositional data are highly skewed, bounded in [0,1), and often sparse with many zeros. In addition, the observations from repeated measures are correlated. We propose a two-part zero-inflated Beta regression model with random effects (ZIBR) for testing the association between microbial abundance and clinical covariates for longitudinal microbiome data. The model includes a logistic component to model presence/absence of the microbe in samples and a Beta component to model non-zero microbial abundance and each component includes a random effect to take into account the correlation among repeated measurements on the same subject.

The details of the statistical model are as follows:

The ZIBR model combines the logistic regression and Beta regression in one model. Each regression part includes random effects to account for correlations acorss time points. We call these two regressions in ZIBR model as logistic component and Beta component. These two components model two different aspects of the data. The logistic component models presence/absence of the microbe and Beta component models non-zero microbial abundance.

Accordingly, we can test three biologically relevant null hypotheses:

  • H0: α_j = 0. This is to test the coefficients in the logistic component, if the covariates are associated with the bacterial taxon by affecting its presence or absence;
  • H0: β_j = 0. This is to test the coefficients in the Beta component, if the taxon is associated with the covariates by showing different abundances;
  • H0: α_j = 0 and β_j = 0 for each covariate X_j and Z_j. This is to joinly test the coefficients in both logistic and Beta components, if the covariates affect the taxon both in terms of presence/absence and its non-zero abundance.

Installation

You can install our ZIBR package from CRAN

install.packages("ZIBR")

Or get the dev version from GitHub

#install.packages("devtools")
devtools::install_github("PennChopMicrobiomeProgram/ZIBR")
library(ZIBR)

Basic Usage

zibr_fit <- zibr(logistic_cov=logistic_cov,beta_cov=beta_cov,Y=Y,subject_ind=subject_ind,time_ind=time_ind)
  • logistic_cov: covariates for the logistic component. Rows: samples. Columns: covariates.
  • beta_cov: covariates for the Beta component. Rows: samples. Columns: covariates.
  • Y: the response variable (i.e the bacterial relative abundance). It is a vector with values in [0,1).
  • subject_ind: the variable with subject IDs.
  • time_ind: the variable with time points.

The ordering of the samples in the above matrix or vectors must be consistent.

The zibr function will return the following results

zibr_fit
  • logistic_est_table: the estimated coefficients for logistic component.
  • logistic_s1_est: the estimated standard deviation for the random effect in the logistic component.
  • beta_est_table: the estimated coefficients for Beta component.
  • beta_s2_est: the estimated standard deviation for the random effect in the Beta component.
  • beta_v_est: the estimated dispersion parameter in the Beta component.
  • joint_p: the p-values for jointly testing each covariate in both logistic and Beta component.

Troubleshooting

Missing values

If there are missing values in certain time points, they can be imputed as following:

  1. Calculate the mean or median of values from previous time point(s) and later time points(s). Use such values to replace the missing values.
  2. Group the time point with missing values with other time points. For example, if you have T1, T2, T3 and T4 and T1 has missing values, you can group T1 and T2 as one time point.

After the missing values are imputed, the data can be fed into ZIBR.

Other issues

If you have other problems with the package or features/fixes to suggest, please open an issue on the GitHub issues page.

Citation

Eric Z. Chen and Hongzhe Li (2016). A two-part mixed effect model for analyzing longitudinal microbiome data. Bioinformatics. Link

Contact

Maintained by Charlie Bushman ([email protected]) and the Penn CHOP Microbiome Program.

Updates

zibr's People

Contributors

ulthran avatar chvlyl avatar marwahaha avatar

Stargazers

 avatar  avatar Juan A. Ugalde avatar  avatar GuoHao avatar Harithaa Anandakumar avatar David Tsai avatar s.watanabe avatar  avatar  avatar  avatar ZHOU Chao  avatar  avatar  avatar Jiayi Liu avatar Subasish Das avatar Han Hu avatar Chunyu Zhao avatar Asif Rahman avatar Inti Pedroso avatar Josh Herr avatar Colin J. Brislawn avatar Ashish Damania avatar Zachary Kurtz avatar zhanxw avatar Fan Li avatar Joseph Elsherbini avatar Brian Balgley avatar Erik Clarke avatar

Watchers

James Cloos avatar  avatar Chunyu Zhao avatar  avatar Sam Dunn avatar

zibr's Issues

How to test for contrasts?

Hello,

I have an interest in your package but have an issue with the model outputs. From what I get, I can have a global estimation of each covariate effect. I can thus have a global time effect and global effect of covariate over time. Nevertheless, if any of my covariate have more than 2 groups or I'd like to have the effect of a treatment at a specific time point (i.e. performing contrasts), I'm stuck.

Did I miss something?

Thanks,

Time*treatment interactions with the ZIBR model?

Hi @chvlyl
Thank you for the development of this interesting and useful ZIBR model!
I'm trying to use four covariates (Baseline, Treatment, Time, Treatment*Time) to figure out the influence of the baseline abundance and the different microbial trend between treatments. As for the IBD demo data in the paper, should I modify the code like this?
X <- data.frame(Baseline=taxa.data[reg.cov.t234$baseline.sample, spe]/100, reg.cov.t234[,c('Time','Treat','Time.X.Treatment')])
Since the ENN treatment in Time.X.Treatment column in the reg.cov data frame is always 0, is it right to use the 'Time.X.Treatment' column as the interaction covariate directly? Or should I include the Time information for the ENN treatment in the 'Time.X.Treatment' column, like 1, 2, 3, 4, and 5, 6, 7, 8 for the 'antiTNF' treatment?
Could you help me with this problem? Any help would be much appreciated!
Thank you for your help and I look forward to hearing from you soon.
Hongbin Liu

muliple corrections

Thank you for updating the package documentation and install etc etc, much appreciated! I had previously been trying to replicate the results in your recent paper using the PLEASE data also on your repository and have now successfully done so (pretty much). I was wondering, however, about the multiple corrections. In table 2 you are displaying the joint.p values which seem to be BH corrected for an n of 3 (time, treat and baseline abundance). I was wondering if it would be necessary to correct for the number of taxa analysed * 3 or if each run of the zibr regression negates this. Anyway, really nice idea,

all the best,

PJ

Error: singularity in backsolve at level 0, block 1

Hi @chvlyl. I found your package very interesting and useful to use the ZIBR model for my microbiome project. I was trying to reproduce the steps for my longitudinal data, however, I receive the following error:

Error in MEEM(object, conLin, control$niterEM) : Singularity in backsolve at level 0, block 1

I am using the following code:

`#### Fit ZIBR model on the real data
spe.all <- colnames(taxa_data)
p.species.list.zibr <- list()
p.species.list.lme <- list()
for (spe in spe.all){

create covariates

X <- data.frame(
Baseline=taxa_data[reg.cov.t234$baseline.sample, spe]/100,
reg.cov.t234[,c('Description','Treat','Time.X.Treatment')]
)
rownames(X) <- reg.cov.t234$SampleName...1
Z <- X
subject.ind <- reg.cov.t234$SubjectID
time.ind <- reg.cov.t234$Description

cat(spe,'\n')
Y <- taxa_data[reg.cov.t234$SampleName...1, spe]/100
cat('Zeros/All',sum(Y==0),'/',length(Y),'\n')

####################

fit linear random effect model with arcsin square transformation on Y

tdata <- data.frame(Y.tran=asin(sqrt(Y)),X,SID=subject.ind)
lme.fit <- lme(Y.tran ~ Baseline + Description + Treat,random=~1| SID, data = tdata)
coef.mat <- summary(lme.fit)$tTable[-1,c(1,5)]
p.species.list.lme[[spe]] <- coef.mat[,2]

###################
if (sum(Y>0)<10 | sum(Y==0) <10 | max(Y)<0.01){
print('skip')
p.species.list.zibr[[spe]] <- p.species.list.lme[[spe]]
next
}else{
est <- zibr(logistic.cov=X,beta.cov=Z,Y=Y,
subject.ind=subject.ind,
time.ind=time.ind,
quad.n=30,verbose=TRUE)
p.species.list.zibr[[spe]] <- est$joint.p

}
#break
}`

I would really appreciate it if you can help me in understanding what is that I am doing wrong/differently. This code breaks with an error at the very last set of taxa in my dataset, meaning it runs well and gives me a list of genera but breaks from the loop for the last few genera.

Any help is appreciated, thanks!

Release ZIBR 1.0.0

First release:

Prepare for release:

  • git pull
  • usethis::use_github_links()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • git push
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('major')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Add preemptive link to blog post in pkgdown news menu
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • Finish blog post
  • Tweet

Example replication

Hi, I am trying to replicate your example with my own data. The methods for getting the data into the model are extremely complicated, to the point of not being repeatable with any other data set. Do you have any plans to make this process simpler, e.g. input biom file with metadata from QIIME or metaphlan? If not, then I fear that your methodw ill not be used by other researchers.
Thanks.

T.

Can ZIBR be used with non-longitudinal data?

I have a twin metatranscriptomics dataset where I want the random factor to be the familyID. Would ZIBR be applicable in this setting? The counts would be int the form of pathway RPK values.

jointp = 0 when logistic and beta p > 0

I am able to reproduce your examples and get sensible results for most of my analyses, but I have one model with joint p = 0, despite the individual p-values being much larger than 0. I have 4 covariates in the model; this is only an issue for one of the 4. It is also the only covariate that is continuous (the others are factors, coded with integers).

Update: I've just tried running zibr on a different set of covariates, and it looks like this issue comes up when p is quite high for the covariate in at least one of the two component regression models.

Cannot replicate your analysis

I cannot replicate your posted analysis. I get this error message "Error in p.species.list.lme : object 'p.species.list.lme' not found"

Following that I keep getting error messages with no success in replicating your results.

nested and/or crossed random effect

Hi,

I am trying to fit a model for a data among which the response value is within [0,1). ZIBR can do exactly what I need. However, my data also contains both nested factors and crossed factors, which means I need to add nested random effects and crossed random effects to both logistic model part and beta model model part. From my understanding by far ZIBR can only fit repeated measures, so I am just wondering if there is anyway to handle crossed/nested random effects using ZIBR? Or will it be feasible to modify the existing code to do that?

Thank you very much for your help!

Best regards

availability on CRAN and/or conda

I know that ZIBR and the associated manuscript is relatively new, but I was wondering whether the package will be made available on CRAN and/or conda anytime in the near future. My research lab uses conda for software installation/management, which greatly helps with data analysis reproducibility. Some of us are interested in using ZIBR, but the lack of availability on CRAN/conda makes it hard to integrate ZIBR into our bioinformatics software management setup.

Sorry if ZIBR is already on CRAN and/or conda and I've just missed it.

Prevalence

I have a question concerning the prevalence of OTUs when using the ZIBR. It would be great if someone can help me with that issue.
Looking at the example code, the prevalence of zeros are defined as:

  1. there are too few observations larger than 0, OR
  2. there are too few observation that are zero, OR
  3. all measurements are below a threshold 0.01 then it is skipped.
    I’m guessing that there have to be enough observations to both model the probability of being a zero AND enough observations to model the probability whatever happens if it is NOT zero.

In consequence, does that mean that the model kicks out all OTUs, which are prevalent in 100 percent of the samples to fulfil the assumtions for the logistic regression? Would that make sense from a biological point of view? In my longitudinal data set, I'm still very interested in changes of abundance of OTUs that are present in all individuals of two groups.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.