Randomization-based causal inference framework to analyze 16S rRNA gut microbiome data.

Causal inference framework for environment-microbiome data applied to American Gut Project (AGP) data.

Framework

Data access

American Gut Data subset [paper in preparation, Mishra and Müller 2021].

Stage 2: Design

The R code for our pair-matching implementation and for generating the diagnostic plots can be found in the design_AG file. The matrix of 10,000 possible randomizations of the intervention assignment is also generated there, directly after matching.

Note 1: the matching functions in Stephane_matching.R were written in Rcpp by Stéphane Shao.

Note 2: other matching strategies are valid. Researchers should take the conceptual hypothetical experiment into account when choosing their strategy.
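
To illustrate what a matrix of possible randomizations can look like for a pair-matched design, here is a minimal Python sketch (the repository itself uses R). It assumes a paired hypothetical experiment in which exactly one unit per matched pair receives the intervention, each with probability 1/2. The function name draw_pair_randomizations, the pair labels, and the seed are hypothetical and do not come from the design_AG file.

```python
import random


def draw_pair_randomizations(pair_ids, n_draws, seed=0):
    """Draw assignment vectors for a paired design.

    pair_ids[i] is the matched-pair label of unit i. In every draw,
    exactly one unit of each pair is assigned the intervention (1)
    and the other is assigned control (0).
    """
    rng = random.Random(seed)

    # Group unit indices by their pair label.
    pairs = {}
    for unit, pid in enumerate(pair_ids):
        pairs.setdefault(pid, []).append(unit)
    for pid, units in pairs.items():
        if len(units) != 2:
            raise ValueError(f"pair {pid!r} has {len(units)} units, expected 2")

    draws = []
    for _ in range(n_draws):
        w = [0] * len(pair_ids)
        for units in pairs.values():
            # Fair coin flip within the pair decides who is treated.
            w[units[rng.randrange(2)]] = 1
        draws.append(w)
    return draws


# Toy example: three matched pairs, 1,000 draws (the analysis above uses 10,000).
pair_ids = ["a", "a", "b", "b", "c", "c"]
W = draw_pair_randomizations(pair_ids, n_draws=1000)
```

Each row of W is one possible assignment. Fixing the seed keeps the matrix reproducible, so the same set of draws can be reused across all tests in Stage 3.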

Stage 3: Analysis

The ASV (or OTU) data table and the matched dataset are combined in a phyloseq object before the statistical analyses are run. The following code can therefore be used for any other data stored in a phyloseq object.

Diversity

Richness and alpha-diversity

R code in 1_alpha_diversity_AG folder.

We used Amy Willis’ R packages breakaway for richness estimation [Willis and Bunge, 2015] and DivNet for Shannon index estimation [Willis and Martin, 2020].

Richness result:

estimate: 108.3931; p-value: 0.133

Shannon index result:

estimate: -0.008072164; p-value: 0.659
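
As a rough illustration of how a randomization-based p-value can be obtained from an observed test statistic and a set of possible assignments, here is a minimal Python sketch. It assumes a two-sided test and a difference-in-means statistic, and it enumerates all assignments of a tiny paired design instead of sampling 10,000 of them. The function names and the toy outcomes are hypothetical; the actual estimates above come from breakaway and DivNet in R.

```python
from itertools import product


def mean_difference(outcomes, w):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for y, wi in zip(outcomes, w) if wi == 1]
    control = [y for y, wi in zip(outcomes, w) if wi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def randomization_p_value(outcomes, observed_w, assignments, statistic):
    """Two-sided p-value under the sharp null of no effect for any unit.

    Under that null the outcomes are fixed, so the statistic can be
    recomputed for every assignment that could have occurred.
    """
    t_obs = statistic(outcomes, observed_w)
    extreme = sum(
        1 for w in assignments if abs(statistic(outcomes, w)) >= abs(t_obs)
    )
    return t_obs, extreme / len(assignments)


# Toy data: three matched pairs, units (0,1), (2,3), (4,5).
outcomes = [5, 1, 6, 2, 7, 3]
observed_w = [1, 0, 1, 0, 1, 0]

# All 2^3 assignments with exactly one treated unit per pair.
assignments = [
    [bit for flip in flips for bit in ((1, 0) if flip == 0 else (0, 1))]
    for flips in product([0, 1], repeat=3)
]

t_obs, p_value = randomization_p_value(
    outcomes, observed_w, assignments, mean_difference
)
print(t_obs, p_value)
```

In this toy example only the observed assignment and its mirror image are as extreme as the observed statistic, so the p-value is 2/8. With few pairs the smallest attainable p-value is therefore coarse; with a large sampled matrix the same function applies unchanged.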

Beta-diversity

R code in 2_beta_diversity_AG folder.

The distance calculations were done with the phyloseq package, and we used Anna Plantinga’s R package MiRKAT for the test statistic calculations [Zhao et al., 2015].

Results:

Aitchison: estimate: 822866.9; p-value(adj.): 0.002
Jaccard: estimate: 132.9856; p-value(adj.): 0.002
Gower: estimate: 0.3761873; p-value(adj.): 0.501
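
For readers unfamiliar with the distances named above, here is a minimal Python sketch of two of them on raw count vectors. It is not the phyloseq implementation. It assumes that the Aitchison distance is computed as the Euclidean distance between centered log-ratio (CLR) transformed samples, with a pseudocount added to handle zeros, and that Jaccard is taken on presence/absence. Both choices, and all function names, are assumptions made for illustration.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(a, b, pseudocount=1.0):
    """Euclidean distance between the CLR-transformed samples."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


def jaccard_distance(a, b):
    """Presence/absence Jaccard distance: 1 - |shared taxa| / |taxa in either|."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)


# Toy samples over the same four taxa.
sample_1 = [10, 0, 30, 5]
sample_2 = [0, 20, 50, 5]
d_aitchison = aitchison_distance(sample_1, sample_2)
d_jaccard = jaccard_distance(sample_1, sample_2)
```

Without a pseudocount the CLR-based distance is unchanged when a sample is multiplied by a constant, which is why it suits compositional data where sequencing depth differs between samples.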

Composition

Compositional equivalence

R code in 3_mean_diff_test_AG folder.

We used the code from Cao, Lin, and Li’s GitHub repository composition-two-sampe-test [Cao, Lin, and Li, 2018].

Result:

estimate: 50.0806; p-value: 0.001

Differential abundance

R code in 4_differential_abundance_AG folder.

We used the function dacomp.test() from Barak Brill’s R package dacomp to calculate the test statistic for all taxa at once [Brill, Amir, and Heller, 2020].

Reference set:

k_Bacteria;p_Firmicutes;c_Clostridia;o_Clostridiales;f_Lachnospiraceae;g_Dorea
k_Bacteria;p_Firmicutes;c_Clostridia;o_Clostridiales;f_Lachnospiraceae;g_NA

Results:

Genera with p-value <= 0.02:
k_Bacteria;p_Proteobacteria;c_Gammaproteobacteria;o_Enterobacteriales;f_Enterobacteriaceae;g_Raoultella
k_Bacteria;p_Firmicutes;c_Clostridia;o_Clostridiales;f_Lachnospiraceae;g_Anaerostipes
k_Bacteria;p_Proteobacteria;c_Alphaproteobacteria;o_Rickettsiales;f_mitochondria;g_Sarcandra

Correlation structure

R code in 5_networks_AG folder.

Peschel et al.’s (2020) R package NetCoMi enables the estimation and comparison of networks for compositional data.

References

[Holle et al., 2005] Holle R, Happich M, Löwel H, Wichmann HE (2005); MONICA/KORA Study Group. KORA–a research platform for population based health research. Gesundheitswesen, 67.

[Willis and Bunge, 2015] Willis A and Bunge J (2015); Estimating diversity via frequency ratios. Biometrics, 71:1042-1049.

[Willis and Martin, 2020] Willis A and Martin BD (2020); Estimating diversity in networked ecological communities. Biostatistics, kxaa015.

[Zhao et al., 2015] Zhao N, Chen J, Carroll IM et al. (2015); Testing in Microbiome-Profiling Studies with MiRKAT, the Microbiome Regression-Based Kernel Association Test. Am J Hum Genet., 96(5):797-807.

[Cao, Lin, and Li, 2018] Cao Y, Lin W, and Li H (2018); Two-sample tests of high-dimensional means for compositional data. Biometrika, 105:115-132.

[Brill, Amir, and Heller, 2020] Brill B, Amir A, and Heller R (2020); Testing for differential abundance in compositional counts data, with application to microbiome studies. arXiv.

[Peschel et al., 2020] Peschel et al. (2020); NetCoMi: network construction and comparison for microbiome data in R. Briefings in Bioinformatics, bbaa290.
