frederickhuanglin / ancombc Goto Github PK

View Code? Open in Web Editor NEW

94.0 8.0 23.0 445 KB

Differential abundance (DA) and correlation analyses for microbial absolute abundance data

Home Page: https://www.nature.com/articles/s41467-020-17041-7

R 95.65% TeX 4.35%

ancom differential-abundance-analysis microbiome normalization sequencing correlation secom ancombc ancombc2

ancombc's Introduction

ANCOMBC

Author & Maintainer: Huang Lin: [email protected] & [email protected]

ANCOMBC is a package containing differential abundance (DA) and correlation analyses for microbiome data. Specifically, the package includes Analysis of Compositions of Microbiomes with Bias Correction 2 (ANCOM-BC2), Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC), and Analysis of Composition of Microbiomes (ANCOM) for DA analysis, and Sparse Estimation of Correlations among Microbiomes (SECOM) for correlation analysis. Microbiome data are typically subject to two sources of biases: unequal sampling fractions (sample-specific biases) and differential sequencing efficiencies (taxon-specific biases). Methodologies included in the ANCOMBC package are designed to correct these biases and construct statistically consistent estimators.

To install the latest release version of ANCOMBC

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("ANCOMBC")

Usage

library(ANCOMBC)
?ancombc2
?ancombc 
?ancom
?secom_linear
?secom_dist

Commonly asked questions

1. Q: What are the differences between the formula and group arguments in ancombc and ancombc2?

A: The formula and group arguments serve different purposes in the ancombc and ancombc2 functions. Here's a breakdown of their differences:

formula: This argument is used to specify the variables in your experiment that can potentially influence microbial abundances. It is essential to include all relevant variables in the formula to ensure proper adjustment and accurate results. For example, if you have a continuous variable like age as your main variable of interest, and you have additional categorical variables that need adjustment but are not directly related to your research question, you can include them in the formula while leaving group as NULL.
group: The group argument is optional and should only be specified if you want to detect structural zeros (presence/absence test) or perform multi-group comparisons, such as the global test, pairwise directional test, Dunnett's type of test, or trend test. If your variable of interest is a categorical variable with more than three levels and you want to conduct multi-group comparisons, you should include the group argument. It is important to note that group is not the same as main_var in ancom. In ancombc and ancombc2, group is used for multi-group comparisons and correction of p-values for multiple comparisons.

Remember not to include the main_var in the adj_formula in ancom, but always include group in the formula or fix_formula (in ancombc and ancombc2, respectively) if group is not NULL. This ensures that the appropriate adjustments and comparisons are made in the analysis.

2. Q: Why are some taxa absent from the primary results?

A: There are a couple of reasons why certain taxa may be absent from the primary results in ancombc or ancombc2. Here's an explanation:

Prevalence Exclusion: Taxa with prevalences below the specified threshold (prv_cut) will be excluded from the analysis. The prv_cut value determines the minimum prevalence required for a taxon to be considered in the analysis. If a taxon's prevalence falls below this threshold, it will not be included in the primary results.
Structural Zeros: Taxa that exhibit structural zeros, meaning they consistently have zero counts across all samples, will be considered significant only by the presence/absence test. The ANCOM-BC and ANCOM-BC2 methodologies are not designed to detect significant differences in taxa with structural zeros. As a result, these taxa are summarized separately and not included in the primary results of ancombc or ancombc2.

To access the results of the presence/absence test, you can refer to the zero_ind output. This will provide information on the taxa that exhibit structural zeros.

3. Q: In the primary results, what do lfc_(Intercept), lfc_groupB, and lfc_groupC represent if I have a group variable with categories A, B, and C?

A: In the primary results, the terms lfc_groupB and lfc_groupC represent the log fold changes (logFC) relative to the reference group, which is group A by default. These logFC values indicate the difference in abundance between group B and group A, and between group C and group A, respectively.

On the other hand, lfc_(Intercept) refers to the log fold change of the grand mean, which may not be a parameter of particular interest in this context.

It's worth mentioning that if you wish to change the reference group, you can use the factor function in R to rearrange the levels of the group variable accordingly.

4. Q: I encountered an error message stating "'rank' must be a value from 'taxonomyRanks()'. What does it mean and how can I resolve it?

A: The error message "'rank' must be a value from 'taxonomyRanks()'" typically occurs when the rank names in your tax_table are not properly labeled. In order to resolve this issue, it is recommended to use the taxonomyRanks(se) function, where se is your tse object.

Firstly, ensure that the rank names in your tax_table are correctly named as one of the standard taxonomic ranks, such as "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", or "Species". If the rank names are currently labeled as something else, such as "ta1", "ta2", "ta3", and so on, you will need to update them accordingly.

This issue commonly occurs when a tax_table is formed from a data.frame instead of a matrix. Therefore, it's important to ensure that the rank names are correctly assigned before proceeding with the analysis.

Once you have updated the rank names in your tax_table, you can utilize the ancombc2 function, specifying the desired taxonomic level using the tax_level argument (e.g., "Genus"). This will enable you to perform statistical analyses on your microbiome data at the specified taxonomic level.

5. Q: I encountered an issue while using rand_formula in ancombc2. What is the correct syntax for specifying random effects?

A: When specifying random effects using rand_formula in ancombc2, it is important to follow the syntax conventions used in the lmerTest package. Pay close attention to the placement of parentheses and vertical bars.

To correctly specify a random subject effect, the syntax should be in the form of "(1|subjid)", where subjid represents the variable name for the subject identifier. This syntax ensures that the random subject effect is properly accounted for in the analysis.

On the other hand, it is incorrect to use rand_formula as "1|subjid" or "(subjid)" for specifying random effects. The correct syntax should always include parentheses around the random effect and a vertical bar to separate it from the fixed effects.

By using the correct syntax for specifying random effects, you will be able to accurately incorporate these effects into your ancombc2 analysis.

6. Q: What are the differences between the primary results and the results of Dunnett's type of test in ANCOM-BC2?

A: The primary results and the results of Dunnett's type of test in ANCOM-BC2 provide information on differentially abundant taxa, but there are differences in the correction of p-values.

In the primary results, the p-values are corrected across taxa, meaning that they account for multiple comparisons among different taxa. This correction helps control the false positive rate when determining the significance of individual taxa.

On the other hand, Dunnett's type of test not only corrects the p-values across taxa but also corrects for multiple comparisons between groups. Specifically, it compares the abundance of each taxon in groups B and C with the reference group A. The correction for multiple comparisons in Dunnett's type of test results in a more conservative outcome, reducing the likelihood of false positive results.

Therefore, while both the primary results and Dunnett's type of test provide information on differentially abundant taxa, the results of Dunnett's type of test offer additional control for multiple comparisons, making them more conservative and reliable.

7. Q: Can the ancombc or ancombc2 function handle interaction terms in the analysis?

A: Unfortunately, the inclusion of interaction terms in the fix_formula argument of ancombc or ancombc2 can lead to complexities and potential confusion in the multi-group comparisons. To address this, it is recommended to manually create the interaction term of interest outside of the formula and perform the analysis accordingly.

By manually creating the interaction term, you can ensure that the analysis accurately captures the interaction effect between variables. Once the interaction term is created, you can include it in the fix_formula argument or any other relevant part of the analysis, depending on your specific research question and design.

8. Q: Can the ANCOM-BC methodology be applied to other data types such as functional abundances, RNA-seq, or single-cell RNA data?

A: The ANCOM-BC methodology can be applied to other data types as long as they are considered compositional. However, it is essential to be aware that the methodology has been primarily benchmarked and validated using microbiome data. For more discussions, you can refer to this post.

9. Q: What does "not a positive definite matrix" mean in fitting the ancombc2 mixed effects model? How can I debug this issue?

A: The error message "not a positive definite matrix" indicates that the correlation matrix used in the mixed effects model is not positive definite. A positive definite matrix is a square matrix where all eigenvalues are positive. This error typically occurs when there is an issue with the data or model specification.

To debug this issue, I recommend fitting a mixed effects model to your RAW data using the lmerTest package in R. Use the same fixed effects and random effects specifications that you used in the ancombc2 function. By fitting the model directly, you may receive more informative error messages that can help diagnose the problem.

Here are the steps you can follow to debug the issue:

Install and load the lmerTest package in R: install.packages("lmerTest") and library(lmerTest).
Prepare your data in its raw format without any transformation or preprocessing.
Specify the fixed effects and random effects in the model formula, similar to what you used in ancombc2.
Fit the mixed effects model using the lmer() function from the lmerTest package.
Check if the model fitting process encounters any errors or warnings. These messages can provide valuable insights into the issue.
Analyze the error or warning messages to identify the underlying problem. It could be related to the data structure, model specification, or potential collinearity among variables.
Address the issue based on the information provided in the error or warning messages. This may involve revising the model specification, examining the data for anomalies, or resolving any collinearity issues.

By following these steps, you can gain a better understanding of the problem causing the "not a positive definite matrix" error and take appropriate actions to address it.

If you continue to encounter difficulties or need further assistance, it may be helpful to seek advice from statisticians or experts in your specific field of research.

10. Q: If a higher LFC value corresponds to a larger abundance, why did some of my OTU/ASV counts show opposite directions?

A: It's important to note that the log-fold change (LFC) values in the context of ANCOM-BC or ANCOM-BC2 do not directly reflect the relative abundances (such as proportions) or observed abundances (such as OTU or ASV counts). The LFC values represent the difference in bias-corrected abundances between groups.

In ANCOM-BC or ANCOM-BC2, a higher LFC value indicates a larger difference in bias-corrected abundances between groups. However, this does not necessarily mean that the group with higher LFC has a higher relative abundance or larger observed counts for a specific OTU or ASV.

11. Q: Can you give me a more complicated example of performing ANCOM-BC2 trend test?

A: For example, when using the trend test with a group variable of 5 ordered categories (A, B, C, D, E) in R, we are actually estimating 4 contrasts, which are (B-A, C-A, D-A, E-A). Testing the trend of A < B < C < D < E is equivalent to testing 0 < B - A < C - A < D - A < E - A. Therefore, we can specify the contrast matrix as follows:

# B-A    C-A     D-A    E-A
    1      0      0     0
    -1     1      0     0
    0     -1     1      0
    0     0     -1      1

In R, it should be

matrix(c(1, 0, 0, 0,
  -1, 1, 0, 0,
   0, -1, 1, 0,
   0, 0, -1, 1),
   nrow = 4, 
   byrow = TRUE)

For more in-depth discussions, you can refer to this post.

12. Q: OMG, I am still very confused at structural zeros. What are they? What do the struc_zero and neg_lb arguments do?

A: A taxon is considered to have structural zeros in some (>=1) groups if it is completely or nearly completely absent in those groups. For example, if there are three groups, g1, g2, and g3, and the counts of taxon A are 0 in g1 but non-zero in g2 and g3, taxon A will be considered to contain structural zeros in g1. In this scenario, taxon A is declared to be differentially abundant between g1 and g2, g1 and g3, and is consequently globally differentially abundant with respect to the group variable. Such taxa are not further analyzed using ANCOM-BC or ANCOM-BC2, but the results are summarized in the zero_ind. You can treat the detection of structural zeros as performing a presence/absence test.

The detection of structural zeros is based on a separate paper, ANCOM-II. Specifically, setting neg_lb = TRUE indicates that both criteria stated in section 3.2 of ANCOM-II are used to detect structural zeros. Alternatively, setting neg_lb = FALSE will only use equation 1 in section 3.2 of ANCOM-II to declare structural zeros, making it a more conservative approach. As the OTU/ASV table is usually very sparse, it is recommended to choose neg_lb = FALSE to prevent false discoveries. However, if you have a more dense table such as a family level table with a sufficiently large sample size, using neg_lb = TRUE may be a better idea. It is important to note that neg_lb has no function if struc_zero is set to FALSE. Therefore, there are three possible combinations: struc_zero = FALSE (regardless of neg_lb), struc_zero = TRUE, neg_lb = FALSE, or struc_zero = TRUE, neg_lb = TRUE.

ancombc's People

Contributors

Stargazers

Watchers

ancombc's Issues

unused argument (prv_cut = 0.1)

Hi Dr. Lin,

I was running the example code in your vignettes and I got an error from ## 4.1 Run ancombc function:

out = ancombc(phyloseq = family_data, formula = "age + region + bmi_group", p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000, group = "region", struc_zero = TRUE, neg_lb = TRUE, tol = 1e-5, max_iter = 100, conserve = TRUE, alpha = 0.05, global = TRUE),

the error message is :

Error in ancombc(phyloseq = family_data, formula = "age + region + bmi_group", : unused argument (prv_cut = 0.1),

I could not debug this error and all codes in the vignettes before ## 4.1 ran well, so I don't think I have missed something in the previous codes.

Here are my session info:

`R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggforce_0.3.3 limma_3.50.3 corrplot_0.92 DT_0.23
[5] ANCOMBC_1.4.0 qwraps2_0.5.2 magrittr_2.0.3 microbiome_1.16.0
[9] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4
[13] readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6
[17] tidyverse_1.3.1 phyloseq_1.38.0

loaded via a namespace (and not attached):
[1] Rtsne_0.16 colorspace_2.0-3 ggsignif_0.6.3
[4] ellipsis_0.3.2 XVector_0.34.0 fs_1.5.2
[7] rstudioapi_0.13 farver_2.1.0 ggpubr_0.4.0
[10] fansi_1.0.3 lubridate_1.8.0 xml2_1.3.3
[13] codetools_0.2-18 splines_4.1.2 knitr_1.39
[16] polyclip_1.10-0 ade4_1.7-19 jsonlite_1.8.0
[19] nloptr_2.0.0 broom_0.8.0 cluster_2.1.3
[22] dbplyr_2.1.1 BiocManager_1.30.18 compiler_4.1.2
[25] httr_1.4.3 backports_1.4.1 assertthat_0.2.1
[28] Matrix_1.4-1 fastmap_1.1.0 cli_3.3.0
[31] tweenr_1.0.2 htmltools_0.5.2 tools_4.1.2
[34] igraph_1.3.1 gtable_0.3.0 glue_1.6.2
[37] GenomeInfoDbData_1.2.7 reshape2_1.4.4 Rcpp_1.0.8.3
[40] carData_3.0-5 Biobase_2.54.0 cellranger_1.1.0
[43] vctrs_0.4.1 Biostrings_2.62.0 rhdf5filters_1.6.0
[46] multtest_2.50.0 ape_5.6-2 nlme_3.1-157
[49] iterators_1.0.14 xfun_0.31 rbibutils_2.2.8
[52] rvest_1.0.2 lifecycle_1.0.1 gtools_3.9.2
[55] rstatix_0.7.0 zlibbioc_1.40.0 MASS_7.3-57
[58] scales_1.2.0 hms_1.1.1 parallel_4.1.2
[61] biomformat_1.22.0 rhdf5_2.38.1 yaml_2.3.5
[64] stringi_1.7.6 S4Vectors_0.32.4 foreach_1.5.2
[67] permute_0.9-7 caTools_1.18.2 BiocGenerics_0.40.0
[70] GenomeInfoDb_1.30.1 Rdpack_2.3 rlang_1.0.2
[73] pkgconfig_2.0.3 bitops_1.0-7 evaluate_0.15
[76] lattice_0.20-45 Rhdf5lib_1.16.0 htmlwidgets_1.5.4
[79] tidyselect_1.1.2 plyr_1.8.7 R6_2.5.1
[82] IRanges_2.28.0 gplots_3.1.3 generics_0.1.2
[85] DBI_1.1.2 pillar_1.7.0 haven_2.5.0
[88] withr_2.5.0 mgcv_1.8-40 survival_3.3-1
[91] abind_1.4-5 RCurl_1.98-1.6 modelr_0.1.8
[94] crayon_1.5.1 car_3.0-13 KernSmooth_2.23-20
[97] utf8_1.2.2 tzdb_0.3.0 rmarkdown_2.14
[100] grid_4.1.2 readxl_1.4.0 data.table_1.14.2
[103] vegan_2.6-2 reprex_2.0.1 digest_0.6.29
[106] stats4_4.1.2 munsell_0.5.0 `

User-friendly Result table

As posted in Issue #73:

Considering you are performing major updates to the package, have you considered to organize the final results more in a user-friendly table, a similar table to the one DESeq2 (https://bioconductor.org/packages/release/bioc/html/DESeq2.html) produces, for example? It would summarize the results in a more consise fashion and would make the end-user life easier, that would not need to merge several tables.

Cheers,

André

Incorrect labelling of groups in output?

Hello!

I have three treatment groups in the dataset I'm analyzing: Limit Fed, Ad Libitum, and High Fat. They are all listed as a "diet" variable, which i indicate in my code (see below). However, when I look at the output, I see the High Fat group listed twice with different numbers, and no Ad Libitum group. Am i inputing something incorrectly? I double checked my metadata and it seems correct. I have attached a sample out$res file for you to see.

here is the line of code I run:
P0Ancom = ANCOMBC::ancombc(phyloseq = phylum_data0, formula = "Diet", p_adj_method = "holm", zero_cut = 0.90, group = "Diet", struc_zero = TRUE, conserve = TRUE, neg_lb = FALSE, tol = 1e-5, max_iter = 100, alpha = 0.05)

P0pval = P0Ancom$res

P0stats.csv

Best,
Haley

z-transformed covariates

Hi Frederick,

thank you for developing such amazing tool! I am using it for my studies on gut microbiome and stress and I was wondering if I could z-transform my covariate (stress) before runing ancombc() to better interpret the results.

Do you think it could work? my code is:

out = ancombc(phyloseq = phylum_data, formula = "stress + season + sex + age_cat", p_adj_method = "holm",
zero_cut = 0.90, lib_cut = 1000, group = NULL, struc_zero = FALSE, neg_lb = FALSE, tol = 1e-5, max_iter = 100,
conserve = TRUE, alpha = 0.05, global = FALSE)

but before of it, i guess I should z-transform the covariate in the data summary part or even before? should i do it before and then run the following?

summary_template =
list("stress" =
list("min" = ~ min(.data$stress, na.rm = TRUE),
"max" = ~ max(.data$stress, na.rm = TRUE),
"mean (sd)" = ~ mean_sd(.data$stress, na_rm = TRUE,
show_n = "never")),
"sex" =
list("F" = ~ n_perc0(.data$sex == "female", na_rm = TRUE),
"M" = ~ n_perc0(.data$sex == "male", na_rm = TRUE),
"NA" = ~ n_perc0(is.na(.data$sex))), and so on...

Thank you!

Cheers,

Simone

Diff_abn for first variable depends on the second variable reference level - expected?

Dear Frederick,

I have been happily using ANCOM-BC for a while and plan to keep using it in the future. However, something is puzzling me now that I'm working on a data set with two variables, of which the variable Status has 2 levels and the variable Treatment has 3 levels.

I ran ancombc with the formula "Status*Treatment". This gave me two of the three possible comparisons for Treatment, but the third comparison for the variable Treatment is also relevant in this case, so I repeated ancombc with the same formula but with the Treatment levels switched around so that another level was the reference, which achieved the goal.

However, I noticed that the 79 ASVs that were differentially abundant between Status levels in the first comparison only partially overlapped the 18 ASVs that were differentially abundant between Status level in the second comparison (5 ASVs in common). I switched around or omitted Treatment factor levels some more and saw that the ASVs differentially abundant between the Status levels depended heavily on which Treatment factor level was the reference.

Did I make a mistake somewhere? Is this expected behaviour and the comparison for the first factor have to be repeated with every level of the second one as reference (and vice versa)? Do you have any suggestions on how to proceed in cases like this and how to interpret these results? I would be grateful for any comment or advice!

With kind regards,
Marko

Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed

Hi!

I am running ANCOMBC with a phyloseq object. And I got this error:

Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed

I know this error has been discussed more than once. But what makes it curious for me this time, is that it's definitely not due to small number of taxa. When I agglomerated my phyloseq object at genus level (204 taxa) or order level (44 taxa), the function run through no problem. But when I try to agglomerated my phyloseq object at family (67 taxa), it fails.

Any suggestions?

PS: I do have a fairly complicated formula which contains 4 terms. Still it doesn't make sense it worked for order level but not for family level.

which group the differential features are enriched

Hi Frederick,

Thanks for your great work.

In ANCOMBC, the global test is for testing whether there are some taxa that are differentially abundant in at least one group, and how can I determin which group the differential features are enriched.

Assume there are three groups A, B anc C, featureA is differential among these three groups after running ANCOMBC, which group (A, B or C) the featureA is enriched in? Is the group which has largest mean abundance of featureA? If it is, how to got the absolute abundance, is the Bias-adjusted abundances of the result represents the
corrected absolute abudance?

I'm sorry If my question are too basic. Any help is much appreciated.

Thanks for your attention.

Longitudinal analysis

Hi @FrederickHuangLin

I have seen the package and I find it very interesting. I have to do an analysis with longitudinal data and wanted to ask if there is any date for the new release containing longitudinal data analysis

Thanks in advance

Dolors Pelegrí

Variables 'change' after using ANCOMBC function

Hello dr Lin,

I am very interested in using ANCOMBC to analyse a human microbiome dataset.
Unfortunately, I am running into a problem using the ANCOMBC function.

I am not an experienced R user but I'll try to explain as best as I can.

e.g.: the variable “City” consisting of “Bur”, “Man”, “Kna”, “Tow”.
After running ancombc, the columns of the output tables are headed: CityMan, CityKna, CityTow.
The variable Bur has disappeared altogether. This happens for all the variables; the column name of the variable seems to replace one of the variables.
In the merged phyloseq file used for ancombc this doesn’t seem to be an issue:
sam_data list40x5 A data.frame with 40 rows and 5 columns
Town factor Factor with 4 levels: “Bur”, “Man”, “Kna”, “Tow”

I am sure I am missing something here and hopefully you can help.
Thank you very much in advance.
Regards, Ica

Log fold change

Hi,

Very naive question: I wonder if negative log-Fold change are conserved in ANCOMBC output.

Cheers,

More informative error message in 'Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed'

Hi @FrederickHuangLin !

I've been working with the ANCOM-BC package for the past couple of days and it's very intuitive and interesting. However, I would like to know more about this error message :

'Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed'

If perhaps you could add an enhancement in the package such that the user knows more details about why they are getting such an error, it would be useful. I have referred gone through your conversation in this issue : #16 , but it's still a little unclear, and may not be applicable in this case. Given that I am using the latest version of ANCOM-BC, could you provide me with more details about this error?

Thank you so much! Looking forward to hearing from you.

Best,
Sharvari Narendra

Foldchange value vs logFoldchange

Hi,
Is it correct to say that the log ratio between the tested conditions corresponds to the dataframe res$W=(beta/se)

In such case the Foldchange would be exp(1)^(abs(res$w) ?

So for example if res$w returns -1.363898358 than the FC will be exp(1)^(1.363898358) = -3.91 ?

Is it correct ?

Error in ancombc

Hi,

This package is very interesting. I tried it and found that it reported the error: "Error in eval(predvars, data, env) :
numeric 'envir' arg not of length one" when the data is sparse. I am wondering if you have any suggestions on it.

Shulei

package not available for R version 3.6.1?

Hi,

I am trying to install ANCOMBC using the following command:

package ‘ANCOMBC’ is not available (for R version 3.6.1)

But then I get the following result:
package ‘ANCOMBC’ is not available (for R version 3.6.1)

Is there a way I could fix this?

Cheers,
Pablo

Global test results omit certain taxa

Hi,

I'm trying to identify taxa that are differentially abundant between different sequencing batches. I have two metadata columns, 'site' and 'kit'. My original otu_table has 663 samples and 3986 taxa. However, after running ANCOM-BC, the global test result table has only dimensions of 3834 x 6. Some taxa are not included in this result table. May I know why this may be the case? My command parameters are shown below.

column <- "kit"
out <- ancombc(phyloseq = phy_subset, formula = column,
p_adj_method = "BH", zero_cut = 1, lib_cut = 0,
group = column, struc_zero = FALSE, neg_lb = FALSE,
tol = 1e-5, max_iter = 1, conserve = F,
alpha = 0.05, global = TRUE)

Thanks in advance.

Unable to install

Thanks for making this package available!

I am unable to install your package on my system using Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.3 (2020-10-10). Installation produced the following warning:
Warning message: package ‘ANCOMBC’ is not available for this version of R

My version of R was 4.0.3. See full session info as following:

R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Registered S3 method overwritten by 'treeio':
  method     from
  root.phylo ape 


> if (!requireNamespace("BiocManager", quietly = TRUE))
+     install.packages("BiocManager")
> BiocManager::install("ANCOMBC")

Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.3 (2020-10-10)
Installing package(s) 'ANCOMBC'
Warning message:
package ‘ANCOMBC’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages 

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
  [1] Rtsne_0.15          colorspace_2.0-0    ggtree_2.2.4        ggsignif_0.6.0      class_7.3-17        ellipsis_0.3.1      rio_0.5.16         
  [8] htmlTable_2.1.0     XVector_0.30.0      base64enc_0.1-3     aplot_0.0.6         rstudioapi_0.13     farver_2.0.3        ggpubr_0.4.0       
 [15] graphlayouts_0.7.1  ggrepel_0.9.0       codetools_0.2-18    splines_4.0.3       knitr_1.30          polyclip_1.10-0     ade4_1.7-16        
 [22] Formula_1.2-4       jsonlite_1.7.2      phyloseq_1.34.0     broom_0.7.3         cluster_2.1.0       png_0.1-7           ggforce_0.3.2      
 [29] BiocManager_1.30.10 readr_1.4.0         compiler_4.0.3      rvcheck_0.1.8       backports_1.2.1     Matrix_1.3-0        lazyeval_0.2.2     
 [36] tweenr_1.0.1        htmltools_0.5.0     prettyunits_1.1.1   tools_4.0.3         igraph_1.2.6        gtable_0.3.0        glue_1.4.2         
 [43] reshape2_1.4.4      dplyr_1.0.2         Rcpp_1.0.5          carData_3.0-4       Biobase_2.50.0      cellranger_1.1.0    vctrs_0.3.6        
 [50] Biostrings_2.58.0   rhdf5filters_1.2.0  multtest_2.46.0     ape_5.4-1           nlme_3.1-151        iterators_1.0.13    ggraph_2.0.4       
 [57] xfun_0.19           stringr_1.4.0       openxlsx_4.2.3      lifecycle_0.2.0     fuzzyjoin_0.1.6     rstatix_0.6.0       zlibbioc_1.36.0    
 [64] MASS_7.3-53         scales_1.1.1        tidygraph_1.2.0     hms_0.5.3           parallel_4.0.3      biomformat_1.18.0   rhdf5_2.34.0       
 [71] RColorBrewer_1.1-2  yaml_2.2.1          curl_4.3            gridExtra_2.3       ggplot2_3.3.3       rpart_4.1-15        latticeExtra_0.6-29
 [78] stringi_1.5.3       S4Vectors_0.28.1    foreach_1.5.1       e1071_1.7-4         tidytree_0.3.3      checkmate_2.0.0     permute_0.9-5      
 [85] BiocGenerics_0.36.0 phylosmith_1.0.5    zip_2.1.1           rlang_0.4.10        pkgconfig_2.0.3     evaluate_0.14       lattice_0.20-41    
 [92] sf_0.9-6            purrr_0.3.4         Rhdf5lib_1.12.0     treeio_1.12.0       patchwork_1.1.1     htmlwidgets_1.5.3   tidyselect_1.1.0   
 [99] plyr_1.8.6          magrittr_2.0.1      R6_2.5.0            IRanges_2.24.1      generics_0.1.0      Hmisc_4.4-2         DBI_1.1.0          
[106] pillar_1.4.7        haven_2.3.1         foreign_0.8-81      mgcv_1.8-33         units_0.6-7         survival_3.2-7      abind_1.4-5        
[113] nnet_7.3-14         tibble_3.0.4        crayon_1.3.4        car_3.0-10          KernSmooth_2.23-18  microbiome_1.10.0   phylofactor_0.0.1  
[120] rmarkdown_2.6       viridis_0.5.1       jpeg_0.1-8.1        progress_1.2.2      grid_4.0.3          readxl_1.3.1        data.table_1.13.6  
[127] vegan_2.5-7         forcats_0.5.0       classInt_0.4-3      digest_0.6.27       tidyr_1.1.2         stats4_4.0.3        munsell_0.5.0      
[134] viridisLite_0.3.0   qiime2R_0.99.23

Confusion about results interpretion

Hi Dr. Lin,

I met some confusions when I tried to interpret the ANCOM-BC results.

First, I am trying to compare the DA(Differential Abundant) phyla between "Positive" group and "Negative" group, I believe the "Negative" group was used as the reference group here in my analysis. According to the LFC( Log Fold Change) values, I want to conclude that phyla such as Halobacterota and Campilobacterota are more abundant in the "Positive" group, because their LFC values are positive. But when I checked their RA(relative abundance), I found the RA of Campilobacterota was higher in the "Negative" group when compared with the "Positive" group. Similarly, the LFC value for Bacteroidota is negative, which means it should be more abundant in the reference ("Negative") group, but the RA of Bacteroidota was higher in the "Positive" group than the "Negative" group. I know I am not supposed to compare the RA values directly, but I just want to make sure that it is possible to see such scenarios.

Second, I am trying to compare the DA phyla between "Treatment" group and "Control" group, I believe the "Control" group was used as the reference group here in my analysis. Here, I saw Thermoplasmatota has a negative LFC value, which should lead to the conclusion that" Thermoplasmatota is more abundant in the "Control" group". However, the RA of Thermoplasmatota in the "Control" group is 0.00%. Is this also because of the structural zeros?

I hope I have described my questions clearly, if not, please feel free to let me know.

Best regards,
Jinji Pang

Help with out file

First, Thank you for this tool!!
It is very impressive...

I do have a question, I am not that code smart, so I have a question/request help.

In the final output "out" all the tables has as rownames the taxa information. I wonder if it is possible to get that output with the tax_table rownames information and not the actual taxa info. Thanks.

Error Message

Hello! I am currently using ANCOMBC on a phyloseq object. When I run the function:

G0Ancom = ANCOMBC::ancombc(phyloseq = pig0, formula = "Diet", p_adj_method = "holm", zero_cut = 0.90, group = "Diet", struc_zero = TRUE, conserve = TRUE, neg_lb = FALSE, tol = 1e-5, max_iter = 100, alpha = 0.05)

I am getting this error:

Error in while (epsilon > tol & iterNum < max_iter) { : missing value where TRUE/FALSE needed

It seems that this issue has been brought up before, but i am not using a small amount of taxa (I have 48 OTUs). Any advice on how to remedy this?

Thanks!

Confidence intervals from standard errors output by model?

Hi, I'm really excited about this method and package! Thanks for developing it.

I noticed in the original publication (https://doi.org/10.1038/s41467-020-17041-7) that log-fold changes were plotted with Bonferroni-corrected 95% confidence intervals (CIs) as the error bars, but the current release outputs standard errors (SEs) and not CIs directly. I haven't been able to find the documentation in the manuscript methods or in the vignette or package documentation for calculating CIs from the SEs - can a normal distribution be assumed to calculate those?

Thanks for any help!

Unclear output with four conditions

Dear Frederick,

first of all many thanks for this package! Just tested to detect taxa enriched by season using

bac <- pseq %>% tax_glom("Genus") %>% filter_taxa(.,function(x) mean(x) > 10, T)

out = ancombc(bac, formula = "season", p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000, group = "season", struc_zero = T, neg_lb = F, tol = 1e-5, max_iter = 100, conserve = T, alpha = 0.05, global = T)

I do get an output, which is however confusing: there is "seasonspring", "seasonsummer" and "seasonwinter" but no "seasonfall".

Also, I'd interpret that TRUE in res$diff_abn means that a given taxon is enriched under the specific condition (for example, if TRUE under seasonspring it's enriched in spring). But the results do not match other DA tools I've been testing.

Many thanks for your help!
Matthias

Longitudinal analysis using ancombc

Dear lin:

About the analysis of the longitudinal data using ANCOM_BC;
I know that ANCOM can be used to analyses longitudinal data;
Whether ANCOM_BC would provide simialr function ?
or if we can use bias corrected data by ANCOM_BC to perform ANCOM ?

Thanks for your great work ~
best
huahui

Another error message in ANCOM BC

Hi,

I have created a phyloseq object and try to run ANCOM BC on it - the phyloseq object contains three files, a tsv metadata file, and 2 qiime qza files - the taxa and the samples (table.qza).

When I try to run ANCOM BC on the phylum level (I have used phylum_data = aggregate_taxa(physeq_sopp, "Phylum"), with the following code
phylum_data_OW_Sarcoidosis_Control = subset_samples(phylum_data, (Category%in%c("Sarcoidosis", "Control")) & (SampleType=="OW"))

out_OW_Sarcoidosis_Control = ancombc(phyloseq = phylum_data_OW_Sarcoidosis_Control, formula = "Category",

                                 p_adj_method = "holm", zero_cut = 0.90, lib_cut = 500,

                                 group = "Category", struc_zero = TRUE, neg_lb = FALSE,

                                 tol = 1e-5, max_iter = 100, conserve = TRUE,

                                 alpha = 0.05, global = TRUE)

I get this error message:

Error in while (epsilon > tol & iterNum < max_iter) { :
missing value where TRUE/FALSE needed

and I cannot get around this. I have tried tweeking the neg_lb, tpl, and max_iter to extremes to no avail, and I can not see any errors in the metadatafile regarding the "Category" values.

Any idea what this means?

Does ANCOMBC supports only one variable of interest?

Hi developer,

Does ANCOMBC support only one continuous variable of interest?
For example, in the vignettes:

out = ancombc(phyloseq = phylum_data, formula = "age + nation + bmi_group",
p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000,
group = "nation", struc_zero = TRUE, neg_lb = TRUE, tol = 1e-5,
max_iter = 100, conserve = TRUE, alpha = 0.05, global = TRUE)

Is that possible, if I want to test the association between age(only one continuous variable) and each taxa？
Appreciate it!

transformation for plots

Dear Huang,

Thanks for providing your continuous support to ANCOM and ANCOM-BC users!

I'm wondering what is the best way to plot changes in abundances of DA features identified with ANCOM-BC. I used to plot clr-transformed counts on heatmaps when I was using ANCOM but now that I switched to ANCOM-BC I get very conflicting results regarding more or less abundant features in two conditions in comparison with ANCOM-BC results. For instance, ANCOM-BC identifies featureX as differentially more abundant in treatmentA vs treatmentB but when I plot the clr-transformed abundances I get the opposite pattern. This only happens for some features, the majority of them are consistent with ANCOMBC results, but sometimes I need to focus on those exceptions and haven't found a better way to plot them.

Any help is appreciated!

Thanks,
Anny

What to do with the group argument when your predictors are all continuous

Hello,

I apologize if this has been answered elsewhere, but it's not clear to me what I should do with the group = argument when all of my predictors are continuous. I would like to control for structural zeros if possible, but I don't think I can give a continuous variable to the group argument and I don't have any categorical predictors in my model.

Thanks!

Repeated measures & using bias-adjusted counts elsewhere?

Hello,
You mention in your paper that it is possible to use ANCOMBC with repeated-measures designs. I was wondering if this had been implemented in ANCOMBC already (perhaps as support for random effects)?

If not, is it reasonable to run ancombc(), take the bias-adjusted counts, and use them in, say, lme4::lmer() to take advantage of random effects?

Thanks

input ASV table

Hi Frederick,

What a great software to analyze differential abundance!
I just have a brief question regarding the input ASV table. I noticed that you stressed multiple times that the input should be an absolute count table instead of a relative abundance table. I am wondering if you would recommend using a rarefied table as input or just raw data?

Thank you!

q_val of 0.000000000

Hi, thanks for your efforts in developing this package!

I had a doubt: I run ancombc using "fdr" adjustment and I have a lot of q-values of 0. I couldn't get exactly whether these are actually low q-values (so high significance) or an artefact from the test itself.
eg. this is the output I got (transformed to significance level), where 0 are marked as "?"
? * ** *** **** ns
634 46 20 7 7 2222

inconsistent data structures

For my convenience, I transform parts of the result structure into one data frame. I noticed that when using one independent variable with two levels in the formula, the results for $res$diff_abn, $res$p_val, and $res$q_val can either be a simple vector when the parameter struc_zero = FALSE or they are a matrix with one column when struc_zero = TRUE. To me it makes most sense that they are always a matrix since this is the case when more than two levels are involved anyway. Either way, I believe that the outcome should be consistent independent of the struc_zero parameter.

ANCOM-BC global test and multiple comparisons

Dear all,
First of all, thanks for developing such a tool, it is proving very helpful in many of my current studies.
I have a couple of questions regarding its utilization in one of my recent study designs. I have a batch of samples with varying feature abundances; I have identified a total of 6 groups (C1 to C6) and I'd like to identify differentially abundant features across the dataset using ANCOM-BC. However, the test itself relies on comparisons against a reference, which I don't have. To avoid resorting to manually performing a batch of multiple comparisons each time defining a different group as a reference, I tried to do the following:

have ANCOM-BC define globally differentially-abundant features through the global test;
recover the identified features from a CLR-normalized table and plot their abundances by group.

I am however unsure about this. What does really the global test procedure do? Should I normalize the results by subtracting the sampling fraction from log-normalized feature abundances as the vignette shows?
Thanks for the advices!
Best regards

Understanding the Output

Hi Frederick,

Thanks for developing the tool for compositional data.
I am new to microbiome analysis and trying to understand the output result from ANCOM-BC

I was trying use the data to identify differentially abundant KOs from PICRUST2 output.

I have an abundance table for features as rows (KO dataset) with samples as columns. I have 3 treatment groups (treatA, treatB, treatC), with sample size of 10/group. I modified the it formula from the example for my dataset;

out = ancombc(phyloseq = physeq2, formula = "treatment", p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000, group = "treatment", struc_zero = TRUE, neg_lb = FALSE, tol = 1e-5, max_iter = 100, conserve = TRUE,
alpha = 0.05, global = TRUE)

The output has two result files; res and res_global. I am trying to understand these files, and having a bit of trouble.

res_global = is the output for comparing the overall features that are differentially abundant between the three groups in my case. Can we think of the W statistic as analogous to the F statistic in one-way ANOVA?
res = is the output for evaluating the pairwise comparisons for which I have attached a snapshot of the res$beta gives me file with three columns. beta being a coefficient based on which the W and SE are calculated.
a). Is the beta coefficient listed below the two columns treatB or treatC based comparison with treatA (treatB vs treatA; treatC vs treatA)? but there is no comparison between treatB vs treatC. Is this issue associated with the way I have setup the ancombc to run as mentioned above?
b). How do I interpret the first column the "Intercept"? I understand it is a result of applying linear regression to the differential abundance analysis.
c). How do I calculate the effect size for each feature?

out1.txt

If any of my questions are too basic, my apologies. Any help is much appreciated.
Thank you.
Shrez

Error in data_prep(..)

Hi Frederick,
Thanks for developing the methodology and the R package!

When I run the toy example from ?help (GlobalPatterns) data, it runs with no problems. However, when I plug in my data, it givees me this error message:


Error in data_prep(phyloseq, group, zero_cut, lib_cut, global = global) : 
  The group variable should have >= 2 categories.

Any help is much appreciated!

Best. Stan

Using ANCOM-BC with known sampling fractions

Dear,

I find the ANCOM-BC approach to identify deferentially abundant taxa very interesting and was wondering if the estimated sample-specific offset term could be replaced by a known sampling fraction. The ecosystem absolute abundance can easily be determined through e.g. flow cytometry. Combined with the library sizes this allows for the calculation of sampling fractions. With quantitative microbiome profiling (https://www.nature.com/articles/nature24460) gaining popularity, an option to use the known sampling factions in the ANCOM-BC model seems very interesting to me.

Kind regards,
Kim

Is the ANCOMBC currently available?

Hi, Huang,

I am wondering if the ANCOMBC is currently available? I am really interested in trying it. I tried to install either the latest released or development version using code listed in the install section, but none of them worked for me.

Thanks!

Best regards,
Zhe

ANCOM v2.1 and ANCOMBC, difference?

Hi developer,
I am a little confused, what's the difference between ANCOM-BC and ANCOM v2.1?
I have been always using ANCOM v2.1, is it different from ANCOM-BC?

Thanks!!!!

adjusting for random effects

Really nice contribution to the field. I am trying ANCOMBC with my own dataset. Is there currently a way to adjust for random effects as with ANCOM-II? I use the rand_formula = "~ 1 | random_effect_var" option with ANCOM-II.

Support for (Tree)SummarizedExperiment

ANCOMBC does currently support phyloseq. We are now actively working on an alternative framework - miaverse - based on the newer (Tree)SummarizedExperiment class, providing some enhanced capabilities. It would be relatively straightforward to add support for this class because they are closely related, phyloseq can always be converted to TSE mia::makeTreeSummarizedExperimentFromphyloseq and TSE to phyloseq in many cases (not always as it is a superset). An example is shown in Conversions between data formats in R.

Support for (Tree)SummarizedExperiment could be added ancombc function (and possibly others?).

ANCOM-BC / missing comparisons

Hello,

Thanks to have published the ANCOM-BC tool.

I wonder why ANCOM-BC doesn’t do all possible pairwise comparisons?

For example for the variables “nationality” and “BMI” in your Bioconductor webpage there is only any nationality versus “CE” and any BMI versus “Lean” (see screenshot below)? I think the choice of the versus group “CE” or “Lean” is selected alphabetically, right?

Cheers,
Jérémy Tournayre

technical replicates

Hi again,

Many questions as a new user... Please let me know if you prefer to have e-mail conversations or you could enable the 'Discussion' feature for this repository; but issues are just fine for me.

I wonder how you suggest to handle technical replicates. As an example, I have multiple samples per patient in each group. In a linear mixed-effect model I could specify this as a random effect, e.g.,

lme4::lmer(abundance ~ group + (1 | patient), ...)

One possible way is to combine abundance tables using mean values, I suppose, but that seems like a huge loss of information. Thank you for any insights.

error in evaluating the argument 'y' in selecting a method for function 'plot'

Hi,
I am running ANCOMBC to compare abundance levels between countries using the yes/no variable. I try to run the code to make a volcano plot but I get this error:
Error in h(simpleError(msg, call)) : error in evaluating the argument 'y' in selecting a method for function 'plot': non-numeric argument to mathematical function.
I have attached a photo of this issue with the R code. I wonder what is wrong here.

Best,
Nermin

Group variables and global test results

Dear ANCOMBC developer,

I am trying to run ANCOMBC with 5 confounders and 1 group variable with 2 levels as follows

fit.ancombc0 = ancombc(phyloseq = phydata, formula = "x1 +x2+x3+x4+x5+ group" ,
p_adj_method = "BH", zero_cut = 0.90, lib_cut = 1000,
struc_zero = TRUE, neg_lb = FALSE,
tol = 1e-5, max_iter = 100, conserve = TRUE,
group = group, alpha = 0.05, global = TRUE)

I get the following warning message and my res_global output is NULL

Warning message:
In data_prep(phyloseq, group, zero_cut, lib_cut, global = global) :
The multi-group comparison will be deactivated as the group variable has < 3 categories.

I am interested to find the Differentially abundant (DA) taxa between the two levels of my group variable. Does the above warning mean that ANCOMBC cannot provide DA taxa between two groups?

Global test with multiple group variables

Hello Mr. Developer,

I'm now working on an analysis project. There are 3 major environmental factors (e.g. A, B and C) which we think would affect the abundance of microbiomes. In one step I'd like to test the association between the abundance of microbiomes and 2 of the factors (say B and C), but I am not sure how I could write my formula properly (sorry that this question is rather basic, my apologies). Could I perform a global test on both factors in this case? If so, I'm still wondering if ANCOM-BC support multiple group variables as input.

Any help would be much appreciated and thank you in advance!

Pan

Comparison of 4 groups

Hello,

I am Ana, a Veterinary Student form Spain. I would like to compare the microbiome composition of sows given 4 different treatments, and see if there is any difference in the relative abundance. I performed ANCOMBC with global results, but I am not sure if that is the best way, nor how to know which of the 4 groups is the one having the significant differencial abundant data.

Thank you in advance for your help.
Ana

Error message after adjusting formula

Hello,

Thank you for developing and sharing the ancombc package.

I have been running ancombc on a dataset using 2 covariates in the formula argument and it seems to be working great.
However, when I try to add a third covariate, I receive the following error message;

Error in vapply(seq(n_taxa), function(i) dnorm(Delta[i], delta, sqrt(nu0[i])),  : 
  values must be length 1,
 but FUN(X[[2]]) result is length 0
In addition: Warning messages:
1: In min(Delta, na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
2: In max(Delta, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

Would you happen to have any advice on where this is coming from and how to remedy it?

Thank you!

Error with contrasts in ancombc

I call ancombc as follows with a phyloseq object called physeq.

result <-
  ANCOMBC::ancombc(
    phyloseq = physeq,
    formula = "treatment",
    group = "treatment",
    lib_cut = 500,
    p_adj_method = "holm",
    conserve = FALSE
  )

which leads to the error with traceback

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
9. stop("contrasts can be applied only to factors with 2 or more levels")
8. `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
7. model.matrix.default(mt, mf, contrasts)
6. model.matrix(mt, mf, contrasts)
5. lm(tformula, data = df)
4. FUN(X[[i]], ...)
3. lapply(seq_len(n_taxa), function(i) { df = data.frame(y = unlist(y[i, ]) - d, meta_data) return(lm(tformula, data = df)) })
2. para_est(y, meta_data, formula, tol, max_iter)
1. ANCOMBC::ancombc(phyloseq = physeq, formula = "treatment", group = "treatment", lib_cut = 500, p_adj_method = "holm", conserve = FALSE)

Treatment is a factor with two levels "baseline" "A" so I'm not sure where the problem is coming from.

Unfortunately, I can't share any data but I'm happy to answer any questions to solve this. I use this phyloseq object in other analyses so I generally think that the data is in the right shape.

sessionInfo()

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_BE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_BE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_BE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_BE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] lubridate_1.7.10 magrittr_2.0.1   phyloseq_1.34.0  nloptr_1.2.2.2   ANCOMBC_1.0.3    dplyr_1.0.4     

loaded via a namespace (and not attached):
 [1] Biobase_2.50.0      mustashe_0.1.2      tidyr_1.1.2         jsonlite_1.7.2      splines_4.0.4       foreach_1.5.1      
 [7] RcppParallel_5.0.3  assertthat_0.2.1    Rdpack_2.1.1        BiocManager_1.30.10 stats4_4.0.4        cellranger_1.1.0   
[13] renv_0.13.0         yaml_2.2.1          progress_1.2.2      pillar_1.5.0        lattice_0.20-41     glue_1.4.2         
[19] digest_0.6.27       XVector_0.30.0      rbibutils_2.0       picante_1.8.2       stringfish_0.15.0   colorspace_2.0-0   
[25] htmltools_0.5.1.1   Matrix_1.3-2        plyr_1.8.6          pkgconfig_2.0.3     microbiome_1.12.0   zlibbioc_1.36.0    
[31] purrr_0.3.4         scales_1.1.1        Rtsne_0.15          RApiSerialize_0.1.0 rcartocolor_2.0.0   tibble_3.1.0       
[37] mgcv_1.8-34         generics_0.1.0      IRanges_2.24.1      ggplot2_3.3.3       ellipsis_0.3.1      BiocGenerics_0.36.0
[43] cli_2.3.1           readxl_1.3.1        survival_3.2-7      crayon_1.4.1        evaluate_0.14       fansi_0.4.2        
[49] nlme_3.1-152        MASS_7.3-53.1       vegan_2.5-7         tools_4.0.4         data.table_1.14.0   prettyunits_1.1.1  
[55] hms_1.0.0           formatR_1.7         lifecycle_1.0.0     stringr_1.4.0       Rhdf5lib_1.12.0     S4Vectors_0.28.1   
[61] munsell_0.5.0       cluster_2.1.1       Biostrings_2.58.0   ade4_1.7-16         compiler_4.0.4      qs_0.23.6          
[67] rlang_0.4.10        rhdf5_2.34.0        grid_4.0.4          iterators_1.0.13    rhdf5filters_1.2.0  biomformat_1.18.0  
[73] rstudioapi_0.13     igraph_1.2.6        rmarkdown_2.7       gtable_0.3.0        codetools_0.2-18    multtest_2.46.0    
[79] reshape2_1.4.4      R6_2.5.0            knitr_1.31          utf8_1.1.4          permute_0.9-5       readr_1.4.0        
[85] ape_5.4-1           stringi_1.5.3       parallel_4.0.4      Rcpp_1.0.6          vctrs_0.3.6         tidyselect_1.1.0   
[91] xfun_0.21

Filtering low abundance Taxa

Hello,

I have a count table of ASV with two populations: P1 and P2 (5 samples/group), with-out filtering ASVs by abundance:

out = ancombc(phyloseq = pseq_A1_raw, formula = "population",
p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000,
group = "population", struc_zero = TRUE, neg_lb = TRUE, tol = 1e-5,
max_iter = 100, conserve = TRUE, alpha = 0.05, global = FALSE)

However, I find some ASVs with a significant q_val, but when a see the corresponding row in the count table show very low differences, because they are low abundand ASVs.

For example:
P1_1 P1_2 P1_3 P1_4 P1_5 P2_1 P2_2 P2_3 P2_4 P2_5
ASV_2 2 2 11 1 0 0 0 0 0 0
[...]
ASV_726 1 43 0 15 1 0 0 0 0 0

I wonder if it would be okey to apply a filter to remove low abundand taxa? And when should I used before or after doing differential abundance with ancombc?

Thanks in advance

Co-occurrence network with ANCOM-BC derived absolute abundances

Hello,

We have performed differential abundance analysis using ANCOM-BC specifying the model "case_status + batch", where "case_status" is our variable of interest and denotes what samples came from a case or control individual, while "batch" is a covariate and denotes what sequencing batch a sample belonged to.

We would now like to construct species to species co-occurrence networks of tested species. I would like to use estimated absolute abundances as input to the co-occurrence networks, but I'm unsure if this can be done, as I'm aware the estimated sampling fractions (and therefore the bias-adjusted abundances) are dependent on the model given to ANCOM-BC. In order to preserve any differences that would be present in absolute abundances of case vs control samples, would it make sense to just supply ANCOM-BC with the "batch" variable, compute estimated absolute abundances using the resulting sampling fractions, and use those abundances for input to network analysis (after any needed transformations or normalizations)?

Any info/suggestions is much appreciated!

Adjusting for covariates

Dear Huang,

I'm looking forward to trying out ANCOM-BC on our data, and was wondering if the option to adjust for covariates will be implemented in this method (as it is in ANCOMv2.1)?

Many thanks,

Roos

Relation of W statistic and beta coefficient

Hi Huang,

I've been using your tool happily and I've read your paper (Lin & Peddada, 2020), the vignette, and the ANCOM-BC manual.

What is not entirely clear to me is how the effect size (beta coefficient of the log-linear model) relates to the ANOVA-like model proposed in Lin & Peddada, 2020. And how the significance of the effect is actually tested for.

Writing Eq(4) of Lin & Peddada, 2020 in log form, we have: log(Oijk) = log (cjk) + log(Θij) + eijk

From what I understand from the paper, this model could also be written in a log-linear form to explicitly account for the effect of a covariate: log(Oijk/cjk) = βi0 + βi1X + eijk

But the log-linear model equation is not written in the paper.

Then, for the W test statistic we have Eq(33) from the paper and the information in the manual: "W = beta/se" (standardized effect size).

So, how does W relates to the beta coefficient of the covariate X we're testing for ? How is the p-value of W computed ?

Kind regards,
Rudy

frederickhuanglin / ancombc Goto Github PK

ancombc's Introduction

ANCOMBC

To install the latest release version of ANCOMBC

Usage

Commonly asked questions

ancombc's People

Contributors

Stargazers

Watchers

Forkers

ancombc's Issues

Recommend Projects

Recommend Topics

Recommend Org