Using ANCOM-BC with known sampling fractions

from ancombc.

Comments (4)

FrederickHuangLin commented on June 18, 2024

Hi @kdpaepe,

First of all, thanks for your interest in ANCOM-BC!

That is a great question! Honestly, we did not develop ANCOM-BC under quantitative microbiome profiling (QMP), but we are aware of it as it is gaining popularity. Adding a known sampling fraction from QMP is surely an option for future ANCOMBC updates!

Additionally, we are aware of the current limitations of quantification methods. For example, flow cytometry requires intact cells and qPCR is not accurate. We need to look into the technical details of these mechanical biases so we might incorporate these features into the ANCOM-BC model.

Again, thanks for posting this great question!

Best,
Huang

kdpaepe commented on June 18, 2024

Thank you very much for the swift response!

It would be great if this option would be implemented in ANCOM-BC in the future.

In the meantime, I have (quickly) compared the ANCOM-BC sampling fractions with the measured sampling depths (reads divided by total cell counts determined by SYBR green staining) for a couple of samples.
I used the GitHub ANCOM-BC and ancom functions, as the one from the Bioconductor package returned a single value for the sampling fractions instead of a vector. (Based on the line d_hat = colMeans(d_hat, na.rm = TRUE) in the fit_summary function, I assume the Bioconductor version returns the mean sampling fraction?)

It is nice to see that the ANCOM-BC results are consistent across phylogenetic levels: at phylum and genus level the per-sample sampling fractions are even identical, and at OTU level the pattern is well preserved despite differences in absolute values compared to the phylum/genus level.

I also made some plots in which I converted either the ANCOM-BC sampling fractions or the measured sampling depths, to account for the fact that ANCOM-BC works on the log scale.

The estimated and measured sampling fractions seem hard to compare. Is there any transformation I can apply to the ANCOM-BC results so that I can better compare them to the measured values? Intuitively, I would expect the sampling fractions to be negative on a log scale, as in most ecosystems the microbial load is much higher than the library sizes obtained by sequencing. I am not an expert in this area, so forgive my very naïve question, but would it be possible to somehow constrain the parameter values such that the EM maximization of the log-likelihood returns negative delta EMs?
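To make the measured quantity concrete, here is a minimal sketch of the calculation described above: reads divided by total cell counts, taken to the log scale. The function name and all numbers are hypothetical, and the use of the natural log is an assumption about the scale on which the comparison is made.

```python
import math

def measured_log_sampling_fractions(library_sizes, cell_counts):
    """Natural log of (sequenced reads / measured total cells), one value per sample."""
    if len(library_sizes) != len(cell_counts):
        raise ValueError("need one library size and one cell count per sample")
    return [math.log(reads / cells)
            for reads, cells in zip(library_sizes, cell_counts)]

# Hypothetical values, for illustration only.
library_sizes = [25_000, 40_000, 18_000]   # reads per sample
cell_counts = [2.0e9, 5.5e9, 8.0e8]        # total cells per sample (e.g. from staining)

log_fractions = measured_log_sampling_fractions(library_sizes, cell_counts)
for sample, value in enumerate(log_fractions, start=1):
    print(f"sample {sample}: measured log sampling fraction = {value:.2f}")
```

Whenever the cell count exceeds the library size, the ratio is below one and its log is negative, which is the intuition expressed above.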

Maybe another naïve question: how much does the outcome of ANCOM-BC depend on the supplied model formula? For example, for a very stratified and complex design, is it important to pass all possible factor combinations identifying unique strata to the model, or would it be OK to leave out some factors and start from a minimal model? I can imagine that if you use the information on ecosystem membership to take sample averages per ecosystem, which are then used to infer sampling fractions, the model formula matters more than in other applications such as DESeq, where you rely on the median of ratios between samples (or with a pseudo-reference sample) for normalization. In other words, is it assumed that samples within the same ecosystem should not differ significantly from one another, and hence that any differences that do occur are due to different sampling fractions? I guess this could be a bit problematic in a setting with high within-ecosystem variability and many possible confounding factors (e.g. within a cohort in an intervention trial).
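For readers unfamiliar with the median-of-ratios normalization mentioned as a contrast, the following is a rough sketch of the idea: each sample is compared against a pseudo-reference built from per-taxon geometric means, and no model formula is involved. It is an illustration with a hypothetical function name and toy counts, not the DESeq implementation.

```python
import math
from statistics import median

def median_of_ratios_size_factors(counts):
    """counts[i][j] is the count of taxon i in sample j; returns one size factor per sample."""
    n_samples = len(counts[0])
    # Only taxa observed in every sample have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no taxon is non-zero in all samples")
    # Pseudo-reference sample: per-taxon geometric mean, kept on the log scale.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - ref
                      for row, ref in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy table: sample 2 has exactly twice the counts of sample 1 for the shared taxa.
counts = [[10, 20], [50, 100], [4, 8], [0, 6]]
factors = median_of_ratios_size_factors(counts)
print(factors)
```

Because the pseudo-reference needs taxa that are non-zero in every sample, sparse microbiome tables can leave very few usable taxa for this kind of normalization.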

Best,
Kim

FrederickHuangLin commented on June 18, 2024

Dear Kim,

I am deeply sorry for the delay in response. Happy New Year to you and hope you had a great holiday!

Please kindly find below my response to your questions:

> I used the github ANCOM-BC and ancom functions as the one from the Bioconductor package returned a single value for the sampling fractions instead of a vector

You are right: the old version of the ANCOMBC function had a bug that returned the mean sampling fraction instead of a vector. I have addressed this issue, and a new version of ANCOMBC has been released. Could you please update the package and let me know whether it returns the desired output?

> The estimated and measured sampling fractions seem hard to compare.

As indicated in our paper, the estimated sampling fractions differ from the true sampling fractions by the same constant for every sample (on the log scale). Hence, if the true sampling fraction for sample j is d_j and the corresponding estimate is \hat{d}_j, we would expect d_j = \hat{d}_j + constant. In other words, a scatter plot of d_j against \hat{d}_j should show a linear trend with a slope of one. Could you please let me know whether that is the case for your data?
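One way to check the relation d_j = \hat{d}_j + constant without a plot is sketched below. It assumes both vectors are already on the log scale and in the same sample order. The function name and the numbers are hypothetical, and the toy data are constructed to satisfy the relation exactly.

```python
from statistics import mean, pstdev

def offset_check(d_true, d_hat):
    """Summarise how well d_true = d_hat + constant holds across samples."""
    diffs = [t - e for t, e in zip(d_true, d_hat)]
    constant = mean(diffs)   # estimate of the common shift
    spread = pstdev(diffs)   # near 0 if the shift is the same for every sample
    # Least-squares slope of d_true on d_hat; close to 1 under the relation.
    x_bar, y_bar = mean(d_hat), mean(d_true)
    sxx = sum((x - x_bar) ** 2 for x in d_hat)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(d_hat, d_true))
    return constant, spread, sxy / sxx

# Hypothetical estimates, with measured values built as an exact shift of -11.
d_hat = [0.40, -0.10, 0.85, -0.60]
d_true = [x - 11.0 for x in d_hat]

constant, spread, slope = offset_check(d_true, d_hat)
print(f"constant = {constant:.3f}, spread = {spread:.3f}, slope = {slope:.3f}")
```

With real data, a slope far from one or a large spread in the per-sample differences would suggest that the two sets of values are not related by a simple shift.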

> How much does the outcome of ANCOM-BC depend on the supplied model formula?

The answer is yes and no. You are right that precise estimation of the sampling fractions relies on correct specification of the formula, because the Gaussian mixture model is applied individually to each variable you specify in the formula. Only if we have adjusted for all the relevant variables are we able to obtain precise estimates of the sampling fractions. On the other hand, the formula is less important if estimating the sampling fractions is not your primary interest. If the goal is to tell whether taxa are differentially abundant with respect to a covariate of interest, that task is still feasible, since the effect size of that covariate is corrected separately.

Hope it helps,
Huang

aimirza commented on June 18, 2024

Another reason for adding a predetermined sampling fraction relates to how we prepare our count table for ANCOM-BC analysis. We often pre-filter this table, removing ASVs (Amplicon Sequence Variants) with a minimum relative abundance in a sample below 0.25%. ANCOM-BC doesn't include an option to filter by relative abundance, so we do it manually. However, if ANCOM-BC then calculates the sampling fraction from this already-filtered dataset, it may produce inaccurate results. Once it is possible to supply a predetermined sampling fraction, my approach would be to first run ANCOM-BC on the unfiltered dataset to determine the correct sampling fraction, and then apply ANCOM-BC to the filtered dataset using the sampling fraction obtained from that initial analysis.
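The manual pre-filtering step might look roughly like the sketch below. The exact rule is an assumption: here an ASV is kept if it reaches at least 0.25% relative abundance in at least one sample, which is one possible reading of the description above. The function names and counts are hypothetical.

```python
def library_sizes(counts):
    """Total reads per sample for a table given as {asv: [count per sample]}."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]

def filter_by_relative_abundance(counts, threshold=0.0025):
    """Keep ASVs whose relative abundance reaches `threshold` in at least one sample."""
    totals = library_sizes(counts)
    kept = {}
    for asv, row in counts.items():
        rel = [c / total for c, total in zip(row, totals) if total > 0]
        if rel and max(rel) >= threshold:
            kept[asv] = row
    return kept

# Toy table: ASV3 stays below 0.25% in both samples and is dropped.
counts = {"ASV1": [900, 1500], "ASV2": [98, 497], "ASV3": [2, 3]}
filtered = filter_by_relative_abundance(counts)

print(sorted(filtered))
print(library_sizes(counts), library_sizes(filtered))
```

The per-sample totals differ before and after filtering, which illustrates why a sampling fraction estimated from the filtered table need not match one estimated from the unfiltered table.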
