I am trying to understand how the significant meta-data associated with CRC are identi

anova-type analysis to identify confounders (crc_meta, closed)

naarkhoo commented on August 14, 2024

anova-type analysis to identify confounders


jakob-wirbel commented on August 14, 2024

Hey @naarkhoo,

Okay, I am not quite sure that I fully understand what your specific question is, but I will try to answer what I can :-) Please let me know if anything is still unclear.

We did not use a single value to quantify the strength of the confounding effect (although that is a good idea! I will think about which measure would be best here...). Instead, we visually inspected the plots in Extended Data Figure 1 and decided that Study and Colonoscopy had the strongest effects and should be included in the association testing.

For library size (and Age), the confounding variables are continuous, but for this analysis, we split them into quartiles (so that we ended up with four groups) using the R functions cut and quantile.

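library(dplyr)

meta <- meta %>%
  # age: quartiles; include.lowest = TRUE keeps the minimum value in
  # group 1 (cut() would otherwise return NA for it)
  mutate(age_factor=as.factor(
    cut(Age, breaks = quantile(Age), labels=c(1,2,3,4),
        include.lowest = TRUE))) %>%
  # bmi: fixed cut-offs instead of quartiles
  mutate(bmi_factor=as.factor(
    cut(BMI, breaks = c(0, 25, 30, 100),
        labels=c('lean', 'overweight', 'obese')))) %>%
  # library size: quartiles, same caveat about the minimum as for age
  mutate(lib_size_factor=as.factor(
    cut(Library_Size, breaks = quantile(Library_Size),
        labels=c(1,2,3,4), include.lowest = TRUE)))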
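
One detail of cut is worth checking when the breaks come from quantile: by default the intervals are open on the left, so the smallest value does not fall into any group unless include.lowest = TRUE is set. A minimal sketch with made-up ages (the vector age and the names q_default and q_closed are illustrative only, not taken from the study metadata):

```r
# Hypothetical ages, for illustration only (not from the study metadata)
age <- c(30, 45, 52, 58, 61, 64, 70, 83)

# Default behaviour: intervals are (lower, upper], so the minimum value
# is not contained in the first interval and becomes NA
q_default <- cut(age, breaks = quantile(age), labels = c(1, 2, 3, 4))

# include.lowest = TRUE closes the first interval on the left,
# so the minimum is assigned to group 1
q_closed <- cut(age, breaks = quantile(age), labels = c(1, 2, 3, 4),
                include.lowest = TRUE)

sum(is.na(q_default))  # samples left without a group by the default call
table(q_closed)        # group sizes once the minimum is included
```

Comparing the two results on your own metadata shows whether any sample silently drops out of the quartile grouping.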

The code for the anova-type analysis would be this part:

ss.var <- apply(feat.red, 1, FUN=function(x, label){
    rank.x <- rank(x)/length(x)
    ss.tot <- sum((rank.x - mean(rank.x))^2)/length(rank.x)
    ss.o.i <- sum(vapply(unique(label), function(l){
      sum((rank.x[label==l] - mean(rank.x[label==l]))^2)
    }, FUN.VALUE = double(1)))/length(rank.x)
    return(1 - ss.o.i/ss.tot)
  }, label=meta.c %>% pull(meta.var))

This is indeed a bit convoluted and probably not super easy to understand. Here, we loop over all the features (i.e. bacterial species)

apply(feat.red, 1, ...)

and first convert the relative abundances of this feature to relative ranks (so that the analysis is non-parametric):

rank.x <- rank(x)/length(x)
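
To see what the relative ranks look like, here is a small sketch on a made-up abundance vector (the values are hypothetical). Note that rank assigns tied values their average rank by default, which matters for sparse data with many zeros:

```r
# Hypothetical relative abundances of one species across six samples
x <- c(0, 0, 0, 0.002, 0.010, 0.050)

# Ranks divided by the number of samples; the three zeros are tied and
# share the average of the ranks 1, 2 and 3
rank.x <- rank(x) / length(x)
rank.x
```

The largest value always maps to 1, and only the ordering of the abundances is kept, not their magnitudes.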

Then, we compute the total variance of the relative ranks within a feature:

ss.tot <- sum((rank.x - mean(rank.x))^2)/length(rank.x)

Lastly, we compute the variance that is not explained by the confounding variable, i.e. the within-group variance. In order to do so, we do not compare each value of rank.x to the overall mean (as for the total variance), but instead to the mean of the group to which it belongs (within the confounding variable; for example, we would compare a value from a sample in the group female to the mean of all samples in this group). The fraction of variance explained by the confounder is then 1 - ss.o.i/ss.tot, which is the value the function returns. The within-group part is done in the vapply loop over the groups in the confounding variable:

ss.o.i <- sum(vapply(unique(label), function(l){
      sum((rank.x[label==l] - mean(rank.x[label==l]))^2)
    }, FUN.VALUE = double(1)))/length(rank.x)
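
Putting the pieces together, the following is a self-contained sketch on simulated data. The objects feat and label and the helper var_explained are made up for illustration and stand in for feat.red and the metadata column; they are not from the repository. As a sanity check, the result is compared with the R-squared of a one-way ANOVA fitted on the ranks with lm, which should give the same number:

```r
set.seed(1)

# Toy data (hypothetical): 2 features in rows, 40 samples in columns,
# and a grouping variable with two levels
n <- 40
label <- rep(c("A", "B"), each = n / 2)
feat <- rbind(
  shifted   = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2)),
  unrelated = rnorm(n)
)

# Same computation as in the snippets above, wrapped in a helper
var_explained <- function(x, label) {
  rank.x <- rank(x) / length(x)
  # total variance of the relative ranks
  ss.tot <- sum((rank.x - mean(rank.x))^2) / length(rank.x)
  # within-group variance: deviations from each group's own mean
  ss.o.i <- sum(vapply(unique(label), function(l) {
    sum((rank.x[label == l] - mean(rank.x[label == l]))^2)
  }, FUN.VALUE = double(1))) / length(rank.x)
  # fraction of variance explained by the grouping
  1 - ss.o.i / ss.tot
}

ss.var <- apply(feat, 1, var_explained, label = label)

# Cross-check: R-squared of a linear model of the ranks on the group
r2 <- apply(feat, 1, function(x) {
  summary(lm(rank(x) ~ factor(label)))$r.squared
})

ss.var
all.equal(ss.var, r2)
```

In this toy setup the feature that differs between the groups should get a clearly higher value than the unrelated one, which is the pattern one would look for when judging a potential confounder.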

I hope this explanation made some sense to you 😃

Cheers,
Jakob
