Giter Club home page Giter Club logo

Comments (16)

mblue9 avatar mblue9 commented on May 25, 2024 1

Heya just had a thought here, you have now flipped the deseq2 order of contrasts in tidybulk to match edgeR/limma is that right? i.e like in the workshop we use ~condition + type instead of ~type + condition for deseq2, same as the others. But is that documented anywhere in tidybulk so deseq2 users will know?

from tidybulk.

mblue9 avatar mblue9 commented on May 25, 2024 1

Ok :) So would something like this make sense?
"@param .formula A formula representing the desired linear model. If there is more than one factor, they should be in the order primary factor + additional factors"

Also if deseq2 input formula is being altered by tidybulk to be similar to edgeR/limma-voom then would it make sense to have the contrasts input in a similar format e.g. also use dextrt-dexuntrt instead of dex,trt,untrt for deseq2 or would that be too tricky to implement?

from tidybulk.

mblue9 avatar mblue9 commented on May 25, 2024 1

Correct (I think, do not have proven evidence).

Just tested it & get error:

Error in eval(ej, envir = levelsenv) : object 'dextrt' not found

As I realised that this probably wouldn't work:

counts_scal_PCA %>%
test_differential_abundance(.formula = ~ 0 + cell + dex, .contrasts = c("dextrt - dexuntrt"))

as adding 0 only omits the intercept for the first factor with edgeR/limma

So better to omit the bit about contrasts & say

"
@param .formula A formula representing the desired linear model. If there is more than one factor, they should be in the order factor of interest + additional factors
"

from tidybulk.

mblue9 avatar mblue9 commented on May 25, 2024 1

Just pushed here #128
Also has fixes for a couple of typos I saw

from tidybulk.

mblue9 avatar mblue9 commented on May 25, 2024

Ok can you point me to the code you’re using & I’ll have a look?

from tidybulk.

stemangiola avatar stemangiola commented on May 25, 2024

Even better, would you mind doing an independent edgeR and deseq2 analysis (better if not with tidybulk) of Pasilla with this formula ~ condition + type

I verified if the conversion to SE was causing mess, but the SE object that goes into deseq2 seems ok, and yet the fold changes are totally unrelated.


(FYI the function for deseq2 is called get_differential_transcript_abundance_deseq2 in functions.R)

The key code I use is

tidybulk_to_SummarizedExperiment(!!.sample, !!.transcript, counts) %>%

		# DESeq2
		DESeq2::DESeqDataSet( design = .formula) %>%
		DESeq2::DESeq() %>%
                DESeq2::results()

Unless I am missing something big with deseq2


You see, tidybulk is doing what is expected

DESeq2

# see vignette for suggestions on generating
# count tables from RNA-Seq data
cnts <- matrix(rnbinom(n=1000, mu=100, size=1/0.5), ncol=10)
cond <- factor(rep(1:2, each=5))

# object construction
dds <- DESeqDataSetFromMatrix(cnts, DataFrame(cond), ~ cond)

# standard analysis
dds <- DESeq(dds)
results(dds)

log2 fold change (MLE): cond 2 vs 1 
Wald test p-value: cond 2 vs 1 
DataFrame with 100 rows and 6 columns
     baseMean log2FoldChange     lfcSE      stat    pvalue      padj
    <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
1     95.9958      -0.664078  0.641516 -1.035170 0.3005895  0.759065
2    133.1349      -1.293486  0.709273 -1.823678 0.0682008  0.398194
3    100.5702       0.485982  0.585052  0.830665 0.4061631  0.834731
4    149.1573      -0.780331  0.550075 -1.418590 0.1560186  0.546065
5    140.8715      -0.850247  0.527924 -1.610549 0.1072781  0.477875
...       ...            ...       ...       ...       ...       ...
96    98.1838       0.633796  0.558242  1.135343 0.2562314  0.749565
97    90.5161      -1.236289  0.578663 -2.136457 0.0326422  0.314269
98   110.1423      -1.025949  0.495703 -2.069684 0.0384819  0.314269
99    58.6476       0.108693  0.661281  0.164368 0.8694416  0.960927
100   93.9168       0.155565  0.540197  0.287978 0.7733636  0.959362
t(cnts) %>% as.data.frame %>% as_tibble(rownames = "sample") %>% mutate(cond = cond) %>% gather(gene, count, -sample, -cond) %>% tidybulk(sample, gene, count) %>% test_differential_abundance(~ cond, method="deseq2", action="get")
tidybulk says: All methods use raw counts, 
irrespective of if scale_abundance or adjust_abundance have been calculated, 
therefore it is essential to add covariates such as batch effects (if applicable) in the formula.
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
tidybulk says: to access the raw results (fitted GLM) do `attr(..., "internals")$DESeq2`
# A tibble: 100 x 7
   gene  baseMean log2FoldChange lfcSE   stat pvalue  padj
   <chr>    <dbl>          <dbl> <dbl>  <dbl>  <dbl> <dbl>
 1 V1        96.0         -0.664 0.642 -1.04  0.301  0.759
 2 V2       133.          -1.29  0.709 -1.82  0.0682 0.398
 3 V3       101.           0.486 0.585  0.831 0.406  0.835
 4 V4       149.          -0.780 0.550 -1.42  0.156  0.546
 5 V5       141.          -0.850 0.528 -1.61  0.107  0.478
 6 V6        59.4          0.201 0.669  0.301 0.764  0.959
 7 V7        91.8          0.453 0.580  0.781 0.435  0.835
 8 V8        97.4         -0.164 0.689 -0.238 0.812  0.961
 9 V9        90.2         -1.10  0.618 -1.78  0.0748 0.399
10 V10       86.1          1.33  0.562  2.37  0.0177 0.314

from tidybulk.

stemangiola avatar stemangiola commented on May 25, 2024

Maybe the error is in the contrast for deseq2 maybe I am getting the statistics for the confounder and not the factor of interest

from tidybulk.

stemangiola avatar stemangiola commented on May 25, 2024

Solved, yes was the contract problem. :) Sorry to bother

from tidybulk.

mblue9 avatar mblue9 commented on May 25, 2024

Sorry only got to reply now yep it needs to be e.g. ~type + condition for deseq2 as shown in the vignette, glad you figured it out 👍

from tidybulk.

stemangiola avatar stemangiola commented on May 25, 2024

But is that documented anywhere in tidybulk so deseq2 users will know?

no :)

from tidybulk.

stemangiola avatar stemangiola commented on May 25, 2024

primary factor + additional factors

What about

"
@param .formula A formula representing the desired linear model. If there is more than one factor (and contrasts are not specified), they should be in the order factor of interest + additional factors
"

contrasts input in a similar format

I though about that, but what about (?)

group_1 - 1/3  * group_2 - 1/3 * group_3 - 1/3 * hroup_4

This is a classic constrast for "one versus everything else"

The parsing of arbitrary contrasts from edgeR into deseq2 is not trivial, and I don't know deseq2 enough to realise if they could be mapped 1 to 1

from tidybulk.

mblue9 avatar mblue9 commented on May 25, 2024

"
@param .formula A formula representing the desired linear model. If there is more than one factor and contrasts are not specified, they should be in the order factor of interest + additional factors
"

sounds good to me

The parsing of arbitrary contrasts from edgeR into deseq2 is not trivial, and I don't know deseq2 enough to realise if they could be mapped 1 to 1

I don't know enough about deseq2 contrasts at this point either to be sure about that so better just keep it as you have it.

from tidybulk.

stemangiola avatar stemangiola commented on May 25, 2024

sounds good to me

cool, I edited with parenthesis. See what you think.

from tidybulk.

mblue9 avatar mblue9 commented on May 25, 2024

ok sounds good, just to check, the bit in parenthesis "(and contrasts are not specified)" means they can put the factors in any order if they specify contrasts and all the methods can handle that? e.g.

    test_differential_abundance(
      .formula = ~ 0 + cell + dex,
      .contrasts = c("dextrt - dexuntrt")

gives the same result as below for all methods

    test_differential_abundance(
      .formula = ~ 0 + dex + cell,
      .contrasts = c("dextrt - dexuntrt")

from tidybulk.

stemangiola avatar stemangiola commented on May 25, 2024

ok sounds good, just to check, the bit in parenthesis "(and contrasts are not specified)" means they can put the factors in any order if they specify contrasts and all the methods can handle that? e.g.

    test_differential_abundance(
      .formula = ~ 0 + cell + dex,
      .contrasts = c("dextrt - dexuntrt")

gives the same result as below for all methods

    test_differential_abundance(
      .formula = ~ 0 + dex + cell,
      .contrasts = c("dextrt - dexuntrt")

Correct (I think, do not have proven evidence).

The point is the sentence above makes sense when contrasts are not specified, if they are specified you directly define you factor of interest etc.. If not we assume the first covariate is your factor of interest.

from tidybulk.

stemangiola avatar stemangiola commented on May 25, 2024

Hello @mblue9, should I take action about this issue? I don't want to duplicate changes if you have already. Otherwise, I will do no problem!

from tidybulk.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.