The dsc-log-fold-change from stephenslab

dsc-log-fold-change's Issues

Simplify logic of the pipelines

Subject: pipe_null and pipe_power
Goal: simplify the logic so that these two pipelines call the same score modules

Current version

  define:
    data: data_poisthin_null, data_poisthin_power
    method: t_test, wilcoxon
    score: type_one_error, pval_adj, fdr, auc
  run:
    pipe_null: data_poisthin_null * method * type_one_error
    pipe_power: data_poisthin_power * method * pval_adj * (fdr * auc)

This is the first idea I tried but I got errors...

  define:
    data: data_poisthin_null, data_poisthin_power
    method: t_test, wilcoxon
    score: type_one_error, pval_adj * (fdr, auc)
  run:
    pipe_null: data_poisthin_null * method * score
    pipe_power: data_poisthin_power * method * score

qvalue function error

I got this error: Error in smooth.spline(lambda, pi0, df = smooth.df)

This issue has been discussed on the qvalue package GitHub site in multiple threads: StoreyLab/qvalue#9 and StoreyLab/qvalue#13

qvalue returns this error when the p-value distribution is truncated, i.e., does not span the entire range [0,1]. The authors of the qvalue package offer an alternative function, qvalue_truncp, to estimate q-values in this situation.

I opened this issue to remind myself to write an error-handling function for truncated p-value distributions.
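
A minimal sketch of what such an error-handling wrapper could look like in base R. The names `safe_qvalue`, `primary`, and `fallback` are hypothetical and not part of the repository. The wrapper takes the estimators as arguments so it runs without the qvalue package; the two stand-in estimators below are assumptions for illustration only (the Benjamini-Hochberg adjustment is a placeholder, not the qvalue_truncp approach discussed above).

```r
# Hypothetical wrapper: try a primary q-value estimator and fall back to a
# second one if the primary throws an error (e.g. the smooth.spline error).
safe_qvalue <- function(p, primary, fallback) {
  tryCatch(
    list(result = primary(p), method = "primary"),
    error = function(e) {
      message("primary estimator failed: ", conditionMessage(e))
      list(result = fallback(p), method = "fallback")
    }
  )
}

# Stand-in estimators so the sketch runs without the qvalue package.
# failing_estimator mimics an estimator that errors on truncated p-values.
failing_estimator <- function(p) stop("smooth.spline failed")
bh_estimator <- function(p) p.adjust(p, method = "BH")

p_truncated <- c(0.001, 0.01, 0.02, 0.04, 0.05)  # no p-values above 0.05
res <- safe_qvalue(p_truncated, failing_estimator, bh_estimator)
res$method  # records which estimator produced the result
```

In a score module, one would presumably pass the package's own functions as `primary` and `fallback`; their return types differ from the plain vector used here, so the calling code would need to extract the q-values accordingly.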

Methods to be added

Manage R package dependencies (conda)

We need a better way to manage R packages and their dependencies. Right now I load them inside the R scripts.

Perhaps we can make a conda script to keep track of all these packages. What do you think? @jdblischak
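
Independently of conda, a small base-R check at the top of a script can at least report which packages are missing before anything runs. This is only a sketch: `missing_packages` is a hypothetical helper, and the package list is an assumption based on packages mentioned elsewhere in these issues.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Example list (an assumption; adjust to what the modules actually use).
required <- c("DESeq2", "qvalue")
not_found <- missing_packages(required)
if (length(not_found) > 0) {
  message("Missing R packages: ", paste(not_found, collapse = ", "))
}
```

This does not pin versions or install anything, which is the part a conda environment would handle.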

Parameters to vary for simulated data sets

We would like the simulation function to vary the following parameters. Next to each parameter is a link to relevant existing code:

Number of samples in each group
Number of genes
Independent vs. dependent inter-gene correlations (source)
Distribution of effect sizes (source)
Number of control genes (source)
Latent confounding (source)

cc @jhsiao999 @stephens999

Document pipeline variables and module input/output at the top of `benchmark.dsc`

document pipeline variables
document module input/output

Simulate module: allow sample sizes to vary

Problem: currently the module input is the total number of samples, and the poisthin function splits this into two groups of equal size.
Idea: add a new parameter to the poisthin function that sets the ratio of the two group sizes.
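
One way the ratio idea could be sketched in base R. `split_group_sizes` and its `ratio` argument are hypothetical; they are not part of poisthin, and how the sizes would be passed into poisthin is left open.

```r
# Hypothetical helper: split a total sample count into two group sizes.
# `ratio` is the size of group 1 relative to group 2 (1 means equal groups).
split_group_sizes <- function(nsamp, ratio = 1) {
  stopifnot(nsamp >= 2, ratio > 0)
  n1 <- round(nsamp * ratio / (1 + ratio))
  n1 <- min(max(n1, 1), nsamp - 1)  # keep at least one sample per group
  c(group1 = n1, group2 = nsamp - n1)
}

split_group_sizes(90)             # equal split: 45 and 45
split_group_sizes(90, ratio = 2)  # group 1 twice the size of group 2: 60 and 30
```

Using a ratio keeps the existing total-sample input unchanged, so the default `ratio = 1` reproduces the current equal split.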

syntax for logical pipeline variable

@gaow

In dsc-log-fold-change/dsc/benchmark.dsc, I'd like data_poisthin to take a logical argument, shuffle_sample. This DSC module calls the pois_thin function in the module folder. The current syntax gives me only one output file, but I expect two: one for shuffle_sample=TRUE and one for shuffle_sample=FALSE. How do I do this? Thanks!

data_poisthin: R(counts = readRDS(dataFile)) + \
       dataSimulate.R + \
       R(set.seed(seed=seed); out = poisthin(mat=t(counts), nsamp=nsamp, ngene=ngene, gselect=gselect, shuffle_sample=shuffle_sample, signal_dist=signal_dist, prop_null = prop_null)) + \
       R(groupInd = out$X[,2]; Y1 = t(out$Y[groupInd==1,]); Y2 = t(out$Y[groupInd==0,]))
  dataFile: "data/pbmc_counts.rds"
  seed: R{2:101}
  nsamp: 90
  ngene: 1000
  prop_null: .5, .9, 1
  shuffle_sample: T, F
  gselect: "random"
  signal_dist: "bignormal"
  $Y1: Y1
  $Y2: Y2
  $beta: out$beta
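
To make the expectation concrete, the sketch below enumerates the parameter combinations in plain R, assuming every listed value is crossed with every other (an assumption about the intended behavior, not a statement about how DSC parses `T, F`). Each seed and prop_null combination should then appear twice, once per value of shuffle_sample.

```r
# Values copied from the module definition above.
grid <- expand.grid(
  seed = 2:101,
  prop_null = c(0.5, 0.9, 1),
  shuffle_sample = c(TRUE, FALSE)
)
nrow(grid)                  # 100 seeds x 3 prop_null x 2 shuffle_sample = 600
table(grid$shuffle_sample)  # 300 rows for each logical value
```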

Declare module specific R libraries

@jhsiao999 I've got 2 comments for DESeq2 module:

You might want to add a @CONF: R_libs=DESeq2, or @CONF: R_libs = DESeq2, voom, ... if more libraries are used. That way the library not only gets installed automatically when others run it, but also gets loaded. Otherwise your code as is would not work, because it has no statements that load the libraries.
Maybe consider putting the code under the code folder instead? See my get_data.R example

Now you can make the changes and add your module to benchmark.dsc and give it a go using --target get_data * run_DESeq2 :)
