hms-dbmi / scde

R package for analyzing single-cell RNA-seq data

Home Page: http://pklab.med.harvard.edu/scde

License: Other

C 0.34% R 64.37% C++ 7.76% CSS 0.47% JavaScript 27.05%

bioinformatics single-cell transcriptomics heterogenity analysis r ngs

scde's Introduction

Overview of SCDE

The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify aspects of transcriptional heterogeneity among single cells.

The overall approach to the differential expression analysis is detailed in the following publication:
"Bayesian approach to single-cell differential expression analysis" (Kharchenko PV, Silberstein L, Scadden DT, Nature Methods, doi:10.1038/nmeth.2967)

The overall approach to pathways and gene set overdispersion analysis is detailed in the following publication: "Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis" (Fan J, Salathia N, Liu R, Kaeser G, Yung Y, Herman J, Kaper F, Fan JB, Zhang K, Chun J, and Kharchenko PV, Nature Methods, doi:10.1038/nmeth.3734)

For additional installation information, tutorials, and more, please visit the SCDE website and the Bioconductor package page.

Note: We recommend that users also refer to the package pagoda2. While we do continue to maintain the Bioconductor package scde, we don't have the bandwidth to address all bugs and feature requests reported in this repo.

Sample analyses and images

Single cell error modeling

scde fits individual error models for single cells using counts derived from single-cell RNA-seq data to estimate drop-out and amplification biases on gene expression magnitude.
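To illustrate what such a per-cell error model looks like, here is a minimal sketch of the general shape of the mixture described in the Nature Methods paper cited above. Each observed count comes either from a low-rate Poisson "drop-out" component or from a negative binomial "amplified" component. The drop-out probability falls as expected expression rises. The function name `mixture_density`, the logistic coefficients `a` and `b`, the dispersion `theta` and the rate `zero.lambda = 0.1` are hypothetical illustration values, not parameters fitted by scde.

```r
# Hypothetical sketch of a per-cell drop-out / amplification mixture.
# Parameter values are illustrative only; scde estimates its models from the data.
mixture_density <- function(x, mu, theta = 2, zero.lambda = 0.1,
                            a = 2, b = -1.5) {
  # Probability of drop-out decreases as expected expression (mu) increases
  p.drop <- plogis(a + b * log(mu))
  # Drop-out component: Poisson with a small rate
  f.drop <- dpois(x, lambda = zero.lambda)
  # Amplified component: negative binomial with mean mu and size (dispersion) theta
  f.amp <- dnbinom(x, size = theta, mu = mu)
  p.drop * f.drop + (1 - p.drop) * f.amp
}

counts <- 0:20
d.low  <- mixture_density(counts, mu = 2)    # weakly expressed gene
d.high <- mixture_density(counts, mu = 50)   # highly expressed gene
c(zero.low = d.low[1], zero.high = d.high[1])  # probability of observing a 0
```

The zero-count probability is much larger for the weakly expressed gene. This is the sense in which drop-outs depend on expression magnitude.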

Differential expression analysis

             lb   mle     ub    ce     Z    cZ
Dppa5a    8.075 9.965 11.541 8.075 7.160 5.968
Pou5f1    5.357 7.208  9.178 5.357 7.160 5.968
Gm13242   5.672 7.681  9.768 5.672 7.159 5.968
Tdh       5.829 8.075 10.281 5.829 7.159 5.968
Ift46     5.435 7.366  9.217 5.435 7.150 5.968

scde compares groups of single cells and tests for differential expression. It takes into account variability in single-cell RNA-seq data due to drop-out and amplification biases, in order to identify differentially expressed genes more robustly.
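The scde documentation describes the columns of the table above as follows:

* `lb`, `mle` and `ub`: the lower bound, maximum-likelihood estimate and upper bound of the log2 expression fold change.
* `ce`: a conservative fold-change estimate.
* `Z`: the uncorrected Z-score.
* `cZ`: the Z-score corrected for multiple testing.

A minimal base-R sketch for turning these Z-scores into two-sided p-values, assuming the usual standard-normal interpretation. The `res` data frame simply re-enters the first rows of the table shown above.

```r
# Re-enter the top rows of the differential expression table shown above
res <- data.frame(
  lb  = c(8.075, 5.357, 5.672),
  mle = c(9.965, 7.208, 7.681),
  ub  = c(11.541, 9.178, 9.768),
  ce  = c(8.075, 5.357, 5.672),
  Z   = c(7.160, 7.160, 7.159),
  cZ  = c(5.968, 5.968, 5.968),
  row.names = c("Dppa5a", "Pou5f1", "Gm13242")
)

# Two-sided p-values from standard-normal Z-scores
res$p  <- 2 * pnorm(-abs(res$Z))
res$cp <- 2 * pnorm(-abs(res$cZ))   # from the multiple-testing corrected Z

# Fold changes are on a log2 scale; convert the MLE to a linear fold change
res$fold <- 2^res$mle
res[order(res$cp), c("mle", "fold", "p", "cp")]
```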

Pathway and gene set overdispersion analysis

scde contains pagoda routines that characterize aspects of transcriptional heterogeneity in populations of single cells using pre-defined gene sets as well as 'de novo' gene sets derived from the data. Significant aspects are used to cluster cells into subpopulations. A graphical user interface can be deployed to interactively explore results. See examples from the PAGODA publication here. See analysis of the PBMC data from 10x Genomics here.

scde is maintained by Jean Fan and Evan Biederstedt of the Kharchenko Lab at the Department of Biomedical Informatics at Harvard Medical School.

Contributing

We welcome any bug reports, enhancement requests, and other contributions. To submit a bug report or enhancement request, please use the scde GitHub issues tracker. For more substantial contributions, please fork this repo, push your changes to your fork, and submit a pull request with a good commit message. For more general discussions or troubleshooting, please consult the scde Google Group.

Citation

If you find scde useful for your publication, please cite:

Kharchenko P, Fan J, Biederstedt E (2023). scde: Single Cell Differential Expression. 
R package version 2.27.1, http://pklab.med.harvard.edu/scde.

scde's Issues

Memory usage with multithreading

Hi Dr. Fan et al.,

When we used SCDE on a dataset with multiple cores, we noticed what seemed to be an issue with high memory usage. I think this is related to a long-standing bug in R parallelization described here: https://github.com/tdhock/mclapply-memory

I modified the parallelization loop in scde's calculate.individual.models function: https://github.com/traversc/scde/commit/be34c81fb55bad46de178fac72904475f2d148bf

This seemed to reduce memory usage for us when run on 40 cores.

Incompatibility with RStudio error

Commit c3f0537 seems to have broken RStudio error reporting: "ERROR: port is already being used. The PAGODA app is currently incompatible with RStudio. Please try running the interactive app the R console."

tools:::httpdPort() returns 14879 in RStudio, since RStudio already runs its help server on a port.

Allow pagoda.reduce.loading.redundancy to be used with custom colors

Hi Jean and Peter,

I would like to see the clustered aspects of pagoda.reduce.loading.redundancy() in custom colors (from RColorBrewer). Unfortunately, pagoda.reduce.loading.redundancy doesn't know in advance how many clusters will be created, so I tried plotting the tamr-object with pagoda.view.aspects afterwards. I was able to use the row.cols argument with custom colors, since the number of colors needed equals the number of clusters, which is length(tamr$cname). Nevertheless, the clustering itself is missing, or only available inside pagoda.reduce.loading.redundancy(). See below for the original plot of the pagoda.reduce.loading.redundancy()-call

and my pagoda.view.aspects call with custom colors:

Would the easiest way be to make the clustering available through a new item returned by pagoda.reduce.loading.redundancy()? Or am I missing something? Please let me know what you think.

Best,
Jens
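The number of clusters is only known after `pagoda.reduce.loading.redundancy()` returns. One way to get a matching custom palette is to interpolate a fixed set of colors to exactly `length(tamr$cname)` entries with `grDevices::colorRampPalette()`. This is a minimal sketch under stated assumptions:

* The four hex colors are hand-typed and stand in for an RColorBrewer palette.
* `tamr` is mocked as a list holding only a `cname` element.
* Whether the named vector is accepted as-is by `row.cols` is not confirmed here.

```r
# Mock of the reduced object: only the 'cname' element is used here
tamr <- list(cname = paste0("cluster", 1:7))

# Hand-typed base colors standing in for an RColorBrewer palette
base.cols <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")

# Interpolate to exactly one color per cluster, whatever the cluster count
n.clusters <- length(tamr$cname)
row.cols <- colorRampPalette(base.cols)(n.clusters)
names(row.cols) <- tamr$cname
row.cols
```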

Tutorial update

From Nikolas:

"The last lines of the tutorial with the tSNE embedding don't quite work out of the box. You may want to substitute them with the following:

library(Rtsne)
# t-SNE on the precomputed cell-to-cell distances from the pagoda clustering
tSNE.pagoda <- Rtsne(cell.clustering$distance, is_distance = TRUE, initial_dims = 100, perplexity = 10)
embed.tSNE <- tSNE.pagoda$Y
# label the embedding rows with the cell names used in the clustering
rownames(embed.tSNE) <- cell.clustering$clustering$labels
app <- make.pagoda.app(tamr2, tam, varinfo, go.env, pwpca, clpca, col.cols = col.cols, cell.clustering = hc, title = "NPCs", embedding = embed.tSNE)"

Invalid row.names length error in error models

I ran scde about 6 months ago on a small dataset (Fluidigm C1 data) with an older scde version. I have now installed R-3.3.3 with the latest scde-1.99.2 and the proper flexmix package. With a 4810-cell dataset, I'm getting this classification error after a 5-hour run on a machine using 16 of 32 cores (128 GB memory).

... snipped ...
Classification: weighted
1 Log-likelihood : -12124.1522
2 Log-likelihood : -12011.5585
3 Log-likelihood : -12124.4153
4 Log-likelihood : -12124.4519
5 Log-likelihood : -12124.4040
6 Log-likelihood : -12124.3947
converged
ERROR fitting of 10 out of 2778 cells resulted in errors reporting remaining 2768 cells
Error in row.names<-.data.frame(*tmp*, value = value) :
invalid 'row.names' length
Calls: scde.error.models ... rownames<- -> row.names<- -> row.names<-.data.frame

Note: this error was reported previously in #7, but the original poster solved the problem with a fresh R-3.2.1 installation. My sessionInfo() is below:

R version 3.3.3 (2017-03-06)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] scde_1.99.1 flexmix_2.3-13 lattice_0.20-35

loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 nloptr_1.0.4
[3] RColorBrewer_1.1-2 Lmoments_1.2-3
[5] base64enc_0.1-3 BatchJobs_1.6
[7] iterators_1.0.8 tools_3.3.3
[9] lme4_1.1-13 digest_0.6.12
[11] RSQLite_1.1-2 memoise_1.1.0
[13] checkmate_1.8.2 nlme_3.1-131
[15] mgcv_1.8-17 Matrix_1.2-10
[17] foreach_1.4.3 DBI_0.6-1
[19] parallel_3.3.3 SparseM_1.77
[21] RcppArmadillo_0.7.600.1.0 stringr_1.2.0
[23] extRemes_2.0-8 MatrixModels_0.4-1
[25] stats4_3.3.3 grid_3.3.3
[27] nnet_7.3-12 Biobase_2.26.0
[29] distillery_1.0-2 Rook_1.1-1
[31] fail_1.3 BiocParallel_1.0.3
[33] minqa_1.2.4 limma_3.22.7
[35] car_2.1-4 sendmailR_1.2-1
[37] magrittr_1.5 edgeR_3.8.6
[39] splines_3.3.3 backports_1.0.5
[41] BBmisc_1.11 pcaMethods_1.56.0
[43] codetools_0.2-15 modeltools_0.2-21
[45] BiocGenerics_0.12.1 MASS_7.3-47
[47] pbkrtest_0.4-7 RMTstat_0.3
[49] quantreg_5.33 brew_1.0-6
[51] stringi_1.1.5 rjson_0.2.15
[53] Cairo_1.5-9

Documentation issues

Undocumented code objects:
‘make.pagoda.app’ ‘o.ifm’ ‘pagoda.reduce.redundancy’
‘pagoda.view.aspects’ ‘scde.edff’

Undocumented data sets:
‘o.ifm’ ‘scde.edff’

Undocumented arguments in documentation object 'pagoda.cluster.cells'
‘verbose’ ‘return.details’

Undocumented arguments in documentation object 'pagoda.reduce.loading.redundancy'
‘abs’

Documented arguments not in \usage in documentation object 'pagoda.show.pathways':
‘show.Colv’

Undocumented arguments in documentation object 'scde.browse.diffexp'
‘port’

Undocumented arguments in documentation object 'scde.error.models'
‘linear.fit’ ‘local.theta.fit’ ‘theta.fit.range’

Undocumented arguments in documentation object 'scde.posteriors'
‘ensemble.posterior’

Argument items with no description in Rd object 'pagoda.show.pathways':
‘...’

knn.error.models fails - compatibility problem?

PAGODA used to work great for me, but for a few weeks now it fails when building the error models. I haven't been able to track down the issue, but I suspect some compatibility problem after an update to our R environment:

> library(scde)
> data(pollen)
> cd <- clean.counts(pollen)
> x <- gsub("^Hi_(.*)_.*", "\\1", colnames(cd))
> l2cols <- c("coral4", "olivedrab3", "skyblue2", "slateblue3")[as.integer(factor(x, levels = c("NPC", "GW16", "GW21", "GW21+3")))]
> knn <- knn.error.models(cd, k = ncol(cd)/4, n.cores = 1, min.count.threshold = 2, min.nonfailed = 5, max.model.plots = 10)
Error in FUN(X[[i]], ...) : 
  trying to get slot "logLik" from an object of a basic class ("function") with no slots
Error in FUN(X[[i]], ...) : 
  trying to get slot "logLik" from an object of a basic class ("function") with no slots
#
# ... 62 more of these ...
#
ERROR encountered in building a model for cell Hi_NPC_1 - skipping the cell. Error:
Error in FUN(X[[i]], ...) : 
  trying to get slot "logLik" from an object of a basic class ("function") with no slots
#
# ... 63 more of these ...
#
ERROR fitting of 64 out of 64 cells resulted in errors reporting remaining 0 cells
ERROR encountered during model fit plot outputs:
subscript out of bounds

Any ideas? Help would be greatly appreciated.

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] scde_2.2.0      flexmix_2.3-14  lattice_0.20-35

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11              edgeR_3.16.5              splines_3.4.0             BiocGenerics_0.20.0       MASS_7.3-47              
 [6] BiocParallel_1.8.2        rjson_0.2.15              brew_1.0-6                RcppArmadillo_0.7.900.2.0 minqa_1.2.4              
[11] distillery_1.0-2          car_2.1-4                 Rook_1.1-1                Lmoments_1.2-3            tools_3.4.0              
[16] pbkrtest_0.4-7            nnet_7.3-12               parallel_3.4.0            RMTstat_0.3               grid_3.4.0               
[21] Biobase_2.34.0            nlme_3.1-131              mgcv_1.8-17               quantreg_5.33             modeltools_0.2-21        
[26] MatrixModels_0.4-1        lme4_1.1-13               Matrix_1.2-9              nloptr_1.0.4              RColorBrewer_1.1-2       
[31] extRemes_2.0-8            limma_3.30.13             compiler_3.4.0            pcaMethods_1.66.0         stats4_3.4.0             
[36] locfit_1.5-9.1            SparseM_1.77              Cairo_1.5-9

Errors when running tutorial code

Hello,

I installed the development version of scde from GitHub (version 1.99.4) and am trying to learn how to use it so that I can apply it to another dataset. When I run the tutorial code verbatim, I get errors.

The code I am running is:

library(scde)
# load example dataset
data(es.mef.small)
# factor determining cell types
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(es.mef.small)), levels = c("ESC", "MEF"))
# the group factor should be named accordingly
names(sg) <- colnames(es.mef.small)  
table(sg)
# clean up the dataset
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
# EVALUATION NOT NEEDED
# calculate models
o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 1, threshold.segmentation = TRUE, save.crossfit.plots = FALSE, save.model.plots = FALSE, verbose = 1)

However, after this last line I get the same error over and over again. For the sake of brevity, I've included only the first few instances:

> library(scde)
Loading required package: flexmix
Loading required package: lattice
> # load example dataset
> data(es.mef.small)
> # factor determining cell types
> sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(es.mef.small)), levels = c("ESC", "MEF"))
> # the group factor should be named accordingly
> names(sg) <- colnames(es.mef.small)  
> table(sg)
sg
ESC MEF 
 20  20 
> # clean up the dataset
> cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
> # EVALUATION NOT NEEDED
> # calculate models
> o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 1, threshold.segmentation = TRUE, save.crossfit.plots = FALSE, save.model.plots = FALSE, verbose = 1)
cross-fitting cells.
number of pairs:  190 
number of pairs:  190 
total number of pairs:  380 
cross-fitting 380 pairs:
building individual error models.
adjusting library size based on 2000 entries
fitting ESC models:
1 : ESC_10
Classification: weighted 
Error in FUN(X[[i]], ...) : 
  trying to get slot "logLik" from an object of a basic class ("function") with no slots
2 : ESC_11
Classification: weighted 
Error in FUN(X[[i]], ...) : 
  trying to get slot "logLik" from an object of a basic class ("function") with no slots
3 : ESC_12
Classification: weighted 
Error in FUN(X[[i]], ...) : 
  trying to get slot "logLik" from an object of a basic class ("function") with no slots

I've also tried running the same code using the scde package on Bioconductor (version 2.5.0), and am getting (different) errors as well.

> library(scde)
Loading required package: flexmix
Loading required package: lattice
> # load example dataset
> data(es.mef.small)
> # factor determining cell types
> sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(es.mef.small)), levels = c("ESC", "MEF"))
> # the group factor should be named accordingly
> names(sg) <- colnames(es.mef.small)  
> table(sg)
sg
ESC MEF 
 20  20 
> # clean up the dataset
> cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
> # EVALUATION NOT NEEDED
> # calculate models
> o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 1, threshold.segmentation = TRUE, save.crossfit.plots = FALSE, save.model.plots = FALSE, verbose = 1)
cross-fitting cells.
number of pairs:  190 
number of pairs:  190 
total number of pairs:  380 
cross-fitting 380 pairs:
building individual error models.
adjusting library size based on 2000 entries
fitting ESC models:
1 : ESC_10
Error in checkSlotAssignment(object, name, value) : 
  assignment of an object of class “function” is not valid for slot ‘defineComponent’ in an object of class “FLXMRglmC”; is(value, "expression") is not TRUE
2 : ESC_11
Error in checkSlotAssignment(object, name, value) : 
  assignment of an object of class “function” is not valid for slot ‘defineComponent’ in an object of class “FLXMRglmC”; is(value, "expression") is not TRUE
3 : ESC_12
Error in checkSlotAssignment(object, name, value) : 
  assignment of an object of class “function” is not valid for slot ‘defineComponent’ in an object of class “FLXMRglmC”; is(value, "expression") is not TRUE

Error when running pagoda.gene.clusters

Hello,

When I run the 'de novo' section in the tutorial for Pagoda on the pollen data, I get the following error:
Error in eval(expr, envir, enclos) : object 'n' not found

Upon debugging, the error stems from this part of the code:
x <- RMTstat::WishartMaxPar(n.cells, varm$n)
varm$pm <- x$centering - (1.206533574582) * x$scaling
varm$pv <- (1.607781034581) * x$scaling
clvlm <- lm(var ~ 0 + pm + n, data = varm)
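The failing line fits `var ~ 0 + pm + n` with `data = varm`. `lm()` looks up each formula variable among the columns of `varm` first and then in the formula's environment. The message therefore means that neither contains an object named exactly `n`.

Below is a minimal base-R reproduction under a hypothetical assumption: `varm` carries the count in a column called `n.genes`, a made-up name. In that case `varm$n` still works, because `$` on a data frame falls back to partial name matching, while the formula lookup does not. This is one way the error can arise, not a confirmed diagnosis of scde.

```r
# Hypothetical reproduction: 'varm' has a column 'n.genes' but no column 'n'
set.seed(1)
varm <- data.frame(n.genes = 5:14, pm = runif(10))
varm$var <- 2 * varm$pm + 0.1 * varm$n.genes + rnorm(10, sd = 0.01)

# '$' on a data frame falls back to partial matching, so this returns n.genes
head(varm$n)

fit_varm <- function(d) {
  # lm() evaluates formula terms in 'd' first, then in the formula's environment;
  # on failure this returns the error message instead of a fit
  tryCatch(lm(var ~ 0 + pm + n, data = d), error = conditionMessage)
}
fit_varm(varm)   # an "object 'n' not found" error message

# With an exactly named 'n' column the same formula fits
varm$n <- varm$n.genes
coef(fit_varm(varm))
```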

Thanks,
Sam

Correction to error in knn.error.models has not yet been added to R package

In version 2.2.0 of the R package, the save.model.plots parameter of knn.error.models is unused; instead, plots are saved only if (length(vic) < length(ids)) {.
In R/functions.R (line 1264), this was corrected to if (save.model.plots) { by commit 20c91c9 on January 16th, 2016, but the fix apparently wasn't incorporated into the R package.

Error in pagoda.cluster.cells

Hi,
pagoda.cluster.cells produces NAs and a subsequent error. I think the problem is in line 2646 of functions.R, which reads: gw <- gw/gw
This produces an array of 1s, removing all variation.
Was it supposed to be gw <- gw/max(gw, na.rm = T) (i.e., normalization)?
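A small base-R illustration of the point raised above, using a made-up weight vector `gw`. Dividing a vector by itself erases all variation and turns zeros into `NaN`. Dividing by its maximum, as suggested, keeps the relative weights. Whether the latter is what `pagoda.cluster.cells` intends is the reporter's suggestion, not something confirmed here.

```r
# Hypothetical weight vector with a missing value and a zero
gw <- c(0.2, 1.5, NA, 0, 3)

# Every finite non-zero entry becomes 1; NA stays NA and 0/0 gives NaN
gw / gw

# Rescales to [0, 1] while keeping the relative differences between weights
gw / max(gw, na.rm = TRUE)
```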
Thanks in advance!
Lotte

Error in scde.error.models

I have tried to run scde.error.models, but every time I get the error "object 'VR_set_net' not found":

cross-fitting cells.
number of pairs: 435
number of pairs: 351
total number of pairs: 786
cross-fitting 786 pairs:
building individual error models.
adjusting library size based on 2000 entries
fitting WT, none models:
1 : CHFY019_S1_L012
Classification: weighted
Error in nnet::nnet.default(x, y, w, mask = mask, size = 0, skip = TRUE, :
object 'VR_set_net' not found
2 : CHFY020_S2_L012
Classification: weighted
Error in nnet::nnet.default(x, y, w, mask = mask, size = 0, skip = TRUE, :
object 'VR_set_net' not found

How can I fix it?

Problem with applying scde to data with more than 2 groups

Sorry, this wasn't an issue; I just misunderstood the vignette. It can be deleted.

Significant corrected Z-score?

Thank you for providing this great pipeline!

Could you recommend which cutoff to use?
What would be considered a significant corrected Z-score?
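`cZ` is a Z-score already corrected for multiple testing. One common convention, which is an assumption here and not an official scde recommendation, is to compare its absolute value with a standard-normal quantile. For example, |cZ| ≥ 1.96 corresponds to a two-sided 5% level. The same construction appears in the `pagoda.top.aspects` call quoted further down (`qnorm(0.01/2, lower.tail = FALSE)`).

A minimal sketch follows. `ediff` is a hypothetical results table with made-up `cZ` values.

```r
# Two-sided standard-normal cutoffs for a few significance levels
alpha  <- c(0.05, 0.01)
cutoff <- qnorm(1 - alpha / 2)   # about 1.96 and 2.58
names(cutoff) <- alpha

# Hypothetical results table with a corrected Z-score column
ediff <- data.frame(cZ = c(5.97, 2.10, -3.00, 0.40),
                    row.names = c("geneA", "geneB", "geneC", "geneD"))

# Genes passing the 5% two-sided cutoff, in either direction
sig <- ediff[abs(ediff$cZ) >= cutoff["0.05"], , drop = FALSE]
rownames(sig)
```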

Is it possible to subtract aspect of a single gene?

Hi Jean-Fan,

in our sequencing data, we see that a single, highly expressed marker gene is the main driver of overdispersion in many pathways related to it. The marker gene seems to be expressed at a lower (but still high) level in a subpopulation of the cells. I tried to subtract the aspects of those pathways, but they are still overdispersed after pagoda.subtract.aspect. Is there a way to explicitly account for the variability of a single gene?

Best,
Jens

scde.expression.difference outputs when batch defined

Hi Jean,
I have a question about scde.expression.difference. When I use batch, the result contains three lists: batch.adjusted, batch.effect and results. Which one should I use to find significant differential expression after batch-effect removal?

ERROR in PAGODA

Hi!

Thank you so much for providing this code. Very useful!

I get an error when trying to run pagoda. This is something that didn't happen in the past.

When I run
tam <- pagoda.top.aspects(pwpca, clpca, n.cells = NULL, z.score = qnorm(0.01/2, lower.tail = FALSE))

I get the following error:
Error in tapply(abs(unlist(gl)), as.factor(unlist(lapply(gl, names))), :
arguments must have same length

Any idea how to solve it?

Thanks,
Livnat

Error in FUN(X[[i]], ...)

Command run: scde.error.models(counts = cd, groups = sg, n.cores = 1, threshold.segmentation = TRUE, save.crossfit.plots = FALSE, save.model.plots = FALSE, verbose = 1)

error info: Error in FUN(X[[i]], ...) :
trying to get slot "logLik" from an object of a basic class ("function") with no slots

Thanks!

Problem running scde on linux cluster. Fortran "dqrls" not resolved

Hi there,

I am trying to run SCDE on a Linux computational cluster.

I am getting the following error when running on a single core:

Error in .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars,  :
  "dqrls" not resolved from current namespace (scde)
Calls: scde.error.models ... <Anonymous> -> glm.nb.fit -> glm.fitter -> .Fortran
Execution halted

Which seems to be a similar issue to this thread:
(https://github.com/hms-dbmi/scde/issues/21)

I have recompiled SCDE from the developer version, but this did not solve the issue. I also tried reverting the flexmix version and using it with the stable scde version. I tried both of these on the cluster as well as on my local machine, and both gave the same error.

When trying with multiple cores, I am getting the same problem described here:
(https://github.com/hms-dbmi/scde/issues/31)

Any suggestions on what I can try, either for a single core or, preferably, to get the multicore command to work?

My sessionInfo() :
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux release 6.3 (Carbon)

Matrix products: default
BLAS: /home/apps/Logiciels/R/3.4.0-gcc/lib64/R/lib/libRblas.so
LAPACK: /home/apps/Logiciels/R/3.4.0-gcc/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats4 grDevices datasets parallel stats graphics utils
[8] methods base

other attached packages:
[1] scde_2.7.0 flexmix_2.3-13
[3] lattice_0.20-35 bindrcpp_0.2
[5] qvalue_2.10.0 edgeR_3.20.1
[7] limma_3.34.1 gtools_3.5.0
[9] scater_1.6.0 SingleCellExperiment_1.0.0
[11] SummarizedExperiment_1.8.0 DelayedArray_0.4.1
[13] matrixStats_0.52.2 GenomicRanges_1.30.0
[15] GenomeInfoDb_1.14.0 IRanges_2.12.0
[17] S4Vectors_0.16.0 ggrepel_0.7.0
[19] RColorBrewer_1.1-2 ggsci_2.8
[21] dplyr_0.7.4 tidyr_0.7.2
[23] data.table_1.10.4-3 Seurat_2.1.0
[25] Matrix_1.2-9 cowplot_0.8.0
[27] ggplot2_2.2.1 pander_0.6.1
[29] knitr_1.16 bigmemory_4.5.31
[31] bigmemory.sri_0.1.3 Biobase_2.38.0
[33] BiocGenerics_0.24.0

loaded via a namespace (and not attached):
[1] shinydashboard_0.6.1 R.utils_2.6.0
[3] lme4_1.1-13 RSQLite_2.0
[5] AnnotationDbi_1.40.0 htmlwidgets_0.9
[7] grid_3.4.0 trimcluster_0.1-2
[9] ranger_0.8.0 BiocParallel_1.12.0
[11] Rtsne_0.13 munsell_0.4.3
[13] codetools_0.2-15 ica_1.0-1
[15] colorspace_1.3-2 ROCR_1.0-7
[17] robustbase_0.92-7 dtw_1.18-1
[19] distillery_1.0-4 NMF_0.20.6
[21] labeling_0.3 lars_1.2
[23] tximport_1.6.0 GenomeInfoDbData_0.99.1
[25] mnormt_1.5-5 bit64_0.9-7
[27] extRemes_2.0-8 rhdf5_2.22.0
[29] diptest_0.75-7 R6_2.2.2
[31] doParallel_1.0.11 ggbeeswarm_0.6.0
[33] VGAM_1.0-4 locfit_1.5-9.1
[35] RcppArmadillo_0.8.100.1.0 bitops_1.0-6
[37] assertthat_0.2.0 SDMTools_1.1-221
[39] scales_0.4.1 nnet_7.3-12
[41] ggjoy_0.3.0 beeswarm_0.2.3
[43] gtable_0.2.0 Cairo_1.5-9
[45] rlang_0.1.4 MatrixModels_0.4-1
[47] scatterplot3d_0.3-40 splines_3.4.0
[49] lazyeval_0.2.0 ModelMetrics_1.1.0
[51] acepack_1.4.1 brew_1.0-6
[53] checkmate_1.8.3 reshape2_1.4.2
[55] backports_1.1.0 httpuv_1.3.5
[57] Hmisc_4.0-3 caret_6.0-76
[59] tools_3.4.0 gridBase_0.4-7
[61] gplots_3.0.1 proxy_0.4-17
[63] Rcpp_0.12.14 plyr_1.8.4
[65] base64enc_0.1-3 progress_1.1.2
[67] zlibbioc_1.24.0 purrr_0.2.4
[69] RCurl_1.95-4.8 prettyunits_1.0.2
[71] rpart_4.1-11 pbapply_1.3-3
[73] viridis_0.4.0 cluster_2.0.6
[75] magrittr_1.5 SparseM_1.77
[77] pcaMethods_1.70.0 mvtnorm_1.0-6
[79] mime_0.5 xtable_1.8-2
[81] pbkrtest_0.4-7 XML_3.98-1.9
[83] mclust_5.3 RMTstat_0.3
[85] gridExtra_2.2.1 compiler_3.4.0
[87] biomaRt_2.34.0 tibble_1.3.4
[89] KernSmooth_2.23-15 minqa_1.2.4
[91] R.oo_1.21.0 htmltools_0.3.6
[93] segmented_0.5-2.1 mgcv_1.8-17
[95] Formula_1.2-2 tclust_1.2-7
[97] DBI_0.7 diffusionMap_1.1-0
[99] MASS_7.3-47 fpc_2.1-10
[101] car_2.1-6 R.methodsS3_1.7.1
[103] gdata_2.18.0 bindr_0.1
[105] igraph_1.1.2 pkgconfig_2.0.1
[107] sn_1.5-0 registry_0.3
[109] numDeriv_2016.8-1 foreign_0.8-67
[111] foreach_1.4.3 vipor_0.4.5
[113] rngtools_1.2.4 pkgmaker_0.22
[115] XVector_0.18.0 stringr_1.2.0
[117] digest_0.6.12 tsne_0.1-3
[119] Rook_1.1-1 htmlTable_1.9
[121] kernlab_0.9-25 Lmoments_1.2-3
[123] shiny_1.0.3 quantreg_5.34
[125] modeltools_0.2-21 rjson_0.2.15
[127] nloptr_1.0.4 nlme_3.1-131
[129] viridisLite_0.2.0 DEoptimR_1.0-8
[131] survival_2.41-3 glue_1.1.1
[133] FNN_1.1 prabclus_2.2-6
[135] iterators_1.0.8 bit_1.1-12
[137] class_7.3-14 stringi_1.1.5
[139] mixtools_1.1.0 blob_1.1.0
[141] latticeExtra_0.6-28 caTools_1.17.1
[143] memoise_1.1.0 irlba_2.2.1
[145] ape_4.1

core dump error

Hi!

Thanks for this great package! I am trying to work through the pagoda vignette, but I run into a problem at this line:

varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = TRUE)

The error:

error: Mat::init(): requested size is not compatible with row vector layout

I see that the Travis build shows the same error, but at a different step.

Do you know what could be wrong?

My R version is 3.2.1.

Thanks in advance for your time.

How to load 10x data (.tsv files) into scde?

Can the output from 10x chromium be loaded into scde as a data frame?
The output is 3 files:
barcodes.tsv
genes.tsv
matrix.mtx
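One possible approach, sketched under assumptions:

1. Read `matrix.mtx` with `Matrix::readMM()`; the Matrix package ships with R.
2. Label rows with the gene symbols from the second column of `genes.tsv`, and columns with `barcodes.tsv`.
3. Convert to a dense integer matrix with genes as rows and cells as columns. This is the orientation of the count matrices passed to `clean.counts()` in the tutorials quoted here.

The helper name `read_10x_counts` and the two-column `genes.tsv` layout are assumptions. Converting a large run to a dense matrix can need a lot of memory.

```r
library(Matrix)

# Hypothetical helper: read a Cell Ranger output directory with
# matrix.mtx, genes.tsv (gene ID, gene symbol) and barcodes.tsv
read_10x_counts <- function(dir) {
  m <- readMM(file.path(dir, "matrix.mtx"))
  genes <- read.delim(file.path(dir, "genes.tsv"), header = FALSE,
                      stringsAsFactors = FALSE)
  barcodes <- read.delim(file.path(dir, "barcodes.tsv"), header = FALSE,
                         stringsAsFactors = FALSE)
  counts <- as.matrix(m)               # dense genes x cells matrix
  storage.mode(counts) <- "integer"    # whole-number counts stored as integers
  # Gene symbols can repeat, so make them unique before using them as row names
  rownames(counts) <- make.unique(genes[[2]])
  colnames(counts) <- barcodes[[1]]
  counts
}
```

Usage would look like `cd <- read_10x_counts("path/to/filtered_gene_bc_matrices/genome")`. That path is a placeholder.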

Correct construction of go.env

Hi all,
I'm trying to use the biomaRt package together with GO.db to construct a proper go.env environment for evaluating overdispersed gene sets (as in http://hms-dbmi.github.io/scde/pagoda.html). Can someone clarify the structure of go.env before it goes into the list2env function in the example code on that web site? I failed to reproduce the code given there, and a head() of go.env would probably be enough for me.
Thanks a lot,
Jens
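In the pagoda tutorial linked above, `go.env` is a named list before `list2env()` is applied. Each name identifies a gene set (there, a GO ID followed by its term description), and each element is a character vector of gene names matching the row names of the count matrix. A minimal sketch of that shape follows. The set names, gene symbols and toy count matrix are all hypothetical.

```r
# Hypothetical gene sets: names identify the set, values are gene symbols
go.list <- list(
  "geneset.A" = c("Sox2", "Pax6", "Nes"),
  "geneset.B" = c("Dcx", "Tubb3", "Stmn2", "Sox2")
)

# Keep only genes that are rows of the count matrix (here a toy matrix)
cd <- matrix(0L, nrow = 5, ncol = 2,
             dimnames = list(c("Sox2", "Pax6", "Nes", "Dcx", "Tubb3"),
                             c("cell1", "cell2")))
go.list <- lapply(go.list, function(g) intersect(g, rownames(cd)))

str(go.list)                  # named list of character vectors
go.env <- list2env(go.list)   # environment with one variable per gene set
ls(go.env)
```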

Incorrect version in DESCRIPTION file for v1.99.1

I downloaded the latest release, 1.99.1, from http://hms-dbmi.github.io/scde/package.html; however, the DESCRIPTION file lists the version as 0.99.1. Is this indeed version 1.99.1, and the version number just wasn't updated?

Potential user errors; recommendations for user-centered design.

If column colors are not provided when creating the app (i.e., col.cols = NULL), the app is still created, but the heatmaps are not rendered and it throws the error:

Error in nrow(results$colcol):1 : argument of length 0

Also, in the vignette, col.cols is created but never used; Pollen's annotations are used instead.
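The error message follows from base R semantics: `nrow(NULL)` is `NULL`, and `NULL:1` fails with "argument of length 0". Below is a minimal sketch of the failure and of one possible guard. The helper `default_col_cols` is hypothetical, as is the layout it assumes: a one-row matrix with annotation tracks as rows and cells as columns. Neither is a confirmed part of the scde API.

```r
# nrow() of NULL is NULL, and a ':' sequence cannot start from NULL
colcol <- NULL
msg <- tryCatch(nrow(colcol):1, error = conditionMessage)
msg

# Hypothetical guard: fall back to one neutral annotation row (rows = tracks, columns = cells)
default_col_cols <- function(col.cols, cell.names) {
  if (is.null(col.cols)) {
    col.cols <- matrix("grey", nrow = 1, ncol = length(cell.names),
                       dimnames = list("none", cell.names))
  }
  col.cols
}
cc <- default_col_cols(NULL, c("cell1", "cell2", "cell3"))
nrow(cc):1   # now a valid, length-one sequence
```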

Suspected error in pagoda.show.pathways

See https://groups.google.com/d/msg/singlecellstats/tqvc3cr-8Hs/7P3PRfdZCQAJ for further explanation.

Best,
Jens

num core > 1 causes issues

Multiple users have experienced errors when increasing the number of cores to > 1
https://groups.google.com/forum/#!topic/singlecellstats/Miwy1Jg6PRU
https://groups.google.com/forum/#!topic/singlecellstats/ij2GWC1JLr0

The errors may stem from updates to mclapply or bplapply.

Error with LogLik() function

Hi,

I am getting the following error when I run this command even with the data specified in the tutorial.

o.ifm <- scde.error.models(counts = chk, groups = sg, n.cores = 1, threshold.segmentation = TRUE, save.crossfit.plots = FALSE, save.model.plots = FALSE, verbose = 1)

Error in FUN(X[[i]], ...) :
trying to get slot "logLik" from an object of a basic class ("function") with no slots

Any pointers to resolve it would be greatly appreciated. I am currently using R version 3.3.1.

Thank you.
Swapna

error modeling knn takes up all cores

Hello,

I am trying to run the error modeling step on a sample with thousands of cells. I've increased k and min.nonfailed. However, even if I set n.cores = 1, I have threads running on every core of the cluster. Furthermore, the error modeling step has never finished for me.

Any thoughts on this issue?

RcppArmadillo: does it still need to be pinned?

The vignette seems to run fine for me with the latest version of RcppArmadillo. Requiring a specific old version makes scde hard to include as a dependency. Could the version be unpinned?

Update differential expression tutorial

The tutorial that we actually had up (/var/www/scde/templates/tutorial/diffexp.html) was more detailed, didn’t include the shortcuts we had to put into the vignette to save time, and had an extra section on the adjusted distance measures.

Rmd file for it: /data2/peterk/lev/scde/v3/vignette/diffexp.Rmd

Correspondence between GitHub and Bioconductor version numbers

Would it be possible to document the relationship between the version numbers of the GitHub releases and the Bioconductor releases?

There are 3 releases on GitHub: 1.99.0, 1.99.1, and 1.99.2

https://github.com/hms-dbmi/scde/releases

And the current GitHub version is 1.99.4.

There are 4 releases on Bioconductor: 2.0.1 (BioC 3.3), 2.2.0 (BioC 3.4), 2.4.1 (BioC 3.5), and 2.6.0 (BioC 3.6).

https://www.bioconductor.org/packages/3.3/bioc/html/scde.html
https://www.bioconductor.org/packages/3.4/bioc/html/scde.html
https://www.bioconductor.org/packages/3.5/bioc/html/scde.html
https://www.bioconductor.org/packages/3.6/bioc/html/scde.html

Documenting this somewhere in the README.md, NEWS, and/or the releases page would be convenient for debugging between versions. Thanks!

biocParallel error when installing

R 3.2.5

Installing from devtools gives me this error:

Installing BiocParallel
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD
INSTALL '/tmp/RtmpBv2SbE/devtools881174ee69a1/BiocParallel'
--library='/home/billylau/R/x86_64-pc-linux-gnu-library/3.2' --install-tests

installing source package ‘BiocParallel’ ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'BiocParallel', details:
call: makePSOCKcluster(spec, ...)
error: numeric 'names' must be >= 1
Error: loading failed
Execution halted
ERROR: loading failed

removing ‘/home/billylau/R/x86_64-pc-linux-gnu-library/3.2/BiocParallel’
Error: Command failed (1)

The same thing happens when installing directly with biocLite.

Has anyone seen this problem before?

Error in names(rl)

I am attempting to process a dataset of about 1200 cells with scde. This works fine with a small subset of the data, but with the full dataset I get an error, which I presume is related to the cross-fit failing on a relatively small number of pairs. Below are the call and the resulting error, as well as the output of traceback() and sessionInfo():

> o.ifm <- scde.error.models(counts=counts(eset),
+     groups=eset$Zone,
+     n.cores=40,
+     min.nonfailed = 30,
+     threshold.segmentation=TRUE,
+     save.crossfit.plots=FALSE,
+     save.model.plots=FALSE,
+     verbose=1)

cross-fitting cells.
number of pairs:  12403
reducing to a random sample of  5000  pairs
number of pairs:  9180
reducing to a random sample of  5000  pairs
number of pairs:  9591
reducing to a random sample of  5000  pairs
number of pairs:  8778
reducing to a random sample of  5000  pairs
number of pairs:  7260
reducing to a random sample of  5000  pairs
number of pairs:  10153
reducing to a random sample of  5000  pairs
number of pairs:  7626
reducing to a random sample of  5000  pairs
number of pairs:  9591
reducing to a random sample of  5000  pairs
number of pairs:  5671
reducing to a random sample of  5000  pairs
total number of pairs:  49956
cross-fitting 49956 pairs:
Error in names(rl) <- apply(cl, 2, paste, collapse = ".vs.") :
  'names' attribute [49956] must be the same length as the vector [49920]

> traceback()
2: calculate.crossfit.models(counts, groups, n.cores = n.cores,
       threshold.segmentation = threshold.segmentation, min.count.threshold = min.count.threshold,
       zero.lambda = zero.lambda, max.pairs = max.pairs, save.plots = save.crossfit.plots,
       min.pairs.per.cell = min.pairs.per.cell, verbose = verbose)
1: scde.error.models(counts = counts(eset), groups = eset$Zone,
       n.cores = 40, min.nonfailed = 30, threshold.segmentation = TRUE,
       save.crossfit.plots = FALSE, save.model.plots = FALSE, verbose = 1)

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] scater_1.2.0        ggplot2_2.2.1       Biobase_2.34.0
[4] BiocGenerics_0.20.0 scde_1.99.4         flexmix_2.3-13
[7] lattice_0.20-34

loaded via a namespace (and not attached):
 [1] viridis_0.3.4             edgeR_3.16.5
 [3] splines_3.3.2             shiny_1.0.0
 [5] assertthat_0.1            stats4_3.3.2
 [7] vipor_0.4.4               RSQLite_1.1-2
 [9] quantreg_5.29             limma_3.30.8
[11] digest_0.6.11             RColorBrewer_1.1-2
[13] minqa_1.2.4               colorspace_1.3-2
[15] htmltools_0.3.5           httpuv_1.3.3
[17] Matrix_1.2-7.1            plyr_1.8.4
[19] XML_3.98-1.5              biomaRt_2.30.0
[21] SparseM_1.74              zlibbioc_1.20.0
[23] xtable_1.8-2              scales_0.4.1
[25] brew_1.0-6                BiocParallel_1.8.1
[27] lme4_1.1-12               MatrixModels_0.4-1
[29] tibble_1.2                mgcv_1.8-16
[31] IRanges_2.8.1             car_2.1-4
[33] Lmoments_1.2-3            RMTstat_0.3
[35] nnet_7.3-12               lazyeval_0.2.0
[37] pbkrtest_0.4-6            magrittr_1.5
[39] distillery_1.0-2          mime_0.5
[41] memoise_1.0.0             nlme_3.1-128
[43] MASS_7.3-45               RcppArmadillo_0.7.600.1.0
[45] beeswarm_0.2.3            shinydashboard_0.5.3
[47] Cairo_1.5-9               Rook_1.1-1
[49] tools_3.3.2               data.table_1.10.0
[51] extRemes_2.0-8            matrixStats_0.51.0
[53] stringr_1.1.0             S4Vectors_0.12.1
[55] munsell_0.4.3             locfit_1.5-9.1
[57] AnnotationDbi_1.36.1      pcaMethods_1.66.0
[59] rhdf5_2.18.0              grid_3.3.2
[61] RCurl_1.95-4.8            nloptr_1.0.4
[63] tximport_1.2.0            rjson_0.2.15
[65] bitops_1.0-6              gtable_0.2.0
[67] DBI_0.5-1                 reshape2_1.4.2
[69] R6_2.2.0                  gridExtra_2.2.1
[71] dplyr_0.5.0               modeltools_0.2-21
[73] stringi_1.1.2             ggbeeswarm_0.5.3
[75] Rcpp_0.12.9

Any pointers on addressing this? I could work on matching the names attribute to exclude the failing pairs, but I am not clear on the downstream consequences of the failed pairs, or on whether failing pairs suggest a deeper problem with the data and the need to remove those cells.

make.pagoda.app bug if clpca is NULL

Currently:

if (!is.null(clpca)) {
    set.env <- list2env(c(as.list(env), clpca$clusters))  
} 
sa <- ViewPagodaApp$new(fres, df, gene.df, varinfo$mat, varinfo$matw,  set.env, name = title, trim = 0, batch = varinfo$batch)

Note that if clpca is NULL (the default), no set.env object is created and the following error is produced: Error in .Object$initialize(...) : object 'set.env' not found

An else branch is needed that sets set.env <- env.
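The proposed else branch can be sketched as a standalone helper so it can be tested outside the app. The name pagoda_set_env is hypothetical and not part of scde; inside make.pagoda.app the same if/else would simply assign set.env before ViewPagodaApp$new() is called.

```r
# Hypothetical helper (not part of scde): builds the gene-set environment
# handed to ViewPagodaApp$new(), falling back to `env` when clpca is NULL.
pagoda_set_env <- function(env, clpca = NULL) {
  if (!is.null(clpca)) {
    # merge the original gene sets with the de novo gene clusters
    list2env(c(as.list(env), clpca$clusters))
  } else {
    # no clusters supplied: use the gene-set environment unchanged
    env
  }
}

# usage with toy inputs
env <- list2env(list(setA = c("g1", "g2")))
clpca <- list(clusters = list(clust1 = c("g3", "g4")))
merged <- pagoda_set_env(env, clpca)
sort(ls(merged))
```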

Possible confusion between log10 and log for scde.expression.magnitude

Hi Jean,

Recently, I plotted scde.expression.magnitude for some marker genes in our single-cell experiments. Naturally, I labelled the x-axis as log10(FPM), but seeing FPM values higher than 8 made me wonder. The scde.expression.magnitude function returns values from the log() function, which calculates the natural logarithm. In some of your examples (http://hms-dbmi.github.io/scde/diffexp.html, see "More detailed functions"), you also use log10 in the axis label; I hope those values are not confused with the natural logs in the o.fpm object created there.

Best,
Jens :)
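If the magnitudes are natural logs as described above, relabelling a plot for log10 only needs a change of base, log10(x) = ln(x) / ln(10). Under that assumption a value of 8 corresponds to exp(8), roughly 2981 FPM, or about 3.47 on a log10 scale. A minimal sketch; the helper name and the toy matrix standing in for o.fpm are illustrative only.

```r
# Convert natural-log expression magnitudes to log10 (change of base).
ln_to_log10 <- function(ln_values) ln_values / log(10)

# toy genes x cells matrix standing in for o.fpm (natural-log values)
toy.fpm <- matrix(c(0, 2.302585, 8, 4.60517), nrow = 2,
                  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))
log10.fpm <- ln_to_log10(toy.fpm)
round(log10.fpm, 2)
```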

Error in checkSlotAssignment(object, name, value)

Running the scde.error.models function on the es.mef.small data gives the following error:
fitting ESC models:
1 : ESC_10
Error in checkSlotAssignment(object, name, value) :
assignment of an object of class “function” is not valid for slot ‘defineComponent’ in an object of class “FLXMRglmC”; is(value, "expression") is not TRUE

ERROR install SCDE

Hi,

I followed the instructions and successfully installed all the dependent packages, but got the following error when installing SCDE:

Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/scde/libs/scde.so':
dlopen(/Library/Frameworks/R.framework/Versions/3.3/Resources/library/scde/libs/scde.so, 6): Symbol not found: wrapper_ddot
Referenced from: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/scde/libs/scde.so
Expected in: flat namespace
in /Library/Frameworks/R.framework/Versions/3.3/Resources/library/scde/libs/scde.so

Could anyone help me figure it out?

Thanks!!

App incompatibility with R-Studio

Deploying the Rook app from RStudio appears to fail:

> show.app(app, "pollen", browse = TRUE, port = 1468)
Error in paste("http://", listenAddr, ":", listenPort, appList[[i]]$path,  : 
  cannot coerce type 'closure' to vector of type 'character'

The error does not appear when the same commands are run in R on the command line.

License unclear

The DESCRIPTION file declares the GPL version 2 to be the license for SCDE, but the license.txt file is a non-free license incompatible with the GPL. Which license really applies to SCDE?

NAMESPACE issue, compiled code not found

Error: shared object ‘plyr.so’ not found when running:

clpca <- pagoda.gene.clusters(varinfo,trim=7.1/ncol(varinfo$mat),n.clusters=150,n.cores=n.cores,plot=T)

scde dqrls on Windows

Hi.
I installed scde from your lab website and I cannot run the following command on Windows (R version 3.2.2):

o.ifm <- scde.error.models(counts = data, n.cores = 1, threshold.segmentation = FALSE, save.crossfit.plots = FALSE, save.model.plots = FALSE, min.nonfailed = 500, min.size.entries = 500, verbose = 1)

Error: Error in .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars, : "dqrls" not resolved from current namespace (scde)

Data is gene expression matrix (rows = genes, columns = samples). It works on Linux (the same versions of R and scde package).

Is there some way to run this script on Windows?

According to https://cran.r-project.org/bin/windows/base/old/3.0.2/NEWS.R-3.0.2.html, it has not been possible to use dqrls in R since version 2.15.1. Why does it work on Linux, then?

P.S. I simply want to calculate a dist object (distances between cells) from the gene expression dataset using scde.error.models, if that helps...

Invalid row.names length error (scde.error.models)

Hi all,
when running scde.error.models on a count table with raw counts, I get the following error after several "Classification:weighted" messages:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : invalid 'row.names' length

head(ct) looks like:
X94_S62 X43_S29 X53_S32 X32_S22 X58_S35 X27_S18 X73_S47 X63_S38 X83_S55 X88_S59
ENSMUSG00000030105 2 5 72 0 0 34 0 0 0 0
ENSMUSG00000098001 0 0 0 0 0 0 0 0 0 0
ENSMUSG00000065904 0 0 0 0 0 0 0 0 0 0
ENSMUSG00000058979 0 0 0 0 73 0 0 0 0 0
ENSMUSG00000049536 0 0 1 645 0 0 0 0 0 0
ENSMUSG00000027333 0 0 0 0 0 0 0 0 0 0

The initial call was:
o.ifm = scde.error.models(counts=ct,n.cores = n.cores, threshold.segmentation = T, save.crossfit.plots = F, save.model.plots = F, verbose = 1)
I can provide the full count table if necessary. Maybe someone can point me in the right direction :)
Best
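The X-prefixed column names in head(ct) suggest the table was read with read.csv() and is therefore a data.frame rather than a matrix. One thing worth ruling out, as an assumption and not a confirmed diagnosis, is passing a data.frame instead of an integer count matrix with gene names as row names, and keeping genes with no reads at all. The sketch below (helper name hypothetical) does that conversion and drops all-zero genes; the package's own clean.counts() applies more thorough gene and cell filtering.

```r
# Hypothetical helper: coerce a raw count table to an integer matrix with
# named rows, keeping only genes detected in at least one cell.
prepare_counts <- function(ct) {
  m <- as.matrix(ct)                    # data.frame -> numeric matrix
  if (is.null(rownames(m))) stop("count table needs gene names as row names")
  storage.mode(m) <- "integer"          # raw counts stored as integers
  m[rowSums(m) > 0, , drop = FALSE]     # drop genes with zero total counts
}

# usage with a toy table shaped like head(ct)
ct <- data.frame(X94_S62 = c(2, 0, 0), X43_S29 = c(5, 0, 73),
                 row.names = c("ENSMUSG00000030105", "ENSMUSG00000098001",
                               "ENSMUSG00000058979"))
counts <- prepare_counts(ct)
dim(counts)
```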

SVG as well as PNG export

The app previously exported SVG; it was recently switched to PNG for faster rendering with thousands of cells. Provide both: SVG export would also keep gene names copyable instead of flattening them into the image.

errors in if else codes in scde.failure.probability?

The condition if("conc.a2" %in% names(models)) is used twice. Is something wrong here?

scde.failure.probability <- function(models, magnitudes = NULL, counts = NULL) {
  if(is.null(magnitudes)) {
    if(!is.null(counts)) {
      magnitudes <- scde.expression.magnitude(models, counts)
    } else {
      stop("ERROR: either magnitudes or counts should be provided")
    }
  }
  if(is.matrix(magnitudes)) { # a different vector for every cell
    if(!all(rownames(models) %in% colnames(magnitudes))) { stop("ERROR: provided magnitude data does not cover all of the cells specified in the model matrix") }
    if("conc.a2" %in% names(models)) { ## first occurrence
      x <- t(1/(exp(t(magnitudes)*models$conc.a + t(magnitudes^2)*models$conc.a2 + models$conc.b)+1))
    } else {
      x <- t(1/(exp(t(magnitudes)*models$conc.a + models$conc.b)+1))
    }
  } else { # a common vector of magnitudes for all cells
    ## same if-else statement here
    if("conc.a2" %in% names(models)) { ## second occurrence
      x <- t(1/(exp((models$conc.a %*% t(magnitudes)) + (models$conc.a2 %*% t(magnitudes^2)) + models$conc.b)+1))
    } else {
      x <- t(1/(exp((models$conc.a %*% t(magnitudes)) + models$conc.b)+1))
    }
  }
  x[is.nan(x)] <- 0
  colnames(x) <- rownames(models)
  x
}
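Reading the code above, the two identical checks sit in different branches of if(is.matrix(magnitudes)): one handles a genes × cells magnitude matrix, the other a single magnitude vector shared by all cells, so the repetition appears to be intentional. Both branches evaluate the same per-cell logistic drop-out model, p = 1 / (exp(b + a*m + a2*m^2) + 1). The sketch below uses a hypothetical two-cell model table and checks that both forms agree when every column of the matrix equals the shared vector.

```r
# Hypothetical toy model table: one row per cell, holding the logistic
# drop-out coefficients read by scde.failure.probability().
models <- data.frame(conc.b = c(-1, 0.5), conc.a = c(0.8, 1.2),
                     conc.a2 = c(0.05, -0.02),
                     row.names = c("cellA", "cellB"))

# Matrix branch: magnitudes is genes x cells (one column per cell).
fail_prob_matrix <- function(models, M) {
  eta <- t(M) * models$conc.a + t(M^2) * models$conc.a2 + models$conc.b
  x <- t(1 / (exp(eta) + 1))
  colnames(x) <- rownames(models)
  x
}

# Vector branch: one magnitude vector shared by all cells.
fail_prob_vector <- function(models, m) {
  eta <- (models$conc.a %*% t(m)) + (models$conc.a2 %*% t(m^2)) + models$conc.b
  x <- t(1 / (exp(eta) + 1))
  colnames(x) <- rownames(models)
  x
}

m <- c(0, 1, 2.5)                  # toy magnitudes for three genes
M <- cbind(cellA = m, cellB = m)   # identical magnitudes for each cell
p_matrix <- fail_prob_matrix(models, M)
p_vector <- fail_prob_vector(models, m)
all.equal(p_matrix, p_vector)
```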

Error in pagoda.pathway.wPCA: results not reproducible, even with seed

When I run pagoda.pathway.wPCA() multiple times on the same input I get different results, even if I set the seed before each run of this function. How can I ensure this function produces the same results with the same dataset?

R code to test on the tutorial data:

library("scde")
library(org.Hs.eg.db)
data(pollen)

cd <- clean.counts(pollen)
x <- gsub("^Hi_(.*)_.*", "\\1", colnames(cd))
l2cols <- c("coral4", "olivedrab3", "skyblue2", "slateblue3")[as.integer(factor(x, levels = c("NPC", "GW16", "GW21", "GW21+3")))]

data(knn)

varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = TRUE)

varinfo <- pagoda.subtract.aspect(varinfo, colSums(cd[, rownames(knn)]>0))

# translate gene names to ids
ids <- unlist(lapply(mget(rownames(cd), org.Hs.egALIAS2EG, ifnotfound = NA), function(x) x[1]))
rids <- names(ids); names(rids) <- ids 
# convert GO lists from ids to gene names
gos.interest <- unique(c(ls(org.Hs.egGO2ALLEGS)[1:100],"GO:0022008","GO:0048699", "GO:0000280", "GO:0007067")) 
go.env <- lapply(mget(gos.interest, org.Hs.egGO2ALLEGS), function(x) as.character(na.omit(rids[x]))) 
go.env <- clean.gos(go.env) # remove GOs with too few or too many genes
go.env <- list2env(go.env) # convert to an environment

# test without seed

pwpca1 <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 35)

pwpca2 <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 35)

ae_noseed<-all.equal(pwpca1,pwpca2)

if( isTRUE(ae_noseed)) {
    print("No seed: pwpca1 pwpca2 are equal")
} else {
    print("No seed: pwpca1 pwpca2 are not equal")
}
# test with seed
set.seed(0)
pwpca3 <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 35)

set.seed(0)
pwpca4 <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 35)

ae_seed<-all.equal(pwpca3,pwpca4)

if( isTRUE(ae_seed) ) {
    print("With seed: pwpca3 pwpca4 are equal")
} else {
    print("With seed: pwpca3 pwpca4 are not equal")
}

Output:

> source("test.R")
[1] "No seed: pwpca1 pwpca2 are not equal"
[1] "With seed: pwpca3 pwpca4 are not equal"

Single example of differences in results:

> summary(pwpca1$"GO:0048699"$z)
       V1       
 Min.   :6.684  
 1st Qu.:6.769  
 Median :6.945  
 Mean   :6.979  
 3rd Qu.:7.174  
 Max.   :7.390  
> summary(pwpca2$"GO:0048699"$z)
       V1       
 Min.   :6.804  
 1st Qu.:6.992  
 Median :7.123  
 Mean   :7.134  
 3rd Qu.:7.218  
 Max.   :7.499  
> summary(pwpca3$"GO:0048699"$z)
       V1       
 Min.   :6.864  
 1st Qu.:6.954  
 Median :7.038  
 Mean   :7.137  
 3rd Qu.:7.258  
 Max.   :7.764  
> summary(pwpca4$"GO:0048699"$z)
       V1       
 Min.   :6.797  
 1st Qu.:6.965  
 Median :7.056  
 Mean   :7.077  
 3rd Qu.:7.113  
 Max.   :7.516

sessionInfo()

> sessionInfo("scde")
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/arrajpur/Install/R3.4.0-install/lib/R/lib/libRblas.so
LAPACK: /home/arrajpur/Install/R3.4.0-install/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
character(0)

other attached packages:
[1] scde_1.99.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11              compiler_3.4.0           
 [3] BiocInstaller_1.26.0      RColorBrewer_1.1-2       
 [5] nloptr_1.0.4              methods_3.4.0            
 [7] Lmoments_1.2-3            utils_3.4.0              
 [9] tools_3.4.0               grDevices_3.4.0          
[11] digest_0.6.12             bit_1.1-12               
[13] lme4_1.1-13               memoise_1.1.0            
[15] tibble_1.3.3              RSQLite_2.0              
[17] nlme_3.1-131              lattice_0.20-35          
[19] mgcv_1.8-17               pkgconfig_2.0.1          
[21] rlang_0.1.1               Matrix_1.2-10            
[23] DBI_0.7                   parallel_3.4.0           
[25] SparseM_1.77              RcppArmadillo_0.7.900.2.0
[27] IRanges_2.10.2            S4Vectors_0.14.3         
[29] graphics_3.4.0            extRemes_2.0-8           
[31] MatrixModels_0.4-1        datasets_3.4.0           
[33] stats_3.4.0               bit64_0.9-7              
[35] locfit_1.5-9.1            stats4_3.4.0             
[37] grid_3.4.0                nnet_7.3-12              
[39] base_3.4.0                Biobase_2.36.2           
[41] flexmix_2.3-13            AnnotationDbi_1.38.1     
[43] distillery_1.0-4          Rook_1.1-1               
[45] BiocParallel_1.10.1       limma_3.32.2             
[47] minqa_1.2.4               org.Hs.eg.db_3.4.1       
[49] blob_1.1.0                car_2.1-4                
[51] edgeR_3.18.1              pcaMethods_1.68.0        
[53] modeltools_0.2-21         MASS_7.3-47              
[55] BiocGenerics_0.22.0       splines_3.4.0            
[57] RMTstat_0.3               pbkrtest_0.4-7           
[59] quantreg_5.33             brew_1.0-6               
[61] KernSmooth_2.23-15        rjson_0.2.15             
[63] Cairo_1.5-9              
>
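A possible explanation, not confirmed here: with n.cores = 35 the work runs in parallel worker processes, and set.seed() alone does not fix the random-number streams those workers use. In base R's parallel package, forked mclapply() calls become reproducible for a fixed number of cores when the L'Ecuyer-CMRG generator is selected and the seed is set before the call. Whether pagoda.pathway.wPCA() parallelizes through mclapply() (directly or via BiocParallel) is an assumption; the sketch below only demonstrates the base-R mechanism on a hypothetical toy task. Rerunning with n.cores = 1 is a simpler way to check whether parallelism is the source of the differences.

```r
library(parallel)

# Hypothetical stand-in for a randomized computation run in parallel.
noisy_task <- function(i) mean(rnorm(1000))

# Note: mclapply() forks, so mc.cores > 1 is not supported on Windows.
run_reproducibly <- function(seed, n.cores = 2) {
  RNGkind("L'Ecuyer-CMRG")   # generator with independent parallel streams
  set.seed(seed)
  mc.reset.stream()          # derive worker streams from the current seed
  unlist(mclapply(1:8, noisy_task, mc.cores = n.cores, mc.set.seed = TRUE))
}

r1 <- run_reproducibly(0)
r2 <- run_reproducibly(0)
identical(r1, r2)
```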

Tutorials

http://hms-dbmi.github.io/scde/pagoda.html

l1cols is missing for some reason

cell.clustering = hc, box = TRUE, labCol = NA, margins = c(0.5, 20), col.cols = rbind(l1cols)) is correct

app <- make.pagoda.app(tamr2, tam, varinfo, go.env, pwpca, clpca, col.cols = col.cols, cell.clustering = hc, title = "NPCs")
should be
app <- make.pagoda.app(tamr2, tam, varinfo, go.env, pwpca, clpca, col.cols = rbind(l1cols), cell.clustering = hc, title = "NPCs")

PAGODA app name

For show.app(app, name, browse=T, port=portnum), if name contains a special character such as +, the following error is produced: Only NEWS and URLs under /doc and /library are allowed
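Until such names are handled, one workaround is to pass show.app() a name restricted to letters, digits, dot, underscore and hyphen. A minimal sketch; the helper name and the allowed character set are assumptions, not part of scde.

```r
# Hypothetical helper: replace every character other than letters, digits,
# '.', '_' or '-' with an underscore before using it as the app name.
safe_app_name <- function(name) gsub("[^A-Za-z0-9._-]", "_", name)

safe_app_name("GW21+3")
# e.g. show.app(app, safe_app_name("GW21+3"), browse = TRUE, port = 1468)
```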

New error in calling scde.error.models after upgrading

I have used scde for several months with great success; however, I recently managed to break it :(

I upgraded some of my packages, including scde, to the most recent Bioconductor 3.4 release, and I can no longer run the software.

When I call scde.error.models, I get the following error for each of the cells while the models are being fitted:

Error in checkSlotAssignment(object, name, value) :
assignment of an object of class “function” is not valid for slot ‘defineComponent’ in an object of class “FLXMRglmC”; is(value, "expression") is not TRUE

I figured this was a version issue, so I tried going back to version 2.0.1, but the error is still there, so I suspect one of the other upgraded packages is causing the problem. Has anyone experienced this, or can anyone identify which package is causing this error?

Session info is attached.

05052017_sessionInfo.txt

Genes with 0 expression in one group are not DE when same groups are used for error models

Hi
I noticed that when comparing cell types where one group does not express a gene at all (0 counts) and the other does, the gene often fails to come up as differentially expressed despite the (often large) difference in expression. I noticed this when one of the most DE genes in my set (Igj) was not coming up as significant, so I made this example in which, for each of two very highly expressed genes (Rplp0 and Actb), I set the counts in one group to zero. In all three cases, when the same grouping is used for the error models as for the DE test, the significance is much lower than when the grouping is not provided to the error model. When the error model knows about the groups, these "binary" genes are not even near the top of the DE gene list. In fact, Igj is not DE at all (FC=0, Z=0), despite being highly (if variably) expressed in one group and at 0 in the other.

##SCDE
#add controls
combinedOMsMat["Rplp0",names(combinedIsWeirdo)[combinedIsWeirdo=="TRUE"]]=0;
combinedOMsMat["Actb",names(combinedIsWeirdo)[combinedIsWeirdo=="FALSE"]]=0;
storage.mode(combinedOMsMat) <- "integer";

#initial grouping for error models
combinedErrModel = scde.error.models(counts = combinedOMsMat, groups = combinedIsWeirdo, n.cores = 2,  save.crossfit.plots = FALSE, save.model.plots = FALSE, verbose = 1); #threshold.segmentation = TRUE,
combinedPrior <- scde.expression.prior(models = combinedErrModel, counts = combinedOMsMat, length.out = 400, show.plot = FALSE)
combinedDE <- scde.expression.difference(combinedErrModel, combinedOMsMat, combinedPrior, groups  =  combinedIsWeirdo, n.randomizations  =  1000, n.cores  =  2, verbose  =  1)

################ NO INITIAL GROUPING
combinedErrModelNoG = scde.error.models(combinedOMsMat, n.cores = 2)
combinedPriorNoG <- scde.expression.prior(models = combinedErrModelNoG, counts = combinedOMsMat)
combinedDENoG <- scde.expression.difference(combinedErrModelNoG, combinedOMsMat, combinedPriorNoG, groups  =  combinedIsWeirdo, n.randomizations  =  1000, n.cores  =  2, verbose  =  1)

message("DE results for controls when an initial grouping is used");
combinedDE[c("Igj","Actb","Rplp0"),];

message("DE results for controls when NO initial grouping is used");
combinedDENoG[c("Igj","Actb","Rplp0"),];

Here is the output:

cross-fitting cells.
number of pairs:  946
number of pairs:  351
total number of pairs:  1297
cross-fitting 1297 pairs:
building individual error models.
adjusting library size based on 2000 entries
fitting FALSE models:
fitting TRUE models:
comparing groups:

FALSE  TRUE
   44    27
calculating difference posterior
summarizing differences
comparing groups:

FALSE  TRUE
   44    27
calculating difference posterior
summarizing differences
DE results for controls when an initial grouping is used
               lb       mle        ub        ce         Z         cZ
Igj   -3.74336039  0.000000  0.000000  0.000000  0.000000  0.0000000
Actb  -5.79914027 -2.485346 -0.767082 -0.767082 -2.667879 -1.2791274
Rplp0 -0.03068328  0.000000  6.627589  0.000000  1.740882  0.4774509
DE results for controls when NO initial grouping is used
             lb       mle        ub        ce         Z        cZ
Igj   -12.22303 -12.22303 -11.88690 -11.88690 -7.160809 -6.379707
Actb  -12.22303 -12.22303 -11.30630 -11.30630 -7.160847 -6.379707
Rplp0  11.61188  12.22303  12.22303  11.61188  7.160813  6.379707

The meaning of each column in scde results.

Hello,

First of all thank you for developing this tool.
I have a question regarding the output table of scde. I understand the columns related to the Z score and the corrected Z score.
My question is where I can find the fold change. Is it the "mle" column? Is it on the log scale? What is the meaning of the "lb", "ub" and "ce" columns?

I would be grateful if you can clarify this for me.
Best
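On the ce column: the values in the two result tables of the previous issue are consistent with ce being the confidence bound closest to zero when the interval [lb, ub] excludes zero, and 0 when it includes zero (Actb with lb = -5.80 and ub = -0.767 gives ce = -0.767, while Rplp0 with lb = -0.03 and ub = 6.63 gives 0). The sketch below encodes that reading; it is inferred from those tables rather than taken from scde's source, and the function name is hypothetical.

```r
# Hypothetical helper: conservative fold-change estimate from the
# confidence bounds (0 whenever the interval contains zero).
conservative_estimate <- function(lb, ub) {
  ifelse(lb > 0, lb, ifelse(ub < 0, ub, 0))
}

# lb/ub copied from the two DE tables above (Igj, Actb, Rplp0, twice)
lb <- c(-3.74336039, -5.79914027, -0.03068328, -12.22303, -12.22303, 11.61188)
ub <- c(0.000000, -0.767082, 6.627589, -11.88690, -11.30630, 12.22303)
conservative_estimate(lb, ub)
```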
