
saver's People

Contributors: mohuangx

saver's Issues

Error in is.data.frame(x) : object 'mu' not found

Hi,

I encountered this issue when running saver:

Error in is.data.frame(x) : object 'mu' not found
Calls: saver ... calc.estimate -> do.call -> -> var -> is.data.frame
In addition: Warning message:
In if (!(grepl("matrix", class(x), ignore.case = TRUE))) { :
the condition has length > 1 and only the first element will be used
Execution halted

The code I used was:
res <- saver(d, ncores = detectCores(), size.factor = 1)$estimate

Here, d is a numeric matrix with 15,981 rows and 2,796 columns. I have already performed library-size normalization myself, so I set size.factor to 1. I have checked the data: there are no NA values and no genes or cells that are all zeros. What might be the cause of this problem? Thanks!
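A likely cause of the warning (an assumption, inferred from the message alone, not confirmed by the report): since R 4.0.0, a matrix carries a two-element class vector, so a single-condition if() over grepl("matrix", class(x)) receives a length-2 logical. A minimal base-R illustration:

```r
# Since R 4.0.0, a matrix inherits from both "matrix" and "array".
x <- matrix(0, nrow = 2, ncol = 2)
class(x)                                       # "matrix" "array"

# grepl() returns one value per class element, so the if() condition
# in the warning above has length 2:
grepl("matrix", class(x), ignore.case = TRUE)  # TRUE FALSE

# A length-one, version-safe check:
inherits(x, "matrix")                          # TRUE
```

Under R >= 4.2.0 a length > 1 condition is an error rather than a warning, which may also explain the halted execution.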

Any speed optimization planned for large datasets?

Hi Mohuan,

Great talk at ASHG this year! I am currently running SAVER on some of our Drop-seq datasets, and I was wondering whether there are any other tips for improving runtime besides multicore usage via doParallel. It is still quite slow even with multiple cores on large data: 10,000 cells takes about two days on 12 cores.

Is anything planned to speed up the imputation algorithm?
Sorry for submitting an issue for this; it was the easiest way to get in contact.

All the best for SAVER, I really like the code!

Best,

Florian

Can't install this module

Hi,

I got this error.

> install.packages("SAVER")
Installing package into ‘/home/inoue019/R/x86_64-pc-linux-gnu-library/4.2’
(as ‘lib’ is unspecified)
also installing the dependency ‘doParallel’

trying URL 'https://cloud.r-project.org/src/contrib/doParallel_1.0.17.tar.gz'
Content type 'application/x-gzip' length 164254 bytes (160 KB)
==================================================
downloaded 160 KB

trying URL 'https://cloud.r-project.org/src/contrib/SAVER_1.1.2.tar.gz'
Content type 'application/x-gzip' length 2953463 bytes (2.8 MB)
==================================================
downloaded 2.8 MB

ERROR: failed to lock directory ‘/home/inoue019/R/x86_64-pc-linux-gnu-library/4.2’ for modifying
Try removing ‘/home/inoue019/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-doParallel’
ERROR: dependency ‘doParallel’ is not available for package ‘SAVER’
* removing ‘/home/inoue019/R/x86_64-pc-linux-gnu-library/4.2/SAVER’

The downloaded source packages are in
        ‘/tmp/Rtmp2LGrin/downloaded_packages’
Warning messages:
1: In install.packages("SAVER") :
  installation of package ‘doParallel’ had non-zero exit status
2: In install.packages("SAVER") :
  installation of package ‘SAVER’ had non-zero exit status
> 

This is probably related to the following issue:
https://community.rstudio.com/t/parallel-package-no-longer-available-in-cran/66793

If that is the case, could you look into it?
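For what it's worth, the log itself suggests a fix to try before the CRAN question: the install failed because it could not lock the library directory. A sketch of the cleanup the error message recommends, using the exact path it printed:

```r
# Remove the stale lock directory named in the error, then retry the install.
unlink("/home/inoue019/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-doParallel",
       recursive = TRUE)
install.packages("SAVER")

# Alternatively, install without locking:
# install.packages("SAVER", INSTALL_opts = "--no-lock")
```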

Is SAVER output compatible with typical scRNAseq analyses

Hi,

I have been using SAVER on my 10X data, and it seems to be working pretty well. So far I have mostly used the SAVER estimates for clustering analysis in R to define clusters. As a next step, I am interested in more sophisticated analyses such as dimensionality reduction, differential gene expression, and pseudotime analysis. Can SAVER estimates be treated as raw read counts in other software such as Seurat or Monocle (for pseudotime analysis, for instance)? Or would you recommend some modification or normalization of the estimates before feeding them into Seurat or Monocle? Do you think the SAVER estimates also follow a negative binomial distribution (an assumption made by many scRNA-seq tools), as raw read counts do?

Thank you in anticipation of your help.

Tutorial doesn't work

Hi,

I followed your tutorial (https://mohuangx.github.io/SAVER/articles/saver-tutorial.html#preprocessing), but it doesn't work.

This is my environment.

Environment

# Ubuntu
$ lsb_release -d
Description:    Ubuntu 18.04.4 LTS

# R
R.version.string
'R version 4.2.0 (2022-04-22)'

# SAVER
library(SAVER)
packageVersion("SAVER")
[1] ‘1.1.2’

This is an error code.

ERROR

library(SAVER)
packageVersion("SAVER")
[1] '1.1.2'
data.path <- "expression_mRNA_17-Aug-2014.txt"

# Need to remove first 10 rows of metadata
raw.data <- read.table(data.path, header = FALSE, skip = 11, row.names = 1, 
                check.names = FALSE)
head(raw.data)
# A data.frame: 6 x 3006 of integer counts; rows Tspan12, Tshz1, Fnbp1l,
# Adamts15, Cldn12, Rxfp1 (output truncated)

cortex <- as.matrix(raw.data[, -1])
cellnames <- read.table(data.path, skip = 7, nrows = 1, row.names = 1, 
                        stringsAsFactors = FALSE)
colnames(cortex) <- cellnames[-1]
head(cortex)
# A matrix: 6 x 3005 of type int, columns named by cell barcode
# (1772071015_C02, 1772071017_G12, ...; output truncated)

dim(cortex)
# [1] 19972  3005

cortex.saver <- saver(cortex, ncores = 12)
Error in if (!(grepl("matrix", class(x), ignore.case = TRUE))) {: the condition has length > 1
Traceback:
1. saver(cortex, ncores = 12)
2. clean.data(x)

Do you know why this happens?

Error reading from connection

Hi, I am new to running SAVER and I'm encountering the following error when I try to run on my local machine:

sce_matrix <- assay(sce, "counts")
dim(sce_matrix)
sce_saver <- saver(sce_matrix, ncores = 6)
16326 genes, 9250 cells
starting worker pid=4723 on localhost:11445 at 19:13:06.473
starting worker pid=4724 on localhost:11445 at 19:13:06.480
starting worker pid=4721 on localhost:11445 at 19:13:06.484
starting worker pid=4725 on localhost:11445 at 19:13:06.487
starting worker pid=4722 on localhost:11445 at 19:13:06.490
starting worker pid=4726 on localhost:11445 at 19:13:06.493
Error in unserialize(node$con) : error reading from connection
Calls: -> workLoop -> makeSOCKmaster
Execution halted
(the three lines above repeat once per worker, six times in total)
Error in makePSOCKcluster(names = spec, ...) :
Cluster setup failed. 6 of 6 workers failed to connect.

Any recommendations on how to address this? I get the same error when I run on the Zeisel cortex dataset used in the SAVER tutorial, so I don't think it is specific to my dataset.
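One way to narrow this down (a sketch, assuming the failure is in cluster setup rather than in SAVER itself) is to check whether the parallel package can create a PSOCK cluster at all on the machine:

```r
library(parallel)

# Try to create a small socket cluster and run a trivial job on it.
cl <- makePSOCKcluster(2)
res <- parLapply(cl, 1:2, function(i) i^2)
stopCluster(cl)
unlist(res)   # 1 4
```

If this minimal example also fails, the problem lies with socket connections on the machine (firewall, hostname resolution, antivirus) rather than with SAVER.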

Error when I use SAVER with size.factor = 1

Hi,
I have a dataset with 22,164 rows and 268 columns. When I run SAVER with size.factor = 1 (since my dataset has already been normalized), an error appears:

Error in optimize(calc.loglik.b, interval = c(0, var(y/sf)/mean(y/sf)), :
'xmin' not less than 'xmax'
In addition: Warning message:
In if (!(grepl("matrix", class(x), ignore.case = TRUE))) { :
the condition has length > 1 and only the first element will be used

My machine is an M1 MacBook running R 4.1.0.
This error does not appear when I use the default size.factor.

Thanks!

Saver in docker

Good morning. I installed SAVER in a Docker container and tried to run it from there, but it gives me an error about the SSH connection when using this command:

mm.saver <- saver(mainMatrix, ncores = ncores,estimates.only = TRUE)
This is the output:

ssh: connect to host 0.0.0.4 port 22: Connection timed out

Using only one core, I get the same error but with a different final host number.
You can reproduce this by downloading the Docker container repbioinfo/saver; if you run SAVER there you will see the error.
Thank you for your support.

Best

Luca

A bug detected

Hi Mo,

SAVER is really cool!

I ran it on a UMI count table, but RStudio reported the following error (SAVER_0.3.1):

Calculating predictions for 9377 genes using 9375 genes and 74 cells...
Approximate finish time: 2018-02-19 10:29:56
Running in parallel: 5 workers
Error in out[[i]][lasso.genes, ] <- matrix(unlist(tempvec), nrow = length(tempvec), :
number of items to replace is not a multiple of replacement length
In addition: Warning messages:
1: In get(object, envir = currentEnv, inherits = TRUE) :
restarting interrupted promise evaluation
2: In get(object, envir = currentEnv, inherits = TRUE) :
restarting interrupted promise evaluation
3: In get(object, envir = currentEnv, inherits = TRUE) :
restarting interrupted promise evaluation
4: In get(object, envir = currentEnv, inherits = TRUE) :
restarting interrupted promise evaluation
5: In doTryCatch(return(expr), name, parentenv, handler) :
restarting interrupted promise evaluation
6: In doTryCatch(return(expr), name, parentenv, handler) :
restarting interrupted promise evaluation
7: In doTryCatch(return(expr), name, parentenv, handler) :
restarting interrupted promise evaluation
8: In doTryCatch(return(expr), name, parentenv, handler) :
restarting interrupted promise evaluation
Timing stopped at: 11.72 1.1 1596

I think the key piece of information is: "Error in out[[i]][lasso.genes, ] <- matrix(unlist(tempvec), nrow = length(tempvec), : number of items to replace is not a multiple of replacement length".

I hope this is helpful.

Best wishes,
Wenhao

Difference in output while using all genes vs using pred.genes

Hello,

I have a huge scRNA-seq matrix that I need to break into smaller matrices to fit within my server's limited RAM, then recombine with combine.saver as explained in the SAVER manual. I wanted to verify that the output is similar whether I predict all genes at once or only a subset of genes per SAVER run. To test this, I took another small scRNA-seq dataset and ran SAVER with the default pred.genes (all genes) versus pred.genes = 1:10 (only the first ten genes). The estimates are very similar for many genes, as expected, but for a few genes the correlation is as poor as Pearson r = 0.16. I had hoped to see very similar estimates for all genes regardless of whether I predicted all genes or only a subset. Is my assumption valid, or is there more going on behind the scenes that could explain the discrepancies? The commands I used are below; please let me know if I should change my analysis in any way.

saver(data.fltrd, ncores = 5)   # all genes
saver(data.fltrd, pred.genes = 1:10, pred.genes.only = TRUE, do.fast = FALSE, ncores = 5)   # only the first 10 genes

Thank you.
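The per-gene comparison described above can be made explicit; a sketch with stand-in matrices (est.all and est.sub are placeholders for the $estimate slots of the two runs, filled with random data here):

```r
# Per-gene Pearson correlation between two estimate matrices
# (rows = genes, columns = cells).
set.seed(1)
est.all <- matrix(rgamma(200, shape = 2), nrow = 10)           # stand-in for run 1
est.sub <- est.all + matrix(rnorm(200, sd = 0.05), nrow = 10)  # stand-in for run 2
gene.cors <- sapply(seq_len(nrow(est.all)),
                    function(i) cor(est.all[i, ], est.sub[i, ]))
range(gene.cors)
```

For this toy perturbation the correlations are all close to 1; genes with r as low as 0.16, as reported above, indicate a genuine difference between the two runs rather than numerical noise.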

Error in checkForRemoteErrors(lapply(cl, recvResult))

Hello,

When I tried to use SAVER, it returned some error message,

Running SAVER with 10 worker(s)
Calculating predictions for 23131 genes using 17490 genes and 12346 cells...
Start time: 2021-06-09 06:07:12
Estimating finish time...
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
10 nodes produced errors; first error: object '.doSnowGlobals' not found

Is there any way to resolve this?
Here is my sessionInfo():

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] SAVER_1.1.2

loaded via a namespace (and not attached):
[1] lattice_0.20-41 codetools_0.2-18 digest_0.6.27 foreach_1.5.1 grid_4.0.4 evaluate_0.14 rlang_0.4.10 doParallel_1.0.16 Matrix_1.3-2
[10] rmarkdown_2.7 iterators_1.0.13 tools_4.0.4 tinytex_0.31 xfun_0.22 yaml_2.2.1 compiler_4.0.4 htmltools_0.5.1.1 knitr_1.32

SAVER vs Seurat normalization

In a previous discussion (#18), it was suggested to use SAVER estimates as the raw counts for Seurat:

data.saver <- saver(data, estimates.only = TRUE)
seurat.obj <- CreateSeuratObject(raw.data = data.saver)
seurat.obj <- NormalizeData(seurat.obj)

The SAVER estimates are library-size normalized (the column sums are not equal). When you run NormalizeData(), they get normalized again to a constant total (by default, column sums of 10k). I assume the SAVER normalization is the one the SAVER authors prefer. Shouldn't the extra Seurat normalization then be skipped?
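A toy illustration of the double normalization (not SAVER output; m is a hypothetical 2-gene by 2-cell matrix):

```r
m <- matrix(c(1, 3, 2, 6), nrow = 2)       # two cells with different totals
colSums(m)                                 # 4 8  <- unequal, as after SAVER
m2 <- sweep(m, 2, colSums(m), "/") * 1e4   # force constant column sums,
colSums(m2)                                # 10000 10000  <- roughly what
                                           # NormalizeData() does pre-log
```

The second step overrides whatever per-cell scaling the first normalization produced, which is the crux of the question above.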

Output of estimated prior parameters

Hi Mo,

Can the saver function output the estimated alpha and beta of the prior Gamma distribution? How can I access those estimates?

Thank you very much!

Best wishes,
Wenhao

missing entries in NAMESPACE file

Dear Mo,

I would like to use SAVER in my simulation tool. The code runs fine when I explicitly load all libraries, but once I call the main saver function on its own, I get the following error:
Error in foreach::foreach(xit = iterators::iter(x[lasso.genes, ], by = "row"), : could not find function "%dopar%"

The reason is that the imported functions are not declared in the NAMESPACE file (specifically foreach's %dopar%). May I suggest that you declare the packages and functions with #' @importFrom tags in your R source files?

Kind regards
Beate
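A sketch of the roxygen2 declarations being suggested (function names taken from the error message above; exact placement within the package sources is up to the author):

```r
# Placed in an R source file of the package; roxygen2 then generates the
# corresponding importFrom() directives in NAMESPACE.
#' @importFrom foreach foreach %dopar%
#' @importFrom iterators iter
NULL
```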

Error in lm.fit ... 0 (non-NA) cases

Hi,

I am trying to run SAVER. Things look good initially, but then I receive an error that I cannot debug. My output is as follows:

sc_data.saver <- saver(sc_data)
32738 genes, 2612 cells
Running SAVER with 1 worker(s).
Calculating predictions for 14175 genes using 1124 genes and 2612 cells...
Start time: 2019-07-30 14:16:03
Estimating finish time...
Finished 8/32738 genes. Approximate finish time: 2019-07-30 14:37:27
Calculating max cor cutoff...
Finished 100/32738 genes. Approximate finish time: 2019-07-30 14:35:23
Calculating lambda coefficients...
Finished 200/32738 genes. Approximate finish time: 2019-07-30 14:33:39
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases

Do you know what the issue might be, or what a good next step would be to try to figure out the problem?

Thanks!
Matt

Error in class() examination while using SAVER

Hi,

I have a UMI-based dataset, and when I try to perform imputation with SAVER (X.saver <- saver(X, ncores = 12)), I get the error below.

Error in if (!(grepl("matrix", class(x), ignore.case = TRUE))) { :
the condition has length > 1

When I examine the class of my matrix with class(X), it gives "matrix" "array".
Can you please help me to fix this problem?

PS: The dataset was not normalized or log-transformed, so there are no negative values.

Thanks

Error in serialize(data, node$con) : error writing to connection

Hi,

I am pretty new to imputation and parallel computation. When I run the saver function, R keeps giving warning messages and errors like:

Estimating finish time...
Error in serialize(data, node$con) : error writing to connection
In addition: Warning message:
In if (!(grepl("matrix", class(x), ignore.case = TRUE))) { :
the condition has length > 1 and only the first element will be used

Do you know how I should fix this?

Thanks,
Stingo

Error while running SAVER

Hello,

First of all, let me say that your approach to accurate measurement of gene expression from scRNA-seq data seems promising. I am giving SAVER a try with my data generated on Drop-seq. I get the following error when I run the pipeline on my data or on the sample data provided with the package (linnarsson):

> saver.adneo = saver(linnarsson, npred = 5)
3529 genes, 500 cells
Running SAVER with 1 worker(s)
Calculating predictions for 5 genes using 3529 genes and 500 cells...
Estimating finish time with 5 genes.
Error in summary.connection(connection) : invalid connection

Can you please let me know how to fix this?

Thanks.
Ravi

Subsetting Dataset

Hello,

If I'm particularly interested in a subset of cells in my dataset, is it valid to run SAVER on just that subset?

Furthermore, I'm working on a dataset assembled from many different scRNA-seq sources, where I suspect batch effects may be relevant. Would it be a good idea to split my SAVER runs by source instead of running on data combined from all sources?

Thank you!

SAVER keeps crashing due to non-finite values

Hello,

I am trying to incorporate SAVER into my sequencing workflow. However, I keep getting the following messages, culminating with an error:

14515 genes, 3126 cells
Running SAVER with 1 worker(s)
Calculating predictions for 14515 genes using 0 genes and 3126 cells...
Start time: 2018-12-01 01:36:01
Estimating finish time...
Finished 8/14515 genes. Approximate finish time: 2018-12-01 01:36:17
Calculating max cor cutoff...
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'x'

However, I know this cannot be true, as these are the two prior lines:

q_array <- t(q_array)
q_array[!is.finite(q_array)] <- 0
q_array <- saver(q_array)

Is there something I am missing, such as a transformation done by the program that introduces these non-finite values? If it helps, the step is immediately preceded by batch correction via mnnCorrect.

Thanks!

Problem running SAVER on large dataset

Hello,
I have a large dataset with 7,685 cells and 18,000 genes that I want to run SAVER on (for imputation).
I tried running it with 12, 6, 4, and 2 cores (via the ncores argument), and it doesn't seem to work.
I also tried splitting the genes into subgroups and combining the results with combine.saver, but that didn't work either.

Is there a way you can assist me, or do you have any recommendations?
Thanks in advance!

Error in approx(lambda, seq(lambda), sfrac) : need at least two non-NA values to interpolate

After I transformed the counts into TPM, I ran SAVER again with the same pipeline.
However, I get the following error and would like to know the reason. Thank you very much.

> cortex.saver <- saver(tpm, ncores = 12)
18530 genes, 35467 cells
starting worker pid=1772768 on localhost:11093 at 12:26:14.811
(... 11 more workers started on localhost:11093 ...)
Running SAVER with 12 worker(s)
Calculating predictions for 15644 genes using 12340 genes and 35467 cells...
Start time: 2023-01-13 12:28:11
Estimating finish time...
Loading required package: SAVER
loaded SAVER and set parent environment
(the two lines above repeat once per worker at each stage)
Finished 12/18530 genes. Approximate finish time: 2023-01-13 12:46:42
Calculating max cor cutoff...
Finished 100/18530 genes. Approximate finish time: 2023-01-13 14:16:42
Calculating lambda coefficients...
Finished 214/18530 genes. Approximate finish time: 2023-01-13 15:45:52
Predicting remaining genes...
Error in approx(lambda, seq(lambda), sfrac) : need at least two non-NA values to interpolate

> dim(tpm)
[1] 18530 35467

> sum(is.na(tpm))
[1] 0

A query about the detailed procedure for filtering Drop seq data

Hi Mo,

I followed the filtering procedure proposed in your paper: removing genes with mean expression less than 0.1, and cells with library size less than 500 or greater than 20,000, for the Torre case study.

Did you filter genes and cells simultaneously, as in Data[-genes, -cells]? That procedure left me with 9,177 genes rather than 12,241 out of 32,287.

How can I repeat your filtering procedure to obtain the same filtered dataset used in the paper?

Thank you very much!

Best wishes,
Wenhao

combine.saver() : object 'x' not found

Dear Mo,

in the function combine.saver(), the object x is not defined:
if (!is.null(x$alpha)) { return(combine.saver.old(saver.list)) }

Kind regards
Beate
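A guess at the intended logic (a hypothetical fix; combine.saver.old and the list layout are assumed from the snippet above, and a placeholder string stands in for the actual combining code):

```r
combine_saver_sketch <- function(saver.list) {
  # The released code tests !is.null(x$alpha) with 'x' undefined; checking
  # the first element of saver.list is presumably what was meant.
  if (!is.null(saver.list[[1]]$alpha)) {
    return("old format")   # placeholder for combine.saver.old(saver.list)
  }
  "new format"
}

combine_saver_sketch(list(list(alpha = 1)))     # "old format"
combine_saver_sketch(list(list(estimate = 1)))  # "new format"
```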

Error in cbind2(1, newx) : object 'newx' not found

I get this error when I run saver with a matrix of raw counts and a vector of size factors. The count matrix comes from an SCE object; I have tried the matrices produced by counts(sce_obj), as.matrix(counts(sce_obj)), and data.matrix(counts(sce_obj)). The size factors are those computed by scran. Any help would be greatly appreciated, thanks!

Error: number of items to replace is not a multiple of replacement length

Hi Mo!
I also ran into the bug @WT215 reported previously with SAVER_0.3.1, and it is now fixed. However, there appear to be some new bugs in the current version.

It reports the following error when running on a UMI count table:

11001 genes, 274 cells
Running SAVER with 64 worker(s)
Calculating predictions for 11001 genes using 10464 genes and 274 cells...
Estimating finish time with 64 genes.
  |==============================================================================================================================================================| 100%
Approximate finish time: 2018-04-12 15:30:25
Calculating max cor cutoff with 36 genes.
  |==============================================================================================================================================================| 100%
Calculating lambda coefficients with 164 genes.
  |==============================================================================================================================================================| 100%
Predicting 10737 genes.
  |==============================================================================================================================================================| 100%
Error in est[ind4, ] <- out$est : 
  number of items to replace is not a multiple of replacement length
Timing stopped at: 1.042e+04 2.067e+04 600.5

I tried to traceback:

> traceback()
2: saver.fit(x, x.est, do.fast, sf, scale.sf, pred.genes, pred.cells, 
       null.model, ngenes = nrow(x), ncells = ncol(x), gene.names, 
       cell.names)
1: saver(edgeR::cpm(counts(sce4_6.25)), size.factor = 1)

Sometimes it runs successfully on the same dataset without changing any parameters, but sometimes it reports this error.

Hope the information is helpful.

Best wishes,
Xueyi

Less total expression after saver?

I've run SAVER with 43k cells, on the subset of genes expressed in at least 100 cells with at least 100 copies (across the entire dataset). Somehow, I'm finding that the sum of my output matrix in saver$estimate is lower than my input by several tens of thousands. I'm using size.factor = 1 because I've already normalized my data.

saver1 = saver(x = data, do.fast = TRUE, size.factor = 1, pred.genes = gene.indices[1:2500], pred.genes.only = T, ncores = 16)

> sum(saver1$estimate) - sum(data[gene.indices[1:2500],])
[1] -86580.29

I had understood that SAVER would impute values where none exist or add to what already exists, but this appears to have lowered my overall expression. Either that, or (what I suspect is more likely) the mean library-size normalization is coming into play.

Here's a randomly selected cell:

> saver1$estimate[1:10,6197 ]
   Xkr4  Gm1992     Rp1  Mrpl15 Gm37988   Tcea1   Rgs20 Atp6v1h   Oprk1  Rb1cc1 
  2.031   0.105   0.004   0.083   0.004   0.095   0.261   0.388   0.040   0.767 
> data[gene.indices[1:10],6197 ]
    Xkr4   Gm1992      Rp1   Mrpl15  Gm37988    Tcea1    Rgs20  Atp6v1h    Oprk1   Rb1cc1 
3.139104 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.543411 

Again, it looks like it imputed new values where there were none, but the genes that already had high expression are now lower.

I've confirmed that the cell names match up and that both datasets contain the same number of columns/cells.

Any thoughts?

error in optimize

I tried running SAVER on a dataset of ~1,500 cells. After approximately 24 hours on 64 cores, the program crashed with the following message:

Error in optimize(calc.loglik.a, interval = c(0, var(y/sf)/mean(y/sf)^2), :
  invalid 'xmin' value
In addition: Warning message:
In matrix(gene.means, ngenes, ncells) :
  data length [16562] is not a sub-multiple or multiple of the number of rows [16710]

Thanks!

SAVER fails when using multi-threading.

Running saver with NUM_OF_THREAD = 8 generates the following errors:

14224 genes, 340 cells
starting worker pid=25881 on localhost:11987 at 10:39:03.870
starting worker pid=25932 on localhost:11987 at 10:39:04.133
starting worker pid=25941 on localhost:11987 at 10:39:04.362
starting worker pid=25950 on localhost:11987 at 10:39:04.592
starting worker pid=25959 on localhost:11987 at 10:39:04.818
starting worker pid=25968 on localhost:11987 at 10:39:05.048
starting worker pid=25977 on localhost:11987 at 10:39:05.288
starting worker pid=25986 on localhost:11987 at 10:39:05.516
Running SAVER with 8 worker(s)
Calculating predictions for 14224 genes using 13245 genes and 340 cells...
Start time: 2018-12-21 10:39:06
Estimating finish time...
Loading required package: SAVER
loaded SAVER and set parent environment
(the two lines above repeat once per worker, eight times in total)
 Error in FUN(X[[i]], ...) : subscript out of bounds 
7.
lapply(out, `[[`, 3) 
6.
unlist(lapply(out, `[[`, 3)) 
5.
calc.estimate(x[ind1, , drop = FALSE], x.est, cutoff = 0, coefs = NULL, 
    sf, scale.sf, gene.names[pred.genes], pred.cells, null.model, 
    nworkers, calc.maxcor = TRUE, estimates.only) 
4.
saver.fit(x, x.est, do.fast, ncores, sf, scale.sf, pred.genes, 
    pred.cells, null.model, ngenes = nrow(x), ncells = ncol(x), 
    gene.names, cell.names, estimates.only) 
3.
saver(logcounts(sce), ncores = NUM_OF_THREAD, size.factor = 1, 
    estimates.only = TRUE) 
2.
system.time({
    logcounts(sce) = saver(logcounts(sce), ncores = NUM_OF_THREAD, 
        size.factor = 1, estimates.only = TRUE)
}) 
1.
SAVER_impute(res1$result[[1]]) 
Warning message:
In if (calc.maxcor) { :
  the condition has length > 1 and only the first element will be used
(this warning repeats, partially interleaved, once per worker)


Changing estimates.only to FALSE gives a new error:

Error in is.data.frame(y) : object 'x2' not found 
9.
is.data.frame(y) 
8.
cor(x1, x2) 
7.
(function (..., deparse.level = 1) 
.Internal(rbind(deparse.level, ...)))(cor(x1, x2), cor(x1, x2), 
   cor(x1, x2), cor(x1, x2), cor(x1, x2), cor(x1, x2), cor(x1, 
       x2), cor(x1, x2), cor(x1, x2), cor(x1, x2), cor(x1, x2),  ... 
6.
do.call(rbind, lapply(out, `[[`, 2)) 
5.
calc.estimate(x[ind1, , drop = FALSE], x.est, cutoff = 0, coefs = NULL, 
   sf, scale.sf, gene.names[pred.genes], pred.cells, null.model, 
   nworkers, calc.maxcor = TRUE, estimates.only) 

All errors and warnings disappear when running on a single thread.

The dataset comes from our benchmark study: https://github.com/LuyiTian/CellBench_data/tree/master/data. It is standard CEL-seq2 data and runs smoothly through all other methods.

This is the sessionInfo():

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.4 (Final)

Matrix products: default
BLAS: /wehisan/general/system/bioinf-software/bioinfsoftware/R/R-3.5.1/lib64/R/lib/libRblas.so
LAPACK: /wehisan/general/system/bioinf-software/bioinfsoftware/R/R-3.5.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2              dplyr_0.7.8                 rsvd_1.0.0                  Matrix_1.2-14               cluster_2.0.7-1             DrImpute_1.0               
 [7] SAVER_1.1.1                 scone_1.6.0                 SCnorm_1.4.2                Linnorm_2.6.0               edgeR_3.24.0                limma_3.38.2               
[13] DESeq2_1.20.0               CellBench_0.1.0             tibble_1.4.2                magrittr_1.5                scater_1.8.0                ggplot2_3.1.0              
[19] scran_1.8.2                 SingleCellExperiment_1.4.0  SummarizedExperiment_1.12.0 DelayedArray_0.8.0          matrixStats_0.54.0          Biobase_2.42.0             
[25] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1         IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0         BiocParallel_1.16.2        

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1            SparseM_1.77              rtracklayer_1.42.1        prabclus_2.2-6            R.methodsS3_1.7.1         tidyr_0.8.1               acepack_1.4.1            
  [8] bit64_0.9-8               knitr_1.20                aroma.light_3.10.0        R.utils_2.6.0             data.table_1.11.4         rpart_4.1-13              hwriter_1.3.2            
 [15] doParallel_1.0.13         RCurl_1.95-4.11           GenomicFeatures_1.32.0    cowplot_0.9.2             RSQLite_2.1.1             future_1.8.1              bit_1.1-14               
 [22] bayesm_3.1-0.1            httpuv_1.4.4.2            assertthat_0.2.0          viridis_0.5.1             amap_0.8-16               tximport_1.8.0            hms_0.4.2                
 [29] evaluate_0.12             promises_1.0.1            DEoptimR_1.0-8            progress_1.2.0            caTools_1.17.1            dbplyr_1.2.1              igraph_1.2.1             
 [36] DBI_1.0.0                 geneplotter_1.58.0        htmlwidgets_1.2           apcluster_1.4.7           tensorA_0.36              EDASeq_2.14.1             rARPACK_0.11-0           
 [43] RcppArmadillo_0.8.600.0.0 purrr_0.2.5               RSpectra_0.13-1           backports_1.1.2           energy_1.7-4              permute_0.9-4             trimcluster_0.1-2        
 [50] annotate_1.58.0           compositions_1.40-2       moments_0.14              biomaRt_2.38.0            quantreg_5.36             withr_2.1.2               robustbase_0.93-3        
 [57] checkmate_1.8.5           vegan_2.5-2               GenomicAlignments_1.18.0  prettyunits_1.0.2         mclust_5.4.2              segmented_0.5-3.0         lazyeval_0.2.1           
 [64] crayon_1.3.4              ellipse_0.4.1             genefilter_1.62.0         labeling_0.3              glmnet_2.0-16             pkgconfig_2.0.2           nlme_3.1-137             
 [71] vipor_0.4.5               nnet_7.3-12               globals_0.12.1            bindr_0.1.1               rlang_0.3.0.1             diptest_0.75-7            MatrixModels_0.4-2       
 [78] BiocFileCache_1.4.0       rprojroot_1.3-2           Rhdf5lib_1.2.1            boot_1.3-20               zoo_1.8-3                 base64enc_0.1-3           beeswarm_0.2.3           
 [85] viridisLite_0.3.0         rjson_0.2.20              bitops_1.0-6              shinydashboard_0.7.0      R.oo_1.22.0               KernSmooth_2.23-15        Biostrings_2.50.1        
 [92] blob_1.1.1                DelayedMatrixStats_1.2.0  stringr_1.3.1             ShortRead_1.38.0          scales_1.0.0              memoise_1.1.0             plyr_1.8.4               
 [99] hexbin_1.27.2             gplots_3.0.3              gdata_2.18.0              zlibbioc_1.28.0           compiler_3.5.1            RColorBrewer_1.1-2        Rsamtools_1.34.0         
[106] cli_1.0.1                 XVector_0.22.0            listenv_0.7.0             htmlTable_1.12            Formula_1.2-3             MASS_7.3-50               mgcv_1.8-24              
[113] tidyselect_0.2.5          stringi_1.2.3             yaml_2.1.19               locfit_1.5-9.1            latticeExtra_0.6-28       grid_3.5.1                tools_3.5.1              
[120] rstudioapi_0.7            foreach_1.5.0             foreign_0.8-70            gridExtra_2.3             Rtsne_0.13                digest_0.6.18             FNN_1.1                  
[127] shiny_1.1.0               fpc_2.1-11                Rcpp_1.0.0                later_0.7.3               httr_1.4.0                ggdendro_0.1-20           AnnotationDbi_1.44.0     
[134] kernlab_0.9-26            colorspace_1.4-0          XML_3.98-1.16             splines_3.5.1             statmod_1.4.30            flexmix_2.3-14            xtable_1.8-3             
[141] jsonlite_1.5              dynamicTreeCut_1.63-1     modeltools_0.2-21         R6_2.3.0                  gmodels_2.18.1            Hmisc_4.1-1               pillar_1.2.3             
[148] htmltools_0.3.6           mime_0.6                  glue_1.3.0                DT_0.4                    DESeq_1.32.0              class_7.3-14              RUVSeq_1.14.0            
[155] codetools_0.2-15          mvtnorm_1.0-8             utf8_1.1.4                lattice_0.20-35           mixtools_1.1.0            ggbeeswarm_0.6.0          gtools_3.8.1             
[162] survival_2.42-4           rmarkdown_1.10            munsell_0.5.0             fastcluster_1.1.25        rhdf5_2.24.0              GenomeInfoDbData_1.2.0    iterators_1.0.11         
[169] reshape2_1.4.3            gtable_0.2.0   

Input and Output

Dear author,

Thanks for developing the software! I have a few questions that I could not find clear answers to in your paper:

  1. Is the input a count matrix?
  2. Is the output a count matrix? Has it been normalized by library size, or log2-transformed?

Thanks!

Correlation cutoff to increase speed

Hi,
I have been using SAVER and its performance is amazing. However, it takes a long time to finish, which makes it less useful in some situations. I wonder whether a feature could be added to substantially increase the computational speed. When estimating the prior mean of gene A, other genes are used to predict the expression of gene A. Would it be possible to set a correlation cutoff so that only genes highly correlated with gene A are included? The idea is essentially sure independence screening.
Thanks!
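As a rough illustration of the proposed screening step (this is not part of SAVER's current API; the function and cutoff names below are hypothetical), one could pre-select predictors for each target gene by marginal correlation before fitting:

```r
# Sketch of sure-independence-style screening for one target gene.
# `counts` is a genes x cells matrix of normalized, log-transformed
# expression; `target` is the gene whose prior mean is being predicted.
screen_predictors <- function(counts, target, cutoff = 0.3) {
  y <- counts[target, ]
  others <- counts[rownames(counts) != target, , drop = FALSE]
  # marginal Pearson correlation of every other gene with the target
  r <- apply(others, 1, function(g) suppressWarnings(cor(g, y)))
  names(r)[!is.na(r) & abs(r) >= cutoff]
}

# Toy usage with simulated data:
set.seed(1)
counts <- matrix(rnorm(50 * 20), 50, 20,
                 dimnames = list(paste0("gene", 1:50), NULL))
preds <- screen_predictors(counts, "gene1", cutoff = 0.4)
```

Only the surviving genes would then enter the regression for that target, with the cutoff trading speed against prediction accuracy.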

Cell selection

Hello and thank you for this exciting tool!

What sort of filtering would you recommend prior to running SAVER? Would you recommend fairly stringent filtering of cells based on UMI counts, gene counts, and mitochondrial content before running the imputation?
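For concreteness, a typical cell-level QC pass before imputation might look like the sketch below; the thresholds are illustrative placeholders, not recommendations from the SAVER authors, and the toy data stand in for a real raw UMI matrix:

```r
# Hypothetical pre-imputation QC on a raw genes x cells UMI matrix.
set.seed(1)
counts <- matrix(rpois(200 * 30, 20), nrow = 200, ncol = 30)  # toy counts
mito.genes <- 1:13  # pretend the first 13 rows are MT- genes

umi.per.cell   <- colSums(counts)
genes.per.cell <- colSums(counts > 0)
pct.mito       <- colSums(counts[mito.genes, , drop = FALSE]) / umi.per.cell

# Example thresholds only; tune to the dataset at hand.
keep <- umi.per.cell >= 1000 &
        genes.per.cell >= 100 &
        pct.mito <= 0.2
counts.qc <- counts[, keep, drop = FALSE]
```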

Error in UseMethod("units<-") :

Hi,

I kept receiving the following error message from SAVER. I am wondering what caused this problem. Would you please help me resolve this issue? Thanks.

Estimating finish time...
Loading required package: SAVER
loaded SAVER and set parent environment
Loading required package: SAVER
loaded SAVER and set parent environment
Loading required package: SAVER
loaded SAVER and set parent environment
Loading required package: SAVER
loaded SAVER and set parent environment
Loading required package: SAVER
loaded SAVER and set parent environment
Loading required package: SAVER
loaded SAVER and set parent environment
Loading required package: SAVER
loaded SAVER and set parent environment
Loading required package: SAVER
loaded SAVER and set parent environment
Finished 8/2206 genes. Approximate finish time: 2018-10-18 19:50:05
Calculating max cor cutoff...
loaded SAVER and set parent environment
loaded SAVER and set parent environment
loaded SAVER and set parent environment
loaded SAVER and set parent environment
loaded SAVER and set parent environment
loaded SAVER and set parent environment
loaded SAVER and set parent environment
loaded SAVER and set parent environment
Error in UseMethod("units<-") :
no applicable method for 'units<-' applied to an object of class "c('double', 'numeric')"

Best,
Joon

questions about prior variance optimization step

Hi Mo,

Thanks for developing this great software! I have a question about the marginal likelihood optimization procedure in your implementation. What is the purpose of this step?

if (mle.a - min.a < 0.5) {
  if (mle.a - max.a > 10) {
    # widen the search interval for the upper bound on a
    a.max <- uniroot(function(x) calc.loglik.a(x, y, mu, sf) + mle.a - 10,
                     c(1e-05, var(y/sf)/mean(y/sf)^2))$root
  } else {
    a.max <- var(y/sf)/mean(y/sf)^2
  }
  # evaluate the likelihood on a nonuniform grid over (0, a.max]
  samp <- (exp((exp(ppoints(100)) - 1)/2) - 1) * a.max
  loglik <- mapply(calc.loglik.a, a = samp,
                   MoreArgs = list(y = y, mu = mu, sf = sf))
  # convert to normalized sampling weights and average over the grid
  loglik2 <- exp(-loglik - min(-loglik))
  loglik3 <- loglik2/sum(loglik2)
  a <- mean(sample(samp, 10000, replace = TRUE, prob = loglik3))
  a.loglik <- calc.loglik.a(a, y, mu, sf)
}

I didn't find a reference for it in the paper or supplementary materials. It would be great if you could elaborate on this. Thanks!
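For what it's worth, the upper bound var(y/sf)/mean(y/sf)^2 that appears in the excerpt is simply the sample squared coefficient of variation of the size-factor-normalized counts, which can be checked directly (the data below are made up for illustration):

```r
# Toy illustration of the a.max bound: the squared coefficient of
# variation (CV^2) of normalized counts.
set.seed(1)
sf <- runif(200, 0.5, 2)        # hypothetical size factors
y  <- rpois(200, 5 * sf)        # hypothetical counts
yn <- y / sf                    # normalized expression
cv2 <- var(yn) / mean(yn)^2     # sample CV^2, used as the bound on a
```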

Best,
Kangcheng

nFeatureRNA value became zero for all cells

Hi!

I am new to the whole SAVER pipeline. I ran SAVER on one of my datasets to see how the expression recovery works. However, I noticed that the nFeature_RNA values in my Seurat object all became zero afterwards. Why is that? Will it affect my downstream analysis?

Please see the Violin plot here

problem too large error in rowMeans inside saver()

Hello!

I am trying to use SAVER, hoping that the imputed data will give better gene correlation analysis. I am therefore running it on the RNA slot of a CCA-integrated Seurat object. However, it breaks down after ~15 min.

Did you encounter this problem?

> require(SAVER)
> data.imputed = saver(combined.obj@assays$RNA@counts, size.factor=1, ncores = 1, do.fast = TRUE)
26690 genes, 82000 cells
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'rowMeans': Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
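For context (an editor's guess, not a confirmed diagnosis): Cholmod's "problem too large" error typically means a sparse matrix was coerced to dense and the result exceeds R's 2^31 - 1 element limit, which a 26690 x 82000 matrix does. One workaround sketch, using SAVER's documented pred.genes and pred.genes.only arguments, is to impute genes in chunks; the chunk size and object name here are hypothetical:

```r
# The dense form of a 26690 x 82000 matrix exceeds R's vector limit:
26690 * 82000          # 2,188,580,000 elements
.Machine$integer.max   # 2,147,483,647

# Hypothetical chunked run; x would be your raw count matrix.
library(SAVER)
x <- combined.obj@assays$RNA@counts
chunks <- split(seq_len(nrow(x)), ceiling(seq_len(nrow(x)) / 5000))
fits <- lapply(chunks, function(idx)
  saver(x, pred.genes = idx, pred.genes.only = TRUE,
        size.factor = 1, ncores = 1, estimates.only = TRUE))
imputed <- do.call(rbind, fits)
```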

Thank you & Happy New Year!

Question: Integration in Seurat workflow?

Dear SAVER-Team,

thank you for providing this great package. I am wondering what the usual integration into the Seurat workflow is, and especially whether I still need Seurat::ScaleData() and on which object I should run the imputation (data@data, data@raw.data, or data@scale.data).

My simplified integration would be like that:

# 1
data <- read.table("test.txt")
data <- CreateSeuratObject(raw.data = data)
data <- NormalizeData(data)
data <- ScaleData(data)

# 2
data@imputed <- saver(data@data, size.factor = 1)

# 3: Clustering, plotting, etc.

Many thanks!
