iaconogi / bigscale2

Framework for clustering, phenotyping, pseudotiming and inferring gene regulatory networks from single cell data

R 95.41% C++ 4.59%

bigscale2's Issues

bigSCale2 considers data only from 5000 cells

[1] "Pre-processing) Removing null rows "
[1] "Discarding 20 genes with all zero values"
[1] "PASSAGE 1) Setting the size factors ...."
[1] "PASSAGE 2) Setting the bins for the expression data ...."
[1] "97.0 % of elements < 10 counts, therefore Using a UMIs compatible binning"
[1] "PASSAGE 3) Storing in the single cell object the Normalized data ...."
[1] "PASSAGE 4) Computing the numerical model (can take from a few minutes to 30 mins) ...."
[1] 19369 5000

Going through the log above, I noticed that bigSCale2 considered only 5000 cells, although I passed in data from 35639 cells. So I have a few questions:

Does that mean that, to analyze regulatory networks with bigSCale2, we should use cells from one sample at a time?
Can't we merge all the cells from a group of samples to compare them with another group?
Does it pick the top 5000 significant genes to derive their regulatory network? If so, how does it rank the genes?
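
On the question of merging cells from several samples: the sketch below only shows how two count matrices could be combined into one genes x cells matrix in base R while keeping a per-cell sample label. The matrices, gene names and cell names are made up for illustration, and nothing here says how bigSCale2 itself treats the merged input or what the 5000 in the log refers to.

```r
# Two hypothetical count matrices (genes x cells) from different samples.
set.seed(1)
genes <- paste0("gene", 1:5)
sample_a <- matrix(rpois(15, 2), nrow = 5,
                   dimnames = list(genes, paste0("A_cell", 1:3)))
# Sample B lists the same genes in a different order on purpose.
sample_b <- matrix(rpois(20, 2), nrow = 5,
                   dimnames = list(rev(genes), paste0("B_cell", 1:4)))

# Align rows on the shared gene names before binding the columns.
shared <- intersect(rownames(sample_a), rownames(sample_b))
merged <- cbind(sample_a[shared, , drop = FALSE],
                sample_b[shared, , drop = FALSE])

# Keep one label per cell so the groups can be separated again later.
sample_of_cell <- rep(c("A", "B"), times = c(ncol(sample_a), ncol(sample_b)))

dim(merged)
table(sample_of_cell)
```

Indexing by gene name rather than by position is the important part: it avoids silently mixing rows when the samples list genes in different orders.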

Data normalization

Hi, is there any way I can use TPM normalized data to obtain the clustering output? I don't have raw counts for some datasets. I'm only interested in the clustering output at this time.

Thank you

Network using non-Ensembl/-symbol gene names

I was wondering if there was any way around filtering genes based on their names in the network algorithm. If I understand correctly from your paper, you filter the correlation network based on gene ontology? Right now I am using modified Ensembl IDs for the gene names and as a result the graph is empty.
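
One thing that may be worth trying before running the network step is to normalise the identifiers. The sketch below assumes the "modified" Ensembl IDs simply carry a version suffix such as ".17"; that is a guess about the poster's data, and the example IDs are only illustrative. It strips the suffix and checks that the result looks like a plain human or mouse Ensembl gene ID.

```r
# Hypothetical versioned Ensembl gene IDs (assumption: the modification is a
# trailing ".<version>" suffix).
ids <- c("ENSG00000141510.17", "ENSG00000012048.23", "ENSMUSG00000059552.14")

# Drop the trailing ".<digits>" part.
clean <- sub("\\.[0-9]+$", "", ids)

# Plain Ensembl gene IDs are ENSG or ENSMUSG followed by 11 digits.
looks_ensembl <- grepl("^ENS(MUS)?G[0-9]{11}$", clean)

data.frame(original = ids, cleaned = clean, looks_ensembl = looks_ensembl)
```

If the IDs were modified in some other way, the regular expression in `sub()` would need to change accordingly.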

Analyzing 10X single cell data for regulatory network

Hi Giovanni,
This could be a naive question but I am a bit confused.
I want to analyse 10X single-cell RNA data in order to explore the regulatory network. In this case, the data contains cumulative reads from all the cells of a sample.

To analyze regulatory networks with bigSCale2, I believe it requires demultiplexed read counts for each cell. If so, could you please tell me how I should demultiplex my cumulative fastq files into cell-specific fastq files, or convert cumulative read counts into cell-specific read counts for each sample?

Thanks
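
For 10X data, the Cell Ranger counting pipeline normally writes a per-cell (per-barcode) count matrix in Matrix Market format (matrix.mtx, alongside barcode and gene/feature files), so fastq files usually do not need to be split by cell by hand. The sketch below is self-contained: it writes a tiny toy matrix to a temporary file and reads it back with `Matrix::readMM`, which is the same call one would point at a real matrix.mtx. The toy values are invented.

```r
library(Matrix)

# Toy stand-in for a 10X matrix.mtx: 3 genes x 2 cells (values are made up).
toy <- Matrix(c(0, 3, 1, 0, 0, 2), nrow = 3, sparse = TRUE)
tmp <- tempfile(fileext = ".mtx")
writeMM(toy, tmp)

# Reading works the same way for a real file, e.g. readMM("matrix.mtx").
counts <- readMM(tmp)

dim(counts)           # genes x cells
Matrix::colSums(counts)  # total counts per cell
```

In a real output folder, the row and column names would come from the accompanying gene/feature and barcode files, which can be read with `read.delim(..., header = FALSE)`.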

How can we obtain the index cells?

Hi!

I couldn't figure out from the tutorial (which ran just fine) how to extract the expression matrix of the index cells (analogous to extracting the single-cell expression matrix with assay(sce)). Could someone please give any pointers?

libv8 library dependency installation difficulty

Hi!

When trying to install bigSCale2 to a cluster where I don't have sudo access, I get this R error message:

> devtools::install_github("iaconogi/bigSCale2")
Downloading GitHub repo iaconogi/bigSCale2@master
These packages have more recent versions available.
Which would you like to update?

 1:   assertthat   (0.2.0       -> 0.2.1      ) [CRAN]   2:   backports    (1.1.3       -> 1.1.4      ) [CRAN]
 3:   cli          (1.0.1       -> 1.1.0      ) [CRAN]   4:   colorspace   (1.4-0       -> 1.4-1      ) [CRAN]
 5:   cowplot      (0.9.4       -> 1.0.0      ) [CRAN]   6:   curl         (3.3         -> 4.0        ) [CRAN]
 7:   data.table   (1.12.0      -> 1.12.2     ) [CRAN]   8:   dendextend   (1.10.0      -> 1.12.0     ) [CRAN]
 9:   digest       (0.6.18      -> 0.6.20     ) [CRAN]  10:   dplyr        (0.8.0.1     -> 0.8.3      ) [CRAN]
11:   ellipsis     (0.1.0       -> 0.2.0.1    ) [CRAN]  12:   ggplot2      (3.1.0       -> 3.2.1      ) [CRAN]
13:   ggpubr       (0.2         -> 0.2.2      ) [CRAN]  14:   ggrepel      (0.8.0       -> 0.8.1      ) [CRAN]
15:   ggsignif     (0.5.0       -> 0.6.0      ) [CRAN]  16:   gtable       (0.2.0       -> 0.3.0      ) [CRAN]
17:   hexbin       (1.27.2      -> 1.27.3     ) [CRAN]  18:   hms          (0.4.2       -> 0.5.0      ) [CRAN]
19:   httpuv       (1.4.5.1     -> 1.5.1      ) [CRAN]  20:   httr         (1.4.0       -> 1.4.1      ) [CRAN]
21:   mime         (0.6         -> 0.7        ) [CRAN]  22:   openssl      (1.2.2       -> 1.4.1      ) [CRAN]
23:   pillar       (1.3.1       -> 1.4.2      ) [CRAN]  24:   plotly       (4.8.0       -> 4.9.0      ) [CRAN]
25:   progress     (1.2.0       -> 1.2.2      ) [CRAN]  26:   Rcpp         (1.0.1       -> 1.0.2      ) [CRAN]
27:   RcppArmad... (0.9.200.7.0 -> 0.9.600.4.0) [CRAN]  28:   rlang        (0.3.1       -> 0.4.0      ) [CRAN]
29:   R.utils      (2.8.0       -> 2.9.0      ) [CRAN]  30:   shiny        (1.2.0       -> 1.3.2      ) [CRAN]
31:   sys          (2.1         -> 3.2        ) [CRAN]  32:   tibble       (2.1.1       -> 2.1.3      ) [CRAN]
33:   tidyr        (0.8.2       -> 0.8.3      ) [CRAN]  34:   vctrs        (0.1.0       -> 0.2.0      ) [CRAN]
35:   xtable       (1.8-3       -> 1.8-4      ) [CRAN]  36:   zoo          (1.8-4       -> 1.8-6      ) [CRAN]
37:   CRAN packages only                                38:   All                                             
39:   None                                              
Enter one or more numbers separated by spaces, or an empty line to cancel
1: 39
Installing 4 packages: ggbeeswarm, randomcoloR, V8, vipor
Installing packages into ‘/gfs/devel/avoda/R/x86_64-pc-linux-gnu-library/3.5’
(as ‘lib’ is unspecified)
trying URL 'https://www.stats.bris.ac.uk/R/src/contrib/ggbeeswarm_0.6.0.tar.gz'
Content type 'application/x-gzip' length 1494271 bytes (1.4 MB)
==================================================
downloaded 1.4 MB

trying URL 'https://www.stats.bris.ac.uk/R/src/contrib/randomcoloR_1.1.0.tar.gz'
Content type 'application/x-gzip' length 5854 bytes
==================================================
downloaded 5854 bytes

trying URL 'https://www.stats.bris.ac.uk/R/src/contrib/V8_2.3.tar.gz'
Content type 'application/x-gzip' length 304765 bytes (297 KB)
==================================================
downloaded 297 KB

trying URL 'https://www.stats.bris.ac.uk/R/src/contrib/vipor_0.4.5.tar.gz'
Content type 'application/x-gzip' length 4699815 bytes (4.5 MB)
==================================================
downloaded 4.5 MB

* installing *source* package ‘V8’ ...
** package ‘V8’ successfully unpacked and MD5 sums checked
Using PKG_CFLAGS=-I/usr/include/v8 -I/usr/include/v8-3.14
Using PKG_LIBS=-lv8 -lv8_libplatform
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because  was not found. Try installing:
 * deb: libv8-dev or libnode-dev (Debian / Ubuntu)
 * rpm: v8-devel (Fedora, EPEL)
 * brew: v8 (OSX)
 * csw: libv8_dev (Solaris)
To use a custom libv8, set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘V8’
* removing ‘/gfs/devel/avoda/R/x86_64-pc-linux-gnu-library/3.5/V8’
Error in i.p(...) : 
  (converted from warning) installation of package ‘V8’ had non-zero exit status

I cannot install libv8 because I lack sudo access, and I could not find a pre-compiled binary for Linux after a few hours of searching.

Is the V8 library really so necessary that it must be a dependency of bigSCale2? It isn't used by the iCells feature, for example.
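
The configure message in the log suggests pointing the build at a custom libv8 through INCLUDE_DIR and LIB_DIR. A minimal sketch of doing that from within R is shown below, using the `configure.vars` argument of `install.packages()`. The prefix under the home directory is hypothetical; it assumes libv8 has already been built or unpacked there without sudo, which this sketch does not do.

```r
# Hypothetical location of a user-level libv8 build (assumption, adjust as needed).
v8_prefix <- file.path(Sys.getenv("HOME"), "local", "v8")

# Same variables the configure script names in its error message.
configure_vars <- sprintf("INCLUDE_DIR=%s LIB_DIR=%s",
                          file.path(v8_prefix, "include"),
                          file.path(v8_prefix, "lib"))
print(configure_vars)

# Flip to TRUE to actually attempt the source install.
run_install <- FALSE
if (run_install) {
  install.packages("V8", configure.vars = configure_vars)
}
```

At run time the shared library also has to be discoverable, for instance through LD_LIBRARY_PATH, which is outside the scope of this sketch.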

Critical value way too high (1.02513e+10): Stopping the analysis while performing compute.distance under compute.network

While performing compute.network(), I am getting the following error:

Recursive clustering, beginning round 1 ....[1] "Analyzing 3622 cells for ODgenes, min_ODscore=2.33"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
Error in compute.distances(expr.norm = expr.data.norm[, which(mycl == :
Critical value way too high (1.02513e+10): Stopping the analysis!!
Calls: compute.network -> bigscale.recursive.clustering -> compute.distances
In addition: Warning messages:
1: 'Rfast::sort_mat' is deprecated.
Use 'Rfast::rowSort' instead.
See help("Deprecated")
2: In compute.distances(expr.norm = expr.data.norm[, which(mycl == :
Critical value very high (1.02513e+10): Increased memory usage!!
Execution halted

running function bigscale: Error in vector("list", tot.clusters * tot.clusters) : vector size cannot be infinite

Thanks for this wonderful tool!
I hit this error when using the test data that bigSCale2 provides.
library(bigSCale)
data(sce)
sce=bigscale(sce,speed.preset='fast')

[1] "PASSAGE 1) Setting the bins for the expression data ...."
[1] "Pre-processing) Removing null rows "
[1] "Setting the size factors ...."
[1] "Generating the edges ...."
[1] "Creating edges..."
[1] "93.9 % of elements < 10 counts, therefore Using a UMIs compatible binning"
[1] "PASSAGE 2) Storing the Normalized data ...."
[1] "PASSAGE 3) Computing the numerical model (can take from a few minutes to 30 mins) ...."
[1] "Computing Overdispersed genes ..."
[1] "Analyzing 3005 cells for ODgenes, min_ODscore=2.33"
[1] "Discarding skewed genes"
[1] "Using 15596 genes detected in at least >15 cells"
[1] "Further reducing to 15563 geni after discarding skewed genes"
[1] "Determined 1538 overdispersed genes"
[1] "Using 25 PCA components for 1538 genes and 3005 cells"
[1] "Computing t-SNE and UMAP..."
[1] "Computing the markers (slowest part) ..."
Error in vector("list", tot.clusters * tot.clusters) :
vector size cannot be infinite
In addition: Warning message:
In max(clusters) : no non-missing arguments to max; returning -Inf

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.0 DelayedArray_0.12.0 BiocParallel_1.20.0 matrixStats_0.55.0
[6] Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.0 IRanges_2.20.1 S4Vectors_0.24.0
[11] BiocGenerics_0.32.0 bigSCale_2.0

loaded via a namespace (and not attached):
[1] umap_0.2.3.1 Rcpp_1.0.3 RSpectra_0.15-0 compiler_3.6.1 pillar_1.4.2 XVector_0.26.0
[7] prettyunits_1.0.2 bitops_1.0-6 tools_3.6.1 progress_1.2.2 zlibbioc_1.32.0 zeallot_0.1.0
[13] packrat_0.5.0 jsonlite_1.6 Rtsne_0.15 lifecycle_0.1.0 gtable_0.3.0 tibble_2.1.3
[19] lattice_0.20-38 pkgconfig_2.0.3 rlang_0.4.2 Matrix_1.2-18 rstudioapi_0.10 GenomeInfoDbData_1.2.2
[25] dplyr_0.8.3 askpass_1.1 vctrs_0.2.0 hms_0.5.2 tidyselect_0.2.5 RcppZiggurat_0.1.5
[31] grid_3.6.1 Rfast_1.9.7 reticulate_1.13 glue_1.3.1 R6_2.4.1 purrr_0.3.3
[37] ggplot2_3.2.1 magrittr_1.5 scales_1.1.0 backports_1.1.5 assertthat_0.2.1 colorspace_1.4-1
[43] openssl_1.4.1 lazyeval_0.2.2 munsell_0.5.0 RCurl_1.95-4.12 crayon_1.3.4 zoo_1.8-6
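
The warning and the error in the log above fit together in plain R: `max()` of an empty vector returns -Inf (with a warning), and squaring that gives Inf, which `vector()` rejects as a length. The sketch below reproduces that chain with an empty `clusters` vector. Why the cluster assignment came back empty inside bigSCale2 is not something this example explains.

```r
# An empty cluster assignment, as the max(clusters) warning implies.
clusters <- integer(0)

# max() of nothing is -Inf; the warning is suppressed here for readability.
tot.clusters <- suppressWarnings(max(clusters))
tot.clusters

# (-Inf) * (-Inf) is Inf, which is not a valid vector length.
result <- tryCatch(
  vector("list", tot.clusters * tot.clusters),
  error = function(e) conditionMessage(e)
)
result
```

A guard such as `if (length(clusters) == 0) stop("no clusters found")` before the allocation would at least turn this into a clearer message.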

'rowSort' is not an exported object from 'namespace:Rfast'

When I try the Gene Regulatory Networks tutorial, the first step "results.ctl=compute.network(expr.data = expr.ctl,gene.names = gene.names)" reports an error:
Error: 'rowSort' is not an exported object from 'namespace:Rfast'
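
This message usually means the installed Rfast does not export `rowSort`; another log in this listing shows Rfast 1.9.7 deprecating `sort_mat` in favour of `rowSort`, so an older Rfast may be the cause and updating it might help. The helper below, `has_export`, is a hypothetical convenience function built only on base R calls; it checks whether a package is installed and exports a given name.

```r
# Hypothetical helper: TRUE only if `pkg` is installed and exports `fn`.
has_export <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

# Sanity check against a package that ships with R.
has_export("stats", "median")

# The check relevant to this issue; FALSE means Rfast is missing or too old.
has_export("Rfast", "rowSort")
```

If the second call returns FALSE, reinstalling Rfast from CRAN and restarting R before loading bigSCale would be the first thing to try.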

Failed to install 'bigSCale'

Hi everyone,

Eventually the installation was aborted, which made me sad. Logs and environment as follows.

devtools::install_github("iaconogi/bigSCale2")
Downloading GitHub repo iaconogi/bigSCale2@master
Skipping 21 packages not available: Rcpp, SingleCellExperiment, Rfast, Rtsne, float, ggplot2, zoo, dendextend, BioQC, igraph, org.Hs.eg.db, org.Mm.eg.db, RgoogleMaps, ggbeeswarm, ggpubr, plotly, heatmap3, progress, bigmemory, R.utils, RJSONIO
√ checking for file 'C:\Users\xxx\AppData\Local\Temp\Rtmpc9Thpy\remotes4745885785f\iaconogi-bigSCale2-0b688ee/DESCRIPTION' (911ms)

preparing 'bigSCale': (1.2s)
checking DESCRIPTION meta-information ...
checking DESCRIPTION meta-information ...
√ checking DESCRIPTION meta-information

cleaning src

checking for LF line-endings in source and make files and shell scripts

checking for empty or unneeded directories

looking to see if a 'data/datalist' file should be added
building

building 'bigSCale_1.7.tar.gz'
ERROR: dependencies 'SingleCellExperiment', 'Rfast', 'Rtsne', 'float', 'zoo', 'dendextend', 'BioQC', 'igraph', 'org.Hs.eg.db', 'org.Mm.eg.db', 'RgoogleMaps', 'ggbeeswarm', 'ggpubr', 'plotly', 'heatmap3', 'progress', 'bigmemory', 'R.utils', 'RJSONIO' are not available for package 'bigSCale'

removing 'C:/Program Files/R/R-3.6.1/library/bigSCale'
Error: Failed to install 'bigSCale' from GitHub:
(converted from warning) installation of package ‘C:/Users/xxx/AppData/Local/Temp/Rtmpc9Thpy/file47452de53ba/bigSCale_1.7.tar.gz’ had non-zero exit status

R version: 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
OS: Win7 x64 with sp1

Should I use Ubuntu? Looking forward to any suggestions.

Object 'out' not found when tweaking previous network

Dear Giovanni,
When I tried to tweak a previous network with 'compute.network', I got the error shown below:

ctl.grn <- compute.network(previous.output = ctl.grn, quantile.p = 0.999)
[1] "It appears you want to tweak previously created networks with a different quantile.p, proceeding ...."
Error in compute.network(previous.output = ctl.grn, quantile.p = 0.999) :
object 'out' not found
In addition: Warning message:
In if (is.na(previous.output)) { :
the condition has length > 1 and only the first element will be used

Then I searched the source code, and it seems that the variable 'out' is not defined.

Looking forward to your help.
Thanks,
Jason
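
Separately from the missing 'out' object, the warning in the log comes from calling `is.na()` on a multi-element object inside `if()`: `is.na()` returns one value per element, and from R 4.2.0 a condition of length greater than one is an error rather than a warning. A scalar-safe test could look like the sketch below. `is_missing_arg` and the toy `prev` list are hypothetical names for illustration, not part of bigSCale2.

```r
# Hypothetical scalar-safe replacement for `if (is.na(x))`.
is_missing_arg <- function(x) {
  length(x) == 1L && is.atomic(x) && is.na(x)
}

# Toy stand-in for a previous compute.network result (structure is made up).
prev <- list(network = "graph", correlations = matrix(0, 2, 2))

length(is.na(prev))    # one value per list element, not a single TRUE/FALSE
is_missing_arg(prev)   # FALSE: a previous output was supplied
is_missing_arg(NA)     # TRUE: nothing was supplied
```

This only addresses the warning; the 'out' error would still need a fix in the package source.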

Memory issue (vector memory exhausted) with very small dataset, inside iCell

Hi! The package feature iCell works wonderfully for compressing the Human Cell Atlas 270k+ cord blood as well as the 250k+ bone marrow cells to 8k & 8k high quality cells.

However, when trying to compress 68,000 PBMC 10X cells (dataset available for free download here: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/fresh_68k_pbmc_donor_a) to 8k, it gave a memory error

Error: vector memory exhausted (limit reached?)

I tried closing all the software (e.g. Chrome & other RAM eaters) and started a widget to monitor RAM usage (even though the same machine managed to handle the Human Cell Atlas Immune Census...) to make sure there's low RAM usage before and during the run. The machine is a 16GB-RAM, 2.7GHz Intel Core i7 Macbook Pro.

However, this did not work, and I got the same error.

Full console dump underneath:

> setwd("/Users/avoda/Downloads/filtered_matrices_mex/hg19")
> library(bigSCale)
> out=iCells(file.dir = "matrix.mtx",target.cells = 8000)
[1] "Attempting to reduce the size approximately 8.57237 times"
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
    clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply,
    parSapplyLB

The following objects are masked from ‘package:Matrix’:

    colMeans, colSums, rowMeans, rowSums, which

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colMeans, colnames,
    colSums, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
    paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans,
    rownames, rowSums, sapply, setdiff, sort, table, tapply, union, unique, unsplit,
    which, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:Matrix’:

    expand

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite
    Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray
Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians

Loading required package: BiocParallel

Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from ‘package:base’:

    aperm, apply

[1] "Total of 68579 cells, to be reduced with pooling factor 3 (=2+1)"
[1] "Adjusting  icells.chuncks to 50000 cells"
[1] "For the proprocessing, downsampling of 0.145817"
[1] "Incorporated 10000 cells for pre-processing"
[1] "Pre-processing) Removing null rows "
[1] "Discarding 15740 genes with all zero values"
[1] "Setting the size factors ...."
[1] "matrix"
[1] "Generating the edges ...."
[1] "95.0 % of elements < 10 counts, therefore Using a UMIs compatible binning"
            used   (Mb) gc trigger   (Mb) limit (Mb)  max used   (Mb)
Ncells   4782780  255.5    8564171  457.4         NA   6167999  329.5
Vcells 356631604 2720.9  622151691 4746.7      16384 622146352 4746.6
[1] 16998  5000
[1] "I remove 2966 genes not expressed enough"
[1] "Calculating normalized-transformed matrix ..."
[1] "Computing transformed matrix ..."
[1] "Normalizing expression gene by gene ..."
[1] "Calculating Pearson correlations ..."
[1] "Clustering  ..."
[1] "Calculating optimal cut of dendrogram for pre-clustering"
[1] "Pre-clustering: cutting the tree at 2.00 %: 98 pre-clusters of median(mean) size 22.5 (51.0204)"
[1] "Computed Numerical Model. Enumerated a total of 1.36968e+09 cases"                      
[1] "Analyzing 10000 cells for ODgenes, min_ODscore=3.00"
[1] "Discarding skewed genes"
[1] "Using 10536 genes detected in at least >20 cells"
[1] "Further reducing to 10530 geni after discarding skewed genes"
[1] "Determined  717 overdispersed genes"
[1] NA
[1] "Less than 10 cells remaining, quitting"
Error: vector memory exhausted (limit reached?)

Any help would be welcome.
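
The gc table in the dump shows a vector heap limit of 16384 Mb. On macOS, R reads that limit from the R_MAX_VSIZE environment variable at startup, so one commonly suggested workaround for "vector memory exhausted" is to raise it in ~/.Renviron and restart R. The sketch below only prepares the line; the 32Gb value is an arbitrary example, and whether a higher limit is enough for this dataset on a 16 GB machine is not established by the log.

```r
# Show the limit R was started with, if the variable is set at all.
current_limit <- Sys.getenv("R_MAX_VSIZE", unset = "(not set)")
print(current_limit)

# Line to place in ~/.Renviron (the value is a hypothetical example).
renviron_line <- "R_MAX_VSIZE=32Gb"

# Flip to TRUE to append the line; R must be restarted for it to take effect.
write_it <- FALSE
if (write_it) {
  cat(renviron_line, "\n", file = path.expand("~/.Renviron"),
      append = TRUE, sep = "")
}
```

Raising the limit lets R spill into swap, which can be very slow, so reducing the input size is often the more practical route.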

0 nodes and 0 edges after GO filtering

Just to give a brief background, I am using Smart-seq2 scRNA-seq data with 25 bp paired-end reads. I want to use the network tool from bigSCale2, so I ran compute.network in direct mode.
This identified 13617 nodes and 189399 edges in my raw regulatory network, but after GO filtering it gives 0 nodes and 0 edges, as shown below:

"Inferred the raw regulatory network: 13617 nodes and 189399 edges (ratio E/N)=13.909011"
[1] "Recognized 0/0 (NaN%) as Human, Gene Symbol"
[2] "Recognized 0/0 (NaN%) as Human, ENSEMBL"
[3] "Recognized 0/0 (NaN%) as Mouse, Gene Symbol"
[4] "Recognized 0/0 (NaN%) as Mouse, ENSEMBL"

[1] "Final network after GO filtering: 0 nodes and 0 edges (ratio E/N)=NaN and 0 components"

I am worried that this is because none of these networks is significant enough under the default parameters. Or is it possible that the tool could not match the candidate genes from my network against the GO database?

Looking forward to your kind help and guidance.

Thanks

compute.network error: Error in smooth.spline(x = sLa$x[!is.na(movSD)], y = movSD[!is.na(movSD)], : 'tol' must be strictly positive and finite

This time, while analyzing 10X Genomics RNA-seq data, I am getting another error:
Error in smooth.spline(x = sLa$x[!is.na(movSD)], y = movSD[!is.na(movSD)], :
'tol' must be strictly positive and finite
What does this error mean and how can I resolve this?
Please accept my apologies for spamming your wall with so many issues and errors.

Thanks

Error in cutree(ht, h = max(ht$height) * progressive.depth[k]/100) : the 'height' component of 'tree' is not sorted (increasingly)

Hi, as suggested, I have reinstalled the new version of bigSCale2 and re-ran it in both direct and recursive modes, but I am getting another error, as shown in the logs below:

Cmd used:
results_wt=compute.network(expr.data = wt_count,gene.names = gnames_wt,clustering = 'direct')

[1] "Pre-processing) Removing null rows "
[1] "PASSAGE 1) Setting the size factors ...."
[1] "PASSAGE 2) Setting the bins for the expression data ...."
[1] "95.4 % of elements < 10 counts, therefore Using a UMIs compatible binning"
[1] "PASSAGE 3) Storing in the single cell object the Normalized data ...."
[1] "PASSAGE 4) Computing the numerical model (can take from a few minutes to 30 mins) ...."
[1] 19389 5000
[1] "I remove 16 genes not expressed enough"
[1] "Calculating normalized-transformed matrix ..."
[1] "Computing transformed matrix ..."
[1] "Normalizing expression gene by gene ..."
[1] "Calculating Pearson correlations ..."
[1] "Clustering ..."
[1] "Calculating optimal cut of dendrogram for pre-clustering"
[1] "We are here"
[1] "Pre-clustering: cutting the tree at 1.00 %: 61 pre-clusters of median(mean) size 61 (81.9672)"
[1] "Computed Numerical Model. Enumerated a total of 3.73985e+09 cases"
[1] "PASSAGE 5) Clustering ..."
[1] "Clustering cells down to groups of approximately 50-250 cells"

Recursive clustering, beginning round 1 ....
[1] "Proceeding to calculated cell-cell distances with bigscale modality"

Recursive clustering, after round 2 obtained 3 clusters
Recursive clustering, beginning round 2 ....
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"

Recursive clustering, after round 3 obtained 7 clusters
Recursive clustering, beginning round 3 ....
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"

Recursive clustering, after round 4 obtained 20 clusters
Recursive clustering, beginning round 4 ....
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"

Recursive clustering, after round 5 obtained 54 clusters
Recursive clustering, beginning round 5 ....
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"

Recursive clustering, after round 6 obtained 120 clusters
Recursive clustering, beginning round 6 ....
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"

Recursive clustering, after round 7 obtained 255 clusters
Recursive clustering, beginning round 7 ....
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"

Recursive clustering, after round 8 obtained 380 clusters
Recursive clustering, beginning round 8 ....
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"

Recursive clustering, after round 9 obtained 465 clusters
Recursive clustering, beginning round 9 ....
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"
[1] "Proceeding to calculated cell-cell distances with bigscale modality"

Error in cutree(ht, h = max(ht$height) * progressive.depth[k]/100) :
the 'height' component of 'tree' is not sorted (increasingly)
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Could you please advise how I can resolve this?

Thanks in advance.

Error at clustering stage

I am applying bigSCale to a unique dataset: a single-cell gene expression matrix with 9320 genes and 9120 cells.
Without any preprocessing, I run bigSCale on this dataset after wrapping it in a SingleCellExperiment object. It runs fine until clustering, where I get this error:
Error in hclust(D, method = "ward.D") :
NA/NaN/Inf in foreign function call (arg 10)
I even tried a custom hclust, but the error persists.
Any ideas?
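
`hclust()` raises this error whenever the distance object it receives contains NA, NaN or Inf. One generic way such values can arise is a cell with all-zero counts, whose correlation with every other cell is undefined. The sketch below uses a made-up toy matrix and a correlation-based distance to show a check and filter in base R. It is not a description of how bigSCale computes its distances, so the actual cause in this dataset may differ.

```r
# Toy genes x cells matrix; the second cell has no counts at all.
expr <- matrix(c(1, 4, 2,
                 0, 0, 0,
                 3, 1, 5,
                 2, 2, 7), nrow = 3)

# Look for obviously problematic input before clustering.
all_zero_cells <- colSums(expr) == 0
non_finite_entries <- sum(!is.finite(expr))
which(all_zero_cells)

# Drop the empty cell, then cluster on a correlation-based distance.
filtered <- expr[, !all_zero_cells, drop = FALSE]
D <- as.dist(1 - cor(filtered))
stopifnot(all(is.finite(D)))
ht <- hclust(D, method = "ward.D")
ht$merge
```

Running the same `colSums` and `is.finite` checks on the real matrix (and on its rows, for all-zero genes) is a cheap first diagnostic.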

Error: vector memory exhausted (limit reached?)

Thank you for developing this tool !
I'm trying to use bigSCale2 Gene Regulatory Networks with a big dataset

"37625 features across 34419 samples"

and get the error below:

results.ctl=compute.network(expr.data = counts.data, gene.names = gene.names, clustering = "direct")
[1] "Pre-processing) Removing null rows "
Error: vector memory exhausted (limit reached?)
In addition: Warning message:
In compute.network(expr.data = counts.data, gene.names = gene.names, :
It seems you are running compute.network on a kind of large dataset, it it failed for memory issues you should try the compute.network for large datasets

Is there any way to work around this memory issue?

Thank you in advance!
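
A quick back-of-the-envelope calculation shows why this dataset is heavy: a dense double-precision matrix of 37625 x 34419 values needs roughly 9.6 GiB for a single copy, and intermediate copies multiply that. The sketch below does the arithmetic and shows one generic way to draw a random subset of cell indices. The subset size of 5000 is an arbitrary example, and subsampling is a general workaround rather than the package's own large-dataset procedure mentioned in the warning.

```r
n_genes <- 37625
n_cells <- 34419

# 8 bytes per double; size of one dense copy in GiB.
dense_gib <- n_genes * n_cells * 8 / 1024^3
round(dense_gib, 2)

# Randomly pick a subset of cell indices (size is a hypothetical choice).
set.seed(42)
keep <- sort(sample(n_cells, 5000))
subset_gib <- n_genes * length(keep) * 8 / 1024^3
round(subset_gib, 2)

# With a real matrix one would then use: counts.data[, keep]
```

Keeping the counts in a sparse format for as long as possible also helps, since most entries in such data are zero.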

Error in iCells.simple: object 'intermediate' not found

Hi!

When trying to compress 68,000 PBMC 10X cells (dataset available for free download here: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/fresh_68k_pbmc_donor_a) to 8k, it gave a weird error.

Command used:

out=iCells(file.dir = "matrix.mtx",target.cells = 8000, icells.chuncks=150000, verbose = TRUE)

Error message presented at end:

[1] "Detected 1 chunks to merge"
[1] "Appending chunk 1/1"
[1] "Input cells from 1 to 24600"
[1] "Writing cells from 1 to 24600"
Error in iCells.simple(file.dir = file.dir, pooling.factor = pooling1,  : 
  object 'intermediate' not found
Calls: iCells -> iCells.simple

Full verbose output:

[1] "Attempting to reduce the size approximately 8.57237 times"
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:Matrix’:

    which

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:Matrix’:

    expand

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray
Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians

Loading required package: BiocParallel

Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from ‘package:base’:

    aperm, apply, rowsum

[1] "Total of 68579 cells, to be reduced with pooling factor 3 (=2+1)"
[1] "Adjusting  icells.chuncks to 75000 cells"
[1] "For the proprocessing, downsampling of 0.145817"
[1] "reps=1"
[1] "Incorporated 10000 cells for pre-processing"
[1] "Pre-processing) Removing null rows "
[1] "Discarding 15745 genes with all zero values"
[1] "Setting the size factors ...."
[1] "matrix"
[1] "Generating the edges ...."
[1] "95.0 % of elements < 10 counts, therefore Using a UMIs compatible binning"
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   4952082  264.5    8918354  476.3   6279209  335.4
Vcells 356742416 2721.8  726276908 5541.1 614394028 4687.5
[1] 16993  5000
[1] "I remove 3043 genes not expressed enough"
[1] "Calculating normalized-transformed matrix ..."
[1] "Computing transformed matrix ..."
[1] "Normalizing expression gene by gene ..."
[1] "Calculating Pearson correlations ..."
[1] "Clustering  ..."
[1] "Calculating optimal cut of dendrogram for pre-clustering"
[1] "We are here"
[1] "Pre-clustering: cutting the tree at 2.00 %: 43 pre-clusters of median(mean) size 59 (116.279)"
[1] "Computed Numerical Model. Enumerated a total of 3.09007e+09 cases"
[1] "Analyzing 10000 cells for ODgenes, min_ODscore=3.00"
[1] "Discarding skewed genes"
[1] "Using 10523 genes detected in at least >20 cells"
[1] "Further reducing to 10517 geni after discarding skewed genes"
[1] "Determined  695 overdispersed genes"
[1] "Pre-processing round 1, cumulated 695 OD genes, sum(N)=3.09007e+09"
[1] "Total sum of cases in N 3.09007e+09"
[1] NA
[1] "Starting from 68579 unpooled cells...."
[1] ""
[1] "Computing distances ..."
[1] "Relative pooling: Estimated relative distance cutoff 0.00 (quantile 0.05)"
[1] "Launching iCells pooling ..."
[1] "USING CUTOFF 0.00"
[1] "Actual 0.1 quantile of local distances 2.23"
[1] "Sorting 68579 cells with 500 neighbours..."
[1] "Best cell has 250 good neighbours, restricting the data"
[1] "Starting to pool..."
[1] "21967 cells are still unpooled .... (pooling.factor=2)"
[1] "Starting from 21967 unpooled cells...."
[1] ""
[1] "Computing distances ..."
[1] "Relative pooling: Estimated relative distance cutoff 9.72 (quantile 0.05)"
[1] "Launching iCells pooling ..."
[1] "USING CUTOFF 9.72"
[1] "Actual 0.1 quantile of local distances 13.14"
[1] "Sorting 21967 cells with 500 neighbours..."
[1] "Best cell has 135 good neighbours, restricting the data"
[1] "Starting to pool..."
[1] "4441 cells are still unpooled .... (pooling.factor=2)"
[1] "Starting from 4441 unpooled cells...."
[1] ""
[1] "Computing distances ..."
[1] "Relative pooling: Estimated relative distance cutoff 25.38 (quantile 0.05)"
[1] "Launching iCells pooling ..."
[1] "USING CUTOFF 25.38"
[1] "Actual 0.1 quantile of local distances 32.20"
[1] "Sorting 4441 cells with 500 neighbours..."
[1] "Best cell has 102 good neighbours, restricting the data"
[1] "Starting to pool..."
[1] "451 cells are still unpooled .... (pooling.factor=2)"
[1] "Starting from 451 unpooled cells...."
[1] ""
[1] "Computing distances ..."
[1] "Relative pooling: Estimated relative distance cutoff 42.34 (quantile 0.05)"
[1] "Launching iCells pooling ..."
[1] "USING CUTOFF 42.34"
[1] "Actual 0.1 quantile of local distances 50.79"
[1] "Sorting 451 cells with 450 neighbours..."
[1] "Best cell has 86 good neighbours, restricting the data"
[1] "Starting to pool..."
[1] "39 cells are still unpooled .... (pooling.factor=2)"
[1] "Starting from 39 unpooled cells...."
[1] ""
[1] "Computing distances ..."
[1] "Relative pooling: Estimated relative distance cutoff 64.17 (quantile 0.05)"
[1] "Launching iCells pooling ..."
[1] "USING CUTOFF 64.17"
[1] "Actual 0.1 quantile of local distances 74.28"
[1] "Sorting 39 cells with 38 neighbours..."
[1] "Best cell has 6 good neighbours, restricting the data"
[1] "Starting to pool..."
[1] "14 cells are still unpooled .... (pooling.factor=2)"
[1] "Starting from 14 unpooled cells...."
[1] ""
[1] "Computing distances ..."
[1] "Relative pooling: Estimated relative distance cutoff 82.28 (quantile 0.05)"
[1] "Launching iCells pooling ..."
[1] "USING CUTOFF 82.28"
[1] "Actual 0.1 quantile of local distances 96.27"
[1] "Sorting 14 cells with 13 neighbours..."
[1] "Best cell has 2 good neighbours, restricting the data"
[1] "Starting to pool..."
[1] "7 cells are still unpooled .... (pooling.factor=2)"
[1] "Starting from 7 unpooled cells...."
[1] ""
[1] "Less than 10 cells remaining, quitting"
[1] "Reading again from source"
[1] "Processing iCells from 1 to 500"
[1] "Processing iCells from 500 to 999"
[1] "Processing iCells from 999 to 1498"
[1] "Processing iCells from 1498 to 1997"
[1] "Processing iCells from 1997 to 2496"
[1] "Processing iCells from 2496 to 2995"
[1] "Processing iCells from 2995 to 3494"
[1] "Processing iCells from 3494 to 3993"
[1] "Processing iCells from 3993 to 4492"
[1] "Processing iCells from 4492 to 4991"
[1] "Processing iCells from 4991 to 5490"
[1] "Processing iCells from 5490 to 5989"
[1] "Processing iCells from 5989 to 6488"
[1] "Processing iCells from 6488 to 6987"
[1] "Processing iCells from 6987 to 7486"
[1] "Processing iCells from 7486 to 7985"
[1] "Processing iCells from 7985 to 8484"
[1] "Processing iCells from 8484 to 8983"
[1] "Processing iCells from 8983 to 9482"
[1] "Processing iCells from 9482 to 9981"
[1] "Processing iCells from 9981 to 10480"
[1] "Processing iCells from 10480 to 10979"
[1] "Processing iCells from 10979 to 11478"
[1] "Processing iCells from 11478 to 11977"
[1] "Processing iCells from 11977 to 12476"
[1] "Processing iCells from 12476 to 12975"
[1] "Processing iCells from 12975 to 13474"
[1] "Processing iCells from 13474 to 13973"
[1] "Processing iCells from 13973 to 14472"
[1] "Processing iCells from 14472 to 14971"
[1] "Processing iCells from 14971 to 15470"
[1] "Processing iCells from 15470 to 15969"
[1] "Processing iCells from 15969 to 16468"
[1] "Processing iCells from 16468 to 16967"
[1] "Processing iCells from 16967 to 17466"
[1] "Processing iCells from 17466 to 17965"
[1] "Processing iCells from 17965 to 18464"
[1] "Processing iCells from 18464 to 18963"
[1] "Processing iCells from 18963 to 19462"
[1] "Processing iCells from 19462 to 19961"
[1] "Processing iCells from 19961 to 20460"
[1] "Processing iCells from 20460 to 20959"
[1] "Processing iCells from 20959 to 21458"
[1] "Processing iCells from 21458 to 21957"
[1] "Processing iCells from 21957 to 22456"
[1] "Processing iCells from 22456 to 22955"
[1] "Processing iCells from 22955 to 23454"
[1] "Processing iCells from 23454 to 23953"
[1] "Processing iCells from 23953 to 24452"
[1] "Processing iCells from 24452 to 24600"
[1] 24600     3
[1]     1 68579
[1]     1 68579
[1] "Total cells read (cumulative): 68579, current group of iCells ranges from 1 to 68579, total iCells range from 1 to 68579"
[1] "Detected 1 chunks to merge"
[1] "Appending chunk 1/1"
[1] "Input cells from 1 to 24600"
[1] "Writing cells from 1 to 24600"
Error in iCells.simple(file.dir = file.dir, pooling.factor = pooling1,  : 
  object 'intermediate' not found
Calls: iCells -> iCells.simple
In addition: Warning message:
'Rfast::sort_mat' is deprecated.
Use 'Rfast::rowSort' instead.
See help("Deprecated") 
Execution halted

Visualization of the regulatory network

Hi, I am planning to use networkx in Python to visualize the GRN from bigSCale2. Can I use results$graph directly with networkx? If not, how should I convert it?

Thanks
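
One way to approach the question above, sketched under assumptions: if `results$graph` is an igraph object (the question does not confirm this), its edge list can be written to CSV in R and loaded in Python afterwards. The toy graph `g` and the file name `grn_edges.csv` below are stand-ins for illustration, not bigSCale2 output.

```r
library(igraph)

# Toy stand-in for results$graph; assumed here to be an igraph object.
edges <- data.frame(from = c("GeneA", "GeneB", "GeneC"),
                    to   = c("GeneB", "GeneC", "GeneA"))
g <- graph_from_data_frame(edges, directed = FALSE)

# Flatten the graph to an edge table (endpoints plus any edge attributes).
edge_table <- igraph::as_data_frame(g, what = "edges")

# Write the table where a Python session can pick it up.
out_file <- file.path(tempdir(), "grn_edges.csv")
write.csv(edge_table, out_file, row.names = FALSE)
```

On the Python side, `pandas.read_csv` followed by `networkx.from_pandas_edgelist(df, "from", "to")` should rebuild the graph from this file.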

Can I apply infercnv on iCells?

I have a very large public dataset that my server cannot process with infercnv, even after filtering the dataset strictly. Could I run iCells on the raw counts and then apply infercnv to the result? Thank you very much.

GRN step error: dimnames applied to non-array

Dear Giovanni, Thanks for making such a nice package! I'm trying to use this for GRN inference and running into the following issue when running the example data from the tutorial:

> data("pancreas")
> results.ctl=compute.network(expr.data = expr.ctl,gene.names = gene.names,clustering = 'direct')
[1] "Pre-processing) Removing null rows "
[1] "Discarding 4182 genes with all zero values"
[1] "PASSAGE 1) Setting the size factors ...."
[1] "PASSAGE 2) Setting the bins for the expression data ...."
[1] "47.2 % of elements < 10 counts, therefore Using a reads compatible binning"
[1] "PASSAGE 3) Storing in the single cell object the Normalized data ...."
[1] "PASSAGE 4) Computing the numerical model (can take from a few minutes to 30 mins) ...."
[1] 22089  1313
[1] "I remove 1349 genes not expressed enough"
[1] "Calculating normalized-transformed matrix ..."
[1] "Computing transformed matrix ..."
[1] "Normalizing expression gene by gene ..."
[1] "Calculating Pearson correlations ..."
[1] "Clustering  ..."
[1] "Calculating optimal cut of dendrogram for pre-clustering"
[1] "Pre-clustering: cutting the tree at 6.00 %: 15 pre-clusters of median(mean) size 84 (87.5333)"
[1] "Computed Numerical Model. Enumerated a total of 1.11135e+09 cases"
[1] "PASSAGE 5) Clustering ..."
[1] "Clustering cells down to groups of approximately 50-250 cells"

Recursive clustering, beginning round 1 ....
Recursive clustering, after round 2 obtained 3 clusters
Recursive clustering, beginning round 2 ....
Recursive clustering, after round 3 obtained 8 clusters
Recursive clustering, beginning round 3 ....
Recursive clustering, after round 4 obtained 16 clusters
Recursive clustering, beginning round 4 ....
Recursive clustering, after round 5 obtained 18 clusters
Recursive clustering, beginning round 5 ....
Recursive clustering, after round 6 obtained 20 clusters
Recursive clustering, beginning round 6 ....
Recursive clustering, after round 7 obtained 20 clusters[1] "Assembling cluster average expression for 16837 genes expressed in at least 15 cells"
           used  (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells  4695558 250.8    8343664 445.6   8343664  445.6
Vcells 43342908 330.7  130680182 997.1 316068296 2411.5
[1] "Calculating Pearson ..."
           used  (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells  4695747 250.8    8343664 445.6   8343664  445.6
Vcells 32308898 246.5  104544145 797.7 316068296 2411.5
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   4695759  250.8    8343664  445.6   8343664  445.6
Vcells 315793497 2409.4  457819434 3492.9 316130223 2411.9
[1] "Calculating quantile ..."
[1] "Using 0.844034 as cutoff for pearson correlation"
[1] "Calculating Spearman ..."
[1] "Calculating the significant links ..."
[1] "Calculating the final score ..."
Error in dimnames(x) <- dn : 'dimnames' applied to non-array
In addition: There were 16 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (2.1453e+06): Increased memory usage!!
2: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (1.99952e+06): Increased memory usage!!
3: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (602734): Increased memory usage!!
4: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (1.95611e+06): Increased memory usage!!
5: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (1.37801e+06): Increased memory usage!!
6: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (246067): Increased memory usage!!
7: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (1.07938e+06): Increased memory usage!!
8: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (678642): Increased memory usage!!
9: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (1.88923e+06): Increased memory usage!!
10: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (568443): Increased memory usage!!
11: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (129351): Increased memory usage!!
12: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (719888): Increased memory usage!!
13: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (825144): Increased memory usage!!
14: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (135339): Increased memory usage!!
15: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (2.48703e+06): Increased memory usage!!
16: In compute.distances(expr.norm = expr.data.norm[, which(mycl ==  ... :
  Critical value very high (124733): Increased memory usage!!
>

The same happens when I try to use my own data as well. It seems that there is possibly some issue with this step:

bigSCale2/R/Functions.R

Line 732 in c0b655e

Df=(Ds+Dp)/float::fl(2)

but I can't figure out exactly what. Any ideas how to fix this?

Chris
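
For readers unfamiliar with this message: base R raises it whenever `dimnames<-` is applied to an object that has no `dim` attribute, such as a plain vector. The sketch below only reproduces the message in isolation. The variable names are made up, and it is not a diagnosis of what happens inside `compute.network`.

```r
# A plain vector has no dim attribute, so it cannot carry dimnames.
x <- c(0.1, 0.2, 0.3, 0.4)
msg <- tryCatch({
  dimnames(x) <- list(c("g1", "g2"), c("g1", "g2"))
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Giving the object dimensions first makes the same assignment valid.
m <- matrix(x, nrow = 2)
dimnames(m) <- list(c("g1", "g2"), c("g1", "g2"))
print(m)
```

Checking `dim()` on the intermediate objects around the line quoted above may help narrow down where the matrix shape is lost.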

compute.network error "modality argument missing"

Hi @iaconogi,

I'm having a small issue with the compute.network function. When it gets to the recursive clustering in Round 5, it throws the error that the argument modality is missing with no default.

Looking into the source code, this seems to arise because the bigscale.recursive.clustering() function calls compute.distances(), but fails to set the modality argument that it requires.

David
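
The mechanism described above can be reproduced in a few lines of base R. The function names below are hypothetical stand-ins, not the bigSCale2 source: an inner function requires `modality`, the outer one calls it without forwarding the argument, and the error only surfaces when the inner function first evaluates `modality`.

```r
# Hypothetical stand-in for a function that requires `modality`.
inner_distances <- function(x, modality) {
  # `modality` has no default, so evaluating it fails if the caller omitted it.
  if (modality == "fast") x else rev(x)
}

# Hypothetical stand-in for a caller that forgets to forward the argument.
outer_clustering <- function(x) {
  inner_distances(x)
}

msg <- tryCatch(outer_clustering(1:3), error = function(e) conditionMessage(e))
print(msg)

# Forwarding the argument removes the error.
outer_fixed <- function(x, modality = "fast") {
  inner_distances(x, modality = modality)
}
print(outer_fixed(1:3))
```

Because R evaluates arguments lazily, such a call can run for a long time and fail only in the branch that actually uses the missing argument, which would be consistent with the error appearing late in the recursive clustering.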

raw fastq

Error in Rfast::rowVars

When I try the Gene Regulatory Networks tutorial, the first step, "results.ctl=compute.network(expr.data = expr.ctl,gene.names = gene.names)", reports an error:
Error in Rfast::rowVars(expr.row.sorted[, (num.samples - skwed.cells):num.samples], :
unused argument (suma = NULL)

Bigscale Repo removed

Dear @iaconogi

I am not sure why https://github.com/iaconogi/bigSCale was removed when it is cited in the paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5991513/pdf/878.pdf

What about the reproducibility of the paper, and the readers who want to reproduce its results? I have a web archive extension installed, so I was able to see an archived version that pointed me to bigSCale2, but not every reader is tech-savvy in this respect.

Can you please bring back the bigSCale repo?

Thanks.
Rahul

compute.network error

Hi, I am getting an error while running compute.network, at this step:
Recursive clustering, after round 6 obtained 11 clusters[1] "Calculating Zscores for 17726 genes expressed in at least 15 cells"

The error message I am getting is:

Error in bigscale.DE(expr.norm = expr.norm, N_pct = N_pct, edges = edges, :
Critical value too high (1.58501e+06): Note from bigSCale author, you have to increase indA.size

How can I resolve this error? Any help will be highly appreciated.

Thanks

Using a singleCellExperiment type object converted from a Seurat type causes bigscale() to error out.

Hi,
I was using Seurat and did some QC and filtering on my single-cell data. I then converted it with as.SingleCellExperiment() to be able to use bigSCale. However, the bigscale() function throws an error after running fine for quite some time. The error occurs at passage 10 and reads as follows:

[1] "PASSAGE 10) Computing the markers (slowest part) ..."

Error in print.default() : argument "x" is missing, with no default

I am guessing there is an issue with one of the messages that is supposed to be printed, but I don't know why.
What could be the issue?

Thanks for any help in advance

Could bigscale2 export gene clusters or gene modules after analyzing GRN?

Thanks for the tools!
I am wondering whether bigscale2 could export gene clusters or gene modules after analyzing GRN.
Thanks a lot.
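
The question above does not say whether bigSCale2 has a built-in export for modules. As a generic sketch, assuming `results$graph` is an undirected igraph object, gene modules can be derived with one of igraph's community detection functions. The toy graph below stands in for the real network, and `cluster_fast_greedy` is just one possible choice of algorithm.

```r
library(igraph)

# Toy stand-in for results$graph: two disconnected triangles of genes.
edges <- data.frame(
  from = c("A1", "A2", "A3", "B1", "B2", "B3"),
  to   = c("A2", "A3", "A1", "B2", "B3", "B1")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Greedy modularity optimisation; expects an undirected graph without multi-edges.
comm <- cluster_fast_greedy(g)

# One character vector of gene names per module.
modules <- split(V(g)$name, comm$membership)
print(modules)
```

Each element of `modules` can then be written out with `writeLines()` or passed to an enrichment tool.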

viewSignatures-Error in x - a

Hi, I converted my data from a Seurat object to an SCE object. After running the bigscale() command, when I try to view my signatures I get the following error, apparently related to ggplot:

Error in x - a : non-numeric argument to binary operator
In addition: Warning messages:
1: In cbind(palette, ColSideColors) :
number of rows of result is not a multiple of vector length (arg 1)
2: In mean.default(x) : argument is not numeric or logical: returning NA
3: In mean.default(x) : argument is not numeric or logical: returning NA

I have no clue how this can happen or how I should resolve it.

I would appreciate any comments on it!

Thanks

Input file format for bigSCale2

How should we prepare the input file for bigSCale2?

Using raw UMI counts in bigscale

Hello,
How do I get started with RNA-seq data available in a CSV file (table format)? I am attaching a sample file with raw UMI counts. I need to get the gene.names and expr.ctl values from this file. Please let me know soon!

sample_subset.xlsx
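
A minimal sketch of one way to do this in base R, assuming the table has genes in rows, cells in columns, and gene names in the first column (the layout of the attached file is not shown, so this is an assumption). `read_counts` is a hypothetical helper, and the toy file only makes the example self-contained.

```r
# Hypothetical helper: split a genes-by-cells count table into the two inputs.
read_counts <- function(path) {
  # First column becomes the row names; keep cell barcodes unchanged.
  tab <- read.csv(path, row.names = 1, check.names = FALSE)
  list(expr = as.matrix(tab), gene.names = rownames(tab))
}

# Toy CSV standing in for the real count table.
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(gene  = c("GeneA", "GeneB", "GeneC"),
                  cell1 = c(0, 3, 1),
                  cell2 = c(2, 0, 5))
write.csv(toy, csv_path, row.names = FALSE)

inputs <- read_counts(csv_path)
print(dim(inputs$expr))
print(inputs$gene.names)

# With bigSCale2 loaded, the call quoted elsewhere on this page would become:
# results <- compute.network(expr.data = inputs$expr, gene.names = inputs$gene.names)
```

An .xlsx attachment would first need to be saved as CSV, or read with a spreadsheet reader, before this step.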

How can I get the cell information after iCells() downsample process?

Hi, I need to get the cell information for the expression data, that is, the column names of the expression data.
However, I read the source code and found that the iCells() function takes a sparse matrix as input, which contains only the numeric data and lacks the column names. How can I get them? Thanks~

Correlation matrix and the list of genes with centrality from bigSCale2

Hi Giovanni,
This could again be a naive question, but why is the number of genes in the correlation matrix different from the number in the list of genes with centrality?
Would you recommend looking at the genes shared by the two lists to find interesting observations?

Thanks
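
Comparing the two gene sets is straightforward with base R set operations. The sketch below uses toy objects, and their layout (a correlation matrix with gene row names, a centrality table with gene row names) is an assumption about the bigSCale2 output rather than something stated on this page.

```r
# Toy stand-ins; the layout of the real objects is assumed.
genes_corr <- c("GeneA", "GeneB", "GeneC")
corr <- matrix(1, nrow = 3, ncol = 3,
               dimnames = list(genes_corr, genes_corr))
centrality <- data.frame(degree = c(2, 1),
                         row.names = c("GeneB", "GeneD"))

# Genes present in both objects.
shared <- intersect(rownames(corr), rownames(centrality))

# Genes that appear in the correlation matrix only.
only_corr <- setdiff(rownames(corr), rownames(centrality))

print(shared)
print(only_corr)
```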

construct Gene Regulatory Networks for other species SC data

Dear iaconogi,
Thanks for such nice tools!
I am wondering how to use bigSCale2 to construct a GRN from single-cell data of other species (for example, zebrafish).
