jcao89757 / scina
SCINA: A Semi-Supervised Subtyping Algorithm of Single Cells and Bulk Samples
Home Page: http://lce.biohpc.swmed.edu/scina/index.php
Hi,
I'm trying to get SCINA working for a project and am receiving the following error message:
Error in chol.default(theta[[i]]$sigma1) : 'a' must have dims > 0
In addition: Warning messages:
1: In min(c(diag(x$sigma1), diag(x$sigma2))) :
no non-missing arguments to min; returning Inf
2: In min(c(diag(x$sigma1), diag(x$sigma2))) :
no non-missing arguments to min; returning Inf
3: In min(c(diag(x$sigma1), diag(x$sigma2))) :
no non-missing arguments to min; returning Inf
4: In min(c(diag(x$sigma1), diag(x$sigma2))) :
no non-missing arguments to min; returning Inf
I have tried setting 'rm_overlap = FALSE', but that does not resolve the issue. As for the marker genes, the matrix contains around 90 marker genes in total, and genes that are not in the counts have been removed. Have you seen this error before, and do you have an idea how to fix it?
Thanks a lot in advance!
Hi there,
I noticed a potential bug I wanted to bring to your attention. I was able to run the SCINA() function successfully with 3 signatures (~10 genes each), but got the following error when running plotheat.SCINA().
Error in exp[unlist(signatures), order(factor(results$cell_labels, levels = c(names(signatures), :
subscript out of bounds
Once I removed two genes from the signatures that were not present in the expression matrix, the error disappeared and I could successfully produce a heatmap.
Hi there,
This is a really wonderful tool.
Just a little query: on line 68 of the EM_model.R file, you are converting the expression matrix into a dense matrix using exp=as.matrix(exp), before you then subset down to the gene markers. For large datasets, this requires a lot of memory! This can regularly lead to the error Cholmod error 'problem too large'.
Would it affect performance to either not densify the matrix, or to do so only after subsetting for the marker genes?
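The second option in the question above (subset first, densify second) can be sketched roughly as follows. This is only an illustration on a toy sparse matrix: subset_then_densify is a hypothetical helper, not part of SCINA, and whether SCINA's results are unchanged when it only sees the marker rows is exactly the open question in this issue.

```r
library(Matrix)  # a recommended package bundled with standard R installations

# Hypothetical helper (not part of SCINA): keep only the marker rows that exist
# in the matrix, then convert that small slice to a dense matrix.
subset_then_densify <- function(exp, signatures) {
  markers <- intersect(unique(unlist(signatures)), rownames(exp))
  as.matrix(exp[markers, , drop = FALSE])
}

# Toy sparse matrix standing in for a large single-cell dataset.
set.seed(1)
exp_sparse <- rsparsematrix(nrow = 1000, ncol = 50, density = 0.05)
rownames(exp_sparse) <- paste0("gene", seq_len(1000))
colnames(exp_sparse) <- paste0("cell", seq_len(50))

# Made-up signatures; "not_a_gene" is deliberately absent from the matrix.
signatures <- list(type_a = c("gene1", "gene2"),
                   type_b = c("gene10", "not_a_gene"))

exp_small <- subset_then_densify(exp_sparse, signatures)
dim(exp_small)  # 3 marker rows, 50 cells
```

The dense copy then scales with the number of marker genes rather than with the full gene count.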
Jonny
Hello! I am trying to use SCINA with a very big dataset: ~25,000 genes and 80,000 cells. I have a very large file of signatures downloaded from CellMarker (1,600 signatures), but I filtered out all the signatures that do not have at least 2 marker genes that overlap with my expression matrix.
When I run SCINA, I sometimes get the error "chol.default(theta[[i]]$sigma1) : 'a' must have dims > 0" and sometimes "Error in if (any(keep)) { : missing value where TRUE/FALSE needed".
I ran many tests with the same data. For example, with an exp matrix of 1,200 genes and 2,000 cells and the same signature file, always filtered with the same criteria, SCINA works with no errors.
So I want to ask whether the size of the expression matrix could be the problem. I ran multiple tests and saw that SCINA often works with no errors even with a signature file that has many markers for each signature.
Thanks a lot!
signatures contains multiple signature vectors. I noticed that if the user has a signature vector where none of the markers match exp, the SCINA::SCINA() function tosses up an error:
library(SCINA)
load(system.file('extdata','example_expmat.RData', package = "SCINA"))
load(system.file('extdata','example_signatures.RData', package = "SCINA"))
exp = exp_test$exp_data
If there is one marker in a signature list that doesn't match, things are fine:
signatures <- list(cd14_monocytes = signatures$cd14_monocytes, b_cells = signatures$b_cells, cd56_nk = c("CLIC3", "CST7", "foo"))
results = SCINA(exp, signatures, max_iter = 100, convergence_n = 10, convergence_rate = 0.999, sensitivity_cutoff = 0.9, rm_overlap=TRUE, allow_unknown=TRUE, log_file='SCINA.log')
However, if there is a signature vector where all markers fail to match, SCINA::SCINA() errors out:
signatures <- list(cd14_monocytes = signatures$cd14_monocytes, b_cells = signatures$b_cells, cd56_nk = c("foo", "bar"))
results = SCINA(exp, signatures, max_iter = 100, convergence_n = 10, convergence_rate = 0.999, sensitivity_cutoff = 0.9, rm_overlap=TRUE, allow_unknown=TRUE, log_file='SCINA.log')
Error in chol.default(theta[[i]]$sigma1) : 'a' must have dims > 0
In addition: Warning message:
In min(c(diag(x$sigma1), diag(x$sigma2))) :
no non-missing arguments to min; returning Inf
It seems to me it would be best to remove empty vectors from the theta (list) object. I'll take a shot at a PR.
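Until something like that lands, a user-side workaround could look roughly like the sketch below. It cleans the signature list before calling SCINA rather than touching theta inside the package. clean_signatures is a hypothetical helper and the gene names are made up for illustration; markers are compared verbatim against the row names of the expression matrix.

```r
# Hypothetical helper (not part of SCINA): drop markers that are absent from
# the expression matrix, then drop any signature left with no markers at all.
clean_signatures <- function(signatures, genes) {
  kept <- lapply(signatures, function(markers) intersect(markers, genes))
  kept[lengths(kept) > 0]
}

# Toy row names standing in for rownames(exp).
genes <- c("CD14", "LYZ", "CD79A", "MS4A1", "CLIC3", "CST7")

signatures <- list(cd14_monocytes = c("CD14", "LYZ"),
                   b_cells        = c("CD79A", "MS4A1", "foo"),
                   cd56_nk        = c("foo", "bar"))

cleaned <- clean_signatures(signatures, genes)
names(cleaned)   # cd56_nk is gone because none of its markers matched
cleaned$b_cells  # "foo" is dropped, the matching markers are kept
```

Comparing names(signatures) with names(cleaned) afterwards shows which cell types were dropped, so they are not lost silently.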
Hi @jcao89757 ,
Very cool software package!
To my understanding, SCINA is based on an EM framework, which would involve a random initialization step each time SCINA is invoked. However, I am receiving identical results for a particular dataset every time I run SCINA. Is there a hidden set.seed() function somewhere?
I want to run SCINA 10 times and determine how stable the cell type annotations are across these 10 iterations.
Thanks in advance!
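For the stability comparison itself, one simple measure is the fraction of cells that receive the same label in each pair of runs. The sketch below assumes the label vectors from each run (for example results$cell_labels) have already been collected in a list and cover the same cells in the same order. label_agreement is a hypothetical helper and the labels are made up.

```r
# Hypothetical helper: pairwise fraction of cells with identical labels.
# `runs` is a list of character vectors, one per run, all of the same length.
label_agreement <- function(runs) {
  n <- length(runs)
  agree <- matrix(1, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      agree[i, j] <- mean(runs[[i]] == runs[[j]])
    }
  }
  agree
}

# Made-up labels for four cells across three runs.
runs <- list(c("B", "T", "T",  "unknown"),
             c("B", "T", "NK", "unknown"),
             c("B", "T", "T",  "unknown"))

agreement <- label_agreement(runs)
agreement  # symmetric matrix with 1 on the diagonal
```

If every run returns identical labels, as described above, every entry of the matrix is 1.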
Hello:
I would like to ask whether it is possible to annotate the results by Seurat cluster, like the cluster parameter of SingleR.
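One way to approximate this outside the package is a majority vote: take the per-cell labels and give each cluster its most frequent label. The sketch below is a post-processing step on made-up vectors, not a SCINA feature. cluster_majority_label is a hypothetical helper, and the cluster vector would come from wherever the clustering is stored.

```r
# Hypothetical helper: give each cluster the most frequent per-cell label.
# Ties are resolved by taking the first label in table() order.
cluster_majority_label <- function(cell_labels, clusters) {
  votes <- table(clusters, cell_labels)
  winners <- colnames(votes)[apply(votes, 1, which.max)]
  setNames(winners, rownames(votes))
}

# Made-up per-cell labels and cluster assignments.
cell_labels <- c("B", "B", "unknown", "T", "T", "T")
clusters    <- c("0", "0", "0",       "1", "1", "1")

cluster_labels <- cluster_majority_label(cell_labels, clusters)
cluster_labels  # one label per cluster, named by cluster id
```

Indexing cluster_labels by the cluster vector, cluster_labels[clusters], maps the result back to one label per cell.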
Hello. When I run the following command, I get the error:
Command:
results = SCINA(exp, signatures, max_iter = 100, convergence_n = 10, convergence_rate = 0.999, sensitivity_cutoff = 0.9, rm_overlap=F, allow_unknown=TRUE, log_file='SCINA.log')
Error:
Error in chol.default(theta[[i]]$sigma1) : 'a' must have dims > 0
I use SCINA to annotate my single-cell data. I ran into two problems.
1. All cells are labeled unknown:
> table(results$cell_labels)
unknown
95520
When I remove the "Monocyte" and "DC" genes from the signatures, I get results like this:
> table(results1$cell_labels)
B                        CD4 T                    CD4 Trg
2207                     14380                    3212
CD8T-NKT                 Cholangiocyte            Cycling T
13438                    869                      1195
Endothelial-1            Endothelial-2            Hepatocyte
5060                     2641                     4366
Macrophage               NK                       Normal-like Endothelial
8714                     7647                     2748
pDC                      Plasma                   Stellate cell
2951                     1709                     1247
unknown
23136
What is the reason for that?
2. My single-cell data has 9.5W (about 95,000) cells. I cannot get a heatmap plot from the plotheat.SCINA function. There was no error message; it just didn't work.
Is it possible to add a parameter so that the generated image is not displayed directly?
Something like the argument "silent=T" of the function plotScoreHeatmap().
Hello, I'm using SCINA_1.2.0 with R version 4.3.2 (2023-10-31).
When I plotted the heatmap on my dataset, I noticed that the legend for the column colors listed all signatures as opposed to those actually assigned to my cells, with a consequent mismatch in the plot.
I think the problem is with this part of the plotheat.SCINA code:
legend_text = c(paste("Gene identifiers", names(signatures), sep = "_"), names(signatures), "unknown")
Thanks
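One possible direction, sketched below, is to build the label part of the legend only from the labels that actually occur in the results. This has not been tested against plotheat.SCINA. legend_for_assigned is a hypothetical helper, the signatures and labels are made up, and the colour vector used by the plot would presumably need the same subsetting for the legend and the column colours to line up.

```r
# Hypothetical helper: keep the per-signature gene identifiers, but list only
# the cell-type labels (plus "unknown") that were actually assigned to cells.
legend_for_assigned <- function(signatures, cell_labels) {
  assigned <- intersect(c(names(signatures), "unknown"), unique(cell_labels))
  c(paste("Gene identifiers", names(signatures), sep = "_"), assigned)
}

# Made-up signatures and results: no cell was assigned to t_cells.
signatures  <- list(b_cells = "CD79A", t_cells = "CD3E", nk = "NKG7")
cell_labels <- c("b_cells", "unknown", "b_cells", "nk")

legend_text <- legend_for_assigned(signatures, cell_labels)
legend_text
```

Here "t_cells" no longer appears among the cell-type entries, while its gene identifier entry is kept.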
Hello !
I'm trying to annotate cells using my own gene list. However, unknown labels always come out even though I used the 'allow_unknown = FALSE' option. I tried creating the Seurat object again, but the result was the same. My code is here:
scina.results <- SCINA(scina.data, list_pnas, max_iter = 100,
convergence_n = 10,
convergence_rate = 0.99,
sensitivity_cutoff = 0.9,
rm_overlap = TRUE,allow_unknown = FALSE)
table(scina.results$cell_labels)
Immature_neuron RGC unknown
18498 1972 11107
Is there any mistake in my code?
Thank you in advance!!!
Dahun
Hi, I'm working with SCINA, but unfortunately when running the SCINA function I get the following error message:
Error in if (any(keep)) { : missing value where TRUE/FALSE needed
Can you help me please?
Best
Massimo
Hello!
I'm new to SCINA.
I get an error when running SCINA.
My gene list has only 11 genes, but I can't find any annotated cells...
scina.results.non <- SCINA(scina.data, list_test, max_iter = 100, convergence_n = 10, convergence_rate = 0.999, sensitivity_cutoff = 0.9, rm_overlap = FALSE,allow_unknown = FALSE)
Error in if (any(keep)) { : missing value where TRUE/FALSE needed
I searched the Issues section, but I don't think my situation is the same.
Could you give me any suggestions about this?
Thanks!
Thank you for this useful package!
I noticed that the GitHub version of the project does not carry an open source license, which restricts any ability to modify or distribute the code. The CRAN version carries the GPL-2 license, which also substantially restricts the ability of published works to use SCINA. (Here is a good summary of the issue for CRAN packages with GPL licenses.)
Would you consider adding a more permissive license to the GitHub version (or better yet, both versions) so that the package can be included in distributed projects that may not be able to carry the GPL-2 license for a variety of reasons?
Hi jcao,
Will there be a Python version of SCINA?
Best,
Laura
Hi,
Is there a way to define a signature with negative expression (or low expression) of a specific marker?
For instance, CD27-IgD+ and CD27+IgD+ cells?
What would the signature vector look like?
thank you
I constructed gene markers in CSV file format and made sure that the gene symbols are consistent with the exp matrix. When I called SCINA(), I got the following error:
Error in chol.default(theta[[i]]$sigma1) : 'a' must have dims > 0
In addition: Warning message:
In min(c(diag(x$sigma1), diag(x$sigma2))) :
no non-missing arguments to min; returning Inf