functionlab / speedi

Single-cell Pipeline for End to End Data Integration (SPEEDI)

R 100.00%

non-commercial-license

speedi's Introduction

SPEEDI

Overview of SPEEDI
Using the SPEEDI Website
Running SPEEDI Locally
Citing SPEEDI
Need Help?

Overview of SPEEDI

Single-cell Pipeline for End to End Data Integration (SPEEDI) is a fully automated, end-to-end pipeline that facilitates single-cell data analysis and improves robustness and reproducibility. SPEEDI computationally infers batch labels and automates the application of state-of-the-art processing and analysis tools. SPEEDI also implements a reference-based cell type annotation method coupled with a majority-vote system. As input, SPEEDI takes raw-count, feature-by-barcode single-cell data matrices. As output, it produces an integrated and annotated single-cell object, a log file of the auto-selected analysis parameters, and a set of preliminary analyses.
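To give a feel for what a majority-vote step can look like, here is a minimal base-R sketch. It is not SPEEDI's implementation: the function name majority_vote, its inputs (a per-cell predicted label and a per-cell cluster ID), and the tie-breaking rule (first label in alphabetical order) are all assumptions made for illustration.

```r
# Illustrative only: `majority_vote`, `predicted`, and `cluster` are
# hypothetical names, not part of the SPEEDI package.
majority_vote <- function(predicted, cluster) {
  stopifnot(length(predicted) == length(cluster))
  # For each cluster, find the most frequent predicted label.
  # Ties go to the first label in table() order (alphabetical).
  winners <- vapply(split(predicted, cluster), function(labels) {
    counts <- table(labels)
    names(counts)[which.max(counts)]
  }, character(1))
  # Give every cell the winning label of its cluster
  unname(winners[as.character(cluster)])
}

# Toy example: six cells in two clusters
predicted <- c("T cell", "T cell", "B cell", "B cell", "B cell", "NK cell")
cluster   <- c(0, 0, 0, 1, 1, 1)
voted <- majority_vote(predicted, cluster)
print(voted)
```

In this toy example, the lone "B cell" call in cluster 0 and the lone "NK cell" call in cluster 1 are overruled by the majority label of their cluster.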

Using the SPEEDI Website

The SPEEDI Website allows users to upload their single-cell datasets to our server for processing. Users can then view and download results once processing completes. Please visit the website to learn more!

Running SPEEDI Locally

To install the SPEEDI R package locally, you can use devtools and BiocManager:

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
devtools::install_github('FunctionLab/SPEEDI', repos = BiocManager::repositories())

All R-related dependencies should be installed automatically. Note that Rtools is required to install the SPEEDI R package on Windows. To learn how to use the SPEEDI R package, please view the SPEEDI vignette.

Citing SPEEDI

The SPEEDI manuscript is currently under review.

Need Help?

If you encounter any issues using SPEEDI, feel free to contact a SPEEDI administrator ([email protected]).

speedi's Issues

Handling of SeuratData References

Originally, LoadReference loaded each reference by calling data() (e.g., data("kidneyref")). However, this doesn't work for the SeuratData reference datasets. The call fails with a warning like:

Warning message:
In data("kidneyref") : data set ‘kidneyref’ not found

Note that data() does work for non-reference datasets distributed by SeuratData. This issue has been reported by other users: satijalab/seurat-data#53

We need to figure out how to successfully use these references. I will do some more testing and report back!
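In the meantime, a small guard could make the failure explicit instead of letting the warning slip by. This is only a sketch: dataset_loads is a hypothetical helper name, and it only detects the problem rather than fixing how the references are loaded.

```r
# Hypothetical helper (not part of SPEEDI or SeuratData): returns TRUE if
# data() actually loads an object called `name`, and FALSE if data() only
# emits a warning such as "data set ... not found".
dataset_loads <- function(name, envir = new.env()) {
  tryCatch({
    utils::data(list = name, envir = envir)
    exists(name, envir = envir, inherits = FALSE)
  }, warning = function(w) FALSE)
}

# "iris" ships with R's datasets package, so data() can load it
print(dataset_loads("iris"))
# For a reference that data() cannot find, the helper reports FALSE
print(dataset_loads("kidneyref"))
```

LoadReference could call a check like this first and stop with a clear error message when a reference cannot be loaded.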

Turn SPEEDI into R package and submit to Bioconductor (or CRAN)

This issue contains two steps:

Prepare SPEEDI as an R package (keeping in mind Bioconductor standards)
Submit SPEEDI to Bioconductor and go through the review process

Before SPEEDI is accepted to Bioconductor, we can provide instructions for how to install SPEEDI directly from GitHub. Reviewers may need to use this approach if we are still waiting for Bioconductor approval when we submit the paper.

Importantly, Bioconductor follows a release schedule in which packages are released roughly every six months (usually in April and October). If this schedule is completely at odds with the publication schedule, we may need to consider submitting to CRAN instead, but I think this is unlikely.

Are the functions in utils.R necessary?

It seems like the functions in utils.R are all connected with the PCA function in that file. Is this PCA function actually used anywhere now? I saw a commented-out line using it in the original VisualizeIntegration function (since removed), but I don't see it used anywhere else. It looks like we're just using the standard Seurat::RunPCA. Can we delete utils.R?

Source for cc.gene.updated.mouse

What was the source for the cc.gene.updated.mouse file?

Add Doublet Detection to SPEEDI?

Should we consider adding doublet detection as an optional step in SPEEDI? The relevant code can be run immediately after the Seurat object is created. Something like this could be a starting point:

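# Load Seurat (for the object conversions) and the doublet detection package
library(Seurat)
library(scDblFinder)
# Find doublets per sample (assumes sc_obj has a "sample" metadata column)
sc_obj <- as.Seurat(scDblFinder(as.SingleCellExperiment(sc_obj), samples = "sample"))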
# See distribution of doublets in each sample
doublet_sc_obj <- subset(x = sc_obj, subset = scDblFinder.class %in% "doublet")
print(table(doublet_sc_obj$sample))
rm(doublet_sc_obj)
# Remove doublets
sc_obj <- subset(x = sc_obj, subset = scDblFinder.class %in% "singlet")

Create Wrapper Function for Running SPEEDI

We should add a wrapper function so that users can run SPEEDI with a single function call (specifying data path, sample IDs, organism type, and tissue type).
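As a starting point for discussion, here is a rough sketch of the shape such a wrapper could take. Everything in it is hypothetical: the names run_pipeline, stub_step, and default_steps, the step names, and the allowed organism values ("human", "mouse") are placeholders rather than SPEEDI's actual API. The stub steps only record that they ran.

```r
# Hypothetical sketch only: none of these names exist in SPEEDI.
# Each stub step just appends its name to a log so the control flow is visible.
stub_step <- function(name) {
  force(name)
  function(state, params) {
    state$log <- c(state$log, name)
    state
  }
}

# Placeholder step list; real steps would read, filter, integrate, annotate
default_steps <- function() {
  lapply(c("read", "filter", "integrate", "annotate"), stub_step)
}

run_pipeline <- function(data_path, sample_ids = NULL,
                         organism = c("human", "mouse"),
                         tissue, steps = default_steps()) {
  # Validate the user-facing arguments up front
  organism <- match.arg(organism)
  stopifnot(is.character(data_path), length(data_path) == 1,
            is.character(tissue), length(tissue) == 1)
  params <- list(data_path = data_path, sample_ids = sample_ids,
                 organism = organism, tissue = tissue)
  # Thread a single state object through each step in order
  Reduce(function(state, step) step(state, params),
         steps, list(log = character(0), params = params))
}

# Single call with placeholder inputs
result <- run_pipeline("path/to/data", sample_ids = c("s1", "s2"),
                       organism = "mouse", tissue = "kidney")
print(result$log)
```

Passing the steps in as a list keeps the wrapper itself short and would let individual steps (for example, an optional doublet detection step) be swapped in or out without changing the user-facing call.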