The R package scDesign3 is an all-in-one single-cell data simulation tool that uses reference datasets with different cell states (cell types, trajectories, or spatial coordinates), different modalities (gene expression, chromatin accessibility, protein abundance, DNA methylation, etc.), and complex experimental designs. The transparent parameters enable users to alter models as needed; the model evaluation metrics (AIC, BIC) and convenient visualization functions help users select models. The following figure summarizes the usage of scDesign3:
To find out more about scDesign3, you can check out our manuscript in Nature Biotechnology:
To install the development version from GitHub, please run:
if (!require("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_github("SONGDONGYUAN1994/scDesign3")
We are now working on submitting it to Bioconductor and will provide the link once online.
The following code is a quick example of running our simulator. The function scdesign3() takes in a SingleCellExperiment object with the cell covariates (such as cell types, pseudotime, or spatial coordinates) stored in the colData of the SingleCellExperiment object. For more details on the SingleCellExperiment object, please check its Bioconductor link.
example_simu <- scdesign3(
  sce = example_sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = "pseudotime",
  spatial = NULL,
  other_covariates = NULL,
  mu_formula = "s(pseudotime, k = 10, bs = 'cr')",
  sigma_formula = "s(pseudotime, k = 5, bs = 'cr')",
  family_use = "nb",
  n_cores = 2,
  usebam = FALSE,
  corr_formula = "1",
  copula = "gaussian",
  DT = TRUE,
  pseudo_obs = FALSE,
  return_model = FALSE,
  nonzerovar = FALSE
)
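The call above assumes that example_sce already exists. As a rough sketch of the expected input shape, the following builds a toy SingleCellExperiment whose colData carries the cell_type and pseudotime columns named by the celltype and pseudotime arguments. The toy data, the object name toy_sce, and the gene and cell names are illustrative assumptions and not part of scDesign3. The final step assumes the Bioconductor package SingleCellExperiment is installed and is skipped otherwise. A dataset this small only shows the layout and is not meant to produce a meaningful fit.

```r
set.seed(1)

n_genes <- 5
n_cells <- 20

# Toy count matrix: genes in rows, cells in columns.
toy_counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes,
  ncol = n_cells,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("cell", seq_len(n_cells))
  )
)

# One row of covariates per cell. The column names match the arguments
# celltype = "cell_type" and pseudotime = "pseudotime" used above.
cell_covariates <- data.frame(
  cell_type = factor(rep(c("A", "B"), each = n_cells / 2)),
  pseudotime = seq(0, 1, length.out = n_cells),
  row.names = colnames(toy_counts)
)

# Build the object only if the Bioconductor package is available.
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  toy_sce <- SingleCellExperiment::SingleCellExperiment(
    assays = list(counts = toy_counts),
    colData = S4Vectors::DataFrame(cell_covariates)
  )
  print(toy_sce)
}
```

The assay name counts matches assay_use = "counts" in the call above, so an object built this way could be passed as sce.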
The output of scdesign3() is a list which includes:
- new_count: The synthetic count matrix generated by scdesign3().
- new_covariate:
  - If the parameter ncell is set to a number different from the number of cells in the input data, this is a matrix of the new cell covariates used for generating the new data.
  - If the parameter ncell is left at its default value, this is NULL.
- model_aic: A vector containing the AIC of the genes' marginal models, the AIC of the fitted copula, and the total AIC, which is the sum of the previous two.
- model_bic: A vector containing the BIC of the genes' marginal models, the BIC of the fitted copula, and the total BIC, which is the sum of the previous two.
- marginal_list:
  - If the parameter return_model is set to TRUE, this is a list containing the fitted gam or gamlss model for every gene in the input data. This may greatly increase the object size.
  - If the parameter return_model is left at its default value FALSE, this is NULL.
- corr_list:
  - If the parameter return_model is set to TRUE, this is a list containing either a correlation matrix (when copula = "gaussian") or the fitted vine copula (when copula = "vine") for each user-specified correlation group (based on the parameter corr_by).
  - If the parameter return_model is left at its default value FALSE, this is NULL.
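As a small illustration of the output structure described above, the sketch below summarizes a result list by field name. Because running scdesign3() requires real data, it uses a hand-made stand-in list (mock_result) with invented numbers. The helper summarize_result() is hypothetical and not part of scDesign3, and the element order of model_aic and model_bic (marginal, copula, total) is an assumption taken from the description above.

```r
# Hypothetical helper: report the main fields of a scdesign3()-style result.
summarize_result <- function(res) {
  list(
    n_genes = nrow(res$new_count),
    n_cells = ncol(res$new_count),
    # new_covariate is NULL when ncell is left at its default.
    has_new_covariate = !is.null(res$new_covariate),
    # marginal_list and corr_list are NULL unless return_model = TRUE.
    models_returned = !is.null(res$marginal_list) && !is.null(res$corr_list),
    # Assumed order: marginal, copula, total.
    total_aic = unname(res$model_aic[3]),
    total_bic = unname(res$model_bic[3])
  )
}

# Stand-in for a real result; all numbers are invented.
mock_result <- list(
  new_count = matrix(0L, nrow = 3, ncol = 4),
  new_covariate = NULL,
  model_aic = c(marginal = 1200, copula = -50, total = 1150),
  model_bic = c(marginal = 1300, copula = -20, total = 1280),
  marginal_list = NULL,
  corr_list = NULL
)

info <- summarize_result(mock_result)
str(info)

# The total should equal the sum of the marginal and copula parts.
stopifnot(isTRUE(all.equal(
  sum(mock_result$model_aic[1:2]),
  unname(mock_result$model_aic[3])
)))
```

With a real result, replacing mock_result by the object returned from scdesign3() (for instance example_simu) should work the same way, provided the fields are laid out as described.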
For all detailed tutorials, please check the website. The tutorials will demonstrate the applications of scDesign3 from the following four perspectives: data simulation, model parameters, model selection, and model alteration.
- Data simulation
  - scDesign3 introduction
  - Simulate datasets with cell library size
  - Simulate datasets with multiple lineages
  - Simulate spatial transcriptomic data
  - Simulate spot-resolution spatial data for cell-type deconvolution
  - Simulate single-cell ATAC-seq data
  - Simulate CITE-seq data
  - Simulate multi-omics data from multiple single-omic datasets
  - Simulate datasets with batch effect
  - Simulate datasets with condition effect
- Model parameter
  - scDesign3 introduction
  - scDesign3 marginal distribution for genes
  - Compare Gaussian copula and Vine copula
- Model selection
  - Evaluate clustering goodness-of-fit by scDesign3
  - Evaluate pseudotime goodness-of-fit by scDesign3
- Model alteration
  - Simulate datasets with/without batch effect
  - Simulate datasets with/without condition effect
  - Simulate datasets for DE test
Any questions or suggestions about scDesign3 are welcome! Please report them on issues, or contact Dongyuan Song ([email protected]) or Qingyang Wang ([email protected]).
- The predecessors of scDesign3
  - scDesign: Li, W. V., & Li, J. J. (2019). A statistical simulator scDesign for rational scRNA-seq experimental design. Bioinformatics, 35(14), i41-i50.
  - scDesign2: Sun, T., Song, D., Li, W. V., & Li, J. J. (2021). scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. Genome Biology, 22(1), 1-37.
- The simulator for single-cell multi-omics reads developed by our lab member Guanao Yan