scDesign3


The R package scDesign3 is an all-in-one single-cell data simulation tool that uses reference datasets with different cell states (cell types, trajectories, or spatial coordinates), different modalities (gene expression, chromatin accessibility, protein abundance, DNA methylation, etc.), and complex experimental designs. The transparent parameters enable users to alter models as needed; the model evaluation metrics (AIC, BIC) and convenient visualization functions help users select models. The following figure summarizes the usage of scDesign3:

[Figure: overview of scDesign3 usage]

For more details about scDesign3, see our manuscript in Nature Biotechnology:

Song, D., Wang, Q., Yan, G. et al. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat Biotechnol (2023).

Installation

To install the development version from GitHub, please run:

if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("SONGDONGYUAN1994/scDesign3")

We are now working on submitting it to Bioconductor and will provide the link once online.

Quick Start

The following code is a quick example of running our simulator. The function scdesign3() takes a SingleCellExperiment object with the cell covariates (such as cell types, pseudotime, or spatial coordinates) stored in its colData. For more details on the SingleCellExperiment object, please see its Bioconductor page.

example_simu <- scdesign3(
    sce = example_sce,
    assay_use = "counts",
    celltype = "cell_type",
    pseudotime = "pseudotime",
    spatial = NULL,
    other_covariates = NULL,
    mu_formula = "s(pseudotime, k = 10, bs = 'cr')",
    sigma_formula = "s(pseudotime, k = 5, bs = 'cr')",
    family_use = "nb",
    n_cores = 2,
    usebam = FALSE,
    corr_formula = "1",
    copula = "gaussian",
    DT = TRUE,
    pseudo_obs = FALSE,
    return_model = FALSE,
    nonzerovar = FALSE
  )
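The input described above can be sketched with a toy object. This is a minimal illustration, assuming the SingleCellExperiment package from Bioconductor is installed; the object name `toy_sce`, the random counts, and the covariate values are made up for the example and are not part of scDesign3. The column names `cell_type` and `pseudotime` match the values passed to `celltype` and `pseudotime` in the call above.

```r
library(SingleCellExperiment)

set.seed(1)
n_genes <- 20
n_cells <- 50

# Toy count matrix (made-up data): genes in rows, cells in columns
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("cell", seq_len(n_cells))
  )
)

# Cell covariates go into colData, one row per cell
cell_covariates <- S4Vectors::DataFrame(
  cell_type = factor(sample(c("A", "B"), n_cells, replace = TRUE)),
  pseudotime = runif(n_cells),
  row.names = colnames(counts)
)

# The assay name "counts" is what assay_use refers to
toy_sce <- SingleCellExperiment(
  assays = list(counts = counts),
  colData = cell_covariates
)

# Check that the assay and covariate columns are where scdesign3() would look
print(assayNames(toy_sce))
print(colnames(colData(toy_sce)))
```

An object built this way could then be passed as `sce`, with `assay_use`, `celltype`, and `pseudotime` set to the matching names.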

The output of scdesign3() is a list which includes:

  • new_count: This is the synthetic count matrix generated by scdesign3().
  • new_covariate:
    • If the parameter ncell is set to a number that is different from the number of cells in the input data, this will be a matrix that has the new cell covariates that are used for generating new data.
    • If the parameter ncell is the default value, this will be NULL.
  • model_aic: A vector containing the AIC of the genes' marginal models, the AIC of the fitted copula, and the total AIC, which is the sum of the previous two.
  • model_bic: A vector containing the BIC of the genes' marginal models, the BIC of the fitted copula, and the total BIC, which is the sum of the previous two.
  • marginal_list:
    • If the parameter return_model is set to TRUE, this will be a list which contains the fitted gam or gamlss model for all genes in the input data. This may greatly increase the object size.
    • If the parameter return_model is set to the default value FALSE, this will be NULL.
  • corr_list:
    • If the parameter return_model is set to TRUE, this will be a list which contains either a correlation matrix (when copula = "gaussian") or the fitted vine copula (when copula = "vine") for each user-specified correlation group (based on the parameter corr_by).
    • If the parameter return_model is set to the default value FALSE, this will be NULL.
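The elements listed above can be inspected with ordinary list access. The sketch below is only an illustration: `summarise_simu()` is a hypothetical helper, not a scDesign3 function, and `mock_simu` is a placeholder list with invented values that reuses the documented element names so the code runs without fitting a model.

```r
# Hypothetical helper (not part of scDesign3): summarise the list
# returned by scdesign3() using the element names documented above.
summarise_simu <- function(simu) {
  list(
    count_dim = dim(simu$new_count),                 # genes x cells
    has_new_covariate = !is.null(simu$new_covariate),
    has_models = !is.null(simu$marginal_list) && !is.null(simu$corr_list),
    aic = simu$model_aic,
    bic = simu$model_bic
  )
}

# Placeholder standing in for real output; all numbers are invented.
mock_simu <- list(
  new_count = matrix(0L, nrow = 3, ncol = 4),
  new_covariate = NULL,    # NULL when ncell is left at its default
  model_aic = c(100, 20, 120),
  model_bic = c(110, 25, 135),
  marginal_list = NULL,    # NULL when return_model = FALSE
  corr_list = NULL
)

str(summarise_simu(mock_simu))
```

With a real result, the same call would be `summarise_simu(example_simu)`.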

Tutorials

For all detailed tutorials, please check the website. The tutorials will demonstrate the applications of scDesign3 from the following four perspectives: data simulation, model parameters, model selection, and model alteration.

  • Data simulation
    • scDesign3 introduction
    • Simulate datasets with cell library size
    • Simulate datasets with multiple lineages
    • Simulate spatial transcriptomic data
    • Simulate spot-resolution spatial data for cell-type deconvolution
    • Simulate single-cell ATAC-seq data
    • Simulate CITE-seq data
    • Simulate multi-omics data from multiple single-omic datasets
    • Simulate datasets with batch effect
    • Simulate datasets with condition effect
  • Model parameter
    • scDesign3 introduction
    • scDesign3 marginal distribution for genes
    • Compare Gaussian copula and Vine copula
  • Model selection
    • Evaluate clustering goodness-of-fit by scDesign3
    • Evaluate pseudotime goodness-of-fit by scDesign3
  • Model alteration
    • Simulate datasets with/without batch effect
    • Simulate datasets with/without condition effect
    • Simulate datasets for DE test

Contact

Questions and suggestions about scDesign3 are welcome! Please report them on issues, or contact Dongyuan Song or Qingyang Wang.

Related Manuscripts
