scpca-nf

This repository holds a Nextflow workflow (scpca-nf) that is used to process 10X single-cell data as part of the Single-cell Pediatric Cancer Atlas (ScPCA) project. All dependencies for the workflow outside of the Nextflow workflow engine itself are handled automatically; setup generally requires only organizing the input files and configuring Nextflow for your computing environment. Nextflow will also handle parallelizing sample processing as allowed by your environment, minimizing total run time.

The workflow processes fastq files from single-cell and single-nuclei RNA-seq samples using alevin-fry to create gene-by-cell matrices. Reads from samples are aligned using selective alignment to an index with transcripts corresponding to both spliced cDNA and intronic regions, which alevin-fry denotes as splici. The resulting matrices are filtered, and additional processing is performed to calculate quality control statistics, create reduced-dimension transformations, assign cell types using both SingleR and CellAssign, and create output reports. The workflow outputs gene expression data in two formats: as SingleCellExperiment objects and as AnnData objects. scpca-nf can also process libraries with ADT tags (e.g., CITE-seq), multiplexed libraries (e.g., cell hashing), bulk RNA-seq, and spatial transcriptomics samples.

For more information on the contents of the output files and the processing of all modalities, please see the ScPCA Portal docs.

Overview of Workflow

Using scpca-nf to process your samples

The default configuration of the scpca-nf workflow is currently set up to process samples as part of the ScPCA Portal and requires access to AWS through the Data Lab. For all other users, scpca-nf can be set up for your computing environment with a few configuration files.

Instructions for using scpca-nf

โš ๏ธ Please note that processing single-cell and single-nuclei RNA-seq samples requires access to a high performance computing (HPC) environment with nodes that can accommodate jobs requiring up to 24 GB of RAM and 12 CPUs.

To run scpca-nf on your own samples, you will need to complete the following steps:

  1. Organize your files so that each folder contains fastq files relevant to a single sequencing run.
  2. Prepare a run metadata file with one row per library containing all information needed to process your samples.
  3. Prepare a sample metadata file with one row per sample containing any relevant metadata about each sample (e.g., diagnosis, age, sex, cell line).
  4. Set up a configuration file, including the definition of a profile, dictating where Nextflow should execute the workflow.
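As a rough illustration of steps 1 and 2, the following Python sketch scans a parent directory in which each subfolder holds the fastq files for one sequencing run and writes a tab-separated table with one row per folder. This is not part of scpca-nf: the function names, the column names (`run_id`, `files_directory`, `n_fastq_files`), and the simplifying assumption that each run folder corresponds to exactly one library are all hypothetical. The columns the workflow actually requires are described in the linked instructions.

```python
import csv
from pathlib import Path

# File endings treated as fastq files in this sketch.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def find_run_folders(root: Path) -> dict[str, list[str]]:
    """Map each immediate subfolder of `root` to the fastq files it contains.

    Folders without any fastq files are skipped.
    """
    runs = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        fastqs = sorted(
            f.name
            for f in folder.iterdir()
            if f.is_file() and f.name.endswith(FASTQ_SUFFIXES)
        )
        if fastqs:
            runs[folder.name] = fastqs
    return runs


def write_run_metadata(root: Path, out_path: Path) -> int:
    """Write one row per run folder to a TSV file and return the row count."""
    runs = find_run_folders(root)
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        # Hypothetical column names, for illustration only; they are not
        # the columns scpca-nf expects.
        writer.writerow(["run_id", "files_directory", "n_fastq_files"])
        for run_id, fastqs in runs.items():
            writer.writerow([run_id, str(root / run_id), len(fastqs)])
    return len(runs)
```

Calling `write_run_metadata(Path("fastq"), Path("run_metadata.tsv"))` on a layout such as `fastq/run01/`, `fastq/run02/` would produce a starting table that you then extend by hand with the remaining per-library information.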

You may also test your configuration file using example data.

For ALSF Data Lab users, please refer to the internal instructions for how to run the workflow on our systems.
