Giter Club home page Giter Club logo

adtseq's Introduction

adtseq

adtseq provides simple Rcpp methods to identify Antibody-Derived-Tags in single cell CITE/REAP-seq data. Currently the function is setup to accept the 15 nt Total-Seq antibody barcodes provided and distributed by BioLegend.

Installation

Currently adtseq requires the Seqan headers provided by the RSeqan package, the pacakge currently links to the Boost headers provided by the BH package, thus it must also be installed - these are primarily used in some utility diagnostic functions (in the dev branch). Please refer to each of these packages for their installation requirements.

NOTE: the primary function will be able to accept BAM files only if the Seqan headers are linked to zlib (should be present on most *nix type systems).

install.packages(c("Rcpp", "BH"))
BiocManager::install("RSeqan")
install_github("JulianSpagnuolo/adtseq")

Citation

It is requested that any published or presented work that utlised this package and scripts contained within cite adtseq.

#> 
#> To cite package 'adtseq' in publications use:
#> 
#>   Julian Spagnuolo (2018). adtseq: Identification of
#>   Antibody-Derived-Tags in single cell CITE/REAP-seq data. R
#>   package version 0.99.3.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {adtseq: Identification of Antibody-Derived-Tags in single cell
#> CITE/REAP-seq data},
#>     author = {Julian Spagnuolo},
#>     year = {2018},
#>     note = {R package version 0.99.3},
#>   }
#> 
#> ATTENTION: This citation information has been auto-generated from
#> the package DESCRIPTION file and may need manual editing, see
#> 'help("citation")'.

Example

The primary function adtseq will take two files as input; A) a fasta file containing the antibody derived barcodes to identify in the sequencing data, B) a bam or sam file that has been processed through the dropseq-tools pipeline to tag read 2 reads with the cell barcodes and UMIs, remove the SMART-PCR adapter sequences and (optionally) 3’ polyA stretches. It is intended that this function be used at command line as it is currently extremely verbose and will output cell barcodes, UMI, antibody-ID, hamming distance and map score directly to std output.

The function will tag each read with the antibody barcode name (form the fasta file) using a combination of hamming distance (less than max_dist) and best Phred quality weighted mapping score calculated as the sum of scores for each base, subtracting the score at mismatches or adding at matches. This is performed similarly to the method used by Bowtie2 to determine alignment scores.

Score = MN + ((MX - MN) * (Phred/40))

At present Bowtie2’s defaults for MN (minimum score) and MX (maximum score), 2 and 6, respectively, are used.

The function will place the name of the identified antibody in a new tag “XA”, the mapping score under “AS”, and the hamming distance under “XN”. Additionally, the function will output a second tab-delimited text file containing the Cell Barcodes, UMI and identified antibody along with the hamming distance and mapping scores for each bam/sam record.

The resulting BAM/SAM file can be directly used in latter stages of the dropseq-tools pipeline which attempt to correct for synthesis errors in the cell barcodes and UMIs. However, should users wish, they may retrieve count data directly from the text file and omit the latter steps (not recommended).

adtseq(bamFileName = "ADT_processed_u.sam", bamOut = "out.sam", adtFasta = "total_seq_panel.fasta", max_dist = 4, sumoutput = "summary.txt", adt_panel = "A", bc_length = 15)

NOTE:

Experimental support for TotalSeq panels B and C has been implemented but not tested. In addition, support for custom barcodes has also been implemented, this is controlled via the “adt_panel” parameter. The bc_length parameter must be set for custom panels.

Default is to use the TotalSeq Panel “A” regex rules and a barcode length (bc_length) of 15.

# TotalSeq Panel A
adtseq(bamFileName = "ADT_processed_u.sam", bamOut = "out.sam", adtFasta = "total_seq_panel.fasta", max_dist = 4, sumoutput = "summary.txt", adt_panel = "A", bc_length = 15)
# TotalSeq Panel B
adtseq(bamFileName = "ADT_processed_u.sam", bamOut = "out.sam", adtFasta = "total_seq_panelB.fasta", max_dist = 4, sumoutput = "summary.txt", adt_panel = "B", bc_length = 15)
# TotalSeq Panel C
adtseq(bamFileName = "ADT_processed_u.sam", bamOut = "out.sam", adtFasta = "total_seq_panelC.fasta", max_dist = 4, sumoutput = "summary.txt", adt_panel = "C", bc_length = 15)
# Custom panel
adtseq(bamFileName = "ADT_processed_u.sam", bamOut = "out.sam", adtFasta = "custom_panel.fasta", max_dist = 4, sumoutput = "summary.txt", adt_panel = "custom", bc_length = 8)

adtseq's People

Contributors

julianspagnuolo avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.