Giter Club home page Giter Club logo

splice2neo's Introduction

splice2neo

R-CMD-check Codecov test coverage

Documentation: https://tron-bioinformatics.github.io/splice2neo/

Overview

This package provides functions for the analysis of splice junctions and their association with somatic mutations. It integrates the output of several tools that predict splicing effects from mutations or detect expressed splice junctions from RNA-seq data into a standardized splice junction format based on genomic coordinates. Users can filter detected splice junctions against canonical splice junctions and annotate them with affected transcript sequences, CDS, and resulting peptide sequences. Splice2neo supports splice events from alternative 3’/5’ splice sites, exons skipping, intron retentions, exitrons, and mutually exclusive exons. Integrating splice2neo functions and detection rules based on splice effect scores and RNA-seq support facilitates the identification of mutation-associated splice junctions that are tumor-specific and can encode neoantigen candidates.

Currently, splice2neo supports the output data from the following tools:

For a more detailed description and a full example workflow see the vignette

Installation

The R package splice2neo is not yet on CRAN or Bioconductor. Therefore, you need to install splice2neo from github.

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

remotes::install_github("TRON-Bioinformatics/splice2neo")

Example usage

This is a basic example of how to use some functions.

library(splice2neo)

# load human genome reference sequence
requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)
bsg <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19

1 Example data

We start with some example splice junctions provided with the package.

junc_df <- dplyr::tibble(
  junc_id = toy_junc_id[c(1, 6, 10)]
)

junc_df
#> # A tibble: 3 × 1
#>   junc_id                   
#>   <chr>                     
#> 1 chr2:152389996-152392205:-
#> 2 chr2:179415981-179416357:-
#> 3 chr2:179446225-179446226:-

2 Add transcripts

Next, we find the transcripts which are in the same genomic region as the splice junction and that may be affected by the junction.

junc_df %>% 
  add_tx(toy_transcripts) %>% 
  head()
#> # A tibble: 6 × 3
#>   junc_id                    tx_id           tx_lst      
#>   <chr>                      <chr>           <named list>
#> 1 chr2:152389996-152392205:- ENST00000409198 <GRanges>   
#> 2 chr2:152389996-152392205:- ENST00000172853 <GRanges>   
#> 3 chr2:152389996-152392205:- ENST00000397345 <GRanges>   
#> 4 chr2:152389996-152392205:- ENST00000427231 <GRanges>   
#> 5 chr2:152389996-152392205:- ENST00000618972 <GRanges>   
#> 6 chr2:152389996-152392205:- ENST00000413693 <GRanges>

3 Modify transcripts with junctions

We modify the canonical transcripts by introducing the splice junctions. Then we add the transcript sequence in a fixed-sized window around the junction positions, the context sequence.

toy_junc_df %>% 
  head()
#> # A tibble: 6 × 2
#>   junc_id                    tx_id          
#>   <chr>                      <chr>          
#> 1 chr2:152389996-152392205:- ENST00000409198
#> 2 chr2:152389996-152390729:- ENST00000409198
#> 3 chr2:152389955-152389956:- ENST00000397345
#> 4 chr2:152388410-152392205:- ENST00000409198
#> 5 chr2:152388410-152390729:- ENST00000409198
#> 6 chr2:179415981-179416357:- ENST00000342992


toy_junc_df %>% 
  add_context_seq(transcripts = toy_transcripts, size = 400, bsg = bsg) %>% 
  head()
#> # A tibble: 6 × 8
#>   junc_id       tx_id tx_mod_id junc_pos_tx cts_seq cts_junc_pos cts_size cts_id
#>   <chr>         <chr> <chr>           <int> <chr>   <chr>           <int> <chr> 
#> 1 chr2:1523899… ENST… ENST0000…       16412 AAGAAG… 200               400 90bfc…
#> 2 chr2:1523899… ENST… ENST0000…       16517 AAGAAG… 200               400 26f77…
#> 3 chr2:1523899… ENST… ENST0000…       21620 AAGAAG… 0,200,1745,…     1945 f1f2c…
#> 4 chr2:1523884… ENST… ENST0000…       16412 AAGAAG… 200               400 d4f9e…
#> 5 chr2:1523884… ENST… ENST0000…       16517 AAGAAG… 200               400 c715a…
#> 6 chr2:1794159… ENST… ENST0000…       83789 TGGATT… 200               400 0128d…

4 Annotate peptide sequence

Here, we use the splice junctions to modify the coding sequences (CDS) of the reference transcripts. The resulting CDS sequences are translated into protein sequence and further annotated with the peptide around the junction, the relative position of the splice junction in the peptide, and the location of the junction in an open reading frame (ORF).

toy_junc_df %>% 
  
  # add peptide sequence
  add_peptide(cds=toy_cds, flanking_size = 13, bsg = bsg) %>% 

  # select subset of columns
  dplyr::select(junc_id, peptide_context, junc_in_orf, cds_description) %>% 
  head()
#> # A tibble: 6 × 4
#>   junc_id                    peptide_context         junc_in_orf cds_description
#>   <chr>                      <chr>                   <lgl>       <chr>          
#> 1 chr2:152389996-152392205:- NRHFKYATQLMNEIC         TRUE        mutated cds    
#> 2 chr2:152389996-152390729:- HLLAKTAGDQISQIC         TRUE        mutated cds    
#> 3 chr2:152389955-152389956:- MLTALYNSHMWSQVMSDGM     TRUE        mutated cds    
#> 4 chr2:152388410-152392205:- NRHFKYATQLMNEIKYRKNYEK… TRUE        mutated cds    
#> 5 chr2:152388410-152390729:- <NA>                    TRUE        truncated cds  
#> 6 chr2:179415981-179416357:- SDPSKFTLAVSPVAGTPDYIDV… TRUE        mutated cds

Issues

Please report issues here: https://github.com/TRON-Bioinformatics/splice2neo/issues

Citation

Lang, Franziska, Patrick Sorn, Martin Suchan, Alina Henrich, Christian Albrecht, Nina Koehl, Aline Beicht, et al. 2023. “Prediction of Tumor-Specific Splicing from Somatic Mutations as a Source of Neoantigen Candidates.” bioRxiv. https://doi.org/10.1101/2023.06.27.546494.

splice2neo's People

Contributors

federicomarini avatar franla23 avatar ibn-salem avatar johausmann avatar patricksorn avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

federicomarini

splice2neo's Issues

hg38 reference genome

Hi,
Is it possible to run splice2neo by using hg38 genome reference sequence?

Thank you in advance.

Elisa

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.