zhangxiaotuo / labranchor

This project forked from jpaggi/labranchor

LaBranchoR predicts RNA splicing branchpoints using an LSTM network. Its only input feature is the sequence 1 to 70 base pairs upstream of a three prime splice site. In our paper, we show that LaBranchoR predicts a correct branchpoint for over 90% of 3' splice sites. See our website for precomputed predictions for all introns in gencode v19.

Home Page: http://bejerano.stanford.edu/labranchor/

labranchor's Introduction

LaBranchoR (LSTM Branchpoint Retriever)

LaBranchoR uses an LSTM network built with keras to predict the position of RNA splicing branchpoints relative to a three prime splice site (3'ss). Precisely evaluating LaBranchoR was challenging due to pervasive noise in the experimental data but, as we show in our paper, we estimate that LaBranchoR correctly predicts a branchpoint for over 90% of 3'ss.

Paggi, J.M., Bejerano, G. A sequence-based, deep learning model accurately predicts RNA splicing branchpoints. bioRxiv 185868 (2017). DOI:10.1101/185868

Download existing branchpoint annotations

See our website linked above to download branchpoint predictions for introns in gencode v19 (hg19) or view LaBranchoR predicted branchpoints in the UCSC genome browser.

Running LaBranchoR

If having to run the model yourself would stop you from using LaBranchoR, please open an issue requesting the desired predictions or contact the authors via email.

All of the code and model weights needed to run LaBranchoR are available in the 'labranchor' directory. Running LaBranchoR requires keras and numpy to be installed.

Predicting branchpoints

The script labranchor.py makes predictions for a fasta file of sequences upstream of 3'ss. It can be invoked with

python labranchor.py weights 'top-bed'/'top'/'all' fasta_file output

weights: The path to the h5 weights file (labranchor/2layer.h5)

'top-bed'/'top'/'all': The output mode.

top-bed: Produces a bed file of predicted branchpoints. Assumes fasta names are formatted as chrom:3'ss_coord:strand (e.g. chr1:1000:+).

top: Reports the shift of the top scoring branchpoint from the associated 3'ss for each fasta entry.

all: Reports a comma separated list of branchpoint probabilities corresponding to positions -70 to -1 from each 3'ss.

fasta_file: Path to a fasta file of sequences upstream of 3'ss. Input sequences are required to be 70 base pairs and should not contain characters other than 'A', 'C', 'G', 'T', or 'N'. Any Ns will be considered A's during prediction.

output: Path to the output file. See the above options for formatting.
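The input constraints and output modes described above can be illustrated with a minimal sketch. It is not taken from labranchor.py: the function names are hypothetical, uppercasing the input is an added convenience, and the genomic coordinate arithmetic in branchpoint_coord is an assumption about strand handling (the exact bed convention used by 'top-bed', including any off-by-one, should be checked against the script).

```python
VALID_BASES = set("ACGTN")
WINDOW = 70  # sequence length required for each fasta entry


def prepare_sequence(seq, length=WINDOW):
    """Check one input sequence and mirror the documented N -> A handling."""
    seq = seq.strip().upper()  # assumption: uppercasing is not documented
    if len(seq) != length:
        raise ValueError(f"expected {length} bases, got {len(seq)}")
    unexpected = set(seq) - VALID_BASES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.replace("N", "A")


def top_shift(prob_field, length=WINDOW):
    """Convert an 'all'-style probability list into a 'top'-style shift."""
    probs = [float(p) for p in prob_field.split(",")]
    if len(probs) != length:
        raise ValueError(f"expected {length} probabilities, got {len(probs)}")
    best = max(range(length), key=probs.__getitem__)
    return best - length  # index 0 is position -70, index 69 is position -1


def branchpoint_coord(name, shift):
    """Place a shift on the genome for a name like chr1:1000:+.

    Assumption: upstream means lower coordinates on the + strand and
    higher coordinates on the - strand.
    """
    chrom, coord, strand = name.rsplit(":", 2)
    if strand not in ("+", "-"):
        raise ValueError(f"unknown strand: {strand!r}")
    position = int(coord) + shift if strand == "+" else int(coord) - shift
    return chrom, position, strand
```

For example, a probability list whose largest value sits at index 45 corresponds to a shift of -25, which this sketch would place at chr1:975 for the entry chr1:1000:+ and at chr1:1025 for chr1:1000:-.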

Creating 3'ss sequence fasta files

The script create_fasta.py can be used to create fasta files suitable for branchpoint prediction for all introns in a given gtf file.

It can be invoked with:

python create_fasta.py genome gtf output

genome: A path to a genome fasta file consistent with the gtf file.

gtf: The path to the gtf file you wish to predict branchpoints in.

output: The path to the output fasta file.
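To make the idea of a "sequence upstream of a 3'ss" concrete, here is a minimal sketch of extracting such a window from a chromosome string. It is not the code in create_fasta.py: the function names and the coordinate convention (0-based, half-open interval of the exon that follows the intron) are assumptions chosen for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_of_3ss(chrom_seq, exon_start, exon_end, strand, length=70):
    """Return the `length` intronic bases immediately upstream of a 3'ss.

    exon_start and exon_end describe the exon downstream of the intron as a
    0-based, half-open genomic interval (an assumption of this sketch; gtf
    files are 1-based and inclusive, so exon_start = gtf_start - 1 and
    exon_end = gtf_end).
    """
    if strand == "+":
        # The intron lies at lower coordinates than the exon.
        start = exon_start - length
        if start < 0:
            raise ValueError("window runs off the start of the chromosome")
        return chrom_seq[start:exon_start].upper()
    if strand == "-":
        # The intron lies at higher coordinates; read it on the minus strand.
        end = exon_end + length
        if end > len(chrom_seq):
            raise ValueError("window runs off the end of the chromosome")
        return reverse_complement(chrom_seq[exon_end:end].upper())
    raise ValueError(f"unknown strand: {strand!r}")
```

On the minus strand the window is taken from the genomic side after the exon and then reverse complemented, so the returned sequence always reads 5' to 3' and ends at the base just before the 3'ss.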

Analysis Included in Paper

Model training: notebooks/train_model.ipynb

Model performance: notebooks/performance_*

Cases where LaBranchoR disagrees with experimental data: notebooks/disagreement_*

Genome-wide properties and overlap with pathogenic variants: notebooks/landscape_*

Properties of C and no -2 U branchpoints: notebooks/landscape_C_and_noT.ipynb

Enrichments of ExAC variants: notebooks/ExAC_variant_enrichments.ipynb

Generation of ISM supplementary data: notebooks/supp_data.ipynb

Analysis not included in paper

Exploration of nucleotide importances: notebooks/importance.ipynb

Analysis of secondary structure near branchpoints: notebooks/secondary_*

Contributors

jpaggi, charlotteanne
