DaniO5P: Predicting 5'UTR-mediated translation during zebrafish embryogenesis

This repository contains code to train, evaluate, and interpret the Danio Optimus 5-Prime (DaniO5P) model from Reimão-Pinto MM, Castillo-Hair SM, Seelig G, Schier A. The regulatory landscape of 5′ UTRs in translational control during zebrafish embryogenesis. bioRxiv 2023.

Contents

See Python scripts and Jupyter notebooks inside each folder for more details.

DaniO5P examples

  • example_DaniO5P.ipynb: Example Jupyter notebook showing how to use DaniO5P to make predictions on an arbitrary 5'UTR sequence.
  • example_plot_contributions.ipynb: Example Jupyter notebook showing how to plot precomputed predictions and contribution scores for any sequence in the MPRA.
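
Sequence-based models of this kind usually take a one-hot encoded sequence as input. The following is a minimal sketch of such an encoding, not the code used in the notebooks: the function name, the A/C/G/T channel order, and the left-padding to 238 nt (the MPRA sequence length mentioned under DaniO5P-RNN) are all assumptions made for illustration. Check example_DaniO5P.ipynb for the actual preprocessing.

```python
import numpy as np

NUCLEOTIDES = "ACGT"   # assumed channel order
MAX_LENGTH = 238       # MPRA sequence length stated in this README


def one_hot_encode(sequence, max_length=MAX_LENGTH):
    """Return a (max_length, 4) one-hot array for a 5'UTR sequence.

    Hypothetical helper: shorter sequences are left-padded with zeros,
    and unknown bases (e.g. N) become all-zero rows.
    """
    sequence = sequence.upper().replace("U", "T")
    if len(sequence) > max_length:
        raise ValueError(f"sequence is longer than {max_length} nt")
    encoded = np.zeros((max_length, 4), dtype=np.float32)
    offset = max_length - len(sequence)  # assumed left padding
    for i, base in enumerate(sequence):
        channel = NUCLEOTIDES.find(base)
        if channel >= 0:
            encoded[offset + i, channel] = 1.0
    return encoded
```

A call such as `one_hot_encode("ACGU")` returns an array of shape `(238, 4)` whose last four rows hold the encoded bases.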

DaniO5P model training

  • length_model: Fit and evaluate a model that predicts Mean Ribosome Load (MRL) and estimated changes in abundance based only on 5'UTR length. Compute predictions and residuals (measurements - predictions) for all MPRA sequences.
  • cnn: Train and evaluate an ensemble of convolutional neural network (CNN) models to predict the residuals of MRL and estimated changes in abundance based on sequence.
  • full_model_evaluation: Compute performance metrics on the full DaniO5P (length + CNN) model, which are reported in the manuscript.
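
The relationship between the three steps above can be sketched as follows. This is an illustration only: the quadratic polynomial is a stand-in for whatever functional form length_model actually uses, and all function names are hypothetical. The only parts taken from this README are the residual definition (measurements - predictions) and the fact that the full model combines the length and CNN components, assumed here to be additive.

```python
import numpy as np


def fit_length_model(lengths, values, degree=2):
    """Fit a polynomial in 5'UTR length (assumed form, for illustration)."""
    coeffs = np.polyfit(lengths, values, deg=degree)
    return np.poly1d(coeffs)


def length_residuals(model, lengths, values):
    """Residuals as defined above: measurements - predictions."""
    return np.asarray(values, dtype=float) - model(np.asarray(lengths))


def full_prediction(model, lengths, cnn_residual_predictions):
    """Combine length-based prediction and CNN-predicted residual (assumed additive)."""
    return model(np.asarray(lengths)) + np.asarray(cnn_residual_predictions)
```

In this sketch the CNN would be trained on the output of `length_residuals`, and `full_prediction` adds its predictions back to the length-based estimate.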

DaniO5P interpretation

  • contributions: Calculate nucleotide contributions to MRL and abundance predictions, for every sequence in the MPRA. Generate nucleotide contribution plots for the paper figures.
  • motifs: Extract motifs from the convolutional filters of the CNN models. Calculate average motif contributions to MRL and estimated changes in abundance at each timepoint, and relate these to motif position, secondary structure, etc.

DaniO5P-RNN (experimental)

DaniO5P-RNN can theoretically make predictions on sequences longer than those in the MPRA (238 nt), but its accuracy on such sequences has not been validated. See the notes at the beginning of the relevant notebooks and use with caution.

  • example_DaniO5P_RNN.ipynb: Example Jupyter notebook showing how to use DaniO5P-RNN to make predictions on an arbitrary 5'UTR sequence.
  • rnn: Train and evaluate the ensemble of recurrent neural network (RNN) models underlying DaniO5P-RNN.

Preprocessing and supporting code

  • preprocess_data: Data preprocessing. Computes the MRL and estimated abundances used for model training and analysis from fraction TPMs.
  • secondary_structure: Computes secondary structure metrics such as free energy and unpaired probabilities. These are used for motif analysis.
  • utils: Supporting code for sequence processing, model interpretation, and plotting.
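
As a rough illustration of the MRL calculation in preprocess_data, a mean ribosome load can be computed as the average number of ribosomes per transcript, weighted by each polysome fraction's share of the TPMs. The sketch below assumes that definition; the function name, the per-fraction ribosome counts, and the handling of zero-coverage sequences are hypothetical and may differ from the repository's code.

```python
import numpy as np


def mean_ribosome_load(fraction_tpms, ribosomes_per_fraction):
    """Weighted mean ribosome count per sequence.

    fraction_tpms: (n_sequences, n_fractions) TPMs per polysome fraction.
    ribosomes_per_fraction: (n_fractions,) assumed ribosome count per fraction.
    Sequences with zero total TPM get NaN.
    """
    tpms = np.asarray(fraction_tpms, dtype=float)
    weights = np.asarray(ribosomes_per_fraction, dtype=float)
    totals = tpms.sum(axis=1, keepdims=True)
    # Share of each sequence's TPMs found in each fraction.
    shares = np.divide(
        tpms, totals, out=np.full_like(tpms, np.nan), where=totals > 0
    )
    return shares @ weights
```

For example, a sequence with TPMs `[1, 1, 2]` across fractions assumed to carry 0, 1, and 2 ribosomes has fraction shares 0.25, 0.25, and 0.5, giving an MRL of 1.25.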

Additional data

Some files are too big to be included in this repository. The following must be downloaded separately:

  • Trained model weights: [TODO: add URL when available]
  • Calculated contribution scores for all MPRA sequences: [TODO: add URL when available]
  • Secondary structure calculation results: [TODO: add URL when available]

Requirements

All of the code here was run in Python 3.9 with the following package versions:

  • matplotlib 3.5.1
  • numpy 1.26.4
  • pandas 1.4.3
  • scipy 1.12.0
  • seaborn 0.13.2
  • logomaker 0.8
  • tensorflow 2.7
  • nupack 4.0.1.1 (for secondary structure calculations)
  • prtpy 0.8.2 (to compute chromosome-based data splits)
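
To compare an existing environment against the versions listed above, a small check along these lines may help. It is a convenience sketch rather than part of the repository: the `PINNED` dictionary simply copies the list above, the function name is hypothetical, and the distribution names are assumed to match the package names.

```python
from importlib import metadata

# Versions copied from the list above.
PINNED = {
    "matplotlib": "3.5.1",
    "numpy": "1.26.4",
    "pandas": "1.4.3",
    "scipy": "1.12.0",
    "seaborn": "0.13.2",
    "logomaker": "0.8",
    "tensorflow": "2.7",
    "nupack": "4.0.1.1",
    "prtpy": "0.8.2",
}


def check_versions(pinned=PINNED):
    """Map each package to (wanted, installed or None, matches)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        wanted_parts = wanted.split(".")
        # Compare only as many version components as the pin specifies.
        matches = (
            installed is not None
            and installed.split(".")[: len(wanted_parts)] == wanted_parts
        )
        report[name] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {installed} [{status}]")
```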

Other necessary software includes:
