TRIPs

Dependencies

Analysis was done with Python v3.8.6 with the following libraries:

Code has been tested only with these library versions. No non-standard hardware is required, and installation takes only a few minutes. We recommend using conda to create a dedicated environment; all installations here were done with anaconda3 (https://www.anaconda.com/download).
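Because the code has only been tested with specific library versions, it can help to record what is installed in your environment before running the notebooks. The following is a minimal sketch using only the Python standard library. The package names passed in at the bottom are assumptions for illustration (they are not taken from this README's dependency list), so substitute the libraries you actually installed.

```python
import sys
from importlib import metadata


def report_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # Hypothetical package list: replace with the libraries used in your environment.
    for name, version in report_versions(["scanpy", "scvi-tools", "anndata"]).items():
        print(f"{name}: {version or 'not installed'}")
```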

Guide to notebooks

There are two example workflows (see Table S1 from the manuscript for details):

  • Ecoli_D1: E. coli MG1655 grown in LB
  • Saureus_D5: S. aureus USA300 LAC grown in TSB

For each of these we have the following notebooks:

  • initial_processing.ipynb: Code for initial data import, filtering, and scVI denoising. This also has the code to generate global correlation patterns. The whole notebook takes ~1 hr to run.
  • cycle_analysis.ipynb: Code for cell cycle analysis, including assigning cell and gene angles, finding cycle-variable genes, and aligning cell angles by the predicted point of replication. All analysis takes ~10 min to run, plus additional time for origin_angle_circular_model.R (see below).
  • promoter_distance_analysis.ipynb: Determining relationships between expression patterns and distance from the transcriptional start site (see Figure 4 in the manuscript). Takes <5 min to run.
  • trip_analysis.ipynb: Defining Transcription-Replication Interaction Profiles (TRIPs) and performing clustering on these. Takes <5 min to run.

In addition there is the origin_angle_circular_model.R file. This is a script that must be run separately in RStudio as part of the cycle_analysis.ipynb workflow (see notebook for details). This takes ~5 min to run.

There are also the following folders:

  • count_matrices: Raw count matrices for each library as output of the PETRI-seq processing pipeline (https://tavazoielab.c2b2.columbia.edu/PETRI-seq/).
  • outputs: Output files from the scripts above.
  • samples: Actual files used within the analysis presented for the manuscript. These are frequently used as inputs in the Jupyter notebooks to aid comparison to manuscript figures.
  • reference: Gene annotation files.

Before running the other notebooks, you must run the initial steps of the initial_processing.ipynb notebook to generate the AnnData file (*_adata.h5ad). This file was too large to upload directly to GitHub (>500 MB), so you'll need to generate it yourself. Please contact the lab if you have any issues doing so. Once you have done this, you should be able to run the notebooks individually without issues.
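A quick way to confirm that the AnnData file has been generated before opening the downstream notebooks is to search for it by its filename pattern. This is a minimal sketch using only the standard library. The directory name "outputs" is an assumption about where the file is written, so adjust it to wherever initial_processing.ipynb saves the file on your machine.

```python
from pathlib import Path


def find_adata_files(directory="."):
    """Return sorted paths in `directory` whose names match *_adata.h5ad."""
    return sorted(Path(directory).glob("*_adata.h5ad"))


if __name__ == "__main__":
    # "outputs" is an assumed location, not confirmed by the README.
    matches = find_adata_files("outputs")
    if matches:
        for path in matches:
            size_mb = path.stat().st_size / 1e6
            print(f"Found {path} ({size_mb:.0f} MB)")
    else:
        print("No *_adata.h5ad file found; run the initial steps of "
              "initial_processing.ipynb first.")
```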

Considerations for running this analysis on new data

We have not yet fully automated the analysis pipeline. As such, there are a number of parameters that may need to be varied in this workflow for working with new data:

  • scVI hyperparameters: We use the same hyperparameters throughout, but you should consider whether these suit your dataset or whether a further hyperparameter search is warranted.
  • UMAP chromosome bin size (see cycle_analysis.ipynb): The size of chromosome bins used may need to be varied based on species and data quality. We used 100 kb for E. coli and 50 kb for S. aureus, which, given the differing genome sizes, gives ~50 bins for each chromosome. For lower quality datasets, the bin size may need to increase.
  • Angle orientation: As explained in cycle_analysis.ipynb, the initial directionality of the cell angles/gene angles is arbitrary and may need to be reversed. See notebook for details.
  • Chain selection in RStan model fit: The cyclical regression model tends to hit a lot of local minima. However, we run eight chains (eight independent fits) with HMC sampling, allowing us to clearly identify the chains that fit correctly. See origin_angle_circular_model.R for details.
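Two of the parameters above reduce to simple arithmetic, sketched here as an illustration rather than as the code used in the notebooks. The first helper counts how many fixed-width bins cover a chromosome; the genome lengths used are approximate values assumed for illustration. The second helper reverses the direction of a set of angles; it assumes angles are expressed in radians in [0, 2π), which may differ from the representation used in cycle_analysis.ipynb.

```python
import math


def n_bins(genome_length_bp, bin_size_bp):
    """Number of fixed-width bins needed to cover a chromosome (last bin may be partial)."""
    return math.ceil(genome_length_bp / bin_size_bp)


def reverse_angles(angles):
    """Flip the direction of angles given in radians, keeping results in [0, 2*pi)."""
    two_pi = 2 * math.pi
    return [(two_pi - a) % two_pi for a in angles]


# Approximate chromosome lengths (assumed values, for illustration only).
ECOLI_BP = 4_640_000
SAUREUS_BP = 2_870_000

print(n_bins(ECOLI_BP, 100_000))   # 47 bins at 100 kb
print(n_bins(SAUREUS_BP, 50_000))  # 58 bins at 50 kb
```

For a new species, choosing a bin size that gives a similar number of bins is a reasonable starting point, with larger bins for lower quality data.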
