
opplatek / fivepseq


This project forked from lilit-nersisyan/fivepseq


An application for analysis of 5′ endpoint distribution in RNA sequencing datasets. This is particularly useful for techniques that capture 5′ monophosphorylated RNAs, such as 5PSeq, PARE-seq or GMUC. It may also be useful for ribosome profiling and similar datasets. The main workflow of fivepseq is intended for downstream analysis of alignment files: it describes the distribution of read 5′ endpoints relative to translation start and stop sites, as well as relative to amino acids or codons. It also computes the frame preference of the 5′ endpoint distribution and captures periodicity patterns.
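To make "5′ endpoint position relative to the start site" and "frame preference" concrete, here is a minimal Python sketch. It is not fivepseq's implementation: the `Read` tuple, `five_prime_end` and `frame_counts` are hypothetical names, and `Read` only mimics the `reference_start`, `reference_end` (exclusive) and `is_reverse` attributes of a pysam alignment. It assumes a single plus-strand CDS with 0-based coordinates.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical stand-in for an aligned read (not a pysam object)."""
    start: int        # 0-based leftmost aligned position
    end: int          # 0-based exclusive end of the alignment
    is_reverse: bool  # True if the read maps to the minus strand


def five_prime_end(read: Read) -> int:
    """Return the 0-based genomic coordinate of the read's 5' end."""
    # On the minus strand the 5' end is the rightmost aligned base.
    return read.end - 1 if read.is_reverse else read.start


def frame_counts(reads: Iterable[Read], cds_start: int, cds_end: int) -> Counter:
    """Count 5' endpoints falling in frame 0, 1 or 2 of a plus-strand CDS.

    cds_start is the first base of the start codon; cds_end is exclusive.
    """
    counts = Counter({0: 0, 1: 0, 2: 0})
    for read in reads:
        if read.is_reverse:
            continue  # antisense reads are ignored for a plus-strand gene
        pos = five_prime_end(read)
        if cds_start <= pos < cds_end:
            # Offset from the start codon modulo 3 gives the frame.
            counts[(pos - cds_start) % 3] += 1
    return counts
```

For example, reads whose 5′ ends lie at offsets 0, 3 and 4 from the start codon contribute two counts to frame 0 and one to frame 1; a strong excess in one frame is what a frame preference looks like.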

Home Page: http://pelechanolab.com/software/fivepseq/

License: BSD 3-Clause "New" or "Revised" License



Welcome to fivepseq readme!

Fivepseq is a software package for analysis of 5′ endpoints distribution in RNA degradome sequencing datasets.

Homepage

The homepage is hosted on the Pelechano lab website at http://pelechanolab.com/software/fivepseq/.

User guide

Below is a quick manual to get you started. For detailed instructions and explanations on fivepseq output, please see the user guide at: https://fivepseq.readthedocs.io/en/latest/.

Citation

Nersisyan L, Ropat M, Pelechano V. Improved computational analysis of ribosome dynamics from 5′P degradome data using fivepseq. NAR Genomics and Bioinformatics, 2:4, 2020.

Installation

Install dependencies:

To set up fivepseq, the following Python packages need to be pre-installed manually using pip (if you don't have pip, install it first by following the official pip installation instructions).

Paste the following lines into the shell terminal:

pip install --upgrade numpy pysam cython
pip install plastid
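Before installing fivepseq itself, you may want to confirm that these packages are actually importable from the Python you will use. The sketch below is an optional helper, not part of fivepseq; `missing_modules` is a hypothetical name. It uses the standard-library `importlib.util.find_spec`, and it assumes the import names numpy, pysam, Cython (the pip package "cython" is imported as "Cython") and plastid.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages installed with pip above.
    required = ["numpy", "pysam", "Cython", "plastid"]
    missing = missing_modules(required)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```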

Clone the project from github:

git clone https://github.com/lilit-nersisyan/fivepseq.git

Navigate into the fivepseq directory and install:

cd fivepseq
python setup.py install

To check if fivepseq was installed correctly, type the following in the command line:

fivepseq --version

This should display the currently installed version of fivepseq. To display the command-line arguments, you may type:

fivepseq --help

To enable exporting vector and portable image files, you'll also need PhantomJS, together with the selenium and pillow packages, which can be installed with conda:

conda install phantomjs selenium pillow

Running fivepseq

Fivepseq requires the following files to run:

  • Aligned reads (.bam)
  • Alignment index (.bai)
  • Genomic sequence file (.fasta / .fa)
  • Genomic annotation file (.gff / .gtf)
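If you script your runs, a quick pre-flight check of the genome and annotation paths can catch typos before fivepseq starts. This is a hypothetical helper (`check_inputs` is not a fivepseq function); the accepted extensions are taken from the list above plus .gff3, which the -a argument also mentions. Compressed files such as genome.fa.gz would be flagged, since only the last suffix is inspected.

```python
from pathlib import Path
from typing import List

# Extensions from the list above (plus .gff3, accepted by the -a argument).
EXPECTED = {
    "genome": {".fa", ".fasta"},
    "annotation": {".gff", ".gtf", ".gff3"},
}


def check_inputs(genome: str, annotation: str) -> List[str]:
    """Return a list of human-readable problems (empty if everything looks fine)."""
    problems = []
    for role, path in (("genome", genome), ("annotation", annotation)):
        p = Path(path)
        if p.suffix.lower() not in EXPECTED[role]:
            problems.append(f"{role}: unexpected extension {p.suffix!r} in {path}")
        if not p.is_file():
            problems.append(f"{role}: file not found: {path}")
    return problems
```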

This section assumes that you already have these files. If not, please refer to the section Preprocessing from FASTQ files below.

Fivepseq usage

The fivepseq --help command will show fivepseq usage and will list all the arguments.

usage: fivepseq -b B -g G -a A [optional arguments]

Required arguments

-b B   the full path to one or more bam/sam files (multiple files should be given as a pattern within double quotes, e.g. "your_bam_folder/*.bam")
-g G   the full path to the fa/fasta file
-a A   the full path to the gtf/gff/gff3 file

Note:

  • The alignment index (.bai) files should be in the same directory as the bam files and have the same names with the .bai extension appended (e.g. sample.bam.bai).
  • Multiple bam files should be specified with a pattern placed within double quotes, e.g. "your_bam_folder/*.bam".
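The double quotes matter presumably because an unquoted pattern is expanded by the shell into several separate arguments before fivepseq sees it; quoted, the pattern itself reaches fivepseq. The sketch below (a hypothetical helper, not part of fivepseq) expands such a pattern with Python's glob module and lists the bam files that lack a sibling index named by appending .bai, as in the note above.

```python
import glob
import os


def bams_missing_index(pattern):
    """Expand a bam glob pattern and return the bam files without a <name>.bam.bai index."""
    bams = sorted(glob.glob(pattern))
    if not bams:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # The index is expected next to the bam, with ".bai" appended to the full name.
    return [bam for bam in bams if not os.path.isfile(bam + ".bai")]
```

Running `samtools index sample.bam` on any reported file produces sample.bam.bai in the same directory.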

Commonly, you will run fivepseq by also providing the name of the output folder ('fivepseq' by default) and the title of your run (determined from bam path otherwise):

fivepseq \
   -g <path_to_genome_fasta> \
   -a <path_to_annotation> \
   -b <path_to_bam_file(s)> \
   -o <output_directory> \
   -t <title_of_the_run>

Note: this is a single command line; the backslashes only break it across lines for readability. Either copy-paste it as is, or write it on a single line without the backslashes.
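When running fivepseq from a Python script, passing the arguments as a list avoids shell quoting issues altogether: the bam pattern is handed to fivepseq literally, just as the double quotes achieve on the command line. This is a sketch; `fivepseq_command` and `run_fivepseq` are hypothetical helpers that only use the flags documented above (-g, -a, -b, -o, -t), and -o and -t are left out when not given so fivepseq falls back to its defaults.

```python
import shlex
import subprocess


def fivepseq_command(genome, annotation, bam_pattern, outdir=None, title=None):
    """Assemble the fivepseq argument list shown above (no shell involved)."""
    cmd = ["fivepseq", "-g", genome, "-a", annotation, "-b", bam_pattern]
    if outdir is not None:
        cmd += ["-o", outdir]
    if title is not None:
        cmd += ["-t", title]
    return cmd


def run_fivepseq(cmd):
    """Run the command; raises CalledProcessError if fivepseq exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = fivepseq_command("genome.fa", "genes.gtf", "bams/*.bam", "results", "my_run")
    # Print a shell-quoted version for inspection; call run_fivepseq(cmd) to execute.
    print(shlex.join(cmd))
```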

Additional arguments

Type fivepseq --help to see the list of additional arguments. For a detailed description of available arguments, see the User guide at: https://fivepseq.readthedocs.io/en/latest/.

Preprocessing from FASTQ files

Fastq files need to be preprocessed and aligned to the reference genome before proceeding to fivepseq downstream analysis. Preprocessing proceeds with the following steps:

  • quality checks (with FastQC and MultiQC),
  • adapter and quality-based trimming,
  • UMI extraction (if the library was generated with UMIs),
  • mapping to the reference genome,
  • read deduplication (if the library was generated with UMIs),
  • bedgraph generation to view the 5'P count distribution in genome viewers.
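The last step reduces each alignment to a single 5′ position and counts reads per position. The preprocessing script relies on bedtools for this; the Python sketch below (`five_prime_bedgraph` is a hypothetical name) only illustrates the idea. It takes (start, end, is_reverse) tuples with 0-based, end-exclusive coordinates and emits bedGraph lines (chrom, start, end, value), which use the same 0-based, half-open convention. For simplicity both strands are merged; a real pipeline would typically write one track per strand.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def five_prime_bedgraph(chrom: str, reads: Iterable[Tuple[int, int, bool]]) -> List[str]:
    """Turn (start, end, is_reverse) alignments into bedGraph lines of 5' end counts."""
    counts = Counter()
    for start, end, is_reverse in reads:
        # 5' end: leftmost base on the plus strand, rightmost base on the minus strand.
        pos = end - 1 if is_reverse else start
        counts[pos] += 1
    # One single-base interval per position, sorted by coordinate.
    return [f"{chrom}\t{pos}\t{pos + 1}\t{n}" for pos, n in sorted(counts.items())]
```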

An example preprocessing pipeline can be found in the preprocess_scripts directory.

In order to run this pipeline, you need to have access to common bioinformatics software such as STAR, UMI-tools, bedtools, Samtools, FastQC, MultiQC and cutadapt.

To use it, navigate to the directory where the script is located and use the following command in the prompt:

./fivepseq_preprocess.sh -f [path to directory containing fastq files] -g [path to genome fasta] -a [path to annotation gff/gtf] -i [path to reference index, if exists] -o [output directory] -s [which steps to skip: either or combination of characters {cudqm} ]

The option -s specifies which steps of the pipeline you'd like to skip. Possible values are:

  • c skip adapter trimming with cutadapt
  • u skip UMI extraction
  • d skip deduplication after alignment
  • q skip the initial quality check: FastQC and MultiQC
  • p skip the post-processing quality check: FastQC and MultiQC
  • m skip mapping

You may use any combination of these characters, e.g. -s cudqm.
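If you wrap the preprocessing script in your own tooling, validating the -s value first avoids silently skipping nothing because of a typo. The mapping below is transcribed from the list above, not read from fivepseq_preprocess.sh, and `parse_skip` is a hypothetical helper.

```python
from typing import List

# Step codes as documented in this README (assumed to match the script).
SKIP_CODES = {
    "c": "adapter trimming (cutadapt)",
    "u": "UMI extraction",
    "d": "deduplication",
    "q": "initial quality check (FastQC/MultiQC)",
    "p": "post-processing quality check (FastQC/MultiQC)",
    "m": "mapping",
}


def parse_skip(value: str) -> List[str]:
    """Translate an -s value such as "cudqm" into the names of the skipped steps."""
    unknown = set(value) - set(SKIP_CODES)
    if unknown:
        raise ValueError(f"unknown step code(s): {''.join(sorted(unknown))}")
    # dict.fromkeys keeps the first occurrence of each code, in order.
    return [SKIP_CODES[code] for code in dict.fromkeys(value)]
```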

This script will produce sub-folders in the output directory, containing results of each step of the pipeline. The bam files will be generated in the align_dedup folder.

In addition to performing the steps described above, the script also evaluates the distribution of reads across the genome according to gene classes {"rRNA" "mRNA" "tRNA" "snoRNA" "snRNA" "ncRNA"}. These statistics are saved in the align_rna/rna_stats.txt file.
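The per-class statistic amounts to counting reads by the gene class they overlap and normalising by the total. The sketch below assumes you already have one class label per read (for example from intersecting alignments with the annotation); `class_fractions` is a hypothetical helper, and it does not reproduce the exact layout of rna_stats.txt. Reads with labels outside the six classes are ignored here.

```python
from collections import Counter
from typing import Dict, Iterable

GENE_CLASSES = ("rRNA", "mRNA", "tRNA", "snoRNA", "snRNA", "ncRNA")


def class_fractions(read_classes: Iterable[str]) -> Dict[str, float]:
    """Fraction of reads per gene class, among reads assigned to one of GENE_CLASSES."""
    counts = Counter(c for c in read_classes if c in GENE_CLASSES)
    total = sum(counts.values())
    # Avoid division by zero when no read was assigned to a known class.
    return {c: (counts[c] / total if total else 0.0) for c in GENE_CLASSES}
```

A high rRNA fraction, for instance, would indicate that most sequencing depth is not available for the mRNA-focused fivepseq analysis.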

!!NOTE!! This example pipeline treats all files as single-end libraries. If you have paired-end reads, you should only supply the first read (*_R1* files) to fivepseq.
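Selecting the first reads can be done with a filename pattern. This sketch assumes your files follow the *_R1* / *_R2* naming convention mentioned above; `first_reads` is a hypothetical helper. `fnmatch.fnmatchcase` is used so that matching is case-sensitive on every platform.

```python
import fnmatch
from typing import List


def first_reads(filenames: List[str]) -> List[str]:
    """Keep only read-1 files, i.e. names matching the *_R1* pattern."""
    return sorted(f for f in filenames if fnmatch.fnmatchcase(f, "*_R1*"))
```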

For UPPMAX users only

  • Install the latest stable version of fivepseq with:
    • cd /proj/sllstore2017018/lilit/fivepseq_latest
    • python setup.py install

Have fun!

Contributors

lilit-nersisyan
