boehmv / upf3

Code and scripts for the RNA-seq analysis of project: Deciphering the cellular roles and dissecting functional regions of UPF3A and UPF3B in human NMD

Home Page: http://www.uni-koeln.de/math-nat-fak/genetik/groups/Gehring/#home

License: MIT License

Shell 21.98% R 67.86% Python 10.16%
bioinformatics rna-seq r deseq2 alternative-splicing salmon gencode nmd isoform-switches leafcutter

RNA-Seq analysis for:
UPF3A and UPF3B are redundant and modular activators of nonsense-mediated mRNA decay in human cells

This repository contains the code, scripts and log files for the RNA-Seq analysis of the project:
UPF3A and UPF3B are redundant and modular activators of nonsense-mediated mRNA decay in human cells
(available as bioRxiv preprint)

Graphical abstract

Scope

This repository primarily aims to provide transparent insight into the RNA-Seq analysis steps used in the study of UPF3A-UPF3B in NMD. ⚠️ NOTE: The complete pipeline is currently not optimized to run on different computing infrastructures in a standardized/portable manner. This means that all required packages have to be installed and configured manually to reproduce the results.

Features / Requirements

  • Complete analysis of RNA-Seq data (ArrayExpress: E-MTAB-10711, E-MTAB-10716, E-MTAB-10718 and E-MTAB-11184; provided in FASTQ format; see here for ID cross-reference). Reads are mapped with STAR to Gencode v33 / GRCh38.primary_assembly supplemented with SIRVomeERCCome (from Lexogen; download), and transcripts are quantified with Salmon in mapping-based mode using a decoy-aware transcriptome index. Downstream analyses cover differential gene expression (DGE) via DESeq2, differential transcript usage (DTU) via IsoformSwitchAnalyzeR, alternative splicing (AS) via LeafCutter and intron retention (IR) via IRFinder
  • The main Bash script CRSA_V006.sh runs the complete pipeline or individual modules, selected via command-line options (see CRSA_V006.sh -h), and requires a design file specifying the following:
    • reference type (gencode.v33.SIRVomeERCCome was used in this study)
    • sequencing design (single- or paired-end reads)
    • study name
    • folder locations (srvdir for raw file locations, mydir for analyses output)
    • location of the experiment file which specifies sample IDs and condition
  • Please see the provided design.txt file example for more information concerning this design file. An example for the tab-delimited experiment.txt file is provided as well. Please see the comments in CRSA_V006.sh for further instructions
  • To run/reproduce the complete analysis script, please make sure you have the following tools installed and configured if required:
    • STAR - version 2.7.3a was used for the analyses - with genome indices generated using GRCh38.primary.SIRVomeERCCome.fa and gencode.v33.SIRVomeERCCome.annotation.gtf (both reference files can be found here). The following code was used for genome index generation:
    STAR \
      --runMode genomeGenerate \
      --runThreadN 15 \
      --genomeDir /home/volker/reference/gencode.v33.SIRVomeERCCome \
      --genomeFastaFiles /home/volker/reference/Gencode/GRCh38.primary.SIRVomeERCCome.fa \
      --sjdbGTFfile /home/volker/reference/Gencode/gencode.v33.SIRVomeERCCome.annotation.gtf \
      --sjdbOverhang 99
    
    • Alfred - version v0.2.1 was used for the analyses
    • samtools - version 1.9 (using htslib 1.9) was used for the analyses
    • IGV tools - version 2.8.0 was used for the analyses - make sure you have the gencode.v33.SIRVomeERCCome.chrom.sizes file (can be found here) located in /PATH/TO/IGV/lib/genomes
    • Salmon - version v1.3.0 was used for the analyses - with an index generated using gentrome.v33.SIRV.ERCC.fa.gz and decoys.txt (can be found here). A separate conda environment was created for Salmon. The following code was used for index generation:
    salmon index \
      -t /home/volker/reference/Gencode/gentrome.v33.SIRV.ERCC.fa.gz \
      -d /home/volker/reference/Gencode/decoys.txt \
      -p 12 \
      -i /home/volker/reference/Transcriptome/gencode.v33.SIRVomeERCCome \
      --gencode
    
    • DESeq2 - version 1.28.1 was used for the analyses - please see R_sessions for details on R version and other installed packages. The tx2gene file used for the analyses can be found here
    • IsoformSwitchAnalyzeR - version 1.10.0 was used for the analyses - please see R_sessions for details on R version and other installed packages
    • LeafCutter - version v0.2.7 was used for the analyses - please see R_sessions for details on R version and other installed packages. 📝 NOTE: small changes were made to the LeafCutter scripts in /scripts so that gene IDs from Gencode are retained (changed in gtf_to_exons_vb.R and leafcutter_ds.R)
    • IRFinder - version 1.2.6 was used for the analyses - 📝 NOTE: IRFinder results were not used in the publication
    • FastQC - version 0.11.9 was used for the analyses
    • MultiQC - version v1.8 was used for the analyses
  • All analyses were performed on a 16-core (2x Intel(R) Xeon(R) CPU E5-2687W v2 @ 3.40GHz) workstation with 128 GB RAM running Ubuntu 18.04.4 LTS
  • Please make sure to change installation and file paths in the respective scripts to match your local environment
  • ℹ️ Different package versions were previously used for the analyses in the bioRxiv preprint; this repository contains the revised scripts and information
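
Because every tool has to be installed manually, it can help to check what is available before starting the pipeline. The following is a minimal sketch and not part of CRSA_V006.sh: the helper name missing_tools is hypothetical, and the executable names (for example alfred, igvtools and IRFinder) are assumptions that may differ on your system. Salmon was installed in a separate conda environment in this study, so that environment needs to be active for salmon to be found.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository):
# print each given command name that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Executable names below are assumptions; adjust them to your installation.
missing=$(missing_tools STAR alfred samtools igvtools salmon IRFinder fastqc multiqc Rscript)

if [ -n "$missing" ]; then
  printf 'Not found on PATH:\n%s\n' "$missing"
else
  echo "All listed tools were found on PATH."
fi
```

This only confirms that the executables can be found; it does not compare installed versions with those listed above, and the R packages (DESeq2, IsoformSwitchAnalyzeR, LeafCutter) still need to be checked from within R.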

Log files

Please see here to access the log files from the complete analysis (run between 16.11.2020 and 18.11.2021).

Quality control

Please see here to access the MultiQC HTML result file for all samples, summarizing the FastQC, Salmon and STAR output. Please download the HTML file and corresponding data to view the MultiQC report properly.

R_sessions

Please see here to access the session info for the individual R scripts

Individual scripts

The specialized scripts called by the main CRSA_V006.sh script can be found here.

Feedback / Questions

Feedback is welcome! For any questions, please email [email protected] or create an issue.

Citation

bioRxiv preprint

Damaris Wallmeroth, Volker Boehm, Jan-Wilm Lackmann, Janine Altmüller, Christoph Dieterich and Niels H. Gehring (2021) UPF3A and UPF3B are redundant and modular activators of nonsense-mediated mRNA decay in human cells. bioRxiv 2021.07.07.451444; doi: https://doi.org/10.1101/2021.07.07.451444
