Exomeseq

Analysing sequence data from whole exome sequencing or targeted gene panel sequencing.

Requirements

Installed software

* Ubuntu or another compatible OS (tested with Ubuntu 16.04)
* Python 3 (should be installed by default)
* Docker
* Git
* Unzip
* Bgzip / tabix
These can be installed by running the following commands:
$ sudo apt update
$ sudo apt install git unzip tabix docker.io python3
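Before starting, it can help to confirm that the required command-line tools are actually on the PATH. The following is a minimal sketch, not part of the exomeseq package. The function name missing_tools and the executable names (assumed to match the packages listed above; the Ubuntu tabix package provides both bgzip and tabix) are illustrative assumptions.

```python
import shutil

# Executables assumed to correspond to the requirements listed above.
REQUIRED_TOOLS = ["docker", "git", "unzip", "bgzip", "tabix", "python3"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Any tool reported as missing can then be installed with the apt commands above.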

Hardware requirements

* Storage: 256 GB
* Memory (RAM): 8 GB
* Processors (cores): 4
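A quick way to compare a machine with these minimums is sketched below. This is a hypothetical helper, not part of the pipeline. It assumes that "Storage" means free space on the file system holding the project folder, that GB means decimal gigabytes (10^9 bytes), and that the host is Linux (memory is read via os.sysconf).

```python
import os
import shutil

# Minimum values from the hardware requirements above.
MIN_DISK_GB = 256
MIN_RAM_GB = 8
MIN_CORES = 4
GB = 10 ** 9  # decimal gigabytes (an assumption; GB vs GiB is not specified)


def total_ram_bytes():
    """Total physical memory in bytes, via POSIX sysconf (works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def hardware_report(path="."):
    """Compare this machine with the minimum requirements.

    Returns {resource: (available, required, ok)}. Disk space is the free
    space on the file system that holds `path`.
    """
    free_gb = shutil.disk_usage(path).free / GB
    ram_gb = total_ram_bytes() / GB
    cores = os.cpu_count() or 0
    return {
        "disk_gb": (free_gb, MIN_DISK_GB, free_gb >= MIN_DISK_GB),
        "ram_gb": (ram_gb, MIN_RAM_GB, ram_gb >= MIN_RAM_GB),
        "cores": (cores, MIN_CORES, cores >= MIN_CORES),
    }


if __name__ == "__main__":
    for resource, (available, required, ok) in hardware_report().items():
        status = "OK" if ok else "BELOW MINIMUM"
        print(f"{resource}: {available:.1f} (required {required}) {status}")
```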

Setup

The package consists of several Python scripts that will

1. Create all required folders and configuration files.
2. Move any provided FASTQ files to their appropriate location.
3. Prepare the configuration file with links to required reference files.
4. Create and run scripts for analysing each sample. 

Instructions:

0. It is recommended to run the pipeline inside a terminal multiplexer such as 'screen' or 'tmux', so that the session can be detached. The terminal can then be closed without interrupting the pipeline.
1. Create a new project folder and change into it:
$ mkdir projectA
$ cd projectA
 
2. Copy all FASTQ files to the current folder. The file names must follow the pattern <sample_name>.1.fastq.gz (read 1) and <sample_name>.2.fastq.gz (read 2):
$ cp <fastq_file1/2> <sample_name>.1.fastq.gz
$ cp <fastq_file2/2> <sample_name>.2.fastq.gz

3. Copy the bed file containing the target regions into the current folder.
$ cp <target_regions>.bed .

4. Download the pipeline code from GitHub:
$ git clone https://github.com/si-medbif/exomeseq.git

At this step, choose one of two paths:

Fully automatic, no trimming of reads
5. Run the following command to set up and run the pipeline:
$ exomeseq/runme.sh

Half-automatic, the user decides whether to trim reads
5. Run the following command to set up the pipeline:
$ exomeseq/runsetup.sh

6. Check the FastQC output (in "sample/FastQC_pre/") and then do either 6a or 6b:

6a. Run the pipeline without trimming reads  
$ exomeseq/run_analysis.sh

6b. Run the pipeline after trimming reads
$ exomeseq/run_qc_analysis.sh
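The choice between 6a and 6b can be supported by reading FastQC's summary.txt file, which lists one tab-separated line per module: status (PASS, WARN or FAIL), module name and input file name. The sketch below is a hypothetical helper, not part of exomeseq. Both the set of modules that "suggest trimming" and the assumption that extracted FastQC reports (containing summary.txt) sit somewhere below each FastQC_pre folder are illustrative choices; if only the zipped reports are kept, summary.txt must be extracted first.

```python
from pathlib import Path

# FastQC modules whose WARN/FAIL status often motivates read trimming.
# This selection is an assumption, not a rule used by the exomeseq pipeline.
TRIM_MODULES = {
    "Per base sequence quality",
    "Adapter Content",
    "Overrepresented sequences",
}


def parse_fastqc_summary(lines):
    """Parse FastQC summary.txt lines ("STATUS<TAB>module<TAB>file").

    Returns a dict mapping (file, module) -> status (PASS, WARN or FAIL).
    """
    results = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        status, module, filename = line.split("\t", 2)
        results[(filename, module)] = status
    return results


def trimming_flags(results, modules=TRIM_MODULES):
    """Return sorted (file, module, status) tuples for WARN/FAIL in `modules`."""
    return sorted(
        (filename, module, status)
        for (filename, module), status in results.items()
        if module in modules and status in ("WARN", "FAIL")
    )


if __name__ == "__main__":
    # Assumed layout: extracted reports somewhere below <folder>/FastQC_pre/.
    for summary in sorted(Path(".").glob("*/FastQC_pre/**/summary.txt")):
        flags = trimming_flags(parse_fastqc_summary(summary.read_text().splitlines()))
        for filename, module, status in flags:
            print(f"{status}: {module} in {filename}")
```

An empty result would point towards 6a; flagged adapter or quality modules would point towards 6b. The final decision remains a judgment call based on the full FastQC report.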

When the pipeline has finished, the combined report (full_report.txt) is in the current folder:

$ ls -l full_report.txt
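Since FASTQ files that do not follow the naming pattern from step 2 are easy to miss when many samples are copied, the inputs can be checked before the pipeline is started. The following Python sketch is not part of exomeseq; the function pair_fastq_files is hypothetical. It groups the *.fastq.gz files in the project folder by sample name and reports names that break the <sample_name>.1.fastq.gz / <sample_name>.2.fastq.gz pattern, as well as samples with only one read file.

```python
import re
from pathlib import Path

# Naming pattern from step 2: <sample_name>.1.fastq.gz / <sample_name>.2.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)\.(?P<read>[12])\.fastq\.gz$")


def pair_fastq_files(names):
    """Group FASTQ file names into read pairs by sample name.

    Returns (pairs, bad_names, incomplete): `pairs` maps sample -> (read 1, read 2),
    `bad_names` lists names not matching the pattern, and `incomplete` lists
    samples for which only one of the two reads was found.
    """
    reads = {}
    bad_names = []
    for name in names:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            bad_names.append(name)
            continue
        reads.setdefault(match["sample"], {})[match["read"]] = name
    pairs, incomplete = {}, []
    for sample, files in sorted(reads.items()):
        if "1" in files and "2" in files:
            pairs[sample] = (files["1"], files["2"])
        else:
            incomplete.append(sample)
    return pairs, bad_names, incomplete


if __name__ == "__main__":
    # Only look at gzipped FASTQ files in the current (project) folder.
    names = sorted(p.name for p in Path(".").glob("*.fastq.gz"))
    pairs, bad_names, incomplete = pair_fastq_files(names)
    print(f"{len(pairs)} complete sample(s): {', '.join(pairs)}")
    for name in bad_names:
        print(f"Name does not follow <sample_name>.1/.2.fastq.gz: {name}")
    for sample in incomplete:
        print(f"Sample is missing one read file: {sample}")
```

The pattern is anchored at both ends, so names such as sample_R1.fastq.gz are reported rather than silently ignored.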

Contributors

haraldgrove, dummai

exomeseq's Issues

A bottleneck

Downloading "dbNSFPv3.5a.zip" took a very long time. Unzipping this file also required a large amount of disk space.
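Because the archive expands to much more than its download size, it may be worth checking the extracted size against the free disk space before unzipping. The sketch below uses only the standard zipfile module, which reads the sizes from the archive's central directory without extracting anything. The helper names and the 10% safety margin are assumptions for illustration, not part of the pipeline.

```python
import shutil
import zipfile


def uncompressed_size(zip_path):
    """Sum of the uncompressed sizes of all members in a ZIP archive, in bytes."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def enough_space_to_unzip(zip_path, target_dir=".", margin=1.1):
    """True if target_dir has room for the extracted files plus a safety margin."""
    needed = uncompressed_size(zip_path) * margin
    return shutil.disk_usage(target_dir).free >= needed


if __name__ == "__main__":
    archive = "dbNSFPv3.5a.zip"  # file name from the issue above; adjust the path
    size_gb = uncompressed_size(archive) / 1e9
    print(f"{archive} expands to about {size_gb:.1f} GB")
    print("Enough free space:", enough_space_to_unzip(archive))
```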

screen or tmux

Since the pipeline will take several hours, 'screen' or 'tmux' should be used to run the pipeline in "detached" mode.

Missing BED file

If the pipeline is run without any *.bed file, it fails with the error "'Namespace' object has no attribute 'regions38'".

The error may be bypassed by adding an empty *.bed file.
$ echo "" > sample_name.bed
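The workaround and a basic sanity check of the target-region file can be sketched in Python as follows. This is a hypothetical helper, not part of exomeseq. It relies on the standard BED conventions (whitespace-separated chrom, start and end in the first three columns; 0-based, half-open coordinates; optional "#", "track" or "browser" header lines). Note that it writes a truly empty placeholder, whereas echo "" writes a single newline; whether the pipeline treats both identically is an assumption.

```python
from pathlib import Path

HEADER_PREFIXES = ("#", "track", "browser")


def read_bed_regions(lines):
    """Parse BED lines into (chrom, start, end) tuples.

    Blank lines and header lines are skipped. BED coordinates are 0-based
    and half-open, so a region covers end - start bases. Raises ValueError
    if a line has fewer than three fields or 0 <= start < end does not hold.
    """
    regions = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith(HEADER_PREFIXES):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {number}: expected at least 3 fields")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if not 0 <= start < end:
            raise ValueError(f"line {number}: invalid interval {start}-{end}")
        regions.append((chrom, start, end))
    return regions


def ensure_bed_file(project_dir=".", placeholder="sample_name.bed"):
    """Create an empty placeholder BED file if project_dir has no *.bed file.

    Returns the path of the created placeholder, or None if a BED file exists.
    """
    folder = Path(project_dir)
    if any(folder.glob("*.bed")):
        return None
    path = folder / placeholder
    path.write_text("")
    return path


if __name__ == "__main__":
    ensure_bed_file(".")  # applies the workaround described in this issue
    for bed in sorted(Path(".").glob("*.bed")):
        regions = read_bed_regions(bed.read_text().splitlines())
        size = sum(end - start for _, start, end in regions)
        print(f"{bed.name}: {len(regions)} regions, {size} bp")
```

An empty placeholder parses to zero regions, so the reported target size makes it easy to tell a placeholder from a real target-region file.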

Unzip required

Please add the zip/unzip package to the list of requirements to be installed before running the pipeline.

Memory problem on DigitalOcean

The following error was encountered when the pipeline was run on a DigitalOcean droplet (4 vCPUs, 8 GB RAM, 160 GB disk):

There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 120061952 bytes for committing reserved memory.
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006ea300000, 120061952, 0) failed; error='Cannot allocate memory' (errno=12)
An error report file with more information is saved as:
/usr/hs_err_pid1.log
