
hlaforest's Introduction

===TABLE OF CONTENTS===
1. Installation
    1.0 Prerequisites
    1.1 Set up the environment
        1.1.1 config.sh
        1.1.2 CallSimulation.sh
        1.1.3 CallHaplotypesPE.sh
        1.1.4 Adding HLAforest to your PATH
    1.2 Testing
        1.2.1 Testing haplotype calling pipeline
        1.2.2 Testing simulation
2. Running
    2.1 Running HLAforest on RNA-seq data
    2.2 Running HLAforest simulations
3. Utilities
4. FAQs


=== 1. Installation ===
== 1.0 Prerequisites ==
HLAforest depends on BioPerl, bowtie and (for simulations) Math::Random. Instructions for installing BioPerl can be found on the web (http://www.bioperl.org/wiki/Installing_BioPerl). Precompiled bowtie binaries are available via the bowtie website (http://bowtie-bio.sourceforge.net). Math::Random can be installed via CPAN. For instructions on installing modules from CPAN, visit the official website (http://www.cpan.org/modules/INSTALL.html). If you do not intend to run simulations, it is not necessary to install Math::Random.

Once you have installed all the prerequisites, you should add bowtie to your PATH. In bash you can do so with the following command:

%> export PATH=/path/to/bowtie/directory:$PATH
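Before moving on, it can help to confirm that the prerequisites are visible from your shell. The following is a minimal sketch, not part of HLAforest: the helper names have_cmd and have_perl_module are made up for illustration, and Bio::Seq is used only as a stand-in for "BioPerl is installed" (the exact BioPerl modules HLAforest loads are not listed here).

```shell
# Succeeds if the named command can be found on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Succeeds if perl can load the named module.
have_perl_module() { perl -M"$1" -e 1 >/dev/null 2>&1; }

have_cmd bowtie || echo "bowtie not found on PATH"
have_cmd perl || echo "perl not found on PATH"
# Bio::Seq is an assumed stand-in for a working BioPerl installation.
have_perl_module Bio::Seq || echo "BioPerl (Bio::Seq) could not be loaded"
have_perl_module Math::Random || echo "Math::Random could not be loaded (only needed for simulations)"
```

If nothing is printed, all four checks passed.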

== 1.1 Set up the environment == 
In order to get HLAforest running, you will have to modify a few of the scripts to reflect your local environment. All these scripts can be found in the scripts directory.

= 1.1.1 config.sh = 
(Required)
You must modify the HLAFOREST_HOME variable to point to the directory in which HLAforest resides on your local system.

(Optional)
You can modify the NUM_THREADS variable to reflect the number of processors available on your system.
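As an illustration, the edited lines might look like the sketch below. It assumes config.sh uses plain shell variable assignments. The path is a placeholder, and using getconf to count processors is a suggestion that does not come from the original script; a fixed number works just as well.

```shell
# Placeholder path: replace with the directory that contains your HLAforest installation.
HLAFOREST_HOME=/path/to/your/hlaforest

# Optional: use the number of online processors, falling back to 1 if getconf cannot report it.
NUM_THREADS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

echo "HLAFOREST_HOME=$HLAFOREST_HOME NUM_THREADS=$NUM_THREADS"
```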

= 1.1.2 CallSimulation.sh = 
(Required)
You must modify the CONFIG_PATH variable to reflect the path of the config.sh file that you previously modified.

= 1.1.3 CallHaplotypesPE.sh = 
(Required)
You must modify the CONFIG_PATH variable to reflect the path of the config.sh file that you previously modified.

= 1.1.4 Adding HLAforest to your PATH =
You can add the scripts directory to your PATH by issuing the following command, modified to reflect your local installation:

%> export PATH=/path/to/your/hlaforest/scripts:$PATH

== 1.2 Testing ==
You can test your installation of HLAforest by running the following two tests.

= 1.2.1 Testing haplotype calling pipeline =
After you have set up your environment and modified the source files to reflect your local environment, you can test the installation by calling HLA haplotypes on a selected subset of RNA-seq data (gm12878).

From your HLAforest home directory, issue the following command:

%> CallHaplotypesPE.sh test2/ test2/gm12878_short_1.fastq test2/gm12878_short_2.fastq

After the script has completed, you should see the file test2/haplotypes.txt, which contains your predicted haplotypes.

= 1.2.2 Testing simulation =
Alternatively, you can test your installation by running a simulation. Simulations are run with the following syntax:

%> ~/hla/hlaforest/scripts/CallSimulation.sh <OUTDIR> <READ_LENGTH> <NUM_READS> <INSERT_SIZE> <ERROR_RATE>

For example, you can run a simulation of 100 paired-end reads (2x100bp) with an insert size of 250 and a 0% substitution rate with the following command:

%> ~/hla/hlaforest/scripts/CallSimulation.sh sim_2x100_100reads_250insert_0subrate 100 100 250 0

Doing so will generate a new folder called sim_2x100_100reads_250insert_0subrate, which contains the simulated haplotypes (sim_chosen_haplotypes.txt), the predicted haplotypes (haplotypes.txt) and a file that compares the predictions against the true haplotypes (sim-score.txt).
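To confirm that the simulation finished, you can check that all three files exist and are non-empty. This is a small sketch rather than part of HLAforest; the function name check_sim_outputs is made up for illustration, and the file names are the ones listed above.

```shell
# Report the status of each expected output file in a simulation directory.
# Returns non-zero if any file is missing or empty.
check_sim_outputs() {
    dir=$1
    status=0
    for f in sim_chosen_haplotypes.txt haplotypes.txt sim-score.txt; do
        if [ -s "$dir/$f" ]; then
            echo "ok       $dir/$f"
        else
            echo "missing  $dir/$f"
            status=1
        fi
    done
    return $status
}

check_sim_outputs sim_2x100_100reads_250insert_0subrate || echo "simulation output incomplete"
```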

=== 2. Running ===

== 2.1 Running HLAforest on RNA-seq data ==
You can run HLAforest by issuing the following command:

%> CallHaplotypesPE.sh <output_directory> </path/to/read1.fastq> </path/to/read2.fastq>

You can input multiple fastq files as long as they are comma delimited. For example:

%> CallHaplotypesPE.sh out_dir read_R1_001.fastq,read_R1_002.fastq read_R2_001.fastq,read_R2_002.fastq

After the script has completed, you should see the file haplotypes.txt in your output directory, which contains your predicted haplotypes.
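When a sample is split across many files, the comma-delimited lists can be built from shell globs instead of typed by hand. The sketch below assumes the file naming from the example above (read_R1_*.fastq and read_R2_*.fastq), so that both globs sort in the same order and the mates stay paired. The helper join_by_comma is made up for illustration. The final command is printed rather than run; if a glob matches no files, the pattern itself appears in the output.

```shell
# Join all arguments with commas (runs in a subshell so IFS is not changed for the caller).
join_by_comma() ( IFS=,; printf '%s\n' "$*" )

R1=$(join_by_comma read_R1_*.fastq)
R2=$(join_by_comma read_R2_*.fastq)

# Dry run: remove "echo" to actually start the pipeline.
echo CallHaplotypesPE.sh out_dir "$R1" "$R2"
```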


== 2.2 Running HLAforest simulations ==
Simulations are run with the following syntax:

%> ~/hla/hlaforest/scripts/CallSimulation.sh <OUTDIR> <READ_LENGTH> <NUM_READS> <INSERT_SIZE> <ERROR_RATE>
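To run several simulations in a row, the command can be wrapped in a loop. The following is a sketch under a few assumptions: CallSimulation.sh is on your PATH, the read counts are arbitrary example values, and the output directory names simply follow the pattern used in the "Testing simulation" example. The helper sim_outdir is made up for illustration. The commands are printed rather than run.

```shell
# Build an output directory name following the pattern of the "Testing simulation" example.
sim_outdir() {
    len=$1 n=$2 ins=$3 err=$4
    echo "sim_2x${len}_${n}reads_${ins}insert_${err}subrate"
}

# Hypothetical sweep over the number of reads; the other parameters stay fixed.
for num_reads in 100 1000 10000; do
    outdir=$(sim_outdir 100 "$num_reads" 250 0)
    # Dry run: remove "echo" to actually run each simulation.
    echo CallSimulation.sh "$outdir" 100 "$num_reads" 250 0
done
```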

=== 3. Utilities ===

=== 4. FAQs ===
Q: Why is your program such a RAM eater?
A: There are two reasons. First, HLAforest is implemented in Perl and stores a lot of metadata for each read tree: each tree contains the original alignment along with other deprecated fields. Future implementations will reduce the metadata associated with each tree, thus reducing the memory footprint. Second is Perl itself: once it has allocated memory, it generally does not return it to the operating system until the program exits.



hlaforest's Issues

Running example dataset

Hi there,
When running the example dataset, I run into two errors:

DEBUG in SamReader.pm getNextAlignmentSet: No more alignments found, returning undef

followed by:
Can't use an undefined value as an ARRAY reference at /hlaforest/scripts/call-haplotypes.pl line 140.
