fnaveed786 / hlaforest
Automatically exported from code.google.com/p/hlaforest
===TABLE OF CONTENTS===

1. Installation
    1.0 Prerequisites
    1.1 Set up the environment
        1.1.1 config.sh
        1.1.2 CallSimulation.sh
        1.1.3 CallHaplotypesPE.sh
        1.1.4 Adding HLAforest to your PATH
    1.2 Testing
        1.2.1 Testing haplotype calling pipeline
        1.2.2 Testing simulation
2. Running
    2.1 Running HLAforest on RNA-seq data
    2.2 Running HLAforest simulations
3. Utilities
4. FAQs

=== 1. Installation ===

== 1.0 Prerequisites ==

HLAforest depends on BioPerl, bowtie and (for simulations) Math::Random. Instructions for installing BioPerl can be found on the web (http://www.bioperl.org/wiki/Installing_BioPerl). Precompiled bowtie binaries are available from the bowtie website (http://bowtie-bio.sourceforge.net). Math::Random can be installed via CPAN. For instructions on installing modules from CPAN, visit the official website (http://www.cpan.org/modules/INSTALL.html). If you do not intend to run simulations, you do not need to install Math::Random.

Once you have installed all the prerequisites, add bowtie to your PATH. In bash you can do so with the following command:

%> export PATH=/path/to/bowtie/directory:$PATH

== 1.1 Set up the environment ==

To get HLAforest running, you will have to modify a few of the scripts to reflect your local environment. All of these scripts can be found in the scripts directory.

= 1.1.1 config.sh =

(Required) You must modify the HLAFOREST_HOME variable to reflect the directory where HLAforest resides on your local system.

(Optional) You can modify the NUM_THREADS variable to reflect the number of processors available on your system.

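As an illustration of the two config.sh edits described above, the changed lines might look like the sketch below. The variable names HLAFOREST_HOME and NUM_THREADS come from this README. The install path is a placeholder, the plain NAME=value assignment form is an assumption about how config.sh is written, and deriving the thread count with getconf is an optional convenience rather than something HLAforest does itself.

```shell
# Sketch of the edited lines in scripts/config.sh (assumed NAME=value syntax).

# Required: directory where HLAforest lives (placeholder path, adjust it).
HLAFOREST_HOME="$HOME/hla/hlaforest"

# Optional: number of processors to use. getconf reports the online CPU
# count on Linux and macOS; fall back to 1 if it is unavailable.
NUM_THREADS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
```

A fixed number such as NUM_THREADS=4 works just as well if you prefer to leave some processors free for other jobs.
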
= 1.1.2 CallSimulation.sh =

(Required) You must modify the CONFIG_PATH variable to reflect the path of the config.sh file that you previously modified.

= 1.1.3 CallHaplotypesPE.sh =

(Required) You must modify the CONFIG_PATH variable to reflect the path of the config.sh file that you previously modified.

= 1.1.4 Adding HLAforest to your PATH =

You can add the scripts directory to your PATH by issuing the following command (modified to reflect your local installation):

%> export PATH=/path/to/your/hlaforest/scripts:$PATH

== 1.2 Testing ==

You can test your installation of HLAforest by running the following two tests.

= 1.2.1 Testing haplotype calling pipeline =

After you have set up your environment and modified the scripts to reflect it, you can test the installation by calling HLA haplotypes on a selected subset of RNA-seq data (gm12878). From your HLAforest home directory, issue the following command:

%> CallHaplotypesPE.sh test2/ test2/gm12878_short_1.fastq test2/gm12878_short_2.fastq

After the script has completed, you should see the file test2/haplotypes.txt, which contains your predicted haplotypes.

= 1.2.2 Testing simulation =

Alternatively, you can test your installation by running a simulation. Simulations are run with the following syntax:

%> ~/hla/hlaforest/scripts/CallSimulation.sh <OUTDIR> <READ_LENGTH> <NUM_READS> <INSERT_SIZE> <ERROR_RATE>

For example, you can generate a simulation of 100 2x100 bp reads with an insert size of 250 and a 0% substitution rate with the following command:

%> ~/hla/hlaforest/scripts/CallSimulation.sh sim_2x100_100reads_250insert_0subrate 100 100 250 0

Doing so will generate a new folder called sim_2x100_100reads_250insert_0subrate, which contains the simulated haplotypes (sim_chosen_haplotypes.txt), the predicted haplotypes (haplotypes.txt) and a file that compares the predictions against the true haplotypes (sim-score.txt).

=== 2. Running ===

== 2.1 Running HLAforest on RNA-seq data ==

You can run HLAforest by issuing the following command:

%> CallHaplotypesPE.sh <output_directory> </path/to/read1.fastq> </path/to/read2.fastq>

You can input multiple fastq files as long as they are comma delimited. For example:

%> CallHaplotypesPE.sh out_dir read_R1_001.fastq,read_R1_002.fastq read_R2_001.fastq,read_R2_002.fastq

After the script has completed, you should see the file out_dir/haplotypes.txt, which contains your predicted haplotypes.

== 2.2 Running HLAforest simulations ==

Simulations are run with the following syntax (see section 1.2.2 for an example):

%> ~/hla/hlaforest/scripts/CallSimulation.sh <OUTDIR> <READ_LENGTH> <NUM_READS> <INSERT_SIZE> <ERROR_RATE>

=== 3. Utilities ===

=== 4. FAQs ===

Q: Why is your program such a RAM eater?

A: There are two reasons. First, HLAforest is implemented in Perl and stores a lot of metadata for each read tree. Each read tree contains the original alignment along with other deprecated fields. Future implementations will reduce the metadata associated with each tree, thus reducing the memory footprint. Second is Perl itself: once it takes hold of memory, it does not let go of it while the program is running.
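Given the memory behaviour described in the FAQ, one possible precaution on a shared machine is to cap the memory a run may use, so that the process fails instead of exhausting the system. The sketch below uses the bash builtin ulimit -v inside a subshell. The helper name run_capped is hypothetical and not part of HLAforest, the 16 GiB figure is an arbitrary example, and the limit is enforced on Linux but may be ignored on other platforms such as macOS.

```shell
#!/usr/bin/env bash
# run_capped is a hypothetical helper, not part of HLAforest.
# Usage: run_capped <limit_in_KiB> <command> [args...]
run_capped() {
    local limit_kib=$1
    shift
    # The subshell keeps the lowered limit from affecting the calling shell.
    (
        ulimit -v "$limit_kib" || exit 1
        "$@"
    )
}

# Example (file names are placeholders): cap a run at roughly 16 GiB.
# run_capped $((16 * 1024 * 1024)) CallHaplotypesPE.sh out_dir read1.fastq read2.fastq
```

Because the limit is set in a subshell, later commands in the same terminal session are not affected.
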
Hi there,
When running the example dataset, I run into two errors,
DEBUG in SamReader.pm getNextAlignmentSet: No more alignments found, returning undef
followed by,
Can't use an undefined value as an ARRAY reference at /hlaforest/scripts/call-haplotypes.pl line 140.