Allele-Specific Quantification of Structural Variations in Cancer Genomes
Weaver

Allele specific base-pair resolution quantification of structural variations in cancer genome

[email protected]
[email protected]

Version 0.20

----------------------------
INSTALL
----------------------------

Bamtools (https://github.com/pezmaster31/bamtools) libraries are needed; they are included in Weaver_SV/lib and Weaver_SV/inc

    export LD_LIBRARY_PATH=<PREFIX>/Weaver/Weaver_SV/lib/:$LD_LIBRARY_PATH

libz required //-lz flag

Parallel::ForkManager (http://search.cpan.org/~szabgab/Parallel-ForkManager-1.06/lib/Parallel/ForkManager.pm) perl package is needed

Bedtools (https://github.com/arq5x/bedtools)
Samtools (http://samtools.sourceforge.net/)
BOOST C++ library (http://www.boost.org/)
BWA (http://bio-bwa.sourceforge.net/)
Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)

1 Modify the required BOOST directory in src/Makefile
2 ./INSTALL.sh

-----------------------------
DATA
-----------------------------

    wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz

-----------------------------
EXAMPLE DATA
-----------------------------

    wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_example.tar.gz

RUN:

    Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64
    solo_ploidy TARGET 2
    Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0

----------------------------
Weaver_SV.pl
----------------------------

SV finding

Input: BAM file from BWA
Output: VCF file for SV

----------------------------
Weaver_pipeline.pl
----------------------------

Master program:
1 Generate SV
2 Generate other inputs needed for Weaver

INPUTS
DATA package: 1000 Genomes Project Phase 1 haplotypes

----------------------------
Weaver
----------------------------

Core PGM program

INPUTS:
1 SV

Outputs:
1 Purity and haploid-level sequencing coverage
2 Allele specific copy number of genomic regions
3 Allele specific copy number of structural variations
4 Relative timing of structural variations
5 Cancer scaffolds
6 Phasing of germline SNPs in CNV regions

----------------------------
Weaver_lite
----------------------------

Core PGM program, with SNP phasing disabled to speed up

INPUTS:
1 SV
2 reference
3 Mappability (available for hg19)
4 Region (available for hg19)
5 wig (from bam)

----------------------------
Weaver PLOIDY
----------------------------

    Weaver PLOIDY -f -S -s ../SNP_dens -g GAP_20140416_num -w -r 1 -m -p 16

INPUTS:
-f  reference file (fasta), should match the reference used in original bam file. Especially for most TCGA datasets, the alignment was performed on //www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta, which does not have "chr" prefix [MANDATORY]
-S  SV file, with format consistent with Weaver_SV. [MANDATORY]
-s  SNP file, with ref and alt mappings [MANDATORY]
-w  wig file from bam, storing the coverage information [MANDATORY]
-r  1, if first time running (generating temp files); 0 if want to use existing temp files. [default 1]
-m  mappability file, download from http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz [MANDATORY]
-p  number of cores [default 1]

-----------------------------
FILE FORMAT DECLARATIONS
-----------------------------

Wiggle file:
Wiggle files need to be declared with fixedStep, step 1 and span 1:

    fixedStep chrom=chr1 start=9994 step=1 span=1

If a chromosome has multiple declaration lines, they need to be sorted based on position. The following is not allowed:

    fixedStep chrom=chr1 start=9994 step=1 span=1
    X
    X
    X
    fixedStep chrom=chr1 start=100 step=1 span=1
    X
    X
    X

Bam file:
Must be sorted and indexed.

SNP file:
NGS SNP link file
1KGP SNP link

SV:

Genome region file:
GAP regions in assembly are annotated.

###################
Output:
###################

REGION_CN_PHASE: storing phased allele specific copy number of genome

    CHR BEGIN END ALLELE_1_CN ALLELE_2_CN

SV_CN_PHASE: Structural variation copy number and phasing, category

    CHR_1 POS_1 ORI_1 ALLELE_ CHR_2 POS_2 ORI_2 ALLELE_ CN germline/somatic_post_aneuploidy/somatic_pre_aneuploidy

###############
CONTACT
###############

Yang Li
Ma Lab
Bioengineering Dept., University of Illinois at Urbana-Champaign
[email protected]
https://github.com/leofountain/Weaver
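The wiggle rules stated under FILE FORMAT DECLARATIONS (fixedStep declarations, step 1, span 1, declarations for a chromosome sorted by position) can be checked before running Weaver. The following is a minimal sketch in Python and is not part of Weaver; the function name `check_wiggle` is hypothetical. It assumes that a missing `span` field should be reported as well, since the README example declares it explicitly.

```python
from typing import Dict, Iterable, List


def check_wiggle(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []
    last_start: Dict[str, int] = {}  # chrom -> start of its previous declaration
    for n, raw in enumerate(lines, 1):
        line = raw.strip()
        if line.startswith("variableStep"):
            problems.append(f"line {n}: variableStep found, fixedStep expected")
            continue
        if not line.startswith("fixedStep"):
            continue  # data values, track lines, blank lines
        # Parse the key=value fields of the declaration line.
        fields = dict(tok.split("=", 1) for tok in line.split()[1:] if "=" in tok)
        chrom = fields.get("chrom", "")
        start_text = fields.get("start", "")
        step, span = fields.get("step"), fields.get("span")
        if not chrom or not start_text.isdigit():
            problems.append(f"line {n}: cannot parse declaration: {line}")
            continue
        start = int(start_text)
        if step != "1" or span != "1":
            problems.append(
                f"line {n}: expected step=1 and span=1, got step={step} span={span}"
            )
        prev = last_start.get(chrom)
        if prev is not None and start < prev:
            problems.append(
                f"line {n}: {chrom} start={start} comes after start={prev}; "
                "declarations must be sorted by position"
            )
        last_start[chrom] = start
    return problems


# Small in-memory demo using the disallowed ordering from the README.
demo = [
    "fixedStep chrom=chr1 start=9994 step=1 span=1", "1", "2", "3",
    "fixedStep chrom=chr1 start=100 step=1 span=1", "1", "2", "3",
]
for problem in check_wiggle(demo):
    print(problem)
```

To check a real file, pass an open file handle, for example `check_wiggle(open("X.bam.wig"))`; the function reads the file line by line, so a large wig file does not need to fit in memory.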
I have installed weaver from github to ~/weaver and unpacked Weaver_data.tar.gz and Weaver_example.tar.gz to the Weaver_data and Weaver_example sub-directories respectively.
I have fixed the broken symlinks in ~/weaver/data
I am attempting to run the example. When I rerun Weaver_example/cmd I get the following output:
$ ../bin/bam2bw.pl lite X.bam 64 M
$ ~/weaver/bin/Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
RUN MODE PLOIDY
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
Estimated cancer haplotype coverage: 0
Estimated normal haplotype coverage: 0
$ ~/weaver/bin/Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
RUN MODE LITE
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
TUMOR coverage was set to 20.
NORMAL was set to 0.
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
base_mean = 20
best_norm = 0
LBP scan
LBP
LBP init
LBP print
LBP scan
$ ls -l
total 1231020
-rw-r----- 1 cameron.d allstaff 405 Oct 30 2014 cmd
-rw-r--r-- 1 cameron.d allstaff 0 May 22 15:42 EACH_REGION
-rw-r--r-- 1 cameron.d allstaff 0 May 22 15:42 EACH_REGION_1
drwxr-x--- 2 cameron.d allstaff 9 Oct 30 2014 INPUT
drwxr-x--- 2 cameron.d allstaff 11 Nov 4 2014 OUTPUT
-rw-r--r-- 1 cameron.d allstaff 0 May 22 15:42 REGION_CN_PHASE
-rw-r--r-- 1 cameron.d allstaff 0 May 22 15:42 SNP_CN_PHASE
-rw-r--r-- 1 cameron.d allstaff 0 May 22 15:42 SV_CN_PHASE
-rw-r--r-- 1 cameron.d allstaff 0 May 22 15:42 SV_REMOVED
-rw-r--r-- 1 cameron.d allstaff 0 May 22 15:42 SV_SELECTED
-rw-r--r-- 1 cameron.d allstaff 0 May 22 15:42 TARGET
-rw-r--r-- 1 cameron.d allstaff 0 May 22 15:42 tempfile
-rw-r----- 1 cameron.d allstaff 1097298665 Oct 30 2014 X.bam
-rw-r----- 1 cameron.d allstaff 161816 Oct 30 2014 X.bam.bai
-rw-r--r-- 1 cameron.d allstaff 28 May 22 15:19 X.bam.G1
-rw-r--r-- 1 cameron.d allstaff 28 May 22 15:19 X.bam.G2
-rw-r--r-- 1 cameron.d allstaff 162603325 May 22 15:24 X.bam.wig
Note that almost all of the output files are empty, and that they are written to the ~/weaver/Weaver_example directory rather than the ~/weaver/Weaver_example/OUTPUT subdirectory.
What else is required to reproduce the example output for the supplied example?
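The README notes that the reference passed with -f should match the one used for the BAM, including whether chromosome names carry a "chr" prefix. One quick sanity check is to compare the sequence names in the FASTA with the `chrom=` names in the wig file. This is not a confirmed cause of the empty output above. The sketch below is in Python and is not part of Weaver; the function names are hypothetical.

```python
from typing import Iterable, List, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA header lines (first token after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def wig_names(lines: Iterable[str]) -> Set[str]:
    """Chromosome names from the chrom= field of wig declaration lines."""
    names = set()
    for line in lines:
        if line.startswith(("fixedStep", "variableStep")):
            for tok in line.split()[1:]:
                if tok.startswith("chrom="):
                    names.add(tok[len("chrom="):])
    return names


def missing_from_fasta(fasta_lines: Iterable[str], wig_lines: Iterable[str]) -> List[str]:
    """Names used in the wig file that have no matching FASTA sequence."""
    return sorted(wig_names(wig_lines) - fasta_names(fasta_lines))


# In-memory demo: the FASTA uses "1" while the wig uses "chr1".
print(missing_from_fasta(
    [">1 example header\n", "ACGT\n"],
    ["fixedStep chrom=chr1 start=1 step=1 span=1\n", "3\n"],
))
```

For the example directory, the two inputs would be open handles on `SIMU.fa` and `X.bam.wig`. An empty result only shows that the names agree; it does not show that the run is otherwise set up correctly.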
I am attempting to run weaver and, for the programs I have attempted to run, neither the online GitHub documentation nor the program usage help matches the actual command-line arguments required. To run these programs, I have had to look at the source code of the relevant .pl file to work out how to run it.
For example:
$RUN_TYPE and $MODEFLAG variables are not mentioned anywhere in the documentation.
$FA occurs before $FULLFA is set. Is this parameter intended to allow for the bwa & bowtie indexes to not require colocation with the reference fasta?
Does there exist an all-in-one wrapper that can go from bam to weaver output for hg19 and/or hg38?
In the last Weaver LITE step (taken from the Weaver GitHub README):
Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
What are the -t and -n values?
I am running Weaver on whole exome sequencing data with a single sample and no control. What values need to be entered for the -t and -n options, and how can I get them?