
HAHap: A read-based haplotyping method using hierarchical assembly

About

HAHap is a method for inferring haplotypes from sequencing data. It attempts to eliminate the influence of noise through the assembly process, while retaining the spirit of minimum error correction under certain conditions. We developed an adjusted multinomial probabilistic metric to evaluate the reliability of each variant pair, and the derived scores guide the assembly process.

HAHap takes BAM files as input and was validated with short reads from the Illumina HiSeq platform.
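As a rough illustration of the idea only (this toy model is not HAHap's actual adjusted multinomial metric, which is defined in the paper), the reliability of a phasing for a heterozygous variant pair can be scored from the allele co-occurrences observed in reads covering both sites:

```python
import math
from collections import Counter

def pair_score(observations, err=0.01):
    """Toy log-likelihood score for a heterozygous variant pair.

    observations: iterable of (allele_at_site1, allele_at_site2) tuples,
    each allele 0 or 1, taken from reads covering both sites.
    Returns log P(data | cis) - log P(data | trans): positive values
    favour the cis phasing (0-0 / 1-1), negative favour trans.
    Illustrative multinomial model only, not HAHap's metric.
    """
    counts = Counter(observations)

    def loglik(cis):
        # Under cis, (0,0) and (1,1) are the concordant configurations;
        # under trans, (0,1) and (1,0) are. Concordant observations get
        # probability (1 - err) / 2, discordant ones err / 2.
        ll = 0.0
        for (a, b), n in counts.items():
            p = (1 - err) / 2 if ((a == b) == cis) else err / 2
            ll += n * math.log(p)
        return ll

    return loglik(True) - loglik(False)
```

For example, `pair_score([(0, 0), (1, 1), (0, 0)])` is positive (cis is favoured), while a read set dominated by (0, 1) and (1, 0) observations yields a negative score.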

Requirements

HAHap is a pure-Python program. It requires the following packages:

  • Python 3.x
  • Numpy (version >= 1.10)
  • Pysam (version >= 0.12)
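A quick way to verify that the dependencies above are in place (the version comparison is a generic sketch; the required versions mirror the list above):

```python
def version_tuple(v):
    """Convert a dotted version string such as '1.10.4' into a
    comparable tuple of integers: (1, 10, 4). Non-numeric suffixes
    (e.g. release candidates) are ignored."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def check(installed, required):
    """Return True if installed >= required, compared component-wise."""
    return version_tuple(installed) >= version_tuple(required)

if __name__ == "__main__":
    import sys
    assert sys.version_info[0] >= 3, "HAHap requires Python 3"
    try:
        import numpy
        assert check(numpy.__version__, "1.10"), "Numpy >= 1.10 required"
        import pysam
        assert check(pysam.__version__, "0.12"), "Pysam >= 0.12 required"
        print("All dependencies satisfied")
    except ImportError as e:
        print("Missing dependency:", e.name)
```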

Usage

Clone the repository and execute bin/HAHap.

git clone https://github.com/ifishlin/HAHap
cd HAHap/bin
python HAHap phase VCF BAM OUT
usage: python HAHap phase [--mms MMS] [--lct LCT] [--minj MINJ] [--pl PL] VCF BAM OUT

positional arguments:
VCF          VCF file with the heterozygous variants to be phased
BAM          BAM file with the mapped reads
OUT          Output VCF file with the predicted haplotypes (HP tags)

optional arguments:
--mms            Minimum read mapping quality (default: 0)
--lct            Threshold for low-coverage pairs (int, default: median)
--minj           Minimum number of junctions (default: 4)
--pl             Likelihood of P1 and P2 (default: 0.49)

Data (Ashkenazim family)

The answer set used in the real-data experiment was created by taking the intersection of

  • the haplotype predictions of 10x Genomics (ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/10XGenomics_ChromiumGenome_LongRanger2.1_09302016/README) and
  • the variant calls made on the read sets (ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/README) (caller: GATK HaplotypeCaller 3.6).

  • The input VCF files of the real-data experiment are located in /data.
  • The BAM files used in the real-data experiment are located under Base_URL: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/AshkenazimTrio
    • Base_URL/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/HG002.hs37d5.2x250.bam
    • Base_URL/HG003_NA24149_father/NIST_Illumina_2x250bps/novoalign_bams/HG003.hs37d5.2x250.bam
    • Base_URL/HG004_NA24143_mother/NIST_Illumina_2x250bps/novoalign_bams/HG004.hs37d5.2x250.bam

Authors

Yu-Yu Lin, Pei-Lung Chen, Yen-Jen Oyang and Chien-Yu Chen. National Taiwan University, Taiwan.


Issues

Error in running the test file given with this tool

This was the command line used for running the HAHap tool:
python HAHap phase HG002_heter.vcf HG002.hs37d5.2x250.bam phased.vcf

This is the error I got:
=== Build Connected Component ===
Traceback (most recent call last):
File "HAHap", line 9, in <module>
main()
File "/home/dhwani/Documents/softwares/HAHap/HAHap/main.py", line 73, in main
module.main(args)
File "/home/dhwani/Documents/softwares/HAHap/HAHap/phase.py", line 63, in main
connected_component = blocks_main(args, str(chrom), var_loc, timer)
File "/home/dhwani/Documents/softwares/HAHap/HAHap/blocks.py", line 136, in main
samfile = pysam.AlignmentFile(args.bam_file, "rb")
File "pysam/libcalignmentfile.pyx", line 444, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File "pysam/libcalignmentfile.pyx", line 629, in pysam.libcalignmentfile.AlignmentFile._open
File "pysam/libchtslib.pyx", line 364, in pysam.libchtslib.HTSFile.check_truncation
OSError: no BGZF EOF marker; file may be truncated

UnicodeDecodeError

=== Start HAHap phasing ===
Parameters: Minimum mapping quality = 0
Parameters: Threshold of low coverage = Median
Parameters: Minimum junction number = 4
Parameters: Likelihood of P1 and P2 = 0.49

=== Read Heterozygous Data ===
Traceback (most recent call last):
File "./bin/HAHap", line 9, in <module>
main()
File "/home/dhwani/Documents/softwares/HAHap/HAHap/main.py", line 73, in main
module.main(args)
File "/home/dhwani/Documents/softwares/HAHap/HAHap/phase.py", line 56, in main
var_chrom_dict = split_vcf_by_chrom(args.variant_file)
File "/home/dhwani/Documents/softwares/HAHap/HAHap/vcf.py", line 42, in split_vcf_by_chrom
for line in variants_vcf:
File "/home/dhwani/miniconda3/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
