build_fingerprint_maps

build_fingerprint_maps is a tool for building haplotype maps for use with CrosscheckFingerprints, the fingerprinting software in Picard-Tools (http://broadinstitute.github.io/picard/). A haplotype map is a collection of "blocks" of SNPs, chosen so that SNPs within the same block are in tight linkage with each other and in low linkage with SNPs in other blocks.
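To make the block definition concrete, here is a minimal sketch of the linkage measure it relies on: the squared correlation (r^2) between the genotypes of two SNPs. This is an illustration only, not the tool's implementation (the tool relies on PLINK2 and LDSC for its calculations), and the genotype vectors below are made-up toy data.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors.

    Each vector holds one alternate-allele count (0, 1 or 2) per sample.
    Both SNPs must vary across samples, otherwise the variance is zero.
    """
    n = float(len(a))
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


# Hypothetical genotypes for six samples.
snp_a = [0, 1, 2, 0, 1, 2]
snp_b = [0, 1, 2, 0, 1, 2]  # behaves like a SNP in the same block as snp_a
snp_c = [1, 0, 1, 1, 2, 1]  # behaves like a SNP in a different block

same_block = r_squared(snp_a, snp_b)   # high: tightly linked
other_block = r_squared(snp_a, snp_c)  # low: weakly linked
```

In a good map, pairs like snp_a and snp_b fall above the intra-block threshold, while pairs like snp_a and snp_c fall below the inter-block threshold.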

To download build_fingerprint_maps, clone this repository with the command

git clone https://github.com/naumanjaved/fingerprint_maps.git

Precomputed map files with headers (headers do not contain any entries for scaffolds or contigs)

The map_files directory also contains precomputed maps with relaxed intra- and inter-block correlation thresholds. Map names contain the parameters used.

Dependencies

In order to run build_fingerprint_maps, you must have working installations of:

  1. Python (>=2.7)

  2. PLINK2

  3. VCFTools

  4. Anaconda or the following modules:
     a. subprocess
     b. os
     c. itertools
     d. numpy
     e. sys
     f. argparse
     g. traceback
     h. time
     i. datetime

  5. LDSC (LD Score regression)

Required Files

build_fingerprint_maps uses VCFs from 1000 Genomes Phase 3 and recombination maps (SHAPEIT format). These can be found here:

See run.sh for a sample run script. Run python build_fingerprint_maps.py -h to see a list of command line options.

Use with Picardtools

The above maps are to be used as the haplotype map input to Picard's CrosscheckFingerprints.

In most cases, each file you want to compare with CrosscheckFingerprints contains data for only a single fingerprint, and you should run CrosscheckFingerprints with the CROSSCHECK_BY option set to FILE. With default settings, Picard can be strict about properly formatted headers and read names. If a validation error arises, first make sure that it does not indicate a legitimate problem with the input BAM file, then try running with VALIDATION_STRINGENCY set to LENIENT.
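The options above can be assembled into a command line along the following lines. This is a sketch under assumptions: the jar location, file names and the helper name crosscheck_command are hypothetical, and the legacy KEY=VALUE argument style is used.

```python
def crosscheck_command(inputs, haplotype_map, output,
                       picard_jar="picard.jar", lenient=False):
    """Build (but do not run) a CrosscheckFingerprints command line.

    picard_jar is an assumed path to the Picard jar on your system.
    """
    cmd = ["java", "-jar", picard_jar, "CrosscheckFingerprints"]
    cmd += ["INPUT=" + path for path in inputs]
    cmd += [
        "HAPLOTYPE_MAP=" + haplotype_map,
        "CROSSCHECK_BY=FILE",  # one fingerprint per input file
        "OUTPUT=" + output,
    ]
    if lenient:
        # Only after checking that the validation error is harmless.
        cmd.append("VALIDATION_STRINGENCY=LENIENT")
    return cmd


# Hypothetical file names, for illustration only.
cmd = crosscheck_command(["sample1.bam", "sample2.bam"],
                         "my_map.txt", "crosscheck_metrics.txt",
                         lenient=True)
```

The resulting list can be passed to subprocess.check_call(cmd) to run the comparison.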

When comparing many files, it is recommended to precompute VCFs containing the extracted fingerprints up front, using the ExtractFingerprint tool in the Picard suite. This saves CrosscheckFingerprints from recomputing the fingerprint of the same file every time that file is used in a comparison.
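A minimal sketch of that precomputation step, assuming one ExtractFingerprint call per BAM file. The helper name, output naming scheme and jar path are hypothetical. The commands are only built here, not executed.

```python
import os


def extract_commands(bams, haplotype_map, out_dir, picard_jar="picard.jar"):
    """Return (vcf_path, command) pairs, one ExtractFingerprint call per BAM."""
    commands = []
    for bam in bams:
        stem = os.path.splitext(os.path.basename(bam))[0]
        vcf = os.path.join(out_dir, stem + ".fingerprint.vcf")
        commands.append((vcf, [
            "java", "-jar", picard_jar, "ExtractFingerprint",
            "INPUT=" + bam,
            "OUTPUT=" + vcf,
            "HAPLOTYPE_MAP=" + haplotype_map,
        ]))
    return commands


# Hypothetical inputs, for illustration only.
jobs = extract_commands(["data/s1.bam", "data/s2.bam"], "my_map.txt", "fp")
fingerprint_vcfs = [vcf for vcf, _ in jobs]
```

Each command can be run once with subprocess.check_call, after which the VCFs in fingerprint_vcfs are passed to CrosscheckFingerprints as inputs in place of the original BAM files.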

Custom map files

If you create a custom map file, make sure to combine it with the appropriate header file, placing the header at the top of the map file. Headers for hg19 and hg38, with entries for the reference chromosomes, are provided below.
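Combining a header with a custom map can be sketched as below. This assumes that the header lines must come before the map entries in the final file, as in a SAM-style header. The function name and file paths are hypothetical.

```python
def add_header(header_path, map_path, out_path):
    """Write header_path followed by map_path into out_path."""
    with open(header_path) as header_file:
        header = header_file.read()
    with open(map_path) as map_file:
        body = map_file.read()
    # Make sure the last header line is terminated before the body starts.
    if header and not header.endswith("\n"):
        header += "\n"
    with open(out_path, "w") as out_file:
        out_file.write(header)
        out_file.write(body)
```

For example, add_header("hg19_header.txt", "custom_map.txt", "custom_map_with_header.txt") writes a new file and leaves both inputs untouched (all three file names are placeholders).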

Support

Email [email protected] for issues.

Authors

Nauman Javed (Broad Institute) wrote the above scripts to generate fingerprint maps. Yossi Farjoun (Broad Institute) wrote CrosscheckFingerprints and ExtractFingerprint, for which the above maps are inputs.

Citation

If you use the above tool or maps with CrosscheckFingerprints in your publication, please cite the Picard-tools repo as well as the paper: Javed, N., Farjoun, Y., Fennell, T.J. et al. Detecting sample swaps in diverse NGS data types using linkage disequilibrium. Nat Commun 11, 3697 (2020). DOI: https://doi.org/10.1038/s41467-020-17453-5
