Giter Club home page Giter Club logo

ibd-pipeline's Introduction

Overview

from takir.org:

By first using family-based phasing to order the child’s allele assignments and to identify segments shared by both the child and a parent, this ensures that the DNA segment being compared to other profiles of unknown relatedness to the child is truly a segment that was inherited by the child from a parent (i.e., IBD). Algorithms provided by GEDmatch enables the general public to create phased genetic profiles and then to conduct IBD segment sharing matching with other users in their database. IBD segment sharing among two target child profiles and one of each of their parents (i.e., among two parent-child dyads) infers that the segment was inherited from a shared ancestor and that the two target persons (and the matching parents) are related.

IBD Calling Pipeline

  1. Receive input files - I'll receive input files from 23andMe, AncestryDNA, or the lab. I'm not sure how the lab will format the file, but they are most likely using the Illumina GenomeStudio software. GenomeStudio has a PLINK plugin so let's assume that it'll be in PLINK format and need to be converted to VCF with PLINK/SEQ. (not etched in stone). The lab is using the H3A chip.

  2. Convert files to VCF - Here is where we run into the issue of SNP selection. H3A has the larger SNP selection but it may miss some that's found in AncestryDNA or 23andMe. We would need to make use of all available markers for IBD later. And I don't impute. Also, we ran into an issue of formatting (for bcftools) that we corrected like this:

forfiles /p Data /m *.vcf.gz /c "cmd /c gzip -d @file"
forfiles /p Data /m *.vcf /c "cmd /c sed -i 's/GT,Number=.,Type=Integer/GT,Number=.,Type=String/g' @file" forfiles /p Data /m *.vcf /c "cmd /c bgzip @file"

  1. We then index and merge into single VCF file for use in beagle -

forfiles /p Data /m .vcf.gz /c "cmd /c bcftools.exe index @file" bcftools merge -O z -o Results\MergedSamples.vcf.gz Data/.vcf.gz

  1. Then we phase the samples using Beagle 4.0. - When I have the trios (father-mother-offspring) or duos (parent-offspring) samples, I prefer the option of family-based phasing using a PED file with family information. Beagle also provide the option of specifying a reference panel which I planned to use but I haven't yet. They also the option of using a MAP file which is important for IBD detection. They provide a link to down the MAP files. Without specifying a map file, command line is:

java -Xmx3g -jar beagle.r1399.jar gt=Results\MergedSamples.vcf.gz gprobs=false impute=false ped=StorageZipFiles\pedigree.ped ibd=true ibdlod=4.0 out=Results\PhasedB4ped.gt

  1. Beagle 4.0 has the ped argument for family-based phasing whereas the latest version of Beagle does not. Beagle 5.0 does computational phasing. Additionally, if using Beagle 5.0 (or now 5.1), IBD detection is done separately in Refined IBD. In Beagle 4.0, Refined IBD is included and used with the ped argument. I like to run both paths and compare the results of 4.0 and 5.0 though family-based phasing is most accurate.

java -Xmx3g -jar beagle.12Jul19.0df.jar gt=Results\MergedSamples.vcf.gz impute=false out=Results\PhasedB5.gt

java -Xmx3g -jar refined-ibd.16May19.ad5.jar gt=Results\PhasedB5.gt.vcf.gz lod=4.0 length=2.0 out=Results\IBDwB5

  1. Merge segments across gaps of 0.6 cMs or less. So if taking the IBD output from Beagle 4.0, the command line would be something like:

cat Results\PhasedB4ped.gt.ibd | java -jar merge-ibd-segments.16May19.ad5.jar Results\MergedSamples.vcf.gz map=plink.GRCh38.map\plink.GRCh38xX.map 0.6 1 > B4pedGapsFilled.ibd

  1. And finally, the relatedness calculation in python:

cat Results\PhasedB4ped.gt.ibd | python relatedness_v1.py plink.GRCh38.map\plink.GRCh38xX.map 2 vcfhead=simdata.vcf_header lower nofill > Results\PhasedB4ped.lower.relatedness.mincm2.nofillhead

  1. But I would like to see IBD results similar to GEDmatch with chromosome number, start position, stop position, segment length in cMs, and number of SNPs. And then total shared, and largest single segment.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.