hds-util

This is a post-processing tool for Minimac4 and the Michigan Imputation Server (MIS). It can generate FORMAT fields from HDS, convert from the SAV file format to BCF or VCF, and paste together sample groups that were split because of the MIS sample size limit.

Installation

# cget can be installed with `pip3 install --user cget`
cget install -f ./requirements.txt
mkdir build; cd build
cmake -DCMAKE_TOOLCHAIN_FILE=../cget/cget/cget.cmake -DCMAKE_BUILD_TYPE=Release ..
make
make install

Usage

# Generate GT and DS format fields and convert to BCF file format.
hds-util in.sav -f GT,DS -O bcf -o out.bcf

# Paste samples together and recompute estimated r-square across all samples.
hds-util in1.sav in2.sav in3.sav > merged.sav

# Paste samples, generate GT and DS while keeping HDS, and filter variants with R2<0.1.
hds-util -f GT,DS,HDS -m 0.1 in1.sav in2.sav in3.sav > merged.sav
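
To illustrate why r-square is recomputed after pasting rather than carried over from the individual files, here is a minimal Python sketch. It assumes the Minimac-style estimate (variance of the haplotype dosages divided by p(1-p), where p is the mean haplotype dosage). The function name `pooled_r2` and the handling of monomorphic sites are illustrative assumptions, not hds-util's actual code.

```python
from typing import List, Sequence


def pooled_r2(groups: List[Sequence[float]]) -> float:
    """Estimate r-square from haplotype dosages pooled over all sample groups.

    Assumed formula (Minimac-style): var(HDS) / (p * (1 - p)),
    where p is the mean haplotype dosage across every group.
    """
    n = sum(len(g) for g in groups)
    if n == 0:
        return 0.0
    s_x = sum(sum(g) for g in groups)                  # sum of dosages
    s_xx = sum(sum(d * d for d in g) for g in groups)  # sum of squared dosages
    p = s_x / n
    denom = p * (1.0 - p)
    if denom <= 0.0:
        return 0.0  # assumption: report 0 for monomorphic sites
    return (s_xx / n - p * p) / denom


# Two groups that are each monomorphic on their own but not when pooled.
group_a = [0.0, 0.0, 0.0, 0.0]
group_b = [1.0, 1.0, 1.0, 1.0]
print(pooled_r2([group_a]), pooled_r2([group_b]), pooled_r2([group_a, group_b]))
```

In this toy case each group alone carries no variance, while the pooled dosages do, so a per-group value cannot simply be averaged.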

Pasting Samples

To impute datasets that exceed the MIS maximum sample size, array VCFs must be split into sample group files. The imputed sample groups can then be pasted together using hds-util. The site list for each sample group file must match, so the minimum r-square threshold must be disabled when submitting the imputation job. An r-square filter can be applied afterwards in hds-util with --min-r2 <threshold>.
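
Before pasting, it can help to confirm that the site lists really do match. The following Python sketch is one way to do that, assuming the sites of each sample group have already been extracted as (chrom, pos, ref, alt) tuples by whatever means is convenient. The `Site` type and the `missing_sites` helper are hypothetical names used for illustration and are not part of hds-util.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical site key: (chromosome, position, ref allele, alt allele).
Site = Tuple[str, int, str, str]


def missing_sites(groups: List[Iterable[Site]]) -> List[Set[Site]]:
    """For each group, return the sites seen in any group but absent from it."""
    site_sets = [set(g) for g in groups]
    union: Set[Site] = set().union(*site_sets)
    return [union - s for s in site_sets]


group1 = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
group2 = [("chr1", 100, "A", "G")]

for i, missing in enumerate(missing_sites([group1, group2]), start=1):
    status = "OK" if not missing else f"missing {sorted(missing)}"
    print(f"group {i}: {status}")
```

If every returned set is empty, the site lists agree and the groups can be pasted.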

Field Generation Formulas

Below are the formulas for calculating other FORMAT fields from HDS values where x is the first haplotype dosage and y is the second.

Diploid

DS = x+y
GT = round(x), round(y)
GP = (1-x)(1-y), x(1-y)+y(1-x), xy
SD = x(1-x)+y(1-y)

Haploid

DS = x
GT = round(x)
GP = 1-x, x
SD = x(1-x)
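
The formulas above can be sketched in Python as follows. This is an illustration of the arithmetic only, not hds-util's implementation. The function name `fields_from_hds`, the dictionary layout, and the choice to round a dosage of exactly 0.5 up to 1 are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple, Union

Value = Union[float, Tuple[float, ...], Tuple[int, ...]]


def hard_call(d: float) -> int:
    # Assumption: a dosage of exactly 0.5 rounds up to allele 1.
    return 1 if d >= 0.5 else 0


def fields_from_hds(x: float, y: Optional[float] = None) -> Dict[str, Value]:
    """Derive DS, GT, GP and SD from haplotype dosages x (and y if diploid)."""
    if y is None:
        # Haploid: a single haplotype dosage.
        return {
            "DS": x,
            "GT": (hard_call(x),),
            "GP": (1 - x, x),
            "SD": x * (1 - x),
        }
    # Diploid: two haplotype dosages.
    return {
        "DS": x + y,
        "GT": (hard_call(x), hard_call(y)),
        "GP": ((1 - x) * (1 - y), x * (1 - y) + y * (1 - x), x * y),
        "SD": x * (1 - x) + y * (1 - y),
    }


print(fields_from_hds(0.9, 0.2))  # diploid example
print(fields_from_hds(0.3))       # haploid example
```

Note that the genotype probabilities sum to 1 in both cases, which is a quick sanity check on any reimplementation.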

hds-util's People

Contributors

jonathonl

hds-util's Issues

Different site lists in empirical dose files from MIS

Hi Jonathon,

When merging the dose files from MIS, the imputed site lists are guaranteed to be the same as long as no R2 filters are applied, because MIS outputs even quasi-monomorphic variants. However, empirical dose files may differ. When we split large GWASs into two batches, we sometimes end up with a few typed variants that are monomorphic in one batch but not in the other. MIS eliminates monomorphic typed variants, so they are missing from the empirical dose files of one batch but present in the other. Thus, we ended up with different site lists and merging errors.

Would you happen to have any suggestions on how to overcome this (without redoing the imputation)? Can the empirical dose files still be merged? Given that these are typically just a few variants, do you think downstream MetaMinimac will complain if we remove them from the empirical dose files?

Thanks,
Daniel

invalid genotypes

Thank you for developing this tool, it will be quite handy for us.

In my merges I have been getting invalid genotypes (e.g. 0/-44) in addition to a mixture of phased and unphased sites.

I imputed some publicly available HGDP samples on MIS to demonstrate this issue here

Do you have any advice on how to proceed? Hopefully I am not doing something silly. Thanks

Conflicting if-statement

Hi Jonathon,

I think this condition will always be true when merging empirical dose files, because the format_fields vector is filled with "DS" and "LDS" a few lines above:

if (!format_fields.empty() && emp_cnt)

Also, a related question: Is there anything specific about stats to be aware of when merging empirical doses?

Thanks and best wishes!
Daniel
