Maverick: Mendelian variant pathogenicity prediction

Maverick is a Mendelian approach to variant effect prediction built in Keras. It uses transformers to process a multi-modal set of inputs and predict whether a variant is benign, dominant pathogenic, or recessive pathogenic.

The "InferenceScripts" directory of this repository contains scripts to run Maverick inference on VCF files aligned to GRCh37 or GRCh38. This option is best if you are setting up an installation of Maverick on a local workstation or cloud platform; see the INSTALL file to get started. For a less resource-intensive experience, try our Maverick inference CoLab, which lets you upload a VCF file and process it with Maverick right in your web browser for free. Currently, that notebook works well with a CPU or TPU backend, but using the GPU backend may require some troubleshooting of the CUDA/cuDNN/TensorFlow versions.

This repository additionally contains Python notebooks in the "Notebooks" directory demonstrating 1) how the training and testing sets used in the Maverick paper were generated, 2) how Maverick was trained, and 3) how to score the variants in a VCF file with Maverick. Each of these notebooks is also available as a Google CoLab: Generate Training and Test Sets, Train Maverick, and Maverick inference.

The manuscript associated with Maverick is currently in submission for publication. This page will be updated with citation information when available. The preprint is available at: https://doi.org/10.21203/rs.3.rs-1602211/v1

License

Maverick source code is provided under the MIT open-source license.

Download

The latest version of Maverick can be downloaded under Releases.

We have pre-computed Maverick scores for all possible autosomal missense and nonsense SNVs in the Gencode Basic V33 annotation of GRCh37. Several versions of those scores are available for download:

GRCh37

GRCh37 with scores from individual sub-models

Lifted over to GRCh38

Lifted over to GRCh38 with scores from individual sub-models

The datasets used in the manuscript to train and evaluate Maverick are also available for download:

Training Set

Validation Set

Known Genes Test Set

Novel Genes Test Set

Basic Usage

After following the steps in the INSTALL file, Maverick can be run as follows:

Maverick/InferenceScripts/runMaverick.sh Maverick/example/example.vcf

A lite version of Maverick is also available. It runs only Architecture 1 Model 1, which makes its hardware requirements much more forgiving at the cost of a small loss in accuracy:

Maverick/InferenceScripts/runMaverickLite.sh Maverick/example/example.vcf
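If you want to launch either script from Python, for example inside a larger pipeline, the two commands above can be wrapped roughly as follows. This is only a sketch: the function names and the `repo_dir` and `lite` parameters are hypothetical and not part of Maverick, and it assumes the repository was cloned into a directory named `Maverick` and the INSTALL steps have been completed.

```python
import subprocess
from pathlib import PurePosixPath


def build_maverick_command(vcf_path, lite=False, repo_dir="Maverick"):
    """Return the command line for the full or lite inference script.

    Hypothetical helper: only the script names and the single VCF argument
    come from the README; everything else is an illustrative assumption.
    """
    script = "runMaverickLite.sh" if lite else "runMaverick.sh"
    script_path = PurePosixPath(repo_dir) / "InferenceScripts" / script
    return [str(script_path), str(vcf_path)]


def run_maverick(vcf_path, lite=False, repo_dir="Maverick"):
    """Run the chosen script and raise CalledProcessError if it fails."""
    command = build_maverick_command(vcf_path, lite=lite, repo_dir=repo_dir)
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(build_maverick_command("Maverick/example/example.vcf", lite=True)))
```

Calling `run_maverick("Maverick/example/example.vcf", lite=True)` would then be equivalent to the lite command shown above.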

Output

Running Maverick on the example.vcf file as above will produce two primary output files, example.MaverickResults.txt and example.finalScores.txt. The MaverickResults.txt file contains all the annotations for each variant, along with the Benign, Dominant, and Recessive scores predicted by each of the eight individual models and by the Maverick ensemble. The finalScores.txt file then ranks those scored variants according to Maverick's prediction of their pathogenicity and their genotype as given in the input VCF file. Heterozygous variants are ranked by their Dominant score and homozygous variants by their Recessive score. In addition, a compound heterozygous pair is created for each pair of heterozygous variants in the same gene, and each pair is ranked by the harmonic mean of its two Recessive scores. The file is sorted by the 'finalScore' column in descending order. This manner of ordering variants within a sample is how Maverick's prioritization capabilities were evaluated in the manuscript.
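The ranking rule described above can be illustrated with a small, self-contained sketch. This is not Maverick's own code: the dictionary keys (`id`, `gene`, `genotype`, `dominant`, `recessive`), the `mode` labels, and the example scores are all hypothetical, and only the scoring rules and the 'finalScore' name come from the description above.

```python
from itertools import combinations


def harmonic_mean_pair(a, b):
    """Harmonic mean of two non-negative scores; 0.0 if both are zero."""
    return 0.0 if (a + b) == 0 else 2 * a * b / (a + b)


def prioritize(variants):
    """Rank single variants and compound heterozygous pairs by finalScore.

    Each variant is a dict with hypothetical keys: id, gene,
    genotype ("het" or "hom"), dominant, recessive.
    """
    rows = []
    hets_by_gene = {}
    for v in variants:
        if v["genotype"] == "hom":
            # Homozygous variants are ranked by their Recessive score.
            rows.append({"variants": (v["id"],), "mode": "recessive",
                         "finalScore": v["recessive"]})
        else:
            # Heterozygous variants are ranked by their Dominant score.
            rows.append({"variants": (v["id"],), "mode": "dominant",
                         "finalScore": v["dominant"]})
            hets_by_gene.setdefault(v["gene"], []).append(v)

    # Every pair of heterozygous variants in the same gene forms a candidate
    # compound heterozygous pair, ranked by the harmonic mean of the two
    # Recessive scores.
    for hets in hets_by_gene.values():
        for a, b in combinations(hets, 2):
            rows.append({"variants": (a["id"], b["id"]), "mode": "compound_het",
                         "finalScore": harmonic_mean_pair(a["recessive"], b["recessive"])})

    # Descending order, matching the sort on the 'finalScore' column.
    return sorted(rows, key=lambda r: r["finalScore"], reverse=True)


if __name__ == "__main__":
    example = [  # made-up scores, for illustration only
        {"id": "v1", "gene": "GENE_A", "genotype": "het", "dominant": 0.9, "recessive": 0.1},
        {"id": "v2", "gene": "GENE_B", "genotype": "hom", "dominant": 0.1, "recessive": 0.8},
        {"id": "v3", "gene": "GENE_C", "genotype": "het", "dominant": 0.05, "recessive": 0.6},
        {"id": "v4", "gene": "GENE_C", "genotype": "het", "dominant": 0.02, "recessive": 0.3},
    ]
    for row in prioritize(example):
        print(row["mode"], row["variants"], round(row["finalScore"], 3))
```

The harmonic mean is dominated by the smaller of the two values, so a pair ranks highly only when both variants have a high Recessive score. That is consistent with a compound heterozygous cause requiring two damaging alleles.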
