Interpreting the pathogenicity of coding variants using the ExAC database

Fundamentals of Clinical Genetics

Wellcome Trust Genome Campus, January 2016

Tarjinder Singh, Jeffrey C. Barrett

Session description

Interpreting genetic variation in an individual patient’s genome can only be done in the context of variation in the wider population. Many databases now exist with variation data from thousands of healthy individuals. This session will demonstrate one of the most valuable: the Exome Aggregation Consortium (ExAC) database of protein-coding variation in more than 60,000 individuals. Typical use cases will be illustrated, including a demonstration of the ExAC website interface. We will also highlight other resources, such as 1000 Genomes and Ensembl.

Interpreting the function of coding variants

Which of many genetic variants in an individual are functional or likely pathogenic?

We can use:

  1. coding consequences

    • synonymous, missense, loss-of-function (LoF)
  2. gene function

    • e.g. a LoF variant in ARID1B is more likely to be pathogenic than a LoF variant in OR2T1
  3. allele frequency in the general population, as a proxy for selective pressure

Exome Aggregation Consortium (ExAC) - the largest public database of genetic variation to date

  • describes the functional consequence and allele frequency of each observed coding variant in 60,706 individuals (as of January 2016, v0.3)

  • over 10 million variants: one variant every 6 base pairs; most are rare and novel

http://exac.broadinstitute.org/

How can I use the ExAC database?

  • Browse high-quality genetic variants in individual transcripts, genes, and genomic regions

  • Identify the functional consequence, allele frequency, and quality of an individual variant

  • Find differences in allele frequency of a single variant between global populations (African, Latino, Non-Finnish European, Finnish European, East Asian, South Asian)

  • Annotate the variants identified in a patient to prioritise likely pathogenic variants
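The population comparison described above comes down to dividing an allele count by the number of alleles genotyped in each population. The following is a minimal sketch of that arithmetic. The counts are invented for illustration and are not taken from ExAC, and the names `population_counts` and `allele_frequency` are hypothetical.

```python
# Hypothetical (allele count, allele number) pairs for a single variant.
# The numbers are invented for illustration; they are not ExAC data.
population_counts = {
    "African": (500, 10000),
    "East Asian": (0, 8000),
    "Non-Finnish European": (2, 60000),
    "South Asian": (3, 16000),
}


def allele_frequency(allele_count, allele_number):
    """Allele frequency = alternate allele count / total alleles genotyped."""
    if allele_number == 0:
        # No genotyped alleles: report 0.0 rather than divide by zero.
        return 0.0
    return allele_count / allele_number


frequencies = {
    pop: allele_frequency(ac, an) for pop, (ac, an) in population_counts.items()
}
highest = max(frequencies, key=frequencies.get)

for pop, af in frequencies.items():
    print(f"{pop}\t{af:.5f}")
print("Highest frequency in:", highest)
```

A large difference between populations, as in this made-up example, is the pattern to look for when a variant that seems rare overall turns out to be common in one ancestry group.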

Genes likely intolerant of damaging mutations

  • calculated from how depleted the gene is of damaging variants compared to expectation given the gene's mutation rate

  • measured by pLI

    • a score from 0 - 1
    • genes with pLI > 0.9 are described as under genic constraint
    • a proxy for whether loss of a single copy of a gene is selected against in the population
  • CHD8 has a pLI of 1, and when disrupted, is highly penetrant for developmental disorders

  • OR2T1 has a pLI of 0, is an olfactory receptor, and a single-copy loss is unlikely to cause a severe phenotype
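As a rough illustration of how the pLI threshold can be applied to a list of genes, here is a minimal sketch. It uses only the pLI values and the 0.9 cut-off quoted in this document. The function name `is_constrained` is hypothetical, and the sketch classifies scores that are already computed; it does not calculate pLI.

```python
# pLI values quoted in this document; 0.9 is the cut-off given above.
PLI_THRESHOLD = 0.9

pli_scores = {
    "CHD8": 1.0,
    "KMT2A": 1.0,
    "CFTR": 0.0,
    "OR2T1": 0.0,
}


def is_constrained(pli, threshold=PLI_THRESHOLD):
    """Return True if a gene's pLI exceeds the constraint threshold."""
    return pli > threshold


for gene, pli in sorted(pli_scores.items()):
    label = "constrained" if is_constrained(pli) else "not constrained"
    print(f"{gene}\tpLI={pli:.2f}\t{label}")
```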

How to use ExAC

  • for individual queries, access online browser at http://exac.broadinstitute.org/

  • first, type in:

    • gene symbols (e.g. PCSK9)
    • Ensembl or RefSeq transcript IDs (e.g. ENST00000407236)
    • rs IDs (rs1800234)
    • variant positions (22-46615880)
    • region of interest (22:46615715-46615880)
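When preparing many queries at once, it can help to check which of the formats above each string matches. The sketch below does this with regular expressions. The patterns are illustrative guesses based on the examples listed here, not the rules the ExAC browser itself applies, and `classify_query` is a hypothetical helper.

```python
import re

# Illustrative patterns inferred from the example queries above.
# Order matters: the more specific formats are tested first.
PATTERNS = [
    ("rsid", re.compile(r"^rs\d+$")),
    ("transcript", re.compile(r"^(ENST\d+|[NX][MR]_\d+(\.\d+)?)$")),
    ("region", re.compile(r"^(\d{1,2}|X|Y|MT)[:-]\d+-\d+$")),
    ("position", re.compile(r"^(\d{1,2}|X|Y|MT)[:-]\d+$")),
    ("gene", re.compile(r"^[A-Z][A-Z0-9-]*$")),
]


def classify_query(query):
    """Return the name of the first pattern the query matches, else 'unknown'."""
    text = query.strip()
    for name, pattern in PATTERNS:
        if pattern.match(text):
            return name
    return "unknown"


for example in ["PCSK9", "ENST00000407236", "rs1800234",
                "22-46615880", "22:46615715-46615880"]:
    print(f"{example}\t{classify_query(example)}")
```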

  • in the gene, transcript, and region view, we see:

    • Top left: gene name, number of variants, and links to other online resources and references
    • Top right: observed and expected number of variants of each functional class, and the pLI score
    • Middle: exonic coverage for the gene or transcript (a proxy for regional quality)
    • Below: table of all variants observed in this gene

  • for each variant, the chromosome, position, consequence, annotation, and allele frequency are provided

  • can filter by consequence (Missense + LoF, or LoF)
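The same kind of consequence filter can be applied to a variant table that has been exported or annotated locally. The sketch below is one way to do it, under stated assumptions: the variant records are invented, and which consequence terms count as LoF is an assumption made for this example rather than a definition taken from the browser.

```python
# Consequence terms grouped to mirror the "LoF" and "Missense + LoF" filters.
# The membership of each group is an assumption for this sketch.
LOF = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}
MISSENSE = {"missense_variant"}

# Hypothetical variant records; identifiers and frequencies are invented.
variants = [
    {"id": "variant_1", "consequence": "synonymous_variant", "allele_freq": 0.12},
    {"id": "variant_2", "consequence": "missense_variant", "allele_freq": 0.0003},
    {"id": "variant_3", "consequence": "stop_gained", "allele_freq": 0.00002},
    {"id": "variant_4", "consequence": "frameshift_variant", "allele_freq": 0.00001},
]


def filter_by_consequence(records, allowed):
    """Keep only the records whose consequence is in the allowed set."""
    return [r for r in records if r["consequence"] in allowed]


lof_only = filter_by_consequence(variants, LOF)
missense_and_lof = filter_by_consequence(variants, MISSENSE | LOF)

print("LoF:", [r["id"] for r in lof_only])
print("Missense + LoF:", [r["id"] for r in missense_and_lof])
```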

  • in the variant view, we see:

    • Top left: ID, frequency, and links to other online resources
    • Top right: quality metrics
    • Middle left: functional consequence, and links to the gene and transcript
    • Middle right: frequencies in different global populations
    • Bottom: read-level data, for a low-level evaluation of quality

Quick examples

  • rs334, the causal variant in sickle cell anemia

    • note the differences in allele frequency between populations
  • p.Phe508del in CFTR, the causal variant in cystic fibrosis

    • again, note the differences in allele frequency
    • because cystic fibrosis is recessive (no dominant mode), heterozygous carriers are unaffected and LoF variants are not strongly depleted, so the pLI for CFTR is 0
  • KMT2A, a gene that, when disrupted, causes Wiedemann-Steiner syndrome

    • there are only 4 LoF variants in >60,000 individuals
    • highly constrained gene (pLI = 1)

Demo

For the following genes (TP53, ARID1B, NOD2, NRXN1):

  1. Determine the number of LoF variants in the canonical transcript

  2. Find the pLI score and determine if the gene is constrained

  3. Find the number of transcripts

  4. For the first missense variant in the gene, find the allele frequency in Non-Finnish Europeans

  5. Identify any exons not well-covered by the exome capture technology

Ensembl VEP for annotating large numbers of variants

  • we can use the Ensembl VEP tool to annotate a large number of variants (for example, all variants in a single patient)

  • can use a number of input formats, but the most common is the VCF format

http://www.ensembl.org/common/Tools/VEP

Demo

  1. For variants encoded in GRCh37, go here

  2. Paste the following into "Either paste data:":

    12 49416554 . G GA . . .
    18 53070914 . G A . . .
    22 36142530 . AAGCGGCTGC A . . .

    Alternatively, you can upload a VCF to the website.

  3. Under Identifiers and frequency data, go to Frequency data for co-located variants, and select ExAC allele frequencies

  4. Click Run and wait. Click view results.
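Before submitting, it can be useful to check what kind of change each of the three VCF lines in step 2 encodes. The sketch below splits each line into its first five VCF columns (chromosome, position, ID, reference allele, alternate allele) and reports the variant type and the change in length. The helper names are hypothetical, and the sketch does not annotate genes or consequences; that is what VEP is for.

```python
# The three variants from step 2, whitespace-separated as pasted into VEP.
vcf_text = """\
12 49416554 . G GA . . .
18 53070914 . G A . . .
22 36142530 . AAGCGGCTGC A . . .
"""


def variant_type(ref, alt):
    """Classify a variant from the lengths of its REF and ALT alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "substitution"  # equal lengths greater than one base


def parse_line(line):
    """Read CHROM, POS, ID, REF, ALT from one VCF data line."""
    chrom, pos, _variant_id, ref, alt = line.split()[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "type": variant_type(ref, alt),
        # A length change not divisible by 3 would shift the reading frame
        # if the variant lies within coding sequence.
        "length_change": len(alt) - len(ref),
    }


parsed = [parse_line(line) for line in vcf_text.splitlines() if line.strip()]

for record in parsed:
    print(record["chrom"], record["pos"], record["type"], record["length_change"])
```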

Answer the following questions:

  • What are the allele frequencies of each variant in ExAC?

  • If the three variants are observed in the same patient, which variant is most likely to be diagnostic? Use constraint scores on the ExAC website as support.

Things to be aware of

  • ExAC is not a collection of phenotypically healthy individuals and includes individuals with schizophrenia, inflammatory bowel disease, diabetes etc. (see here for more information)

    • for rare or severe developmental disorders, this should be less of an issue
  • some regions without genetic variants are simply not well covered in earlier exome captures, so note the coverage!

  • ExAC is still in development, so changes in some variants might be observed

More information
