graph-genomics-review's Introduction

Pangenome graphs

A review paper for Annual Reviews in Genomics and Human Genetics.

The final submitted version of the paper has been rendered and is provided in this repo.

Notes

Work on GitHub (Erik to make structure), use .bib for citations, use one line per sentence; the first draft doesn’t have to compile.

Outline and division of effort

  • Introduction - Erik (sections/intro.tex)
    • Why we need pangenomic models
    • What is our motivation for thinking about pangenomic approaches?
      • Bias
      • Populations
      • Precision medicine
    • Perspective of interfaces (inputs and outputs)
    • Past reviews
  • Building pangenomic models (sections/models.tex)
    • Constructing graphs - Robin
    • Indexing and succinct genome graph models - Jouni / Erik?
    • Other population-ish succinct data structures - Erik / Jouni?
      • De Bruijn
      • VCFs / genotype calls / haplotypes / binary matrices
      • Alignments / collections of strings
  • Relating new information to the model (sections/relating.tex)
    • Visualization - Adam
    • Finding structures in pangenome graphs - Jordan
    • Graph alignment algorithms - Jordan
    • Variation graph mappers - Xian
    • De Bruijn graph mappers - Robin
    • Non-graph population mapping tools - Erik
  • Applications of pangenomic models (sections/applications.tex)
    • Error correction - Robin
    • Variant calling / Genotyping - Glenn
    • Assembly - Erik
    • Epigenomics - Glenn
    • Transcriptomics - Jonas
    • Metagenomics and quasispecies - Jonas
  • Discussion - Benedict (sections/discussion.tex)

References

See bib/references.bib for a subset of the citations below in BibTeX format. These were auto-generated. The rest may need to be added manually (e.g. from Google Scholar citations).
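For anyone adding the remaining entries by hand, a first step is often to pull the DOI out of each reference line, since DOIs in a list like this can appear bare or wrapped in a doi.org or dx.doi.org URL. The following is only a minimal sketch: the function name `extract_doi` and the regular expression are illustrative assumptions, not part of this repository or of how bib/references.bib was generated.

```python
import re
from urllib.parse import unquote

# Hypothetical helper, not part of this repo.
# Matches a DOI that is either bare or inside a doi.org / dx.doi.org URL.
DOI_PATTERN = re.compile(
    r"(?:https?://(?:dx\.|www\.)?doi\.org/)?(10\.\d{4,9}/\S+)",
    re.IGNORECASE,
)


def extract_doi(line):
    """Return the first DOI-like string in a reference line, or None."""
    # Decode percent-escapes such as %2F before matching.
    match = DOI_PATTERN.search(unquote(line))
    if match is None:
        return None
    # Drop punctuation that belongs to the surrounding sentence.
    return match.group(1).rstrip(".,;)")


if __name__ == "__main__":
    print(extract_doi("Genome graphs and genome inference (2017) 10.1101/gr.214155.116"))
    print(extract_doi("Minigraph (2019)"))
```

Lines without a DOI return `None` and still need manual lookup. Publisher or bioRxiv URLs that merely embed a DOI-like path (for example with a version suffix) are matched as-is, so the result should be checked before it is used to fetch a BibTeX record.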

Introduction

...

Past reviews / Opinion pieces

Computational pan genomics (2016) https://doi.org/10.1093/bib/bbw089

Genome graphs and genome inference (2017) https://doi.org/10.1101/gr.214155.116

Is it time to change the reference genome? (2019) https://doi.org/10.1186/s13059-019-1774-4

Hackathon Paper (2019) http://dx.doi.org/10.12688/f1000research.19630.1

One reference genome is not enough (2019) http://dx.doi.org/10.1186/s13059-019-1717-0

Constructing graphs

Coordinates and intervals on genome graphs (preprint 2016) http://dx.doi.org/10.1101/063206

FORGe (2018) https://doi.org/10.1186/s13059-018-1595-x

NovoGraph (2018) https://doi.org/10.12688/f1000research.15895.1

HUPAN (2019) https://doi.org/10.1186/s13059-019-1751-y

Bake off (preprint 2017) http://dx.doi.org/10.1101/101378

VG toolkit paper (2018) https://dx.doi.org/10.1038/nbt.4227

EG’s thesis (2019) -- describes vg construct, seqwish, and vg msga https://doi.org/10.17863/CAM.41621

Minigraph (2019)

GenomeMapper (2009) https://genomebiology.biomedcentral.com/articles/10.1186/gb-2009-10-9-r98

Graph alignment algorithms

Classic (but little known) DP for aligning to (cyclic) graphs (2000) http://dx.doi.org/10.1016/S0304-3975(99)00333-3

Approximate matching of regular expressions (1989) http://dx.doi.org/10.1016/S0092-8240(89)80046-1

A New Method That Simultaneously Aligns and Reconstructs Ancestral Sequences for Any Number of Homologous Sequences, When the Phylogeny Is Given (1989) http://dx.doi.org/10.1093/oxfordjournals.molbev.a040577

Partial order alignment (2002) https://doi.org/10.1093/bioinformatics/18.3.452

PO-POA (2004) -- DAG to DAG alignment and MSA construction https://doi.org/10.1093/bioinformatics/bth126

Adam’s context mapping (2015) https://doi.org/10.1093/bioinformatics/btv435

Some guy’s master’s thesis on Adam’s context mapping (2016) https://www.semanticscholar.org/paper/Aligning-reads-against-a-graph-based-reference-Leonardsen/cb05ae5be6c29bfd220c43402a8657fa21e47c54

Complexity of string matching for graphs (2019) https://doi.org/10.4230/LIPIcs.ICALP.2019.55

V-ALIGN sequence alignment on directed graphs (preprint 2017) -- this has an official publication (http://dx.doi.org/10.1089/cmb.2017.0264), but it’s paywalled https://doi.org/10.1101/124941

Aligning sequences to general graphs in O(V + mE) time (preprint 2017) http://dx.doi.org/10.1101/216127 (Note that similar results have been published by Navarro in 2000, see above)

Bit-parallel sequence to graph alignment (2019) https://doi.org/10.1093/bioinformatics/btz162

On the complexity of sequence to graph alignment (preprint 2019) http://dx.doi.org/10.1101/522912

PaSGAL Accelerating sequence to graph alignment (preprint 2019) https://doi.org/10.1101/651638

Indexing and succinct genome graph models

Blight library -- minimizers for DBGs (preprint 2019) https://www.biorxiv.org/content/10.1101/546309v2

CHOP: haplotype indexing in graphs (preprint 2018) https://doi.org/10.1101/305268

PSI -- pan genomic seed index (2019) https://doi.org/10.1093/bioinformatics/btz341

Improved encoding of genetic variation in BWT (preprint 2019) http://dx.doi.org/10.1101/658716

BWBBLE (2013) https://doi.org/10.1093/bioinformatics/btt215

Gramtools / vBWT (2016) https://doi.org/10.1007/978-3-319-43681-4_18

GCSA (2014) https://doi.org/10.1109/TCBB.2013.2297101

GCSA2 (2016) https://doi.org/10.1137/1.9781611974768.2

Master’s thesis on distance metrics in variant graphs https://www.duo.uio.no/handle/10852/57798

Validating paired end reads in sequence graphs (preprint 2019) http://dx.doi.org/10.1101/682799

Sparse dynamic programming on DAGs of small width (2019) https://doi.org/10.1145/3301312

gPBWT (2017) https://doi.org/10.1186/s13015-017-0109-9

GBWT (preprint 2018) https://arxiv.org/abs/1805.03834

Efficient Construction of a Complete Index for Pan-Genomics Read Alignment (preprint Nov 2018) https://doi.org/10.1101/472423

Other population-ish succinct data structures

PanCake - representing aligned sequences (2013) https://doi.org/10.4230/OASIcs.GCB.2013.35

FM index of an alignment (2016) https://doi.org/10.1016/j.tcs.2015.08.008

FM index of a gapped alignment (2018) https://doi.org/10.1016/j.tcs.2017.02.020

Journaled string tree (2014) https://doi.org/10.1093/bioinformatics/btu438

Population BWT -- reference free sequences (2017) https://doi.org/10.1101/gr.211748.116

Making a DBG with BWT https://doi.org/10.1093/bioinformatics/btv603

Bloom Filter Trie -- pan genome storage (2015) https://doi.org/10.1007/978-3-662-48221-6_16

Multi-BRWT -- colored DBG (2018) https://doi.org/10.3929/ethz-b-000314581

PufferFish -- colored DBG (2018) https://doi.org/10.1093/bioinformatics/bty292

Metannot - colored DBG (preprint 2017) https://doi.org/10.1101/236711

GTC - VCF files (2018) https://doi.org/10.1093/bioinformatics/bty023

MuGI - VCF files (2014) https://doi.org/10.1371/journal.pone.0109384

Compressing large VCFs (2011) https://doi.org/10.1093/bioinformatics/btt460

Tomahawk ...

PBWT -- phased VCFs (2014) https://doi.org/10.1093/bioinformatics/btu014

BGT - VCFs (2016) https://doi.org/10.1093/bioinformatics/btv613

Complete index for pan genomic alignment (2019) https://doi.org/10.1007/978-3-030-17083-7_10

DBGs https://www.pnas.org/content/98/17/9748.short

Colored DBGs https://www.nature.com/ng/journal/v44/n2/abs/ng.1028.html

BiFrost https://www.biorxiv.org/content/10.1101/695338v2.abstract

Pan-Tools (kmer based annotations) (just uses neo4j) https://doi.org/10.1093/bioinformatics/btw455

SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips (2014) https://doi.org/10.1093/bioinformatics/btu756

Finding structures in pangenome graphs

Bubbles (various) Bubbleparse (2013) https://journals.plos.org/plosone/article/comments?id=10.1371/journal.pone.0060058

Superbubbles (various) ...

Context mapping (?) ...

Snarls (2018) https://doi.org/10.1089/cmb.2017.0251

SPQR tree decomposition https://en.wikipedia.org/wiki/SPQR_tree

Flow sort (2018) https://doi.org/10.1089/cmb.2017.0248

Minimum founder reconstruction on genome graphs (2019) https://doi.org/10.1186/s13015-019-0147-6

Variation graph mappers

VG (2018) https://doi.org/10.1038/nbt.4227

deBGA-VARA (2019) https://doi.org/10.1109/bibm.2018.8621555

HISAT2 (2019) https://doi.org/10.1038/s41587-019-0201-4

GenomeMapper (2009) https://doi.org/10.1186/gb-2009-10-9-r98

V-MAP (2019) https://doi.org/10.4230/LIPIcs.WABI.2019.7

7 bridges (2019) https://doi.org/10.1038/s41588-018-0316-4

GraphAligner (2019) -- also in the alignment section. DP algorithm: https://doi.org/10.1093/bioinformatics/btz162; tool preprint: https://doi.org/10.1101/810812

De Bruijn graph mappers

BrownieAligner (2018) https://doi.org/10.1186/s12859-018-2319-7

BlastGraph (2012) http://www.stringology.org/event/2012/p06.html

BGREAT (2016) https://doi.org/10.1186/s12859-016-1103-9

deBGA (2016) https://doi.org/10.1093/bioinformatics/btw371

Non-graph population mapping tools

AltHapAlignR (2018) https://doi.org/10.1093/bioinformatics/bty125

CHIC (preprint 2017) http://dx.doi.org/10.1101/178129

Visualization

Tube maps (2019) https://doi.org/10.1093/bioinformatics/btz597

Bandage (2015) https://doi.org/10.1093/bioinformatics/btv383

EG’s thesis https://doi.org/10.17863/CAM.41621

GfaViz (2019) https://doi.org/10.1093/bioinformatics/bty1046

Assembly Graph Browser (2019) https://doi.org/10.1093/bioinformatics/btz072

SGTK (2019) https://doi.org/10.1093/bioinformatics/bty956

Downstream use cases

Error correction

Lordec (2014) http://dx.doi.org/10.1093/bioinformatics/btu538

Bcool (2019) https://doi.org/10.1093/bioinformatics/btz102

BCT (preprint 2019) http://dx.doi.org/10.1101/673624

GraphAligner (preprint 2019) -- already mentioned as aligner above https://doi.org/10.1101/810812

Variant calling / Genotyping

Cortex (2012) https://www.nature.com/articles/ng.1028

Bubbleparse (2013) https://journals.plos.org/plosone/article/comments?id=10.1371/journal.pone.0060058

1000GP phase 3 paper (2015) -- graph based genotyping process described in supplement https://doi.org/10.1038/nature15393

PanVC (2018) https://doi.org/10.1186/s12864-018-4465-8

HISAT-Genotype (2019) -- shared paper with HISAT2 https://doi.org/10.1038/s41587-019-0201-4

PRG (2015) https://doi.org/10.1038/ng.3257

HLA/PRG (2016) https://doi.org/10.1371/journal.pcbi.1005151

HLA/LA (2019) https://doi.org/10.1093/bioinformatics/btz235

Paragraph (preprint 2019) http://dx.doi.org/10.1101/635011

Vg call for SVs (preprint 2019) https://www.biorxiv.org/content/10.1101/654566v1.abstract

ExpansionHunter (preprint 2019) http://dx.doi.org/10.1101/572545

GraphTyper (2017) https://doi.org/10.1038/ng.3964

BayesTyper (2018) https://doi.org/10.1038/s41588-018-0145-5

Kourami (2018) https://doi.org/10.1186/s13059-018-1388-2

Epigenomics

GraphPeakCaller (2019) https://doi.org/10.1371/journal.pcbi.1006731

Personalized and graph genomes reveal missing signal in epigenomic data (preprint 2019) http://dx.doi.org/10.1101/457101

Transcriptomics

Quantifies RNA-seq reference-bias (2009) https://doi.org/10.1093/bioinformatics/btp579

GSNAP: SNP-aware mapper (2010) https://www.doi.org/10.1093/bioinformatics/btq057

AlleleSeq: Diploid personal genome mapping (2011) https://doi.org/10.1038/msb.2011.54

MMSEQ: Diploid transcriptome (2011) https://doi.org/10.1186/gb-2011-12-2-r13

Quantifies RNA-seq reference-bias (2014) https://doi.org/10.1186/s13059-014-0467-2

Describes reference-bias in relation to ASE (2015) https://doi.org/10.1186/s13059-015-0762-6

WASP: reference-bias correction (2015) https://doi.org/10.1038/nmeth.3582

rPGA: Personal genome mapping (2015) https://doi.org/10.1093/nar/gkv1099

Kallisto: de Bruijn graph pseudo-alignment (2015) https://doi.org/10.1038/nbt.3519

ASElux: SNP-aware alignment (2017) https://doi.org/10.1093/bioinformatics/btx762

ASGAL: Splice-graph mapper (2018) https://link.springer.com/chapter/10.1007/978-3-319-58163-7_3 https://www.doi.org/10.1186/s12859-018-2436-3

AltHapAlignR: Mapping to alternative reference haplotypes (2018) https://doi.org/10.1093/bioinformatics/bty125

iMapSplice: Mapping to alternative reference bases (2018) https://doi.org/10.1371/journal.pone.0201554

EMASE: Alignment to a diploid transcriptome (2018) https://doi.org/10.1093/bioinformatics/bty078

HISAT2: Variation graph mapper (2019) - also mentioned in the variation graph mapping section https://doi.org/10.1038/s41587-019-0201-4

Metagenomics and quasispecies

Mykrobe predictor (2015) https://doi.org/10.1038/ncomms10063

MetaKallisto (2017) https://doi.org/10.1093/bioinformatics/btx106

Metagenomic classification and assembly review (2017) https://doi.org/10.1093/bib/bbx120

GROOT (2018) https://doi.org/10.1093/bioinformatics/bty387

Virus-VG (2019) https://doi.org/10.1093/bioinformatics/btz443

VG-Flow (2019) https://doi.org/10.1101/645721

graph-genomics-review's People

Contributors

adamnovak, cartoonist, eblerjana, ekg, glennhickey, jeizenga, jltsiren, jonassibbesen, josiahseaman, maickrau, shilpagarg, subwaystation, tobiasmarschall, xchang1

graph-genomics-review's Issues

I pulled SHIMMER Sketch Graphs out

I pulled "SHIMMER Sketch Graphs" out of the table of visualization tools, because I couldn't find a citation or a link for it, and I couldn't work out what to put in the new/different columns I was adding to the table.

@cschin, apparently this is your tool; is it perhaps part of https://github.com/cschin/Peregrine? Can you provide a link/reference for the visualization aspect of the tool?

No Section Sharing Pangenome Graphs

I am really unsure whether this section would fit into the paper.
How can such graphs be shared and/or accessed? They will become really huge and not so easy to query. What I have in mind is a short section on Semantic Genome Graphs. Though perhaps this belongs in the Future section?

Section Visualization

Hi @vgteam,

I am very happy that someone is writing this review. I have already noted some literature that I was not aware of yet ;)
I briefly read through the review and now I am trying to give some constructive feedback. I am aware that this is work in progress.

That's all from my side for this section so far; I have only had a brief look at it.
I will take a closer look again to give you more valuable feedback.

No Section about Meta Data / Annotation

What I am missing in the whole review so far is at least a brief section about metadata / annotation in pangenome graphs. If you want to build precise graphs, you need to make sure that people are able to understand and interpret them, right? Please correct me here!
Building graphs without any meta information still lets you do things like comparing populations on a structural level. But especially in human, we want to take a look at annotated regions.

  • As far as I know, only vg is able to integrate annotation from a GFF file?
  • Are there any other options?
  • How can we annotate non reference-based graphs?
  • Is a liftover of annotations possible or desired in the future?

But I am really not sure whether there should be a section about such a topic in a Precision pangenomics paper. Open for discussion.

\citeauthor refs are broken

Most of these are in the indexing section. I think they broke when we switched to the new bibliographic style.

More Graph Tools

Is it De Bruijn or de Bruijn graphs?

I can't tell what the consensus is on how to write this.

I just want it to be consistent. Any suggestions? I'm leaning towards "De Bruijn graphs".
