quast's Introduction

QUAST evaluates genome assemblies.
It works both with and without a reference genome.
The tool accepts multiple assemblies, so it is suitable for comparing them.

Usage

./quast.py test_data/contigs_1.fasta \
           test_data/contigs_2.fasta \
        -R test_data/reference.fasta.gz \
        -O test_data/operons.txt \
        -G test_data/genes.txt \
        -o output_directory
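
For scripted runs, the same command can be assembled from Python. This is a minimal sketch, not part of QUAST: the helper name `build_quast_command` and its parameters are hypothetical, and only the flags shown in the usage example above (`-R`, `-O`, `-G`, `-o`) are used.

```python
import subprocess
from pathlib import Path


def build_quast_command(assemblies, reference=None, operons=None, genes=None,
                        output_dir="output_directory", script="./quast.py"):
    """Build the argument list for the command line shown above.

    Hypothetical helper: it only mirrors the flags from the usage example.
    """
    command = [script, *assemblies]
    if reference is not None:
        command += ["-R", reference]
    if operons is not None:
        command += ["-O", operons]
    if genes is not None:
        command += ["-G", genes]
    command += ["-o", output_dir]
    return command


command = build_quast_command(
    ["test_data/contigs_1.fasta", "test_data/contigs_2.fasta"],
    reference="test_data/reference.fasta.gz",
    operons="test_data/operons.txt",
    genes="test_data/genes.txt",
)
print(" ".join(command))

# Run only if the script is actually present in the working directory.
if Path(command[0]).exists():
    subprocess.run(command, check=True)
```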

Output

report.txt     summary table
report.tsv     tab-separated version, for parsing or for spreadsheets (Google Docs, Excel, etc.)
report.tex     LaTeX version
report.pdf     PDF version, includes all tables and plots for some statistics
report.html    everything in an interactive HTML file
alignment.svg  visualized alignment of contigs to reference
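
Since `report.tsv` is intended for parsing, a short script can load it into a dictionary. The sketch below assumes one metric per row, with the metric name in the first column and one value column per assembly, and assumes the first row holds the assembly names. The helper name `parse_report_tsv` and the sample values are hypothetical.

```python
import csv
import io


def parse_report_tsv(text):
    """Return {assembly_name: {metric_name: value_string}}.

    Assumes the first row lists assembly names after a leading label cell,
    and every later row is: metric name, then one value per assembly.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    assemblies = header[1:]
    report = {name: {} for name in assemblies}
    for row in body:
        metric, values = row[0], row[1:]
        for name, value in zip(assemblies, values):
            report[name][metric] = value
    return report


# Made-up sample data, not real QUAST output.
sample = (
    "Assembly\tcontigs_1\tcontigs_2\n"
    "N50\t1200\t900\n"
    "Largest contig\t5000\t4200\n"
)
report = parse_report_tsv(sample)
print(report["contigs_1"]["N50"])
```

To read a real report, pass the file contents instead of `sample`, for example the text of `output_directory/report.tsv`.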

**Metrics based only on contigs:**
  • Number of large contigs (i.e., longer than 500 bp) and their total length.
  • Length of the largest contig.
  • N50 (length of a contig, such that all the contigs of at least the same length together cover at least 50% of the assembly).
  • Number of predicted genes, discovered either by GeneMark.hmm (for prokaryotes), GeneMark-ES or GlimmerHMM (for eukaryotes), or MetaGeneMark (for metagenomes).
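
The N50 definition above can be sketched in a few lines. This is an illustration of the definition only, not QUAST's own code; the function name `n50` and the example lengths are made up.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least 50% of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= half_total:
            return length
    return 0  # empty assembly


# Total length is 1500, so half is 750; 500 + 400 = 900 reaches it.
print(n50([100, 200, 300, 400, 500]))  # prints 400
```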

When a reference is given:

  • Numbers of misassemblies of different kinds: inversions, relocations, translocations, interspecies translocations (metaQUAST only), and local misassemblies.
  • Number and total length of unaligned contigs.
  • Numbers of mismatches and indels, over the assembly and per 100 kb.
  • Genome fraction (%), the assembled part of the reference.
  • Duplication ratio, the total number of aligned bases in the assembly divided by the total number of those in the reference. If the assembly contains many contigs that cover the same regions, its duplication ratio will significantly exceed 1. This can happen for several reasons, including overestimated repeat multiplicities and overlaps between contigs.
  • Number of genes in the assembly, completely or partially covered, based on a user-provided list of gene positions in the reference.
  • NGA50, a reference-aware version of the N50 metric. It is calculated using aligned blocks instead of contigs. Such blocks are obtained by removing unaligned regions and then splitting contigs at misassembly breakpoints. Thus, NGA50 is the length of a block such that all the blocks of at least the same length together cover at least 50% of the reference.
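
The NGA50 and duplication ratio definitions above can be sketched as follows. The sketch assumes the aligned block lengths and aligned base counts have already been obtained from an alignment step, which is not shown. The function names and the example numbers are hypothetical, and this is not QUAST's implementation.

```python
def nga50(aligned_block_lengths, reference_length):
    """Block length L such that blocks of length >= L cover >= 50% of the reference.

    Returns None when the blocks cover less than half of the reference.
    """
    lengths = sorted(aligned_block_lengths, reverse=True)
    target = reference_length / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= target:
            return length
    return None


def duplication_ratio(aligned_bases_in_assembly, aligned_bases_in_reference):
    """Aligned bases in the assembly divided by aligned bases in the reference."""
    return aligned_bases_in_assembly / aligned_bases_in_reference


# Half of the 1200 bp reference is 600; 400 + 300 = 700 reaches it.
print(nga50([400, 300, 200, 100], reference_length=1200))  # prints 300
print(duplication_ratio(1100, 1000))
```

Unlike N50, the threshold here depends on the reference length rather than the assembly length, so a fragmented or incomplete assembly may have no NGA50 at all.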

For more features and explanations, see the [manual](http://quast.bioinf.spbau.ru/manual).

You can also check out the web interface: http://quast.bioinf.spbau.ru
