augurlinos's Introduction

This repository is archived. It contains the content that was used to build the documentation and splash page at nextstrain.org. This content can now be found here.

License and copyright

Copyright 2014-2018 Trevor Bedford and Richard Neher.

Source code to Nextstrain is made available under the terms of the GNU Affero General Public License (AGPL). Nextstrain is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

augurlinos's People

Contributors

rneher, trvrb

augurlinos's Issues

'Locus_tag' or 'gene' when reading in GB/GFF files?

Currently, load_features in util.py looks for the feature type "CDS" and the feature qualifier "locus_tag" to get the gene name. For Zika this works as expected.
However, for TB, "CDS" features do not contain "locus_tag" (or anything useful). Further, in "gene" features (instead of "CDS"), the "locus_tag" is not the identifier commonly used for genes (e.g. gene='dnaA', locus_tag='Rv0001').

For GFF files I have modified load_features (in the 'vcf' branch) so that it looks for feature type 'gene' and qualifier 'gene' instead of 'CDS' and 'locus_tag'. However, for avian influenza GenBank files, the combination should be 'CDS' and 'gene' (this returns the expected PB2, HA, NA, etc.).

We should probably either look to see if there is a general rule (or two) we can put in place to ensure we're always getting the common gene names, or we should consider turning this into an option of some kind for users to specify.
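One way the "option" route could look is sketched below. This is only an illustration, not the code in util.py: the function name `feature_name` and its parameters `wanted_types` and `name_keys` are hypothetical, and features are represented as a plain feature-type string plus a qualifier dict so the example runs without any parser. It assumes qualifier values may be either strings or lists of strings.

```python
def feature_name(feature_type, qualifiers,
                 wanted_types=("CDS", "gene"),
                 name_keys=("gene", "locus_tag")):
    """Return a name for one annotation feature, or None if it should be skipped.

    Hypothetical helper: `wanted_types` and `name_keys` stand in for
    user-facing options; neither exists in util.py.
    """
    if feature_type not in wanted_types:
        return None
    # Try each qualifier in order of preference and use the first one present.
    for key in name_keys:
        value = qualifiers.get(key)
        if value:
            # Some parsers store qualifier values as lists, others as strings.
            return value[0] if isinstance(value, (list, tuple)) else value
    return None


# TB-like record: the 'gene' feature carries both qualifiers, 'gene' wins.
print(feature_name("gene", {"gene": ["dnaA"], "locus_tag": ["Rv0001"]}))

# Influenza-like record: restrict to 'CDS' features named by 'gene'.
print(feature_name("CDS", {"gene": "PB2"},
                   wanted_types=("CDS",), name_keys=("gene",)))
```

With the defaults, a record that names the same gene on both its "gene" and "CDS" features would yield that name twice, so a caller would still need to de-duplicate or narrow `wanted_types`.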

locus_tag Specification Partially Implemented

For VCF files, load_features in util.py now takes 'gene' as the name if it is specified, and 'locus_tag' if it is not. If the user has given no gene list, this ensures that all genes are read. Similarly, if a gene list has been specified and the user has named each gene by its 'gene' name where it has one (in the .gff file), using 'locus_tag' only where it does not, this will find all genes.

However, if the user specifies a gene by 'locus_tag' even though it has a 'gene' name in the .gff file, that gene is currently not included, as 'locus_tag' is only used if 'gene' is absent. If a gene list has been specified, we should check both fields so that users can specify genes either way.
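The proposed check could be sketched roughly as follows. This is not the existing load_features; `select_features` is a hypothetical helper, and each feature is assumed to be a plain dict of qualifiers with optional string values under 'gene' and 'locus_tag'.

```python
def select_features(features, gene_list=None):
    """Pick features by name, matching a user gene list against both fields.

    Hypothetical sketch: `features` is an iterable of qualifier dicts.
    Returns a dict keyed by 'gene' when present, else by 'locus_tag'.
    """
    wanted = set(gene_list) if gene_list else None
    selected = {}
    for quals in features:
        gene = quals.get("gene")
        locus = quals.get("locus_tag")
        name = gene or locus  # keep the current naming preference
        if name is None:
            continue  # nothing usable to name this feature by
        # No gene list: keep everything. Otherwise accept a match on either field.
        if wanted is None or gene in wanted or locus in wanted:
            selected[name] = quals
    return selected


features = [
    {"gene": "dnaA", "locus_tag": "Rv0001"},
    {"locus_tag": "tag_without_gene"},  # made-up entry with no 'gene' name
]

# Requesting by locus_tag now finds the feature even though it has a 'gene' name.
print(list(select_features(features, ["Rv0001"])))
```

In this sketch the returned key is still the 'gene' name, so downstream code sees the same names whichever identifier the user supplied.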
