Giter Club home page Giter Club logo

patric_gto_tools's Introduction

Tools for PATRIC GTO format

Scripts to convert PATRIC GTO format to other formats.

PATRIC database (the Pathosystems Resource Integration Center) https://www.patricbrc.org/ stores genomes in the GTO format. The GTO format is contains nucleotide and amino-acid sequences and various annotation information.

The repository provides lightweight scripts for converting GTO format to:

  • FNA (nucleotide contigs in fasta format)
  • FNN (genes nucleotide sequences in fasta format)
  • FAA (translated genes amino acid sequences in fasta format)
  • GFF (gene annotations and nucleotide contigs in GFF3 format)

Scripts

Scripts are written on Perl and Python. Located in the ./scripts folder.

The Perl scrips require JSON library. It could be installed with CPAN (The Comprehensive Perl Archive Network): https://metacpan.org/pod/JSON.

With cpanm:

cpanm JSON

With CPAN shell:

perl -MCPAN -e shell
install JSON

GTO to FNA

Converts GTO files to files with nucleotide contigs in fasta format.

Input: Directory with GTO files. Files should have ".gto" extension.

Output: Creates directory "fna" with ".fna" files. One FNA file per one GTO file. Names preserved the same as in GTO files.

usage: perl fna_from_gto.pl [directory with GTO files]

GTO to FNN

Converts GTO files to files with gene nucleotide sequences in fasta format.

Input: Directory with GTO files. Files should have ".gto" extension.

Output: Make ".fnn" files in the directory defined by "-o" flag. One FNN file per one GTO file. Names preserved the same as in GTO files.

usage: python3 fnn_from_gto.py [-h] [-g GTO] [-o OUTPUT]

Converts GTO files to FNN with nucleotide sequences for each gene

optional arguments:
  -h, --help            show this help message and exit
  -g GTO, --gto GTO     Directory with GTO files
  -o OUTPUT, --output OUTPUT
                        Output directory for FNN files
                        Default: "fnn" directory

GTO to FAA

Converts GTO files to files with amino acid sequences in fasta format.

Input: Directory with GTO files. Files should have ".gto" extension.

Output: Creates directory "faa" with ".faa" files. One FAA file per one GTO file. Names preserved the same as in GTO files.

usage: perl faa_from_gto.pl [directory with GTO files]

GTO to GFF3

Converts GTO files to files with annotations in GFF format. Script adds ##FASTA section at the end of file that contains nucleotide contigs.

Input: Directory with GTO files. Files should have ".gto" extension.

Output: Creates directory "faa" with ".faa" files. One GFF file per one GTO file. Names are generated from genome ids recorded in GTO file.

Annotation records are often unsorted or sorted by position in reverse order. Two version of script:

  • Script that sorts annotations in ascending order (Preferred):
usage: perl gff_from_gto_sorted.pl [directory with GTO files]
  • Script that outputs annotations in the order how they are written in GTO file:
usage perl gff_from_gto_no_sotring.pl [directory with GTO files]

patric_gto_tools's People

Contributors

sleyn avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.