cazy-parser

A way to extract specific information from the Carbohydrate-Active enZYmes.

License: GNU GPLv3

If you are using this tool please read and cite the paper!

RV Honorato. CAZy-parser a way to extract information from the Carbohydrate-Active enZYmes Database. The Journal of Open Source Software, 1(8), dec 2016.

doi: 10.21105/joss.00053

Also make sure to visit and cite the CAZy website

  • http://www.cazy.org/
  • Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The Carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495. [PMID: 24270786].

Introduction

cazy-parser is a tool that extracts information from CAZy in a more usable and readable format. First, a script reads the HTML structure of the website and creates a mirror of the database as a tab-delimited file. Second, information is extracted from that mirror according to user-supplied parameters and presented to the user as a set of accession codes.

Changelog

v1.1 - Fixed bug when identifying page indexes

Installation

$ pip install cazy-parser

or

Download latest source from this link

$ tar -zxvf cazy-parser-x.x.x.tar.gz
$ cd cazy-parser-x.x.x
$ python setup.py install

Usage

Please note that both steps require an internet connection.

  1. Database creation

$ create_cazy_db

(-h for help)

  • This script parses the CAZy database website and creates a comma-separated table containing the following information:
    • domain
    • protein_name
    • family
    • tag (characterized status)
    • organism_code
    • EC number (EC stands for Enzyme Commission number)
    • GENBANK id
    • UNIPROT code
    • subfamily
    • organism
    • PDB code
  2. Extract sequences
  • Based on the previously generated CSV table, extract accession codes for a given protein family.

$ extract_cazy_ids --db <database> --family <family code>

(-h for help)

  • Optional:

--subfamilies Create a file for each subfamily, default = False

--characterized Create a file containing only characterized enzymes, default = False
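The selection step described above can be pictured with a minimal Python sketch. This is not the code behind extract_cazy_ids; it only illustrates filtering a CSV table by family, optionally keeping characterized entries and grouping by subfamily. The header names (family, subfamily, tag, genbank), the tag value "characterized", and the sample rows are assumptions made for illustration, so check the header of your own database file before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample table. The real file may use different header
# names, more columns, and a different way of marking characterized entries.
SAMPLE = """family,subfamily,tag,genbank
GH43,1,characterized,ACC_0001
GH43,2,,ACC_0002
GT9,,characterized,ACC_0003
"""


def select_family(rows, family, characterized_only=False):
    """Return the rows of one family, optionally only characterized ones."""
    selected = []
    for row in rows:
        if row["family"] != family:
            continue
        if characterized_only and row["tag"] != "characterized":
            continue
        selected.append(row)
    return selected


def group_by_subfamily(rows):
    """Map each non-empty subfamily label to its list of accession codes."""
    groups = defaultdict(list)
    for row in rows:
        if row["subfamily"]:
            groups[row["subfamily"]].append(row["genbank"])
    return dict(groups)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
gh43 = select_family(rows, "GH43")
print([row["genbank"] for row in gh43])
print(group_by_subfamily(gh43))
```

To try this on a real database file, replace io.StringIO(SAMPLE) with an opened file handle and adjust the column names to match its header.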

Usage examples

  1. Extract all accession codes from family 9 of Glycosyl Transferases.

$ extract_cazy_ids --db CAZy_DB_xx-xx-xxxx.csv --family GT9

This will generate the following files:

GT9.csv
  2. Extract all accession codes from family 43 of Glycoside Hydrolases, including subfamilies.

$ extract_cazy_ids --db CAZy_DB_xx-xx-xxxx.csv --family GH43 --subfamilies

This will generate the following files:

GH43.csv
GH43_sub1.csv
GH43_sub2.csv
GH43_sub3.csv
(...)
GH43_sub37.csv
  3. Extract all accession codes from family 42 of Polysaccharide Lyases, including a separate file of characterized entries.

$ extract_cazy_ids --db CAZy_DB_xx-xx-xxxx.csv --family PL42 --characterized

This will generate the following files:

PL42.fasta
PL42_characterized.fasta
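When several families need to be extracted from the same database file, the commands above can be scripted. The following is a minimal sketch that assumes extract_cazy_ids is installed and on the PATH; it uses only the flags documented in this README. The helper names build_command and run_batch are hypothetical and not part of cazy-parser.

```python
import subprocess


def build_command(db, family, subfamilies=False, characterized=False):
    """Build the argument list for one extract_cazy_ids call."""
    cmd = ["extract_cazy_ids", "--db", db, "--family", family]
    if subfamilies:
        cmd.append("--subfamilies")
    if characterized:
        cmd.append("--characterized")
    return cmd


def run_batch(db, families, **options):
    """Run extract_cazy_ids once per family; stop at the first failure."""
    for family in families:
        subprocess.run(build_command(db, family, **options), check=True)


# Show the command that would be run for the GH43 example above.
print(" ".join(build_command("CAZy_DB_xx-xx-xxxx.csv", "GH43", subfamilies=True)))
```

Calling run_batch("CAZy_DB_xx-xx-xxxx.csv", ["GT9", "GH43", "PL42"]) with the real database filename would then run the three extractions in sequence.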

To-do and how to contribute

Please refer to CONTRIBUTE.md

Known bugs

None, yet.

Contact info

If you have any inquiries, please contact me at rvhonorato at gmail.com
