synthaser

Process

synthaser parses the results of a batch NCBI conserved domain search and determines the domain architecture of secondary metabolite synthases.

Installation

Install from PyPI using pip:

$ pip install --user synthaser

or clone the repo and install locally:

$ git clone https://www.github.com/gamcil/synthaser
$ cd synthaser
$ pip install .

Finally, configure synthaser with your e-mail address or NCBI API key (used when making requests to NCBI servers), for example:

$ synthaser config --email you@example.com

Dependencies

synthaser is written in pure Python (3.6+), and requires only the following dependencies for remote searches:

  • requests, for interaction with the NCBI's CD-Search API
  • biopython, for retrieving sequences from NCBI Entrez

If you want to do local searches, you'll need:

  • RPS-BLAST, for performing local domain searches
  • rpsbproc, for formatting RPS-BLAST results like CD-Search

Both can be obtained from the NCBI FTP site.

Usage

A full synthaser search can be performed as simply as:

$ synthaser search -qf sequences.fasta

Here, sequences.fasta is a FASTA-format file containing the protein sequences that you would like to search.
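Because the search expects protein sequences, it can help to sanity-check the FASTA file before submitting it. The following is a minimal sketch using only the Python standard library; it is not part of synthaser, and the function names (`read_fasta`, `find_problems`) and the set of accepted letters are assumptions made for illustration.

```python
from pathlib import Path

# Standard amino-acid letters plus common ambiguity codes. This is an
# assumption about acceptable input, not a rule taken from synthaser.
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBXZUO")


def read_fasta(path):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def find_problems(records):
    """Return a list of human-readable problems found in the records."""
    problems = []
    if not records:
        problems.append("no FASTA records found")
    for name, sequence in records.items():
        if not sequence:
            problems.append(f"{name}: empty sequence")
            continue
        unexpected = sorted(set(sequence.upper()) - PROTEIN_LETTERS)
        if unexpected:
            problems.append(f"{name}: unexpected characters {''.join(unexpected)}")
    return problems
```

Calling `find_problems(read_fasta("sequences.fasta"))` returns an empty list when every record has a non-empty sequence made up of the accepted letters. Characters such as `*` (often used for stop codons in translated sequences) are reported, so you can decide whether to clean them up before searching.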

For a full listing of available arguments, enter:

$ synthaser -h

Visualising your results

synthaser is capable of generating fully-interactive, annotated visualisations so you can easily explore your results. All that is required is one extra argument:

$ synthaser search -qf sequences.fasta -p

This will generate a figure like so:

Example synthaser output

Click here to play around with the full version of this example.

Saving your search session

synthaser can save your search results so that they can be reloaded later for further visualisation or exploration without re-running the search.

To do this, use the --json_file argument:

$ synthaser search -qf sequences.fasta --json_file sequences.json

This will save all of your results, in JSON format, to the file sequences.json. Loading this session back into synthaser is then as easy as:

$ synthaser search --json_file sequences.json ...
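Since the session is stored as plain JSON, it can also be opened outside of synthaser. The layout of the file is not described here, so the sketch below assumes nothing beyond the file being valid JSON; `describe_json` is a hypothetical helper name, not a synthaser function.

```python
import json


def describe_json(path):
    """Return a short description of the top-level structure of a JSON file."""
    with open(path) as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        # JSON object keys are always strings, so they sort cleanly.
        return f"object with keys: {', '.join(sorted(data))}"
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return f"scalar of type {type(data).__name__}"
```

For example, `print(describe_json("sequences.json"))` gives a quick overview of what was saved before you write any code that depends on specific fields.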

Using your own rules

Though synthaser was originally designed to analyse secondary metabolite synthases, it can easily be repurposed to analyse the domain architectures of any type of protein sequence.

Under the hood, synthaser uses a central rule file which contains:

  1. Domain types, containing specific families to save in CD-Search results, corresponding to domain 'islands';
  2. Rules for classifying the sequences based on domain architecture predictions; and
  3. A hierarchy which determines the order of evaluation for the rules.
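To make the interplay of these three parts concrete, here is a toy model in Python. It is not synthaser's implementation or its rule file schema: every name below (`DOMAIN_TYPES`, `RULES`, `HIERARCHY`, the family and rule names) is an illustrative placeholder, and the hierarchy is simplified to a flat, ordered list in which the first matching rule wins.

```python
# All names below are illustrative placeholders, NOT synthaser's real schema.

# Domain type -> conserved domain families to keep from the search results.
DOMAIN_TYPES = {
    "A": {"family_a1", "family_a2"},
    "B": {"family_b1"},
    "C": {"family_c1"},
}

# Rule name -> domain types that must all be present for the rule to match.
RULES = {
    "Type AB": {"A", "B"},
    "Type A": {"A"},
    "Type C": {"C"},
}

# Rules are evaluated in this order and the first match wins, so the more
# specific rule ("Type AB") is listed before the general one ("Type A").
HIERARCHY = ["Type AB", "Type A", "Type C"]


def domain_types_for(families):
    """Map the families hit in one sequence to the domain types they belong to."""
    hits = set(families)
    return {
        domain_type
        for domain_type, members in DOMAIN_TYPES.items()
        if members & hits
    }


def classify(families):
    """Return the first rule in HIERARCHY satisfied by the hits, else None."""
    found = domain_types_for(families)
    for rule_name in HIERARCHY:
        if RULES[rule_name] <= found:
            return rule_name
    return None
```

In this toy model, a sequence with hits to `family_a1` and `family_b1` is classified as "Type AB" rather than "Type A" purely because of the evaluation order, which is the role the hierarchy plays in the description above.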

We distribute our fungal megasynthase rule file as the default, but providing your own rule file is as simple as:

$ synthaser search -qf sequences.fasta --rule_file my_rules.json

We also provide a web application for assembling your own rule files, which can be found here.

For a detailed explanation of how the rule file works, as well as API documentation, please refer to the documentation.

Citations

If you found synthaser helpful, please cite:

Gilchrist, C. L., & Chooi, Y. H. (2021).
Synthaser: a CD-Search enabled Python toolkit for analysing domain architecture of fungal secondary metabolite megasynth(et)ases.
Fungal Biology and Biotechnology, 8(1), 1-19.

synthaser's Issues

ValueError: Empty results file; perhaps invalid query

Hello,

I ran this command and got these errors. Could you please help me figure it out? Thanks a lot.

command:
synthaser search -qf a_sixpack_rename.fa -o a_cdd.out --evalue 0.01

errors:
Traceback (most recent call last):
  File "/home/sunwanying/.local/lib/python3.8/site-packages/synthaser/search.py", line 123, in search
    with open(results_file) as rf:
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sunwanying/.local/lib/python3.8/site-packages/synthaser/main.py", line 65, in synthaser
    synthases = search.search(
  File "/home/sunwanying/.local/lib/python3.8/site-packages/synthaser/search.py", line 134, in search
    handle = _remote(
  File "/home/sunwanying/.local/lib/python3.8/site-packages/synthaser/search.py", line 169, in _remote
    response = ncbi.retrieve(cdsid, delay=delay, max_retries=max_retries)
  File "/home/sunwanying/.local/lib/python3.8/site-packages/synthaser/ncbi.py", line 202, in retrieve
    finished = check(cdsid)
  File "/home/sunwanying/.local/lib/python3.8/site-packages/synthaser/ncbi.py", line 112, in check
    raise ValueError("Empty results file; perhaps invalid query?")
ValueError: Empty results file; perhaps invalid query?

Consider a different colour scheme for the output

Hi folks,

The current output visualisation uses a colour scheme that assigns very similar colours to domains that tend to occur adjacent to each other. This makes it a bit tricky to see what's going on when you have regular colour vision, and pretty much impossible if your colour vision is deficient.
Ideally, you'd want to ensure that adjacent domains have colours with distinct saturations.
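One way to get there could look like the sketch below, which spaces hues evenly and alternates between two saturation levels so that neighbouring domains never share a saturation. It uses only Python's standard `colorsys` module; the function name `domain_colours`, the default saturation values and the domain labels in the usage note are assumptions for illustration, not anything taken from synthaser's plotting code.

```python
import colorsys


def domain_colours(domains, saturations=(0.9, 0.45), lightness=0.5):
    """Return {domain: '#rrggbb'}, spacing hues evenly and alternating saturation.

    `domains` should be listed in the order they tend to appear, so that
    neighbours in the list receive different saturation levels.
    """
    colours = {}
    for index, domain in enumerate(domains):
        hue = index / len(domains)
        saturation = saturations[index % len(saturations)]
        red, green, blue = colorsys.hls_to_rgb(hue, lightness, saturation)
        colours[domain] = "#{:02x}{:02x}{:02x}".format(
            round(red * 255), round(green * 255), round(blue * 255)
        )
    return colours
```

For example, `domain_colours(["KS", "AT", "DH", "ER", "KR"])` gives each of the five labels its own hue, with every second one desaturated. Varying lightness as well as saturation would likely help further for readers with deficient colour vision.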
