blast_reporting

NCBI BLAST+ searches can output a range of formats, but in the past only the XML format included fields such as the sequence description. This tool converts an NCBI BLAST XML report into 12-, 24-, 26- or custom-column tabular and HTML reports. It can be run from the command line or as a tool on the Galaxy bioinformatics platform.

The tool allows almost complete control over which fields are displayed and filtered, how columns are named, and how the HTML report on each query is sectioned. Search result records can be filtered out based on values in numeric or textual fields. Matches (by accession id) to a selection of reference databases can be shown, and this can include a description of the matched sequence.

Currently this tool accepts only BLAST XML input, i.e. the "Output format: BLAST XML" option of the NCBI Blast+ search tool, produced on the command line by (for example):

blastn -outfmt 5 -query "...."

or in Galaxy by selecting the NCBI Blast+ search tool's option towards the bottom of the form ...
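As a rough illustration of the command-line route, the sketch below assembles a blastn call that writes BLAST XML (-outfmt 5) to a file this tool could then read. The file names query.fasta and results.xml, the database name my_db, and the helper names are placeholders chosen for the example, not part of this tool.

```python
import shutil
import subprocess


def build_blastn_command(query_path, db_name, xml_out_path):
    """Return the argument list for a blastn run that emits BLAST XML."""
    return [
        "blastn",
        "-query", query_path,
        "-db", db_name,
        "-outfmt", "5",       # 5 = BLAST XML, the only input this tool accepts
        "-out", xml_out_path,
    ]


def run_blastn(query_path, db_name, xml_out_path):
    """Run blastn if it is installed; raises if the executable is missing."""
    if shutil.which("blastn") is None:
        raise RuntimeError("blastn was not found on PATH")
    subprocess.run(build_blastn_command(query_path, db_name, xml_out_path), check=True)
    return xml_out_path


# Placeholder arguments; nothing is executed here, the command is only printed.
command = build_blastn_command("query.fasta", "my_db", "results.xml")
print(" ".join(command))
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before a long search is started.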

Documentation

A fairly comprehensive user guide is available in the doc/ folder.

Installation

The tool can be installed from https://toolshed.g2.bx.psu.edu/ . It draws upon the XML reports generated by the NCBI Blast+ tools.

The setup of Reference Bins and the Selectable HTML Report are optional as described below.

Using ''Reference Bins''

A reference bin file is simply a text file in which each line is a record containing an accession id and a description. The accession id is cross-referenced with the accession id returned with each search hit. The Blast reporting tool must be told where these files are: their names and paths are listed in fasta_reference_dbs.loc.sample, which ends up in the Galaxy install's tool-data/fasta_reference_dbs.loc file. Example reference bin records:

AADS00000000.1 Phanerochaete chrysosporium RP-78
AAEW02000014.2 Desulfuromonas acetoxidans DSM 684
AAEY01000007.0 Cryptococcus neoformans var. neoformans B-3501A
AAFI01000166 Dictyostelium discoideum AX4
AAFW02000169.3 Saccharomyces cerevisiae YJM789

Both the search result hit and the reference file accession ids are stripped of any fractional component (the version suffix after the period, such as the ".1" in AADS00000000.1) before being compared.
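The matching rule can be sketched roughly as follows. This is not the tool's own code: the function names are made up for illustration, and the sketch assumes each record separates the accession id from the description by whitespace.

```python
def strip_version(accession):
    """Drop anything after the first period, e.g. 'AADS00000000.1' -> 'AADS00000000'."""
    return accession.split(".", 1)[0]


def load_reference_bin(lines):
    """Build a dict of version-less accession id -> description."""
    bin_index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # accession id, then the rest as description
        description = parts[1] if len(parts) > 1 else ""
        bin_index[strip_version(parts[0])] = description
    return bin_index


def lookup_hit(bin_index, hit_accession):
    """Return the description for a search hit, or None if it is not in the bin."""
    return bin_index.get(strip_version(hit_accession))


records = [
    "AADS00000000.1 Phanerochaete chrysosporium RP-78",
    "AAFI01000166 Dictyostelium discoideum AX4",
]
bin_index = load_reference_bin(records)
print(lookup_hit(bin_index, "AADS00000000.2"))  # matches despite a different version
print(lookup_hit(bin_index, "ZZZZ00000000.1"))  # not in the bin
```

Because both sides are stripped, a hit on a newer or older version of a sequence still matches its reference bin entry.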

Using the ''Selectable HTML Report'':

  • This was an EXPERIMENTAL feature that no longer works in current versions of Galaxy, and so it has been discontinued.

Development notes

A few changes are in the works: a Galaxy tool form fix scheduled for the next month will make setting up reference databases much easier. A Galaxy administrator will only have to create a "Reference Bin" data library and load each reference bin file into it. There will be no more need to set up the fasta_reference_dbs.loc file.

blast_reporting's People

Contributors

dfornika

Forkers

jic523 dfornika

blast_reporting's Issues

Error producing error message when reference bin file not found

Tool fails with message:

Traceback (most recent call last):
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/blast_reporting.py", line 627, in <module>
    reportEngine.__main__()
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/blast_reporting.py", line 502, in __main__
    tagGroup = XMLRecordScan(options, output_format)
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/blast_reporting.py", line 220, in __init__
    self.binManager.build_bins(options.reference_bins, self.columns)
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/reference_bins.py", line 48, in build_bins
    newbin = self.buildBin(field_name, bin_filter)
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/reference_bins.py", line 93, in buildBin
    stop_err("Reference bin could not be found or opened: " + self.path + bin_folder_name + '/accession_ids.tab')
NameError: global name 'stop_err' is not defined

Other references to stop_err are prefixed by the module name common:

common.stop_err("Invalid bin name: " + field_name + ':' + myfield)

If stop_err is replaced with common.stop_err on reference_bins.py line 93 then this error is produced:

Traceback (most recent call last):
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/blast_reporting.py", line 627, in <module>
    reportEngine.__main__()
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/blast_reporting.py", line 502, in __main__
    tagGroup = XMLRecordScan(options, output_format)
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/blast_reporting.py", line 220, in __init__
    self.binManager.build_bins(options.reference_bins, self.columns)
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/reference_bins.py", line 48, in build_bins
    newbin = self.buildBin(field_name, bin_filter)
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/reference_bins.py", line 93, in buildBin
    common.stop_err("Reference bin could not be found or opened: " + self.path + bin_folder_name + '/accession_ids.tab')
AttributeError: ReferenceBins instance has no attribute 'path'
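Both failures happen inside the error-reporting path itself, so the real problem (a missing bin file) never reaches the user. The sketch below shows one way that path could be made robust. It is not the actual reference_bins.py code: the class is a simplified stand-in, the local stop_err imitates common.stop_err with an assumed signature, and storing the base folder in the constructor is an assumption.

```python
import os
import sys


def stop_err(msg, exit_code=1):
    """Stand-in for common.stop_err: report the problem and exit."""
    sys.stderr.write(msg + "\n")
    sys.exit(exit_code)


class ReferenceBins:
    def __init__(self, path):
        # Assumption: keep the base folder on the instance so that
        # error messages can always refer to it.
        self.path = path

    def build_bin(self, bin_folder_name):
        # Build the full file name once and reuse it in the error message.
        bin_file = os.path.join(self.path, bin_folder_name, "accession_ids.tab")
        try:
            with open(bin_file) as handle:
                return [line.rstrip("\n") for line in handle]
        except IOError:
            stop_err("Reference bin could not be found or opened: " + bin_file)
```

With this shape, a missing file produces the intended "Reference bin could not be found or opened" message rather than a NameError or AttributeError.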

Error when 'Only field selections below' mode is used for 'Basic Report Field Output'

Traceback (most recent call last):
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/blast_reporting.py", line 627, in <module>
    reportEngine.__main__()
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/blast_reporting.py", line 581, in __main__
    common.fileSelections(out_tabular_file, selection_file, tagGroup, options)
  File "/opt/production_galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/damion/blast_reporting/812de0e282bd/blast_reporting/common.py", line 481, in fileSelections
    writer.writerow([row[qseqid_col], row[qseq_col], grouping, selectrow_count])
UnboundLocalError: local variable 'qseq_col' referenced before assignment

writer.writerow([row[sseqid_col], row[sseq_col], grouping, selectrow_count])
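An UnboundLocalError of this kind typically appears when a column index is assigned only if the column is among the selected fields. The source of common.py is not shown here, so the following is only a guess at the general pattern, with a hypothetical helper that fails early with a readable message when a required column is absent.

```python
def find_column_indexes(column_names, required):
    """Map each required column name to its position in column_names.

    Raises ValueError naming the first missing column, so that no index
    variable is ever used before it has been assigned.
    """
    indexes = {}
    for name in required:
        if name not in column_names:
            raise ValueError("Required column missing from selected fields: " + name)
        indexes[name] = column_names.index(name)
    return indexes


# Hypothetical field selection that includes both query columns.
selected = ["qseqid", "sseqid", "qseq"]
print(find_column_indexes(selected, ["qseqid", "qseq"]))
```

Resolving all required indexes up front means a field selection that omits qseq is reported as a configuration problem before any rows are written.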
