staramr

staramr (*AMR) scans bacterial genome contigs against the ResFinder, PointFinder, and PlasmidFinder databases (used by the ResFinder webservice and other webservices offered by the Center for Genomic Epidemiology) and compiles a summary report of detected antimicrobial resistance genes. The star (*) in staramr indicates that it can handle all of the ResFinder, PointFinder, and PlasmidFinder databases.

Note: The predicted phenotypes/drug resistances are for microbiological resistance and not clinical resistance. This is provided with support from the NARMS/CIPARS Molecular Working Group and is continually being improved. A small comparison between phenotype/drug resistance predictions produced by staramr and those available from NCBI can be found in the tutorial. We welcome any feedback or suggestions.

For example:

staramr search -o out --pointfinder-organism salmonella *.fasta

out/summary.tsv:

Isolate ID | Quality Module | Genotype | Predicted Phenotype | CGE Predicted Phenotype | Plasmid | Scheme | Sequence Type | Genome Length | N50 value | Number of Contigs Greater Than Or Equal To 300 bp | Quality Module Feedback
SRR1952908 | Passed | aadA1, aadA2, blaTEM-57, cmlA1, gyrA (S83Y), sul3, tet(A) | streptomycin, ampicillin, chloramphenicol, ciprofloxacin I/R, nalidixic acid, sulfisoxazole, tetracycline | Spectinomycin, Streptomycin, Amoxicillin, Ampicillin, Cephalothin, Piperacillin, Ticarcillin, Chloramphenicol, Nalidixic acid, Ciprofloxacin, Sulfamethoxazole, Doxycycline, Tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1-I(Alpha) | senterica_achtman_2 | 11 | 4785500 | 250423 | 41 |
SRR1952926 | Passed | blaTEM-57, gyrA (S83Y), tet(A) | ampicillin, ciprofloxacin I/R, nalidixic acid, tetracycline | Amoxicillin, Ampicillin, Cephalothin, Piperacillin, Ticarcillin, Nalidixic acid, Ciprofloxacin, Doxycycline, Tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1-I(Alpha) | senterica_achtman_2 | 11 | 4785451 | 228311 | 40 |

out/detailed_summary.tsv:

Isolate ID | Data | Data Type | Predicted Phenotype | CGE Predicted Phenotype | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession
SRR1952908 | ST11 (senterica_achtman_2) | MLST | | | | | | | | |
SRR1952908 | ColpVC | Plasmid | | | 98.96 | 100.0 | 193/193 | contig00038 | 1618 | 1426 | JX133088
SRR1952908 | aadA1 | Resistance | streptomycin | Spectinomycin, Streptomycin | 100.0 | 100.0 | 792/792 | contig00030 | 5355 | 4564 | JQ414041

out/resfinder.tsv:

Isolate ID | Gene | Predicted Phenotype | CGE Predicted Phenotype | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession | Sequence | CGE Notes
SRR1952908 | sul3 | sulfisoxazole | Sulfamethoxazole | 100.00 | 100.00 | 792/792 | contig00030 | 2091 | 2882 | AJ459418 | ATGA[...] |
SRR1952908 | tet(A) | tetracycline | Doxycycline, Tetracycline | 99.92 | 97.80 | 1247/1275 | contig00032 | 1476 | 2722 | AF534183 | ATGT[...] |

out/pointfinder.tsv:

Isolate ID | Gene | Predicted Phenotype | CGE Predicted Phenotype | Type | Position | Mutation | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Pointfinder Position | CGE Notes | CGE Required Mutation | CGE Mechanism | CGE PMID
SRR1952908 | gyrA (S83Y) | ciprofloxacin I/R, nalidixic acid | Nalidixic acid,Ciprofloxacin | codon | 83 | TCC -> TAC (S -> Y) | 99.96 | 100.00 | 2637/2637 | contig00008 | 22801 | 20165 | S83Y | | | Target modification | 7492118,10471553
SRR1952926 | gyrA (S83Y) | ciprofloxacin I/R, nalidixic acid | Nalidixic acid,Ciprofloxacin | codon | 83 | TCC -> TAC (S -> Y) | 99.96 | 100.00 | 2637/2637 | contig00011 | 157768 | 160404 | S83Y | | | Target modification | 7492118,10471553

out/plasmidfinder.tsv:

Isolate ID | Plasmid | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession
SRR1952908 | ColpVC | 98.96 | 100 | 193/193 | contig00038 | 1618 | 1426 | JX133088
SRR1952908 | IncFIB(S) | 98.91 | 100 | 643/643 | contig00024 | 10302 | 9660 | FN432031

out/mlst.tsv:

Isolate ID | Scheme | Sequence Type | Locus 1 | Locus 2 | Locus 3 | Locus 4 | Locus 5 | Locus 6 | Locus 7
SRR1952908 | senterica_achtman_2 | 11 | aroC(5) | dnaN(2) | hemD(3) | hisD(7) | purE(6) | sucA(6) | thrA(11)
SRR1952926 | senterica_achtman_2 | 11 | aroC(5) | dnaN(2) | hemD(3) | hisD(7) | purE(6) | sucA(6) | thrA(11)

Quick Usage

Search contigs

To search a list of contigs (in FASTA format) for AMR genes using ResFinder, please run:

staramr search -o out *.fasta

Output files will be located in the directory out/.

To include acquired point-mutation resistances using PointFinder, please run:

staramr search --pointfinder-organism salmonella -o out *.fasta

Where --pointfinder-organism is the specific organism you are interested in (currently only salmonella, campylobacter, enterococcus_faecalis, and enterococcus_faecium are supported).

To specify which PlasmidFinder database to use, please run:

staramr search --plasmidfinder-database-type enterobacteriaceae -o out *.fasta

Where --plasmidfinder-database-type is the specific database type you are interested in (currently only gram_positive and enterobacteriaceae are supported). By default, both databases are used.

To specify which MLST scheme to use, please run:

staramr search -o out --mlst-scheme senterica *.fasta

Where --mlst-scheme is the specific MLST scheme you are interested in (please visit the scheme genus map to see which are available). By default, the scheme is detected automatically.
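If you drive staramr from a Python script, the options above can be assembled into an argument list and handed to subprocess.run. The sketch below uses hypothetical helpers (build_search_command and run_search are not part of staramr); they only combine the flags shown in this section and do not validate them against the installed version.

```python
import glob
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_search_command(
    output_dir: str,
    fasta_files: Sequence[str],
    pointfinder_organism: Optional[str] = None,
    plasmidfinder_database_type: Optional[str] = None,
    mlst_scheme: Optional[str] = None,
) -> List[str]:
    """Assemble a `staramr search` argument list (hypothetical helper).

    Only the flags described in this section are used; nothing is checked
    against the installed staramr version.
    """
    cmd = ["staramr", "search", "-o", output_dir]
    if pointfinder_organism:
        cmd += ["--pointfinder-organism", pointfinder_organism]
    if plasmidfinder_database_type:
        cmd += ["--plasmidfinder-database-type", plasmidfinder_database_type]
    if mlst_scheme:
        cmd += ["--mlst-scheme", mlst_scheme]
    # Input FASTA files go last, as in the examples above.
    cmd += list(fasta_files)
    return cmd


def run_search(output_dir: str, pattern: str = "*.fasta", **options) -> None:
    """Expand the glob in Python (no shell is involved) and run staramr."""
    cmd = build_search_command(output_dir, sorted(glob.glob(pattern)), **options)
    print("Running:", " ".join(shlex.quote(arg) for arg in cmd))
    # check=True raises CalledProcessError if staramr exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

For example, run_search("out", pointfinder_organism="salmonella") corresponds to the second command above. Because no shell is involved, the *.fasta pattern is expanded with glob.glob rather than by the shell.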

Database Info

To print information about the installed databases, please run:

staramr db info

Update Database

If you wish to update to the latest ResFinder, PointFinder, and PlasmidFinder databases, you may run:

staramr db update --update-default

If you wish to switch to specific git commits of the ResFinder, PointFinder, or PlasmidFinder databases, you may also pass --resfinder-commit [COMMIT], --pointfinder-commit [COMMIT], and --plasmidfinder-commit [COMMIT].

Restore Database

If you have updated the ResFinder/PointFinder/PlasmidFinder databases and wish to restore to the default version, you may run:

staramr db restore-default

Installation

Bioconda

Separate conda environment

The easiest way to install staramr is through Bioconda (we recommend using mamba as an alternative to conda).

conda install mamba # Install mamba to make it easier to install later dependencies
mamba create -c conda-forge -c bioconda -c defaults --name staramr staramr

This will install the staramr software at the most recent version within the conda environment named staramr. Bioconda will install all necessary dependencies and databases. Once this is complete you can run:

conda activate staramr # Activate conda environment
staramr --help

Same conda environment

If, instead, you wish to install staramr to the current conda environment you can run:

mamba install -c conda-forge -c bioconda -c defaults staramr

You should now be able to run staramr --help and receive a usage statement.

PyPI/Pip

You can also install staramr from PyPI using pip:

pip install staramr

However, you will have to install the external dependencies (listed below) separately.

Latest Code

If you wish to make use of the latest in-development version of staramr, you may install it directly from GitHub using pip:

pip install git+https://github.com/phac-nml/staramr

This will only install the Python code; you will still have to install the dependencies listed below (or run the pip command from within a previously installed Bioconda environment).

Alternatively, if you wish to do development with staramr you can use a Python virtual environment (you must still install the non-Python dependencies separately).

# Clone code
git clone https://github.com/phac-nml/staramr.git
cd staramr

# Setup virtual environment
virtualenv -p /path/to/python-bin .venv
source .venv/bin/activate

# Install staramr. Use '-e' to update the install on code changes.
pip install -e .

# Now run `staramr`
staramr

Due to the way we packaged the ResFinder/PointFinder/PlasmidFinder databases, the development code will not come with a default database. You must first build the database before use, for example:

staramr db restore-default

Dependencies

  • Python 3.7+
  • BLAST+
  • Git
  • MLST

Input

List of genes to exclude

By default, the ResFinder/PointFinder/PlasmidFinder genes listed in genes_to_exclude.tsv will be excluded from the final results. To pass a custom list of genes, the option --exclude-genes-file can be used, where the specified file contains a list of the sequence ids (one per line) from the ResFinder/PointFinder/PlasmidFinder databases. For example:

gene_id
aac(6')-Iaa_1_NC_003197
ColpVC_1__JX133088

Please make sure to include gene_id in the first line. The default exclusion list can also be disabled with --no-exclude-genes.
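A custom exclusion list can also be generated from a script. The sketch below assumes only the format described above (a gene_id header followed by one sequence id per line); the helper names are hypothetical and not part of staramr.

```python
from pathlib import Path
from typing import Iterable, List

HEADER = "gene_id"


def write_exclude_genes_file(path: str, gene_ids: Iterable[str]) -> None:
    """Write the 'gene_id' header, then one database sequence id per line."""
    ids = [g.strip() for g in gene_ids if g.strip()]  # skip blank entries
    Path(path).write_text("\n".join([HEADER] + ids) + "\n")


def read_exclude_genes_file(path: str) -> List[str]:
    """Read the ids back, checking that the first line is the 'gene_id' header."""
    lines = Path(path).read_text().splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError(f"{path}: first line must be '{HEADER}'")
    return [line.strip() for line in lines[1:] if line.strip()]
```

The resulting file can then be passed to staramr search with --exclude-genes-file.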

Complex Mutations

Complex mutations describe multiple point mutations that must be simultaneously present in order to confer resistance. One such example is the multiple pbp5 mutations that must be present in Enterococcus faecium in order to confer ampicillin resistance. These complex mutations may be specified by the user using a TSV-formatted file with the following format:

positions | mandatory | phenotype
gene (mutation1), gene (mutation2) | gene (mutation1) | phenotype

Where positions is all the point mutations to group into the complex mutation (optional and mandatory), mandatory is all the point mutations that must be present for the complex mutation to be reported (mandatory is a subset of positions), and phenotype is the phenotype that is conferred when this set of mutations is present. To see a specific example, please look at the default complex_mutations.tsv file included with staramr. The mutation will be reported in the pointfinder.tsv file as follows:

Isolate ID | Gene | Predicted Phenotype | CGE Predicted Phenotype | Type | Position | Mutation | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Pointfinder Position | CGE Notes | CGE Required Mutation | CGE Mechanism | CGE PMID
pbp5 | pbp5 (A216S), pbp5 (A499T), pbp5 (A68T), pbp5 (D204G), pbp5 (E100Q), pbp5 (E525D), pbp5 (E629V), pbp5 (E85D), pbp5 (G66E), pbp5 (K144Q), pbp5 (L177I), pbp5 (M485A), pbp5 (N496K), pbp5 (P667S), pbp5 (R34Q), pbp5 (S27G), pbp5 (T172A), pbp5 (T324A), pbp5 (V24A), pbp5 (V586L) | ampicillin | - | complex | 524, 527, 534, 566, 568, 585, 5100, 5144, 5172, 5177, 5204, 5216, 5324, 5485, 5496, 5499, 5525, 5586, 5629, 5667 | complex | 98.28 | 100.00 | 2037/2037 | pbp5_1_AAK43724.1 | 1 | 2037 | pbp5 (A216S), pbp5 (A499T), pbp5 (A68T), pbp5 (D204G), pbp5 (E100Q), pbp5 (E525D), pbp5 (E629V), pbp5 (E85D), pbp5 (G66E), pbp5 (K144Q), pbp5 (L177I), pbp5 (M485A), pbp5 (N496K), pbp5 (P667S), pbp5 (R34Q), pbp5 (S27G), pbp5 (T172A), pbp5 (T324A), pbp5 (V24A), pbp5 (V586L) | - | - | - | -

The complex mutation TSV file may be specified on the command line when running PointFinder:

staramr search --pointfinder-organism enterococcus_faecium -o out pbp5.fa --complex-mutations-file complex_mutations.tsv
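The reporting rule described above (every mandatory mutation must be present, and the group also covers optional mutations) can be expressed as set operations. This is a sketch of the rule as worded here, not staramr's own implementation; the helper names are hypothetical, the column names positions, mandatory, and phenotype follow the file format shown above, and it assumes that every grouped mutation that was actually detected is listed in the report.

```python
import csv
from typing import Dict, Iterable, List, Optional, Set


def load_complex_mutations(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read the tab-separated positions/mandatory/phenotype file."""
    return list(csv.DictReader(lines, delimiter="\t"))


def parse_mutation_list(field: str) -> Set[str]:
    """Split 'pbp5 (A216S), pbp5 (A499T)' into {'pbp5 (A216S)', 'pbp5 (A499T)'}."""
    return {item.strip() for item in field.split(",") if item.strip()}


def evaluate_complex_mutation(row: Dict[str, str], detected: Set[str]) -> Optional[Dict[str, object]]:
    """Return a report when every mandatory mutation was detected, else None."""
    positions = parse_mutation_list(row["positions"])
    mandatory = parse_mutation_list(row["mandatory"])
    if not mandatory <= positions:
        raise ValueError("mandatory mutations must be a subset of positions")
    if not mandatory <= detected:
        return None
    # Assumption: list every grouped mutation that was found (mandatory or optional).
    return {"mutations": sorted(positions & detected), "phenotype": row["phenotype"]}
```

Open the TSV with open("complex_mutations.tsv", newline="") and pass the handle to load_complex_mutations to get one dictionary per complex mutation.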

Output

There are 8 different output files produced by staramr:

  1. summary.tsv: A summary of all detected AMR genes/mutations/plasmids/sequence type in each genome, one genome per line. A series of descriptive statistics is also provided for each genome as well as feedback for whether or not the genome passes several quality metrics and if not, feedback on why the genome fails.
  2. detailed_summary.tsv: A detailed summary of all detected AMR genes/mutations/plasmids/sequence type in each genome, one gene per line.
  3. resfinder.tsv: A tabular file of each AMR gene and additional BLAST information from the ResFinder database, one gene per line.
  4. pointfinder.tsv: A tabular file of each AMR point mutation and additional BLAST information from the PointFinder database, one gene per line.
  5. plasmidfinder.tsv: A tabular file of each plasmid type and additional BLAST information from the PlasmidFinder database, one plasmid type per line.
  6. mlst.tsv: A tabular file of each multi-locus sequence type (MLST) and its corresponding locus/alleles, one genome per line.
  7. settings.txt: The command-line, database versions, and other settings used to run staramr.
  8. results.xlsx: An Excel spreadsheet containing the previous 7 files as separate worksheets.

In addition, the directory hits/ stores fasta files of the specific blast hits.

summary.tsv

The summary.tsv output file generated by staramr contains the following columns:

  • Isolate ID: The id of the isolate/genome file(s) passed to staramr.
  • Quality Module: The isolate/genome file(s) pass/fail result(s) for the quality metrics.
  • Genotype: The AMR genotype of the isolate.
  • Predicted Phenotype: The predicted AMR phenotype (drug resistances) for the isolate.
  • CGE Predicted Phenotype: The CGE-predicted AMR phenotype (drug resistances) for the isolate.
  • Plasmid: Plasmid types that were found for the isolate.
  • Scheme: The MLST scheme used.
  • Sequence Type: The sequence type that's assigned when combining all allele types.
  • Genome Length: The isolate/genome file(s) genome length(s).
  • N50 value: The isolate/genome file(s) N50 value(s).
  • Number of Contigs Greater Than Or Equal To 300 bp: The number of contigs greater than or equal to 300 base pairs in the isolate/genome file(s).
  • Quality Module Feedback: The isolate/genome file(s) detailed feedback for the quality metrics.

Example

Isolate ID | Quality Module | Genotype | Predicted Phenotype | CGE Predicted Phenotype | Plasmid | Scheme | Sequence Type | Genome Length | N50 value | Number of Contigs Greater Than Or Equal To 300 bp | Quality Module Feedback
SRR1952908 | Passed | aadA1, aadA2, blaTEM-57, cmlA1, gyrA (S83Y), sul3, tet(A) | streptomycin, ampicillin, chloramphenicol, ciprofloxacin I/R, nalidixic acid, sulfisoxazole, tetracycline | Spectinomycin, Streptomycin, Amoxicillin, Ampicillin, Cephalothin, Piperacillin, Ticarcillin, Chloramphenicol, Nalidixic acid, Ciprofloxacin, Sulfamethoxazole, Doxycycline, Tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1-I(Alpha) | senterica_achtman_2 | 11 | 4785500 | 250423 | 41 |
SRR1952926 | Passed | blaTEM-57, gyrA (S83Y), tet(A) | ampicillin, ciprofloxacin I/R, nalidixic acid, tetracycline | Amoxicillin, Ampicillin, Cephalothin, Piperacillin, Ticarcillin, Nalidixic acid, Ciprofloxacin, Doxycycline, Tetracycline | ColpVC, IncFIB(S), IncFII(S), IncI1-I(Alpha) | senterica_achtman_2 | 11 | 4785451 | 228311 | 40 |
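Because summary.tsv is tab-separated, it can be loaded with Python's standard csv module. The sketch below (a hypothetical helper, read_summary) also splits the Genotype, Predicted Phenotype, and Plasmid columns into lists, assuming the items are comma-separated as in the example above.

```python
import csv
from typing import Dict, Iterable, List

# Columns whose values are comma-separated lists in the example above.
LIST_COLUMNS = ("Genotype", "Predicted Phenotype", "Plasmid")


def read_summary(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Load summary.tsv rows, splitting list-valued columns into Python lists."""
    rows: List[Dict[str, object]] = []
    for row in csv.DictReader(lines, delimiter="\t"):
        parsed: Dict[str, object] = dict(row)
        for col in LIST_COLUMNS:
            value = (row.get(col) or "").strip()
            # Empty cells become empty lists rather than [''].
            parsed[col] = [item.strip() for item in value.split(",")] if value else []
        rows.append(parsed)
    return rows
```

To read a real output file, open it with open("out/summary.tsv", newline="") and pass the file handle to read_summary.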

detailed_summary.tsv

The detailed_summary.tsv output file generated by staramr contains the following columns:

  • Isolate ID: The id of the isolate/genome file(s) passed to staramr.
  • Data: The particular gene detected from ResFinder, PlasmidFinder, PointFinder, or the sequence type.
  • Data Type: The type of gene (Resistance or Plasmid), or MLST.
  • Predicted Phenotype: The predicted AMR phenotype (drug resistances) found in ResFinder/PointFinder. Plasmids will be left blank by default.
  • CGE Predicted Phenotype: The CGE-predicted AMR phenotype (drug resistances) found in ResFinder/PointFinder. Plasmids will be left blank by default.
  • %Identity: The % identity of the top BLAST HSP to the gene.
  • %Overlap: The % overlap of the top BLAST HSP to the gene (calculated as hsp length/total length * 100).
  • HSP Length/Total Length: The top BLAST HSP length over the gene total length (nucleotides).
  • Contig: The contig id containing this gene.
  • Start: The start of the gene (will be greater than End if on minus strand).
  • End: The end of the gene.
  • Accession: The accession of the gene from the ResFinder or PlasmidFinder database.

Example

Isolate ID | Data | Data Type | Predicted Phenotype | CGE Predicted Phenotype | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession
SRR1952908 | ST11 (senterica_achtman_2) | MLST | | | | | | | | |
SRR1952908 | ColpVC | Plasmid | | | 98.96 | 100.0 | 193/193 | contig00038 | 1618 | 1426 | JX133088
SRR1952908 | aadA1 | Resistance | streptomycin | Spectinomycin, Streptomycin | 100.0 | 100.0 | 792/792 | contig00030 | 5355 | 4564 | JQ414041
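The %Overlap and strand conventions above are simple arithmetic on the HSP Length/Total Length, Start, and End columns. A minimal sketch with hypothetical helper names; for instance, 1247/1275 gives the 97.80 %Overlap reported for tet(A) in resfinder.tsv.

```python
from typing import Tuple


def parse_hsp_fraction(field: str) -> Tuple[int, int]:
    """Split an 'HSP Length/Total Length' value such as '1247/1275'."""
    hsp, total = field.split("/")
    return int(hsp), int(total)


def percent_overlap(hsp_length: int, total_length: int) -> float:
    """%Overlap as defined above: hsp length / total length * 100."""
    return hsp_length / total_length * 100


def strand(start: int, end: int) -> str:
    """Start greater than End means the hit lies on the minus strand."""
    return "-" if start > end else "+"
```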

resfinder.tsv

The resfinder.tsv output file generated by staramr contains the following columns:

  • Isolate ID: The id of the isolate/genome file(s) passed to staramr.
  • Gene: The particular AMR gene detected.
  • Predicted Phenotype: The predicted AMR phenotype (drug resistances) for this gene.
  • CGE Predicted Phenotype: The CGE-predicted AMR phenotype (drug resistances) for this gene.
  • %Identity: The % identity of the top BLAST HSP to the AMR gene.
  • %Overlap: The % overlap of the top BLAST HSP to the AMR gene (calculated as hsp length/total length * 100).
  • HSP Length/Total Length: The top BLAST HSP length over the AMR gene total length (nucleotides).
  • Contig: The contig id containing this AMR gene.
  • Start: The start of the AMR gene (will be greater than End if on minus strand).
  • End: The end of the AMR gene.
  • Accession: The accession of the AMR gene in the ResFinder database.
  • Sequence: The AMR gene sequence.
  • CGE Notes: Any CGE notes associated with the prediction.

Example

Isolate ID | Gene | Predicted Phenotype | CGE Predicted Phenotype | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession | Sequence | CGE Notes
SRR1952908 | sul3 | sulfisoxazole | Sulfamethoxazole | 100.00 | 100.00 | 792/792 | contig00030 | 2091 | 2882 | AJ459418 | ATGA[...] |
SRR1952908 | tet(A) | tetracycline | Doxycycline, Tetracycline | 99.92 | 97.80 | 1247/1275 | contig00032 | 1476 | 2722 | AF534183 | ATGT[...] |
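staramr applies its identity and overlap cut-offs during the search itself (the command in the "Unexpected crashes" issue below passes them as --pid-threshold and --percent-length-overlap-resfinder). If you want a stricter view of an existing resfinder.tsv without re-running, you could re-filter the table. The sketch below is a hypothetical post-processing step, not a staramr feature. It can only remove rows, since hits below the thresholds of the original run are not in the file.

```python
import csv
from typing import Dict, Iterable, Iterator, List


def filter_hits(rows: Iterable[Dict[str, str]], min_identity: float, min_overlap: float) -> Iterator[Dict[str, str]]:
    """Yield rows whose %Identity and %Overlap both meet the given cut-offs."""
    for row in rows:
        if float(row["%Identity"]) >= min_identity and float(row["%Overlap"]) >= min_overlap:
            yield row


def load_filtered(path: str, min_identity: float, min_overlap: float) -> List[Dict[str, str]]:
    """Re-filter an existing resfinder.tsv (hypothetical post-processing step)."""
    with open(path, newline="") as handle:
        return list(filter_hits(csv.DictReader(handle, delimiter="\t"), min_identity, min_overlap))
```

With the example rows above, a 98 %Overlap cut-off would keep sul3 (100.00) and drop tet(A) (97.80).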

pointfinder.tsv

The pointfinder.tsv output file generated by staramr contains the following columns:

  • Isolate ID: The id of the isolate/genome file(s) passed to staramr.
  • Gene: The particular AMR gene detected, with the point mutation within.
  • Predicted Phenotype: The predicted AMR phenotype (drug resistances) for this gene.
  • CGE Predicted Phenotype: The CGE-predicted AMR phenotype (drug resistances) for this gene.
  • Type: The type of this mutation from PointFinder (either codon or nucleotide).
  • Position: The position of the mutation. For codon type, the position is the codon number in the gene, for nucleotide type it is the nucleotide number.
  • Mutation: The particular mutation. For codon type lists the codon mutation, for nucleotide type lists the single nucleotide mutation.
  • %Identity: The % identity of the top BLAST HSP to the AMR gene.
  • %Overlap: The % overlap of the top BLAST HSP to the AMR gene (calculated as hsp length/total length * 100).
  • HSP Length/Total Length: The top BLAST HSP length over the AMR gene total length (nucleotides).
  • Contig: The contig id containing this AMR gene.
  • Start: The start of the AMR gene (will be greater than End if on minus strand).
  • End: The end of the AMR gene.
  • Pointfinder Position: The Pointfinder-adjusted position, which may be off by one from the sequence position in the case of some indels.
  • CGE Notes: Any CGE notes associated with the prediction.
  • CGE Required Mutation: Any additional mutations that CGE predicts are required to confer the CGE predicted phenotype.
  • CGE Mechanism: The CGE-reported mechanism.
  • CGE PMID: The PubMed ID(s) associated with the CGE prediction.

Example

Isolate ID | Gene | Predicted Phenotype | CGE Predicted Phenotype | Type | Position | Mutation | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Pointfinder Position | CGE Notes | CGE Required Mutation | CGE Mechanism | CGE PMID
SRR1952908 | gyrA (S83Y) | ciprofloxacin I/R, nalidixic acid | Nalidixic acid,Ciprofloxacin | codon | 83 | TCC -> TAC (S -> Y) | 99.96 | 100.00 | 2637/2637 | contig00008 | 22801 | 20165 | S83Y | | | Target modification | 7492118,10471553
SRR1952926 | gyrA (S83Y) | ciprofloxacin I/R, nalidixic acid | Nalidixic acid,Ciprofloxacin | codon | 83 | TCC -> TAC (S -> Y) | 99.96 | 100.00 | 2637/2637 | contig00011 | 157768 | 160404 | S83Y | | | Target modification | 7492118,10471553
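For codon mutations, the Gene and Mutation columns follow a regular textual layout, for example gyrA (S83Y) and TCC -> TAC (S -> Y). The sketch below parses that layout with regular expressions. It assumes nucleotide-type mutations are written as X -> Y without the amino-acid part, which you should check against your own output, and it returns None for anything else, such as complex rows. PointMutation and parse_point_mutation are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

GENE_RE = re.compile(r"^(\S+) \((\S+)\)$")                               # 'gyrA (S83Y)'
MUTATION_RE = re.compile(r"^(\S+) -> (\S+)(?: \((\S+) -> (\S+)\))?$")  # 'TCC -> TAC (S -> Y)'


class PointMutation(NamedTuple):
    gene: str
    label: str
    ref: str
    alt: str
    ref_aa: Optional[str]
    alt_aa: Optional[str]


def parse_point_mutation(gene_field: str, mutation_field: str) -> Optional[PointMutation]:
    """Parse one pointfinder.tsv row's Gene and Mutation values."""
    gene_match = GENE_RE.match(gene_field.strip())
    mut_match = MUTATION_RE.match(mutation_field.strip())
    if gene_match is None or mut_match is None:
        return None  # e.g. 'complex' rows, or layouts this sketch does not cover
    ref, alt, ref_aa, alt_aa = mut_match.groups()
    return PointMutation(gene_match.group(1), gene_match.group(2), ref, alt, ref_aa, alt_aa)
```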

plasmidfinder.tsv

The plasmidfinder.tsv output file generated by staramr contains the following columns:

  • Isolate ID: The id of the isolate/genome file(s) passed to staramr.
  • Plasmid: The particular plasmid type detected.
  • %Identity: The % identity of the top BLAST HSP to the plasmid type.
  • %Overlap: The % overlap of the top BLAST HSP to the plasmid type (calculated as hsp length/total length * 100).
  • HSP Length/Total Length: The top BLAST HSP length over the plasmid type total length (nucleotides).
  • Contig: The contig id containing this plasmid type.
  • Start: The start of the plasmid type (will be greater than End if on minus strand).
  • End: The end of the plasmid type.
  • Accession: The accession of the plasmid type in the PlasmidFinder database.

Example

Isolate ID | Plasmid | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession
SRR1952908 | ColpVC | 98.96 | 100 | 193/193 | contig00038 | 1618 | 1426 | JX133088
SRR1952908 | IncFIB(S) | 98.91 | 100 | 643/643 | contig00024 | 10302 | 9660 | FN432031

mlst.tsv

The mlst.tsv output file generated by staramr contains the following columns:

  • Isolate ID: The id of the isolate/genome file(s) passed to staramr.
  • Scheme: The scheme that MLST has identified.
  • Sequence Type: The sequence type that's assigned when combining all allele types.
  • Locus #: A particular locus in the specified MLST scheme.

Example

Isolate ID | Scheme | Sequence Type | Locus 1 | Locus 2 | Locus 3 | Locus 4 | Locus 5 | Locus 6 | Locus 7
SRR1952908 | senterica_achtman_2 | 11 | aroC(5) | dnaN(2) | hemD(3) | hisD(7) | purE(6) | sucA(6) | thrA(11)
SRR1952926 | senterica_achtman_2 | 11 | aroC(5) | dnaN(2) | hemD(3) | hisD(7) | purE(6) | sucA(6) | thrA(11)
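Each Locus column in mlst.tsv has the form gene(allele). The sketch below, using a hypothetical parse_loci helper, turns a row's locus values into a gene-to-allele mapping. Alleles are kept as strings because mlst may report non-numeric calls for novel, partial, or missing alleles.

```python
import re
from typing import Dict, Iterable

LOCUS_RE = re.compile(r"^(?P<gene>[^()]+)\((?P<allele>[^()]*)\)$")


def parse_loci(values: Iterable[str]) -> Dict[str, str]:
    """Turn ['aroC(5)', 'dnaN(2)', ...] into {'aroC': '5', 'dnaN': '2', ...}."""
    alleles: Dict[str, str] = {}
    for value in values:
        match = LOCUS_RE.match(value.strip())
        if match is None:
            raise ValueError(f"unexpected locus value: {value!r}")
        alleles[match.group("gene")] = match.group("allele")
    return alleles
```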

settings.txt

The settings.txt file contains the particular settings used to run staramr.

  • command_line: The command line used to run staramr.
  • version: The version of staramr.
  • start_time, end_time, total_minutes: The start, end, and duration for running staramr.
  • resfinder_db_dir, pointfinder_db_dir, plasmidfinder_db_dir: The directory containing the ResFinder, PointFinder, and PlasmidFinder databases.
  • resfinder_db_url, pointfinder_db_url, plasmidfinder_db_url: The URL to the git repository for the ResFinder, PointFinder, and PlasmidFinder databases.
  • resfinder_db_commit, pointfinder_db_commit, plasmidfinder_db_commit: The git commit ids for the ResFinder, PointFinder, and PlasmidFinder databases.
  • resfinder_db_date, pointfinder_db_date, plasmidfinder_db_date: The date of the git commits of the ResFinder, PointFinder, and PlasmidFinder databases.
  • mlst_version: The version of MLST.
  • pointfinder_gene_drug_version, resfinder_gene_drug_version: A version identifier for the gene/drug mapping table used by staramr.

hits/

The hits/ directory contains the BLAST HSP nucleotides for the entries listed in the resfinder.tsv and pointfinder.tsv files. There are up to two files per input genome, one for ResFinder and one for PointFinder.

For example, with an input genome named SRR1952908.fasta there would be two files, hits/resfinder_SRR1952908.fasta and hits/pointfinder_SRR1952908.fasta. These files contain mostly the same information as the resfinder.tsv, pointfinder.tsv, and plasmidfinder.tsv files. The additional information is database_gene_start and database_gene_end, which give the start/end of the BLAST HSP on the AMR gene from the ResFinder/PointFinder/PlasmidFinder databases.

Example

>aadA1_3_JQ414041 isolate: SRR1952908, contig: contig00030, contig_start: 5355, contig_end: 4564, database_gene_start: 1, database_gene_end: 792, hsp/length: 792/792, pid: 100.00%, plength: 100.00%
ATGAGGGAAGCGGTGATCGCCGAAGTATCGACTCAACTATCAGAGGTAGTTGGCGTCATC
GAGCGCCATCTCGAACCGACGTTGCTGGCCGTACATTTGTACGGCTCCGCAGTGGATGGC
...
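The FASTA headers in hits/ follow the key: value layout shown above. The sketch below reads the records and splits a header into the database sequence id and its fields. It assumes field values never contain ", " themselves; read_fasta and parse_hit_header are hypothetical helpers.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def parse_hit_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a hits/ header into the database sequence id and its key: value fields."""
    seq_id, _, rest = header.lstrip(">").strip().partition(" ")
    fields: Dict[str, str] = {}
    for item in rest.split(", "):
        key, sep, value = item.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return seq_id, fields
```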

Tutorial

A tutorial guiding you through the usage of staramr, interpreting the results, and comparing with antimicrobial resistances available on NCBI can be found at staramr tutorial.

Usage

Main Command

Main staramr command. Can be used to set global options (primarily --verbose).

Search

Searches input FASTA files for AMR genes.

Database Build

Downloads and builds the ResFinder, PointFinder, and PlasmidFinder databases.

Database Update

Updates an existing download of the ResFinder, PointFinder, and PlasmidFinder databases.

Database Info

Prints information about an existing build of the ResFinder/PointFinder/PlasmidFinder databases.

Database Restore Default

Restores the default database for staramr.

Caveats

This software is still a work-in-progress. In particular, not all organisms stored in the PointFinder database are supported (currently only salmonella, campylobacter, enterococcus_faecalis, and enterococcus_faecium are supported). Additionally, the predicted phenotypes are for microbiological resistance and not clinical resistance. Phenotype/drug resistance predictions are an experimental feature which is continually being improved.

staramr only works on assembled genomes and not directly on reads. A quick genome assembler you could use is Shovill. Alternatively, you may wish to try the ResFinder webservice, or the command-line tools rgi or ariba, which work on sequence reads as well as genome assemblies. You may also wish to check out the CARD webservice.

Acknowledgements

Some ideas for the software were derived from the ResFinder, PointFinder, and PlasmidFinder command-line software, as well as from ABRicate and the SISTR (Salmonella In Silico Typing Resource) command-line tool.

Phenotype/drug resistance predictions are provided with support from the NARMS/CIPARS Molecular Working Group.

The multi-locus sequence typing program is from the MLST GitHub repository.

Citations

If you find staramr useful, please cite the following paper:

Bharat A, Petkau A, Avery BP, Chen JC, Folster JP, Carson CA, Kearney A, Nadon C, Mabon P, Thiessen J, Alexander DC, Allen V, El Bailey S, Bekal S, German GJ, Haldane D, Hoang L, Chui L, Minion J, Zahariadis G, Domselaar GV, Reid-Smith RJ, Mulvey MR. Correlation between Phenotypic and In Silico Detection of Antimicrobial Resistance in Salmonella enterica in Canada Using Staramr. Microorganisms. 2022; 10(2):292. https://doi.org/10.3390/microorganisms10020292

You may also consider citing the following (databases or other resources used by staramr):

Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, Aarestrup FM, Larsen MV. 2012. Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 67:2640–2644. doi: 10.1093/jac/dks261

Zankari E, Allesøe R, Joensen KG, Cavaco LM, Lund O, Aarestrup F. PointFinder: a novel web tool for WGS-based detection of antimicrobial resistance associated with chromosomal point mutations in bacterial pathogens. J Antimicrob Chemother. 2017; 72(10): 2764–8. doi: 10.1093/jac/dkx217

Carattoli A, Zankari E, Garcia-Fernandez A, Voldby Larsen M, Lund O, Villa L, Aarestrup FM, Hasman H. PlasmidFinder and pMLST: in silico detection and typing of plasmids. Antimicrob. Agents Chemother. 2014. April 28th. doi: 10.1128/AAC.02412-14

Seemann T, MLST Github https://github.com/tseemann/mlst

Jolley KA, Bray JE and Maiden MCJ. Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications [version 1; peer review: 2 approved]. Wellcome Open Res 2018, 3:124. doi: 10.12688/wellcomeopenres.14826.1

Legal

Copyright 2018 Government of Canada

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

staramr's Issues

Report genomes lacking any AMR genes by default

Invert the behaviour of --include-negatives to include negative matches to the ResFinder/PointFinder databases in the final report by default. Add an option --exclude-negatives instead.

Fix up formatting in detailed_summary.tsv

I noticed that the formatting in detailed_summary.tsv looks like:

Isolate ID | Gene | %Identity | %Overlap | Start | End
A | gyrA (S83F) | 99.92399999999999 | 100.0 | 2361282.0 | 2358646.0

That is, the %Identity is not being rounded to 2 decimal places, while the Start and End are printed as float (they should be int).
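One way to get the requested formatting is to render the numeric columns explicitly when each row is written. This plain-Python sketch (a hypothetical format_row, not the fix that was applied in staramr) prints %Identity and %Overlap with two decimals and Start/End as integers. With pandas, DataFrame.round and astype(int) achieve the same.

```python
from typing import Dict, Union

Number = Union[int, float, str]


def format_row(row: Dict[str, Number]) -> Dict[str, str]:
    """Render %Identity/%Overlap with 2 decimals and Start/End as integers."""
    out = {key: str(value) for key, value in row.items()}
    for col in ("%Identity", "%Overlap"):
        if col in row:
            out[col] = f"{float(row[col]):.2f}"
    for col in ("Start", "End"):
        if col in row:
            # float() first so values such as 2361282.0 or '2361282.0' are accepted.
            out[col] = str(int(float(row[col])))
    return out
```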

Incorporation of plasmids in final report

Incorporate plasmids into the final summary.tsv. That is, summary.tsv should look like:

Isolate ID | Genotype | Predicted Phenotype | Plasmid Genes
SRR1952908 | aadA1, blaTEM-57 | streptomycin | IncX1, IncFIB(S)
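Producing the requested plasmid column amounts to grouping PlasmidFinder hits by isolate. A minimal sketch over rows shaped like plasmidfinder.tsv, using a hypothetical helper that keeps the order in which plasmids are first seen and drops duplicates:

```python
from typing import Dict, Iterable, List


def plasmids_by_isolate(rows: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Collapse plasmidfinder-style rows into one comma-separated string per isolate."""
    grouped: Dict[str, List[str]] = {}
    for row in rows:
        names = grouped.setdefault(row["Isolate ID"], [])
        if row["Plasmid"] not in names:  # the same replicon may hit several contigs
            names.append(row["Plasmid"])
    return {isolate: ", ".join(names) for isolate, names in grouped.items()}
```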

Auto-fit width of column in Excel spreadsheet

The Excel spreadsheet looks great! The freezing of panes is nice and so is having all the parameters used to run staramr under its own sheet.

I would recommend one enhancement: auto-fitting the column widths to the contents of each column for readability (and since users will likely do it themselves anyway).

So after you create a worksheet in the workbook, iterate through each column in the df and adjust the corresponding column's width in the worksheet:

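# writer: a pandas ExcelWriter created with engine="xlsxwriter";
# fixed_sheetname, pd_to_excel_kwargs and args come from the surrounding script.
df.to_excel(writer, sheet_name=fixed_sheetname, **pd_to_excel_kwargs)
worksheet = writer.book.get_worksheet_by_name(fixed_sheetname)
for i, width in enumerate(get_col_widths(df, index=args.write_index)):
    worksheet.set_column(i, i, width)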

where get_col_widths is:

import numpy as np


def get_col_widths(df, index=False):
    """Calculate column widths based on column headers and contents"""
    if index:
        idx_max = max([len(str(s)) for s in df.index.values] + [len(str(df.index.name))])
        yield idx_max
    for c in df.columns:
        # get max length of column contents and length of column header
        yield np.max([df[c].astype(str).str.len().max(), len(str(c))])

It's not perfect (assumes uniform character length), but might save users some time.

Error message returned non-zero exit status 1

I have never had problems when using staramr, but suddenly I am getting this error message:

2018-06-29 11:45:58,535 ERROR: Command '['makeblastdb', '-in', '/tmp/tmp426ql20e/input-genomes/Patient_2_A.fasta', '-dbtype', 'nucl', '-parse_seqids']' returned non-zero exit status 1
What can I do?

Unexpected crashes on some FASTA files

Hello, I am having issues with version 0.2.1. One of my FASTA files crashes staramr at the result-parsing stage. The input assembly file can be located here

(mob_suite) kirill@Discovery20:~/Desktop$ staramr --verbose search   --nprocs 2 --pid-threshold 98.0 --percent-length-overlap-resfinder 60.0 --percent-length-overlap-pointfinder 95.0  --output-summary dataset_588.dat --output-resfinder dataset_589.dat --output-settings dataset_590.dat --output-excel dataset_591.dat.xlsx --output-hits-dir staramr_hits  "N18.fasta"
2018-08-10 10:29:01,343 INFO Search.run,292: No --pointfinder-organism specified. Will not search the PointFinder databases
2018-08-10 10:29:01,343 INFO Search.run,322: --output-dir not set. Files will be output to the respective --output-[type] setting
2018-08-10 10:29:01,344 DEBUG Search.run,337: Found --output-hits-dir [staramr_hits] and is a directory. Will write hits here
2018-08-10 10:29:01,429 DEBUG BlastHandler.run_blasts,90: Resfinder Databases: ['colistin', 'tetracycline', 'quinolone', 'fusidicacid', 'glycopeptide', 'rifampicin', 'trimethoprim', 'beta-lactam', 'aminoglycoside', 'oxazolidinone', 'macrolide', 'phenicol', 'fosfomycin', 'sulphonamide', 'nitroimidazole']
2018-08-10 10:29:01,430 INFO BlastHandler._make_db_from_input_files,108: Making BLAST databases for input files
2018-08-10 10:29:01,430 DEBUG BlastHandler._make_db_from_input_files,114: Creating symlink from [N18.fasta] to [/var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta]
2018-08-10 10:29:01,431 DEBUG BlastHandler._make_blast_db,200: makeblastdb -in /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -dbtype nucl -parse_seqids
2018-08-10 10:29:01,659 DEBUG BlastHandler.run_blasts,99: Done making blast databases for input files
2018-08-10 10:29:01,660 INFO BlastHandler.run_blasts,102: Scheduling blast for N18.fasta
2018-08-10 10:29:01,663 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.colistin.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/colistin.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:01,670 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.tetracycline.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/tetracycline.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:01,960 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.quinolone.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/quinolone.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,129 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.fusidicacid.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/fusidicacid.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,154 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.glycopeptide.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/glycopeptide.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,170 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.rifampicin.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/rifampicin.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,212 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.trimethoprim.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/trimethoprim.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,259 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.beta-lactam.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/beta-lactam.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,290 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.aminoglycoside.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/aminoglycoside.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,505 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.oxazolidinone.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/oxazolidinone.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,573 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.macrolide.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/macrolide.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,730 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.phenicol.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/phenicol.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,818 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.fosfomycin.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/fosfomycin.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,853 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.sulphonamide.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/sulphonamide.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,889 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.nitroimidazole.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/nitroimidazole.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,950 DEBUG BlastResultsParser.parse_results,58: /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.aminoglycoside.resfinder.blast.xml
2018-08-10 10:29:03,372 DEBUG BlastResultsParser.parse_results,58: /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.beta-lactam.resfinder.blast.xml
2018-08-10 10:29:03,405 DEBUG ResfinderHitHSP.__init__,25: record=qseqid                                 blaTEM-108_1_AF506748
sseqid                                                     4
pident                                                99.414
length                                                   853
qstart                                                     9
qend                                                     861
sstart                                                 39632
send                                                   40484
slen                                                   83930
qlen                                                     861
sstrand                                                 plus
sseq       TCAACATTTTCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGC...
qseq       TCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGC...
plength                                              99.0708
Name: 108, dtype: object
2018-08-10 10:29:03,425 ERROR staramr.<module>,75: expected string or bytes-like object
Traceback (most recent call last):
  File "/Users/kirill/miniconda/envs/mob_suite/bin/staramr", line 68, in <module>
    args.run_command(args)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/subcommand/Search.py", line 356, in run
    files=args.files)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/subcommand/Search.py", line 216, in _generate_results
    plength_threshold_pointfinder, report_all_blast)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/detection/AMRDetection.py", line 65, in run_amr_detection
    plength_threshold_resfinder, report_all)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/detection/AMRDetectionResistance.py", line 36, in _create_resfinder_dataframe
    return resfinder_parser.parse_results()
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/BlastResultsParser.py", line 61, in parse_results
    self._handle_blast_hit(file, database_name, blast_out, results, hit_seq_records)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/BlastResultsParser.py", line 93, in _handle_blast_hit
    partitions.append(self._create_hit(in_file, database_name, blast_record))
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/BlastHitPartitions.py", line 38, in append
    partition = self._get_existing_partition(hit)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/BlastHitPartitions.py", line 56, in _get_existing_partition
    partition_name = hit.get_genome_contig_id()
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/AMRHitHSP.py", line 101, in get_genome_contig_id
    re_search = re.search(r'^(\S+)', self._blast_record['sseqid'])
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
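In the record logged just before the traceback, sseqid is 4, so the contig ID appears to have been read as a number (for example, by pandas type inference on a column of numeric contig names) rather than as a string. Since re.search only accepts str or bytes, that would explain the TypeError. A minimal sketch of a defensive version, assuming the ID is simply converted with str() before matching; the function below is a hypothetical stand-in, not staramr's actual fix:

```python
import re


def get_genome_contig_id(sseqid):
    """Return the first whitespace-delimited token of a BLAST subject ID.

    Hypothetical stand-in for AMRHitHSP.get_genome_contig_id: numeric contig
    names such as 4 may arrive as numbers, so convert to str before matching.
    """
    match = re.search(r'^(\S+)', str(sseqid))
    if match is None:
        raise ValueError("Could not parse contig id from [{}]".format(sseqid))
    return match.group(1)


print(get_genome_contig_id(4))                       # numeric ID, as in the failing record
print(get_genome_contig_id("contig00012 len=500"))
```

Another option, assuming the BLAST table is read with pandas, is to fix the column type when reading, e.g. pd.read_csv(path, sep='\t', dtype={'sseqid': str}), so contig names like 4 or 0001 stay as strings (and leading zeros are not lost).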

Check that input files exist

When trying to run staramr with non-existent input:

$ staramr search nofile
2018-05-07 14:00:39,855 INFO: Scheduling blast for nofile
Traceback (most recent call last):
  File "../staramr/venv/bin/staramr", line 11, in <module>
    load_entry_point('staramr', 'console_scripts', 'staramr')()
  File "../staramr/staramr/main.py", line 70, in main
    args.run_command(args)
  File "../staramr/staramr/subcommand/Search.py", line 170, in run
    args.plength_threshold_pointfinder, args.report_all_blast)
  File "../staramr/staramr/detection/AMRDetection.py", line 63, in run_amr_detection
    resfinder_blast_map = self._amr_detection_handler.get_resfinder_outputs()
  File "../staramr/staramr/blast/BlastHandler.py", line 119, in get_resfinder_outputs
    future_blast.result()
  File "../miniconda3/lib/python3.6/concurrent/futures/_base.py", line 398, in result
    return self.__get_result()
  File "../miniconda3/lib/python3.6/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
  File "../miniconda3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "../staramr/staramr/blast/BlastHandler.py", line 139, in _launch_blast
    stdout, stderr = blastn_command()
  File "../staramr/venv/lib/python3.6/site-packages/Bio/Application/__init__.py", line 523, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'blastn -out /tmp/tmpb6o46_2w/nofile.blast.xml -outfmt 5 -query nofile -db ../staramr/staramr/databases/data/dist/resfinder/aminoglycoside.fsa -evalue 0.001', message 'Command line argument error: Argument "query". File is not accessible:  `nofile\''
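Here the missing file is only discovered when blastn fails. A minimal sketch of checking the inputs up front, before any BLAST jobs are scheduled; check_input_files and main are hypothetical names:

```python
import os
import sys


def check_input_files(files):
    """Return the paths from `files` that do not exist or are not regular files."""
    return [path for path in files if not os.path.isfile(path)]


def main(files):
    """Stop early with a readable message instead of a BLAST traceback."""
    missing = check_input_files(files)
    if missing:
        sys.stderr.write("Error: input file(s) not found: {}\n".format(", ".join(missing)))
        return 1
    # ... schedule BLAST searches here ...
    return 0
```

The return value of main could be passed to sys.exit() by the command-line entry point.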

Specify which versions of BLAST staramr will work with

I should specify (in the docs, and possibly with a check in the software) which versions of BLAST staramr will work with. I suspect older versions of BLAST have a slightly different output format. I should do some testing to determine the minimum BLAST version required.
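If a software check is added, it could parse the output of blastn -version, whose first line looks like "blastn: 2.7.1+". A minimal sketch; MIN_BLAST_VERSION is a placeholder, not a tested minimum, and the function names are hypothetical:

```python
import re
import subprocess

# Placeholder only: the real minimum still needs to be determined by testing.
MIN_BLAST_VERSION = (2, 2, 31)


def parse_blast_version(version_output):
    """Extract (major, minor, patch) from `blastn -version` output.

    Assumes the first line looks like 'blastn: 2.7.1+'.
    """
    match = re.search(r'blastn:\s*(\d+)\.(\d+)\.(\d+)', version_output)
    if match is None:
        raise ValueError("Unrecognized blastn -version output: {!r}".format(version_output))
    return tuple(int(part) for part in match.groups())


def check_blast_version(minimum=MIN_BLAST_VERSION):
    """Run `blastn -version` and raise if the installed version is older than `minimum`."""
    output = subprocess.run(["blastn", "-version"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True).stdout
    found = parse_blast_version(output)
    if found < minimum:
        raise RuntimeError("blastn {} found, but {} or newer is required".format(
            ".".join(map(str, found)), ".".join(map(str, minimum))))
    return found
```

Comparing tuples keeps the version ordering numeric, so 2.10.0 correctly sorts after 2.9.0.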

Add contribution guide

Currently, we don't have a predefined standard for how staramr is maintained or how someone can contribute to this project. I propose adding a contribution guide that covers some of the following:

  1. How pull requests are made
  2. The staramr project workflow (e.g., we use Travis for our builds)
  3. How developers can access the debugging tools (e.g., --verbose, logging, etc.)
  4. How to make sure unit tests and linting scripts pass locally before submitting

staramr in Galaxy error

Hi,
I installed staramr in Galaxy on native Ubuntu and got an error after running it on Shovill-assembled contigs (in a multi-FASTA file). I was wondering what I did wrong and would appreciate any suggestions. Thank you!

Fatal error: Exit code 1 ()
2018-06-10 14:51:10,922 INFO: No --pointfinder-organism specified. Will not search the PointFinder databases
2018-06-10 14:51:10,922 INFO: --output-dir not set. Files will be output to the respective --output-[type] setting
2018-06-10 14:51:10,931 INFO: Making BLAST databases for input files
2018-06-10 14:51:10,939 ERROR: Command '['makeblastdb', '-in', '/home/phemarajata/galaxy/database/tmp/tmpuuzos1zh/input-genomes/Shovill on data 11 and data 10: Contigs.fasta', '-dbtype', 'nucl', '-parse_seqids']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/bin/staramr", line 68, in <module>
    args.run_command(args)
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/site-packages/staramr/subcommand/Search.py", line 356, in run
    files=args.files)
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/site-packages/staramr/subcommand/Search.py", line 216, in _generate_results
    plength_threshold_pointfinder, report_all_blast)
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/site-packages/staramr/detection/AMRDetection.py", line 61, in run_amr_detection
    self._amr_detection_handler.run_blasts(files)
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/site-packages/staramr/blast/BlastHandler.py", line 96, in run_blasts
    db_files = self._make_db_from_input_files(self._input_genomes_tmp_dir, files)
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/site-packages/staramr/blast/BlastHandler.py", line 120, in _make_db_from_input_files
    future_blastdb.result()
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/site-packages/staramr/blast/BlastHandler.py", line 196, in _make_blast_db
    subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).check_returncode()
  File "/home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/lib/python3.6/subprocess.py", line 369, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['makeblastdb', '-in', '/home/phemarajata/galaxy/database/tmp/tmpuuzos1zh/input-genomes/Shovill on data 11 and data 10: Contigs.fasta', '-dbtype', 'nucl', '-parse_seqids']' returned non-zero exit status 1.

Fix handling of spaces in Galaxy tool

Spaces in a Galaxy dataset name currently cause staramr to fail. This should be fixed so that staramr in Galaxy can handle dataset names containing spaces.

Related to issue #18
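In the Galaxy report, the symlink created for makeblastdb keeps the dataset name ('Shovill on data 11 and data 10: Contigs.fasta'), and makeblastdb then fails. One possible fix, sketched here on the assumption that makeblastdb splits the -in value on spaces (treating it as several input files), is to link each genome under a sanitized name. safe_genome_name and link_input_genome are hypothetical helpers, not staramr's code:

```python
import os
import re


def safe_genome_name(path):
    """Return the file's base name with every character other than letters,
    digits, '.', '_' and '-' replaced by an underscore."""
    return re.sub(r'[^A-Za-z0-9._-]', '_', os.path.basename(path))


def link_input_genome(path, tmp_dir):
    """Symlink an input genome into tmp_dir under a sanitized name and return the link."""
    link = os.path.join(tmp_dir, safe_genome_name(path))
    os.symlink(os.path.abspath(path), link)
    return link


print(safe_genome_name("/galaxy/files/Shovill on data 11 and data 10: Contigs.fasta"))
```

Two caveats: different dataset names can map to the same sanitized name, so a real implementation would need to detect collisions, and if the isolate ID in the reports is derived from the file name, the original name should be kept for reporting.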

Revert ResFinder/PointFinder default to previous release versions

Revert ResFinder/PointFinder default databases to previous release versions for this next release. That is, back to versions found in staramr 0.4.0.

My reason for this is our mapping between the AMR gene and drug resistance is only completed for these ResFinder/PointFinder database versions.

This would likely involve disabling support for Enterococcus faecalis (#35), since I don't believe it is available in the earlier PointFinder database.

Support will be re-added in a later release.

pointfinder db results DataFrame is None; AttributeError: 'NoneType' object has no attribute 'to_csv'

I get the following error when running the staramr development branch against a couple of genomes:

$ staramr search -o out SRR19529*.fasta
2018-05-14 12:42:27,227 INFO: Scheduling blast for SRR1952908.fasta
2018-05-14 12:42:27,261 INFO: Scheduling blast for SRR1952926.fasta
2018-05-14 12:42:31,589 INFO: Finished. Took 0.07 minutes.
2018-05-14 12:42:31,591 ERROR: 'NoneType' object has no attribute 'to_csv'
Traceback (most recent call last):
  File "../staramr/bin/staramr", line 68, in <module>
    args.run_command(args)
  File "../staramr/staramr/subcommand/Search.py", line 197, in run
    self._print_dataframe_to_text_file_handle(amr_detection.get_pointfinder_results(), fh)
  File "../staramr/staramr/subcommand/Search.py", line 108, in _print_dataframe_to_text_file_handle
    dataframe.to_csv(file_handle, sep="\t", float_format="%0.2f", na_rep=self.BLANK)
AttributeError: 'NoneType' object has no attribute 'to_csv'

It doesn't seem like the PointFinder database is being searched (--verbose shows only ResFinder results being parsed).

Here's the db info:

$ staramr --verbose db info
resfinder_db_dir              = ../staramr/staramr/databases/data/dist/resfinder
resfinder_db_url              = https://bitbucket.org/genomicepidemiology/resfinder_db.git
resfinder_db_commit           = dc33e2f9ec2c420f99f77c5c33ae3faa79c999f2
resfinder_db_date             = Tue, 20 Mar 2018 16:49
pointfinder_db_dir            = ../staramr/staramr/databases/data/dist/pointfinder
pointfinder_db_url            = https://bitbucket.org/genomicepidemiology/pointfinder_db.git
pointfinder_db_commit         = ba65c4d175decdc841a0bef9f9be1c1589c0070a
pointfinder_db_date           = Fri, 06 Apr 2018 09:02
pointfinder_gene_drug_version = 111317
resfinder_gene_drug_version   = 041318

Doing a fresh staramr db build after clearing out the existing db doesn't seem to help.

This affects #12 as well.

Let me know if you need any other info!
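The command in this report does not pass --pointfinder-organism, which may explain why PointFinder is not searched and get_pointfinder_results() returns None; writing the output then fails on to_csv. A minimal sketch of a guard, assuming the writer should simply skip a table that was never produced (a standalone function modelled on the traceback, not staramr's actual method):

```python
import io

BLANK = ''


def print_dataframe_to_text_file_handle(dataframe, file_handle):
    """Write a results table as TSV, skipping tables that were never produced.

    `dataframe` is a pandas DataFrame, or None when a database was not searched
    (e.g. PointFinder results when no --pointfinder-organism is given).
    Returns True if something was written.
    """
    if dataframe is None:
        return False
    dataframe.to_csv(file_handle, sep="\t", float_format="%0.2f", na_rep=BLANK)
    return True


if __name__ == "__main__":
    out = io.StringIO()
    print(print_dataframe_to_text_file_handle(None, out))
    print(repr(out.getvalue()))
```

An alternative design is to not create the PointFinder output file at all when no organism is given, so users do not get an empty file.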

Determine best approach to integrating typing information in staramr

There are a few different approaches to integrating typing information into the staramr reports.

1. Run outside of staramr

In this case, we can run programs for MLST/organism identification outside of staramr (e.g., in a Galaxy workflow) and integrate this information into the staramr report in Galaxy.

2. Run within staramr

We can have staramr run MLST or organism identification (e.g., Mash) internally and integrate the results directly into a report.

Add MLST results to Detailed_Summary

Rename Genes/Plasmids to Genome Matches, and incorporate the MLST data into the detailed summary. The MLST entry should be in the form ST19 (senterica).

Update pandas read_table to read_csv

When running tests, the following warning is displayed: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'. Replace all read_table calls with read_csv.
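Because read_csv defaults to a comma separator, each replacement needs sep='\t' to keep reading tab-separated files the same way. A minimal sketch on a small in-memory table (the column names mirror the BLAST output fields in the logs; the data is illustrative):

```python
import io

import pandas as pd

# A small tab-separated table shaped like the BLAST output columns in the logs.
blast_tsv = io.StringIO(
    "qseqid\tsseqid\tpident\tlength\n"
    "blaTEM-108_1_AF506748\t4\t99.414\t853\n"
)

# Before: pd.read_table(blast_tsv)
# After: read_csv with an explicit tab separator, as the warning suggests.
table = pd.read_csv(blast_tsv, sep='\t')
print(table)
```

Note that sseqid is inferred as an integer column here; passing dtype={'sseqid': str} would keep it as text.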

Add support for multiple types of biological data files

It would be nice if staramr could support multiple types of input files (such as GenBank) and also compressed versions of each of these files (e.g., gzipped FASTA). As an example, see the description of input for Abricate.

Conversion between different formats can likely use BioPython's SeqIO functionality.

Detection of file formats should also not depend on the extension (e.g., .fasta for fasta, .gz for gzipped) since this tool is integrated into Galaxy, which internally names all input files as .dat. Ideally, the file contents should be used to detect the type of file passed to staramr instead of the extension.
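The issue suggests BioPython's SeqIO for conversion; detecting the format from the contents can be done with the standard library. A minimal sketch, assuming a simple heuristic (gzip magic bytes first, then the first non-blank line: '>' for FASTA, 'LOCUS' for GenBank); all function names are hypothetical:

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b'\x1f\x8b'


def is_gzipped(path):
    """Check the first two bytes for the gzip magic number, ignoring the extension."""
    with open(path, 'rb') as handle:
        return handle.read(2) == GZIP_MAGIC


def format_from_lines(lines):
    """Guess the format from the first non-blank line (a heuristic).

    Returns 'fasta' for a '>' header, 'genbank' for a 'LOCUS' line, else None.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith('>'):
            return 'fasta'
        if stripped.startswith('LOCUS'):
            return 'genbank'
        return None
    return None


def detect_format(path):
    """Return (format, gzipped) for a file, based on its contents rather than its name."""
    gzipped = is_gzipped(path)
    opener = gzip.open if gzipped else open
    with opener(path, 'rt') as handle:
        return format_from_lines(handle), gzipped


if __name__ == "__main__":
    # Galaxy-style name: a gzipped FASTA file whose name ends in .dat.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'dataset_1.dat')
        with gzip.open(path, 'wt') as out:
            out.write('>contig1\nACGT\n')
        print(detect_format(path))  # ('fasta', True)
```

The returned names match Bio.SeqIO format strings, so the result could be passed to Bio.SeqIO.parse(handle, fmt) to convert GenBank input to FASTA before running BLAST.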

Rename "Plasmid Genes" column

It looks like the PlasmidFinder web server lists the plasmid incompatibility factors under a column named Plasmid.

We should consider renaming the Plasmid Genes column in staramr to Plasmid to avoid confusion (or is there a better name?).

Results not matching CGE ResFinder web-interface output

I did a quick comparison between the staramr output and the output from CGE's ResFinder web interface. For the most part, things matched, but I did find one issue. On the CGE site, I got a hit to aadA1 at bp 22917-23708 with 99.75% identity. With staramr, ant(3'')-Ia_X02340 was identified in the same region, but extending an extra 176 bp (22917-23884 bp), with 99.49% identity. There appears to be something different about the way staramr and CGE "choose" their top hit for a given open reading frame. Can anyone give me any insight into this? The percent identity threshold was set at 90% for both tools, and I made sure that the same ResFinder database (2019-01-29) was used for both.

Look into optimizing the running of MLST

We are currently using --threads in the mlst program, but this may not be the fastest way. Look at running separate mlst instances instead of using the --threads parameter.
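A minimal sketch of running one mlst process per genome, with at most nprocs running at once. It assumes mlst is invoked as mlst <genome-file>; the helper names are hypothetical, and whether this beats --threads would still need benchmarking:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command):
    """Run one command to completion and return its stdout (raises on failure)."""
    completed = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               universal_newlines=True, check=True)
    return completed.stdout


def run_per_genome(genome_files, nprocs, build_command=lambda path: ['mlst', path]):
    """Run a separate process per genome, with at most `nprocs` running at once.

    `genome_files` should be a list. Threads are enough here because the work
    happens in the child processes. Returns {genome file: stdout}.
    """
    with ThreadPoolExecutor(max_workers=nprocs) as executor:
        outputs = executor.map(lambda path: run_one(build_command(path)), genome_files)
        return dict(zip(genome_files, outputs))


if __name__ == "__main__":
    # Stand-in for mlst so the sketch runs without it installed: echo the file name.
    echo = lambda path: [sys.executable, '-c', 'import sys; print(sys.argv[1])', path]
    print(run_per_genome(['a.fasta', 'b.fasta'], nprocs=2, build_command=echo))
```

executor.map returns results in input order, so each output stays paired with its genome file.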

Update documentation for staramr to include plasmidfinder

Update the documentation in the README.md to include plasmidfinder. That is, we should add:

  • An example of the PlasmidFinder results table and the Detailed_Summary table.
  • A description of the columns in these tables.
  • An update to the tutorial to include descriptions of detecting plasmids.
  • An update to the usage docs to reflect new command-line parameters for staramr.

Output to individual files is raising an exception (development)

When writing output to individual files (e.g., --output-summary summary.tsv), I get the following exception:

2019-04-12 11:00:20,475 ERROR: 'Namespace' object has no attribute 'output_detailed_summary'
Traceback (most recent call last):
  File "/home/CSCScience.ca/apetkau/workspace/staramr/bin/staramr", line 68, in <module>
    args.run_command(args)
  File "/home/CSCScience.ca/apetkau/workspace/staramr/staramr/subcommand/Search.py", line 375, in run
    output_detailed_summary = args.output_detailed_summary
AttributeError: 'Namespace' object has no attribute 'output_detailed_summary'

I think an argument for --output-detailed-summary is missing.
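argparse derives the attribute name from the option string (dashes become underscores), so args.output_detailed_summary only exists on the Namespace once --output-detailed-summary is registered. A minimal sketch with just the two relevant options; the help text is illustrative, and the real search subcommand defines many more arguments:

```python
import argparse

# Only the two output options relevant to this issue (illustrative help text).
parser = argparse.ArgumentParser(prog='staramr search')
parser.add_argument('--output-summary', help='Name of the summary output file.')
parser.add_argument('--output-detailed-summary', help='Name of the detailed summary output file.')

args = parser.parse_args(['--output-summary', 'summary.tsv'])
# Once the option is registered, the attribute exists and defaults to None.
print(args.output_summary)
print(args.output_detailed_summary)
```

With the option registered, the line args.output_detailed_summary in Search.py no longer raises AttributeError when only --output-summary is given.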

Add color logging to Staramr

There's a Python library called coloredlogs that outputs colored log messages in the terminal. I think this would be really helpful, especially for debugging code in the project.

Predicted Phenotypes included even when using --exclude-resistance-phenotypes (in development)

It looks like the command-line option --exclude-resistance-phenotypes is not working correctly. It should exclude the Predicted Phenotype columns, but they are still present in the Summary and Detailed_Summary results.

This command-line option works by selecting whether we use the AMRDetectionSummary.py or the AMRDetectionSummaryResistance.py class (the choice is made in an if include_resistances: block).

To fix, you may need to shift some of the code in the AMRDetectionSummary.py class which adds the Predicted Phenotype column down to the subclass AMRDetectionSummaryResistance.py.
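A minimal sketch of that split, where only the subclass adds the Predicted Phenotype column. The class names come from the issue, but these bodies are heavily simplified stand-ins and the _get_summary_columns hook and create_summary method are hypothetical, not staramr's actual methods:

```python
class AMRDetectionSummary:
    """Simplified stand-in for the base summary class (genotype only)."""

    def _get_summary_columns(self):
        # Hypothetical hook: columns common to every summary table.
        return ['Isolate ID', 'Genotype']

    def create_summary(self, rows):
        """Keep only the columns this class reports, filling missing ones with ''."""
        columns = self._get_summary_columns()
        return [{col: row.get(col, '') for col in columns} for row in rows]


class AMRDetectionSummaryResistance(AMRDetectionSummary):
    """The only place the phenotype column is added."""

    def _get_summary_columns(self):
        return super()._get_summary_columns() + ['Predicted Phenotype']


def make_summary(include_resistances):
    # Mirrors the `if include_resistances:` choice described in the issue.
    if include_resistances:
        return AMRDetectionSummaryResistance()
    return AMRDetectionSummary()


row = {'Isolate ID': 'SRR1952926', 'Genotype': 'blaTEM-57', 'Predicted Phenotype': 'ampicillin'}
print(make_summary(False).create_summary([row]))
print(make_summary(True).create_summary([row]))
```

With this layout, --exclude-resistance-phenotypes only has to pick the base class; no phenotype logic remains in it to leak into the output.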
