staphb_toolkit's People

Contributors

abigailshockey, dependabot[bot], erinyoung, fanninpm, garfinjm, gretchenwilson, k-florek, kevhill, kevinlibuit, smithandrewk, stjacqrm

staphb_toolkit's Issues

path must not end with a /

When a user provides a path that ends with a / character, Nextflow throws an error stating that the path must not end with a /. The workflow script should check user-supplied paths and strip trailing / characters to aid users.
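
A minimal sketch of the normalization the Python wrapper could do before handing paths to Nextflow (the function name is illustrative); os.path.normpath already drops trailing slashes:

  import os

  def normalize_path(path):
      # os.path.normpath drops trailing slashes and collapses doubled
      # separators, so nextflow never sees a path ending in /
      return os.path.normpath(path)

  assert normalize_path("fastqs/") == "fastqs"
  assert normalize_path("/data//reads/") == "/data/reads"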

Error when trying to run monroe workflow

I get the following error when trying to run the monroe clearlabs workflow. I am running on an HPC with a SLURM scheduler, submitting the job via an sbatch script.

Command in sbatch script:
staphb-wf monroe clearlabs_assembly fastqs/ --profile singularity -c 21-07-12_clearlabs_assembly.config

I edited the config file to use local containers; however, I get the same error when using the default config file with Singularity.

Traceback (most recent call last):
  File "/home/schmedess/anaconda3/bin/staphb-wf", line 10, in <module>
    sys.exit(main())
  File "/home/schmedess/anaconda3/lib/python3.7/site-packages/staphb_toolkit/toolkit_workflows.py", line 336, in main
    child.interact()
  File "/home/schmedess/anaconda3/lib/python3.7/site-packages/pexpect/pty_spawn.py", line 788, in interact
    mode = tty.tcgetattr(self.STDIN_FILENO)
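
The traceback stops inside pexpect's interact(), which requires a real TTY on stdin; under sbatch stdin is typically not a terminal, so tty.tcgetattr fails. A minimal sketch of a guard the wrapper could use, assuming child is the spawned Nextflow process as in toolkit_workflows.py:

  import sys
  import pexpect

  child = pexpect.spawn("nextflow run main.nf")  # illustrative command
  if sys.stdin.isatty():
      # interactive shell: hand the terminal over to the user
      child.interact()
  else:
      # non-interactive (e.g. an sbatch job): just wait for nextflow to finish
      child.expect(pexpect.EOF, timeout=None)
      child.close()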

Error when running 'Cecret' for single reads

There's an error that was reported to me on Twitter that I need to fix.

Turns out that someone actually wanted to run this with single reads. This isn't unreasonable.

The command used:

staphb-wf cecret --reads_type single single --output test --profile singularity

The error:

gzip: seqyclean/104485_R1_001.fastq.gz_clean*fastq: No such file or directory

Which is true: the glob should use 'cl*n' instead of 'clean'. 🤦‍♀️

And, in all actuality, I should be using -gz with seqyclean.

In addition to this error, I now realize there's a second bug: filename parsing fails for names that don't match '_S[0-9]+_L[0-9]+'. These are not unreasonable names (like 104485_R1_001.fastq.gz), so I should fix that as well.
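
A sketch of a more forgiving parser that treats the _S and _L fields as optional (the regex is illustrative, not the workflow's actual pattern):

  import re

  def sample_name(filename):
      # tolerate filenames missing the _S<number> and _L<number> fields,
      # e.g. 104485_R1_001.fastq.gz as well as sample1_S12_L001_R1_001.fastq.gz
      return re.sub(r'(_S[0-9]+)?(_L[0-9]+)?(_R[12])?(_001)?\.fastq(\.gz)?$', '', filename)

  assert sample_name("104485_R1_001.fastq.gz") == "104485"
  assert sample_name("sample1_S12_L001_R1_001.fastq.gz") == "sample1"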

Cutshaw Spades Memory

In the Cutshaw workflow, cutshaw.nf keeps hitting a snag when I run it on my machine, and it has to do with the ${task.memory} variable: it returns "4 GB", and SPAdes can't handle the "GB" part, treating it as an extra input path:

Error executing process > 'spades (sample1)'

Caused by:
Process spades (sample1) terminated with an error exit status (1)

Command executed:

spades.py --memory 4 GB -1 sample1_S12_L001_R1_001.fastq.gz -2 sample1_S12_L001_R2_001.fastq.gz -o ./spades_out
mv ./spades_out/contigs.fasta sample1_S12_L001_contigs.fasta

Command exit status:
1

Command output:

== Error == Please specify option (e.g. -1, -2, -s, etc) for the following paths: GB

I just hard-coded it to "4" in the Nextflow script in my local copy of the toolkit, but there's probably a better fix somewhere; since task.memory is a Nextflow MemoryUnit, something like ${task.memory.toGiga()} might be the cleaner way to hand SPAdes a bare number, though I didn't have time to verify that. Let me know if this is confusing, and I'll clarify! Thanks!

Are there plans to move workflows into their own repos?

Hi everyone,

The toolkit is great! I'm just curious whether there are plans to move the workflows into their own repos.

My thinking is that staphb_toolkit could import them, or, in the case of the Nextflow-based ones, just run them from their repos directly. I also think it might make contributions easier to submit.

Thanks!
Robert

using python setup.py install breaks package manifest

Running python setup.py install does not install the package files listed in packaging/MANIFEST.in when installing into a conda environment. The same behavior has been noted in other Python packages. The solution is to redesign the installer to declare these files via package_data in setup.py, which should fix the issue.
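
A minimal sketch of the package_data approach (the package name is from this repo; the globs are illustrative):

  from setuptools import setup, find_packages

  setup(
      name="staphb_toolkit",
      packages=find_packages(),
      # declare data files on the package itself so `setup.py install` copies
      # them; MANIFEST.in alone typically only affects source distributions
      package_data={
          "staphb_toolkit": ["workflows/*/*.nf", "workflows/*/configs/*"],
      },
  )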

Monroe Report Tool FASTA header characters

The Monroe Cluster Report Tool errors when a FASTA header includes "/". The script should account for all legal FASTA header characters. Currently the following error is reported:

  processing file: report_template.Rmd
  Quitting from lines 160-164 (report_template.Rmd) 
  Error in `[.data.frame`(snp_mat, c(mpt.ord), c(mpt.ord)) : 
    undefined columns selected
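
Until the template handles every legal FASTA header character, sanitizing sequence names before they reach the R report is one possible stopgap; a hedged sketch (function name hypothetical):

  import re

  def safe_header(name):
      # '/' is legal in a FASTA header but breaks the report's
      # data.frame column lookup, so map risky characters to '_'
      return re.sub(r'[^A-Za-z0-9._-]', '_', name)

  assert safe_header("hCoV-19/USA/WI-1/2020") == "hCoV-19_USA_WI-1_2020"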

Container downloads

Hi,
When a user runs a pipeline, where do the underlying containers get saved?
Thanks.

No input files were found!

Hi,

I tried to run the cecret workflow using staphb-tk, and it reported that no fastq files were detected.

(base) [zl7w2@lewis4-r730-dtn1-node879 fastq]$ staphb-tk cecret  -c cecret.config
NOTE: Nextflow is not tested with Java 1.8.0_362 -- It's recommended the use of version 11 up to 18

Checking UPHL-BioNGS/Cecret ...
UPHL-BioNGS/Cecret contains uncommitted changes -- cannot pull from repository
NOTE: Nextflow is not tested with Java 1.8.0_362 -- It's recommended the use of version 11 up to 18

N E X T F L O W  ~  version 22.04.5
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [9a73277c2c]
Launching `https://github.com/UPHL-BioNGS/Cecret` [serene_knuth] DSL2 - revision: d191242b07 [master]
Currently using the Cecret workflow for use with amplicon Illumina library prep on MiSeq with a corresponding reference genome.

Author: Erin Young
email: [email protected]
Version: v.3.5.20230201

The maximum number of CPUS used in this workflow is 8
Using the subworkflow for SARS-CoV-2
The files and directory for results is cecret
A table summarizing results will be created: cecret/cecret_results.csv

[-        ] process > fasta_prep                     -
[-        ] process > cecret:seqyclean               -
[-        ] process > cecret:bwa                     -
[-        ] process > cecret:sort                    -
[-        ] process > cecret:ivar_trim               -
[-        ] process > cecret:ivar                    -
[-        ] process > cecret:filter                  -
[-        ] process > qc:fastqc                      -
[-        ] process > qc:kraken2                     -
[-        ] process > qc:samtools_flagstat           -
[-        ] process > qc:samtools_depth              -
[-        ] process > qc:samtools_coverage           -
[-        ] process > qc:samtools_stats              -
[-        ] process > qc:samtools_intial_stats       -
[-        ] process > qc:samtools_ampliconstats      -
[-        ] process > qc:samtools_plot_ampliconstats -
[-        ] process > qc:bcftools_variants           -
[-        ] process > qc:ivar_variants               -
[-        ] process > qc:bedtools_multicov           -
[-        ] process > sarscov2:vadr                  -
[-        ] process > sarscov2:pangolin              -
[-        ] process > sarscov2:nextclade             -
A bedfile for primers is required. Set with 'params.primer_bed'.
No fastq or fastq.gz files were found at /storage/htc/joshilab/zl7w2/staphb-test/fastq/reads or /storage/htc/joshilab/zl7w2/staphb-test/fastq/single_reads
No reference genome was selected. Set with 'params.reference_genome'
FATAL : No input files were found!
No paired-end fastq files were found at /storage/htc/joshilab/zl7w2/staphb-test/fastq/reads. Set 'params.reads' to directory with paired-end reads
No single-end fastq files were found at /storage/htc/joshilab/zl7w2/staphb-test/fastq/single_reads. Set 'params.single_reads' to directory with single-end reads
No fasta files were found at /storage/htc/joshilab/zl7w2/staphb-test/fastq/fastas. Set 'params.fastas' to directory with fastas.
No multifasta files were found at /storage/htc/joshilab/zl7w2/staphb-test/fastq/multifastas. Set 'params.multifastas' to directory with multifastas.
No sample sheet was fount at . Set 'params.sample_sheet' to sample sheet file.
No such variable: ch_dataset

 -- Check script '/home/zl7w2/assets/UPHL-BioNGS/Cecret/main.nf' at line: 363 or see '.nextflow.log' file for more details

I have checked the folder /storage/htc/joshilab/zl7w2/staphb-test/fastq/reads and the fastq.gz files are there. I don't think it's a permission issue, because I tried running nextflow run UPHL-BioNGS/Cecret with the same permissions (664) and it works.

(base) [zl7w2@lewis4-r730-dtn1-node879 reads]$ ls -lhtr
total 3.9G
-rw-rw-r--. 1 zl7w2 joshilab-group 26M Mar 28 23:11 591870-MO-M08605-230310_S35_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 29M Mar 28 23:11 591870-MO-M08605-230310_S35_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 52M Mar 28 23:11 591887-MO-M08605-230310_S36_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 58M Mar 28 23:11 591887-MO-M08605-230310_S36_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 42M Mar 28 23:11 591932-MO-M08605-230310_S37_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 45M Mar 28 23:11 591932-MO-M08605-230310_S37_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 45M Mar 28 23:11 591959-MO-M08605-230310_S38_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 48M Mar 28 23:11 591959-MO-M08605-230310_S38_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 40M Mar 28 23:11 591974-MO-M08605-230310_S39_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 45M Mar 28 23:11 591974-MO-M08605-230310_S39_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 46M Mar 28 23:11 592059-MO-M08605-230310_S40_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 52M Mar 28 23:11 592059-MO-M08605-230310_S40_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 33M Mar 28 23:11 592063-MO-M08605-230310_S41_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 35M Mar 28 23:11 592063-MO-M08605-230310_S41_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 53M Mar 28 23:11 592068-MO-M08605-230310_S42_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 56M Mar 28 23:11 592068-MO-M08605-230310_S42_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 44M Mar 28 23:11 592075-MO-M08605-230310_S43_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 46M Mar 28 23:11 592075-MO-M08605-230310_S43_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 52M Mar 28 23:11 592098-MO-M08605-230310_S44_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 54M Mar 28 23:11 592098-MO-M08605-230310_S44_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 42M Mar 28 23:11 597268-MO-M08605-230310_S34_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 45M Mar 28 23:11 597268-MO-M08605-230310_S34_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 48M Mar 28 23:11 797606-MO-M08605-230310_S1_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 52M Mar 28 23:11 797606-MO-M08605-230310_S1_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 26M Mar 28 23:11 797608-MO-M08605-230310_S2_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 28M Mar 28 23:11 797608-MO-M08605-230310_S2_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 34M Mar 28 23:11 797618-MO-M08605-230310_S45_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 37M Mar 28 23:11 797618-MO-M08605-230310_S45_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 46M Mar 28 23:11 797621-MO-M08605-230310_S3_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 52M Mar 28 23:11 797621-MO-M08605-230310_S3_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 54M Mar 28 23:11 797622-MO-M08605-230310_S4_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 56M Mar 28 23:11 797622-MO-M08605-230310_S4_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 42M Mar 28 23:11 797623-MO-M08605-230310_S5_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 45M Mar 28 23:11 797623-MO-M08605-230310_S5_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 34M Mar 28 23:11 797624-MO-M08605-230310_S6_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 37M Mar 28 23:11 797624-MO-M08605-230310_S6_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 51M Mar 28 23:11 797625-MO-M08605-230310_S7_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 54M Mar 28 23:11 797625-MO-M08605-230310_S7_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 50M Mar 28 23:11 797626-MO-M08605-230310_S8_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 52M Mar 28 23:11 797626-MO-M08605-230310_S8_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 32M Mar 28 23:11 797627-MO-M08605-230310_S9_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 35M Mar 28 23:11 797627-MO-M08605-230310_S9_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 50M Mar 28 23:11 797628-MO-M08605-230310_S10_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 56M Mar 28 23:11 797628-MO-M08605-230310_S10_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 42M Mar 28 23:11 797629-MO-M08605-230310_S11_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 46M Mar 28 23:11 797629-MO-M08605-230310_S11_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 52M Mar 28 23:11 797630-MO-M08605-230310_S12_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 54M Mar 28 23:11 797630-MO-M08605-230310_S12_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 43M Mar 28 23:11 797632-MO-M08605-230310_S13_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 47M Mar 28 23:11 797632-MO-M08605-230310_S13_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 51M Mar 28 23:11 797633-MO-M08605-230310_S14_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 54M Mar 28 23:11 797633-MO-M08605-230310_S14_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 45M Mar 28 23:11 797634-MO-M08605-230310_S15_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 47M Mar 28 23:11 797634-MO-M08605-230310_S15_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 46M Mar 28 23:11 797635-MO-M08605-230310_S16_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 49M Mar 28 23:11 797635-MO-M08605-230310_S16_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 20M Mar 28 23:11 797636-MO-M08605-230310_S17_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 23M Mar 28 23:11 797636-MO-M08605-230310_S17_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 46M Mar 28 23:11 797637-MO-M08605-230310_S18_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 49M Mar 28 23:11 797637-MO-M08605-230310_S18_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 42M Mar 28 23:11 797639-MO-M08605-230310_S19_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 47M Mar 28 23:11 797639-MO-M08605-230310_S19_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 54M Mar 28 23:11 797640-MO-M08605-230310_S20_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 56M Mar 28 23:11 797640-MO-M08605-230310_S20_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 23M Mar 28 23:11 797642-MO-M08605-230310_S21_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 25M Mar 28 23:11 797642-MO-M08605-230310_S21_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 53M Mar 28 23:11 797643-MO-M08605-230310_S22_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 56M Mar 28 23:11 797643-MO-M08605-230310_S22_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 46M Mar 28 23:11 797644-MO-M08605-230310_S23_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 50M Mar 28 23:11 797644-MO-M08605-230310_S23_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 41M Mar 28 23:11 797645-MO-M08605-230310_S24_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 44M Mar 28 23:11 797645-MO-M08605-230310_S24_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 50M Mar 28 23:11 797646-MO-M08605-230310_S25_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 53M Mar 28 23:11 797646-MO-M08605-230310_S25_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 41M Mar 28 23:11 797647-MO-M08605-230310_S26_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 44M Mar 28 23:11 797647-MO-M08605-230310_S26_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 47M Mar 28 23:11 797649-MO-M08605-230310_S27_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 49M Mar 28 23:11 797649-MO-M08605-230310_S27_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 48M Mar 28 23:11 797651-MO-M08605-230310_S28_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 51M Mar 28 23:11 797651-MO-M08605-230310_S28_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 37M Mar 28 23:11 797652-MO-M08605-230310_S29_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 40M Mar 28 23:11 797652-MO-M08605-230310_S29_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 48M Mar 28 23:11 797653-MO-M08605-230310_S30_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 51M Mar 28 23:11 797653-MO-M08605-230310_S30_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 46M Mar 28 23:11 797654-MO-M08605-230310_S31_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 49M Mar 28 23:11 797654-MO-M08605-230310_S31_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 53M Mar 28 23:11 797655-MO-M08605-230310_S32_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 60M Mar 28 23:11 797655-MO-M08605-230310_S32_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 18M Mar 28 23:11 NTC1-MO-M08605-230310_S33_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 14M Mar 28 23:11 NTC1-MO-M08605-230310_S33_L001_R2_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 12M Mar 28 23:11 NTC2-MO-M08605-230310_S46_L001_R1_001.fastq.gz
-rw-rw-r--. 1 zl7w2 joshilab-group 11M Mar 28 23:11 NTC2-MO-M08605-230310_S46_L001_R2_001.fastq.gz

execution of certain containers does not complete with python v3.7.9 and python v3.8

I use staphb-tk tools in my various pipelines. When running a basic read trimming/filtering, assembly, and annotation pipeline, my runs execute but never finish, eventually timing out, when using python v3.7.9 or v3.8.6 (via anaconda3). The run starts staphb-tk prokka, then gets hung up and never finishes. It works fine with python v3.7.4 when that is anaconda's default python version, but not with v3.7.4 in a separate env when v3.8 is the default anaconda version. (This was tested on the same dataset on an HPC, using Singularity.)

monroe workflows are not compatible with Pangolin 3.0+

When running monroe pe_assembly (or clearlabs_assembly, and probably the others) with pangolin 3.0+, monroe doesn't recognize the new output headers (https://cov-lineages.org/resources/pangolin/output.html) when summarizing the assembly results, causing the workflow to exit with an error status.

Jul-14 09:43:49.703 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'assembly_results'

Caused by:
  Process `assembly_results` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env python3
  import os, sys
  import glob, csv
  import xml.etree.ElementTree as ET
  from datetime import datetime

  today = datetime.today()
  today = today.strftime("%m%d%y")

  class result_values:
      def __init__(self,id):
          self.id = id
          self.aligned_bases = "NA"
          self.percent_cvg = "NA"
          self.mean_depth = "NA"
          self.mean_base_q = "NA"
          self.mean_map_q = "NA"
          self.monroe_qc = "NA"
          self.pangolin_lineage = "NA"
          self.pangolin_probability = "NA"
          self.pangolin_notes = "NA"


  #get list of result files
  samtools_results = glob.glob("*_samtoolscoverage.tsv")
  pangolin_results = glob.glob("*_lineage_report.csv")
  results = {}
  # collect samtools results
  for file in samtools_results:
      id = file.split("_samtoolscoverage.tsv")[0]
      result = result_values(id)
      monroe_qc = []
      with open(file,'r') as tsv_file:
          tsv_reader = list(csv.DictReader(tsv_file, delimiter="\t"))
          for line in tsv_reader:
              result.aligned_bases = line["covbases"]
              result.percent_cvg = line["coverage"]
              if float(line["coverage"]) < 98:
                  monroe_qc.append("coverage <98%")
              result.mean_depth = line["meandepth"]
              result.mean_base_q = line["meanbaseq"]
              if float(line["meanbaseq"]) < 30:
                  monroe_qc.append("meanbaseq < 30")
              result.mean_map_q = line["meanmapq"]
              if float(line["meanmapq"]) < 30:
                  monroe_qc.append("meanmapq < 30")
          if len(monroe_qc) == 0:
              result.monroe_qc = "PASS"
          else:
              result.monroe_qc ="WARNING: " + '; '.join(monroe_qc)

      file = (id + "_lineage_report.csv")
      with open(file,'r') as csv_file:
          csv_reader = list(csv.DictReader(csv_file, delimiter=","))
          for line in csv_reader:
              if line["status"] == "fail":
                  result.pangolin_lineage = "failed pangolin qc"
              else:
                  result.pangolin_lineage = line["lineage"]
                  result.pangolin_probability = line["probability"]
                  result.pangolin_notes = line["note"]

      results[id] = result


  #create output file
  with open(f"monroe_summary_{today}.csv",'w') as csvout:
      writer = csv.writer(csvout,delimiter=',')
      writer.writerow(["sample","aligned_bases","percent_cvg", "mean_depth", "mean_base_q", "mean_map_q", "monroe_qc", "pangolin_lineage", "pangolin_probability", "pangolin_notes"])
      for id in results:
          result = results[id]
          writer.writerow([result.id,result.aligned_bases,result.percent_cvg,result.mean_depth,result.mean_base_q,result.mean_map_q,result.monroe_qc,result.pangolin_lineage,result.pangolin_probability,result.pangolin_notes])

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File ".command.sh", line 61, in <module>
      result.pangolin_probability = line["probability"]
  KeyError: 'probability'

Work dir:
  /panfs/roc/groups/7/mdh/shared/sc2/GXB03107_210713_1/work/d8/59532a94f07b7895a579edb71219af

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
Jul-14 09:43:49.709 [main] DEBUG nextflow.Session - Session await > all process finished
Jul-14 09:43:49.712 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `assembly_results` terminated with an error exit status (1)
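
Pangolin 3.0 dropped the 'probability' column ('conflict' and 'ambiguity_score' are its closest analogues), so the summary script needs to tolerate both layouts. A hedged sketch of the parsing with dict.get fallbacks (function name illustrative):

  import csv

  def parse_lineage_report(path):
      # works for both pangolin 2.x and 3.x column layouts
      with open(path) as csv_file:
          for line in csv.DictReader(csv_file):
              if line.get("status") == "fail":
                  return {"lineage": "failed pangolin qc"}
              return {
                  "lineage": line.get("lineage", "NA"),
                  # 2.x reported 'probability'; 3.x reports 'ambiguity_score'
                  "probability": line.get("probability", line.get("ambiguity_score", "NA")),
                  "notes": line.get("note", "NA"),
              }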

Error when running Dryad pipeline

User reported the following error with the dryad pipeline:

Aug-17 15:00:11.294 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'multiqc (1)'

Caused by:
  Process `multiqc (1)` terminated with an error exit status (1)

Command executed:

  multiqc . >/dev/null 2>&1

Command exit status:
  1

Command output:
  (empty)

Command wrapper:
  ln: failed to create hard link ‘dryad_logo_250.png’ => ‘/hpc/apps/miniconda3/4.9.2/envs/staphb-tk-1.3.3/lib/python3.6/site-packages/staphb_toolkit/workflows/dryad/assets/dryad_logo_250.png’: Invalid cross-device link

Please advise.

Possible report generation hang

I am trying to generate a report for 18 sequences that were already processed by the pe_assembly step. Things seem to process without error, with a great deal of resource utilization, until the render step. At the document generation step of render (pandoc), the process appears to hang: there is no CPU usage, and no data is being written to the output folder or, as far as I can tell (via a docker shell on the running container), to the directories within the Docker container.

Does this seem normal? Should this process take a long time?

Code version:
master (today, 4/6/2021)

Run command:
./staphb-wf monroe cluster_analysis test -o results

Output:
Starting the Monroe cluster analysis:
N E X T F L O W ~ version 20.10.0
Launching /home/cody/staphb_toolkit/staphb_toolkit/workflows/monroe/monroe_cluster_analysis.nf [berserk_hamilton] - revision: cc45af1307
executor > local (4)
[45/a770f6] process > msa [100%] 1 of 1 ✔
[cc/316b0f] process > snp_matrix [100%] 1 of 1 ✔
[e1/c32ea7] process > iqtree [100%] 1 of 1 ✔
[5c/2eddec] process > render (1) [ 0%] 0 of 1

Docker container: staphb/cluster-report-env:1.0
Docker container log tail:

output file: report_template.knit.md

/usr/bin/pandoc +RTS -K512m -RTS report_template.utf8.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output report.tex --self-contained --highlight-style tango --latex-engine xelatex --variable graphics --include-in-header /tmp/Rtmp7TIwqp/rmarkdown-str1e12d54db9.html --variable 'geometry:margin=1in'

contents of Docker working directory:
drwxrwxr-x 3 1000 1000 4096 Apr 6 21:39 .
drwxrwxr-x 3 1000 1000 4096 Apr 6 21:39 ..
-rw-rw-r-- 1 1000 1000 0 Apr 6 21:39 .command.begin
-rw-rw-r-- 1 1000 1000 410 Apr 6 21:39 .command.err
-rw-rw-r-- 1 1000 1000 4361 Apr 6 21:39 .command.log
-rw-rw-r-- 1 1000 1000 3951 Apr 6 21:39 .command.out
-rw-rw-r-- 1 1000 1000 9873 Apr 6 21:39 .command.run
-rw-rw-r-- 1 1000 1000 357 Apr 6 21:39 .command.sh
-rw-r--r-- 1 root root 0 Apr 6 21:39 .command.trace
lrwxrwxrwx 1 1000 1000 93 Apr 6 21:39 040621_msa.tree -> /home/cody/staphb_toolkit/results/logs/work/e1/c32ea70e9ba082ebd6222617717087/040621_msa.tree
-rw-r--r-- 1 root root 35153 Apr 6 21:39 ML_tree.png
-rw-r--r-- 1 root root 254832 Apr 6 21:39 SNP_heatmap.png
lrwxrwxrwx 1 1000 1000 110 Apr 6 21:39 pairwise_snp_distance_matrix.tsv -> /home/cody/staphb_toolkit/results/logs/work/cc/316b0f487721e338e15b2b3cbc7bdd/pairwise_snp_distance_matrix.tsv
lrwxrwxrwx 1 1000 1000 75 Apr 6 21:39 report.Rmd -> /home/cody/staphb_toolkit/staphb_toolkit/workflows/monroe/report/report.Rmd
-rw-r--r-- 1 root root 6952 Apr 6 21:39 report.log
-rw-r--r-- 1 root root 4988 Apr 6 21:39 report.tex
drwxr-xr-x 3 root root 4096 Apr 6 21:39 report_files
-rw-r--r-- 1 root root 8022 Apr 6 21:39 report_template.Rmd
-rw-r--r-- 1 root root 3053 Apr 6 21:39 report_template.knit.md
-rw-r--r-- 1 root root 3053 Apr 6 21:39 report_template.utf8.md
-rw-r--r-- 1 root root 1165 Apr 6 21:39 snp_distance_matrix.tsv

Installation error

Someone contacted me on Twitter with an error when they went to run staphb-wf.

I'm putting it here:

Dr. Young, I tried to install staphb, but it kept giving me error like this. Could you advise what is wrong? Thanks.
-JXXXXXX:

~/Downloads/staphb_toolkit$ staphb-wf
Traceback (most recent call last):
  File "/home/mdxhrd/Downloads/staphb_toolkit/staphb-wf", line 3, in <module>
    import staphb_toolkit.toolkit_workflows as tk_work
  File "/home/mdxhrd/Downloads/staphb_toolkit/staphb_toolkit/toolkit_workflows.py", line 11, in <module>
    import pexpect
  File "/usr/lib/python3/dist-packages/pexpect/__init__.py", line 75, in <module>
    from .pty_spawn import spawn, spawnu
  File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 15, in <module>
    from .spawnbase import SpawnBase
  File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 218
    def expect(self, pattern, timeout=-1, searchwindowsize=-1, async=False):
                                                                   ^
SyntaxError: invalid syntax

They cloned the repository from GitHub, and then I don't really know what happened; I haven't been able to replicate it. They use Ubuntu. The traceback points at the system pexpect in /usr/lib/python3/dist-packages, which still uses async as an argument name; async became a reserved keyword in Python 3.7, so this looks like an outdated pexpect rather than a python2 vs python3 issue, and upgrading pexpect would probably fix it.

When I figure it out and how to fix it, I'll update this issue with some suggestions on more information to put in the readme.

How to run toolkit without internet access

I'm trying to use this tool in a high-performance computing environment without internet access. I keep getting this error:
"Error: Cannot connect to GitHub to get app inventory."
How can I avoid this?

Changing to ArticV4 Primer Scheme

Hello Kelsey and Erin,
Our SARS-CoV-2 NGS group recently transitioned to the ArticV4 primers. I have created text files from the ARTIC group's GitHub page by copying and pasting the contents of the nCoV-2019 primer scheme page into new notepad documents and saving them as text files. I was wondering how I go about implementing the bed files, both the primer and the insert, into the Cecret workflow. Do I just save the files on our server in the /root directory and change our cecret config file? I found the location of the current primer and insert bed files (/root/anaconda3/lib/python3.8/site-packages/staphb_toolkit/workflows/cecret/configs/artic_V3_nCoV-2019.bed and /root/anaconda3/lib/python3.8/site-packages/staphb_toolkit/workflows/cecret/configs/nCoV-2019.insert.bed) on the Linux server I have access to through the Marshfield Clinic. But being less than a computer-science novice, I do not want to disturb anything that would affect the many other labs using the same workflow, so I will hold off on making any changes that could potentially affect everyone else. Thanks.

Adam Bissonnette

[email protected] or [email protected]

Error while running the Cecret workflow

I am trying to run the Cecret workflow using the command:
staphb-wf cecret /paired-end-reads --output /result --profile docker
and I am running into the following issue:

Error executing process > 'fastqc (1)'

Caused by:
Process fastqc (1) terminated with an error exit status (1)

Command executed:

mkdir -p fastqc logs/fastqc
log_file=logs/fastqc/1.14d0e3e2-9b7d-49e8-9061-485eef68cef0.log
err_file=logs/fastqc/1.14d0e3e2-9b7d-49e8-9061-485eef68cef0.err

# time stamp + capturing tool versions

date | tee -a $log_file $err_file > /dev/null
fastqc --version >> $log_file

fastqc --outdir fastqc --threads 1 1_S1_L001_R1.fastq.gz 1_S1_L001_R2.fastq.gz 2>> $err_file >> $log_file

zipped_fastq=($(ls fastqc/*fastqc.zip) "")

raw_1=$(unzip -p ${zipped_fastq[0]} */fastqc_data.txt | grep "Total Sequences" | awk '{ print $3 }' )
raw_2=NA
if [ -f "${zipped_fastq[1]}" ] ; then raw_2=$(unzip -p fastqc/*fastqc.zip */fastqc_data.txt | grep "Total Sequences" | awk '{ print $3 }' ) ; fi

if [ -z "$raw_1" ] ; then raw_1="0" ; fi
if [ -z "$raw_2" ] ; then raw_2="0" ; fi

Command exit status:
1

Command output:
(empty)

Command error:
touch: cannot touch '.command.trace': Permission denied

Can you please guide me to fix this issue?

update docker config for nanoplot

"tag": "1.27.0-cv1",

Just a heads up: I'm going to change the dockerhub tag for this to staphb/nanoplot:1.27.0, removing the -cv1 portion, because I don't want anyone to use the first container version for nanoplot 1.27.0. It's now on container version 2; I had to fix it since it was missing a dependency.

Also, I'm going to push NanoPlot 1.29.0 to dockerhub once I run a quick test on it today. You could just change the tag to staphb/nanoplot:1.29.0.

ivar not trimming primers in Monroe pipeline

I recently installed the staphb toolkit to run the monroe pipeline (staphb-wf monroe pe_assembly fastq -o output --primer V3). Looking at one of the .command.out files in the logs/work folder, I see the stdout from ivar: it prints a note about the required columns in the (primer) BED file and reports that it trimmed primers from 0 reads. Normally ivar reports how many primers were found in the BED file. It appears that the primer BED file provided to ivar in the Monroe pipeline isn't formatted correctly or isn't being found. I haven't been able to figure out where the primer BED file ivar uses (/reference/ARTIC-V3.bed) comes from.

1016027 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
1500 + 0 supplementary
0 + 0 duplicates
1016027 + 0 mapped (100.00% : N/A)
1014527 + 0 paired in sequencing
507258 + 0 read1
507269 + 0 read2
1014150 + 0 properly paired (99.96% : N/A)
1014506 + 0 with itself and mate mapped
21 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
iVar uses the standard 6 column BED format as defined here - https://genome.ucsc.edu/FAQ/FAQformat.html#format1.
It requires the following columns delimited by a tab: chrom, chromStart, chromEnd, name, score, strand

Number of references in file: 1
MN908947.3
Using Region: MN908947.3

Found 1016027 mapped reads
Found 0 unmapped reads
Sorted By Coordinate
-------
Processed 10% reads ...
Processed 20% reads ...
Processed 30% reads ...
Processed 40% reads ...
Processed 50% reads ...
Processed 60% reads ...
Processed 70% reads ...
Processed 80% reads ...
Processed 90% reads ...
Processed 100% reads ...

-------
Results:
Primer Name	Read Count

Trimmed primers from 0% (0) of reads.
0.02% (165) of reads were quality trimmed below the minimum length of 30 bp and were not writen to file.
99.98% (1015862) of reads started outside of primer regions. Since the -e flag was given, these reads were written to file.
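
One quick check before digging further is whether the staged BED actually has the six tab-delimited columns iVar requires; a sketch:

  # report lines of a primer BED that lack the 6 tab-delimited columns
  # iVar needs: chrom, chromStart, chromEnd, name, score, strand
  def check_bed(path):
      with open(path) as bed:
          for n, row in enumerate(bed, 1):
              fields = row.rstrip("\n").split("\t")
              if len(fields) < 6:
                  print(f"line {n}: {len(fields)} columns (need 6): {row.rstrip()}")

  check_bed("/reference/ARTIC-V3.bed")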

Monroe results "crash" because pangolin output columns changed

#create output file
with open(f"monroe_summary_{today}.csv",'w') as csvout:
    writer = csv.writer(csvout,delimiter=',')
    writer.writerow(["sample","aligned_bases","percent_cvg", "mean_depth", "mean_base_q", "mean_map_q", "monroe_qc", "pangolin_lineage", "pangolin_probability", "pangolin_notes"])
    for id in results:
        result = results[id]
        writer.writerow([result.id,result.aligned_bases,result.percent_cvg,result.mean_depth,result.mean_base_q,result.mean_map_q,result.monroe_qc,result.pangolin_lineage,result.pangolin_probability,result.pangolin_notes])

Command exit status:
1

Command output:
(empty)

The pangolin probability column no longer exists.

staphb-tk/staphb-wf --update command not finding toolkit updates

Recreated the issue by running:

$ pip install staphb_toolkit==1.0.0
Collecting staphb_toolkit==1.0.0
  Using cached https://files.pythonhosted.org/packages/41/d9/57cd5ad70eb50645c0af02ff45cada352b3577c6b22653f7d15d196aa39d/staphb_toolkit-1.0.0-py3-none-any.whl
Requirement already satisfied: psutil>=5.6.3 in ./anaconda3/lib/python3.7/site-packages (from staphb_toolkit==1.0.0) (5.6.3)
Requirement already satisfied: spython>=0.0.73 in ./anaconda3/lib/python3.7/site-packages (from staphb_toolkit==1.0.0) (0.0.76)
Requirement already satisfied: docker>=4.1.0 in ./anaconda3/lib/python3.7/site-packages (from staphb_toolkit==1.0.0) (4.2.0)
Requirement already satisfied: semver>=2.8.0 in ./anaconda3/lib/python3.7/site-packages (from spython>=0.0.73->staphb_toolkit==1.0.0) (2.9.1)
Requirement already satisfied: six>=1.4.0 in ./anaconda3/lib/python3.7/site-packages (from docker>=4.1.0->staphb_toolkit==1.0.0) (1.12.0)
Requirement already satisfied: requests!=2.18.0,>=2.14.2 in ./anaconda3/lib/python3.7/site-packages (from docker>=4.1.0->staphb_toolkit==1.0.0) (2.22.0)
Requirement already satisfied: websocket-client>=0.32.0 in ./anaconda3/lib/python3.7/site-packages (from docker>=4.1.0->staphb_toolkit==1.0.0) (0.57.0)
Requirement already satisfied: idna<2.9,>=2.5 in ./anaconda3/lib/python3.7/site-packages (from requests!=2.18.0,>=2.14.2->docker>=4.1.0->staphb_toolkit==1.0.0) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in ./anaconda3/lib/python3.7/site-packages (from requests!=2.18.0,>=2.14.2->docker>=4.1.0->staphb_toolkit==1.0.0) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in ./anaconda3/lib/python3.7/site-packages (from requests!=2.18.0,>=2.14.2->docker>=4.1.0->staphb_toolkit==1.0.0) (2019.9.11)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in ./anaconda3/lib/python3.7/site-packages (from requests!=2.18.0,>=2.14.2->docker>=4.1.0->staphb_toolkit==1.0.0) (1.24.2)
Installing collected packages: staphb-toolkit
Successfully installed staphb-toolkit-1.0.0
$ staphb-tk --update
Checking for updates...
Done.
$ pip show staphb_toolkit
Name: staphb-toolkit
Version: 1.0.0
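
For reference, a hedged sketch of what an update check could look like using the semver dependency visible in the install log above (the PyPI endpoint and logic are illustrative, not the toolkit's actual code):

  import json
  import urllib.request
  import semver

  installed = "1.0.0"  # would come from staphb_toolkit's version metadata
  # ask PyPI for the newest published release
  with urllib.request.urlopen("https://pypi.org/pypi/staphb_toolkit/json") as resp:
      latest = json.load(resp)["info"]["version"]
  if semver.compare(installed, latest) < 0:
      print(f"Update available: {installed} -> {latest}")
  else:
      print("Done.")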

Allow demultiplexed reads in dirs not named "barcode*" for the monroe ont_assembly workflow

Currently the monroe ont_assembly workflow only works with demultiplexed reads that are stored in folders named barcode*. For our lab at least, it would be helpful if the subdirectories of params.fastq_dir were allowed to have any name, so we can have our normal sample IDs attached to the intermediate and output files rather than the name of the barcode we put on the sequencer.

This change could cause issues if labs don't have their demultiplexed read dirs isolated in their own directory, but I don't see that being the case for most labs.

I'll do a PR with a potential solution later tonight or tomorrow.
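
A sketch of the more permissive discovery this describes (paths illustrative):

  from pathlib import Path

  fastq_dir = Path("fastq_pass")  # stands in for params.fastq_dir
  # accept any subdirectory, not just barcode*, and use the directory
  # name (e.g. a lab's own sample ID) as the sample name downstream
  samples = {
      d.name: sorted(d.glob("*.fastq*"))
      for d in fastq_dir.iterdir()
      if d.is_dir() and any(d.glob("*.fastq*"))
  }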

Workflow resource allocation

Currently some workflow processes use cpus=$(grep -c ^processor /proc/cpuinfo) to determine the number of CPUs a tool should use. This breaks down in certain cases because the CPUs visible to a process often exceed what the Nextflow process directives actually allocated.

The resource values handed to each tool need to be kept in line with the Nextflow process directives, for example by passing ${task.cpus} instead of probing /proc/cpuinfo, so a method of mirroring them needs to be established.

Use of local singularity images

The toolkit currently grabs Docker images from Docker Hub and converts them to Singularity images. There should be an option to use local Singularity images instead.

Dryad error: Missing `fromPath` parameter

Running into this error when running Dryad main with the default Docker profile:

$ staphb-wf dryad main reads_dir -cg -o dryad_cg
Starting the Dryad pipeline:
N E X T F L O W  ~  version 20.01.0
Launching `/home/ubuntu/anaconda3/lib/python3.7/site-packages/staphb_toolkit/workflows/dryad/dryad.nf` [fabulous_liskov] - revision: 7e641c60e3
[-        ] process > preProcess        [  0%] 0 of 6
[-        ] process > fastqc            -
[-        ] process > trim              -
[-        ] process > cleanreads        -
[-        ] process > shovill           -
[-        ] process > mash              -
[-        ] process > quast             -
[-        ] process > prokka            -
[-        ] process > roary             -
[-        ] process > cg_tree           -
[-        ] process > amrfinder         -
[-        ] process > amrfinder_summary -
Missing `fromPath` parameter

Using custom .bed files

Hello, I'm a COVID-related volunteer for the Colorado Dept of Public Health. I have lots of experience with data pipelines, python, docker, etc., but am new to sequencing, nextflow, etc.

I've been asked to help use some custom .bed files with primers different from any of the ARTIC versions. I think this is possible but would involve some minor changes to staphb. I think the changes would be generally useful, so I wanted to give an overview of my plans, ask for suggestions, and then see if you agree that this would be a valuable PR.

So, specifically, the lab is using the monroe pipeline, which depends on ivar, which gets those .bed files from the /reference folder in the ivar docker image. I can't even find where in the ivar repo they generate that /reference folder, so I'm guessing it's buried deep in one of their dependencies. Updating all of that seems like a headache.

However, since staphb uses nextflow to run the dockerized steps, I think I should be able to use nextflow's docker config options to solve the issue, specifically the runOptions parameter, to mount a new /references directory at runtime. So the new command would end up being something like:

staphb-wf ... --runOptions "-v /path/to/custom/bed:/references" ... --primers custom_primer_name

So I think that should be pretty straightforward, and only involve changing the toolkit_workflows.py file to add the relevant parsers and command builders; see the sketch below.

I think this would be valuable for more than just the monroe pipeline, so I'd be happy to add it to all the subparsers if people agree this would be useful.
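
For what it's worth, the toolkit_workflows.py side of this would amount to threading one passthrough option into the existing subparsers; a hypothetical sketch:

  import argparse

  parser = argparse.ArgumentParser(prog="staphb-wf")
  subparsers = parser.add_subparsers(dest="workflow")
  monroe = subparsers.add_parser("monroe")
  # hypothetical passthrough for extra docker run options,
  # e.g. --runOptions "-v /path/to/custom/bed:/references"
  monroe.add_argument("--runOptions", default="", metavar="OPTS")
  args, remaining = parser.parse_known_args()
  # at launch, args.runOptions would be appended to the docker.runOptions
  # setting nextflow uses for the run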
