
jhuapl-bio / taxtriage


TaxTriage is a Nextflow workflow designed to agnostically identify and classify microbial organisms within short- or long-read metagenomic NGS data. This flexible tool was developed with various use-cases of mNGS in mind.

License: Other

HTML 0.60% Python 39.10% Nextflow 36.55% Groovy 7.74% Shell 15.55% CSS 0.03% Awk 0.44%
classification metagenomics multiqc taxonomy triage

taxtriage's People

Contributors

hannahgooden, jdratcliff, merritt-brian, raplayer, tasapl


taxtriage's Issues

Breadth (X) of Coverage Column no longer works with aggregate assembly alignment

Description of the bug

Previously, during the confidence-metrics step, we ran depth on the alignment file(s) and then reported how many positions in each chromosome or contig had X coverage. This broke when the update for aggregate assemblies was pushed.
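A minimal Python sketch of the per-reference and aggregate breadth calculation, assuming the input is `samtools depth -a` output (reference, position, depth per line for a single BAM). The `-a` flag makes samtools emit zero-depth positions so the denominator covers the whole reference; `-aa` would also include references with no aligned reads. The function name `breadth_of_coverage` and the `min_depth` threshold (the X in "Breadth (X)") are illustrative, not taken from the pipeline.

```python
from collections import defaultdict


def breadth_of_coverage(depth_lines, min_depth=1):
    """Return (per_reference_breadth, aggregate_breadth) from `samtools depth -a` rows.

    Each row is assumed to be: reference<TAB>position<TAB>depth (single BAM).
    """
    total = defaultdict(int)    # positions seen per reference
    covered = defaultdict(int)  # positions with depth >= min_depth per reference
    for line in depth_lines:
        if not line.strip():
            continue
        ref, _pos, depth = line.rstrip("\n").split("\t")[:3]
        total[ref] += 1
        if int(depth) >= min_depth:
            covered[ref] += 1
    per_ref = {ref: covered[ref] / total[ref] for ref in total}
    all_positions = sum(total.values())
    # Pool positions across every contig/chromosome for one aggregate value
    aggregate = sum(covered.values()) / all_positions if all_positions else 0.0
    return per_ref, aggregate
```

Because the aggregate is pooled over positions, a long chromosome weighs more than a short plasmid, which is usually what an assembly-level breadth should reflect.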

Command used and terminal output

No response

Relevant files

No response

System information

No response

Aggregate Multi Contig/Accession/Chromosome Stats per Sample & Taxa

Description of feature

Currently, the final confidences are reported per alignment stat for every reference in a given sample and taxon. We need to aggregate and generate stats (such as breadth of coverage and depth) for each taxon, combining stats across all of its references in the alignment.
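One way to roll per-reference rows up to a single record per sample and taxon, sketched in Python. The record keys (`sample`, `taxid`, `length`, `covered_bases`, `mean_depth`) are hypothetical stand-ins for whatever the confidence table actually stores. Breadth is pooled (total covered bases over total reference length) and depth is length-weighted, so short contigs do not dominate a multi-contig assembly.

```python
from collections import defaultdict


def aggregate_by_taxon(rows):
    """Combine per-reference alignment stats into one record per (sample, taxid).

    Each row is a dict with hypothetical keys:
    sample, taxid, length, covered_bases, mean_depth.
    """
    groups = defaultdict(lambda: {"length": 0, "covered": 0, "depth_sum": 0.0, "n_refs": 0})
    for r in rows:
        g = groups[(r["sample"], r["taxid"])]
        g["length"] += r["length"]
        g["covered"] += r["covered_bases"]
        # mean_depth * length recovers the total aligned bases on that reference
        g["depth_sum"] += r["mean_depth"] * r["length"]
        g["n_refs"] += 1

    result = {}
    for key, g in groups.items():
        length = g["length"]
        result[key] = {
            "n_refs": g["n_refs"],
            "total_length": length,
            "breadth": g["covered"] / length if length else 0.0,
            "mean_depth": g["depth_sum"] / length if length else 0.0,
        }
    return result
```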

Lots of warnings and channel.view() lines are printed to the screen while running

Description of the bug

Kelly asked me to try this workflow on some metagenomic samples that we recently sequenced. The test ran okay, but it seems there are several *.view() statements and other irregularities that get printed to the screen.

Also, I'm unsure which options to use. I currently have something running, but I thought I would share one of the warning messages that gets printed to the screen.

Command used and terminal output

WARN: Access to undefined parameter `skip_assembly` -- Initialise it to a default value eg. `params.skip_assembly = some_value`

Relevant files

No response

System information

No response

Adding Submodule - Gene prediction and annotation for mapped reads

Description of feature

Currently, Kraken2 reports a relatively large number of false positives for certain bacteria in messy sample types such as stool. Consider adding GFF annotations to ensure that CDS regions are properly mapped and that there is sufficient breadth of coverage. Also consider filtering down to only important annotations such as AMR, virulence, and pathogenicity.
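A rough Python sketch of the GFF-based check: keep only CDS features whose attribute column mentions a keyword of interest, then measure what fraction of those CDS bases are covered. The keyword list, the function names, and the `covered` structure (reference ID mapped to a set of 1-based covered positions, for example built from `samtools depth` output) are assumptions for illustration; a production AMR/virulence filter would more likely rely on curated annotation sources than on free-text matching.

```python
import re

# Hypothetical keyword list; replace with curated AMR/virulence terms
KEYWORDS = ("antimicrobial", "resistance", "virulence", "toxin")


def select_cds(gff_lines, keywords=KEYWORDS):
    """Yield (seqid, start, end, attributes) for GFF3 CDS features whose
    attribute column mentions any keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        if pattern.search(cols[8]):
            yield cols[0], int(cols[3]), int(cols[4]), cols[8]


def cds_breadth(features, covered):
    """Fraction of bases in the selected CDS that are covered.

    `covered` maps seqid -> set of 1-based covered positions.
    """
    total = hit = 0
    for seqid, start, end, _attrs in features:
        positions = covered.get(seqid, set())
        for pos in range(start, end + 1):  # GFF3 coordinates are 1-based, inclusive
            total += 1
            hit += pos in positions
    return hit / total if total else 0.0
```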

Issue running TaxTriage in basestack

Description of the bug

Hello,

I am testing TaxTriage and running into an issue that I am having trouble resolving.

I have two samples, below is my sample sheet:

sample,single_end,fastq_1,fastq_2,barcode,from,trim,platform,sequencing_summary
004,FALSE,/home/centos/Desktop/TASS/004.k2dh_1.fastq.gz,/home/centos/Desktop/TASS/004.k2dh_2.fastq.gz,FALSE,,TRUE,ILLUMINA,
005,FALSE,/home/centos/Desktop/TASS/005.k2dh_1.fastq.gz,/home/centos/Desktop/TASS/005.k2dh_2.fastq.gz,FALSE,,TRUE,ILLUMINA,

It doesn't look like it gets very far; the issue seems to be related to Nextflow starting up. Log files and env are below, let me know if there is any other info I can provide to help troubleshoot.

Thanks,
Jake

Command used and terminal output

In Basestack:
**Inputs**
Samplesheet: /home/centos/Desktop/TASS/sample_sheet.csv
Input Samplesheet contents: **Populates correctly**
Maximum CPUs: 3
Lag Time: 5
Maximum Memory: 12
Classifier database: Minikraken2
Skip De Novo Assembly: Yes
Skip QC Plotting: Yes
Resume Nextflow: Yes
Assembly File Option: NCBI Refseq Assembly
Automatically Pull Latest Code from Git Hub: Yes
Filter Database: **No selection**
Run Directory: **No selection**
Samplesheet: **No selection**
Profile: Docker

Relevant files

clientError.log
docker.log
client.log
serverError.log
server.log

System information

Version: 20.10.17
Kernel: 3.10.0-1160.71.1.el7.x86_64
Driver: overlay2
Running Containers: 0
Data: /var/lib/docker/1001.1001
Socket: /var/run/docker.sock
MemAvailable (GB): 33.57
Memory
Total Mem (GB): 33.57
Using Mem (GB): 2.78
Available Mem (GB): 30.79
Processor
CPU Brand: Xeon® E5-2680 v4
Cores: 16
Physical Cores: 16
Manufacturer: Intel®
Virtualization Support: false
System
Kernel: 3.10.0-1160.71.1.el7.x86_64
Platform: linux
Distro: CentOS Linux
Release: 7
TaxTriage
Version: v1.1

Unmapped variable "ch_spades_hmm"

Description of the bug

I'm trying to run TaxTriage and have encountered a bug involving the channels passed to the SPAdes module. The TaxTriage workflow tries to run SPADES_ILLUMINA, but the ch_spades_hmm channel it receives hasn't been assigned.

It looks like there is a config variable called spades_hmm that defaults to null. In case it's useful: I don't need to run SPAdes with a custom HMM.

Thanks!

Command used and terminal output

nextflow run ./main.nf --input data/Samplesheet.csv --db data/databases/minikraken2_v2_8GB_201904_UPDATE --outdir tmp --remove_taxids "9606" --max_memory 8GB --max_cpus 3 -profile singularity -with-report --skip-assembly

N E X T F L O W  ~  version 21.10.6
Launching `./main.nf` [nasty_babbage] - revision: ac98c18eda

WARN: Found unexpected parameters:
* --max_time: 240.h
* --config_profile_name: null
* --config_profile_url: null
* --config_profile_contact: null
* --config_profile_description: null
* --custom_config_base: https://raw.githubusercontent.com/nf-core/configs/master
* --custom_config_version: master
* --enable_conda: false
* --show_hidden_params: false
* --validate_params: true
* --help: false
* --monochrome_logs: false
* --plaintext_email: false
* --email_on_fail: null
* --publish_dir_mode: copy
* --tracedir: tmp/pipeline_info
* --skip-assembly: true
* --max_cpus: 3
* --max_memory: 8GB
* --max_multiqc_email_size: 25.MB
* --igenomes_ignore: false
* --igenomes_base: s3://ngi-igenomes/igenomes
* --genome: null
* --skip_variants: null
* --skip_consensus: null
* --subsample: null
* --top_hits_counts: null
* --spades_hmm: null
* --low_memory: null
- Ignore this warning: params.schema_ignore_params = "max_time,config_profile_name,config_profile_url,config_profile_contact,config_profile_description,custom_config_base,custom_config_version,enable_conda,show_hidden_params,validate_params,help,monochrome_logs,plaintext_email,email_on_fail,publish_dir_mode,tracedir,skip-assembly,max_cpus,max_memory,max_multiqc_email_size,igenomes_ignore,igenomes_base,genome,skip_variants,skip_consensus,subsample,top_hits_counts,spades_hmm,low_memory"



------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/taxtriage v1.0dev
------------------------------------------------------
Core Nextflow options
  runName        : nasty_babbage
  containerEngine: singularity
  launchDir      : /panfs/jay/groups/32/mdh/kuner010/taxtriage
  workDir        : /panfs/jay/groups/32/mdh/kuner010/taxtriage/work
  projectDir     : /panfs/jay/groups/32/mdh/kuner010/taxtriage
  userName       : kuner010
  profile        : singularity
  configFiles    : /panfs/jay/groups/32/mdh/kuner010/taxtriage/nextflow.config

Input/output options
  input          : data/Samplesheet.csv
  db             : data/databases/minikraken2_v2_8GB_201904_UPDATE
  trim           : null
  outdir         : tmp
  remove_taxids  : 9606

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/taxtriage for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/taxtriage/blob/master/CITATIONS.md
------------------------------------------------------
WARN: Access to undefined parameter `reference` -- Initialise it to a default value eg. `params.reference = some_value`
WARN: Access to undefined parameter `top_hits_count` -- Initialise it to a default value eg. `params.top_hits_count = some_value`
Top hits not specified, defaulting to 10 per rank level in taxonomy tree for database for kraken2
No assembly file given, downloading the standard ncbi one
WARN: Access to undefined parameter `assembly_file_type` -- Initialise it to a default value eg. `params.assembly_file_type = some_value`
/panfs/jay/groups/32/mdh/kuner010/taxtriage/data/Samplesheet.csv
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:INPUT_CHECK:SAMPLESHEET_CHECK       -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:PYCOQC                              -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:GET_ASSEMBLIES                      -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:FASTQC                              -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:NANOPLOT                            -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:MOVE_NANOPLOT                       -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:KRAKEN2_KRAKEN2                     -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:REMOVETAXIDSCLASSIFICATION          -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:TOP_HITS                            -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:DOWNLOAD_ASSEMBLY                   -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:PULL_FASTA                          -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BWA_INDEX                 -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BWA_MEM                   -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:MINIMAP2_ALIGN            -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BCFTOOLS_MPILEUP_OXFORD   -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BCFTOOLS_MPILEUP_ILLUMINA -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BCFTOOLS_STATS            -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BCFTOOLS_CONSENSUS        -
empty
No such variable: ch_spades_hmm

 -- Check script './workflows/taxtriage.nf' at line: 318 or see '.nextflow.log' file for more details

Relevant files

No response

System information

N E X T F L O W ~ version 21.10.6
HPC
slurm eventually, but was running locally here
Singularity/apptainer
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
nf-core/taxtriage v1.0.0

Porechop causes workflow to fail if nothing trims from a sample

Description of the bug

If a sample doesn't contain any trimmable reads, the file remains unchanged and no output is emitted, which causes the pipeline to fail. Make the output optional so that the workflow can continue, with the next step falling back to the original FASTQ file.
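The fallback described above, expressed as a small Python helper for clarity. In the pipeline itself this decision would live in the Nextflow process outputs and channel logic rather than in Python; `reads_for_next_step` is a hypothetical name.

```python
from pathlib import Path


def reads_for_next_step(original, trimmed):
    """Return the trimmed FASTQ if Porechop produced a non-empty file,
    otherwise fall back to the original FASTQ."""
    if trimmed:
        trimmed_path = Path(trimmed)
        if trimmed_path.is_file() and trimmed_path.stat().st_size > 0:
            return trimmed_path
    # No (or empty) trimmed output: keep the untouched input reads
    return Path(original)
```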

Command used and terminal output

No response

Relevant files

No response

System information

No response

Conda Integrated

Description of feature

Currently, only Docker and Singularity work as profiles for launching the pipeline. Bring the conda profile up to date.

bin/merge_assemblies_conf.py assumes file content, no QA check

Description of the bug

I ran a test (command below) in which the output file "BC05_flu.confidences.tsv" had a header but no data rows, so the Python script failed with this error:

Command error:
  Traceback (most recent call last):
    File "/my-local/.nextflow/assets/jhuapl-bio/taxtriage/bin/merge_assemblies_conf.py", line 108, in <module>
      main()
    File "/my-local/.nextflow/assets/jhuapl-bio/taxtriage/bin/merge_assemblies_conf.py", line 102, in main
      writer = csv.DictWriter(f, fieldnames=aggregated[0].keys(), delimiter='\t')
  IndexError: list index out of range

A check in this script before parsing the file, or a QA step before this process, could stop the workflow and report the error clearly.
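A sketch of a guarded merge step in Python. The `IndexError` comes from deriving the output columns from `aggregated[0]`, which does not exist when the input has no data rows; taking the column names from `csv.DictReader.fieldnames` (read from the header line) avoids that. The sketch assumes the output keeps the input's columns and leaves the real aggregation as a placeholder; `merge_confidences` and `fail_on_empty` are hypothetical, the latter covering the "stop the workflow with a clear error" option.

```python
import csv
import sys


def merge_confidences(in_path, out_path, fail_on_empty=False):
    """Read a confidences TSV and write the merged output.

    Output columns come from the input header, so a header-only input
    yields a header-only output instead of an IndexError.
    """
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        fieldnames = reader.fieldnames  # None if the file is completely empty
        rows = list(reader)

    if fieldnames is None:
        sys.exit(f"ERROR: {in_path} is empty (no header line)")
    if not rows and fail_on_empty:
        sys.exit(f"ERROR: {in_path} has a header but no data rows")

    aggregated = rows  # placeholder: the real per-taxon aggregation goes here

    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(aggregated)
    return len(aggregated)
```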

Command used and terminal output

nextflow run https://github.com/jhuapl-bio/taxtriage \
   --outdir tmp_viral \
   -resume \
   --input examples/Samplesheet.csv \
   --taxtab "default" \
   -r main \
   -latest \
   -profile local,singularity \
   --db /my-local/reference/kraken-databases/minusB

Relevant files

[21/6fb402] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMEN... [100%] 3 of 3
[- ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMEN... -
[- ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMEN... -
[5e/c55e0d] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMEN... [100%] 3 of 3 ✔
[a8/909432] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMEN... [100%] 4 of 4 ✔
[82/97500c] process > NFCORE_TAXTRIAGE:TAXTRIAGE:CONFIDEN... [100%] 4 of 4 ✔
[03/1f58a6] process > NFCORE_TAXTRIAGE:TAXTRIAGE:CONFIDEN... [100%] 4 of 4, failed: 1 ✘
[4a/13073e] process > NFCORE_TAXTRIAGE:TAXTRIAGE:CONVERT_... [100%] 3 of 3 ✔
[- ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:MERGE_CO... [ 0%] 0 of 1
[01/01eeeb] process > NFCORE_TAXTRIAGE:TAXTRIAGE:CUSTOM_D... [100%] 1 of 1 ✔
[- ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:MULTIQC -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/taxtriage] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_TAXTRIAGE:TAXTRIAGE:CONFIDENCE_MERGE (BC05_flu)'

Caused by:
Process NFCORE_TAXTRIAGE:TAXTRIAGE:CONFIDENCE_MERGE (BC05_flu) terminated with an error exit status (1)

Command executed:

merge_assemblies_conf.py \
    -i BC05_flu.confidences.tsv \
    -o BC05_flu.fullconfidences.tsv

cat <<-END_VERSIONS > versions.yml
"NFCORE_TAXTRIAGE:TAXTRIAGE:CONFIDENCE_MERGE":
python3: $(python3 --version | sed 's/Python //g')
END_VERSIONS

Command exit status:
1

Command output:
(empty)

Command error:
Traceback (most recent call last):
  File "/my-local/.nextflow/assets/jhuapl-bio/taxtriage/bin/merge_assemblies_conf.py", line 108, in <module>
    main()
  File "/my-local/.nextflow/assets/jhuapl-bio/taxtriage/bin/merge_assemblies_conf.py", line 102, in main
    writer = csv.DictWriter(f, fieldnames=aggregated[0].keys(), delimiter='\t')
IndexError: list index out of range

System information

  N E X T F L O W
  version 23.10.0 build 5889
  created 15-10-2023 15:07 UTC (11:07 EDT)
  cite doi:10.1038/nbt.3820
  http://nextflow.io

local (no job scheduler)

singularity version 3.8.7-1.el7

Operating System: CentOS Stream 8
CPE OS Name: cpe:/o:centos:centos:8
Kernel: Linux 4.18.0-536.el8.x86_64
Architecture: x86-64

Tested on the main branch of https://github.com/jhuapl-bio/taxtriage and also on the v1.2.2 tag (git checkout tags/v1.2.2).

confidences.merged_mqc.tsv is not incorporated into the MultiQC report

Description of the bug

Alignment results appear in the sample-specific .tsv files in the "convert" directory and are then merged into "confidences.merged_mqc.tsv" in the "merge" directory. However, the data are not incorporated into the MultiQC report, which displays only the explanations for the table headers.

The pipeline was run by cloning the repository's main branch and running the command below.

Screenshot of the MultiQC report (image not preserved).

Command used and terminal output

nextflow run main.nf -profile singularity,sge -work-dir /scicomp/scratch/ukm9/taxtriage -c /scicomp/reference/nextflow/configs/cdc.config --input data/Validation_data/0343_samples.csv --db /scicomp/groups/OID/NCEZID/DSR/BCFB/by-project/ukm9_TaxTriage/kraken2db/kraken2_MinusB --outdir  output/Validation_data_0343_MinusB --remove_taxids '"9606"' --demux --skip_assembly

Relevant files

No response

System information

Nextflow - version 22.10.06
Hardware - HPC
Executor - SGE
Container - Singularity
OS - CentOS

Remove the split VCF by contig

Description of feature

When VCF stats are generated for scaffolds (usually 80+ accessions), this step becomes unwieldy. Remove it entirely for now, as it is generally unnecessary.

Pipeline fails at Porechop step

Description of the bug

I am testing taxtriage on CLI with ONT metagenomics data and having issues with porechop. 4/5 samples were pre-processed (low complexity and quality filtering, adaptor clipping) using another pipeline and I left one sample unprocessed to test with taxtriage. The pipeline fails at the porechop step for the unprocessed sample (Sample65). The pipeline works fine when trim is set to "FALSE" for all samples.

Command used and terminal output

Nextflow command: 
nextflow run jhuapl-bio/taxtriage --input samplesheet.csv --outdir taxtriage_out2 -profile docker --db /Volumes/IDGenomics_NAS/Data/kraken2_plus_pf16gb/k2_pluspf_16gb_20231009 -r main -latest --reference_assembly --remove_taxids \'9606\' -resume

Samplesheet:
sample,platform,fastq_1,fastq_2,sequencing_summary,trim
Sample14_UT-P2S01293-240126,OXFORD,taxprofiler/taxprofiler_out/analysis_ready_fastqs/Sample14_UT-P2S01293-240126_run1_filtered.fastq.gz,,,TRUE
Sample15_UT-P2S01293-240126,OXFORD,taxprofiler/taxprofiler_out/analysis_ready_fastqs/Sample15_UT-P2S01293-240126_run1_filtered.fastq.gz,,,TRUE
Sample59_UT-P2S01293-240126,OXFORD,taxprofiler/taxprofiler_out/analysis_ready_fastqs/Sample59_UT-P2S01293-240126_run1_filtered.fastq.gz,,,TRUE
Sample62_UT-P2S01293-240126,OXFORD,taxprofiler/taxprofiler_out/analysis_ready_fastqs/Sample62_UT-P2S01293-240126_run1_filtered.fastq.gz,,,TRUE
Sample65_UT-P2S01293-240126,OXFORD,combined/Sample65_UT-P2S01293-240126.fastq.gz,,,TRUE

Error:
Error executing process > 'NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)'

Caused by:
  Process `NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)` terminated with an error exit status (137)

Command executed:

  porechop \
      -i Sample65_UT-P2S01293-240126.fastq.gz \
      -t 12 \
       \
      -o Sample65_UT-P2S01293-240126.fastq.gz
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP":
      porechop: $( porechop --version )
  END_VERSIONS

Command exit status:
  137

Command output:
  
  Loading reads
  Sample65_UT-P2S01293-240126.fastq.gz

Command error:
  
  Loading reads
  Sample65_UT-P2S01293-240126.fastq.gz
  .command.sh: line 6:    29 Killed                  porechop -i Sample65_UT-P2S01293-240126.fastq.gz -t 12 -o Sample65_UT-P2S01293-240126.fastq.gz

Work dir:
  /Volumes/BioNGS_1/UT-P2S01293-240126/UT-P2S01293-240126/20240126_2026_P2S-01293-A_PAS21742_bded1596/work/ea/e089eaf142e00437c60686780fa8de

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
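Exit status 137 is worth decoding: by shell convention a status above 128 means the process was terminated by signal (status - 128), here 137 - 128 = 9, which is SIGKILL. Porechop printed no error of its own, which is consistent with the process being killed from outside, most often for exceeding available memory (for example the kernel OOM killer or a Docker memory limit); the log alone does not confirm that. A small Python helper (hypothetical name) for reading such statuses:

```python
import signal


def describe_exit_status(status):
    """Interpret a shell exit status; values above 128 usually mean the
    process was terminated by signal number (status - 128)."""
    if status > 128:
        try:
            sig = signal.Signals(status - 128)
        except ValueError:  # not a known signal number on this platform
            return f"exited with status {status}"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {status}"


print(describe_exit_status(137))
```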

Relevant files

No response

System information

Nextflow version (version 23.04.1)

Improvement: Speed of SPLIT_VCF

Description of feature

bin/split_vcf.py decompresses the single vcf.gz file emitted by the channel, then splits its records into individual files by the Kraken taxid in the accession column (the second field when split on "|"). Python 3 is too slow for this; consider switching to gawk.
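A single-pass sketch of the split in Python, shown mainly to pin down the record logic that any faster rewrite (gawk or otherwise) has to reproduce. Assumptions: the CHROM column is shaped like `prefix|<taxid>|<accession>` (for example `kraken:taxid|562|NC_000913.3`), so the taxid is the second `|`-separated field; header lines are copied to every output; each output file is opened once and kept open, which can hit the open-file limit when there are very many taxids. `split_vcf_by_taxid` is a hypothetical name, not the existing `bin/split_vcf.py` interface.

```python
import gzip
from pathlib import Path


def split_vcf_by_taxid(vcf_gz, outdir):
    """Split a gzip/bgzip-compressed VCF into one plain-text VCF per taxid.

    Returns the set of taxids written.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    header, handles = [], {}
    try:
        with gzip.open(vcf_gz, "rt") as vcf:
            for line in vcf:
                if line.startswith("#"):  # VCF headers precede all records
                    header.append(line)
                    continue
                chrom = line.split("\t", 1)[0]
                parts = chrom.split("|")
                taxid = parts[1] if len(parts) > 1 else "unknown"
                fh = handles.get(taxid)
                if fh is None:  # open each output once, not once per record
                    fh = open(outdir / f"{taxid}.vcf", "w")
                    fh.writelines(header)
                    handles[taxid] = fh
                fh.write(line)
    finally:
        for fh in handles.values():
            fh.close()
    return set(handles)
```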

RE-implement BLAST refinement strategies

Description of feature

Previously, BLAST was functional at the end of the pipeline (after top hits/re-alignment) but was computationally expensive. Consider automated subsampling by hit type and sequence diversity to tune the final reporting metrics.
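A minimal Python sketch of the subsampling idea: group reads by a label such as taxid or hit type and take at most `per_group` from each before running BLAST, using a seeded RNG so runs are reproducible. This only spreads the sample across groups; sampling by sequence diversity (for example after clustering) would need an extra step. All names are illustrative.

```python
import random
from collections import defaultdict


def stratified_subsample(reads, key, per_group=50, seed=0):
    """Pick at most `per_group` records from each group.

    `reads` is any iterable of records; `key` maps a record to a
    string group label (e.g. taxid or hit type).
    """
    groups = defaultdict(list)
    for r in reads:
        groups[key(r)].append(r)
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    picked = []
    for label in sorted(groups):
        members = groups[label]
        if len(members) <= per_group:
            picked.extend(members)
        else:
            picked.extend(rng.sample(members, per_group))
    return picked
```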

Confidence Metrics fails for tsv files with mismatched column count

Description of the bug

If the merged confidences file has mismatched column counts, the table is not populated.

This is caused by the taxid not mapping during the merge step.
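A quick Python check that reports rows whose column count differs from the header, which could run before the merge so the failure is explicit rather than an empty table. `find_ragged_rows` is a hypothetical helper; it returns 1-based line numbers with the header as line 1 and assumes no quoted fields span multiple lines.

```python
import csv


def find_ragged_rows(lines, delimiter="\t"):
    """Return (line_number, n_columns) for every non-empty row whose
    column count differs from the header's."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []
    expected = len(header)
    # reader.line_num is the number of source lines consumed so far,
    # i.e. the current row's line number for single-line records
    return [
        (reader.line_num, len(row))
        for row in reader
        if row and len(row) != expected
    ]
```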

Command used and terminal output

No response

Relevant files

No response

System information

No response

top_hits.nf seems to have the wrong number of arguments

Description of the bug

It seems like some recent changes to the top hits report generation may have introduced some sort of mis-specified array:

Here's my error:

ERROR nextflow.extension.OperatorImpl - @unknown
groovy.lang.MissingMethodException: No signature of method: Script_861716d0$_runScript_closure1$_closure2$_closure22.call() is applicable for argument types: (ArrayList) values: [[[id:230029461_WB, single_end:false, platform:ILLUMINA, fastq_1:/home/mdh/shared/taxtriage/231005_test/samples/230029461.Illumina.kraken.dehosted.1.fastq.gz, ...], ...]]

The log file truncates it but grabbing it from the slurm output:

ERROR ~ Invalid method invocation call with arguments: [[id:230029461_WB, single_end:false, platform:ILLUMINA, fastq_1:/home/mdh/shared/taxtriage/231005_test/samples/230029461.Illumina.kraken.dehosted.1.fastq.gz, fastq_2:/home/mdh/shared/taxtriage/231005_test/samples/230029461.Illumina.kraken.dehosted.2.fastq.gz, trim:false, directory:false, sequencing_summary:null], /panfs/jay/groups/32/mdh/shared/taxtriage/231005_test/work/66/89eb4288f373a6ad78f6d9d8079efd/230029461_WB.top_report.tsv, [/panfs/jay/groups/32/mdh/shared/taxtriage/231005_test/work/a0/587a499c14235a2d369c0e418fca23/230029461_WB.classified_1.fastq.gz, /panfs/jay/groups/32/mdh/shared/taxtriage/231005_test/work/a0/587a499c14235a2d369c0e418fca23/230029461_WB.classified_2.fastq.gz], /panfs/jay/groups/32/mdh/shared/taxtriage/231005_test/work/4e/923481f65a775d1ffa8f53afbea9bb/230029461_WB.output.references.fasta] (java.util.ArrayList) on _closure22 type

I haven't had a chance to do much digging, but the addition of the $2 variable in the top_hits.nf module might be breaking things?

Command used and terminal output

nextflow run /home/mdh/shared/software_modules/taxtriage/1.2.0/main.nf -c /home/mdh/shared/software_modules/taxtriage/1.2.0/mdh.config --input samples//Samplesheet.csv --db /home/mdh/shared/software_modules/kraken/kraken2_databases/k2_standard_230605/ --outdir tt_out --email [email protected] --tmpdir /tmp --remove_taxids '"9606"' --max_memory 248GB --max_cpus 16 --skip_assembly FALSE --top_hits_count 50 --demux -profile singularity -with-report tt_out/tt_test_231005_report.html -with-dag ./work/tt_test_231005_taxtriage.html -resume

Relevant files

Here's the command.sh from the work directory where this breaks:

#!/bin/bash -euo pipefail
echo 230029460_WB "-----------------META variable------------------"
get_top_hits.py \
    -i "230029460_WB.filtered.report" \
    -o 230029460_WB.top_report.tsv \
    -t 50

awk -F '\t' -v id=230029460_WB \
    'BEGIN{OFS="\t"} { if (NR==1){ print "Sample_Taxid", $2, $1, $4, $6} else { $5 = id"_"$5; print $5, $2, $1, $4, $6 }}' 230029460_WB.top_report.tsv > 230029460_WB.krakenreport_mqc.tsv

cat <<-END_VERSIONS > versions.yml
"NFCORE_TAXTRIAGE:TAXTRIAGE:TOP_HITS":
python: $(python --version | sed 's/Python //g')
END_VERSIONS

nextflow.log

System information

Nextflow version 24.04.2
Hardware HPC, Desktop, Cloud
Executor slurm
Container engine: Singularity
OS CentOS Linux
Version of nf-core/taxtriage 1.2.0

Workflow didn't finish

Description of the bug

I likely didn't create the sample sheet correctly or use the correct options. Sorry! I've attached the .nextflow.log as well as my sample sheet.

I ran the taxtriage test profile, and that worked okay. I decided to use it on "real" samples. They are two metagenomic samples from milk. The workflow was running until it wasn't.

Command used and terminal output

$ nextflow run jhuapl-bio/taxtriage --input samplesheet.csv --outdir taxtriage -profile docker -resume -with-tower --db /Volumes/IDGenomics_NAS/Data/kraken2_db/2023-06-05 -r main

# top of stdout was cutoff due to screen limitations
No assembly file given, downloading the standard ncbi one
-[nf-core/taxtriage] Pipeline completed with errors- URL: https://tower.nf/user/erin-olde/watch/2BJYwM7lmUfLva
WARN: Killing running tasks (1)
Monitor the execution with Nextflow Tower using this URL: https://tower.nf/user/erin-olde/watch/2BJYwM7lmUfLva
executor >  local (10)
[6e/47237e] process > NFCORE_TAXTRIAGE:TAXTRIAGE:DOWNLOAD_TAXTAB                                     [100%] 1 of 1 ✔
[3b/275b6b] process > NFCORE_TAXTRIAGE:TAXTRIAGE:GET_ASSEMBLIES                                      [100%] 1 of 1 ✔
[ce/e9d577] process > NFCORE_TAXTRIAGE:TAXTRIAGE:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)     [100%] 1 of 1 ✔
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ARTIC_GUPPYPLEX                                     -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:PYCOQC                                              -
[0d/9e27ec] process > NFCORE_TAXTRIAGE:TAXTRIAGE:FASTP (sample58)                                    [100%] 1 of 1
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:FASTQC                                              -
[b7/67ad35] process > NFCORE_TAXTRIAGE:TAXTRIAGE:NANOPLOT (sample59)                                 [100%] 1 of 1
[98/187187] process > NFCORE_TAXTRIAGE:TAXTRIAGE:KRAKEN2_KRAKEN2 (sample59)                          [100%] 1 of 1
[6b/8c754c] process > NFCORE_TAXTRIAGE:TAXTRIAGE:VISUALIZE_REPORTS:KRONA_KTIMPORTTAXONOMY (sample59) [100%] 1 of 1
[26/c0417b] process > NFCORE_TAXTRIAGE:TAXTRIAGE:TOP_HITS (sample59)                                 [100%] 1 of 1
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:MERGEDKRAKENREPORT                                  -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:FILTERKRAKEN                                        -
[fe/ee62ee] process > NFCORE_TAXTRIAGE:TAXTRIAGE:DOWNLOAD_ASSEMBLY (sample59)                        [100%] 1 of 1
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:SAMTOOLS_FAIDX                            [  0%] 0 of 1
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BOWTIE2_BUILD                             -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BOWTIE2_ALIGN                             -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:MINIMAP2_ALIGN                            [  0%] 0 of 1
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:SAMTOOLS_DEPTH                            -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:SAMTOOLS_INDEX                            -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:SAMTOOLS_COVERAGE                         -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BCFTOOLS_MPILEUP_OXFORD                   -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BCFTOOLS_MPILEUP_ILLUMINA                 -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:SPLIT_VCF                                 -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BCFTOOLS_INDEX                            -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BCFTOOLS_STATS                            -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:BCFTOOLS_CONSENSUS                        -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:ALIGNMENT:RSEQC_BAMSTAT                             -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:CONFIDENCE_METRIC                                   -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:CONVERT_CONFIDENCE                                  -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:MERGE_CONFIDENCE                                    -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:SPADES_ILLUMINA                                     -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:FLYE                                                -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:CUSTOM_DUMPSOFTWAREVERSIONS                         -
[-        ] process > NFCORE_TAXTRIAGE:TAXTRIAGE:MULTIQC                                             -
ERROR ~ Invalid method invocation `call` with arguments: [[id:sample59, single_end:true, platform:OXFORD, fastq_1:combined/sample59.fastq.gz, fastq_2:, trim:true, directory:false, sequencing_summary:null], /Volumes/BioNGS_1/UT-P2S01293-231116/UT-P2S01293-231116/20231116_2158_P2S-01293-B_PAS24124_ea91207e/work/26/c0417b071c657866ebeaf1d0173dc5/sample59.top_report.tsv, [/Volumes/BioNGS_1/UT-P2S01293-231116/UT-P2S01293-231116/20231116_2158_P2S-01293-B_PAS24124_ea91207e/work/98/187187a25512238459d3c4d3cb052c/sample59.classified.fastq.gz], /Volumes/BioNGS_1/UT-P2S01293-231116/UT-P2S01293-231116/20231116_2158_P2S-01293-B_PAS24124_ea91207e/work/fe/ee62eefbb15ce2841320b077ee4f00/sample59.output.references.fasta] (java.util.ArrayList) on _closure24 type

 -- Check '.nextflow.log' file for details

WARN: Tower request field `workflow.errorMessage` exceeds expected size | offending value: `No signature of method: Script_91470e4e8cb8ea83$_runScript_closure1$_closure3$_closure24.call() is applicable for argument types: (ArrayList) values: [[[id:sample59, single_end:true, platform:OXFORD, fastq_1:combined/sample59.fastq.gz, ...], ...]]
Possible solutions: any(), any(), each(groovy.lang.Closure), tap(groovy.lang.Closure), any(groovy.lang.Closure), each(groovy.lang.Closure)`, size: 386 (max: 255)

Relevant files

This is the .nextflow.log file:
nextflow.log

And my sample sheet:
samplesheet.csv

System information

N E X T F L O W version 23.10.0 build 5889
Hardware : Local
Executor : local
Container engine: Docker
OS : CentOS
Version of jhuapl-bio/taxtriage : current version (pulled 11/22/2023)

Add preliminary step: Host Removal via Alignment

Description of feature

Instead of offering only Kraken2 for de-hosting, provide minimap2/bowtie2 alignment as an option and save the unaligned reads to a separate gzipped FASTQ file (.fastq.gz).

Additional parameters to add:

  • Host reference FASTA file (single file)
  • List of taxids (space-separated integers) to pull from NCBI (use the RefSeq assembly file to map each taxid to its genome(s)).
  • Enable/disable stats on host read removal. Add a MultiQC section with clear headers describing these stats and distinguishing them from the downstream post-Kraken2 alignment steps.
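A Python sketch of the taxid-list option: parse the space-separated integers and look them up in NCBI's RefSeq assembly summary. The column positions used here (accession in column 1, `taxid` in column 6, `species_taxid` in column 7, `ftp_path` in column 20) are assumed from the `assembly_summary_refseq.txt` layout and should be checked against the file's header line; `parse_taxids` and `host_assemblies` are hypothetical names.

```python
def parse_taxids(text):
    """Parse a space-separated list of integer taxids, e.g. '9606 10090'."""
    taxids = []
    for token in text.split():
        if not token.isdigit():
            raise ValueError(f"taxid must be a positive integer, got {token!r}")
        taxids.append(int(token))
    return taxids


def host_assemblies(summary_lines, taxids):
    """Map each requested taxid to a list of (assembly_accession, ftp_path).

    Assumed layout: tab-separated, '#' comment lines, 0-based indices
    0 = assembly_accession, 5 = taxid, 6 = species_taxid, 19 = ftp_path.
    """
    wanted = set(taxids)
    hits = {t: [] for t in taxids}
    for line in summary_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 20:
            continue
        for idx in (5, 6):  # match on taxid first, then species_taxid
            tid = int(cols[idx]) if cols[idx].isdigit() else None
            if tid in wanted:
                hits[tid].append((cols[0], cols[19]))
                break  # avoid adding the same assembly twice
    return hits
```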
