
Artic Network SARS-CoV-2 Analysis

Run the ARTIC SARS-CoV-2 methodology on multiplexed MinION, GridION, and PromethION data.

Introduction

The wf-artic workflow implements a slightly modified ARTIC FieldBioinformatics workflow for the purpose of preparing consensus sequences from SARS-CoV-2 genomes that have been DNA sequenced using a pooled tiling amplicon strategy.

The workflow consumes a folder containing demultiplexed sequence reads as prepared by either MinKNOW or Guppy. The workflow needs to know the primer scheme that was used during genome amplification and library preparation, e.g. ARTIC/V3 or ONT_Midnight/V1. Other parameters can be specified too, e.g. to assign sample names to the barcodes or to adjust the length distribution of acceptable amplicon sequences.

Compute requirements

Recommended requirements:

  • CPUs = 4
  • Memory = 8GB

Minimum requirements:

  • CPUs = 2
  • Memory = 4GB

Approximate run time: 5 minutes per sample

ARM processor support: False

Install and run

These are instructions to install and run the workflow on the command line. You can also access the workflow via the EPI2ME Desktop application.

The workflow uses Nextflow to manage compute and software resources, therefore Nextflow will need to be installed before attempting to run the workflow.

The workflow can currently be run using either Docker (https://www.docker.com/products/docker-desktop) or Singularity to provide isolation of the required software. Both methods are automated out-of-the-box provided either Docker or Singularity is installed. This is controlled by the -profile parameter as exemplified below.

It is not required to clone or download the git repository in order to run the workflow. More information on running EPI2ME workflows can be found on our website.

The following command can be used to obtain the workflow. This will pull the repository into the assets folder of Nextflow and provide a list of all parameters available for the workflow as well as an example command:

nextflow run epi2me-labs/wf-artic --help

To update a workflow to the latest version on the command line use the following command:

nextflow pull epi2me-labs/wf-artic

A demo dataset is provided for testing of the workflow. It can be downloaded and unpacked using the following commands:

wget https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-artic/wf-artic-demo.tar.gz
tar -xzvf wf-artic-demo.tar.gz

The workflow can then be run with the downloaded demo data using:

nextflow run epi2me-labs/wf-artic \
	--fastq 'wf-artic-demo/fastq' \
	--sample_sheet 'wf-artic-demo/sample_sheet.csv' \
	--scheme_name 'SARS-CoV-2' \
	--scheme_version 'Midnight-ONT/V3' \
	-profile standard

For further information about running a workflow on the command line see https://labs.epi2me.io/wfquickstart/

Related protocols

This workflow is designed to take input sequences that have been produced from Oxford Nanopore Technologies devices.

The Midnight protocol for sample preparation and sequencing can be found in the Nanopore community.

Input example

This workflow accepts FASTQ files as input.

The FASTQ input parameters for this workflow accept one of three cases: (i) the path to a single FASTQ; (ii) the path to a top-level directory containing FASTQ files; (iii) the path to a directory containing one level of sub-directories which in turn contain FASTQ files. In the first and second cases (i and ii), a sample name can be supplied with --sample. In the last case (iii), the data is assumed to be multiplexed with the names of the sub-directories as barcodes. In this case, a sample sheet can be provided with --sample_sheet.

(i)                     (ii)                 (iii)
input_reads.fastq   ─── input_directory  ─── input_directory
                        ├── reads0.fastq     ├── barcode01
                        └── reads1.fastq     │   ├── reads0.fastq
                                             │   └── reads1.fastq
                                             ├── barcode02
                                             │   ├── reads0.fastq
                                             │   ├── reads1.fastq
                                             │   └── reads2.fastq
                                             └── barcode03
                                                 └── reads0.fastq

Input parameters

Input Options

  • fastq (string): FASTQ files to use in the analysis. This accepts one of three cases: (i) the path to a single FASTQ file; (ii) the path to a top-level directory containing FASTQ files; (iii) the path to a directory containing one level of sub-directories which in turn contain FASTQ files. In the first and second cases, a sample name can be supplied with --sample. In the last case, the data is assumed to be multiplexed with the names of the sub-directories as barcodes, and a sample sheet can be provided with --sample_sheet.
  • analyse_unclassified (boolean; default: False): Analyse unclassified reads from the input directory. By default the workflow will not process reads in the unclassified directory. If selected, and if the input is a multiplex directory, the workflow will also process the unclassified directory.

Primer Scheme Selection

  • scheme_name (string; default: SARS-CoV-2): Primer scheme name. This should be set to SARS-CoV-2, spike-seq, or your custom scheme name. This affects the choice of scheme versions you can use. The only scheme versions compatible with spike-seq are ONT/V1 and ONT/V4.1.
  • scheme_version (string; default: ARTIC/V3): Primer scheme version. This is the version of the primer scheme to use; more details about primer schemes can be found here.
  • custom_scheme (string): Path to a custom scheme. If you have a custom primer scheme you can enter the details here. This must be the full path to the directory containing your appropriately named scheme bed and fasta files: <SCHEME_NAME>.bed and <SCHEME_NAME>.fasta. More details here.

Sample Options

  • sample_sheet (string): A CSV file used to map barcodes to sample aliases. The sample sheet can be provided when the input data is a directory containing sub-directories with FASTQ files. The sample sheet is a CSV file with, minimally, columns named barcode and alias. Extra columns are allowed. A type column is required for certain workflows and should have one of the following values: test_sample, positive_control, negative_control, no_template_control. (An example sheet follows this list.)
  • sample (string): A single sample name for non-multiplexed data. Permissible if passing a single .fastq(.gz) file or directory of .fastq(.gz) files.
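For illustration, a minimal sample sheet could be created as follows (the barcode, alias, and type values shown are hypothetical):

cat > sample_sheet.csv << EOF
barcode,alias,type
barcode01,sample01,test_sample
barcode02,sample02,test_sample
barcode03,ntc01,no_template_control
EOF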

Output Options

  • out_dir (string; default: output): Directory for output of all workflow results.

Reporting Options

  • report_depth (integer; default: 100): Minimum depth for percentage coverage (e.g. 89% of genome covered at > report_depth).
  • report_clade (boolean; default: True): Show results of Nextclade analysis in report.
  • report_coverage (boolean; default: True): Show genome coverage traces in report.
  • report_lineage (boolean; default: True): Show results of Pangolin analysis in report.
  • report_variant_summary (boolean; default: True): Show variant information in report.

Advanced Options

  • artic_threads (number; default: 4): Number of CPU threads to use per artic task. The total CPU resource used by the workflow is constrained by the executor configuration.
  • pangolin_threads (number; default: 4): Number of CPU threads to use per pangolin task. The total CPU resource used by the workflow is constrained by the executor configuration.
  • genotype_variants (string): Report genotyping information for the scheme's known variants of interest; optionally provide a file path as the argument.
  • list_schemes (boolean; default: False): List primer schemes and exit without running analysis (an example follows this list).
  • min_len (number): Minimum read length (default: set by scheme).
  • max_len (number): Maximum read length (default: set by scheme).
  • max_softclip_length (integer): Remove reads with alignments showing soft clipping above this length.
  • update_data (boolean; default: True): Update Pangolin and Nextclade data at runtime.
  • pangolin_options (string): Pass options to Pangolin, for example "--analysis-mode fast --min-length 26000".
  • nextclade_data_tag (string): The tag of the Nextclade data packet.
  • normalise (integer; default: 200): Depth ceiling for depth-of-coverage normalisation.
  • override_basecaller_cfg (string): Override the auto-detected basecaller model that processed the signal data; used to select an appropriate Medaka model. By default, the workflow tries to determine the basecall model from the input data. This parameter can be used to override the detected value (or to provide a model name if none was found in the inputs). However, users should only do this if they know for certain which model was used, as selecting the wrong option might give sub-optimal results. A list of recent models can be found here: https://github.com/nanoporetech/dorado#DNA-models.
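For example, the available primer schemes can be listed without running any analysis:

nextflow run epi2me-labs/wf-artic --list_schemes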

Miscellaneous Options

  • lab_id (string): Laboratory identifier, used in reporting.
  • testkit (string): Test kit identifier, used in reporting.

Outputs

Output files may be aggregated, including information for all samples, or provided per sample. Per-sample files will be prefixed with respective aliases and represented below as {{ alias }}.

  • Workflow report (./wf-artic-report.html): Report for all samples. [aggregated]
  • Consensus sequences (./all_consensus.fasta): Final consensus sequences for all samples in the analysis. [aggregated]
  • Pangolin results (./lineage_report.csv): Pangolin results for each of the samples in the analysis. [aggregated]
  • Nextclade results (./nextclade.json): Nextclade results for each of the samples in the analysis. [aggregated]
  • Coverage data (./all_depth.txt): Coverage of the reference genome in 20-base windows for all samples in the analysis. [aggregated]
  • Variants (./{{ alias }}.pass.named.vcf.gz): A VCF file containing high-confidence variants in the sample when compared to the reference. [per-sample]
  • Variants index (./{{ alias }}.pass.named.vcf.gz.tbi): An index file for the variants. [per-sample]
  • Alignments (./{{ alias }}.primertrimmed.rg.sorted.bam): A BAM file containing the reads for the sample aligned to the reference. [per-sample]
  • Alignments index (./{{ alias }}.primertrimmed.rg.sorted.bam.bai): An index file for the alignments. [per-sample]

Pipeline overview

The pipeline is largely a wrapper around the Artic Network Field Bioinformatics analysis package.

1. Concatenate input files and generate per-read stats

The fastcat/bamstats tool is used to concatenate multi-file samples to be processed by the workflow. It also outputs per-read stats, including average read lengths and qualities. Reads are additionally filtered for sequence length and quality characteristics.
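For illustration, a fastcat invocation of the kind the workflow issues (visible verbatim in logs quoted later on this page) looks like the following; per-read stats are written to barcode01.stats while the concatenated reads stream to stdout:

fastcat -s barcode01 -r barcode01.stats -x barcode01 > /dev/null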

2. Mapping and primer trimming (Artic)

Concatenated reads are mapped to the reference SARS-CoV-2 genome using minimap2. A primer scheme-specific BED file is used to identify the regions of the mapped sequences that correspond to synthetic sequences (primers) - these regions are clipped to ensure that sequences are entirely of biological origin.
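A minimal sketch of the mapping step, assuming minimap2 and samtools are on the PATH and using hypothetical file names (the workflow itself drives this, together with BED-based primer trimming, through the artic tooling):

minimap2 -ax map-ont SARS-CoV-2.reference.fasta sample01.fastq \
	| samtools sort -o sample01.sorted.bam -
samtools index sample01.sorted.bam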

3. Variant calling and consensus generation (Artic)

The retained sequences are used to prepare a consensus sequence, which is then polished using Medaka. Variant calling is then performed to produce a VCF file of genetic differences relative to the reference genome.
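Inside the workflow this step is driven by the artic package; a hedged sketch of a stand-alone equivalent is below (flag names vary between artic versions, the scheme identifier and file names are hypothetical, and the Medaka model is only an example taken from elsewhere on this page):

artic minion --medaka \
	--medaka-model r941_min_hac_g507 \
	--normalise 200 \
	--threads 4 \
	--scheme-directory primer_schemes \
	--read-file sample01.fastq \
	SARS-CoV-2/V3 sample01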

4. Lineage/clade assignment

The consensus sequence is annotated for virus clade information using Nextclade, and strain assignment is performed using Pangolin.
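For example, Pangolin can be run directly on the combined consensus file, mirroring what the workflow does internally (a bare pangolin consensus.fasta invocation appears in logs later on this page; the --outfile flag is optional):

pangolin all_consensus.fasta --outfile lineage_report.csv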

Troubleshooting

  • If the workflow fails please run it with the demo dataset to ensure the workflow itself is working. This will help us determine if the issue is related to the environment, input parameters, or a bug.
  • See how to interpret some common Nextflow exit codes here.

FAQs

If your question is not answered here, please report any issues or suggestions on the GitHub issues page or start a discussion on the community.

Related blog posts

See the EPI2ME website for lots of other resources and blog posts.

wf-artic's People

Contributors

akspurr, cjw85, julibeg, mattdmem, nrhorner, renzotale88, samstudio8, sarahjeeeze, vlshesketh


wf-artic's Issues

Allele frequency threshold for failed vcf

Hi @mattdmem @cjw85 ,

We saw some missing sites (N) in the genome, and we found those missing sites in the fail.vcf. These sites seem to be minor variants; their allele frequencies ranged from around 0.4 to 0.89.

May we know where we can find the information about the allele frequency threshold that classifies variants into fail.vcf when using the SUP basecalling model?

Thank you very much!

Eddie

medaka_model is not a valid enum value

By running this command

nextflow run epi2me-labs/wf-artic -profile standard -w OUTPUT_DIRECTORY/work --samples /sample_sheet --fastq FASTQ_FOLDER --out_dir OUTPUT_DIRECTORY --scheme_version V1200 --medaka_model r941_min_hac_g507

the following issue appears:

  • --medaka_model: r941_min_hac_g507 is not a valid enum value (r941_min_hac_g507)

How can we resolve this, please?

Analysis Failed but Coverage is Working

After running the Nextflow workflow, which appeared to complete successfully, the sequencing report seems to show some issues.

The reads were successfully mapped, but the analysis failed, which seems to indicate most of the tools did not actually work.

I am using N E X T F L O W ~ version 21.10.0 with the latest release of wf-artic (v0.3.5) and one of your test files.

Any thoughts as to what the issue might be?


Incorrectly masking sites in primer regions

We are seeing variants that look like they are being incorrectly masked (particularly for an Omicron sample).
This example shows an AF of 21/30 calls for T (variant) vs C or deletion. It is in a TTT triplet, so it is understandable why the accuracy is suffering, as the live basecaller does not call 3 repeats with as high accuracy as shorter repeats.
It looks to me like this is the same position as the ncov-2019_27_right primer. I have seen this occur at least 5 other times in positions that overlap a primer in this Omicron sample. It seems like the workflow will mask variants in these primer regions if the AF is lower than 50%, because the primer's positions are being counted as "untrimmed". I have seen this happen on 30 other samples (but only in 1-2 positions).

Incorporating the latest pango-designations

Hi,

Thank you for maintaining this pipeline!

Is there a way of incorporating the latest pango-designations 2022-01-05 while running nextflow run epi2me-labs/wf-artic -profile standard?

It appears that running nextflow run epi2me-labs/wf-artic -profile standard --pangolin_version 3.1.17 will use pango-designation 2021-12-06.

/Linda

Demultiplexed samples not found in data mount

Hello,

I am having some trouble when using the SARS-CoV-2 Analysis Workflow on Ubuntu 18.04. When trying to merge the previously demultiplexed FASTQ files to generate a summary, no files can be found (see attached picture), as they do not appear in the selected data mount. Usually the files would be listed on the left of the screen together with the three folders. I have checked the path and could not find any issues there, although I did notice that EPI2ME is creating a new folder named "epi2melabs-data" containing the three aforementioned folders. In previous versions, the additional folder would not have been created, so the generated folders were found in the same directory as the samples.

Both EPI2ME Labs (version 3.1.1) and the environment (version 1.1.25) are up to date.

Would love some input on how to fix this.
Thanks!

Andrei


conflict of SNP and deletion

Hi Team,

I would like to inquire about a conflict in variant calling. Please refer to the attached picture for my case. In my sample, although there is a drop in coverage approximately from pos 21982 to 21993 and a SNP (G21987A) was found, the variant caller only called a 9-base deletion starting from pos 21975, instead of some positions in front of pos 21975. So, my question is how the caller determines there is a 9-base deletion for this sample. Thank you very much.


Heads up for when you update Nextclade to v1.10.0

I wanted to give you a small heads up. I found your repo through a quick code search, because it doesn't use --input-dataset for Nextclade runs. Thus if you update to v1.10.0 (it was released yesterday), you will have to add a line. We recommend using --input-dataset, but you can also add --input-virus-properties explicitly.

See this issue for a detailed explanation: nextstrain/nextclade#703

You need to make a change like here: broadinstitute/viral-pipelines@97fd339

Sorry for the trouble.

error while executing pipeline with custom_scheme parameter

Hi,

I ran into a problem while running the latest version of the workflow (v0.3.13) with the --custom_scheme parameter. It just displays the parameters used in the pipeline, followed by the error "No such variable: c_purple".

The Nextflow log seems to point out that the variable c_purple doesn't exist (sorry if this doesn't make sense, Groovy errors are hard to parse):

Apr-21 15:20:28.134 [main] DEBUG nextflow.Session - Session aborted -- Cause: No such property: c_purple for class: nextflow.script.WorkflowBinding
Apr-21 15:20:28.145 [main] ERROR nextflow.cli.Launcher - @unknown
groovy.lang.MissingPropertyException: No such property: c_purple for class: nextflow.script.WorkflowBinding
    at groovy.lang.Binding.getVariable(Binding.java:56)
    at nextflow.script.WorkflowBinding.getVariable(WorkflowBinding.groovy:132)
    at groovy.lang.Binding.getProperty(Binding.java:116)
    at nextflow.script.WorkflowBinding.getProperty(WorkflowBinding.groovy:121)
    at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:194)
    at groovy.lang.Closure.getPropertyTryThese(Closure.java:320)
    at groovy.lang.Closure.getPropertyDelegateFirst(Closure.java:310)
    at groovy.lang.Closure.getProperty(Closure.java:296)
    at org.codehaus.groovy.runtime.callsite.PogoGetPropertySite.getProperty(PogoGetPropertySite.java:49)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGroovyObjectGetProperty(AbstractCallSite.java:341)
    at Script_e058cee8$_runScript_closure20$_closure59.doCall(Script_e058cee8:602)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
    at groovy.lang.Closure.call(Closure.java:412)
    at groovy.lang.Closure.call(Closure.java:406)
    at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:186)
    at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:170)
    at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:52)
    at nextflow.script.ChainableDef$invoke_a.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
    at nextflow.script.BaseScript.runDsl2(BaseScript.groovy:191)
    at nextflow.script.BaseScript.run(BaseScript.groovy:200)
    at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:221)
    at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:212)
    at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:120)
    at nextflow.cli.CmdRun.run(CmdRun.groovy:308)
    at nextflow.cli.Launcher.run(Launcher.groovy:480)
    at nextflow.cli.Launcher.main(Launcher.groovy:639)

I triggered this using a command similar to this:

nextflow run epi2me-labs/wf-artic -profile docker -c docker.config --fastq /absolute/path/to/Run12/run12/run12/fastq_pass/ --scheme_name SARS-CoV-2 --custom_scheme /absolute/path/to/custom_scheme_folder/ --min_len 150 --max_len 1200 --sample_sheet /absolute/path/to/sample_sheet_wf-artic-run12.csv --medaka_model r941_min_fast_variant_g507 --out_dir /absolute/path/to/output_wf-artic --report_depth 100 --report_name Run12

Any hint will be appreciated! thanks :)

UCX package

We have this error,
In addition, the UCX support is also built but disabled by default.
To enable it, first install UCX (conda install -c conda-forge ucx). Then, set the environment
variables OMPI_MCA_pml="ucx" OMPI_MCA_osc="ucx" before launching your MPI processes.
Equivalently, you can set the MCA parameters in the command line:
mpiexec --mca pml ucx --mca osc ucx ...
Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via UCX.
Please consult UCX's documentation for detail.

However, I notice that ucx is not installed in the 1.1.14 image.
It is impossible to install it because the node.js 16 package is not present in the epi2melabs repo.

It's a shame not to be able to use CUDA on a $50M GridION.

Thanks
Best regards

epi2me wf-artic for other viruses

Hi Guys,

Thanks a million for this amazing pipeline - one request/question. I know this has all been developed for COVID, and this is a huge priority for all of us, but is there any chance of a way to use this pipeline for other viruses? With the original artic pipeline, one of the amazing things was that we could all create primers and generate amplicons for our virus of interest, and by placing the primer file and the reference file in the correct location it was possible to run the pipeline for our virus of interest. Of course the output report is tailored to SARS, but even if we could just get the fasta output and the BAM files, this would increase the utility of wf-artic and allow all of us to use it for other applications. Especially for the midnight kit it would be absolutely amazing to be able to test and use it for amplicon sequencing of other viruses!

Thanks a lot in advance!

Liana

Naming convention

What is the allowed naming convention for the barcode folders? I would like to name them something besides just barcode01, but it seems to cause an issue when I do.

"--samples" in v0.3.11

Hi,
I have switched from v0.3.9 to v0.3.11. It is not clear to me what changes were made to how the sample sheet is provided. When I include "--sample_sheet samplesheet.csv" in my command, the pipeline gives an error, and when I keep it as "--samples samplesheet.csv", as used for v0.3.9, the pipeline works. However, it goes through all barcode directories (96), not only those included in the sample sheet.

How can I fix it?

best,
Nilay

[request] Avoid using colons in directory names

Hello,
While cloning the wf-artic repo, I got the error:
fatal: cannot create directory at 'data/nextclade/datasets/sars-cov-2/references/MN908947/versions/2021-06-25T00:00:00Z': Invalid argument
This is because the filesystem where I keep my data is NTFS, and in NTFS the colon character is not allowed.

If it is not a big change, I would suggest avoiding characters that are invalid under NTFS (or ext4). Even if the workflows are meant to be run on Linux or WSL, someone may want to check out the repo on a Windows machine, or work on a Linux machine with an NTFS partition.

Thanks.

pipeline:runARTIC failing

nextflow run epi2me-labs/wf-artic -c ../my_config.cfg --fastq fastq_pass --scheme_name SARS-CoV-2 --scheme_version V1200 --min_len 300 --max_len 1000 --genotype_variants ../variants.vcf --report_clade TRUE --report_lineage FALSE
N E X T F L O W ~ version 20.10.0
Launching epi2me-labs/wf-artic [nasty_leavitt] - revision: d875912 [master]
Core Nextflow options
revision : master
runName : nasty_leavitt
containerEngine : docker
launchDir : /seq/Development/Projects/covid/nanopore/runs/gridion/FAQ73004
workDir : /seq/Development/Projects/covid/nanopore/runs/gridion/FAQ73004/work
projectDir : /home/[email protected]/.nextflow/assets/epi2me-labs/wf-artic
userName : ron.m.kagan
profile : standard
configFiles : /home/[email protected]/.nextflow/assets/epi2me-labs/wf-artic/nextflow.config, /seq/Development/Projects/covid/nanopore/runs/gridion/FAQ73004/../my_config.cfg

Basic Input/Output Options
fastq : fastq_pass
genotype_variants: ../variants.vcf

Primer Scheme Selection
scheme_version : V1200

Advanced options
min_len : 300
max_len : 1000
normalise : 200

Reporting Options
report_lineage : false

!! Only displaying parameters that differ from the pipeline defaults !!

If you use wf-artic for your analysis please cite:

  • The nf-core framework
    https://doi.org/10.1038/s41587-020-0439-x

Checking fastq input.
Barcoded directories detected.
executor > local (109)
[a4/8477ac] process > pipeline:getVersions [100%] 1 of 1 ✔
[42/25972b] process > pipeline:getParams [100%] 1 of 1 ✔
[6f/a3442e] process > pipeline:copySchemeDir [100%] 1 of 1 ✔
[40/a1941a] process > pipeline:preArticQC (93) [100%] 96 of 96 ✔
[43/1827b7] process > pipeline:runArtic (21) [ 1%] 1 of 88
[- ] process > pipeline:combineDepth -
[- ] process > pipeline:allConsensus -
[- ] process > pipeline:allVariants -
[- ] process > pipeline:genotypeSummary -
[- ] process > pipeline:combineGenotypeSummaries -
[51/98b751] process > pipeline:prep_nextclade [100%] 1 of 1 ✔
[- ] process > pipeline:nextclade -
[- ] process > pipeline:pangolin -
[- ] process > pipeline:telemetry -
[- ] process > pipeline:report -
[- ] process > output -
/seq/Development/Projects/covid/nanopore/runs/gridion/FAQ73004/work/5a/5ee3941bec41eab45a78561788d5b1/barcode83.consensus.fasta
Error executing process > 'pipeline:genotypeSummary (1)'

Caused by:
Unknown variable 'csvName' -- Make sure it is not misspelt and defined somewhere in the script before using it

Source block:
def lab_id = params.lab_id ? "--lab_id ${params.lab_id}" : ""
def testkit = params.testkit ? "--testkit ${params.testkit}" : ""
"""
genotype_summary.py \
    -b $bam \
    -v $vcf \
    -d reference.vcf \
    --sample $sample_id \
    $lab_id \
    $testkit \
    -o ${csvName}.genotype.csv
"""

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

mismatch between "V1200" max_len and min_len settings hardcoded in main.nf and nanopore community documentation

The Nanopore community documentation recommends max_len=1100 and min_len=200 for the midnight "V1200" scheme.
However, the hardcoded scheme in the main.nf of this GitHub repository sets max_len=1200 and min_len=150.

Which max_len and min_len values should be used?

https://community.nanoporetech.com/protocols/pcr-tiling-of-sars-cov-2-virus-with-rapid-barcoding-and-midnight/v/mrt_9127_v110_revg_14jul2021/downstream-analysis-and-expected-results?devices=minion

if (!params.max_len) {
    params.remove('max_len')
    if (params.scheme_version == "V1200") {
        params._max_len = 1200
    } else {
        params._max_len = 700
    }
}

pangolin and nextclade versions

Hello,

We have just successfully analysed data using the latest release of wf-artic (v0.3.5). We noticed, however, that the pangolin and nextclade versions used are older versions of both.

The current versions are:
pangolin version 3.1.16
nextclade 1.5.1

Can these be updated and how can we make sure the updated versions of each are being used?

Thanks,

Liana

guppy version for --medaka_model

Hi,

After successfully running the pipeline on the provided test data, I'm trying to explore data we generated using a MinION Mk1C. We tried to figure out what version of Guppy was installed on it and found:
ont-guppy-for-mk1c: 4.3.4

Thus I used this option in the command:
--medaka_model r941_min_fast_g434

But got this message:
Unknown configuration profile: 'medaka_model=r941_min_fast_g434'

I looked for a list of the available values for the option --medaka_model by running:
nextflow run epi2me-labs/wf-artic --help
and
nextflow run epi2me-labs/wf-artic --medaka_model --help
but only found the default value.

Could you help me figure out where to find such a list please?

Thanks :)

bin/scheme_to_nextclade.py generates incorrect primers for Midnight-ONT_V3

The Midnight-ONT V3 primers collectively have 3 substitutions and 1 deletion relative to the MN908947.3 (Wuhan-Hu-1/2019) reference sequence, affecting primers 21R, 22R, 23L, and 24L. The result is that these substitutions and deletions are not properly addressed when the primer sequences are generated by the following script:
bin/scheme_to_nextclade.py

The proposed fix would be to modify the reference sequence in the following file to reflect these substitutions and deletions, so that the bed/reference files generate the correct primer sequences:
/data/primer_schemes/SARS-CoV-2/Midnight-ONT/V3/SARS-CoV-2.reference.fasta
Because of the deletion in 21R, the ranges of the subsequent bed file primer positions (22L to 29R) would have to be shifted in this file:
/data/primer_schemes/SARS-CoV-2/Midnight-ONT/V3/SARS-CoV-2.scheme.bed

error with samples

Hi,

I am quite new to the MinION field and I wanted to just test the pipeline. No problem with installation. I already have one demultiplexed FASTQ file, and I made the sample sheet myself exactly as in your test data; my command line is below. For some reason it says the sample is missing. No log file is created, though it says to check the log file.

Any help appreciated.

nextflow run epi2me-labs/wf-artic -w home/Out -profile standard --fastq test.fastq --samples sample_sheet --out_dir
N E X T F L O W ~ version 21.04.1
Launching epi2me-labs/wf-artic [jolly_montalcini] - revision: 55e68f6 [master]

Parameter summary

help: false
out_dir: true
fastq: test.fastq
samples: sample_sheet
report_depth: 100
medaka_model: r941_min_high_g360
scheme_name: SARS-CoV-2
scheme_version: V3
genotype_variants: null
detect_samples: false
report_clade: true
report_lineage: true
report_coverage: true
report_variant_summary: true
wfversion: v0.2.3
aws_image_prefix: null
aws_queue: null
lab_id: null
testkit: null
_min_len: 400
_max_len: 700

No such variable: samples

-- Check script '/home/.nextflow/assets/epi2me-labs/wf-artic/main.nf' at line: 503 or see '.nextflow.log' file for more details

Warning message

Good Morning.
We are receiving a Warning message.

Checking fastq input.
Warning: ' --sample_sheet' given but single non-barcode directory found. Ignoring.

Please, what kind of warning is this?
Thanks

Clean install of wf-artic 0.3.9

Hi Matt,
I need a clean install of 0.3.9, so that when I run nextflow info epi2me-labs/wf-artic I only see versions up to 0.3.9 and not 0.3.10.
I know that I can get the latest version and run a previous version with the appropriate flag, but that is not what I want at this moment.
I have tried to delete wf-artic v0.3.10 from the /usr/local/bin/ folder, but when I download the zip file wf-artic0.3.9 and transfer the wf-artic folder into /usr/local/bin/ and then run the info command again, I still get the latest version.
I appreciate any help.

normalise value for artic minion is hardcoded

The normalise value for artic minion is hardcoded. For troubleshooting it is nice to get a primer-trimmed BAM file without downsampling, as it will tell you the ratio between the different primers without capping them at 200. Also, the normalisation value may need to be changed based on the basecalling method in use.

max_len setting

Hi, I am wondering how to set max_len to a number longer than what the primer scheme has.
In my case, it is V1200 and I would like to include reads around 1400 bp, but I get an error when I try to set it.
Thanks.
/Nilay

Error executing process > 'pipeline:report'

Hello! I found a problem with the input file. If I give a single FASTQ to the --fastq parameter, I see the error below, but if I give a directory containing a single FASTQ it works. However, the documentation states that it is possible to pass either a single FASTQ or a directory.

(artic) krivonos_dv@r740gpu2:~/runs/PROJECTS/COVID$ nextflow run  epi2me-labs/wf-artic --fastq ./merged_reads/barcode01.fastq --scheme_version ARTIC/V4 --out_dir TEST_ASSAY/ --sample ./merged_reads/barcode01.fastq --scheme_name SARS-CoV-2
N E X T F L O W  ~  version 22.04.0
Launching `https://github.com/epi2me-labs/wf-artic` [agitated_babbage] DSL2 - revision: 218aa1d6d0 [master]

WARN: Found unexpected parameters:
* --scheme_dir: primer_schemes
- Ignore this warning: params.schema_ignore_params = "scheme_dir"

Core Nextflow options
  revision       : master
  runName        : agitated_babbage
  containerEngine: docker
  launchDir      : /mnt/iscsidisk1/runs/runs-krivonos/PROJECTS/COVID
  workDir        : /mnt/iscsidisk1/runs/runs-krivonos/PROJECTS/COVID/work
  projectDir     : /home/krivonos_dv/.nextflow/assets/epi2me-labs/wf-artic
  userName       : krivonos_dv
  profile        : standard
  configFiles    : /home/krivonos_dv/.nextflow/assets/epi2me-labs/wf-artic/nextflow.config

Basic Input/Output Options
  out_dir        : TEST_ASSAY/
  fastq          : ./merged_reads/barcode01.fastq
  sample         : ./merged_reads/barcode01.fastq

Primer Scheme Selection
  scheme_version : ARTIC/V4

Advanced options
  normalise      : 200

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use epi2me-labs/wf-artic for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x




      ------------------------------------
      Available Primer Schemes:
      ------------------------------------

  Name          Version
  spike-seq     ONT/V4.1
  spike-seq     ONT/V1
  SARS-CoV-2    ARTIC/V3
  SARS-CoV-2    ARTIC/V4
  SARS-CoV-2    ARTIC/V2
  SARS-CoV-2    ARTIC/V4.1
  SARS-CoV-2    ARTIC/V1
  SARS-CoV-2    NEB-VarSkip/v2b
  SARS-CoV-2    NEB-VarSkip/v1a
  SARS-CoV-2    NEB-VarSkip/v1a-long
  SARS-CoV-2    NEB-VarSkip/v2
  SARS-CoV-2    Midnight-IDT/V1
  SARS-CoV-2    Midnight-ONT/V3
  SARS-CoV-2    Midnight-ONT/V2
  SARS-CoV-2    Midnight-ONT/V1

      ------------------------------------

WARN: Access to undefined parameter `detect_samples` -- Initialise it to a default value eg. `params.detect_samples = some_value`
Checking fastq input.
Single file input detected.
executor >  local (7)
[aa/c51244] process > handleSingleFile (1)    [100%] 1 of 1 ✔
[81/b0891b] process > pipeline:getVersions    [100%] 1 of 1 ✔
[fb/4ddaed] process > pipeline:getParams      [100%] 1 of 1 ✔
[94/fbb340] process > pipeline:copySchemeDir  [100%] 1 of 1 ✔
[30/9a57a7] process > pipeline:preArticQC (1) [100%] 1 of 1, failed: 1 ✘
[-        ] process > pipeline:runArtic (1)   -
[-        ] process > pipeline:combineDepth   -
[-        ] process > pipeline:allConsensus   -
[-        ] process > pipeline:allVariants    -
[ba/d1ce66] process > pipeline:prep_nextclade [100%] 1 of 1 ✔
[-        ] process > pipeline:nextclade      -
[-        ] process > pipeline:pangolin       -
[-        ] process > pipeline:telemetry      -
[-        ] process > pipeline:report         -
[-        ] process > output                  -
Error executing process > 'pipeline:preArticQC (1)'

Caused by:
  Process `pipeline:preArticQC (1)` terminated with an error exit status (139)

Command executed:

  fastcat -s ./merged_reads/barcode01.fastq -r ./merged_reads/barcode01.fastq.stats -x barcode01 > /dev/null

Command exit status:
  139

Command output:
  (empty)

Command error:
  .command.sh: line 2:    27 Segmentation fault      (core dumped) fastcat -s ./merged_reads/barcode01.fastq -r ./merged_reads/barcode01.fastq.stats -x barcode01 > /dev/null

Work dir:
  /mnt/iscsidisk1/runs/runs-krivonos/PROJECTS/COVID/work/30/9a57a7079be8610757a592f2a02807

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Failed to generate pipeline report

Hello,
I ran this workflow with the following command:
nextflow run epi2me-labs/wf-artic -c /media/MinION/MinIT_Data/Positive_std_artic/nf_config.cfg -w ${OUTPUT}/workspace -profile standard --fastq /media/MinION/MinIT_Data/APL_V1_V20/gfast_demux --samples /media/MinION/MinIT_Data/ARTIC_Analysis/APL_V1_V20/APL_V1_V20_sample_sheet.txt --out_dir ${OUTPUT}

All steps of the pipeline completed successfully except for pipeline:report, which failed with the error shown below:

executor > local (55)
[fa/36a705] process > checkSampleSheet (1) [100%] 1 of 1 ✔
[a5/83c00b] process > pipeline:copySchemeDir [100%] 1 of 1 ✔
[6b/913534] process > pipeline:preArticQC (19) [100%] 22 of 22 ✔
[8e/7ff09f] process > pipeline:runArtic (20) [100%] 22 of 22 ✔
[fd/b4fb58] process > pipeline:allConsensus [100%] 1 of 1 ✔
[e5/d9cb28] process > pipeline:allVariants [100%] 1 of 1 ✔
[95/f3b5aa] process > pipeline:nextclade [100%] 1 of 1 ✔
[d3/96b99a] process > pipeline:pangolin [100%] 1 of 1 ✔
[b6/fcc7ef] process > pipeline:report [100%] 1 of 1, failed: 1 ✘
[b3/7e1f6a] process > output (3) [100%] 4 of 4 ✔
Error executing process > 'pipeline:report'

Caused by:
Process pipeline:report terminated with an error exit status (1)

Command executed:

echo "--pangolin pangolin.csv"
echo "--nextclade nextclade.json"
report.py consensus_status.txt wf-artic-report.html --pangolin pangolin.csv --nextclade nextclade.json --revision master --commit 5ecadc6 --min_len 400 --max_len 700 --report_depth 100 --depths depth_stats/* --summaries read_stats/* --bcftools_stats vcf_stats/*

Command exit status:
1

Command output:
--pangolin pangolin.csv
--nextclade nextclade.json

Command error:
Traceback (most recent call last):
File "/home/.nextflow/assets/epi2me-labs/wf-artic/bin/report.py", line 330, in <module>
main()
File "/home/.nextflow/assets/epi2me-labs/wf-artic/bin/report.py", line 204, in main
p = lines.line(
File "/home/epi2melabs/conda/lib/python3.8/site-packages/aplanat/lines.py", line 30, in line
return simple(
File "/home/epi2melabs/conda/lib/python3.8/site-packages/aplanat/base.py", line 82, in simple
p.y_range = Range1d(
File "/home/epi2melabs/conda/lib/python3.8/site-packages/bokeh/models/ranges.py", line 145, in __init__
super().__init__(**kwargs)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/bokeh/model.py", line 236, in __init__
super().__init__(**kwargs)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/bokeh/core/has_props.py", line 269, in __init__
setattr(self, name, value)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/bokeh/core/has_props.py", line 298, in __setattr__
super().__setattr__(name, value)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/bokeh/core/property/descriptors.py", line 552, in __set__
self._internal_set(obj, value, setter=setter)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/bokeh/core/property/descriptors.py", line 784, in _internal_set
value = self.property.prepare_value(obj, self.name, value)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/bokeh/core/property/bases.py", line 350, in prepare_value
raise ValueError(f"failed to validate {obj_repr}.{name}: {error}")
ValueError: failed to validate Range1d(id='1237', ...).bounds: expected either None or a value of type MinMaxBounds(Auto, Tuple(Float, Float), Tuple(Nullable(Float), Float), Tuple(Float, Nullable(Float)), Tuple(TimeDelta, TimeDelta), Tuple(Nullable(TimeDelta), TimeDelta), Tuple(TimeDelta, Nullable(TimeDelta)), Tuple(Datetime, Datetime), Tuple(Nullable(Datetime), Datetime), Tuple(Datetime, Nullable(Datetime))), got (0.0, 0.0)

Do you have any suggestions to get the report generated?
Thanks,
Scott

Process `pipeline:preArticQC (1)` terminated with an error exit status (127)

Hi,
Can anyone help me with error (127)? Detailed information is as follows. Thanks!

$ /home/grid/program/nextflow run epi2me-labs/wf-artic --scheme_name SARS-CoV-2 --scheme_version V1200 --min_len 200 --max_len 1100 --our_dir midnight --fastq sars_cov_2_reads/ -resume
N E X T F L O W ~ version 21.10.6
Launching epi2me-labs/wf-artic [drunk_mercator] - revision: 8d13aab [master]

WARN: Found unexpected parameters:

  • --our_dir: midnight
  • Ignore this warning: params.schema_ignore_params = "our_dir"

Core Nextflow options
revision : master
runName : drunk_mercator
containerEngine: docker
launchDir : /home/grid/data/wf_artic
workDir : /home/grid/data/wf_artic/work
projectDir : /home/grid/.nextflow/assets/epi2me-labs/wf-artic
userName : grid
profile : standard
configFiles : /home/grid/.nextflow/assets/epi2me-labs/wf-artic/nextflow.config

Basic Input/Output Options
fastq : sars_cov_2_reads/

Primer Scheme Selection
scheme_version : V1200

Advanced options
min_len : 200
max_len : 1100

!! Only displaying parameters that differ from the pipeline defaults !!

If you use wf-artic for your analysis please cite:

Checking input directory structure.
Found barcode directories
executor > local (3)
[ae/d15473] process > pipeline:getVersions [100%] 1 of 1, failed: 1 ✘
[be/7ad0fc] process > pipeline:getParams [100%] 1 of 1, cached: 1 ✔
[- ] process > pipeline:copySchemeDir -
[24/9dd3be] process > pipeline:preArticQC (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > pipeline:runArtic -
[- ] process > pipeline:allConsensus -
[- ] process > pipeline:allVariants -
[- ] process > pipeline:nextclade -
[- ] process > pipeline:pangolin -
[- ] process > pipeline:telemetry -
[- ] process > pipeline:report -
[- ] process > output -
Error executing process > 'pipeline:preArticQC (1)'

Caused by:
Process pipeline:preArticQC (1) terminated with an error exit status (127)

Command executed:

fastcat -s barcode01 -r barcode01.stats -x barcode01 > /dev/null

Command exit status:
127

Command output:
(empty)

Command error:
/bin/bash: .command.run: No such file or directory

Work dir:
/home/grid/data/wf_artic/work/24/9dd3be42bf36ee9ba8124473a00c7e

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Not recognising sample sheet on a MacBook Pro

Hi,
I am trying to run the pipeline on a MacBook Pro, where it runs successfully.
But after several tries, I am not able to make it recognise the sample sheet.
I have converted the sample sheet to txt, CSV, and exec files, but with no success.
Do you have any suggestions?

Succeeded run but incorrect output

Hi. I have installed v0.3.10 and I have changed the parameters as in the updated documentation. Although the pipeline executes successfully, I do not get the report outputs; I am getting only the work folder. I am also attaching 3 screenshots. Thanks for your help.

process runArtic silently failing

Hi,

I am running your pipeline using the conda profile like this:

nextflow run epi2me-labs/wf-artic --scheme_name SARS-CoV-2 --scheme_version V1200 --min_len 200 --max_len 1100 --out_dir $output_dir --fastq $fastq_dir -profile conda --samples $samples -work-dir $workdir

And frequently got consensus sequences containing only one "N" for all barcodes, although the coverage seems to be decent for most samples. However, for some reason this seemed to happen randomly depending on the $workdir I specified.

Looking at the closed issues on your repo and into the working directories of the failing pipeline, I was able to track the error down. I'm working behind a proxy, which I did not specify in my working environment, only in my testing environment as an environment variable. So basically medaka failed (similar to #21) because it was not able to download the required model files (while conda takes the global proxy_servers.http setting for setting up the conda-env).

The error message was:

Traceback (most recent call last):
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/site-packages/medaka/medaka.py", line 35, in __call__
    model_fp = medaka.models.resolve_model(val)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/site-packages/medaka/models.py", line 66, in resolve_model
    raise DownloadError(
medaka.models.DownloadError: The model file for r941_prom_variant_g360 is not already installed and could not be downloaded. Check you are connected to the internet and try again.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/site-packages/medaka/medaka.py", line 686, in main
    args = parser.parse_args()
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 1768, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 1800, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 1988, in _parse_known_args
    positionals_end_index = consume_positionals(start_index)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 1965, in consume_positionals
    take_action(action, args)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 1874, in take_action
    action(self, namespace, argument_values, option_string)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 1159, in __call__
    subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 1800, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 2006, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 1946, in consume_optional
    take_action(action, args, option_string)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/argparse.py", line 1874, in take_action
    action(self, namespace, argument_values, option_string)
  File "/srv/sequencer/MiSeqAnalysis_OUT/work/conda/epi2melabs-nf-artic-08dd4ae442f90d9110476d0a93da01f2/lib/python3.8/site-packages/medaka/medaka.py", line 38, in __call__
    raise RuntimeError(msg.format(self.dest, str(e)))
RuntimeError: Error validating model from '--model' argument: The model file for r941_prom_variant_g360 is not already installed and could not be downloaded. Check you are connected to the internet and try again..
Running hacked up minion.py

However, since the process script is out-sourced to the bin/run_artic.sh file, and there is no error handling in this script, it will always return an exit status of 0, and the Nextflow process never fails as long as the output files are present. This makes debugging quite difficult for users with no experience in Nextflow. I assume that issue #9 was also related to this underlying problem.

Maybe you can think of a cleaner solution like the errorStrategy ignore directive?
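One possible shape for such a fix, sketched here as an assumption rather than the maintainers' solution, is to make the wrapper script abort on the first failing command so that Nextflow sees a non-zero exit status:

#!/bin/bash
# hypothetical hardening of a wrapper like bin/run_artic.sh: exit on the
# first failing command, error on unset variables, and propagate failures
# through pipelines so Nextflow can mark the task as failed
set -euo pipefail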

error with creating pangolin environment step when using conda

It cannot create the pangolin conda environment when it gets to this step. Have test

Running on a node on our cluster that's running Ubuntu 20.04 (focal) in a KVM.

(base) callum@dgt-gpu2:~$ nextflow -version

      N E X T F L O W
      version 21.10.6 build 5660
      created 21-12-2021 16:55 UTC (22-12-2021 01:55 JDT)
      cite doi:10.1038/nbt.3820
      http://nextflow.io

git clone https://github.com/epi2me-labs/wf-artic

Running example data set from this wf-artic repository

(base) callum@dgt-gpu2:~$ nextflow run epi2me-labs/wf-artic -w my_artic_output/workspace -profile conda --fastq wf-artic/test_data/sars-samples-demultiplexed --samples wf-artic/test_data/sample_sheet --out_dir my_artic_output

Running with conda, as I have issues running Docker containers because we cannot run with such privileges on our server.

N E X T F L O W  ~  version 21.10.6
Launching `epi2me-labs/wf-artic` [serene_ardinghelli] - revision: d875912881 [master]

WARN: Found unexpected parameters:
* --samples: wf-artic/test_data/sample_sheet
- Ignore this warning: params.schema_ignore_params = "samples" 

Core Nextflow options
  revision   : master
  runName    : serene_ardinghelli
  launchDir  : /home/callum
  workDir    : /home/callum/my_artic_output/workspace
  projectDir : /home/callum/.nextflow/assets/epi2me-labs/wf-artic
  userName   : callum
  profile    : conda
  configFiles: /home/callum/.nextflow/assets/epi2me-labs/wf-artic/nextflow.config

Basic Input/Output Options
  out_dir    : my_artic_output
  fastq      : wf-artic/test_data/sars-samples-demultiplexed

Advanced options
  normalise  : 200

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use wf-artic for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x



Checking fastq input.
Barcoded directories detected.
Warning: Excluding directories not containing .fastq(.gz) files:
   - /home/callum/wf-artic/test_data/sars-samples-demultiplexed/barcode13
executor >  local (15)
[03/5a3323] process > pipeline:getVersions    [100%] 1 of 1 ✔
[16/8ca38e] process > pipeline:getParams      [100%] 1 of 1 ✔
[78/8f820d] process > pipeline:copySchemeDir  [100%] 1 of 1 ✔
[90/f85fb0] process > pipeline:preArticQC (2) [100%] 2 of 2 ✔
[83/e713e5] process > pipeline:runArtic (1)   [100%] 2 of 2 ✔
[b2/3ad293] process > pipeline:combineDepth   [100%] 1 of 1 ✔
[47/671bb0] process > pipeline:allConsensus   [100%] 1 of 1 ✔
[05/5ffb11] process > pipeline:allVariants    [100%] 1 of 1 ✔
[2f/e165b5] process > pipeline:prep_nextclade [100%] 1 of 1 ✔
[-        ] process > pipeline:nextclade      -
[76/8a40ee] process > pipeline:pangolin       [100%] 1 of 1, failed: 1 ✘
[-        ] process > pipeline:telemetry      -
[-        ] process > pipeline:report         -
[da/590670] process > output (1)              [100%] 1 of 1
/home/callum/my_artic_output/workspace/59/af49e5a08f55a48fbd1a23f781f35d/barcode01.consensus.fasta
/home/callum/my_artic_output/workspace/83/e713e5ae66f70ae672eef1a3657484/barcode02.consensus.fasta
Creating env using mamba: bioconda::pangolin=3.1.20 conda-forge::git [cache /home/callum/my_artic_output/workspace/conda/env-a349ca5a7dd11bbbf1f68583e35a8e59]
Error executing process > 'pipeline:pangolin'

Caused by:
  Process `pipeline:pangolin` terminated with an error exit status (1)

Command executed:

  if [ "false" == "true" ]
  then
    pangolin --update
  fi
  
  pangolin --all-versions 2>&1 | sed 's/: /,/' > pangolin.version
  pangolin consensus.fasta

Command exit status:
  1

Command output:
  (empty)

Command wrapper:
  Not a conda environment: /home/callum/my_artic_output/workspace/conda/env-a349ca5a7dd11bbbf1f68583e35a8e59

Work dir:
  /home/callum/my_artic_output/workspace/76/8a40ee027e5fa0f902284630721077

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

ResolvePackageNotFound: nextclade-cli[version='>=0.13.0']

Hi,

I am trying to run the pipeline using conda, as described in https://labs.epi2me.io/wfquickstart/
I git clone and cd into the repo, define a value for OUTPUT and use the following command:
nextflow run epi2me-labs/wf-artic \
-w ${OUTPUT}/workspace \
-profile conda \
--fastq test_data/sars-samples-demultiplexed/ \
--samples test_data/sample_sheet \
--out_dir ${OUTPUT}

But I get:
Error executing process > 'checkSampleSheet (1)'

Caused by:
Failed to create Conda environment
command: conda env create --prefix /Users/jl/Tools/wf-artic/onTenteLeCoup/workspace/conda/epi2melabs-nf-artic-863315fcdb46d01447deb2e70bf6b9e4 --file /Users/jl/Tools/wf-artic/environment.yaml
status : 1
message:
ResolvePackageNotFound:
- nextclade-cli[version='>=0.13.0']

Could you help me solve this please?
Thanks!

Pangolin version update

Hi

Thank you for developing this pipeline! It is amazing, especially the data presentation!

Since pangolin has been updated to v3.1.17, may I know whether there will be an updated version of wf-artic in the coming days?

Thank you very much!

Docker image can't be built from Dockerfile

I am getting the following error when building the Docker image. I split up the RUN statement to focus on the error region:
Step 6/8 : RUN . $CONDA_DIR/etc/profile.d/mamba.sh && micromamba activate && micromamba clean --all --yes
---> Running in 2e684fa6976a
The following arguments were not expected: --yes --all clean
Run with --help for more information.

It looks like micromamba version 0.7.13 gets installed automatically and is missing the clean subcommand:
Subcommands:
shell Generate shell init scripts
create Create new environment
install Install packages in active environment
remove Remove packages from active environment
list List packages in active environment
constructor Commands to support using micromamba in constructor

It looks like versions 0.13.1 and later have the clean command (April 23 2021 commit), though micromamba's versioning is less than clear.
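A possible workaround (a hedged sketch, not the repository's actual fix) is to install a pinned micromamba release that includes the clean subcommand; the version-pinned download endpoint below is an assumption based on the pattern in the micromamba install docs:

# Hedged sketch: fetch a specific micromamba release (>= 0.13.1, which has
# `clean`) instead of whatever version the build pulls in by default.
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/0.13.1 \
    | tar -xvj -C /usr/local bin/micromamba
/usr/local/bin/micromamba --version   # should report 0.13.1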

MinIONs cannot keep up with demultiplexing in real time

In all of our experiments, MinIONs cannot keep up with live basecalling and demultiplexing in real time. When this happens, the software stops live basecalling and demultiplexing and sends all the reads to the fast5_skip folder. Every one of our runs has sent >99% of reads to fast5_skip.

When we turn off demultiplexing, live basecalling can keep up and the reads are sent to fastq_pass or fastq_fail. Although they are not demultiplexed, we at least get a lot of the computational processing out of the way.

With any sequencer there is a possibility that the software does not keep up and reads are sent to fast5_skip.

This workflow does not address that issue, as it assumes an error-free run and will therefore potentially fail to consider those reads.

This means we have to build a pipeline to pre-process the reads (basecall and/or demultiplex). We would need an optional demultiplexing step in this pipeline so that we can continue to use it with our MinIONs.

Additionally, it would be nice to see an example of the software and arguments needed to fulfill the following criteria (guppy_barcoder?); see the sketch after this quote:
"The Midnight protocol uses a rapid barcoding kit; it is therefore important to note that the demultiplexing step must not require barcodes at both ends of the sequence. It is also not necessary to filter against mid-strand barcodes."

WARN: Found unexpected parameters: --scheme_dir: primer_schemes, cannot run test_data

Dear wf-artic dev team & fellow users,

I cannot run the test_data, nor my own data, although the Docker test (docker run hello-world) works, as does the command:

nextflow run epi2me-labs/wf-artic --help --show_hidden_params
N E X T F L O W ~ version 22.04.4
Launching https://github.com/epi2me-labs/wf-artic [deadly_joliot] DSL2 - revision: 218aa1d [master]
Typical pipeline command:

nextflow run epi2me-labs/wf-artic \
--fastq test_data/sars-samples-demultiplexed \
--sample_sheet test_data/sample_sheet.csv

I tried a couple of different parameters, but the following warning always appears first:
WARN: Found unexpected parameters:

  • --scheme_dir: primer_schemes

Then the process hangs on the first barcode:

nextflow run epi2me-labs/wf-artic -profile standard --fastq test_data/fastq/ --sample_sheet test_data/sample_sheet.csv
N E X T F L O W ~ version 22.04.4
Launching https://github.com/epi2me-labs/wf-artic [reverent_spence] DSL2 - revision: 218aa1d [master]

WARN: Found unexpected parameters:

  • --scheme_dir: primer_schemes
  • Ignore this warning: params.schema_ignore_params = "scheme_dir"

Core Nextflow options
revision : master
runName : reverent_spence
containerEngine: docker
launchDir : /home/.../wf-artic/test
workDir : /home/xyz/BioinformaticsPrograms/wf-artic/test/work
projectDir : /home/xyz/.nextflow/assets/epi2me-labs/wf-artic
userName : xyz
profile : standard
configFiles : /home/xyz/.nextflow/assets/epi2me-labs/wf-artic/nextflow.config

Basic Input/Output Options
fastq : test_data/fastq/
sample_sheet : test_data/sample_sheet.csv

Advanced options
normalise : 200

!! Only displaying parameters that differ from the pipeline defaults !!

If you use epi2me-labs/wf-artic for your analysis please cite:

  • The nf-core framework
    https://doi.org/10.1038/s41587-020-0439-x

    ------------------------------------
    Available Primer Schemes:
    ------------------------------------
    

    Name Version
    SARS-CoV-2 NEB-VarSkip/v2b
    SARS-CoV-2 NEB-VarSkip/v1a
    SARS-CoV-2 NEB-VarSkip/v2
    SARS-CoV-2 NEB-VarSkip/v1a-long
    SARS-CoV-2 ARTIC/V2
    SARS-CoV-2 ARTIC/V4.1
    SARS-CoV-2 ARTIC/V3
    SARS-CoV-2 ARTIC/V1
    SARS-CoV-2 ARTIC/V4
    SARS-CoV-2 Midnight-ONT/V2
    SARS-CoV-2 Midnight-ONT/V3
    SARS-CoV-2 Midnight-ONT/V1
    SARS-CoV-2 Midnight-IDT/V1
    spike-seq ONT/V4.1
    spike-seq ONT/V1

    ------------------------------------
    

Checking fastq input.
Barcoded directories detected.
Checking sample sheet.
executor > local (4)
[9f/9ec1ad] process > checkSampleSheet [ 0%] 0 of 1
executor > local (4)
[- ] process > checkSampleSheet -
[- ] process > pipeline:getVersions -
[- ] process > pipeline:getParams -
[- ] process > pipeline:copySchemeDir [ 0%] 0 of 1
[- ] process > pipeline:preArticQC -
[- ] process > pipeline:runArtic -
[- ] process > pipeline:combineDepth -
[- ] process > pipeline:allConsensus -
[- ] process > pipeline:allVariants -
[- ] process > pipeline:prep_nextclade -
[- ] process > pipeline:nextclade -
[- ] process > pipeline:pangolin -
[- ] process > pipeline:telemetry -
[- ] process > pipeline:report -
[- ] process > output -

output/execution/trace.txt:
task_id hash native_id name status exit submit duration realtime %cpu peak_rss peak_vmem rchar wchar
5 3d/862548 322296 pipeline:preArticQC (1) ABORTED - 2022-06-21 00:18:39.505 - - - - - - -
6 55/239d93 322321 pipeline:preArticQC (2) ABORTED - 2022-06-21 00:18:39.527 - - - - - - -
3 30/c21009 322346 pipeline:prep_nextclade ABORTED - 2022-06-21 00:18:39.539 - - - - - - -
1 be/983785 322374 pipeline:getParams ABORTED - 2022-06-21 00:18:39.549 - - - - - - -
2 a1/3f1423 322410 pipeline:getVersions ABORTED - 2022-06-21 00:18:39.560 - - - - - - -
4 05/7e1ddb 322442 pipeline:copySchemeDir ABORTED - 2022-06-21 00:18:39.571 - - - - - - -

What's missing here to make it work?

Thanks for your help in advance.
Cheers,
Katharina

Issue at the report stage

Good morning.
While running the pipeline we have encountered a new issue.
I attach a screenshot; can you help us please?
Screenshot from 2021-11-30 08-45-58 (1)

run wf-artic with conda --> unable to create Conda environment

Hey!

I want to run the ARTIC workflow via conda and ran into multiple problems. Some I could solve myself but still want to bring to your attention; one I could not, and I hope for help:

  1. When I downloaded the latest version it had really odd user permissions, in the sense that I, as a user, was not even able to execute it. I changed this with the chmod command; I also had to do the same in the subfolders.

  2. When I ran: nextflow run ./main.nf -profile conda --fastq /ngs_rohdaten/fastq_pass/ --sample_sheet /ngs_rohdaten/fastq_pass/sample_sheet.csv --scheme_version V4.1 --update_data

I got the following message:

WARN: Access to undefined parameter `samples` -- Initialise it to a default value eg. `params.samples = some_value`
Unparseable date: "2021-12-09T18_09_18Z"

Not sure why this nextclade_data_tag is unparseable, but the earlier ones (e.g. 2021-06-25T00_00_00Z) seem to work fine. I solved this problem by manually adding the flag --nextclade_data_tag 2022-01-05T19_54_31 because I thought using the most recent would be best. Is this correct, or do you have a better idea?

  3. What I could not solve:

When now running the wf (nextflow run ./main.nf -profile conda --fastq /ngs_rohdaten/fastq_pass/ --sample_sheet /ngs_rohdaten/fastq_pass/sample_sheet.csv --scheme_version V4.1 --nextclade_data_tag 2022-01-05T19_54_31Z) I get the following error:

WARN: Access to undefined parameter `samples` -- Initialise it to a default value eg. `params.samples = some_value`
Checking fastq input.
Barcoded directories detected.
Checking sample sheet.
executor >  local (93)
[76/02902b] process > checkSampleSheet         [100%] 1 of 1 ✔
[22/dbed04] process > pipeline:getVersions     [100%] 1 of 1 ✔
[40/98a28f] process > pipeline:getParams       [100%] 1 of 1 ✔
[04/d42fee] process > pipeline:copySchemeDir   [100%] 1 of 1 ✔
[bc/44301b] process > pipeline:preArticQC (12) [100%] 13 of 13 ✔
[80/0549f6] process > pipeline:runArtic (7)    [100%] 13 of 13 ✔
[c8/5c9338] process > pipeline:allConsensus    [100%] 1 of 1 ✔
[90/255c6c] process > pipeline:allVariants     [100%] 1 of 1 ✔
[03/b5bd74] process > pipeline:prep_nextclade  [100%] 1 of 1 ✔
[cc/33a289] process > pipeline:nextclade       [100%] 1 of 1 ✔
[-        ] process > pipeline:pangolin        -
[bc/86cdfe] process > pipeline:telemetry       [100%] 1 of 1 ✔
[-        ] process > pipeline:report          -
[1e/f36b17] process > output (58)              [100%] 58 of 58
Error executing process > 'pipeline:pangolin'

Caused by:
  Failed to create Conda environment
  command: conda create --mkdir --yes --quiet --prefix /data/SARS-CoV-2/ONT_artic_wf_aktuell/wf-artic-master/work/conda/env-d26fa193d248355ac3ce39abc12fab9e bioconda::pangolin=3.1.17 conda-forge::git
  status : 143
  message:

but the error message is empty. Could somebody help me? That would be really great!

Summary for depth of coverage file groups positions into bins of 20, not by primer or single positions

The depth of coverage files that are output group positions into bins of 20. This is odd, as the rapid kit cuts up the reads, so the summed values give a misleading picture of what may be going on at important read sites. It means the user must regenerate the files manually from the non-summarised values in order to carry out analyses such as counting missing positions in the consensus sequence, and to do any troubleshooting that needs that information. A sketch of that manual step follows.
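A minimal sketch of that manual regeneration (the file names follow the workflow's output naming seen elsewhere in these issues; adjust to your own run):

# Per-position depth with no binning, from the workflow's primer-trimmed BAM;
# -a reports every position, including zero-coverage ones.
samtools depth -a barcode01.primertrimmed.rg.sorted.bam > barcode01.depth.tsv

# Count masked (N) positions in the consensus sequence.
grep -v '^>' barcode01.consensus.fasta | tr -cd 'Nn' | wc -c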

Sample sheet with integer sample names error

Hello @cjw85, thanks for making the project available.
Every time our sample sheet for the --samples parameter contains a sample name made up only of digits alongside other sample names that are strings, the pipeline crashes at the report step.

  • Samplesheet
barcode,sample_name
barcode01,sample01
barcode02,20210818
barcode03,sample03
barcode04,sample04
barcode05,sample05
  • Command example and error
/path/to/nextflow run /path/to/wf-artic/main.nf --fastq /path/to/fastq_pass --out_dir /path/to/output -w /path/to/work/ --samples /path/to/samplesheet.csv -resume
N E X T F L O W  ~  version 20.10.0
Launching `/path/to/wf-artic/main.nf` [desperate_archimedes] - revision: f6f86e18c7
WARN: Access to undefined parameter `detect_samples` -- Initialise it to a default value eg. `params.detect_samples = some_value`

Parameter summary
=================
    help: false
    out_dir: /path/to/output
    fastq: /path/to/fastq_pass
    sanitize_fastq: false
    samples: /path/to/samplesheet.csv
    report_depth: 100
    medaka_model: r941_prom_variant_g360
    scheme_name: SARS-CoV-2
    scheme_version: V3
    genotype_variants: null
    report_clade: true
    report_lineage: true
    report_coverage: true
    report_variant_summary: true
    wfversion: v0.3.2
    aws_image_prefix: null
    aws_queue: null
    lab_id: null
    testkit: null
    _min_len: 400
    _max_len: 700

Checking sample sheet.
Checking input directory structure.
executor >  local (30)
[c9/d4bd0d] process > checkSampleSheet (1)    [100%] 1 of 1 ✔
[98/bc48a2] process > pipeline:get_versions   [100%] 1 of 1, cached: 1 ✔
[f2/faa4fb] process > pipeline:copySchemeDir  [100%] 1 of 1, cached: 1 ✔
[f1/d3bd35] process > pipeline:preArticQC (4) [100%] 5 of 5 ✔
[48/c30c77] process > pipeline:runArtic (4)   [100%] 5 of 5 ✔
[2a/7b1c6a] process > pipeline:allConsensus   [100%] 1 of 1 ✔
[9d/d3aee4] process > pipeline:allVariants    [100%] 1 of 1 ✔
[07/9574bd] process > pipeline:nextclade      [100%] 1 of 1 ✔
[d4/2a244a] process > pipeline:pangolin       [100%] 1 of 1 ✔
[ee/090cbc] process > pipeline:report         [100%] 1 of 1, failed: 1 ✘
[b6/01e9b2] process > output (4)              [100%] 14 of 14 ✔
/path/to/work/2a/7b1c6a28ac090e0560f0f3398a725c/all_consensus.fasta
/path/to/work/2a/7b1c6a28ac090e0560f0f3398a725c/consensus_status.txt
/path/to/work/9d/d3aee4ebfca7cec6fe7d45ded62ef2/all_variants.vcf.gz
/path/to/work/9d/d3aee4ebfca7cec6fe7d45ded62ef2/all_variants.vcf.gz.tbi
/path/to/work/b4/7019d23562194a22d51d6e03d93581/sample03.primertrimmed.rg.sorted.bam
/path/to/work/b4/7019d23562194a22d51d6e03d93581/sample03.primertrimmed.rg.sorted.bam.bai
/path/to/work/df/14968a22ca46e3a526a456769b0352/sample05.primertrimmed.rg.sorted.bam
/path/to/work/df/14968a22ca46e3a526a456769b0352/sample05.primertrimmed.rg.sorted.bam.bai
/path/to/work/17/fbabd0fa52c6649c1307d1b4356343/sample01.primertrimmed.rg.sorted.bam
/path/to/work/17/fbabd0fa52c6649c1307d1b4356343/sample01.primertrimmed.rg.sorted.bam.bai
/path/to/work/80/30b97c5b5ec73f7c9fdf715c076e6c/20210818.primertrimmed.rg.sorted.bam
/path/to/work/80/30b97c5b5ec73f7c9fdf715c076e6c/20210818.primertrimmed.rg.sorted.bam.bai
/path/to/work/48/c30c77e0b749d84d27086b2b8423e0/sample04.primertrimmed.rg.sorted.bam
/path/to/work/48/c30c77e0b749d84d27086b2b8423e0/sample04.primertrimmed.rg.sorted.bam.bai


Error executing process > 'pipeline:report'

Caused by:
  Process `pipeline:report` terminated with an error exit status (1)

Command executed:

  echo "--pangolin pangolin.csv"
      echo "--nextclade nextclade.json"
      echo "help,false
  out_dir,/path/to/output
  fastq,/path/to/fastq_pass
  sanitize_fastq,false
  samples,/path/to/samplesheet.csv
  report_depth,100
  medaka_model,r941_prom_variant_g360
  scheme_name,SARS-CoV-2
  scheme_version,V3
  genotype_variants,null
  report_clade,true
  report_lineage,true
  report_coverage,true
  report_variant_summary,true
  wfversion,v0.3.2
  aws_image_prefix,null
  aws_queue,null
  lab_id,null
  testkit,null
  _min_len,400
  _max_len,700
  full_scheme_name,SARS-CoV-2/V3" > params.csv
      report.py         consensus_status.txt wf-artic-report.html         --pangolin pangolin.csv --nextclade nextclade.json           --revision null --params params.csv --commit null         --min_len 400 --max_len 700 --report_depth         100 --depths depth_stats/* --summaries read_stats/*         --bcftools_stats vcf_stats/*          --versions versions

Command exit status:
  1

Command output:
  --pangolin pangolin.csv
  --nextclade nextclade.json

Command error:
  Traceback (most recent call last):
    File "/path/to/wf-artic/bin/report.py", line 371, in <module>
      main()
    File "/path/to/wf-artic/bin/report.py", line 163, in main
      pd.DataFrame(good_reads['sample_name'].value_counts())
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/frame.py", line 5582, in sort_index
      return super().sort_index(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/generic.py", line 4537, in sort_index
      indexer = get_indexer_indexer(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/sorting.py", line 87, in get_indexer_indexer
      indexer = nargsort(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/sorting.py", line 368, in nargsort
      return items.argsort(ascending=ascending, kind=kind, na_position=na_position)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/arrays/base.py", line 586, in argsort
      return nargsort(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/sorting.py", line 380, in nargsort
      indexer = non_nan_idx[non_nans.argsort(kind=kind)]
  TypeError: '<' not supported between instances of 'int' and 'str'

Work dir:
  /path/to/work/ee/090cbcb4dd8b06d739e9fb6b0bd399

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.

Could you help us with that?
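The traceback suggests pandas is sorting a sample-name index that mixes integers and strings. Until the report script coerces names to strings, a hedged workaround sketch is to make every sample name non-numeric before launching the workflow:

# Prefix digits-only sample names (second CSV column) with "s" so pandas
# sees a homogeneous string index; writes a fixed copy of the sheet.
awk -F, 'BEGIN{OFS=","} NR>1 && $2 ~ /^[0-9]+$/ {$2="s"$2} {print}' \
    samplesheet.csv > samplesheet.fixed.csv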

No Artic output bam file found for primer set 1, Artic failed.

We have been experiencing the following problem since the last update.

Writing log file to: /epi2melabs/ncov_tutorial/analysis/artic/barcode01/run0_artic.log.
Running artic guppyplex to filter reads

artic guppyplex finished
Running artic minion --medaka to call variants
Running alignment QC
error - No Artic output bam file found for primer set 1, Artic failed.
error - No Artic output bam file found for primer set 2, Artic failed.
Results for barcode01 can be found at /epi2melabs/ncov_tutorial/analysis/artic/barcode01/run0
ARTIC finished for: barcode01

Process `pipeline:getVersions` terminated with an error exit status (141)

I have encountered some inconsistencies when running the wf-artic pipeline.
Sometimes the pipeline runs without any issues, whereas on other occasions I get the error below, which appears to be caused by a failed pipeline:getVersions; I am not sure what is causing it.

Does anyone know what the issue could be?

Thanks a lot in advance.

$nextflow run epi2me-labs/wf-artic -w ${OUTPUT}/workspace -profile conda --fastq /path/to/reads/fastq_demultiplexed --samples /path/to/sample/sheet/sample_sheet --out_dir ${OUTPUT}
N E X T F L O W ~ version 21.10.4
WARN: Access to undefined parameter detect_samples -- Initialise it to a default value eg. params.detect_samples = some_value
Checking sample sheet.
executor > local (8)
[8d/de2916] process > checkSampleSheet (1) [100%] 1 of 1 ✔
[88/63dc7b] process > pipeline:getVersions [100%] 1 of 1, failed: 1 ✘
[1a/bf87f8] process > pipeline:getParams [100%] 1 of 1 ✔
[35/7951d4] process > pipeline:copySchemeDir [100%] 1 of 1 ✔
[c4/64a846] process > pipeline:preArticQC (3) [ 0%] 0 of 1
[- ] process > pipeline:runArtic [ 0%] 0 of 5
[- ] process > pipeline:allConsensus -
[- ] process > pipeline:allVariants -
[- ] process > pipeline:nextclade -
[- ] process > pipeline:pangolin -
[- ] process > pipeline:telemetry -
[- ] process > pipeline:report -
[- ] process > output -
Checking input directory structure.

Found barcode directories

Error executing process > 'pipeline:getVersions'

Caused by:
Process pipeline:getVersions terminated with an error exit status (141)

Command executed:

medaka --version | sed 's/ /,/' >> versions.txt
minimap2 --version | sed 's/^/minimap2,/' >> versions.txt
bcftools --version | head -n 1 | sed 's/ /,/' >> versions.txt
samtools --version | head -n 1 | sed 's/ /,/' >> versions.txt
nextclade --version | sed 's/^/nextclade,/' >> versions.txt
artic --version | sed 's/ /,/' >> versions.txt

Command exit status:
141

Command output:
(empty)

Work dir:
${OUTPUT}/workspace/88/63dc7bd7dd452b057b7b440902af74

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
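A hedged observation, not confirmed in the thread: exit status 141 is 128 + 13, i.e. the process was killed by SIGPIPE. In the getVersions script, head -n 1 closes its input pipe after the first line, so if the script runs with pipefail the whole command can fail intermittently. A SIGPIPE-free equivalent reads the entire stream with sed instead:

# sed consumes all of its input, so the upstream command never receives
# SIGPIPE; -n '1s/ /,/p' edits and prints only the first line.
bcftools --version | sed -n '1s/ /,/p' >> versions.txt
samtools --version | sed -n '1s/ /,/p' >> versions.txt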

More primer sets

It would be good to have more primer sets available to users.
e.g. VarSkip, and the ARTIC V4.1 spike-in and replacement primers.

Would a PR for this be helpful?

Artic Analysis Failed

Hello,
We are trying to run the pipeline with the V3/V4/V1200 primers. All workflow steps succeeded, but the ARTIC analysis failed, so we got only "N"s in the all_consensus.fasta file. Have you encountered this situation before, and how can we fix it?
Another question: we used ARTIC's fieldbioinformatics workflow directly, and we don't know whether it is still suitable for a rapid library made with the Nanopore V1200 primers. Did you modify the primer-removal module when calling the artic process? Can we still use the artic process directly by adding a V1200 primer file?

(Screenshots attached: report1, report2)

FR: add pangolin --usher parameter as workflow parameter

Hi,

we are seeing some new sub-lineages of BA.1 which are currently not classified by pangoLEARN (at least in our data), even though they are in the latest designation dataset. However, pangolin also ships with the alternative classifier UShER, which is reported to produce more stable results over time and to have higher accuracy in general.

So I was wondering if you could add a parameter to your pipeline so that we can make pangolin use UShER instead of pangoLEARN.
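For reference, a minimal sketch of the requested behaviour run outside the workflow (pangolin 3.x selects UShER with a flag; the input file name mirrors the workflow's pangolin step quoted earlier in these issues):

# Classify consensus sequences with pangolin's UShER mode instead of
# the default pangoLEARN classifier.
pangolin --usher consensus.fasta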

Thanks in advance!

best,
Patrick

Downsampling algorithm over-downsamples at some positions, lowering depth of coverage and masking positions

For Midnight, we have a sample that had at least 20x coverage at every position (when I changed the hardcoded script not to downsample). However, with downsampling turned on, a region of 75 positions with 20-26x coverage dropped to 16-19x and was then masked. Unlike ARTIC V3/V4, Midnight/rapid chops up the reads, so a single read does not completely cover the primer region. It appears that the downsampling does not take this into consideration: as soon as a position between the primers reaches the hardcoded depth of 200 (or 200 reads map to the primer region), no further reads are considered. This means borderline-coverage regions are systematically masked when coverage is close to 20, even though, if the algorithm were allowed to continue, they would not have been masked. This is not an isolated event; it occurred on several samples in a single ONT run, and since the primers amplify in similar ratios, we will systematically mask certain regions more often. (A possible mitigation is sketched below.)
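A hedged mitigation sketch (the paths are placeholders; the workflow exposes a normalise parameter, shown with default 200 in the parameter summaries above, and artic minion treats --normalise 0 as "do not downsample", assuming the workflow passes the value through):

nextflow run epi2me-labs/wf-artic \
    --fastq /path/to/fastq_pass \
    --sample_sheet /path/to/sample_sheet.csv \
    --scheme_name 'SARS-CoV-2' \
    --scheme_version 'Midnight-ONT/V1' \
    --normalise 0 \
    -profile standard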

Coverage drop in the first amplicon

Hi
Thank you for developing this amazing pipeline!
We have been using this pipeline to analyse ONT sequencing data generated with the Midnight protocol.

One thing we observed is that there is always a coverage drop at the front of amplicon 1 when running wf-artic, across all of the samples we have sequenced (< 500). May I ask whether this is normal, or how we could solve this problem?

Thank you very much!
