This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.
qiime2 / q2-cutadapt
License: BSD 3-Clause "New" or "Revised" License
Bug Description
There are three errors in our test suite that seem to be related to changes in the latest cutadapt release. A test failure can be seen in this busywork run. All of the errors are in test_demux.py::TestDemuxSingle. TestDemuxSingle.test_none_matched may be related to marcelm/cutadapt#478.
References
Improvement Description
Is there a reason the -j option, which supports multicore runs in cutadapt v3.0, isn't used?
Current Behavior
Current runs use single-threaded cutadapt (the default, -j 1).
Proposed Behavior
Add a CLI option to specify the core / thread count to use.
Questions
Is there any reason not to?
References
forum xref
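As a rough illustration of the proposal, the sketch below shows how a core count could be threaded through to the cutadapt command line. The helper name `build_cutadapt_command` and its parameters are hypothetical and are not taken from the plugin's source; the only cutadapt flags used are `--cores` (the long form of `-j`), `--front`, and `-o`.

```python
def build_cutadapt_command(input_path, output_path, front, cores=1):
    """Assemble a single-end cutadapt command as a list of arguments.

    Hypothetical helper for illustration only; it is not the plugin's code.
    """
    if cores < 0:
        raise ValueError("cores must be 0 or a positive integer")
    return [
        "cutadapt",
        "--cores", str(cores),   # long form of -j
        "--front", front,        # 5' adapter/primer to remove
        "-o", output_path,
        input_path,
    ]


if __name__ == "__main__":
    print(build_cutadapt_command(
        "in.fastq.gz", "out.fastq.gz", "CCTACGGGNGGCWGCAG", cores=4))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`. Exposing the value as a plugin parameter would be a separate step that is not shown here.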
The parameter description for adapter should be updated in all methods to indicate how to use the linked primer syntax.
This recently came up on the forum.
Hi,
I've encountered a weird issue that I can't figure out - hopefully one of you can! Here's what I did:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences POTATOE-BACT.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --o-trimmed-sequences POTATOE-BACT_primers_trimmed.qza \
  --verbose
I noticed the issue when I looked at the standard output from this command. There were 6 files out of 37 that said no adapters were detected (in this case primers). I checked manually, and sure enough they were there in the fastq files. A simple grep search showed that most of the reads had the fwd/rev primer in them.
So, then I tried importing two of the samples separately (1 that worked and 1 that didn't). This time, the issue didn't occur... the file that failed before had its primers detected successfully.
I've also confirmed that cutadapt standalone does not have this issue. I have included some of the log information below. I'm running the same cutadapt that qiime has access to (qiime2-2018.2, installed from conda, both were run from same environment).
As far as I can tell the options are more or less the same, so I'm not sure how I am getting this unpredictable behaviour.
Upon closer inspection of the logs, it appears that somehow the CLI arguments are getting messed up such that R1 and R2 are switched in these samples in qiime2 (see below) which would explain why it's not finding primers. I don't understand how this can be the case as the manifest file I used for the initial batch test looks fine to me (pasted at the very bottom) and it works for the majority of samples. And I also ran another test where I pulled out one of the offending files from the qza archive and ran it with standalone cutadapt and didn't get this issue. So to me, it seems like qiime2 is importing it correctly according to the manifest but somehow the instructions to cutadapt are getting garbled. But I would love this to be a simple mistake on my part but I just can't see where it is...
Thanks in advance for your help,
Jesse
Log for failed file (batch import of 37 samples to qiime2 then cutadapt run on all together with qiime2):
This is cutadapt 1.15 with Python 3.5.5
Command line parameters: --cores 1 --error-rate 0.1 --times 1 --overlap 3 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-r6zhwz2c/stationP5_41_L001_R2_001.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-r6zhwz2c/stationP5_4_L001_R1_001.fastq.gz --front CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC /tmp/qiime2-archive-gwcyl56w/08e4e960-f54e-4038-86e5-486610afe00b/data/stationP5_41_L001_R2_001.fastq.gz /tmp/qiime2-archive-gwcyl56w/08e4e960-f54e-4038-86e5-486610afe00b/data/stationP5_4_L001_R1_001.fastq.gz
Running on 1 core
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 18.84 s (111 us/read; 0.54 M reads/minute).
=== Summary ===
Total read pairs processed: 170,296
Read 1 with adapter: 3 (0.0%)
Read 2 with adapter: 6 (0.0%)
Pairs written (passing filters): 170,296 (100.0%)
Total basepairs processed: 85,488,592 bp
Read 1: 42,744,296 bp
Read 2: 42,744,296 bp
Total written (filtered): 85,488,496 bp (100.0%)
Read 1: 42,744,254 bp
Read 2: 42,744,242 bp
=== First read: Adapter 1 ===
Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 3 times.
No. of allowed errors:
0-9 bp: 0; 10-17 bp: 1
Overview of removed sequences
length count expect max.err error counts
8 1 2.6 0 1
17 2 0.0 1 2
=== Second read: Adapter 2 ===
Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 6 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
Overview of removed sequences
length count expect max.err error counts
3 4 2660.9 0 4
21 2 0.0 2 0 2
Log for successful run where I imported the failed file in a smaller batch (only 2 samples) and it worked inexplicably:
This is cutadapt 1.15 with Python 3.5.5
Command line parameters: -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o stationP5outR1.fastq.gz -p stationP5outR2.fastq.gz stationP5_4_L001_R1_001.fastq.gz stationP5_41_L001_R2_001.fastq.gz
Running on 1 core
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 14.72 s (86 us/read; 0.69 M reads/minute).
=== Summary ===
Total read pairs processed: 170,296
Read 1 with adapter: 169,781 (99.7%)
Read 2 with adapter: 169,584 (99.6%)
Pairs written (passing filters): 170,296 (100.0%)
Total basepairs processed: 85,488,592 bp
Read 1: 42,744,296 bp
Read 2: 42,744,296 bp
Total written (filtered): 312,231 bp (0.4%)
Read 1: 131,759 bp
Read 2: 180,472 bp
=== First read: Adapter 1 ===
Sequence: CCTACGGGNGGCWGCAG; Type: regular 3'; Length: 17; Trimmed: 169781 times.
No. of allowed errors:
0-9 bp: 0; 10-17 bp: 1
Bases preceding removed adapters:
A: 0.0%
C: 0.0%
G: 0.0%
T: 0.0%
none/other: 100.0%
Overview of removed sequences
length count expect max.err error counts
3 6 2660.9 0 6
4 4 665.2 0 4
250 18 0.0 1 15 3
251 169753 0.0 1 162055 7698
=== Second read: Adapter 2 ===
Sequence: GACTACHVGGGTATCTAATCC; Type: regular 3'; Length: 21; Trimmed: 169584 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
Bases preceding removed adapters:
A: 0.0%
C: 0.0%
G: 0.0%
T: 0.0%
none/other: 100.0%
Overview of removed sequences
length count expect max.err error counts
3 4 2660.9 0 4
5 1 166.3 0 1
9 2 0.6 0 0 2
244 1 0.0 2 1
248 4 0.0 2 0 0 4
250 19 0.0 2 10 9
251 169553 0.0 2 164116 5030 407
Log for successful run of same file with standalone cutadapt:
This is cutadapt 1.15 with Python 3.5.5
Command line parameters: --discard-untrimmed -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC -o 3237-WHJ-0005_S5_L001_R1_primers-trimmed.fastq.gz -p 3237-WHJ-0005_S5_L001_R2_primers-trimmed.fastq.gz 3237-WHJ-0005_S5_L001_R1_001.fastq.gz 3237-WHJ-0005_S5_L001_R2_001.fastq.gz
Running on 1 core
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 14.24 s (84 us/read; 0.72 M reads/minute).
=== Summary ===
Total read pairs processed: 170,296
Read 1 with adapter: 169,895 (99.8%)
Read 2 with adapter: 169,605 (99.6%)
Pairs written (passing filters): 169,282 (99.4%)
Total basepairs processed: 85,488,592 bp
Read 1: 42,744,296 bp
Read 2: 42,744,296 bp
Total written (filtered): 78,550,377 bp (91.9%)
Read 1: 39,613,870 bp
Read 2: 38,936,507 bp
=== First read: Adapter 1 ===
Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 169895 times.
No. of allowed errors:
0-9 bp: 0; 10-17 bp: 1
Overview of removed sequences
length count expect max.err error counts
3 85 2660.9 0 85
11 1 0.0 1 1
14 4 0.0 1 2 2
15 24 0.0 1 9 15
16 943 0.0 1 203 740
17 168558 0.0 1 162055 6503
18 280 0.0 1 15 265
=== Second read: Adapter 2 ===
Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 169605 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
Overview of removed sequences
length count expect max.err error counts
15 2 0.0 1 0 2
16 3 0.0 1 0 3
17 13 0.0 1 10 3
18 20 0.0 1 3 17
19 103 0.0 1 4 5 94
20 1748 0.0 2 86 1604 58
21 167289 0.0 2 164116 2972 201
22 420 0.0 2 10 379 31
23 2 0.0 2 0 0 2
25 4 0.0 2 0 0 4
28 1 0.0 2 1
My manifest file:
sample-id,absolute-filepath,direction
stationP1,$PWD/BACT-341F-805R/3237-WHJ-0001_S1_L001_R1_001.fastq.gz,forward
stationP2,$PWD/BACT-341F-805R/3237-WHJ-0002_S2_L001_R1_001.fastq.gz,forward
stationP3,$PWD/BACT-341F-805R/3237-WHJ-0003_S3_L001_R1_001.fastq.gz,forward
stationP4,$PWD/BACT-341F-805R/3237-WHJ-0004_S4_L001_R1_001.fastq.gz,forward
stationP5,$PWD/BACT-341F-805R/3237-WHJ-0005_S5_L001_R1_001.fastq.gz,forward
stationP6,$PWD/BACT-341F-805R/3237-WHJ-0006_S6_L001_R1_001.fastq.gz,forward
stationP7,$PWD/BACT-341F-805R/3237-WHJ-0007_S7_L001_R1_001.fastq.gz,forward
stationP8,$PWD/BACT-341F-805R/3237-WHJ-0008_S8_L001_R1_001.fastq.gz,forward
stationP9,$PWD/BACT-341F-805R/3237-WHJ-0009_S9_L001_R1_001.fastq.gz,forward
stationP10,$PWD/BACT-341F-805R/3237-WHJ-0010_S10_L001_R1_001.fastq.gz,forward
stationP11,$PWD/BACT-341F-805R/3237-WHJ-0011_S11_L001_R1_001.fastq.gz,forward
stationP12,$PWD/BACT-341F-805R/3237-WHJ-0012_S12_L001_R1_001.fastq.gz,forward
stationP13,$PWD/BACT-341F-805R/3237-WHJ-0013_S13_L001_R1_001.fastq.gz,forward
stationP14,$PWD/BACT-341F-805R/3237-WHJ-0014_S14_L001_R1_001.fastq.gz,forward
stationP15,$PWD/BACT-341F-805R/3237-WHJ-0015_S15_L001_R1_001.fastq.gz,forward
stationP16,$PWD/BACT-341F-805R/3237-WHJ-0016_S16_L001_R1_001.fastq.gz,forward
stationP17,$PWD/BACT-341F-805R/3237-WHJ-0017_S17_L001_R1_001.fastq.gz,forward
stationP18,$PWD/BACT-341F-805R/3237-WHJ-0018_S18_L001_R1_001.fastq.gz,forward
stationP20,$PWD/BACT-341F-805R/3237-WHJ-0019_S19_L001_R1_001.fastq.gz,forward
stationP21,$PWD/BACT-341F-805R/3237-WHJ-0020_S20_L001_R1_001.fastq.gz,forward
stationP22,$PWD/BACT-341F-805R/3237-WHJ-0021_S21_L001_R1_001.fastq.gz,forward
stationP23,$PWD/BACT-341F-805R/3237-WHJ-0022_S22_L001_R1_001.fastq.gz,forward
stationP24,$PWD/BACT-341F-805R/3237-WHJ-0023_S23_L001_R1_001.fastq.gz,forward
stationP25,$PWD/BACT-341F-805R/3237-WHJ-0024_S24_L001_R1_001.fastq.gz,forward
stationP26,$PWD/BACT-341F-805R/3237-WHJ-0025_S25_L001_R1_001.fastq.gz,forward
stationP27,$PWD/BACT-341F-805R/3237-WHJ-0026_S26_L001_R1_001.fastq.gz,forward
stationP28,$PWD/BACT-341F-805R/3237-WHJ-0027_S27_L001_R1_001.fastq.gz,forward
stationP29,$PWD/BACT-341F-805R/3237-WHJ-0028_S28_L001_R1_001.fastq.gz,forward
stationP30,$PWD/BACT-341F-805R/3237-WHJ-0029_S29_L001_R1_001.fastq.gz,forward
stationP31,$PWD/BACT-341F-805R/3237-WHJ-0030_S30_L001_R1_001.fastq.gz,forward
stationM1,$PWD/BACT-341F-805R/3237-WHJ-0031_S31_L001_R1_001.fastq.gz,forward
stationM2,$PWD/BACT-341F-805R/3237-WHJ-0032_S32_L001_R1_001.fastq.gz,forward
stationM3,$PWD/BACT-341F-805R/3237-WHJ-0033_S33_L001_R1_001.fastq.gz,forward
stationM4,$PWD/BACT-341F-805R/3237-WHJ-0034_S34_L001_R1_001.fastq.gz,forward
stationM5,$PWD/BACT-341F-805R/3237-WHJ-0035_S35_L001_R1_001.fastq.gz,forward
stationM6,$PWD/BACT-341F-805R/3237-WHJ-0036_S36_L001_R1_001.fastq.gz,forward
stationM7,$PWD/BACT-341F-805R/3237-WHJ-0037_S37_L001_R1_001.fastq.gz,forward
stationP1,$PWD/BACT-341F-805R/3237-WHJ-0001_S1_L001_R2_001.fastq.gz,reverse
stationP2,$PWD/BACT-341F-805R/3237-WHJ-0002_S2_L001_R2_001.fastq.gz,reverse
stationP3,$PWD/BACT-341F-805R/3237-WHJ-0003_S3_L001_R2_001.fastq.gz,reverse
stationP4,$PWD/BACT-341F-805R/3237-WHJ-0004_S4_L001_R2_001.fastq.gz,reverse
stationP5,$PWD/BACT-341F-805R/3237-WHJ-0005_S5_L001_R2_001.fastq.gz,reverse
stationP6,$PWD/BACT-341F-805R/3237-WHJ-0006_S6_L001_R2_001.fastq.gz,reverse
stationP7,$PWD/BACT-341F-805R/3237-WHJ-0007_S7_L001_R2_001.fastq.gz,reverse
stationP8,$PWD/BACT-341F-805R/3237-WHJ-0008_S8_L001_R2_001.fastq.gz,reverse
stationP9,$PWD/BACT-341F-805R/3237-WHJ-0009_S9_L001_R2_001.fastq.gz,reverse
stationP10,$PWD/BACT-341F-805R/3237-WHJ-0010_S10_L001_R2_001.fastq.gz,reverse
stationP11,$PWD/BACT-341F-805R/3237-WHJ-0011_S11_L001_R2_001.fastq.gz,reverse
stationP12,$PWD/BACT-341F-805R/3237-WHJ-0012_S12_L001_R2_001.fastq.gz,reverse
stationP13,$PWD/BACT-341F-805R/3237-WHJ-0013_S13_L001_R2_001.fastq.gz,reverse
stationP14,$PWD/BACT-341F-805R/3237-WHJ-0014_S14_L001_R2_001.fastq.gz,reverse
stationP15,$PWD/BACT-341F-805R/3237-WHJ-0015_S15_L001_R2_001.fastq.gz,reverse
stationP16,$PWD/BACT-341F-805R/3237-WHJ-0016_S16_L001_R2_001.fastq.gz,reverse
stationP17,$PWD/BACT-341F-805R/3237-WHJ-0017_S17_L001_R2_001.fastq.gz,reverse
stationP18,$PWD/BACT-341F-805R/3237-WHJ-0018_S18_L001_R2_001.fastq.gz,reverse
stationP20,$PWD/BACT-341F-805R/3237-WHJ-0019_S19_L001_R2_001.fastq.gz,reverse
stationP21,$PWD/BACT-341F-805R/3237-WHJ-0020_S20_L001_R2_001.fastq.gz,reverse
stationP22,$PWD/BACT-341F-805R/3237-WHJ-0021_S21_L001_R2_001.fastq.gz,reverse
stationP23,$PWD/BACT-341F-805R/3237-WHJ-0022_S22_L001_R2_001.fastq.gz,reverse
stationP24,$PWD/BACT-341F-805R/3237-WHJ-0023_S23_L001_R2_001.fastq.gz,reverse
stationP25,$PWD/BACT-341F-805R/3237-WHJ-0024_S24_L001_R2_001.fastq.gz,reverse
stationP26,$PWD/BACT-341F-805R/3237-WHJ-0025_S25_L001_R2_001.fastq.gz,reverse
stationP27,$PWD/BACT-341F-805R/3237-WHJ-0026_S26_L001_R2_001.fastq.gz,reverse
stationP28,$PWD/BACT-341F-805R/3237-WHJ-0027_S27_L001_R2_001.fastq.gz,reverse
stationP29,$PWD/BACT-341F-805R/3237-WHJ-0028_S28_L001_R2_001.fastq.gz,reverse
stationP30,$PWD/BACT-341F-805R/3237-WHJ-0029_S29_L001_R2_001.fastq.gz,reverse
stationP31,$PWD/BACT-341F-805R/3237-WHJ-0030_S30_L001_R2_001.fastq.gz,reverse
stationM1,$PWD/BACT-341F-805R/3237-WHJ-0031_S31_L001_R2_001.fastq.gz,reverse
stationM2,$PWD/BACT-341F-805R/3237-WHJ-0032_S32_L001_R2_001.fastq.gz,reverse
stationM3,$PWD/BACT-341F-805R/3237-WHJ-0033_S33_L001_R2_001.fastq.gz,reverse
stationM4,$PWD/BACT-341F-805R/3237-WHJ-0034_S34_L001_R2_001.fastq.gz,reverse
stationM5,$PWD/BACT-341F-805R/3237-WHJ-0035_S35_L001_R2_001.fastq.gz,reverse
stationM6,$PWD/BACT-341F-805R/3237-WHJ-0036_S36_L001_R2_001.fastq.gz,reverse
stationM7,$PWD/BACT-341F-805R/3237-WHJ-0037_S37_L001_R2_001.fastq.gz,reverse
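In the failed log above, the R2 file (stationP5_41_L001_R2_001.fastq.gz) is passed to cutadapt before the R1 file (stationP5_4_L001_R1_001.fastq.gz). One possible explanation, offered here only as an assumption and not confirmed anywhere in this thread, is that the files were paired by sorted filename: the character `1` sorts before `_`, so the R2 name comes first. The sketch below demonstrates that ordering and shows a hypothetical `pair_by_sample` helper that pairs files by parsing the read number from the filename instead.

```python
import re

# Casava-style name: <sample>_<barcode>_L<lane>_R<read>_<set>.fastq.gz
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d+)"
    r"_R(?P<read>[12])_\d+\.fastq\.gz$"
)


def pair_by_sample(filenames):
    """Group filenames into (R1, R2) tuples keyed by sample ID.

    Hypothetical helper: it relies on the read number in the name,
    not on the position of the file in a sorted listing.
    """
    reads = {}
    for name in filenames:
        match = CASAVA_NAME.match(name)
        if match is None:
            raise ValueError("unexpected filename: %s" % name)
        reads.setdefault(match.group("sample"), {})[match.group("read")] = name
    return {sample: (r["1"], r["2"]) for sample, r in reads.items()}


names = [
    "stationP5_4_L001_R1_001.fastq.gz",
    "stationP5_41_L001_R2_001.fastq.gz",
]

if __name__ == "__main__":
    print(sorted(names))          # the R2 file sorts first
    print(pair_by_sample(names))  # R1 is still reported first
```

Samples whose forward and reverse barcode IDs happen to sort in the expected order would not be affected, which would be consistent with only some of the 37 samples failing.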
Bug Description
When I run the q2-cutadapt plugin directly on the command line, it behaves fine, but when I call it from inside a script it calls the system version of cutadapt instead of the qiime2 version.
It is not a case of the qiime2 environment failing to activate inside the script. If that were the case, the qiime2 commands wouldn't work at all.
I have attached a screenshot, the script in question, and qiime2's log file pertaining to the error.
In particular, I draw your attention to line 7 in the log file, where the traceback shows that it's using my system version of cutadapt:
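For anyone debugging a similar mismatch, a small check like the following can show which cutadapt executable a process would pick up from its PATH. It is a sketch only: the function name `describe_tool_resolution` is made up, and the comparison assumes a POSIX-style environment layout in which tools live in the same bin directory as the Python interpreter.

```python
import os
import shutil
import sys


def describe_tool_resolution(tool="cutadapt"):
    """Report where `tool` resolves on PATH and whether that location is
    the bin directory of the running Python environment."""
    resolved = shutil.which(tool)
    env_bin = os.path.realpath(os.path.dirname(sys.executable))
    inside_env = (
        resolved is not None
        and os.path.realpath(os.path.dirname(resolved)) == env_bin
    )
    return {"resolved": resolved, "env_bin": env_bin, "inside_env": inside_env}


if __name__ == "__main__":
    print(describe_tool_resolution("cutadapt"))
```

Running it once from an interactive shell and once from inside the script would show whether the two contexts see different PATH orderings.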
Just like with q2-demux, it looks like we need to investigate filehandle accounting here, too. Unfortunately, it looks like the issue is originating within cutadapt or one of its related tools, xopen.
This recently came up on the forum.
Improvement Description
I don't know if this has already been discussed, but I wonder if people would be open to including quality trimming in the cutadapt plugin.
Current Behavior
Right now the q2-cutadapt plugin only trims adapters, and quality trimming needs to be done using the q2-quality-filter plugin.
Proposed Behavior
It would be nice if the option was available to take advantage of cutadapt's quality-trimming functionality in this plugin. I think it would have a few advantages:
I've never contributed to an open-source project before, but I could probably fork this repo and add the functionality if that's desirable (or whatever your process is).
Hi all,
I have an error with the cutadapt plugin. I am trying to demultiplex a file which contains single-end sequences of both forward and reverse reads. I have tried this demultiplexing script once before with a similar file and it worked, but now that I have used another file with the same characteristics it didn't. Could it be related to a bad sample that has no forward reads?
Here is the error message I get:
Command '['cutadapt', '--front', 'file:/scratch-local/mbloemen/tmpvu_nizlr', '--error-rate', '0.0', '-o', '/scratch-local/mbloemen/q2-CasavaOneEightSingleLanePerSampleDirFmt-s8ii_4q0/{name}.1.fastq.gz', '--untrimmed-output', '/scratch-local/mbloemen/q2-MultiplexedSingleEndBarcodeInSequenceDirFmt-d72v4q_v/forward.fastq.gz', '/scratch-local/mbloemen/qiime2-archive-23n9wvxj/2cc054c9-525f-4181-951e-9ea0e0c5b3c6/data/forward.fastq.gz']' returned non-zero exit status -9
Thank you very much in advance,
Serena
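A note on reading the message above: when Python's subprocess module reports a negative exit status, it means the child process was terminated by a signal with that number (on POSIX systems), so -9 corresponds to signal 9. The sketch below decodes such a value; the helper name `explain_returncode` is hypothetical. It does not say why the process was killed, which would have to be checked on the machine itself (for example, memory limits or a job scheduler).

```python
import signal


def explain_returncode(returncode):
    """Turn a subprocess return code into a short human-readable message."""
    if returncode >= 0:
        return "exited with status %d" % returncode
    signum = -returncode
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Signal number not known on this platform.
        name = "unknown signal"
    return "terminated by signal %d (%s)" % (signum, name)


if __name__ == "__main__":
    print(explain_returncode(-9))
```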
Should use the new citation API in qiime2/qiime2#387
Current Behavior
Sounds like this is the output produced by at least one company, so may want to provide support for this.
Proposed Behavior
References
raised on forum
Improvement Description
Convert --verbose stats to visualization
Questions
Addition Description
It would be useful to bin reads by primer prior to primer removal. I'd like to separate a single FASTQ-based artifact (containing several different primers) into multiple output artifacts by primer; each output artifact would be characterized by a single primer. This would be helpful for meta-analyses in which sequences with multiple primers/variable regions may be found in a single QIIME artifact.
This is possible with native Cutadapt (as of v4.5) using steps to demultiplex, but not in the QIIME 2 plugin as its inputs are restricted to specific semantic types.
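The binning idea can be illustrated with a small pure-Python sketch. This is not how Cutadapt or the plugin does it: it only matches primers exactly at the start of each read (no mismatches or indels), expands IUPAC ambiguity codes into regular-expression character classes, and uses made-up primer names and reads.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a primer with ambiguity codes into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def bin_reads_by_primer(reads, primers):
    """Assign each read to the first primer matching its 5' end.

    `primers` maps a primer name to its sequence. Reads that match no
    primer go to the "unmatched" bin (so no primer may use that name).
    """
    patterns = {name: primer_regex(seq) for name, seq in primers.items()}
    bins = {name: [] for name in primers}
    bins["unmatched"] = []
    for read in reads:
        for name, pattern in patterns.items():
            if pattern.match(read):  # match() anchors at the read start
                bins[name].append(read)
                break
        else:
            bins["unmatched"].append(read)
    return bins


if __name__ == "__main__":
    primers = {"primerA": "CCTACGGGNGGCWGCAG", "primerB": "GACTACHVGGGTATCTAATCC"}
    reads = ["CCTACGGGAGGCAGCAGTTT", "GACTACTAGGGTATCTAATCCAAA", "TTTTTTTT"]
    for name, members in bin_reads_by_primer(reads, primers).items():
        print(name, len(members))
```

A real implementation would also need to carry quality scores and sample IDs through to each output, which this sketch leaves out.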
Current Behavior
qiime cutadapt demux demultiplexes reads (based on adapter sequence), but generates only a single output for demultiplexed sequences. It also requires an input artifact of type MultiplexedSingleEndBarcodeInSequence and does not accept SampleData[Single/PairedEndSequencesWithQuality].
qiime cutadapt trim could technically perform this by running the command once per primer (pair), but that is quite inefficient.
Proposed Behavior
q2-cutadapt would take as input 1) a FASTQ artifact of SampleData[Single/PairedEndSequencesWithQuality], which contains N different primer sequences among its many reads, and 2) a tab-separated metadata file containing the N primer names and corresponding primer sequences.
The outputs would be N artifacts of type SampleData[Single/PairedEndSequencesWithQuality]; each output artifact would contain reads of the same primer sequence. There would also be an output artifact (also SampleData[Single/PairedEndSequencesWithQuality]) of sequences that did not have any of the N primer names.
Questions
References
For now reads that don't pass this filter will have to be written to /dev/null, since we haven't squared away the nullable outputs situation brought up in #10. In the meantime, the methods can use a Range to enforce a minimum read length of 1, which will prevent FastqGzFormat validation issues.
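The constraint described above can be sketched in plain Python as follows. In the plugin itself the lower bound would be declared on the parameter type with a Range rather than checked by hand; the helper name `minimum_length_args` is hypothetical, and the only cutadapt option assumed is `--minimum-length`.

```python
def minimum_length_args(minimum_length=1):
    """Return cutadapt arguments that discard reads shorter than
    `minimum_length`, refusing values below 1 so that zero-length
    reads are never written to the output.
    """
    if minimum_length < 1:
        raise ValueError("minimum_length must be at least 1")
    return ["--minimum-length", str(minimum_length)]


if __name__ == "__main__":
    print(minimum_length_args())
```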
This recently came up on the forum:
This cutadapt parameter controls if unmatched reads should be discarded - this would be pretty useful to wrap (and straightforward).
This recently came up on the forum.
Hi
Is it possible to add an option to print/save the log of the software?
I just want to be sure that all the reads contained the primers that I want to remove.
This information was shown in the "original" cutadapt software.
Best
Greg
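Until the log is exposed as an output, the numbers can be pulled out of the --verbose text. The sketch below parses the paired-end summary lines in the format shown in the cutadapt 1.15 logs earlier on this page; the function name `summarize_paired_log` is hypothetical, and other cutadapt versions or single-end runs may word these lines differently.

```python
import re

TOTAL_LINE = re.compile(
    r"^Total read pairs processed:\s+(?P<total>[\d,]+)", re.MULTILINE)
ADAPTER_LINE = re.compile(
    r"^Read (?P<read>[12]) with adapter:\s+(?P<count>[\d,]+) "
    r"\((?P<pct>[\d.]+)%\)",
    re.MULTILINE,
)


def _to_int(text):
    """Convert a count such as '170,296' to an integer."""
    return int(text.replace(",", ""))


def summarize_paired_log(log_text):
    """Extract the pair count and per-read adapter counts from a log."""
    total = TOTAL_LINE.search(log_text)
    summary = {"total_pairs": _to_int(total.group("total")) if total else None}
    for match in ADAPTER_LINE.finditer(log_text):
        key = "read%s_with_adapter" % match.group("read")
        summary[key] = _to_int(match.group("count"))
    return summary


if __name__ == "__main__":
    example = (
        "=== Summary ===\n"
        "Total read pairs processed: 170,296\n"
        "Read 1 with adapter: 169,781 (99.7%)\n"
        "Read 2 with adapter: 169,584 (99.6%)\n"
    )
    print(summarize_paired_log(example))
```

A very low adapter count relative to the total, as in the failed log above, is the kind of signal this check is meant to surface.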