
cutadapt's People

Contributors

adelq, ccwang002, chris7, davmlaw, donkirkby, ebedthan, frederic-mahe, greggles, jdidion, jvhaarst, klmr, klugem, lparsons, luchaoqi, marcelm, mdshw5, necrolyte2, odoublewen, peterjc, rhpvorderman, sage-service-user, stekaz, sylvainde, tbooth, tolot27, wlokhorst


cutadapt's Issues

quality trimming assumes phred+33 encoding

From [email protected] on February 09, 2011 15:06:19

The quality trimming in cutadapt currently assumes that the qualities in the FASTQ file are encoded as ascii(phred_quality+33). If the qualities are encoded as ascii(phred_quality+64), quality trimming won't work correctly.

Workaround is to increase the specified cutoff value by 31. For example, if you actually mean "-q 10", you have to write "-q 41".
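
The arithmetic behind this workaround can be sketched in Python. The function names are illustrative only and are not part of cutadapt; the sketch assumes nothing beyond the two encodings described above.

```python
def decode_quality(char, base=33):
    """Return the phred score encoded by one FASTQ quality character."""
    return ord(char) - base


def adjusted_cutoff(cutoff, actual_base=64, assumed_base=33):
    """Cutoff to pass when the tool assumes one base but the file uses another."""
    return cutoff + (actual_base - assumed_base)


# 'J' encodes phred 10 in a phred+64 file, but decodes to 41 under phred+33.
print(decode_quality("J", base=64))
print(decode_quality("J", base=33))
# The intended "-q 10" therefore has to be written as "-q 41".
print(adjusted_cutoff(10))
```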

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=7

Trimming known adapter pairings

From [email protected] on December 16, 2011 04:18:57

This is not a bug, just a request. To illustrate the issue, let's pretend:

  1. I have a sequence to be trimmed that looks like this:
    5' TTGsomenucleotides-adapter2 3'

  2. My adapter sequences are:

    adapter1
    adapter2, which ends in TTG on the 3' end

  3. All possible sequenced fragments generated for sequencing will be flanked with Adapter1+Adapter2, or RCAdapter1+RCAdapter2. No authentic fragment will ever be flanked by Adapter2 on both sides, or Adapter1 on both sides. This is the case with the Illumina TruSeq protocols, if I'm not mistaken; it may be common with many protocols.

In this scenario, 5' trimming of the file using all 4 potential adapters would mean that the example sequence would have the beginning "TTG" removed erroneously. We can be sure it is not the end of Adapter2, even though it might match, because Adapter2 will never flank the same fragment on both sides.

A single -b (or -a or -g) argument cannot be relied upon, because other sequences in the same file might have "real" adapter sequences on both ends. However, two iterations of -b will again erroneously remove this portion of "real" sequence after having removed the better match at the 3' end in the first iteration.

Proposed solution(s):

It would be useful if a user could enter adapters as linked pairs. The algorithm which currently decides which of all possible single adapters is most valid could weigh which of the "pairs" was most likely occurring based on combined match length of both adapters of a pair. Then, strong/certain matches of an adapter at one end could prevent the opposite end from being erroneously trimmed using an adapter match that should not occur in the reaction. This implementation could also provide two useful mechanisms for screening adapter concatenations from real organism sequences with minimal loss of true sequence without (I think) over-encumbering the program.

First, a person could run the program using pairs that should not occur flanking a "real" fragment (for example Adapter1+Adapter1), set the -O very high to make sure only near-complete adapters were recognized, and instruct the program to discard reads matching these criteria rather than trimming the reads.

A second mechanism is enabled during normal trimming by the match comparison for deciding which pair is correct to trim a given sequence. For example, I could enter 2 pairs of adapter sequences, each 30 bp in length: 1&2 occur together, and 3&4 occur together. Normally, (I think) your program would find the best match or, in the case of equally good matches, choose the first pair. A second length parameter (-OO or something) could be set so that if a "tie" occurred with both pairs matching along at least that length, that read would be discarded rather than trimmed using the first pair. In my example, if I set -OO 28, any "ties" in best pair decision resulting from a sequence matching 28 bp of adapter1 AND 28 bp of adapter3 would cause a sequence to be discarded (or output to a separate file).
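
The pair-selection idea proposed above could look roughly like the following Python sketch. It is a hypothetical illustration of the request, not cutadapt's algorithm: the function name, the data layout, and the `tie_min_overlap` parameter (standing in for the suggested "-OO") are all assumptions.

```python
def choose_pair(pair_matches, tie_min_overlap=None):
    """Pick the adapter pair with the largest combined match length.

    pair_matches maps a pair name to (match_len_5prime, match_len_3prime).
    Returns the winning pair name, None if no adapter matched, or "discard"
    when several pairs tie and every tied pair has a single match of at
    least tie_min_overlap bases.
    """
    best = max((sum(lengths) for lengths in pair_matches.values()), default=0)
    if best == 0:
        return None
    # Dicts keep insertion order, so tied[0] is the first pair the user listed.
    tied = [name for name, lengths in pair_matches.items() if sum(lengths) == best]
    if len(tied) > 1 and tie_min_overlap is not None:
        if all(max(pair_matches[name]) >= tie_min_overlap for name in tied):
            return "discard"
    return tied[0]


# A strong 3' match for pair 1&2 outweighs a 3-base 5' hit for pair 3&4.
print(choose_pair({"1&2": (0, 30), "3&4": (3, 0)}))
# Two 28-base matches from different pairs tie; with the threshold set to 28
# the read is flagged for discarding.
print(choose_pair({"1&2": (28, 0), "3&4": (28, 0)}, tie_min_overlap=28))
```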

In summary, I think the ability to input/compare adapter "pairs" could help reduce loss of authentic data at high stringency while enabling quick identification and removal of "junk" sequences through mechanisms you have already set in place.

Lastly, thank you for your work creating, updating, and improving this tool. It is really fantastic and is one of very few bioinformatic tools that permits immediate, flexible, and accurate use with little background computer knowledge demanded from the user and only a single dependency. I cannot express how much time and frustration your tool has saved me, and how much I appreciate and admire your work.

Sincerely,
Elspeth Murday
Clemson University
[email protected]

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=34

Log output to file

From [email protected] on April 17, 2012 19:29:24

What version of the product are you using? On what operating system?
Cutadapt v1.0, CentOS 5.8.

Please provide any additional information below.
I'd like to be able to set a filename for the log output instead of having it printed to the console. This would make the tool easier to use in pipeline software.

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=42

ImportError: No module named lib.cutadapt

From [email protected] on March 16, 2011 21:48:36

What steps will reproduce the problem?
1. Run 'python setup.py build'.

What is the expected output? What do you see instead?
Expected setup to run, but I get this error message:

Traceback (most recent call last):
File "setup.py", line 4, in
from lib.cutadapt import version
ImportError: No module named lib.cutadapt

What version of the product are you using? On what operating system?
0.9.2

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=11

cutadapt -b crashes

From [email protected] on April 02, 2012 21:09:35

cutadapt -b GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT 6_cutadap_trim_2.fastq -o recut_o -r recut_r --untrimmed-output=recut_untrim

Traceback (most recent call last):
File "/home/kesner/bin/python/bin/cutadapt", line 708, in
sys.exit(main())
File "/home/kesner/bin/python/bin/cutadapt", line 657, in main
for read in reader:
File "/home/kesner/bin/python/lib/python2.7/site-packages/cutadapt/seqio.py", line 249, in iter
raise ValueError("Length of quality sequence and length of read do not match (%d+%d!=%d)" % (len(qualities), lengthdiff, len(sequence)))
ValueError: Length of quality sequence and length of read do not match (22+0!=45)

What version of the product are you using? On what operating system?
On Red Hat Linux. It only crashes with the -b option so far.

python 2.7.2

Ran fine for most of data
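
Because the error message reports a read whose quality string is shorter than its sequence, one way to narrow the problem down is to scan the input for such records before trimming. The Python sketch below is a standalone illustration, not cutadapt code, and it assumes plain four-line FASTQ records without wrapped sequences.

```python
def find_length_mismatches(lines):
    """Yield (record_number, name, len(sequence), len(qualities)) for bad records."""
    lines = [line.rstrip("\r\n") for line in lines]
    # Step through the input four lines at a time; a trailing partial record is skipped.
    for i in range(0, len(lines) - 3, 4):
        name, sequence, _plus, qualities = lines[i:i + 4]
        if len(sequence) != len(qualities):
            yield i // 4 + 1, name, len(sequence), len(qualities)


example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGTA", "+", "II"]
for problem in find_length_mismatches(example):
    print(problem)
```

To check a real file, pass an open file handle instead of the example list.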

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=40

bam format and stdin/stdout

From [email protected] on October 25, 2011 16:35:44

Would you consider adding support for BAM format as input and output, and/or stdin/stdout so that cutadapt can be piped to other scripts? That would be fantastic!

I would also like an option to provide a text file with the adaptors and their names, so that the names can be added to the reports.

Thanks for your great software.

Carlos

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=31

make Ns align to any base

From [email protected] on June 11, 2011 01:09:04

It would be helpful to have an option that would allow Ns to align to any base (i.e. consider the alignment between GGGGG and GGNGG to have 0 errors instead of 1 error).

This is my test file (the adapter sequence will be GGGGGGG):

$ cat test.fa

>perfect
TTTGGGGGGG
>withN
TTTGGNGGGG
>1mism
TTTGGGGCGG

Currently the N is treated like any other mismatch: if I set -e to 0, the withN sequence won't align to the adapter; in order to make it align I have to set -e 0.15 or higher, same as I would for a sequence with a non-ambiguous incorrect base.

$ cutadapt -e 0 -a GGGGGGG test.fa

>perfect
TTT
>withN
TTTGGNGGGG
>1mism
TTTGGGGCGG

$ cutadapt -e 0.15 -a GGGGGGG test.fa

>perfect
TTT
>withN
TTT
>1mism
TTT

I would like to have an option to make the TTTGGNGGGG version be trimmed even with -e 0.
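
The requested behaviour can be illustrated with a small Python sketch. It only compares two strings of equal length position by position; cutadapt's real alignment also handles partial overlaps and indels, and the function names and the `n_matches_any` switch are hypothetical.

```python
def count_mismatches(adapter, segment, n_matches_any=False):
    """Count mismatching positions between two equal-length strings."""
    if len(adapter) != len(segment):
        raise ValueError("sequences must have equal length")
    errors = 0
    for a, b in zip(adapter, segment):
        if a == b:
            continue
        if n_matches_any and "N" in (a, b):
            continue  # an N on either side is treated as a wildcard
        errors += 1
    return errors


def is_match(adapter, segment, max_error_rate, n_matches_any=False):
    """True if the error rate (errors / adapter length) is within the limit."""
    errors = count_mismatches(adapter, segment, n_matches_any)
    return errors / len(adapter) <= max_error_rate


# One N in a 7-base adapter is an error rate of 1/7 (about 0.143), which is
# why -e 0.15 is currently needed to trim the withN read.
print(is_match("GGGGGGG", "GGNGGGG", 0.0))
print(is_match("GGGGGGG", "GGNGGGG", 0.15))
print(is_match("GGGGGGG", "GGNGGGG", 0.0, n_matches_any=True))
```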

Version/system:

$ cutadapt --version
0.9.4
$ uname -a
Linux bleen 2.6.32-31-generic #61-Ubuntu SMP Fri Apr 8 18:25:51 UTC 2011 x86_64 GNU/Linux

P.S. cutadapt is a very useful program, thank you!

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=23

Add multi-threading and a suggested way to increase speed

From [email protected] on May 08, 2012 23:36:57

This is a request for feature enhancement, not a defect.

Would it be possible to add multithreading as an option?

I'm running against a conf file of 25 adapters with -b and -n 2, with about 7M 50 bp reads, and it takes a few hours. Not a big deal, just wondering what multithreading would do for this. I'm using version 1.0

Another thought. I've not looked at the source code (Python and C are not my strong suits), so you may already be doing something like I am about to suggest:
If no quality-related options are selected (e.g., -q, --quality-base), how about keeping only the unique reads, processing those, and then applying the same rule to the duplicate reads? When there is a lot of contamination (and therefore many duplicate reads), this may speed things up. I realize there are tools made just for that purpose, but I thought I would throw it out there.
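
A minimal Python sketch of this suggestion follows. Whether it helps depends on how many duplicates the data contains, and `toy_trim` is only a stand-in for the real adapter search, not how cutadapt trims.

```python
def toy_trim(sequence, adapter="GGGGGGG"):
    """Stand-in trimming step: cut at the first exact adapter occurrence."""
    position = sequence.find(adapter)
    return sequence if position == -1 else sequence[:position]


def trim_with_cache(reads, trim):
    """Trim each distinct sequence once and reuse the result for duplicates."""
    cache = {}
    trimmed = []
    for sequence in reads:
        if sequence not in cache:
            cache[sequence] = trim(sequence)
        trimmed.append(cache[sequence])
    return trimmed


reads = ["TTTGGGGGGG", "ACGT", "TTTGGGGGGG"]
print(trim_with_cache(reads, toy_trim))
```

The cache keys on the sequence alone, which is why this only applies when no quality-dependent options are in use.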

Thanks for this handy tool!

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=44

overlapping prefix adapters

From [email protected] on May 08, 2012 17:27:10

The new -g ^ADAPTER option isn’t enough. There has been a request to allow less strict anchoring, where the adapter overlaps the beginning of the read.

This is easily achieved by this change:
-PREFIX = align.STOP_WITHIN_SEQ2
+PREFIX = align.STOP_WITHIN_SEQ2 | align.START_WITHIN_SEQ1

The question is whether that is the desired behaviour or whether both versions should be possible.

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=43

Support FASTA + QUAL (not just for colour space)

From [email protected] on February 08, 2011 11:35:39

Hi,

The source code comment at the start of the script says:

If two file names are given, they are assumed to be
.csfasta and .qual files as produced by the SOLiD sequencer.
(You still need to provide the -c option to correctly deal
with color space.)

It could be useful to support sequence space FASTA + QUAL, most commonly found as the output from Roche 454 since the manufacturer's software will convert binary SFF files to FASTA + QUAL (and at the time of writing does not offer SFF to FASTQ).

If cutadapt does already cope with this, then the description quoted needs to be updated.
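
For illustration, pairing one FASTA record with its QUAL record amounts to the conversion below. This is a Python sketch that assumes the usual QUAL layout of space-separated integer scores; the function name is hypothetical, and this is not a description of how cutadapt reads such files.

```python
def fasta_qual_to_fastq(name, sequence, qual_line, offset=33):
    """Combine one FASTA record and its QUAL scores into a FASTQ record."""
    scores = [int(score) for score in qual_line.split()]
    if len(scores) != len(sequence):
        raise ValueError(
            "record %s: %d bases but %d quality values"
            % (name, len(sequence), len(scores))
        )
    # Encode each integer score as a single ASCII character.
    qualities = "".join(chr(score + offset) for score in scores)
    return "@%s\n%s\n+\n%s\n" % (name, sequence, qualities)


print(fasta_qual_to_fastq("read1", "ACGT", "40 40 30 2"), end="")
```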

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=6

Add a --too-long option to print too long (by -M) reads to a file instead of discarding

From [email protected] on October 25, 2011 19:57:58

(Enhancement request, not bug)

When too-short reads are discarded using the --minimum-length option, there's a --too-short-output option to write them to a separate file. There's also a --maximum-length option, but currently there doesn't seem to be any way of writing the resulting too-long reads to a separate file. It would be good to have that option.
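
The requested behaviour amounts to a three-way split by read length. The Python sketch below is illustrative only: the function is hypothetical and works on plain strings rather than on cutadapt's read objects or output files.

```python
def route_by_length(reads, minimum_length=0, maximum_length=None):
    """Split reads into (kept, too_short, too_long) by their length."""
    kept, too_short, too_long = [], [], []
    for read in reads:
        if len(read) < minimum_length:
            too_short.append(read)
        elif maximum_length is not None and len(read) > maximum_length:
            too_long.append(read)
        else:
            kept.append(read)
    return kept, too_short, too_long


kept, too_short, too_long = route_by_length(
    ["AC", "ACGTAC", "ACGTACGTACGT"], minimum_length=4, maximum_length=8
)
print(kept, too_short, too_long)
```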

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=32

better error reporting on parse errors

From [email protected] on March 18, 2011 00:29:09

What steps will reproduce the problem?
1. I downloaded the latest version.
2. cutadapt -O 10 -b CAGACGTGCCTCACTACGT TEE.TEST.fastq > TEE.test.trim.a1.fastq (TEE.TEST.fastq is just a standard fastq file)

What is the expected output? What do you see instead?
Expected: trimmed sequences with statistics. Instead:

[kumarlab@BatLC1 TEE]$ cutadapt -O 10 -b CAGACGTGCCTCACTACGT TEE.TEST.fastq > TEE.test.trim.a1.fastq
Traceback (most recent call last):
File "/home/kumarlab/bioinfo/cutadapt-0.9.3/cutadapt", line 549, in
sys.exit(main())
File "/home/kumarlab/bioinfo/cutadapt-0.9.3/cutadapt", line 508, in main
for desc, seq, qualities in reader:
File "/home/kumarlab/bioinfo/cutadapt-0.9.3/lib/cutadapt/seqio.py", line 196, in iter
assert line[0] == '@'
AssertionError

What version of the product are you using? On what operating system?
0.9.3 on Linux

Please provide any additional information below.
I have performed this operation many times before. I got a new server, installed the latest version of cutadapt, and encountered this problem.
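
As an illustration of what friendlier parse errors could look like, the Python sketch below replaces a bare assertion with an exception that names the offending line. `FormatError` and `parse_fastq` are hypothetical names, and the parser assumes four-line FASTQ records; it is not the code in seqio.py.

```python
class FormatError(Exception):
    """Raised when the input does not look like FASTQ."""


def parse_fastq(lines):
    """Return (name, sequence, qualities) tuples or raise FormatError."""
    lines = [line.rstrip("\r\n") for line in lines]
    records = []
    for start in range(0, len(lines), 4):
        chunk = lines[start:start + 4]
        if not chunk[0].startswith("@"):
            raise FormatError(
                "line %d: expected a record header starting with '@', found %r"
                % (start + 1, chunk[0][:20])
            )
        if len(chunk) < 4:
            raise FormatError(
                "line %d: record is truncated (%d of 4 lines)" % (start + 1, len(chunk))
            )
        if not chunk[2].startswith("+"):
            raise FormatError(
                "line %d: expected '+' separator, found %r" % (start + 3, chunk[2][:20])
            )
        records.append((chunk[0][1:], chunk[1], chunk[3]))
    return records


try:
    parse_fastq([">perfect", "TTTGGGGGGG"])
except FormatError as error:
    print("Error:", error)
```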

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=13

[email protected]

From [email protected] on January 11, 2011 12:32:14

I am so confused, because the program just removes the adaptor in color-space coding, "330201030313112312", for SOLiD data.

330201030313112312 can translate to 4 different nucleobase sequences, and CGCCTTGGCCGTACAGCAG is one of them.
So, if the program just removes "330201030313112312", we will lose some information.
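
The four possible translations can be listed with a short Python sketch. It assumes the standard SOLiD two-base encoding, in which each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3); the function name is illustrative.

```python
BASES = "ACGT"


def decode_colors(first_base, colors):
    """Translate a color string into bases, given the base that precedes it."""
    current = BASES.index(first_base)
    sequence = [first_base]
    for color in colors:
        # Each color encodes the transition from the previous base to the next.
        current ^= int(color)
        sequence.append(BASES[current])
    return "".join(sequence)


# One translation per possible preceding base; starting from C gives
# CGCCTTGGCCGTACAGCAG, the sequence quoted above.
for base in BASES:
    print(base, decode_colors(base, "330201030313112312"))
```

Which of the four is correct depends on the base preceding the adapter, so the color string alone does not determine the nucleotide sequence.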

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=5

-c: command not found

From [email protected] on May 06, 2011 13:39:21

What is the expected output? What do you see instead?
I expect the program to run, but it always says "-c: command not found". When I remove the -c option, it says "-e: command not found".

What version of the product are you using? On what operating system?
0.9.3 on Ubuntu 10.04 LTS (Lucid)

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=18

problem on installation

From [email protected] on March 19, 2012 19:49:17

Cannot run cutadapt 1.0.

I used the installation instructions with both python2.6.5 and python3.1.2. I am getting the following error when running cutadapt-1.0/cutadapt:

b2a:cutadapt-1.0] ./cutadapt
File "./cutadapt", line 99
print("length", "count", sep="\t")
^
SyntaxError: invalid syntax

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=39

Remove adaptor from 5' end in color-space data

From [email protected] on April 16, 2012 17:44:27

What steps will reproduce the problem?
1. The option -g does not work with color-space data. The script says "Using --anywhere or --front with color space reads is currently not supported (if you think this may be useful, contact the author)."

What is the expected output? What do you see instead?
Trimmed reads for 'de novo' RNA-Seq.

What version of the product are you using? On what operating system?
v1.0 on Rocks Viper

Please provide any additional information below.
I'm working with the same dataset using the new ECC module, which converts color-space data into base-space data, and from that I know there are adaptors at the 5' end of my reads that should be removed prior to the 'de novo' assembly of my data. I detected these adaptors using your tool with the reads in base space, but now I'd like to do the same with the data in color space, which requires removing the adaptor at the 5' end. Any ideas about how to perform 5' adaptor removal in color-space data?

Thanks very much in advance.

Sheila

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=41

Weird behavior of CutAdapt -a/-g/-b option

From [email protected] on July 06, 2012 19:23:08

Hi, I tried to test the performance of CutAdapt, so I simulated 1M reads with 2 adapters randomly added to either the 3' or the 5' end. I also randomized the length of the adapter sequence added to each read.

As a summary, I generated 39957 contaminated reads, half of them contaminated with adapter 1 and the other half contaminated with adapter 2. The size of adapter 1 is 25 bp, the size of adapter 2 is 33 bp.

Then I ran CutAdapt 3 times using the following commands:

CutAdapt -b adapter_1 -b adapter_2 contamined.reads
CutAdapt -a adapter_1 -a adapter_2 contamined.reads
CutAdapt -g adapter_1 -g adapter_2 contamined.reads

Here is the histogram of adapter lengths. (For the sake of the issue, I only posted the relevant information):

For command: CutAdapt -b adapter_1 -b adapter_2 contamined.reads
===Adapter 1===
Histogram of adapter lengths (5')
length count
24 375
25 402

Histogram of adapter lengths (3' or within)
length count
24 400
25 394

=== Adapter 2 ===
Histogram of adapter lengths (5')
length count
32 277
33 288

Histogram of adapter lengths (3' or within)
length count
32 301
33 346

For command: CutAdapt -a adapter_1 -a adapter_2 contamined.reads
===Adapter 1===
Histogram of adapter lengths
length count
24 400
25 1558

=== Adapter 2 ===
Histogram of adapter lengths
length count
32 301
33 1526

For command: CutAdapt -g adapter_1 -g adapter_2 contamined.reads
===Adapter 1===
Histogram of adapter lengths
length count
24 375
25 1604

=== Adapter 2 ===
Histogram of adapter lengths
length count
32 277
33 1546

In my simulation, there are 402 reads contaminated with adapter 1 of size 25 bp at 5' end, 394 reads contaminated with adapter 1 of size 25 bp at 3' end, 288 reads contaminated with adapter 2 of size 33 bp at 5' end, 346 reads contaminated with adapter 2 with size 33 bp at 3' end.

Therefore, when I used -b option, the sensitivity and specificity of CutAdapt are almost 100%. But when I used -a or -g option, the sensitivity of CutAdapt is still 100%, while the false positive rate of trimming increased significantly (from 0 to around 0.1%).

I am really confused by this result.

Based on a previous post: https://code.google.com/p/cutadapt/issues/detail?id=8 , I thought the decreased specificity might be caused by error-tolerant matching of the longer contaminating sequences. To test this possibility, I ran the command: CutAdapt -a adapter_1 -a adapter_2 -e 0.05 contamined.reads

===Adapter 1===
Histogram of adapter lengths
length count
24 400
25 1171

=== Adapter 2 ===
Histogram of adapter lengths
length count
32 301
33 911

So I did observe a decrease in the false positive rate when I used -e 0.05.

But why is the specificity "perfect" when I use the -b option?

I am running the test using cutadapt version 1.0 on CentOS.

Best,
Ying

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=47

Paired-end trimming

From [email protected] on September 05, 2012 10:45:42

I would like to see paired-end trimming in cutadapt. Input two fastq files which are ordered identically, and have all trimming done on both files simultaneously. After the trimming of each pair there should be a user-settable cutoff to discard the entire pair if one of the reads is shorter than this cutoff.
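
The pair-level filter described here can be sketched in Python as follows. The function is hypothetical, and the reads are plain strings that are assumed to have been trimmed already and to be in the same order in both inputs.

```python
def filter_pairs(reads1, reads2, minimum_length):
    """Keep only pairs in which both trimmed reads reach minimum_length."""
    if len(reads1) != len(reads2):
        raise ValueError("both inputs must contain the same number of reads")
    kept1, kept2 = [], []
    for read1, read2 in zip(reads1, reads2):
        # Dropping both mates together keeps the two outputs in sync.
        if len(read1) >= minimum_length and len(read2) >= minimum_length:
            kept1.append(read1)
            kept2.append(read2)
    return kept1, kept2


print(filter_pairs(["ACGTACGT", "AC"], ["ACGTAC", "ACGTACGT"], minimum_length=4))
```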

BWA can't handle out-of-sync files, and trimmomatic and trim_galore can't handle 3' adapters. This functionality in cutadapt would make it super-powerful, like Batman or Iron Man.

cheerio

Daniel

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=50

Enhancement - specify trim sequences in file?

From [email protected] on November 12, 2011 20:20:02

Would it be possible to specify sequences to be trimmed in a file, rather than directly on the command line? Right now I do this:

$cutadapt -m 20 -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTTGAATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCTACATCTCGTATGCCGTCTTCTGCTTG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG -a CTTCACCGTGCCAGACTAGAGTCAAGCTCAACAGGGTCTTCTTTCCCCGCTG -a GGATGAACGAGATTCCCACTGTCCCTACCTACTATCCAGCGAAACCACAGCC -a CTCCCTTTCGATCGGCCGAGGGCAACGGAGGCCATCGCCCGTCCCTTCGGAA -a CGAGATTCCCACTGTCCCTACCTACTATCCAGCGAAACCACAGCCAAGGGAA -a CCACTCTCGACTGCCGGCGACGGCCGGGTATGGGCCCGACGCTCCAGCGCCA -a TGGAAGTCGGAATCCGCTAAGGAGTGTGTAACAACTCACCTGCCGAATCAAC -a CCTATACCCAGGTCGGACGACCGATTTGCACGTCAGGACCGCTACGGACCTC -a CACGAGCGCACGTGTTAGGACCCGAAAGATGGTGAACTATGCCTGGGCAGGG -a GTCGGAATCCGCTAAGGAGTGTGTAACAACTCACCTGCCGAATCAACTAGCC -a CTCCCGTCCACTCTCGACTGCCGGCGACGGCCGGGTATGGGCCCGACGCTCC -a CGCAGGTTCAGACATTTGGTGTATGTGCTTGGCTGAGGAGCCAATGGGGCGA -a GAACGAGATTCCCACTGTCCCTACCTACTATCCAGCGAAACCACAGCCAAGG -a CAGAAGGGCAAAAGCTCGCTTGATCTTGATTTTCAGTACGAATACAGACCGT -a TTTCGATCGGCCGAGGGCAACGGAGGCCATCGCCCGTCCCTTCGGAACGGCG input.fastq > output.fastq

It would be much easier if I could do:
$cutadapt -m 20 -a <trim_sequence_file> input.fastq > output.fastq
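
Until such an option exists, a small wrapper can build the long command line from a file. The Python sketch below is a workaround idea only: the file layout (one adapter per line, with blank lines and lines starting with '#' or '>' skipped) and both function names are assumptions, and only the -m and -a options shown above are used.

```python
def read_adapters(lines):
    """Return the adapter sequences found in an iterable of text lines."""
    adapters = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or line.startswith(">"):
            continue  # skip blanks, comments and FASTA-style name lines
        adapters.append(line)
    return adapters


def build_command(adapters, input_path, minimum_length=20):
    """Build the cutadapt argument list with one -a option per adapter."""
    command = ["cutadapt", "-m", str(minimum_length)]
    for adapter in adapters:
        command += ["-a", adapter]
    command.append(input_path)
    return command


example_file = ["# adapters", ">first", "AAA", "", "CCC"]
print(build_command(read_adapters(example_file), "input.fastq"))
```

The resulting list can be passed to `subprocess.run` with standard output redirected to output.fastq.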

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=33

"Histogram of adapter lengths" reports confusing numbers

From [email protected] on February 13, 2011 17:23:46

Hi,

I ran cutadapt 0.9 on Linux and got the output summary, including the histogram of adapter lengths and number of occurrences (see below). My adapter is 41 nt long, but every sequence in my input FASTQ file is 30 nt long. I wonder how it found 31, 32, and 33 nt long adapters in my input sequences. Or did I just misinterpret the histogram result?

Command line parameters: -a CCCTATAGTGAGTCGTATTATCGTATGCCGTCTTCTGCTTG /home/khademul/PROJECTS/s1.fq -m 14 -M 30 > s1.trim.fq

Maximum error rate: 10.00%
Processed reads: 8983115
Trimmed reads: 3012186 ( 33.5%)
Too short reads: 1558273 ( 17.3% of processed reads)
Too long reads: 0 ( 0.0% of processed reads)
Total time: 326.89 s
Time per read: 0.04 ms

=== Adapter 1 ===

Adapter 'CCCTATAGTGAGTCGTATTATCGTATGCCGTCTTCTGCTTG', length 41, was trimmed 3012186 times.

Histogram of adapter lengths
length count
3 113604
4 130890
5 142304
6 47822
7 94040
8 119724
9 59029
10 63170
11 79266
12 86977
13 159517
14 136192
15 98530
16 122180
17 123892
18 109569
19 84714
20 67326
21 42479
22 19665
23 6482
24 3134
25 4269
26 917
27 633
28 1143
29 4362
30 948377
31 27386
32 112349
33 2244


thanks,

Abul

[email protected]

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=8

input from STDIN stops working when -f option given

From [email protected] on August 18, 2011 01:35:31

If I run cutadapt on very simple fasta input, I get an error when taking input from STDIN and using the -f fasta option. When reading input from a file, or not using -f, everything works correctly.

My test input file (test0.fa) contains the following:

>perfect
TTTGGGGGGG

I want to test cutadapt taking input from STDIN. With no -f option, it works as expected:

$ cat test0.fa | cutadapt -a GGGGGGGGGGGGGGG -

>perfect
TTT
cutadapt version 0.9.4
Command line parameters: -a GGGGGGGGGGGGGGG -
Maximum error rate: 10.00%
Processed reads: 1
Trimmed reads: 1 (100.0%)
Too short reads: 0 ( 0.0% of processed reads)
Too long reads: 0 ( 0.0% of processed reads)
Total time: 0.00 s
Time per read: 0.00 ms

=== Adapter 1 ===

Adapter 'GGGGGGGGGGGGGGG', length 15, was trimmed 1 times.

Histogram of adapter lengths
length count
7 1

However, if I add -f fasta to the options, it stops working:

$ cat test0.fa | cutadapt -f fasta -a GGGGGGGGGGGGGGG -
Traceback (most recent call last):
File "/home/weronika/programs/work_software/cutadapt", line 5, in
pkg_resources.run_script('cutadapt==0.9.4', 'cutadapt')
File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 461, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 1194, in run_script
execfile(script_filename, namespace, namespace)
File "/home/weronika/programs/work_software/cutadapt-0.9.4-py2.6-linux-x86_64.egg/EGG-INFO/scripts/cutadapt", line 595, in
sys.exit(main())
File "/home/weronika/programs/work_software/cutadapt-0.9.4-py2.6-linux-x86_64.egg/EGG-INFO/scripts/cutadapt", line 546, in main
for desc, seq, qualities in reader:
File "/home/weronika/programs/work_software/cutadapt-0.9.4-py2.6-linux-x86_64.egg/cutadapt/seqio.py", line 154, in _streaming_iter
for line in self.fp:
IOError: File not open for reading
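
The traceback does not show why the handle is unreadable, so the following Python sketch only illustrates the expected behaviour: "-" is mapped to standard input in one place, before any format-specific reader runs. `open_input` and `read_fasta` are hypothetical helpers, not cutadapt functions.

```python
import io
import sys


def open_input(path, stdin=None):
    """Return a readable text handle; "-" means standard input."""
    if path == "-":
        return stdin if stdin is not None else sys.stdin
    return open(path)


def read_fasta(handle):
    """Return (name, sequence) tuples from an open FASTA handle."""
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# A StringIO object stands in for piped input in this example.
fake_stdin = io.StringIO(">perfect\nTTTGGGGGGG\n")
print(read_fasta(open_input("-", stdin=fake_stdin)))
```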

Just to make sure, if I run it with the -f option but reading input from a file instead of STDIN, it also works as expected:

$ cutadapt -f fasta -a GGGGGGGGGGGGGGG test0.fa

>perfect
TTT
cutadapt version 0.9.4
Command line parameters: -f fasta -a GGGGGGGGGGGGGGG test0.fa
Maximum error rate: 10.00%
Processed reads: 1
Trimmed reads: 1 (100.0%)
Too short reads: 0 ( 0.0% of processed reads)
Too long reads: 0 ( 0.0% of processed reads)
Total time: 0.01 s
Time per read: 10.00 ms

=== Adapter 1 ===

Adapter 'GGGGGGGGGGGGGGG', length 15, was trimmed 1 times.

Histogram of adapter lengths
length count
7 1

Cutadapt version: cutadapt-0.9.4-py2.6-linux-x86_64.egg

Operating system: Linux, Ubuntu 10.04, 64-bit.
$ uname -a
Linux bleen 2.6.32-32-generic #62-Ubuntu SMP Wed Apr 20 21:52:38 UTC 2011 x86_64 GNU/Linux

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=28

output additional info about the cut adapters

From [email protected] on December 02, 2010 14:30:05

This suggestion was made by "hash" in the seqanswers forum:

"Also, it would be useful to track which sequence was trimmed as adapter and where in the original sequence it was trimmed from in terms of location. Maybe an optional dump .fastq file would help for this which would contain the trimmed adapter sequence and additional information as to where in the original sequence it was found and how many mismatches were allowed (e.g. if a sequence is 36bp and adapter is found at 1 to 15bp with 0 mismatches, then maybe you could append this information to the '+' line in the fastq file as 1_15_0; the rest of the fields for a fastq sequence entry i.e. '@' would be the same)."

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=3
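
The '+'-line annotation suggested in the quote above could be produced along these lines. This Python sketch follows the "1_15_0" example from the quote (1-based start, end, number of mismatches); the function name and record layout are hypothetical.

```python
def adapter_dump_record(name, sequence, qualities, start, end, mismatches):
    """Build a FASTQ record holding the removed adapter part of a read.

    start and end are 1-based, inclusive positions in the original read.
    """
    removed_sequence = sequence[start - 1:end]
    removed_qualities = qualities[start - 1:end]
    info = "%d_%d_%d" % (start, end, mismatches)
    return "@%s\n%s\n+%s\n%s\n" % (name, removed_sequence, info, removed_qualities)


# A 36 bp read whose first 15 bases matched the adapter with 0 mismatches.
read = "A" * 15 + "C" * 21
print(adapter_dump_record("read1", read, "I" * 36, 1, 15, 0), end="")
```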

Sequence and quality different lengths

From [email protected] on October 05, 2011 14:53:57

Hi Marcel,

The problem below sounds like issue #9, but with colour space fastq data.

My problem is that the quality and sequence strings are not seen as being the same length, although they appear to be when I check them.

I am running 0.9.5 with python2.7 on 64bit CentOS (RedHat).

My command line is taken from novoalign's guide:

cutadapt -c -e 0.12 -a 330201030313112312 test.fastq

The output is:

Traceback (most recent call last):
File "/opt/python2.7/bin/cutadapt", line 600, in
sys.exit(main())
File "/opt/python2.7/bin/cutadapt", line 548, in main
for desc, seq, qualities in reader:
File "/opt/python2.7/lib/python2.7/site-packages/cutadapt/seqio.py", line 239, in iter
raise ValueError("Length of quality sequence and length of read do not match (%d+%d!=%d)" % (len(qualities), lengthdiff, len(sequence)))
ValueError: Length of quality sequence and length of read do not match (36+1!=36)
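
The numbers in the message suggest that in colour space the check expects one quality value fewer than there are characters in the read (36+1 != 36). The Python sketch below reproduces that check using the first record of the test file in this report. The value lengthdiff = 1 is inferred from the message only, and dropping the leading quality value is a guess at a workaround, not a confirmed fix.

```python
def check_lengths(sequence, qualities, colorspace=False):
    """Raise ValueError if the lengths are inconsistent (check inferred from the message)."""
    lengthdiff = 1 if colorspace else 0
    if len(qualities) + lengthdiff != len(sequence):
        raise ValueError(
            "Length of quality sequence and length of read do not match (%d+%d!=%d)"
            % (len(qualities), lengthdiff, len(sequence))
        )


def drop_primer_quality(sequence, qualities):
    """Drop the first quality value if there is one per character, primer base included."""
    if len(qualities) == len(sequence):
        return qualities[1:]
    return qualities


sequence = "T32113113220300030.1232...32.2010..."
qualities = "!/?)>7;<<)/688(<+&!75)1!!!&%!/3,/!!!"
try:
    check_lengths(sequence, qualities, colorspace=True)
except ValueError as error:
    print(error)
# After removing the quality value of the primer base the check passes.
check_lengths(sequence, drop_primer_quality(sequence, qualities), colorspace=True)
```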

I then tried dos2unix test.fastq, but received the same result.

The test file is as below (this was generated using the fastq-dump tool from the SRA package so I'm unsure what line terminators are used but after dos2unix vi shows a $ at the end of each line in set list mode):

[aplatts@grandiflora novoalignCS]$ cat test.fastq
@SRR040402.1 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_192_F3 length=35
T32113113220300030.1232...32.2010...
+SRR040402.1 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_192_F3 length=35
!/?)>7;<<)/688(<+&!75)1!!!&%!/3,/!!!
@SRR040402.2 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_295_F3 length=35
T33213022233231303.2000...20.2203...
+SRR040402.2 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_295_F3 length=35
!)&4,6):-7'&5+$)8!'&&5!!!(3!)&4&!!!
@SRR040402.3 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_298_F3 length=35
T33211222202311330.0000...30.2002...
+SRR040402.3 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_298_F3 length=35
!
##,.)%)_&&048$%&!2&+%!!!%(!&-/+!!!
@SRR040402.4 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_316_F3 length=35
T22213010112010010.3113...33.1111...
+SRR040402.4 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_316_F3 length=35
!)8)%+.9,4669/$#52!1%,3!!!'5!1,%+!!!
@SRR040402.5 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_578_F3 length=35
T00112130221210101.2010...13.1111...
+SRR040402.5 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_578_F3 length=35
!(//8&/(;(,>))(&1(!99<8!!!50!.)&6!!!
@SRR040402.6 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_693_F3 length=35
T01200230113010033.2131...20.1003...
+SRR040402.6 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_693_F3 length=35
!<5((((6)5181&41#!-)()!!!'!5.6#!!!
@SRR040402.7 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_714_F3 length=35
T23211301211022321.1322...20.1123...
+SRR040402.7 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_714_F3 length=35
!:27=43=:5<6;9;/49!-(8:!!!/6!+2&.!!!
@SRR040402.8 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_728_F3 length=35
T10203211202030103.3112...21.1122...
+SRR040402.8 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_728_F3 length=35
!;,6(.1)($1(9#&#/8!,)-)!!!_3!00(3!!!
@SRR040402.9 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_756_F3 length=35
T33220333011333332.2210...30.0013...
+SRR040402.9 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_756_F3 length=35
!5(19<63/8:<7.1%1'!.,35!!!+6!).*/!!!
@SRR040402.10 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_1009_F3 length=35
T01211220222221112.1222...03.1011...
+SRR040402.10 VAB_sparrow_20091211_2_Axtell_smallRNA2_and_transcriptome6_bc1to82_18_1009_F3 length=35
!&/,;6&)/2,,/78,,;!/)1:!!!&#!193/!!!

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=29

-g finds and deletes adapter from end of read

From [email protected] on June 02, 2012 09:45:03

What steps will reproduce the problem?
1. Sequence is TTGGCCAATTGGCCAATGACTGTGATGCTGTAGTCGTGATGCTGATGCTGTAGCTAGCTGTAGTGTGTCGATGACTGAACCGGTTAACCGGTT
2. Adapter is AACCGGTTAACCGGTT
3. Using -g

What is the expected output? What do you see instead?
The adapter should not be found, since it is at the end of the read.

What version of the product are you using? On what operating system?
1.0

Please provide any additional information below.
Returns an empty sequence. According to the documentation, the -g option only looks for hits at the beginning of a sequence.

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=45

--wildcard-file throws IndexError: string index out of range

From trgibbons on June 17, 2012 06:45:02

What steps will reproduce the problem?
1. Run cutadapt on a large fastq file (>100k sequences) with the --wildcard-file flag set

What is the expected output? What do you see instead?
I expected a file containing a list of multiplexing sequence tags.
Instead I keep getting some variation of this (only paths change):

Traceback (most recent call last):
  File "/usr/local/bin/cutadapt", line 708, in <module>
    sys.exit(main())
  File "/usr/local/bin/cutadapt", line 678, in main
    read, trimmed = cutter.cut(read)
  File "/usr/local/bin/cutadapt", line 448, in cut
    if adap_match[i] == 'N']
IndexError: string index out of range

What version of the product are you using? On what operating system?
Using cutadapt version 1.0 on Ubuntu 10.10 (Python 2.6) & OS X 10.7 (Python 2.7)

Please provide any additional information below.
Reproducible with multiple read sets, with Python 2.6 & 2.7, with installation from easy_install and from source, and on Ubuntu and OS X.

I've been running cutadapt in a shell script with a large number of flags set simultaneously. Of these, the wildcard file flag seems to be the only one giving me trouble.

my_script.sh
#!/bin/bash

time nice -n 12 cutadapt \
    --format=fastq \
    --anywhere=GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG \
    --output=${1%.fastq}-cutadapt_q30m25_clean.fastq \
    --rest-file=${1%.fastq}-cutadapt_q30m25_garbage.fastq \
    --wildcard-file=${1%.fastq}-cutadapt_q30m25_tag.fastq \
    --too-short-output=${1%.fastq}-cutadapt_q30m25_short.fastq \
    --untrimmed-output=${1%.fastq}-cutadapt_q30m25_noadapt.fastq \
    --quality-cutoff=20 \
    --minimum-length=25 \
    $1 \
    > ${1%.fastq}-cutadapt_q30m25.log
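
The failing line in the traceback indexes `adap_match` at a position beyond its end. The Python sketch below is only a guess at that failure mode, not cutadapt's actual code. `wildcard_bases` is a hypothetical helper that collects the read bases aligned to `N` positions in the adapter. It assumes the two aligned strings can differ in length when the adapter only partly overlaps the read, and it stops at the shorter of the two instead of indexing past the end.

```python
def wildcard_bases(adapter_match, read_match):
    """Return the read bases that line up with 'N' wildcards in the adapter.

    Both arguments are the aligned, gap-free portions of adapter and read
    (an assumption made for this sketch). zip() stops at the shorter string,
    so a partial overlap cannot raise IndexError.
    """
    return "".join(
        read_base
        for adapter_base, read_base in zip(adapter_match, read_match)
        if adapter_base == "N"
    )


# Full overlap: all six wildcard positions are covered by the read.
full = wildcard_bases("CACNNNNNNATC", "CACACGTTGATC")
# Partial overlap: the read ends inside the wildcard stretch.
partial = wildcard_bases("CACNNNNNNATC", "CACAC")
print(full, partial)
```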

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=46

Error with preamble and exc_clear

From [email protected] on August 22, 2012 22:41:11

What steps will reproduce the problem?
1. Running cutadapt with any files

What is the expected output? What do you see instead?
Instead of cutadapt running properly, I see the output copied below:

Traceback (most recent call last):
  File "/home/rnaseq/programs/execute/cutadapt", line 5, in <module>
    import _preamble
ImportError: No module named _preamble

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "$HOME/programs/execute/cutadapt", line 7, in <module>
    sys.exc_clear()
AttributeError: 'module' object has no attribute 'exc_clear'

What version of the product are you using? On what operating system?
cutadapt 1.1 on Linux with Python 3.2

Please provide any additional information below.
I'm sorry if this is a simple question, but I am new to Linux and am getting errors when trying to use cutadapt. Please help me determine what is incorrect about my installation or coding. Thank you.
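
`sys.exc_clear()` exists in Python 2 but was removed in Python 3, which is why the second exception appears under Python 3.2. Below is a minimal sketch of a launcher preamble that tolerates both versions. Its structure is modelled on the traceback and is an assumption, not cutadapt's actual wrapper script; `load_optional_preamble` is a hypothetical name.

```python
import sys


def load_optional_preamble():
    """Import the optional _preamble module; return True if it was found."""
    try:
        import _preamble  # noqa: F401  (optional helper, may be absent)
        return True
    except ImportError:
        # Python 2 keeps the handled exception alive until exc_clear() is
        # called. Python 3 has no such function, so only call it if present.
        if hasattr(sys, "exc_clear"):
            sys.exc_clear()
        return False


loaded = load_optional_preamble()
print("preamble loaded:", loaded)
```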

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=49

remove adaptor sequence

From [email protected] on January 18, 2012 17:50:16

Hi there,
I ran cutadapt to remove adaptor sequences from RNA-seq data, but it gave me an error. Could you please help me fix it? Thanks.
The error is below:

'import site' failed; use -v for traceback
Traceback (most recent call last):
  File "/opt/bioinf/bin/cutadapt", line 71, in <module>
    _libdir = join(dirname(realpath(__file__)), 'lib')
  File "/usr/lib64/python2.6/posixpath.py", line 362, in realpath
    if islink(component):
  File "/usr/lib64/python2.6/posixpath.py", line 135, in islink
    return stat.S_ISLNK(st.st_mode)
AttributeError: 'module' object has no attribute 'S_ISLNK'
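
One possible cause, offered as a guess rather than a diagnosis, is that some other module named `stat` is being imported in place of the standard library one (for example a local `stat.py`, or a broken Python installation, as the 'import site' failure hints). A short Python check of which file `stat` was loaded from, run with the same interpreter, could confirm or rule that out:

```python
import stat

# Where did the interpreter find the module called "stat"?
# Built-in or frozen modules may not have a __file__ attribute.
stat_path = getattr(stat, "__file__", "<built-in or frozen>")
has_islnk = hasattr(stat, "S_ISLNK")

print("stat loaded from:", stat_path)
print("has S_ISLNK:", has_islnk)
```

If the printed path is not inside the Python installation's library directory, that file is the likely culprit.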

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=35

strict 5' adapter matching

From [email protected] on February 23, 2012 23:58:59

Hi, cutadapt seems to be trimming reads excessively.
For example, see the 3 bp read remaining below.
I have specified many ~20 bp adapters with -g and --times=1.
I would have expected the shortest read to be 100-20=80 bp.
I have not asked for low-quality trimming.

This is 100 bp MiSeq data, quite dirty with some low intensities, hence the lower Q scores.

Many thanks for your help.
David

/cutadapt-1.0/cutadapt --overlap=15 --times=1 --quality-base=64 -g AAAAAACTCACAAAGTCAGGTAATTCT -g AAAAAAGCTCTCblah etc....

@Miseq:6:000000000-A0EW6:1:1:15870:1573 1:N:0:CGCTATCAG
GGAGCACGTGCAGACCCCCTACCTCTGCAGGACTGTCTTGCCATCCTCACCTGTCTGTGCCTCCTGCCCCGCAGTCAAGCGC
+
IIIIIGIG?DCGB>DBBDFGGII>FG<C4@(5=CD@CC;B>AACCBCCC?BCCCC@AAC3@>>(9>ABA@B###########
@Miseq:6:000000000-A0EW6:1:1:15386:1585 1:N:0:CGCTATCAG
ACTTAAAAGTTCACTTTTTGACAGATCCTGAAAATGAGATGAAGGAGAAGCTCTTAAAAGAGTACTTAATGGTGATAG
+
IIIII@DDH?F?FCGGGGIGGHHGIIGGIIFCEHHGHG@@ddf<C@=AEDDCCCCC:@:5<C:>@CCDDDAC4:>CE>
@Miseq:6:000000000-A0EW6:1:1:17223:1592 1:N:0:CGCTATCAG
GTC
+
::>
@Miseq:6:000000000-A0EW6:1:1:14819:1607 1:N:0:CGCTATCAG
AGCCCGCGGCAGCCACTGCAGCAGCGGCAGTGGCAGTAGCAGCAGCCACAGCTACAGCCACAGCCACGGCCTCTGTGGCCGC
+
GGIIIIIIF8FGGD9F;33CGDC2C/9BC>;>CDCBCCC35>?BB=A?A?BB@:>CC:8?C8?88?C59@############
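
The expectation stated above (a shortest read of 100 - 20 = 80 bp) can be checked against trimmed output with a short script. This is a sketch only: `short_reads` is a hypothetical helper that assumes plain four-line FASTQ records. The first record is the 3 bp read from the output above; the second is a synthetic 80 bp read added for contrast.

```python
import io


def short_reads(handle, min_length):
    """Yield (name, length) for four-line FASTQ records shorter than min_length."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        if len(sequence) < min_length:
            yield header[1:], len(sequence)


EXPECTED_MIN = 100 - 20  # read length minus adapter length, as in the report

fastq = io.StringIO(
    "@Miseq:6:000000000-A0EW6:1:1:17223:1592 1:N:0:CGCTATCAG\n"
    "GTC\n"
    "+\n"
    "::>\n"
    "@synthetic_read\n"          # made-up record, not from the report
    + "A" * 80 + "\n"
    + "+\n"
    + "I" * 80 + "\n"
)

found = list(short_reads(fastq, EXPECTED_MIN))
for name, length in found:
    print(name, length)
```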

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=36

Error ELFCLASS64

From [email protected] on July 09, 2012 13:41:19

Hello, I'm really struggling with Cutadapt... I don't know how to continue with this error:

[sudo] password for genmed13:
SIOCADDRT: El archivo ya existe
genmed13@genmed13-HP-Compaq-dc5800-Microtower:~$ cd Descargas/
genmed13@genmed13-HP-Compaq-dc5800-Microtower:~/Descargas$ cd cutadapt-1.0
genmed13@genmed13-HP-Compaq-dc5800-Microtower:~/Descargas/cutadapt-1.0$ ./cutadapt --help
Traceback (most recent call last):
  File "./cutadapt", line 78, in <module>
    from cutadapt import align, seqio
  File "/home/genmed13/Descargas/cutadapt-1.0/lib/cutadapt/align.py", line 222, in <module>
    from cutadapt.calign import globalalign, globalalign_locate
ImportError: /home/genmed13/Descargas/cutadapt-1.0/lib/cutadapt/calign.so: wrong ELF class: ELFCLASS64
genmed13@genmed13-HP-Compaq-dc5800-Microtower:~/Descargas/cutadapt-1.0$ python setup.py build_ext -i
running build_ext
genmed13@genmed13-HP-Compaq-dc5800-Microtower:~/Descargas/cutadapt-1.0$ ./cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC /Escritorio/s_G1_L001_I1_001_mate1.fastq -o /Escritorio/s_G1_L001_I1_001_mate1_trimmed.fastq
Traceback (most recent call last):
  File "./cutadapt", line 78, in <module>
    from cutadapt import align, seqio
  File "/home/genmed13/Descargas/cutadapt-1.0/lib/cutadapt/align.py", line 222, in <module>
    from cutadapt.calign import globalalign, globalalign_locate
ImportError: /home/genmed13/Descargas/cutadapt-1.0/lib/cutadapt/calign.so: wrong ELF class: ELFCLASS64
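
"wrong ELF class: ELFCLASS64" indicates that calign.so is a 64-bit binary while the Python process loading it is 32-bit. The Python sketch below compares the two using only the ELF header layout, where byte 4 (`EI_CLASS`) is 1 for 32-bit and 2 for 64-bit files. `elf_class_bits` is a hypothetical helper, and the header bytes in the example are made up for illustration.

```python
import struct


def elf_class_bits(header):
    """Return 32 or 64 for the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4:5]  # slice, so this works on Python 2 and 3
    classes = {b"\x01": 32, b"\x02": 64}
    if ei_class not in classes:
        raise ValueError("unknown ELF class")
    return classes[ei_class]


# Pointer size of the running interpreter, in bits.
interpreter_bits = struct.calcsize("P") * 8

# Illustrative 64-bit header; a real check would read the first bytes of the
# .so file instead, e.g. open(path, "rb").read(5).
fake_header = b"\x7fELF\x02\x01\x01" + b"\x00" * 9
library_bits = elf_class_bits(fake_header)

print("interpreter:", interpreter_bits, "bit")
print("library:", library_bits, "bit")
```

If the two numbers differ, the extension has to be rebuilt with the same Python that will run cutadapt. In the session above, `python setup.py build_ext -i` printed only "running build_ext", which suggests the existing calign.so was considered up to date and was not recompiled.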

What can I do?

Thanks...

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=48
