aakechin / cutprimers
cutPrimers is a tool for trimming primer sequences from amplicon-based NGS reads
License: GNU General Public License v3.0
Dear users of cutPrimers! We will be glad to hear any comments (positive or negative) about using cutPrimers. If you have any problems, please let us know and we will try to answer as soon as possible; we should be able to reply within 24 hours. Thank you for using cutPrimers to remove primer sequences from your NGS reads!
Hi,
I have adapter-trimmed FASTQ files (R1 and R2 reads). My forward primer sequence is "TGTGCCAGCMGCCGCGGTAA" and my reverse primer sequence is "TGGACTACHVGGGTWTCTAAT". With this information, how do I use cutPrimers?
I assume the FASTA files below need to be prepared from the primer sequences for the 5' and 3' ends of the R1 and R2 reads. How do I do this?
--primersFileR1_5, -pr15 - fasta-file with sequences of primers on the 5'-end of R1 reads
--primersFileR2_5, -pr25 - fasta-file with sequences of primers on the 5'-end of R2 reads. Do not use this parameter if you have single-end reads
--primersFileR1_3, -pr13 - fasta-file with sequences of primers on the 3'-end of R1 reads. It is not required, but if it is provided, -pr23 is also required
--primersFileR2_3, -pr23 - fasta-file with sequences of primers on the 3'-end of R2 reads
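For a single primer pair like this one, the four FASTA files can be built from the two primer sequences. A minimal sketch, assuming the standard amplicon layout (forward primer at the 5' end of R1, reverse primer at the 5' end of R2, and the reverse complement of the opposite primer at each 3' end if reads read through the insert); the file names mirror the cutPrimers options, and the record name is a placeholder:

```python
# Sketch: build the four primer FASTA files for one 16S primer pair.
# IUPAC-aware complement table, so degenerate bases (M, H, V, W, ...) survive.
COMP = str.maketrans("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN")

def revcomp(seq):
    """Reverse-complement a primer, including IUPAC degenerate codes."""
    return seq.translate(COMP)[::-1]

fwd = "TGTGCCAGCMGCCGCGGTAA"   # forward primer (5' of R1)
rev = "TGGACTACHVGGGTWTCTAAT"  # reverse primer (5' of R2)

files = {
    "primers_R1_5.fa": fwd,           # -pr15
    "primers_R2_5.fa": rev,           # -pr25
    "primers_R1_3.fa": revcomp(rev),  # -pr13 (read-through into reverse primer)
    "primers_R2_3.fa": revcomp(fwd),  # -pr23 (read-through into forward primer)
}
for name, seq in files.items():
    with open(name, "w") as fh:
        fh.write(">16S_primer\n%s\n" % seq)
```

The 3'-end files only matter when the insert is shorter than the read length; for long inserts the -pr13/-pr23 options can be omitted.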
Any help is appreciated and i am curious to compare your tool with our sequencing provider in-house primerclipping results!
Best Regards,
Bala
Hi,
I have a primer bed file:
Chr Amplicon_Start Insert_Start Insert_Stop Amplicon_Stop
chr17 41275996 41276024 41276122 41276149
chr17 41267705 41267733 41267856 41267884
How could I get the primer fasta files?
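One way to get there is to derive the primer intervals from the table and extract the sequences from the reference with `samtools faidx`. A minimal sketch, assuming 1-based inclusive coordinates in which the forward primer runs from Amplicon_Start to the base before Insert_Start, and the reverse primer from the base after Insert_Stop to Amplicon_Stop (the reverse primer sequence is then the reverse complement of that region); check this against your BED file's conventions:

```python
# Sketch: turn the amplicon table into region strings for
# `samtools faidx reference.fa <region>`.
rows = [
    ("chr17", 41275996, 41276024, 41276122, 41276149),
    ("chr17", 41267705, 41267733, 41267856, 41267884),
]

regions = []
for chrom, amp_start, ins_start, ins_stop, amp_stop in rows:
    fwd_region = "%s:%d-%d" % (chrom, amp_start, ins_start - 1)
    rev_region = "%s:%d-%d" % (chrom, ins_stop + 1, amp_stop)
    regions.append((fwd_region, rev_region))
    print("forward primer region:", fwd_region)
    print("reverse primer region (reverse-complement this):", rev_region)
```

With the regions in hand, `samtools faidx` (with `-i` for the reverse primer to get the reverse complement) produces FASTA records you can paste into the cutPrimers primer files.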
When I run the shell from the readme file,
```
python3 cutPrimers.py -r1 example/1_S1_L001_R1_001.fastq.gz -r2 example/1_S1_L001_R2_001.fastq.gz -pr15 example/primers_R1_5.fa -pr25 example/primers_R2_5.fa -pr13 example/primers_R1_3.fa -pr23 example/primers_R2_3.fa -tr1 example/1_r1_trimmed.fastq.gz -tr2 example/1_r2_trimmed.fastq.gz -utr1 example/1_r1_untrimmed.fastq.gz -utr2 example/1_r2_untrimmed.fastq.gz -t 2
```
As a result I get four files with the following sizes: 4.2 Mb, 2.3 Mb, 3.6 Mb and 2.3 Mb for files with trimmed R1 reads, untrimmed R1 reads, trimmed R2 reads and untrimmed R2 reads, respectively.
But I find that some primer sequences are still present in the untrimmed reads files. Why?
```
$ less primers_R1_5.fa | head
>R1
AGAGTGGGTGTTGGACAGTGT
$ less 1_S1_L001_R1_001.untrim.fastq.gz | grep AGAGTGGGTGTTGGACAGTGT
GATTAGAGCCTAGTCCAGGAGAATGAATTGACACTAATCTCTGCTTGTGTTCTCTGTCTCCAGCAATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAAATCACCGA
AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTACTGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAGCTTACAAAGTATCACCGA
AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTACTGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAGCTTACAAAGGATCACCGA
GATTAGAGCCTAGTCCGGGAGAATGAATTGACACTAATCTCTGCTTGTGTTCTCTGTCTCCAGCAATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAAATCACCG
GATTAGAGCCTAGTCCAGGAGAATGAATTGACACTAATCTCTGCTTGTGTTCTCTGTCTCCAGCAATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAAATCACCGA
AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTACTGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAATCACCGACTGCCCATAGG
AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTACTGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAGCTTACAAATATCACCGA
AGAGTGGGTGTTGGACAGTGTGTGGCTGTGTGGGTCAGTGTATGGCTGTGTGGGTTGGTGAGTGGTTGTGTGGGTTGCTGTGTGTGCGTGTGGGGTGCCTGTTTTGGGGAAAAATAGCTTTTCACATCTGCAATCACCGACTGCCCATAGG
GATTAGAGCCTAGTCCAGGAGAATGAATTGACACTAATCTCTGCTTGTGTTCTCCGTCTCCAGCAATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGACATCACCGA
```
When I use cutPrimers on amplicon FASTQ files, about eighty percent of the reads are trimmed, but twenty percent end up in the untrimmed output even though they contain the primer sequence. How can I solve this problem?
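One way to investigate is to check where the primer actually starts in the untrimmed reads. A minimal diagnostic sketch (two read sequences taken from the grep output above; the dict keys are placeholders): cutPrimers only searches for a primer near the read end, within the primer-location buffer, so a primer found deep inside a read is expected to stay untrimmed, and a read whose primer is at the 5' end most likely landed in the untrimmed file because its mate's primer was not found, since read pairs are kept together.

```python
# Sketch: report the primer's start position in reads from the untrimmed file.
primer = "AGAGTGGGTGTTGGACAGTGT"

reads = {
    # primer at the very start of the read
    "read_a": "AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTACTGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAGCTTACAAAGTATCACCGA",
    # primer ~100 bases into the read, far beyond any 5'-end search window
    "read_b": "GATTAGAGCCTAGTCCAGGAGAATGAATTGACACTAATCTCTGCTTGTGTTCTCTGTCTCCAGCAATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAAATCACCGA",
}

offsets = {name: seq.find(primer) for name, seq in reads.items()}
for name, off in offsets.items():
    print(name, "primer starts at position", off)
```

If most of your untrimmed reads look like read_b, they are likely read-through or off-target products rather than reads cutPrimers failed on.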
Any help is appreciated.
Best Regards,
Amy
Hi,
Here is what I do. I have 2 paired-end read files (740,134 reads each) with staggered degenerate primers:
Forward primers | Reverse primers |
---|---|
CCTACGGGNGGCWGCAG | GACTACHVGGGTATCTAATCC |
TCCTACGGGNGGCWGCAG | TGACTACHVGGGTATCTAATCC |
ACCCTACGGGNGGCWGCAG | ACGACTACHVGGGTATCTAATCC |
CTACCTACGGGNGGCWGCAG | CTAGACTACHVGGGTATCTAATCC |
Staggered meaning that 1, 2 or 3 bases are added to the main primer (e.g. CCTACGGGNGGCWGCAG) to increase diversity in Illumina sequencing. I would need to cut these out from the fastq. So I prepared 2 fasta files with 4 primer sequences each and ran cutprimers.py:
```
python3 cutPrimers.py \
-r1 $FWD_read \
-r2 /$REV_read \
-pr15 forward_primers.fa \
-pr25 reverse_primers.fa \
-tr1 trim.pair1.fastq.gz \
-tr2 trim.pair2.fastq.gz \
-utr1 untrimmed1.fastq.gz \
-utr2 untrimmed2.fastq.gz \
--error-number 10 \
-stat trim.statistics.log \
--primer3-absent \
--primer-location-buffer 30 \
--threads 1
```
A note on the parameters:
--error-number 10
to account for the 2 ambiguous nucleotides (N and W) plus the few bases added at the beginning of the primer. I tried lower values, but a few good primers remained in the untrimmed output; 10 does the job fine
--primer-location-buffer 30
to limit the region searched (in the hope of saving time)
--threads 1
because the parameters I chose are really RAM-demanding when I run in parallel.
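For reference, the two primer FASTA files can be generated programmatically from the base primers and the stagger prefixes shown in the table. A minimal sketch (the record names and the prefix tuple are assumptions read off the table):

```python
# Sketch: write one FASTA record per stagger variant of a base primer.
def write_staggered(path, base, prefixes=("", "T", "AC", "CTA")):
    """Prepend each stagger prefix to the base primer and write FASTA."""
    with open(path, "w") as fh:
        for i, p in enumerate(prefixes, start=1):
            fh.write(">primer_%d\n%s%s\n" % (i, p, base))

write_staggered("forward_primers.fa", "CCTACGGGNGGCWGCAG")
write_staggered("reverse_primers.fa", "GACTACHVGGGTATCTAATCC")
```

This keeps the two files in the same primer order, which matters because cutPrimers pairs the nth record of -pr15 with the nth record of -pr25.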
Now I can't quite understand the trim.statistics.log file. It shows stats for 4 primers only (when I was expecting 8), and the read counts don't match my samples:
Primer | Total_number_of_reads | Number_without_any_errors | Number_with_sequencing_errors | Number_with_synthesis_errors |
---|---|---|---|---|
3F | 599 | 0 | 1198 | 0 |
3R | 599 | 0 | 0 | 0 |
4F | 719215 | 0 | 1438430 | 0 |
4R | 719215 | 0 | 0 | 0 |
It seems to do the job well, although it takes quite a lot of time and RAM (on a server it uses 15 GB for a little more than 2 hours for one pair of read files). Do you think my parameters are suited to what I want to do? Thanks!
Hello,
This program is fantastic in my daily use. However, I recently found that FASTQ files trimmed by cutPrimers.py do not run correctly through the samtools mpileup program. My analysis pipeline is as follows: trim reads with cutPrimers.py, align with bwa mem, convert SAM to BAM with samtools view, sort the BAM with samtools sort, and run samtools mpileup -a to output statistics for all positions. But the final output file is missing most positions that should be well covered. At the beginning I thought the problem was in samtools mpileup, but when I skip the cutPrimers step, all positions are reported correctly in the mpileup output. So I am writing to ask for your help with this serious problem.
best regards,
Yuanwu
I have one forward PCR primer (F1) and two different reverse primers (let's say R1 and R2). I use MiSeq paired-end sequencing. I'm guessing that primersFileR1_5 and primersFileR2_5 have to contain the same number of sequences, so in the primersFileR1_5 file I repeated F1 twice. The weird thing is that the order of primer sequences in primersFileR2_5 influences the outcome. If I list R1 first and then R2 in the FASTA file, it works. But if I list R2 first and R1 second, it doesn't trim, even though in this file there is no R2 match and there should be an R1 match.
I tried the 1.2 release and the current version pulled from GitHub, and they behave the same way. I think this is a bug; could you take a look at it?
I'm attaching a test dataset (with 4 read pairs) in a tar.gz file. There are four files in the directory: test_R[12].fq are the reads, and pri_R[12]_5.fa are the primer files. I used the command below. You will see that with the current pri_R2_5.fa, it doesn't remove any primers, but if you switch the order of R1 and R2 in this file, it works.
```
python3 cutPrimers.py -r1 test_R1.fq -r2 test_R2.fq -pr15 pri_R1_5.fa -pr25 pri_R2_5.fa -tr1 out.tr1 -tr2 out.tr2 -utr1 out.utr1 -utr2 out.utr2
```
Thank you,
Naoki
```python
import hashlib

def makeHashes(seq, k):
    # k is the length of parts
    subSeqs = []  # note: assigned but never used
    h = []
    lens = set()
    for i in range(len(seq) - k + 1):
        # hash each k-length window of seq
        h.append(hashlib.md5(seq[i:i+k].encode('utf-8')).hexdigest())
        lens.add(k)
    return (h, lens)
```
I am new to Python. Why add k to the set lens in each iteration, since it is a fixed value?
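For what it's worth, adding the same value to a set repeatedly is a no-op, so lens can only ever end up as {k} (or empty when the loop never runs). A self-contained sketch of the same k-mer hashing with that observation folded in (the function name is mine, not from the source):

```python
import hashlib

def make_hashes(seq, k):
    """MD5-hash every k-length window of seq; lens can only ever be {k}."""
    h = [hashlib.md5(seq[i:i+k].encode("utf-8")).hexdigest()
         for i in range(len(seq) - k + 1)]
    # equivalent to calling lens.add(k) once per iteration of the loop
    return h, ({k} if h else set())

h, lens = make_hashes("ACGTACGT", 4)
print(len(h), lens)  # 5 window hashes; lens == {4}
```

So the per-iteration lens.add(k) is redundant; a single add after the loop (or the conditional above) gives identical results.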