
cutprimers's Issues

Your comments, suggestions and problems that you have encountered

Dear users of cutPrimers! We will be glad to hear any comments (positive or negative) about using cutPrimers. If you encounter any problems, please let us know; we will try to answer as soon as possible, usually within 24 hours. Thank you for using cutPrimers to remove primer sequences from NGS reads!

doubt in input primer fasta files

Hi,

I have adapter-trimmed FASTQ files (R1 and R2 reads). My forward primer sequence is "TGTGCCAGCMGCCGCGGTAA" and my reverse primer sequence is "TGGACTACHVGGGTWTCTAAT". Given this information, how do I use cutPrimers?

I assume the fasta files below need to be prepared for the 5' and 3' ends of the R1 and R2 reads. How do I do this?
--primersFileR1_5, -pr15 - fasta-file with sequences of primers on the 5'-end of R1 reads
--primersFileR2_5, -pr25 - fasta-file with sequences of primers on the 5'-end of R2 reads. Do not use this parameter if you have single-end reads
--primersFileR1_3, -pr13 - fasta-file with sequences of primers on the 3'-end of R1 reads. It is not required. But if it is determined, -pr23 is necessary
--primersFileR2_3, -pr23 - fasta-file with sequences of primers on the 3'-end of R2 reads
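For a read-through amplicon setup, the 3'-end primer of one read is typically the reverse complement of the opposite read's 5' primer. Below is a minimal sketch (file names and the record name are illustrative, not required by cutPrimers) that writes all four files from the two primers above, handling the IUPAC degenerate bases:

```python
# Sketch: build the four primer FASTA files for cutPrimers from one
# forward/reverse primer pair. Assumption: R1 starts with the forward
# primer and may read through into the reverse complement of the
# reverse primer, and vice versa for R2.

# IUPAC-aware complement table (covers the degenerate codes used here)
COMP = str.maketrans("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN")

def revcomp(seq: str) -> str:
    """Reverse complement, preserving IUPAC degenerate codes."""
    return seq.translate(COMP)[::-1]

fwd = "TGTGCCAGCMGCCGCGGTAA"    # forward primer (M = A/C)
rev = "TGGACTACHVGGGTWTCTAAT"   # reverse primer (H, V, W degenerate)

files = {
    "primers_R1_5.fa": fwd,           # 5' end of R1: forward primer
    "primers_R2_5.fa": rev,           # 5' end of R2: reverse primer
    "primers_R1_3.fa": revcomp(rev),  # 3' end of R1: revcomp of reverse
    "primers_R2_3.fa": revcomp(fwd),  # 3' end of R2: revcomp of forward
}
for name, seq in files.items():
    with open(name, "w") as fh:
        fh.write(">primer1\n%s\n" % seq)
```

Each file here contains a single record; with multiple primer pairs, the records would presumably need to correspond across the four files in the same order.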

Any help is appreciated, and I am curious to compare your tool with our sequencing provider's in-house primer-clipping results!

Best Regards,
Bala

how could I get the primer fasta files?

Hi,
I have a primer bed file:
Chr    Amplicon_Start  Insert_Start  Insert_Stop  Amplicon_Stop
chr17  41275996        41276024      41276122     41276149
chr17  41267705        41267733      41267856     41267884
How could I get the primer fasta files?
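Assuming the columns mean that the forward primer spans Amplicon_Start to Insert_Start and the reverse primer spans Insert_Stop to Amplicon_Stop, the primer intervals can be computed directly from the table; the sequences themselves would then be fetched from the reference genome (e.g. with samtools faidx). A sketch:

```python
# Sketch: derive primer intervals from an amplicon BED-like table.
# Assumption: forward primer = [Amplicon_Start, Insert_Start),
#             reverse primer = [Insert_Stop, Amplicon_Stop).
# The sequences would then be extracted from the reference genome,
# e.g. with `samtools faidx`.

rows = [
    ("chr17", 41275996, 41276024, 41276122, 41276149),
    ("chr17", 41267705, 41267733, 41267856, 41267884),
]

primer_intervals = []
for chrom, amp_start, ins_start, ins_stop, amp_stop in rows:
    primer_intervals.append((chrom, amp_start, ins_start, "forward"))
    primer_intervals.append((chrom, ins_stop, amp_stop, "reverse"))

for chrom, start, stop, strand in primer_intervals:
    print("%s\t%d\t%d\t%s\t%d bp" % (chrom, start, stop, strand, stop - start))
```

Note that the reverse-primer sequence extracted from the reference is on the forward strand; it would typically need to be reverse-complemented before being written to the R2 5'-primer file.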

When I run the shell command from the README file,
'''
python3 cutPrimers.py -r1 example/1_S1_L001_R1_001.fastq.gz -r2 example/1_S1_L001_R2_001.fastq.gz -pr15 example/primers_R1_5.fa -pr25 example/primers_R2_5.fa -pr13 example/primers_R1_3.fa -pr23 example/primers_R2_3.fa -tr1 example/1_r1_trimmed.fastq.gz -tr2 example/1_r2_trimmed.fastq.gz -utr1 example/1_r1_untrimmed.fastq.gz -utr2 example/1_r2_untrimmed.fastq.gz -t 2
'''
As a result I get four files with the following sizes: 4.2 Mb, 2.3 Mb, 3.6 Mb and 2.3 Mb for files with trimmed R1 reads, untrimmed R1 reads, trimmed R2 reads and untrimmed R2 reads, respectively.
But I find that some primer sequences are still present in the untrimmed reads files. Why?
less primers_R1_5.fa|head

>R1
AGAGTGGGTGTTGGACAGTGT

less 1_S1_L001_R1_001.untrim.fastq.gz|grep AGAGTGGGTGTTGGACAGTGT
GATTAGAGCCTAGTCCAGGAGAATGAATTGACACTAATCTCTGCTTGTGTTCTCTGTCTCCAGCAATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAAATCACCGA
AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTACTGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAGCTTACAAAGTATCACCGA
AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTACTGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAGCTTACAAAGGATCACCGA
GATTAGAGCCTAGTCCGGGAGAATGAATTGACACTAATCTCTGCTTGTGTTCTCTGTCTCCAGCAATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAAATCACCG
GATTAGAGCCTAGTCCAGGAGAATGAATTGACACTAATCTCTGCTTGTGTTCTCTGTCTCCAGCAATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAAATCACCGA
AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTACTGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAATCACCGACTGCCCATAGG
AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGAGCTGGACACCTACCTGATACCCCAGATCCCCCACAGCCACTACTGACTGCAGCCAGCCACAGGTACAGAGCCACAGGACCCCAAGAATGAGCTTACAAATATCACCGA
AGAGTGGGTGTTGGACAGTGTGTGGCTGTGTGGGTCAGTGTATGGCTGTGTGGGTTGGTGAGTGGTTGTGTGGGTTGCTGTGTGTGCGTGTGGGGTGCCTGTTTTGGGGAAAAATAGCTTTTCACATCTGCAATCACCGACTGCCCATAGG
GATTAGAGCCTAGTCCAGGAGAATGAATTGACACTAATCTCTGCTTGTGTTCTCCGTCTCCAGCAATTGGGCAGATGTGTGAGGCACCTGTGGTGACCCGAGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTGCCAGGACATCACCGA

I used cutPrimers on amplicon FASTQ files: about eighty percent of the reads were trimmed, but twenty percent remain untrimmed and still contain the primer sequence. How can I solve this problem?
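One thing worth checking here: grep matches the primer anywhere in the read, while cutPrimers looks for primers near the read ends. A small diagnostic sketch (not part of cutPrimers; the reads are a toy example) that splits the hits into reads that start with the primer versus reads that merely contain it internally:

```python
# Diagnostic sketch: classify primer hits in untrimmed reads as
# "at the 5' end" vs "internal". The reads below are a toy example;
# in practice you would iterate over the untrimmed FASTQ file.

primer = "AGAGTGGGTGTTGGACAGTGT"

reads = [
    "AGAGTGGGTGTTGGACAGTGTAGCACTCTACCAGTG",           # primer at 5' end
    "GATTAGAGCCTAGTCCAGGAGAGAGTGGGTGTTGGACAGTGTAGC",  # internal hit only
    "GGGGACCCCAAGAATGAGCTTACAAAG",                    # no hit
]

at_start = sum(1 for r in reads if r.startswith(primer))
internal = sum(1 for r in reads if primer in r and not r.startswith(primer))
print("at 5' end:", at_start, "internal:", internal)
```

If most remaining hits are internal, they may be chimeric or mis-primed reads rather than reads the trimmer should have handled.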

Any help is appreciated.
Best Regards,
Amy

parameters and efficiency

Hi,
Here is what I do. I have two paired-end read files (740,134 reads each) with staggered degenerate primers:

Forward primers         Reverse primers
CCTACGGGNGGCWGCAG       GACTACHVGGGTATCTAATCC
TCCTACGGGNGGCWGCAG      TGACTACHVGGGTATCTAATCC
ACCCTACGGGNGGCWGCAG     ACGACTACHVGGGTATCTAATCC
CTACCTACGGGNGGCWGCAG    CTAGACTACHVGGGTATCTAATCC

Staggered means that 1, 2 or 3 bases are added to the main primer (e.g. CCTACGGGNGGCWGCAG) to increase diversity in Illumina sequencing. I need to cut these out of the fastq files, so I prepared two fasta files with 4 primer sequences each and ran cutPrimers.py:

python3 cutPrimers.py \
    -r1 $FWD_read \
    -r2 $REV_read \
    -pr15 forward_primers.fa \
    -pr25 reverse_primers.fa \
    -tr1 trim.pair1.fastq.gz \
    -tr2 trim.pair2.fastq.gz \
    -utr1 untrimmed1.fastq.gz \
    -utr2 untrimmed2.fastq.gz \
    --error-number 10 \
    -stat trim.statistics.log \
    --primer3-absent \
    --primer-location-buffer 30 \
    --threads 1

A note on the parameters:
--error-number 10 to account for the 2 ambiguous nucleotides (N and W) plus the few bases added at the beginning of the primer. I tried lower values, but a few good primers remained in the untrimmed output; 10 does the job fine
--primer-location-buffer 30 to limit the region searched (in hope to gain time)
--threads 1 because the parameters I chose are really RAM demanding when I run in parallel.
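One possible cause for the statistics reporting fewer primers than expected is non-unique FASTA record names. A sketch that generates both primer files from the base primers and the stagger prefixes in the table above, giving every variant a unique record name (the file names match the command above; the naming scheme itself is an assumption):

```python
# Sketch: write the staggered primer variants with unique record
# names, so each variant can be reported separately in the statistics.
# Prefixes and base primers are taken from the table above.

prefixes = ["", "T", "AC", "CTA"]          # stagger bases (0-3 nt)
fwd_base = "CCTACGGGNGGCWGCAG"
rev_base = "GACTACHVGGGTATCTAATCC"

for fname, base, tag in [("forward_primers.fa", fwd_base, "F"),
                         ("reverse_primers.fa", rev_base, "R")]:
    with open(fname, "w") as fh:
        for p in prefixes:
            # e.g. >stagger0F, >stagger1F, >stagger2F, >stagger3F
            fh.write(">stagger%d%s\n%s%s\n" % (len(p), tag, p, base))
```

If cutPrimers pairs primers by their position in the two files, the variants should be listed in the same order in both files.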

Now I can't understand the trim.statistics.log file well. It shows stats for only 4 primers (when I was expecting 8), and the read counts don't match my samples:

Primer  Total_number_of_reads  Number_without_any_errors  Number_with_sequencing_errors  Number_with_synthesis_errors
3F      599                    0                          1198                           0
3R      599                    0                          0                              0
4F      719215                 0                          1438430                        0
4R      719215                 0                          0                              0

It seems to do the job well, although it takes quite a lot of time and RAM (I am on a server, and it uses 15 GB for a little more than 2 hours for one paired-end sample). Do you think my parameters are suited to what I want to do? Thanks!

FASTQ files trimmed by cutPrimers do not run correctly through mpileup

Hello,
This program is fantastic in my daily use. However, I recently found that FASTQ files trimmed by cutPrimers.py do not run correctly through the samtools mpileup program. My analysis pipeline is as follows: trim reads with cutPrimers.py, align with bwa mem, convert SAM to BAM with samtools view, sort the BAM with samtools sort, then run samtools mpileup -a to output statistics for all positions. But the final output file is missing most positions that should be well covered. At first I thought the problem was with samtools mpileup, but when I skip the cutPrimers step, all positions are reported correctly in the mpileup files. So I am writing to ask for your help with this problem.
best regards,
Yuanwu
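Not part of the original report, but one diagnostic that may narrow this down: if trimming produces very short or empty reads, aligners and mpileup's filters can silently drop them. A sketch that summarizes read lengths in a trimmed FASTQ file:

```python
import gzip
from collections import Counter

def length_histogram(path):
    """Count read lengths in a FASTQ file (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # every 4th line starting at index 1 is a sequence
                counts[len(line.rstrip("\n"))] += 1
    return counts
```

A spike at length 0 (or very short lengths) in the trimmed output would be one explanation worth ruling out before blaming mpileup.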

Bug with Primer File processing?

I have one forward PCR primer (F1) and two different reverse primers (say R1 and R2). I use MiSeq paired-end reads. I'm guessing that primersFileR1_5 and primersFileR2_5 have to contain the same number of sequences, so in the primersFileR1_5 file I repeated F1 twice. The weird thing is that the order of the primer sequences in the primersFileR2_5 file influences the outcome. If I list R1 first and then R2 in the fasta file, it works. But if I list R2 first and R1 second, nothing is trimmed. In this dataset there are no R2 matches, but there should be R1 matches.

I tried the 1.2 release and the current version pulled from GitHub, and they behave the same way. I think this is a bug; could you take a look at it?

test-cutPrimers.tar.gz

I'm attaching a test dataset (4 read pairs) in the tar.gz file above. There are four files in the directory: test_R[12].fq are the reads, and pri_R[12]_5.fa are the primer files. I used the command below. You will see that with the current pri_R2_5.fa it doesn't remove any primers, but if you switch the order of R1 and R2 in this file, it works.

python3 cutPrimers.py -r1 test_R1.fq -r2 test_R2.fq -pr15 pri_R1_5.fa -pr25 pri_R2_5.fa -tr1 out.tr1 -tr2 out.tr2 -utr1 out.utr1 -utr2 out.utr2

Thank you,
Naoki

Little doubt

def makeHashes(seq,k):
    # k is the length of parts
    subSeqs=[]
    h=[]
    lens=set()
    for i in range(len(seq)-k+1):
        h.append(hashlib.md5(seq[i:i+k].encode('utf-8')).hexdigest())
        lens.add(k)
    return(h,lens)

I am new to Python. Why is k added to the set lens on every iteration, given that it is a fixed value?
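You are right: k never changes inside the loop, so lens always ends up as the single-element set {k}, and the add could be hoisted out of the loop (the unused subSeqs list could be dropped too). An equivalent sketch:

```python
import hashlib

def make_hashes(seq, k):
    """MD5-hash every k-mer of seq.

    Equivalent to the original makeHashes: since k is fixed, the
    lens set is always just {k}, so it can be built once.
    """
    h = [hashlib.md5(seq[i:i + k].encode("utf-8")).hexdigest()
         for i in range(len(seq) - k + 1)]
    return h, {k}
```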
