Comments (6)

torognes avatar torognes commented on June 30, 2024

Currently, the usearch_global command does not support FASTQ files as input. It is trivial to implement, and we can do that for the next release.

Thanks for the question / suggestion.

If you are going to use the usearch_global command to map sequences to OTUs and generate an OTU table, you should remember to include --maxhits 1 or --strand plus as an option to avoid multiple hits (one on each strand) in rare cases.
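
As a rough sketch of such a mapping run, the tip above could be wrapped in a small shell function. The function name `map_to_otus`, its three file arguments and the 0.97 identity threshold are assumptions made for illustration, not values taken from this thread. Both options are given together here, although either one alone is what the tip asks for.

```shell
# map_to_otus READS OTUS TABLE   (hypothetical helper, for illustration only)
#   READS: FASTA file of processed reads, headers carrying ";sample=NAME"
#   OTUS:  FASTA file of OTU representative sequences
#   TABLE: path of the OTU table to write
map_to_otus() {
    # --id 0.97 is an assumed threshold; adjust it to your own clustering level.
    # --strand plus and --maxhits 1 each prevent a read from being
    # reported twice (once per strand).
    vsearch \
        --usearch_global "$1" \
        --db "$2" \
        --id 0.97 \
        --strand plus \
        --maxhits 1 \
        --otutabout "$3"
}
```

It would be called as `map_to_otus reads.fasta otus.fasta otutab.tsv`, where the three file names are placeholders.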

from vsearch.

magicprotoss avatar magicprotoss commented on June 30, 2024

Thanks for the quick reply and the tip.
My main question is whether the command supports mapping reads to an OTU table before primer removal and quality filtering. Supporting FASTQ input is less important, because converting FASTQ to FASTA is an easy step.
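
For reference, a minimal sketch of that conversion step, assuming unwrapped FASTQ where every record spans exactly four lines (header, sequence, `+`, quality). The function name `fastq_to_fasta` is made up for this example.

```shell
# Convert 4-line FASTQ records read from stdin to FASTA on stdout.
# Assumption: sequences and qualities are not wrapped over several lines.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: leading "@" becomes ">"
         NR % 4 == 2 { print }'                     # sequence line, kept as is
}
```

Usage: `fastq_to_fasta < reads.fastq > reads.fasta` (placeholder file names). Quality values are simply discarded, so this does no filtering at all.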

torognes avatar torognes commented on June 30, 2024

Ok, now I think I understand your question.

I would not recommend running usearch_global on raw, unprocessed reads. At least not with adapters/primers. It may work if you set the --id threshold low, like 0.8, but you may risk mapping to the wrong OTUs.

Let's say there is a primer or adapter of length 20 bp at one end of the read, with a total read length of 150 bp. This part of the read will probably not align to the OTU sequence, so the percentage of aligned nucleotides will fall to 100% * (150 - 20) / 150, or about 87%. If you run usearch_global with an id threshold of 0.8, corresponding to 80%, you risk matching an OTU with low similarity, because the heuristic algorithm in vsearch reports a match as soon as it finds an OTU with at least 80% identity.
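
The arithmetic above can be reproduced for other read and primer lengths with a small helper. This is only a sketch: the function name `max_aligned_percent` is invented here, and it models nothing beyond the simple ratio used in the example.

```shell
# Percentage of a read that can still align when a primer/adapter of
# primer_len nucleotides (out of read_len) fails to align.
max_aligned_percent() {
    read_len=$1
    primer_len=$2
    awk -v r="$read_len" -v p="$primer_len" \
        'BEGIN { printf "%.1f\n", 100 * (r - p) / r }'
}

max_aligned_percent 150 20   # prints 86.7
```

With longer primers or shorter reads the value drops quickly, which is why the threshold would have to be lowered so far to accept untrimmed reads.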

There is a similar problem with sequencing errors, but they are usually much rarer and do not pose the same problem.

In any case, I think I will allow the usearch_global and search_exact commands to use FASTQ files as input, as FASTQ sequences are often trimmed to remove adapters or primers.

Maybe others with more experience will have other opinions.

magicprotoss avatar magicprotoss commented on June 30, 2024

Thanks for the info :)

frederic-mahe avatar frederic-mahe commented on June 30, 2024

Assuming you have two reads in fastq format (already merged and trimmed), from two samples S1 and S2:

@q1;sample=S1
A
+
I
@q2;sample=S2
A
+
I

and a fasta file containing cluster representatives:

>t
A

Starting with vsearch v2.27, you can map the cluster seed t to your two samples S1 and S2:

vsearch \
    --usearch_global <(printf "@q1;sample=S1\nA\n+\nI\n@q2;sample=S2\nA\n+\nI\n") \
    --db <(printf ">t\nA\n") \
    --minseqlength 1 \
    --id 1.00 \
    --quiet \
    --otutabout -
#OTU ID	S1	S2
t	1	1

frederic-mahe avatar frederic-mahe commented on June 30, 2024

Tests added to our test suite (see frederic-mahe/vsearch-tests@e962aec).
