Giter Club home page Giter Club logo

Comments (10)

emford avatar emford commented on August 10, 2024 2

Ill check into this and see whats going on with the tests.

from q2-dada2.

emford avatar emford commented on August 10, 2024 2

So I started on this but my MacAir wont run the tests due to not enough RAM ...:coffin:

from q2-dada2.

fconstancias avatar fconstancias commented on August 10, 2024 1

Adding a few more options to the q2-dada2 plugin - R script which is running when calling the q2-dada2 plugin - can be easy and lead to a fine-tuned /state-of-the art dada2 pipeline whitin qiime2.

In addition to the minOverlap, I think that adding truncQ and minlen parameters can help on specific datasets.
As @benjjneb mentionned, adding a parameter to specify the pooling strategy : "pseudo", TRUE instead of default FALSE will allow detection of singletons.
Having a closer look at the R script running behind q2-dada2 plugin, I also find it more logical to add the option randomize = TRUE as default in the learnErrors function. My understanding is that doing so, samples are randomly added until enough bases/reads are loaded in order to learn the error rates of the dataset instead of adding samples according sample names alphabetic order which can correspond to specific samples of the entire dataset.
Is this correct @benjjneb ?

Last one, adding the possibility to export plots from the plotError function could also help people to confirm that everything went well and that error model fits the data. Also, exporting read/ASV length distribution after read merging (as shown here) can also provide a positive feeling that everything went well.

I can help to add those options if you agree they are helpfull.

from q2-dada2.

benjjneb avatar benjjneb commented on August 10, 2024

Current Behavior
minOverlap is hardcoded at 20

An important note, as part of the update to 1.10 or the R package, the minOverlap default value change from 20 to 12, so that is now the "hardcoded" value.

Maybe it is still worth exposing this, but the maximum benefit here is very low, as one should never go below minOverlap=4 as this opens up to a lot of FP mergers.

from q2-dada2.

nbokulich avatar nbokulich commented on August 10, 2024

An important note, as part of the update to 1.10 or the R package, the minOverlap default value change from 20 to 12, so that is now the "hardcoded" value.

thanks @benjjneb ! I was not aware of that change.

Maybe it is still worth exposing this, but the maximum benefit here is very low

I agree, probably not a priority but we have had a number of users ask about this so it may be worthwhile to give control.

one should never go below minOverlap=4 as this opens up to a lot of FP mergers.

we can set 4 as the minimum overlap; an error will be raised if users try to go lower. (an explanation can be given in the description)

from q2-dada2.

benjjneb avatar benjjneb commented on August 10, 2024

If it's sufficiently requested then OK.

Would it be possible to create a milestone linked to the next Q2 release?
There are .couple other things I want to add to the plugin for the next release now that 1.10 is in (especially pooling/singletons) and would help a lot to have them organized by a milestone with reference to a (tentative) date for the next Q2 release.

from q2-dada2.

nbokulich avatar nbokulich commented on August 10, 2024

I have added to the 2019.7 release project page — we have been using projects to organize release goals (and release dates and details are available on that page). We have not been using the milestones feature, but you are welcome to use that feature if it helps you organize issues for q2-dada2. Thanks!

from q2-dada2.

colinbrislawn avatar colinbrislawn commented on August 10, 2024

I'm gamely interested in working on this issue.

EDIT: The maxMismatch should be included in this PR as well. These settings are related and powerful!

from q2-dada2.

Oddant1 avatar Oddant1 commented on August 10, 2024

Exposing the min_overlap parameter and defaulting it to 12 (which does appear to be the default value in dada2 after a cursory look) causes the tests to fail, and I was unable to find a value of min_overlap that did pass the tests. The table we get as a result of the command with min_overlap always has more nonzero elements than expected.

from q2-dada2.

benjjneb avatar benjjneb commented on August 10, 2024

'truncQ' is almost always a superfluous parameter in my experience.

randomize=TRUE has the downside of giving non-identical results when the pipeline is re-run, so for total reproducibility randomize=FALSE is the current default in the R package.

I hope to have a pull request to add pseudo-pooling up today, sadly probably too late for the imminent release though.

The read length stats and plotErrors output definitely can be useful diagnostics in some datasets.

from q2-dada2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.