Improvement Description: exposing minOverlap would allow adjustmen

expose minOverlap parameter (q2-dada2, CLOSED)

nbokulich commented on August 10, 2024

expose minOverlap parameter

from q2-dada2.

Comments (10)

emford commented on August 10, 2024

I'll check into this and see what's going on with the tests.

emford commented on August 10, 2024

So I started on this, but my MacAir won't run the tests because it doesn't have enough RAM... :coffin:

fconstancias commented on August 10, 2024

Adding a few more options to the R script that runs when the q2-dada2 plugin is called could be easy, and would allow a fine-tuned, state-of-the-art dada2 pipeline within qiime2.

In addition to minOverlap, I think that adding truncQ and minLen parameters can help on specific datasets.
As @benjjneb mentioned, adding a parameter to specify the pooling strategy ("pseudo" or TRUE instead of the default FALSE) will allow detection of singletons.
Having a closer look at the R script running behind the q2-dada2 plugin, I also find it more logical to set randomize = TRUE as the default in the learnErrors function. My understanding is that, with this option, samples are added in random order until enough bases/reads are loaded to learn the error rates of the dataset, instead of being added in alphabetical order of sample names, which can pick out a specific, possibly unrepresentative subset of the dataset.
Is this correct, @benjjneb?
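
To make the question concrete, here is a toy sketch in Python of the selection behaviour described above. It is only a model of my understanding, not the R implementation: the function name pick_samples, the per-sample base counts, and the seed argument are all made up for illustration.

```python
import random
from typing import Dict, List, Optional


def pick_samples(
    bases_per_sample: Dict[str, int],
    nbases: int,
    randomize: bool = False,
    seed: Optional[int] = None,
) -> List[str]:
    """Toy model: add samples until at least `nbases` bases are loaded.

    Hypothetical helper for illustration only; it does not call dada2.
    """
    # Without randomization, samples are taken in alphabetical order.
    names = sorted(bases_per_sample)
    if randomize:
        # A seeded generator keeps the random order repeatable between runs.
        random.Random(seed).shuffle(names)

    chosen: List[str] = []
    total = 0
    for name in names:
        chosen.append(name)
        total += bases_per_sample[name]
        if total >= nbases:
            break
    return chosen


# Made-up base counts: the two blanks sort first alphabetically.
toy = {"blank_01": 40, "blank_02": 50, "gut_01": 60, "soil_01": 70}

print(pick_samples(toy, nbases=80))                          # ['blank_01', 'blank_02']
print(pick_samples(toy, nbases=80, randomize=True, seed=1))  # some shuffled subset
```

In this toy case the alphabetical order learns from the two blanks only, which is the kind of bias I have in mind.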

Last one: adding the possibility to export plots from the plotErrors function could also help people confirm that everything went well and that the error model fits the data. Exporting the read/ASV length distribution after read merging (as shown here) would also give some reassurance that the merging worked.

I can help add those options if you agree they are helpful.

benjjneb commented on August 10, 2024

Current Behavior
minOverlap is hardcoded at 20

An important note: as part of the update to 1.10 of the R package, the minOverlap default value changed from 20 to 12, so that is now the "hardcoded" value.

Maybe it is still worth exposing this, but the maximum benefit here is very low, as one should never go below minOverlap=4, since this opens the door to a lot of false-positive (FP) mergers.
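
As a rough back-of-the-envelope illustration of why short overlaps are risky, the sketch below computes the chance that two unrelated stretches of sequence agree at every position of an overlap. It assumes independent, uniformly random bases and no allowed mismatches, which is a simplification and not how the merging algorithm actually scores overlaps; the function name is made up.

```python
def chance_match_probability(overlap: int) -> float:
    """Probability that `overlap` random bases all match by chance.

    Assumes each position matches independently with probability 1/4.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    return 0.25 ** overlap


for k in (4, 12, 20):
    print(f"overlap={k:2d}  chance match = {chance_match_probability(k):.3g}")
```

Under these assumptions an overlap of 4 matches by chance about once in 256 tries, while an overlap of 12 is many orders of magnitude less likely to.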

nbokulich commented on August 10, 2024

An important note: as part of the update to 1.10 of the R package, the minOverlap default value changed from 20 to 12, so that is now the "hardcoded" value.

Thanks @benjjneb! I was not aware of that change.

Maybe it is still worth exposing this, but the maximum benefit here is very low

I agree, it is probably not a priority, but a number of users have asked about this, so it may be worthwhile to give them control.

one should never go below minOverlap=4 as this opens up to a lot of FP mergers.

We can set 4 as the minimum overlap; an error will be raised if users try to go lower (an explanation can be given in the description).
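
A minimal sketch of that check in plain Python, just to show the intended behaviour. The function and constant names are hypothetical and this is not the plugin's code; in the plugin itself the bound would more likely be declared as part of the parameter's type than checked by hand.

```python
MIN_ALLOWED_OVERLAP = 4   # floor suggested in this thread
DEFAULT_MIN_OVERLAP = 12  # dada2 default since 1.10, per this thread


def check_min_overlap(min_overlap: int = DEFAULT_MIN_OVERLAP) -> int:
    """Return `min_overlap` unchanged, or raise if it is below the floor."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(min_overlap, bool) or not isinstance(min_overlap, int):
        raise TypeError("min_overlap must be an integer")
    if min_overlap < MIN_ALLOWED_OVERLAP:
        raise ValueError(
            f"min_overlap must be at least {MIN_ALLOWED_OVERLAP}; "
            "shorter overlaps lead to many false-positive merges"
        )
    return min_overlap


print(check_min_overlap())    # 12
print(check_min_overlap(4))   # 4
try:
    check_min_overlap(3)
except ValueError as err:
    print(f"rejected: {err}")
```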

benjjneb commented on August 10, 2024

If it's sufficiently requested then OK.

Would it be possible to create a milestone linked to the next Q2 release?
There are a couple of other things I want to add to the plugin for the next release now that 1.10 is in (especially pooling/singletons), and it would help a lot to have them organized by a milestone with reference to a (tentative) date for the next Q2 release.

nbokulich commented on August 10, 2024

I have added this to the 2019.7 release project page. We have been using projects to organize release goals (release dates and details are available on that page). We have not been using the milestones feature, but you are welcome to use it if it helps you organize issues for q2-dada2. Thanks!

colinbrislawn commented on August 10, 2024

I'm gamely interested in working on this issue.

EDIT: The maxMismatch should be included in this PR as well. These settings are related and powerful!

Oddant1 commented on August 10, 2024

Exposing the min_overlap parameter and defaulting it to 12 (which does appear to be the default value in dada2 after a cursory look) causes the tests to fail, and I was unable to find a value of min_overlap that did pass the tests. The table we get as a result of the command with min_overlap always has more nonzero elements than expected.

benjjneb commented on August 10, 2024

'truncQ' is almost always a superfluous parameter in my experience.

randomize=TRUE has the downside of giving non-identical results when the pipeline is re-run, so for total reproducibility randomize=FALSE is the current default in the R package.

I hope to have a pull request to add pseudo-pooling up today, sadly probably too late for the imminent release though.

The read length stats and plotErrors output definitely can be useful diagnostics in some datasets.
