I'll check into this and see what's going on with the tests.
from q2-dada2.
So I started on this, but my MacBook Air won't run the tests because it doesn't have enough RAM... ⚰️
Adding a few more options to the R script that runs when the q2-dada2 plugin is called could be easy, and would give a fine-tuned, state-of-the-art DADA2 pipeline within QIIME 2.
In addition to `minOverlap`, I think that adding the `truncQ` and `minLen` parameters can help on specific datasets.
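To make these two filtering parameters concrete, here is a minimal Python sketch of how `truncQ` and `minLen` act on a single read. It is based on our reading of the `filterAndTrim` documentation: truncate at the first quality score less than or equal to `truncQ`, then enforce `minLen` after truncation. The function name `trunc_q_then_min_len` is hypothetical. The defaults (2 and 20) mirror what we understand the R defaults to be. This is not DADA2's implementation.

```python
from typing import List, Optional, Tuple


def trunc_q_then_min_len(seq: str, quals: List[int], trunc_q: int = 2,
                         min_len: int = 20) -> Optional[Tuple[str, List[int]]]:
    """Hypothetical per-read sketch of truncQ followed by minLen.

    The read is cut just before the first base whose quality is <= trunc_q.
    The truncated read is discarded (None) if it is shorter than min_len.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    cut = len(seq)  # no low-quality base found: keep the whole read
    for i, q in enumerate(quals):
        if q <= trunc_q:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]
    # minLen is checked only after truncation
    if len(seq) < min_len:
        return None
    return seq, quals
```

With a quality of 2 at position 5 of a 10-base read, the read is cut to 5 bases. Whether it survives then depends only on `min_len`.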
As @benjjneb mentioned, adding a parameter to specify the pooling strategy (`"pseudo"` or `TRUE` instead of the default `FALSE`) would allow detection of variants that appear as singletons in individual samples.
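A hypothetical sketch of how such a parameter could be translated for the R side. DADA2's `dada()` accepts `pool=FALSE`, `pool=TRUE` or `pool="pseudo"`. The plugin-side names (`independent`, `pseudo`, `pooled`) and the helper functions below are assumptions for illustration, not the q2-dada2 interface.

```python
# Hypothetical mapping from a plugin-side choice to the R literal for dada(pool=...).
POOL_TO_R = {
    "independent": "FALSE",   # DADA2 default: each sample is denoised on its own
    "pseudo": '"pseudo"',     # pseudo-pooling
    "pooled": "TRUE",         # all samples are pooled together
}


def pool_argument(method: str) -> str:
    """Return the R literal for a pooling choice, rejecting unknown names."""
    try:
        return POOL_TO_R[method]
    except KeyError:
        raise ValueError(f"unknown pooling method {method!r}; "
                         f"choose one of {sorted(POOL_TO_R)}") from None


def dada_call(derep_var: str, err_var: str, method: str = "independent") -> str:
    """Build, as a string, the R call a wrapper script might issue."""
    return f"dada({derep_var}, err={err_var}, pool={pool_argument(method)})"
```

Validating the choice on the Python side means a typo fails early with a readable message, rather than surfacing as an R error partway through a run.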
Looking more closely at the R script that runs behind the q2-dada2 plugin, I also think it would be more logical to make `randomize = TRUE` the default in the `learnErrors` function. My understanding is that, with this option, samples are added in random order until enough bases/reads are loaded to learn the error rates of the dataset. Otherwise, samples are added in alphabetical order of sample name, which can pick out a non-representative subset of the entire dataset.
Is this correct, @benjjneb?
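A minimal sketch of the selection behaviour described above. It assumes, as the comment does, that samples arrive in alphabetical order of sample name. It also assumes, per the `learnErrors` documentation, that samples are read until at least `nbases` bases (default 1e8) are loaded. The function and its input (a hypothetical mapping of sample name to base count) are illustrative only. DADA2 itself reads FASTQ files rather than a dictionary.

```python
import random
from typing import Dict, List, Optional


def pick_samples_for_error_model(bases_per_sample: Dict[str, int],
                                 nbases: int = 100_000_000,
                                 randomize: bool = False,
                                 seed: Optional[int] = None) -> List[str]:
    """Hypothetical sketch of which samples feed error-rate learning.

    Samples are taken in sorted-name order (randomize=False) or in a shuffled
    order (randomize=True) until at least nbases bases have been collected,
    or until every sample has been used.
    """
    order = sorted(bases_per_sample)
    if randomize:
        random.Random(seed).shuffle(order)  # seeded, so repeatable in this sketch
    chosen: List[str] = []
    total = 0
    for name in order:
        chosen.append(name)
        total += bases_per_sample[name]
        if total >= nbases:
            break
    return chosen
```

With `randomize=False`, the same alphabetically first samples are always chosen. With `randomize=True` and a fixed `seed`, this sketch returns the same shuffled choice on every run. In R, the analogous step would presumably be calling `set.seed()` before `learnErrors`.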
Finally, adding the option to export the plots from the `plotErrors` function could help users confirm that the run went well and that the error model fits the data. Exporting the read/ASV length distribution after read merging (as shown here) would also give some reassurance that everything went well.
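As a rough illustration of the length-distribution diagnostic, here is a hypothetical helper that tallies merged sequence lengths. Its input is assumed to be a mapping of ASV sequence to total read count, for example column sums of a feature table keyed by sequence. Nothing here is an existing q2-dada2 output.

```python
from collections import Counter
from typing import Dict


def length_distribution(asv_counts: Dict[str, int],
                        weighted: bool = True) -> Dict[int, int]:
    """Hypothetical tally of sequence lengths after merging.

    weighted=True sums read counts per length; weighted=False counts
    distinct ASVs per length.
    """
    dist: Counter = Counter()
    for seq, count in asv_counts.items():
        dist[len(seq)] += count if weighted else 1
    return dict(sorted(dist.items()))
```

For a single-amplicon study, a sharp peak at the expected length is the reassuring pattern. A broad or bimodal spread may point to merging problems.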
I can help add these options if you agree they are helpful.
Current Behavior
minOverlap is hardcoded at 20
An important note: as part of the update to version 1.10 of the R package, the `minOverlap` default changed from 20 to 12, so 12 is now the "hard-coded" value.
It may still be worth exposing this, but the maximum benefit is very low, since one should never go below `minOverlap=4`; lower values open the door to many false-positive (FP) mergers.
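Here is a back-of-the-envelope check of why very short overlaps are risky. It rests on the simplifying assumption that bases are independent and uniformly distributed. Two unrelated sequences then agree at a given position with probability 1/4, so an exact overlap of length k arises by chance with probability (1/4)^k. That is 1/256 ≈ 0.0039 at k = 4, but only about 6.0e-8 at k = 12. The sketch below also allows up to `max_mismatch` mismatches (a binomial tail). It ignores the many candidate offsets an aligner considers, so real false-positive rates would be higher.

```python
from math import comb


def chance_overlap_match(k: int, max_mismatch: int = 0) -> float:
    """P(two unrelated random k-mers differ at <= max_mismatch positions).

    Assumes independent, uniformly distributed bases, so each position
    matches by chance with probability 1/4.
    """
    p_same = 0.25
    return sum(comb(k, i) * (1 - p_same) ** i * p_same ** (k - i)
               for i in range(max_mismatch + 1))
```

Calling it for k = 4, 8, 12 and 20 shows the chance-match probability shrinking by a factor of 4 per extra base. Allowing mismatches raises it again, which is why `minOverlap` and `maxMismatch` have to be tuned together.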
> An important note: as part of the update to 1.10 of the R package, the `minOverlap` default value changed from 20 to 12, so that is now the "hardcoded" value.
Thanks, @benjjneb! I was not aware of that change.
> Maybe it is still worth exposing this, but the maximum benefit here is very low
I agree, it's probably not a priority, but a number of users have asked about this, so it may be worthwhile to give them control.
> one should never go below `minOverlap=4` as this opens up to a lot of FP mergers.
We can set 4 as the minimum allowed overlap and raise an error if users try to go lower, with an explanation in the parameter description.
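A minimal sketch of the proposed guard. The `validate_min_overlap` helper is hypothetical, and the floor of 4 comes from the discussion above. In a QIIME 2 plugin the same constraint could probably be declared in the parameter type (e.g. `Int % Range(4, None)`). That is an assumption about how it would be wired in, not existing q2-dada2 code.

```python
MIN_SAFE_OVERLAP = 4  # floor suggested in the discussion above


def validate_min_overlap(min_overlap: int) -> int:
    """Hypothetical check run before the value is passed to R's mergePairs."""
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(min_overlap, int) or isinstance(min_overlap, bool):
        raise TypeError("min_overlap must be an integer")
    if min_overlap < MIN_SAFE_OVERLAP:
        raise ValueError(
            f"min_overlap={min_overlap} is below {MIN_SAFE_OVERLAP}; such short "
            "overlaps produce many false-positive mergers")
    return min_overlap
```

Failing before the R script starts gives users the explanation immediately, instead of a silently inflated table.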
If it's sufficiently requested then OK.
Would it be possible to create a milestone linked to the next Q2 release?
There are a couple of other things I want to add to the plugin for the next release now that 1.10 is in (especially pooling/singletons). It would help a lot to have them organized under a milestone with a (tentative) date for the next Q2 release.
I have added this to the 2019.7 release project page. We have been using projects to organize release goals, and release dates and details are available on that page. We have not been using the milestones feature, but you are welcome to use it if it helps you organize issues for q2-dada2. Thanks!
I'm keen to work on this issue.
EDIT: The `maxMismatch` parameter should be included in this PR as well. These settings are related and powerful!
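To show how the two settings interact, here is a simplified, ungapped merging sketch. It is not DADA2's algorithm: `mergePairs` aligns the denoised forward reads to the reverse-complemented reverse reads. The helper names are hypothetical, and the defaults (12 and 0) mirror what we understand the `mergePairs` defaults to be. The sketch tries the longest overlap first and accepts the first one that is at least `min_overlap` long with at most `max_mismatch` mismatches.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 12,
               max_mismatch: int = 0) -> Optional[str]:
    """Hypothetical ungapped merge of a read pair; None if no overlap qualifies."""
    rc = reverse_complement(rev)
    longest = min(len(fwd), len(rc))
    # Longest candidate overlap first; stop before going under min_overlap.
    for overlap in range(longest, min_overlap - 1, -1):
        f_tail = fwd[len(fwd) - overlap:]
        r_head = rc[:overlap]
        mismatches = sum(a != b for a, b in zip(f_tail, r_head))
        if mismatches <= max_mismatch:
            # At mismatched positions this sketch simply keeps the forward base.
            return fwd + rc[overlap:]
    return None
```

For example, a 14-base forward read and a 12-base reverse read that truly overlap by 6 bases merge when `min_overlap=4`, but are rejected at 7 or more. A single error in the overlap rejects the pair unless `max_mismatch` is raised to 1.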
Exposing the `min_overlap` parameter and defaulting it to 12 (which does appear to be the default value in DADA2, after a cursory look) causes the tests to fail, and I was unable to find a value of `min_overlap` that passed them. The output table produced with `min_overlap` set always has more nonzero elements than expected.
`truncQ` is almost always a superfluous parameter in my experience.
`randomize=TRUE` has the downside of giving non-identical results when the pipeline is re-run, so for total reproducibility `randomize=FALSE` is the current default in the R package.
I hope to have a pull request adding pseudo-pooling up today, though sadly it is probably too late for the imminent release.
The read-length stats and `plotErrors` output can definitely be useful diagnostics for some datasets.
Related Issues
- Add more DADA2 parameters to the wrapper
- Allow denoise-single to process either reverse or forward reads
- Add long-read support
- Certain ID schemes can become mixed up with barcodes during sorting
- I've gone through tutorials and forums and still don't understand how to select the truncLen and trim parameters in dada2
- denoise-paired: add % merged to stats output
- denoise-paired: expose matchIDs parameter
- Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
- Update denoise-paired document from minimum overlap 20 > 12nt
- Setting "allowOneOf" and "maxMismatch" parameters in q2/dada2 plugin
- dada2 justConcatenate
- Update to latest DADA2 for R 4.0 support
- dada2 drops samples without reads unfiltered reads from table but not denoising stats.
- Restructuring dada2 R/python code
- error while using dada2 in qiime2
- ENH: include error model plots in denoise-* output
- Importing DADA2 R rep. sequences into QIIME2
- Can not run qiime dada2 denoise-ccs without primers or adapters
- Add warning when pooling method and chimera detection method are misaligned