Improvement Deion Add long-read support <

I think adding the following code directly before the <code class="notran

What's the current status? People love those long reads! <g-emoji class="g-emoji" alia

Are we ready to try to do this? It looks like we're still on <code class="notranslate"

Add long-read support,about qiime2/q2-dada2

Comments (27)

benjjneb commented on August 10, 2024 2

This won't be in the 2019Q2 update. Tentatively targeting this for end-of-year 2019.

from q2-dada2.

benjjneb commented on August 10, 2024 2

Waiting on the R package 1.12 to propagate through bioconda to Q2, as that made some important long read improvements.

from q2-dada2.

benjjneb commented on August 10, 2024 2

IMO, this is the next bit of functionality I'd really like to propagate through to the plugin.

First step would be to upgrade Q2 to the 1.14 version of the package or later. Then, it will be pretty straightforward to add a denoise-pacbio command that basically uses denoise_single/_denoise_single just like denoise-pyro does to avoid much repetition.

Also consider adding denoise-loop in similar fashion (for Loop Genomics tech).

from q2-dada2.

benjjneb commented on August 10, 2024 2

I think adding the following code directly before the filterAndTrim call should do the trick:

filt.args <- c("unfilts", "filts", "truncLen", "trimLeft", "maxEE", "truncQ", "multithread", "minLen", "maxLen")
for(varName in filt.args) {
  cat(varName, "=", get(varName), "\n")
}

from q2-dada2.

sixvable commented on August 10, 2024 2

I think adding the following code directly before the filterAndTrim call should do the trick:
filt.args <- c("unfilts", "filts", "truncLen", "trimLeft", "maxEE", "truncQ", "multithread", "minLen", "maxLen")
for(varName in filt.args) {
  cat(varName, "=", get(varName), "\n")
}

I find a mistake in _denoise.py that I actually do not use the run_dada_ccs.R I created, but the old run_dada_single.R. That is why the result is not the same with R code version.
I have fixed it. The result is same with R code version now. Check my repo sixvable/q2-dada2-CCS. It should work well.

Any way, thank you @benjjneb for patiently answering my questions!

from q2-dada2.

benjjneb commented on August 10, 2024 1

It looks like dada2 1.18 has been incorporated into the latest Q2 branch. fb5b7ff

Pending busywork updates to make sure it is passing, this would now enable straightforward implementation of an official denoise_pacbio workflow.

from q2-dada2.

colinbrislawn commented on August 10, 2024

What's the current status? People love those long reads! 🧬

from q2-dada2.

Oddant1 commented on August 10, 2024

Are we ready to try to do this? It looks like we're still on bioconductor-dada2 version 1.10.0

from q2-dada2.

benjjneb commented on August 10, 2024

Can't add this functionality until Q2's version bioconductor-dada2 is upgraded to 1.14 or later.

Versions up to 1.16 are already available through bioconda: https://bioconda.github.io/recipes/bioconductor-dada2/README.html

from q2-dada2.

harish0201 commented on August 10, 2024

I finished installing qiime2 recently, and DADA2 is still 1.10.

When are the upgrades planned to incorporate PacBio Long Reads?

@benjjneb Would you suggest upgrading the installed version and modifying the existing script (for Pyro or single ends) to incorporate PacBio specific error profiles?

from q2-dada2.

benjjneb commented on August 10, 2024

@harish0201 DADA2 1.10 is an excellent release of DADA2 that hold up to this day. That said, yes it is missing some features needed for long-read amplicon sequencing.

I don't know when the Q2 DADA2 version will be updated.

Until then, on could modify the the script in place on a Q2 install to achieve something like PacBio long-read processing, but I couldn't in good faith recommend that. Can you do the initial data processing in R, and then import into Q2? That would probably be a better idea at this ponit in time.

from q2-dada2.

harish0201 commented on August 10, 2024

Thanks for the idea! I'm planning to do the same, but I had a doubt. I'll have to run dada2 till bimera removal and taxon assignments? Once that is done, can I export the outcome in a biom format? Or should I go towards phyloseq and get it done?

…

On Tue, Oct 20, 2020, 22:51 Benjamin Callahan ***@***.***> wrote: @harish0201 <https://github.com/harish0201> DADA2 1.10 is an excellent release of DADA2 that hold up to this day. That said, yes it is missing some features needed for long-read amplicon sequencing. I don't know when the Q2 DADA2 version will be updated. Until then, on could modify the the script in place on a Q2 install to achieve something like PacBio long-read processing, but I couldn't in good faith recommend that. Can you do the initial data processing in R, and then import into Q2? That would probably be a better idea at this ponit in time. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#101 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACFMFNYTTL2ERYTPRIVYG6TSLXBIBANCNFSM4FWL3CWA> .

from q2-dada2.

benjjneb commented on August 10, 2024

I'll have to run dada2 till bimera removal and taxon assignments?

Or even just until chimera removal if you want to go back to QIIME2 and use one of their taxonomy tools.

can I export the outcome in a biom format?

I think so, but I suspect there is a straightforward guide for moving data to/from R and QIIME2 already out there somewhere. Might want to search the QIIME2 forum, I think something like that has been posted there before.

Or should I go towards phyloseq and get it done?

You can also do downstream analysis in R/phyloseq if you are more comfortable with that route. But I don't think moving data between R/Q2 will be that hard assuming that guide can be found.

from q2-dada2.

sixvable commented on August 10, 2024

@harish0201 DADA2 1.10 is an excellent release of DADA2 that hold up to this day. That said, yes it is missing some features needed for long-read amplicon sequencing.

I don't know when the Q2 DADA2 version will be updated.

Until then, on could modify the the script in place on a Q2 install to achieve something like PacBio long-read processing, but I couldn't in good faith recommend that. Can you do the initial data processing in R, and then import into Q2? That would probably be a better idea at this ponit in time.

I think I have implemented the method denoise-pacbio (I named it as denoise-ccs since pacbio also has a CLR mode) in current qiime2-2020.8 and bioconductor-dada2=1.10.1. The modified python and R script are in my repo sixvable/q2-dada2-CCS.

But the behavior is a little bit different between the qiime2 version and the R version in filterAndTrim step.
I select Zymo dataset in your paper, remove primers with R code and get the same result.

Then I compare the filterAndTrim result between the original R version and the qiime2 version. While the R code give the same result compared to your rmarkdown result (73057 to 69369) even in different paltform(Linux and Windows), the same parameters in qiime2 version gives a different result(73057 to 72940).

$qiime dada2 denoise-ccs --i-demultiplexed-seqs rawdata/demux.qza --p-trunc-len 0 --p-min-len 1000 --p-max-len 1600  --p-n-threads 0 --output-dir dada_ccs --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_single.R /tmp/qiime2-archive-_kly5rep/b8cfafb4-4836-4ef3-a4ee-d9a7a592fb3e/data /tmp/tmphc5igyje/output.tsv.biom /tmp/tmphc5igyje/track.tsv /tmp/tmphc5igyje 0 0 2.0 2 1600 independent consensus 3.5 0 1000000 NULL 32 1000

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.1 / Rcpp: 1.0.5 / RcppParallel: 5.0.2
1) Filtering .
2) Learning Error Rates
107576595 total bases in 72940 reads from 1 samples will be used for learning the error rates.
3) Denoise samples .
4) Remove chimeras (method = consensus)
5) Report read numbers through the pipeline
6) Write output

I have already change the R package into same version:
dada2=1.10.1 Rcpp 1.0.5 RcppParallel 5.0.2

I am not sure why those two ways behave different.

from q2-dada2.

harish0201 commented on August 10, 2024

@sixvable Interesting! Will evaluate and get to you!

from q2-dada2.

benjjneb commented on August 10, 2024

@sixvable Great work!

Then I compare the filterAndTrim result between the original R version and the qiime2 version. While the R code give the same result compared to your rmarkdown result (73057 to 69369) even in different paltform(Linux and Windows), the same parameters in qiime2 version gives a different result(73057 to 72940).

I'll note that the Zymo processing in the paper was done on DADA2 version 1.12.1, not 1.10.
Are you seeing different results between running the Q2 version and in R when both are fixed to the same version (1.10)?

from q2-dada2.

sixvable commented on August 10, 2024

@sixvable Great work!

Then I compare the filterAndTrim result between the original R version and the qiime2 version. While the R code give the same result compared to your rmarkdown result (73057 to 69369) even in different paltform(Linux and Windows), the same parameters in qiime2 version gives a different result(73057 to 72940).

I'll note that the Zymo processing in the paper was done on DADA2 version 1.12.1, not 1.10.
Are you seeing different results between running the Q2 version and in R when both are fixed to the same version (1.10)?

Test the R code version at 1.10.1,1.12.1 and 1.16 at both linux and windows. Always got the same result! I do not think it is because of the dada2 version. Could you try the qiime2 version?

from q2-dada2.

benjjneb commented on August 10, 2024

My guess (also from my own experience) is that the discrepancy then is caused by a mismatch in how arguments are passed from the Q2 call into the R script. Unforunately because the R script depends entirely on positional arguments, it is more susceptible to difficult to recognize errors at that step.

Is it possible to have the Rscript echo out the exact arguments it is using in the filterAndTrim call, and then investigate if there are any differences between that output when running via R, and then when running via the Q2 python interface?

from q2-dada2.

sixvable commented on August 10, 2024

Is it possible to have the Rscript echo out the exact arguments it is using in the filterAndTrim call, and then investigate if there are any differences between that output when running via R, and then when running via the Q2 python interface?

Sorry , that is beyond my code ability. Have no idea to implement that.

from q2-dada2.

sixvable commented on August 10, 2024

I think q2-demux summarize also need to be updated to accommodate the high quality values of Pacbio CCS.

Danger: Some of the forward PHRED quality values are out of range. This is likely because an incorrect PHRED offset was chosen on import of your raw data. You can learn how to choose your PHRED offset during import in the importing tutorial.

from q2-dada2.

thermokarst commented on August 10, 2024

the high quality values of Pacbio CCS

Does Pacbio CCS use something different than PHRED 33 for encoding quality scores? Can you provide some links or other information?

from q2-dada2.

sixvable commented on August 10, 2024

Does Pacbio CCS use something different than PHRED 33 for encoding quality scores? Can you provide some links or other information?

Pacbio CCS use the same PHRED 33 but normally with higher Q value than illumina (most of the CCS sequences base quality ascii character is ~ as 92 for q-value). The Quality Plot in q2-demux summarize only shows base q-value less than 45 I assume. Therefore most of the CCS base have no boxplot bar.
Check this qzv
demux.qzv.zip

from q2-dada2.

benjjneb commented on August 10, 2024

Pacbio CCS use the same PHRED 33 but normally with higher Q value than illumina (most of the CCS sequences base quality ascii character is ~ as 92 for q-value).

This is the key difference -- much, much higher Q scores in the fastqs produced by PacBio CCS basecaller than in other technologies.

from q2-dada2.

yidagao0756 commented on August 10, 2024

I think adding the following code directly before the filterAndTrim call should do the trick:
filt.args <- c("unfilts", "filts", "truncLen", "trimLeft", "maxEE", "truncQ", "multithread", "minLen", "maxLen")
for(varName in filt.args) {
  cat(varName, "=", get(varName), "\n")
}
I find a mistake in _denoise.py that I actually do not use the run_dada_ccs.R I created, but the old run_dada_single.R. That is why the result is not the same with R code version.
I have fixed it. The result is same with R code version now. Check my repo sixvable/q2-dada2-CCS. It should work well.

Any way, thank you @benjjneb for patiently answering my questions!

Hello @sixvable! Thank you so much for your efforts on making this new function! I've followed the instruction on your GitHub page, and run my data with the following commands. However, it is stuck at "An error was encountered while running DADA2 in R (return code 127)". Detailed information of the error are attached below. Do you have any ideas about which part went wrong? Any help from you will be greatly appreciated!

--i-demultiplexed-seqs single-end-demux34.qza --p-front AGRGTTTGATCMTGGCTCAG --p-adapter GGGTTACCTTGTTACGACTT --p-trunc-len 0 --p-min-len 1000 --p-max-len 1600 --p-n-threads 0 --output-dir dada_ccs --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_ccs.R /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/qiime2-archive-w7ksn8h7/2c82a2ca-8402-46d1-8fb7-e0cfc427a21b/data /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/output.tsv.biom /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/track.tsv /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/nop /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/filt AGRGTTTGATCMTGGCTCAG GGGTTACCTTGTTACGACTT 2 False 0 0 2.0 2 1000 1600 independent consensus 3.5 0 1000000

env: Rscript\r: No such file or directory
Traceback (most recent call last):
File "/Users/yidagao/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 534, in denoise_ccs
run_commands([cmd])
File "/Users/yidagao/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 42, in run_commands
subprocess.run(cmd, check=True)
File "/Users/yidagao/miniconda3/envs/qiime2-2021.2/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_ccs.R', '/var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/qiime2-archive-w7ksn8h7/2c82a2ca-8402-46d1-8fb7-e0cfc427a21b/data', '/var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/output.tsv.biom', '/var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/track.tsv', '/var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/nop', '/var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/filt', 'AGRGTTTGATCMTGGCTCAG', 'GGGTTACCTTGTTACGACTT', '2', 'False', '0', '0', '2.0', '2', '1000', '1600', 'independent', 'consensus', '3.5', '0', '1000000']' returned non-zero exit status 127.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/yidagao/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2cli/commands.py", line 329, in call
results = action(**arguments)
File "", line 2, in denoise_ccs
File "/Users/yidagao/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/Users/yidagao/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
output_views = self._callable(**view_args)
File "/Users/yidagao/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 547, in denoise_ccs
" and stderr to learn more." % e.returncode
Exception: An error was encountered while running DADA2 in R (return code 127), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 127), please inspect stdout and stderr to learn more.

from q2-dada2.

sixvable commented on August 10, 2024

env: Rscript\r: No such file or directory

That is strange. I guess your QIIME2 environment is broken. Try to remove and reinstall the whole environment and follow the instruction in repo https://github.com/sixvable/q2-dada2-CCS.

from q2-dada2.

yidagao0756 commented on August 10, 2024

env: Rscript\r: No such file or directory

That is strange. I guess your QIIME2 environment is broken. Try to remove and reinstall the whole environment and follow the instruction in repo https://github.com/sixvable/q2-dada2-CCS.

Thank you very much! I will try for it. Do I need to prepare anything in R? Downloading certain packages? It was said that the error was encountered while running in R.

from q2-dada2.

cjfields commented on August 10, 2024

Just checking on this, we have a few local researchers inquiring about using QIIME2 for PacBio (at the moment we're steering them to using DADA2 directly).

from q2-dada2.

Add long-read support about q2-dada2 HOT 27 OPEN

Comments (27)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent