Could you please answer some questions about the article (<a href="https://arxiv.org/p


Some questions about the article (blend, CLOSED)

shelkmike commented on September 21, 2024

Some questions about the article


canfirtina commented on September 21, 2024

Hi @shelkmike,

Thanks for your interest and questions.

There are several reasons for this. First, we use the default parameter settings provided by each tool. Minimap2 currently has no default preset for finding overlapping reads specifically with PacBio HiFi reads. Thus, we use the only available default preset for finding overlaps between PacBio reads: -x ava-pb.

Second, it is difficult to speculate on the best custom settings for Minimap2 because many other parameters also affect accuracy, performance, and memory usage. For example, map-pb uses -w10 while map-hifi uses -w19, along with many other options that map-hifi sets differently from map-pb. Also, the overlapping preset seems to use half of the window length used for read mapping (e.g., ava-pb uses -w5 and map-pb uses -w10). When designing our experiments, we tried to answer the following question: is it better to use the map-hifi parameter settings while adapting them for finding overlapping reads (i.e., also using the options -X -e0 -m100)? We tried both half of the original map-hifi window length (-w10) and the original window length suggested by map-hifi (-w19). We observed the following: with the map-hifi settings plus -X -e0 -m100, Minimap2 finds overlapping reads 1.2x - 4x faster than with ava-pb (still much slower than BLEND), at the cost of lost information in the PAF file and reduced assembly accuracy.
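For readers who want to reproduce the comparison, the three Minimap2 configurations described above could be written down roughly as follows. This is only a sketch: the preset names and flags are the ones quoted in this reply, while the helper `overlap_command`, the file name `reads.fastq`, the output name `overlaps.paf`, and the thread count are placeholders I made up for illustration. The script only prints the command lines and does not run Minimap2.

```python
import shlex

def overlap_command(preset, extra=(), reads="reads.fastq", threads=8):
    """Build a minimap2 all-vs-all overlap command as an argument list.

    `reads` and `threads` are illustrative placeholders, not values from the paper.
    """
    return ["minimap2", "-x", preset, *extra, "-t", str(threads), reads, reads]

# The overlap-specific flags quoted in the reply.
OVERLAP_FLAGS = ["-X", "-e0", "-m100"]

CONFIGS = {
    # Default overlapping preset for PacBio reads.
    "ava-pb": overlap_command("ava-pb"),
    # map-hifi settings adapted for overlapping, original window length (-w19).
    "map-hifi, w=19": overlap_command("map-hifi", OVERLAP_FLAGS),
    # Same, with half of the map-hifi window length.
    "map-hifi, w=10": overlap_command("map-hifi", OVERLAP_FLAGS + ["-w10"]),
}

for label, argv in CONFIGS.items():
    print(f"{label}: {shlex.join(argv)} > overlaps.paf")
```

Keeping the commands as argument lists makes it easy to pass them to `subprocess.run` later and to time each configuration under identical conditions.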

Third, we use window lengths as large as -w500 (and potentially even higher), together with the ability to combine many neighboring k-mers (e.g., 100 neighboring k-mers, as in -x map-hifi --genome human), so that we do not lose accuracy. The original implementation of Minimap2 does not allow the window length to be increased beyond 256.
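To make the role of the window length more concrete, here is a toy illustration of window-based minimizer sampling: larger windows sample fewer seeds. It is a simplified sketch, not the code of Minimap2 or BLEND. It looks at the forward strand only, uses `zlib.crc32` as a stand-in hash function, and runs on a random sequence generated purely for illustration.

```python
import random
import zlib

def minimizers(seq, k, w):
    """Return the start positions of the sampled k-mers.

    In every window of w consecutive k-mers, keep the k-mer with the
    smallest hash value (ties are broken by the leftmost position).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hashes = [zlib.crc32(kmer.encode()) for kmer in kmers]
    picked = set()
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        picked.add(best)
    return picked

# Random toy "read"; the fixed seed only makes the run repeatable.
random.seed(0)
read = "".join(random.choice("ACGT") for _ in range(5000))

for w in (5, 10, 19, 500):
    print(f"w={w:>3}: {len(minimizers(read, 19, w))} seeds sampled")
```

Fewer sampled seeds means less work and memory, but each seed then has to carry more information, which is where combining neighboring k-mers comes in.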

Perhaps we could contact Heng Li and ask for his opinion on suitable parameter settings for finding overlapping HiFi reads with Minimap2. We will update our experiments accordingly if we receive a suggestion from him. I would also appreciate a pointer to any similar discussion where Heng Li suggests settings for overlapping HiFi reads.

I would also like to clarify that we use the default settings for HiFi reads when they are available (i.e., we use map-hifi for mapping HiFi reads with Minimap2).

Thank you for suggesting NGA50; I agree with your point. There is no strong reason for choosing N50 over NGA50. I believe we chose N50 because it is probably a more commonly reported statistic than NGA50. We also have the NGA50 numbers. The NGA50 results 1) are mostly in line with the trend we observe in the N50 results and 2) do not change our observations regarding the contiguity of the assemblies generated using BLEND and Minimap2 overlaps. We will include the NGA50 results in the revised version of the paper, too.
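As a reminder of how the two statistics differ, the sketch below computes both from the same assembly. N50 uses raw contig lengths and the assembly size. NGA50 uses the aligned blocks obtained after breaking contigs at misassemblies, measured against the reference genome length. All numbers are invented toy values, and the helper names are mine.

```python
def nx(lengths, total, fraction=0.5):
    """Largest length L such that blocks of length >= L together cover
    at least `fraction` of `total`; returns 0 if that is never reached."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= fraction * total:
            return length
    return 0

def n50(contig_lengths):
    # N50 is measured against the total assembly size.
    return nx(contig_lengths, sum(contig_lengths))

def nga50(aligned_block_lengths, reference_length):
    # NGA50 is measured against the reference length, using aligned blocks.
    return nx(aligned_block_lengths, reference_length)

# Toy assembly: six contigs, reference genome of length 300.
contigs = [80, 70, 50, 40, 30, 20]
# Suppose the 80-long contig is misassembled and splits into blocks of 45 and 35.
aligned_blocks = [45, 35, 70, 50, 40, 30, 20]

print("N50   =", n50(contigs))                # 70
print("NGA50 =", nga50(aligned_blocks, 300))  # 45
```

In this toy case a single misassembled contig lowers the value from 70 to 45, which is why NGA50 is the stricter measure of contiguity.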
We want to assess the quality of the overlaps by measuring the accuracy of assemblies under the same conditions, without the effect of additional tools (e.g., polishing). Otherwise, it becomes more challenging to separate the direct effect of the overlapping algorithms from that of the polishing tools on assembly quality. We have the following statement in our paper to clarify this point:

"We use miniasm because it does not perform error correction when generating de novo assemblies, which allows us to directly assess the quality of overlaps without using additional approaches for improving the accuracy of assemblies."

We definitely agree that miniasm needs assembly polishing to generate higher-quality assemblies. It may well be true that the final polished assembly is accurate enough that the accuracy of the initial draft assembly does not matter at all. However, we also note that this depends on the coverage of the read set, the assembly polishing tool, the read mapper used to generate the input for most assembly polishing tools, and probably several other factors. The question then may become: what coverage do BLEND and Minimap2 require to achieve 99.9% accuracy, if they both end up with such accuracy after assembly polishing? The answer to such a question may again be inferred from the initial draft assemblies without any error correction.

For these reasons, we currently do not consider including assembly polishing in our experiments.

This is a good question and I believe it is still open for discussion. I partially agree with your point. Unfortunately, the performance benefits are not high when using PacBio CLR reads. BLEND approximates the hash value of a seed using the seed's k-mers. Such an approximation works well with HiFi reads because they contain fewer errors, so we are less likely to include an erroneous k-mer in the BLEND calculation. I am still working on making BLEND better with PacBio CLR reads and ONT reads, but this is an ongoing process. I am not sure whether we will be able to improve on what we already have in the current version of our implementation. We will definitely announce a new release on the same GitHub page if we can achieve better performance and memory usage with PacBio CLR reads.
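To illustrate the idea of deriving one seed hash from several k-mers, here is a minimal sketch that assumes a SimHash-style bitwise majority vote. That choice is my assumption for illustration and is not stated in this thread. The code is not BLEND's implementation, `zlib.crc32` is a stand-in hash, and the k-mers are made up.

```python
import zlib

def kmer_hash(kmer, bits=32):
    """Stand-in k-mer hash (crc32), truncated to `bits` bits."""
    return zlib.crc32(kmer.encode()) & ((1 << bits) - 1)

def blended_hash(kmers, bits=32):
    """Combine k-mer hashes by majority vote per bit: bit b of the result
    is 1 when more than half of the k-mer hashes have bit b set."""
    votes = [0] * bits
    for kmer in kmers:
        h = kmer_hash(kmer, bits)
        for b in range(bits):
            votes[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if votes[b] > 0)

def hamming(a, b):
    """Number of bit positions in which two hash values differ."""
    return bin(a ^ b).count("1")

# Hypothetical seed made of five k-mers, and the same seed with one
# k-mer changed by a sequencing error.
clean = ["ACGTACGTAC", "CGTACGTACG", "GTACGTACGT", "TACGTACGTA", "ACGTACGTAA"]
noisy = clean[:-1] + ["ACGTACGTTA"]

print("differing bits:", hamming(blended_hash(clean), blended_hash(noisy)))
```

With a vote like this, a single erroneous k-mer can flip only the bits where the vote was close, so the fewer erroneous k-mers a seed contains, the more likely two similar seeds are to receive the same value. This matches the intuition above for why low-error HiFi reads benefit most.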

We have not thoroughly tested BLEND with ONT reads. We believe the current parameter settings should be good enough for ONT reads, but we have not yet confirmed that they work better than Minimap2.

In short, based on the results we show in our paper, we believe BLEND is best suited to PacBio HiFi reads.

Best,

Can Firtina
