
canfirtina commented on September 21, 2024

Hi @shelkmike,

Thanks for your interest and questions.

  1. There are several reasons for this. First, we use the default parameter settings provided by each tool. Minimap2 currently does not suggest a default parameter setting for finding overlapping reads specifically with PacBio HiFi reads. Thus, we use the only available default setting for finding overlaps between PacBio reads: the ava-pb preset (-x ava-pb).

Second, it is difficult to speculate on the best custom settings for Minimap2 because many other parameters also affect accuracy, performance, and memory usage. For example, map-pb uses -w10 while map-hifi uses -w19, along with many other options that map-hifi sets differently from map-pb. Also, the overlapping presets seem to use half of the window length used for read mapping (e.g., ava-pb uses -w5 while map-pb uses -w10). When designing our experiments, we tried to answer the following question: Is it better to use the map-hifi parameter settings, adapted for finding overlapping reads (i.e., adding the options -X -e0 -m100)? We tried both half of the original map-hifi window length (-w10) and the original window length suggested by map-hifi (-w19). We observed the following: with the map-hifi settings plus -X -e0 -m100, Minimap2 finds overlapping reads 1.2x - 4x faster than with ava-pb (still much slower than BLEND), at the cost of lost information in the PAF file and reduced assembly accuracy.
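To make the compared configurations concrete, the sketch below writes them out as Minimap2 command lines. It only builds the argument lists and does not run anything. The helper name `overlap_cmd`, the file name `reads.fastq`, and the thread count are placeholders chosen for illustration; only the presets and the -X, -e0, -m100, and -w options come from the discussion above.

```python
import shlex
from typing import List, Optional


def overlap_cmd(reads: str, preset: str, window: Optional[int] = None,
                ava_flags: bool = False, threads: int = 4) -> List[str]:
    """Build a minimap2 all-vs-all overlap command (hypothetical helper).

    The preset (-x) comes first so that later options override it.
    """
    cmd = ["minimap2", "-x", preset, "-t", str(threads)]
    if ava_flags:
        # Options that ava-pb already sets and that map-hifi lacks.
        cmd += ["-X", "-e0", "-m100"]
    if window is not None:
        cmd.append(f"-w{window}")
    # All-vs-all overlapping: the read set is both target and query.
    cmd += [reads, reads]
    return cmd


if __name__ == "__main__":
    reads = "reads.fastq"  # placeholder input file
    configs = {
        "ava-pb default": overlap_cmd(reads, "ava-pb"),
        "map-hifi + ava flags, -w10": overlap_cmd(reads, "map-hifi", 10, True),
        "map-hifi + ava flags, -w19": overlap_cmd(reads, "map-hifi", 19, True),
    }
    for name, cmd in configs.items():
        # Minimap2 writes PAF to stdout, so a shell redirect would follow.
        print(f"{name}: {shlex.join(cmd)} > overlaps.paf")
```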

Third, we use window lengths as large as -w500 (and potentially even larger), together with the ability to combine many neighboring k-mers (e.g., 100 neighboring k-mers as in -x map-hifi --genome human), so as not to lose accuracy. The original implementation of Minimap2 does not allow increasing the window length beyond 256.
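To illustrate why the window length matters so much for speed and memory, here is a toy minimizer sampler. It is a simplified sketch, not the code of Minimap2 or BLEND: it uses CRC32 as a stand-in hash, ignores reverse complements, and runs on a random sequence. For random sequences, roughly 2/(w+1) of the k-mers are expected to be selected, so a larger window leaves far fewer seeds to store and compare.

```python
import random
import zlib
from typing import List


def random_dna(length: int, seed: int = 0) -> str:
    """Generate a reproducible random DNA string (illustration only)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def minimizer_positions(seq: str, k: int, w: int) -> List[int]:
    """Return start positions of the k-mers chosen as (w, k)-minimizers.

    In every window of w consecutive k-mers, the k-mer with the smallest
    hash is selected (leftmost on ties). CRC32 is an assumed stand-in hash.
    """
    hashes = [zlib.crc32(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        picked.add(start + window.index(min(window)))
    return sorted(picked)


if __name__ == "__main__":
    seq = random_dna(20000)
    k = 19
    total_kmers = len(seq) - k + 1
    for w in (5, 10, 19, 500):
        n = len(minimizer_positions(seq, k, w))
        print(f"w={w:>3}: {n} of {total_kmers} k-mers sampled "
              f"(observed density {n / total_kmers:.4f}, 2/(w+1) = {2 / (w + 1):.4f})")
```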

Perhaps we could contact Heng Li and ask for his opinion on suitable parameter settings for finding overlapping HiFi reads with Minimap2. We will update our experiments accordingly if we receive a suggestion from him. I would also appreciate any pointer to a similar discussion where Heng Li provides suggestions for overlapping HiFi reads.

I would also like to clarify that we use the default settings for HiFi reads when they are available (i.e., we use map-hifi for mapping HiFi reads with Minimap2).

  2. Thank you for suggesting NGA50; I agree with your point. There is no strong reason for choosing N50 over NGA50. I believe we chose N50 because it is probably a more commonly reported statistic than NGA50. We also have the NGA50 numbers. The NGA50 results 1) are mostly in line with the trend we observe in the N50 results and 2) do not change our observations regarding the contiguity of the assemblies generated using BLEND and Minimap2 overlaps. We will include the NGA50 results in the revised version of the paper, too.
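For readers less familiar with these statistics, the sketch below shows how they relate. N50 uses half of the total assembly length as its threshold, NG50 uses half of the reference genome size, and NGA50 applies the NG50 rule to aligned blocks, i.e., contigs split at misassemblies. The contig lengths, the genome size, and the function name `nx` are made up for illustration and are not taken from our experiments.

```python
from typing import Iterable


def nx(lengths: Iterable[int], total: int, fraction: float = 0.5) -> int:
    """Return the length L such that pieces of length >= L cover at least
    `fraction` of `total`; return 0 if the pieces never reach that threshold.
    """
    threshold = total * fraction
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= threshold:
            return length
    return 0


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]             # hypothetical contig lengths
    genome_size = 400                              # hypothetical reference size
    # Suppose the 80-long contig has a misassembly and aligns as 45 + 35.
    aligned_blocks = [70, 50, 45, 40, 35, 30, 20]

    print("N50  :", nx(contigs, sum(contigs)))         # threshold: half the assembly
    print("NG50 :", nx(contigs, genome_size))          # threshold: half the genome
    print("NGA50:", nx(aligned_blocks, genome_size))   # NG50 on aligned blocks
```

With these made-up numbers the three values are 70, 50, and 40: a misassembled long contig inflates N50 but not NGA50, which is why NGA50 is the stricter measure of contiguity.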

  3. We want to assess the quality of the overlaps by measuring the accuracy of assemblies under the same conditions, without the effect of additional tools (e.g., polishing). Otherwise, it becomes more difficult to separate the direct effect of the overlapping algorithms from that of the polishing tools on the quality of assemblies. We have the following statement in our paper to clarify this point:

"We use miniasm because it does not perform error correction when generating de novo assemblies, which allows us to directly assess the quality of overlaps without using additional approaches for improving the accuracy of assemblies."

We definitely agree that miniasm needs assembly polishing to generate higher quality assemblies. It may well be true that the final polished assembly has such high accuracy that the accuracy of the initial draft assembly does not matter at all. However, we also note that this depends on the coverage of the read set, the assembly polishing tool, the read mapper used to generate the input for most assembly polishing tools, and probably several other factors. The question may then become: what coverage do BLEND and Minimap2 require to achieve 99.9% accuracy, if they both end up with such high accuracy after assembly polishing? The answer to such a question may again be inferred from the initial draft assemblies without any error correction.

For these reasons, we currently do not consider including assembly polishing in our experiments.

  4. This is a good question and I believe it is still open for discussion. I partially agree with your point. Unfortunately, the performance benefits are not high when using PacBio CLR reads. BLEND approximates the hash value of a seed by using the seed's k-mers. Such an approximation works well with HiFi reads because they contain fewer errors, so we are less likely to include an erroneous k-mer in the BLEND calculation. I am still working on making BLEND better with PacBio CLR and ONT reads, but this is an ongoing process. I am not sure whether we will be able to improve on what we already have in the current version of our implementation. We will definitely announce a new release on the same GitHub page if we can achieve better performance and memory usage with PacBio CLR reads.
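The intuition about erroneous k-mers can be shown with a toy example. The sketch below combines the hashes of a seed's k-mers with a SimHash-style bitwise majority vote. This is an illustrative stand-in under assumed parameters (k = 5, 32-bit hashes, BLAKE2b as the k-mer hash), not BLEND's actual implementation. A single substitution changes up to k of the overlapping k-mers, so the more errors a seed contains, the more votes are perturbed and the less likely two copies of the same region end up with the same or a nearby hash value.

```python
import hashlib
import random
from typing import List


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_hash(kmer: str, bits: int = 32) -> int:
    """Hash one k-mer to a `bits`-bit integer (BLAKE2b is an assumed choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(items: List[str], bits: int = 32) -> int:
    """Combine item hashes by a per-bit majority vote (SimHash-style)."""
    votes = [0] * bits
    for item in items:
        h = kmer_hash(item, bits)
        for b in range(bits):
            votes[b] += 1 if (h >> b) & 1 else -1
    value = 0
    for b in range(bits):
        if votes[b] > 0:
            value |= 1 << b
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")


def mutate(seq: str, n_errors: int, rng: random.Random) -> str:
    """Introduce n_errors substitutions at distinct random positions."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_errors):
        chars[pos] = rng.choice([c for c in "ACGT" if c != chars[pos]])
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(42)
    seed_seq = "".join(rng.choice("ACGT") for _ in range(60))
    reference = simhash(kmers(seed_seq, 5))
    for n_errors in (0, 1, 6):  # roughly: error-free, HiFi-like, noisy
        noisy = simhash(kmers(mutate(seed_seq, n_errors, rng), 5))
        print(f"{n_errors} substitution(s): Hamming distance {hamming(reference, noisy)}")
```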

We have not thoroughly tested BLEND with ONT reads. We believe the current parameter settings should be good enough for ONT reads, but we have not yet confirmed that they work better than Minimap2.

In short, we believe BLEND is best suited for PacBio HiFi reads, based on the results we show in our paper.

Best,

Can Firtina
