In the paper https://arxiv.org/abs/2112.08687 BLEND was tested on only one non-HiFi read dataset. That was a simulated read dataset for one of the smallest eukaryotic genomes โ the genome of Saccharomyces cerevisiae.
To test how well BLEND performs on real (non-simulated) datasets of genomes which have more typical sizes, I used it to assemble genomes from these two sets of reads:
- Caenorhabditis elegans, PacBio CLR reads used in the article https://www.sciencedirect.com/science/article/pii/S2589004220305770 . For polishing I also used Illumina reads from that article. The nematode genome size is approximately 100 Mbp.
- Arabidopsis thaliana, Nanopore reads https://www.ncbi.nlm.nih.gov/sra/?term=ERR5530736 . For polishing I also used Illumina reads https://www.ncbi.nlm.nih.gov/sra/?term=ERR2173372 . The size of arabidopsis' genome is approximately 120 Mbp.
I searched for overlaps, then assembled the genomes with Miniasm using default parameters, then polished the assemblies using long reads with Racon, and then polished the assemblies using both long and short reads with HyPo. The assemblies were compared with references using QUAST.
The search for overlaps was performed with Blend 1.0 and, for comparison, with Minimap 2.22, using 22 threads of Intel Xeon X5670.
For the nematode, results are as follows:
|
Minimap2 |
BLEND |
Time to find overlaps |
10m |
3h 37m |
Maximum RAM consumption |
20G |
44G |
N50 |
2,056,511 |
1,915,190 |
NGA50 |
589,675 |
563,498 |
misassemblies |
740 |
707 |
Genome fraction |
99.692% |
99.683% |
Total length |
109,516,352 |
108,958,103 |
So, the assemblies of the nematode genome made with Minimap2 and with BLEND are similar. However, Blend required 20x more time to find overlaps and 2x more RAM.
For arabidopsis Minimap found overlaps in 30 minutes using 29G RAM. I terminated BLEND because it didn't finish in 24 hours. At the moment I terminated it, BLEND was using 300G RAM.
So, it seems that on non-HiFi datasets for genomes not as small as the genome of Saccharomyces cerevisiae BLEND is slower than Minimap2 and uses more RAM. This may be so because BLEND doesn't deal efficiently with repetitive seeds.