Hi there, Thanks for making this tool available and for the clean re

Runtime for Large Dataset (1.3M seqs) about identity (open)

e-trop commented on August 10, 2024


Comments (1)

hani-girgis commented on August 10, 2024

Hi, Evan.

The length range of the microbiome sequences in the paper is 171–372, which is more homogeneous than that of your data set.

Yes, dividing your data set by sequence length would work. I would then cluster each group separately.
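A minimal sketch of the length split, assuming the sequences are already loaded into a Python dict mapping sequence ID to sequence string. The function name `bin_by_length` and the fixed `bin_width` of 200 are hypothetical choices for illustration, not options of Identity or MeShClust; each resulting group would then be written to its own file and clustered separately.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bin_by_length(seqs: Dict[str, str], bin_width: int = 200) -> Dict[Tuple[int, int], List[str]]:
    """Group sequence IDs into fixed-width length bins.

    `bin_width` is a hypothetical tuning parameter, not an Identity/MeShClust option.
    Returns {(min_len, max_len): [ids...]}, ordered by bin.
    """
    bins: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for name, seq in seqs.items():
        lo = (len(seq) // bin_width) * bin_width  # lower edge of this sequence's bin
        bins[(lo, lo + bin_width - 1)].append(name)
    return dict(sorted(bins.items()))


if __name__ == "__main__":
    demo = {"a": "A" * 150, "b": "C" * 180, "c": "G" * 450, "d": "T" * 1300}
    for (lo, hi), names in bin_by_length(demo).items():
        # Each group would be written out and clustered on its own.
        print(f"{lo}-{hi} nt: {names}")
```

Fixed-width bins keep the length spread inside each group bounded (here to under 200 nt); quantile-based bins would instead give groups of roughly equal size but with uneven length spreads.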

After that, you may want to extract the cluster centers, run Identity all-vs-all on the centers, and merge centers that are similar (identity scores greater than the threshold) by keeping only one of them.
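The center-merging step can be sketched as a greedy pass, assuming the cluster centers from all length groups have been collected into one dict. `kmer_similarity` is a hypothetical toy stand-in (Jaccard similarity of k-mer sets), not Identity's score, and `merge_centers` is an illustrative name; in practice the pairwise scores would come from running Identity all-vs-all on the centers. Each center is compared with the centers kept so far and is merged into the first one whose score exceeds the threshold.

```python
from typing import Callable, Dict, List, Tuple


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Toy stand-in score: Jaccard similarity of the two k-mer sets.
    This is NOT Identity's model; it only makes the sketch runnable."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def merge_centers(centers: Dict[str, str], score: Callable[[str, str], float],
                  threshold: float) -> Tuple[List[str], Dict[str, str]]:
    """Greedily keep one center per group of similar centers.

    Returns (kept center IDs, {merged center ID: kept center ID it was merged into}).
    """
    kept: List[str] = []
    merged_into: Dict[str, str] = {}
    for name, seq in centers.items():
        # First already-kept center whose score exceeds the threshold, if any.
        match = next((k for k in kept if score(seq, centers[k]) > threshold), None)
        if match is None:
            kept.append(name)
        else:
            merged_into[name] = match
    return kept, merged_into


if __name__ == "__main__":
    demo = {"c1": "ACGTACGTACGT", "c2": "ACGTACGTACGA", "c3": "TTTTGGGGCCCC"}
    kept, merged = merge_centers(demo, kmer_similarity, threshold=0.7)
    print(kept)    # ['c1', 'c3']
    print(merged)  # {'c2': 'c1'}
```

The greedy order means the result can depend on the order of the centers; sorting them first (for example by cluster size) is one way to make the choice of which center survives deliberate.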

Finally, run Identity between the reduced set of centers and the entire data set, and assign each sequence to its closest center.
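A sketch of the final assignment, assuming the reduced centers and all sequences are held in memory as dicts. `positional_identity` is a hypothetical toy score (matching positions divided by the longer length, with no alignment) standing in for Identity's scores, whose output format is not covered here; `assign_to_centers` is likewise an illustrative name.

```python
from typing import Callable, Dict


def positional_identity(a: str, b: str) -> float:
    """Toy stand-in score: matching positions / longer length, with no alignment.
    Real runs would use Identity's scores instead."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def assign_to_centers(seqs: Dict[str, str], centers: Dict[str, str],
                      score: Callable[[str, str], float]) -> Dict[str, str]:
    """Map every sequence ID to the ID of its highest-scoring (closest) center.
    Ties go to the center listed first in `centers`."""
    if not centers:
        raise ValueError("at least one center is required")
    return {
        name: max(centers, key=lambda c: score(seq, centers[c]))
        for name, seq in seqs.items()
    }


if __name__ == "__main__":
    centers = {"c1": "AAAAAAAA", "c3": "CCCCCCCC"}
    seqs = {"s1": "AAAAAAAC", "s2": "CCCCCCAA"}
    print(assign_to_centers(seqs, centers, positional_identity))  # {'s1': 'c1', 's2': 'c3'}
```

This step evaluates one score per (sequence, center) pair, so shrinking the center set beforehand directly reduces the work on a 1.3M-sequence data set.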

This is a workaround for now, but the process could be automated in future releases.

Let me know if you have additional questions.

Best regards.

Hani

Related Issues (20)

Identity on very large data
A sequence is too short
kmer-db
How to get MeShClust v3.0.0
Clustering MAGs to nrMAGs
Floating point exception (core dumped)
test mesh killed
Turn off warning?
Floating point exception (core dumped)
bioconda recipe for meshclust3
Identity shows the highest value for very different contigs length
Any info for default threshold and mininum read sequence length for meshclust?
Segmentation fault
Clustering long-read 18S amplicons
Bioconda version
Protein sequences
Negative threshold
meshclust not in latest release?
MeShClust terminates due to mean1 is zero
