Hi, Evan.
The sequence lengths in the paper's microbiome data set range from 171 to 372, which is more homogeneous than yours.
Yes, dividing your data set based on length would work. Then I would cluster each group separately.
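Here is a minimal Python sketch of the length-splitting step, assuming the sequences are stored in FASTA format. The function names and the `bin_width` parameter are hypothetical and are not part of Identity or MeShClust. Choose a width that suits your own length distribution.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)

def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)

def bin_by_length(records: Iterable[Record], bin_width: int) -> Dict[int, List[Record]]:
    """Group records so that bin k holds lengths in [k*bin_width, (k+1)*bin_width)."""
    bins: Dict[int, List[Record]] = defaultdict(list)
    for header, seq in records:
        bins[len(seq) // bin_width].append((header, seq))
    return dict(bins)
```

Each bin could then be written to its own FASTA file and clustered separately with MeShClust.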
After that, you may want to extract the cluster centers and run Identity on them in all-vs-all mode. Then merge centers that are similar (identity score greater than the threshold) by keeping only one center from each similar group.
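The center-merging step can be sketched as a greedy filter. This sketch assumes that the all-vs-all Identity scores for the centers have already been parsed into `(id1, id2, score)` triples. Parsing depends on Identity's output format and is not shown here. `pair_scores` and `merge_centers` are hypothetical helpers, not part of the tool.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

Scores = Dict[FrozenSet[str], float]

def pair_scores(triples: Iterable[Tuple[str, str, float]]) -> Scores:
    """Index pairwise identity scores by the unordered pair of sequence IDs."""
    return {frozenset((a, b)): score for a, b, score in triples}

def merge_centers(centers: Sequence[str], scores: Scores, threshold: float) -> List[str]:
    """Keep one representative from each group of similar centers.

    Centers are visited in order. A center is kept only if its score with every
    already-kept center is at most `threshold`. Missing pairs count as 0.0.
    """
    kept: List[str] = []
    for center in centers:
        if all(scores.get(frozenset((center, k)), 0.0) <= threshold for k in kept):
            kept.append(center)
    return kept
```

Because the filter is greedy, the result depends on the order of `centers`. One reasonable choice is to sort them by cluster size, largest first, so that the center of the most populated cluster is the one retained.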
Finally, run Identity on the reduced center set against the entire data set and assign each sequence to its closest center.
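The final assignment step amounts to an arg-max over each sequence's scores against the reduced centers. The sketch below assumes that the Identity scores between the data set and the centers have been loaded into a dict keyed by `frozenset({seq_id, center_id})`. `assign_to_centers` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Iterable, Sequence, Tuple

def assign_to_centers(
    sequences: Iterable[str],
    centers: Sequence[str],
    scores: Dict[FrozenSet[str], float],
) -> Dict[str, Tuple[str, float]]:
    """Map each sequence ID to (closest center, identity score).

    A sequence that is itself a center scores 1.0 against itself.
    Missing pairs count as 0.0. Ties go to the center listed first.
    """
    def score(seq: str, center: str) -> float:
        if seq == center:
            return 1.0
        return scores.get(frozenset((seq, center)), 0.0)

    result: Dict[str, Tuple[str, float]] = {}
    for seq in sequences:
        best = max(centers, key=lambda c: score(seq, c))  # first maximum wins ties
        result[seq] = (best, score(seq, best))
    return result
```

Returning the score alongside the center makes it easy to flag sequences whose best match is still weak. Those sequences may deserve a separate look rather than being forced into a cluster.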
This is a workaround for now, but the process could be automated in a future release.
Let me know if you have additional questions.
Best regards.
Hani
from identity.
Related Issues
- Identity on very large data
- A sequence is too short
- kmer-db
- How to get MeShClust v3.0.0
- Clustering MAGs to nrMAGs
- Floating point exception (core dumped)
- test mesh killed
- Turn off warning?
- Floating point exception (core dumped)
- bioconda recipe for meshclust3
- Identity shows the highest value for very different contigs length
- Any info for default threshold and minimum read sequence length for meshclust?
- Segmentation fault
- Clustering long-read 18S amplicons
- Bioconda version
- Protein sequences
- Negative threshold
- meshclust not in latest release?
- MeShClust terminates due to mean1 is zero