Comments (1)
Thank you for your question. As with any task that involves choosing the right number of clusters, topics, etc., there is no single correct answer. Each selection criterion has its own logic, so you need to consider whether that logic fits the perspective you want your results to show. For example, the different metrics are usually computed over different text windows when checking coherence, so if you are targeting coherence within larger text windows, pick the corresponding metrics as your key selection criterion.
Unfortunately, there is (to my knowledge) little practical experience with using coherence metrics to select a suitable model, and especially with how parameter variations affect the coherence metrics and the resulting interpretations. By "practical" I mean finding a model that makes sense from the perspective of qualitative interpretation, rather than one judged by computational accuracy, etc.
- So, in practice, I would simply use the coherence metrics as indicators that point you to potentially interesting models with good performance, indicated by peaks.
- Take the interesting models and check the top 10 or 20 terms of selected topics that fall within your area of expertise, so you can judge whether these topics make sense. Try to compare thematically similar topics across different models to understand what you might gain by increasing the degree of granularity (i.e. increasing the number of topics).
- So in your case you might, e.g., check the models 110 / 160 / 190 (or maybe 200, but since one metric decreases there, 190 might be favored).
- This is not meant to advertise my work, but as an applied example you might have a look at this article. The situation for selecting a good model was similarly ambiguous, which is why a qualitative check of the models was performed: https://energsustainsoc.biomedcentral.com/articles/10.1186/s13705-019-0226-z/figures/2
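The screening steps above could be sketched roughly as follows. This is only an illustration in plain Python, not text2vec's R API: the function names (`local_peaks`, `candidate_models`, `term_overlap`), the metric names, and all scores are made up, and I assume the coherence scores are available per model, keyed by its number of topics. A "peak" is taken to be a model whose score is higher than both of its neighbours, and Jaccard overlap of the top terms is just one possible way to spot thematically similar topics before reading them.

```python
from typing import Dict, List


def local_peaks(scores: Dict[int, float]) -> List[int]:
    """Return the topic counts whose score beats both neighbouring models.

    The first and last model are never reported, since they have only one neighbour.
    """
    ks = sorted(scores)
    peaks = []
    for i in range(1, len(ks) - 1):
        prev_s, cur_s, next_s = scores[ks[i - 1]], scores[ks[i]], scores[ks[i + 1]]
        if cur_s > prev_s and cur_s > next_s:
            peaks.append(ks[i])
    return peaks


def candidate_models(metrics: Dict[str, Dict[int, float]]) -> Dict[int, List[str]]:
    """Map each peaking topic count to the names of the metrics that peak there."""
    candidates: Dict[int, List[str]] = {}
    for name, scores in metrics.items():
        for k in local_peaks(scores):
            candidates.setdefault(k, []).append(name)
    return candidates


def term_overlap(terms_a: List[str], terms_b: List[str]) -> float:
    """Jaccard overlap between the top terms of two topics (0 = disjoint, 1 = identical)."""
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Hypothetical coherence scores (higher = better), keyed by number of topics.
metrics = {
    "metric_a": {100: -1.9, 110: -1.5, 120: -1.8, 150: -1.7, 160: -1.4, 170: -1.6},
    "metric_b": {100: 0.20, 110: 0.26, 120: 0.22, 150: 0.24, 160: 0.23, 170: 0.21},
}

candidates = candidate_models(metrics)
for k in sorted(candidates):
    print(k, "peaks in:", ", ".join(candidates[k]))

# Hypothetical top terms of two topics taken from different models.
overlap = term_overlap(["solar", "wind", "grid"], ["wind", "grid", "storage"])
print("top-term overlap:", overlap)
```

Models where several metrics peak at once are the natural first candidates for the qualitative check. The overlap score only helps to pair up topics across models; whether the finer-grained topics actually make sense still has to be judged by reading the terms.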
I hope this helps. Please do not hesitate to ask further questions.
from text2vec.