
Comments (3)

yilunzhao commented on July 17, 2024

I notice that although the top words in each topic are clearer than those of other topic models such as NVDM, topics related to some classes are less likely to appear. For example, only a few of the 100 topics relate to music or movies.

The classes I use are ['car', 'game', 'food', 'movie', 'music', 'news', 'show', 'sports', 'tech', 'travel']

Edit:
I notice that ETM uses topic embeddings.
Do topic embeddings encourage the model to discover topics that are similar to each other and ignore topics that are independent? For example, in my experiments about 30% of the extracted topics are about news, and very few relate to music.
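To make the question concrete: ETM parameterizes each topic's word distribution as a softmax over inner products between the word embeddings ρ and the topic embedding α_k, i.e. β_k = softmax(ρᵀα_k). The minimal pure-Python sketch below, assuming made-up 2-d embeddings and a four-word vocabulary, only illustrates that two topics whose embeddings are close produce nearly the same word distribution; it does not show that ETM training actually pushes topics together.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def topic_word_distribution(rho, alpha_k):
    """beta_k = softmax(rho^T alpha_k): one logit per vocabulary word,
    the inner product of that word's embedding with the topic embedding."""
    logits = [sum(r * a for r, a in zip(word_vec, alpha_k)) for word_vec in rho]
    return softmax(logits)

def total_variation(p, q):
    # Distance between two distributions over the same vocabulary (0 = identical, 1 = disjoint)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Toy 2-d word embeddings (hypothetical, for illustration only)
vocab = ["news", "politics", "music", "song"]
rho = [[2.0, 0.0], [1.8, 0.2], [0.0, 2.0], [0.2, 1.8]]

alpha_a = [1.0, 0.0]  # a "news"-like topic embedding
alpha_b = [0.9, 0.1]  # an embedding close to alpha_a
alpha_c = [0.0, 1.0]  # a "music"-like topic embedding

beta_a = topic_word_distribution(rho, alpha_a)
beta_b = topic_word_distribution(rho, alpha_b)
beta_c = topic_word_distribution(rho, alpha_c)

print("TV(a, b):", round(total_variation(beta_a, beta_b), 3))
print("TV(a, c):", round(total_variation(beta_a, beta_c), 3))
```

Because every β_k lives in the same word-embedding space, nearby α_k vectors yield near-duplicate topics; whether that is what produces the many news topics reported above would have to be checked on the trained model itself.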


acatovic commented on July 17, 2024

@worldchanger6666 I am not associated with the ETM paper, but here are my two cents on why you see poor classification performance. When performing topic modelling, you throw away a lot of information. I personally wouldn't use LDA representations for downstream tasks; I see topic modelling more as a way of finding and visualizing a manageable set of themes/topics. In your case the average title length is just 10 words, so there are probably many subtle or rare words that you want to capture for classification, but LDA effectively smooths these over. Just from the class list I can also see potentially a lot of overlap between "movie", "music" and "show". So you're better off using an SVM with a bag-of-words (BoW) feature representation, or, if you want to use embeddings, you can try Deep Averaging Networks (DANs).
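A minimal sketch of the suggested BoW + linear SVM baseline, in pure Python so it runs without dependencies. Assumptions: the toy titles and labels are hypothetical, training uses Pegasos-style stochastic subgradient steps on the L2-regularized hinge loss (one choice of solver among many), there is no bias term, and multi-class prediction is one-vs-rest. In practice a library such as scikit-learn (`CountVectorizer` plus `LinearSVC`) would be the usual route.

```python
from collections import Counter, defaultdict

def bow(text):
    # Bag-of-words: lowercase whitespace tokens mapped to counts
    return Counter(text.lower().split())

def dot(w, x):
    # Sparse dot product between a weight dict and a BoW Counter
    return sum(w.get(tok, 0.0) * cnt for tok, cnt in x.items())

def train_binary_svm(xs, ys, lam=0.1, epochs=20):
    """Linear SVM (hinge loss + L2) trained with Pegasos-style subgradient steps.
    ys are +1 / -1; no bias term, for brevity."""
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # margin check uses the current weights
            # Shrink all weights (gradient of the L2 regularizer)
            for tok in list(w):
                w[tok] *= 1.0 - eta * lam
            # Hinge-loss subgradient step for margin violators
            if violated:
                for tok, cnt in x.items():
                    w[tok] += eta * y * cnt
    return dict(w)

def train_one_vs_rest(texts, labels, **kw):
    # One binary SVM per class: that class is +1, every other class is -1
    xs = [bow(t) for t in texts]
    classes = sorted(set(labels))
    return {c: train_binary_svm(xs, [1 if l == c else -1 for l in labels], **kw)
            for c in classes}

def predict(models, text):
    # Pick the class whose classifier gives the highest score
    x = bow(text)
    return max(models, key=lambda c: dot(models[c], x))

# Hypothetical toy titles, not data from this thread
titles = ["new car engine", "car review", "new album song",
          "song lyrics", "football match score", "match highlights"]
labels = ["car", "car", "music", "music", "sports", "sports"]

models = train_one_vs_rest(titles, labels)
print(predict(models, "car engine"))
print(predict(models, "song album"))
```

Unlike a topic representation, every individual word keeps its own weight here, which is the point made above about short titles with rare but informative words.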


bui-thanh-lam commented on July 17, 2024

You should never use representations from a generative model, such as an LDA-based one, to perform a classification task.
