Comments (6)

koheiw avatar koheiw commented on August 26, 2024

It is an interesting experiment. Can you upload the code and the data for me to investigate?

from seededlda.

masa126 avatar masa126 commented on August 26, 2024

Sorry for my late response.
Please find the input data (DTM) and my R scripts below.
Kokoro-LDA-TopicRatio.txt
kokoro_71w_DTM.csv
Regards

koheiw avatar koheiw commented on August 26, 2024

Thanks for the files. A simple answer to your question is that, although topicmodels and seededlda are based on the same C++ code, they are different packages. To make the plot, you selected the topics in which "K" is the top term, but the other terms in those topics differ between the two packages for this reason.

> seededlda::terms(kokoro_LDAs, 10)
      topic1 topic2 topic3   topic4   topic5 topic6 topic7     topic8     topic9 topic10 
 [1,] "見る" "眼"   "聞く"   "先生"   "妻"   "手紙" "奥さん"   "K"       "父"   "叔父"  
 [2,] "行く" "見る" "言葉"   "人"     "自分" "書く" "お嬢さん" "自分"     "母"   "人"    
 [3,] "帰る" "帰る" "思う"   "見える" "思う" "人"   "女"       "二人"     "兄"   "思う"  
 [4,] "前"   "顔"   "知れる" "答える" "死ぬ" "来る" "自分"     "出る"     "病気" "家"    
 [5,] "出る" "頭"   "事"     "解る"   "人間" "読む" "思う"     "お嬢さん" "死ぬ" "東京"  
 [6,] "卒業" "声"   "問題"   "人間"   "心"   "返事" "出る"     "答える"   "東京" "考える"
 [7,] "聞く" "室"   "前"     "二人"   "外"   "自分" "男"       "知る"     "好い" "自分"  
 [8,] "笑う" "来る" "話"     "知る"   "行く" "気"   "考える"   "考える"   "聞く" "解る"  
 [9,] "顔"   "坐る" "口"     "言葉"   "一人" "今"   "今"       "見える"   "口"   "知る"  
[10,] "思う" "手"   "話す"   "態度"   "意味" "出す" "好い"     "立つ"     "知る" "心"    
> 

> topicmodels::terms(kokoro_LDAt, 10)
      Topic 1  Topic 2    Topic 3 Topic 4  Topic 5 Topic 6  Topic 7  Topic 8  Topic 9 Topic 10
 [1,] "奥さん" "K"       "自分"  "聞く"   "出る"  "人"     "今"     "見る"   "父"    "先生"  
 [2,] "女"     "お嬢さん" "思う"  "言葉"   "帰る"  "知る"   "思う"   "前"     "母"    "答える"
 [3,] "見る"   "室"       "妻"    "考える" "来る"  "見える" "叔父"   "眼"     "書く"  "人"    
 [4,] "顔"     "答える"   "死ぬ"  "口"     "立つ"  "自分"   "家"     "顔"     "手紙"  "卒業"  
 [5,] "急"     "声"       "心"    "話"     "宅"    "解る"   "事"     "手"     "兄"    "解る"  
 [6,] "態度"   "付く"     "人間"  "意味"   "笑う"  "心持"   "東京"   "問題"   "出す"  "人間"  
 [7,] "二人"   "坐る"     "一人"  "様子"   "行く"  "頭"     "知れる" "悪い"   "読む"  "外"    
 [8,] "少し"   "取る"     "気"    "返事"   "歩く"  "好い"   "考える" "少し"   "病気"  "手"    
 [9,] "眼"     "心持"     "外"    "気"     "見る"  "二人"   "頭"     "知れる" "卒業"  "少し"  
[10,] "話す"   "聞く"     "帰る"  "二人"   "二人"  "話す"   "心"     "聞く"   "東京"  "一人"  
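
As a rough way to quantify how different the two "K" topics printed above are, one could compare their top-10 term lists directly. The sketch below copies the terms of seededlda's topic8 and topicmodels' Topic 2 from the output above; the variable names and the choice of Jaccard similarity are illustrative assumptions, not part of either package.

```r
# Top-10 terms of the "K" topic in each model, copied from the output above
k_seededlda <- c("K", "自分", "二人", "出る", "お嬢さん",
                 "答える", "知る", "考える", "見える", "立つ")
k_topicmodels <- c("K", "お嬢さん", "室", "答える", "声",
                   "付く", "坐る", "取る", "心持", "聞く")

# Terms that appear in both lists
shared <- intersect(k_seededlda, k_topicmodels)
print(shared)

# Jaccard similarity: shared terms divided by all distinct terms
jaccard <- length(shared) / length(union(k_seededlda, k_topicmodels))
print(round(jaccard, 3))
```

Only three of the ten terms are shared, so the two topics agree on "K" but differ in most of their other top terms.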

It is hard to say which is better, but you can check in which sections "K" should appear. I believe the character appears only late in the story. If so, the seededlda result is the more accurate one.
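
A minimal sketch of such a check, assuming the document-term matrix has one row per section in story order and a column named "K". The counts and the romanized column names below are invented for illustration and are not taken from kokoro_71w_DTM.csv.

```r
# Toy document-term matrix: rows are sections in story order, columns are words.
# All counts are hypothetical.
dtm <- matrix(
  c(3, 0, 2,
    4, 0, 1,
    1, 5, 0,
    0, 7, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("sec", 1:4), c("sensei", "K", "tegami"))
)

# Sections in which "K" occurs at least once
k_sections <- rownames(dtm)[dtm[, "K"] > 0]
print(k_sections)

# Share of each section's tokens accounted for by "K"
k_share <- dtm[, "K"] / rowSums(dtm)
print(round(k_share, 2))
```

With the real matrix, the sections listed in `k_sections` could then be compared with the sections where each model assigns a high proportion to its "K" topic.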

masa126 avatar masa126 commented on August 26, 2024

Thank you for your response.
I compared the document-term matrix elements stored in the dtm format (topicmodels input) and the dfm format (seededlda input) in my R scripts #76-100 and #153-178.
The elements of the two matrices are the same.
I specified the same random seed and Gibbs sampling for both topicmodels and seededlda.
If topicmodels and seededlda are based on the same C++ code, the results should be the same.
Are there any other parameters for topicmodels or seededlda?

koheiw avatar koheiw commented on August 26, 2024

We need to modify the C++ code when creating R packages. For example, seededlda and topicmodels use different mechanisms for random number generation. I also rewrote the code from GibbsLDA++ entirely to replace arrays with vectors.

You might understand the difference if you compare these two files:

https://github.com/koheiw/seededlda/blob/master/src/lda.h
https://github.com/cran/topicmodels/blob/master/src/model.h
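
The effect of a different random number mechanism can be illustrated in base R alone. In the sketch below, R's built-in "Mersenne-Twister" and "Wichmann-Hill" generators are only stand-ins; the thread does not say which generators the two packages actually use. The point is that the same seed fed to different generators gives different draws, so a Gibbs sampler driven by them would follow different paths.

```r
seed <- 1234

# Five uniform draws with R's default generator
set.seed(seed, kind = "Mersenne-Twister")
draws_mt <- runif(5)

# Five uniform draws with a different generator but the same seed
set.seed(seed, kind = "Wichmann-Hill")
draws_wh <- runif(5)

# Same generator and same seed reproduce the first stream exactly
# (this call also switches back to the default generator)
set.seed(seed, kind = "Mersenne-Twister")
draws_mt_again <- runif(5)

print(identical(draws_mt, draws_wh))        # different generators
print(identical(draws_mt, draws_mt_again))  # same generator
```

Identical seeds therefore guarantee identical results only within one implementation, not across two packages that generate random numbers differently.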

masa126 avatar masa126 commented on August 26, 2024

Thank you for your comments.
I understand that the difference between topicmodels and seededlda is caused by the mechanism for random number generation.