stm's Issues

Unable to get the result similar to the paper

I may be missing something, but as I understand it, the default settings in stm.java should give a macro-F1 score close to 0.952 for politics-religion classification, as reported in Table 4 of the paper "Effective Document Labeling with Very Few Seed Words".
However, after 50 iterations I only get 0.8438.
I used all the default settings.
Below is part of my run log. Thanks for your help!

5038 have been indexed...
5040 have been indexed...
5042 have been indexed...
5044 have been indexed...
5046 have been indexed...
5048 have been indexed...
loading documents...
calculate co-occurrence...
start to predict...
iter: 0
f1: 0.5363255440310482
cost time 834ms
iter: 1
f1: 0.7122772025090711
cost time 635ms
iter: 2
f1: 0.7657145896016797
cost time 635ms
iter: 3
f1: 0.7998512521261621
cost time 621ms
iter: 4
f1: 0.814890872076508
cost time 708ms
iter: 5
f1: 0.8326626957441149
cost time 680ms
iter: 6
f1: 0.8262305084412165
cost time 612ms
iter: 7
f1: 0.8223032270759543
cost time 618ms
iter: 8
f1: 0.8282995798893804
cost time 651ms
iter: 9
f1: 0.8337883053625725
cost time 696ms
iter: 10
f1: 0.8333308752606785
cost time 674ms
iter: 11
f1: 0.8338216449501352
cost time 626ms
iter: 12
f1: 0.8268768778734245
cost time 620ms
iter: 13
f1: 0.8288385453402936
cost time 620ms
iter: 14
f1: 0.8263970819568979
cost time 620ms
iter: 15
f1: 0.8218679885271558
cost time 627ms
iter: 16
f1: 0.8333608587943848
cost time 612ms
iter: 17
f1: 0.8329123378577755
cost time 610ms
iter: 18
f1: 0.837828990618261
cost time 621ms
iter: 19
f1: 0.8343527584444181
cost time 614ms
iter: 20
f1: 0.834333215353154
cost time 610ms
iter: 21
f1: 0.8418369014143334
cost time 643ms
iter: 22
f1: 0.8378914584256099
cost time 631ms
iter: 23
f1: 0.8324480210666981
cost time 608ms
iter: 24
f1: 0.8304651574106827
cost time 610ms
iter: 25
f1: 0.833925500221955
cost time 609ms
iter: 26
f1: 0.8408377750635577
cost time 616ms
iter: 27
f1: 0.8353951444250514
cost time 625ms
iter: 28
f1: 0.8319127904876824
cost time 641ms
iter: 29
f1: 0.8384024870376721
cost time 623ms
iter: 30
f1: 0.8403959423280034
cost time 623ms
iter: 31
f1: 0.8398977214530726
cost time 624ms
iter: 32
f1: 0.8403669274006524
cost time 650ms
iter: 33
f1: 0.8398807311481298
cost time 636ms
iter: 34
f1: 0.8458103446454425
cost time 657ms
iter: 35
f1: 0.8428717575870712
cost time 616ms
iter: 36
f1: 0.8443508617479418
cost time 627ms
iter: 37
f1: 0.8384132832264879
cost time 609ms
iter: 38
f1: 0.8364137483787288
cost time 615ms
iter: 39
f1: 0.8423834019743461
cost time 631ms
iter: 40
f1: 0.8419055753812179
cost time 616ms
iter: 41
f1: 0.8418636323103821
cost time 626ms
iter: 42
f1: 0.8373855268834323
cost time 622ms
iter: 43
f1: 0.8373708237305959
cost time 617ms
iter: 44
f1: 0.8399027549250704
cost time 623ms
iter: 45
f1: 0.8373629916183107
cost time 658ms
iter: 46
f1: 0.8394046127772457
cost time 612ms
iter: 47
f1: 0.843880895911019
cost time 613ms
iter: 48
f1: 0.8468645220990465
cost time 631ms
iter: 49
f1: 0.8438726134924301
cost time 607ms
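
For anyone comparing the logged f1 values against Table 4, it may help to confirm that both numbers use the same averaging. The sketch below computes macro-F1 under the common definition (the unweighted mean of per-class F1 scores). The class name `MacroF1Sketch` and the method `macroF1` are hypothetical and are not taken from the stm repository; whether stm.java or the paper uses exactly this definition is an assumption.

```java
public class MacroF1Sketch {
    /**
     * Macro-F1 as the unweighted mean of per-class F1 scores.
     * gold and pred hold class indices in [0, numClasses).
     * Assumption: this is the common definition; stm.java may differ.
     */
    static double macroF1(int[] gold, int[] pred, int numClasses) {
        int[] tp = new int[numClasses];
        int[] fp = new int[numClasses];
        int[] fn = new int[numClasses];
        for (int i = 0; i < gold.length; i++) {
            if (gold[i] == pred[i]) {
                tp[gold[i]]++;
            } else {
                fp[pred[i]]++; // predicted class gains a false positive
                fn[gold[i]]++; // true class gains a false negative
            }
        }
        double total = 0.0;
        for (int c = 0; c < numClasses; c++) {
            int denom = 2 * tp[c] + fp[c] + fn[c];
            // F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never occurs.
            total += (denom == 0) ? 0.0 : (2.0 * tp[c]) / denom;
        }
        return total / numClasses;
    }

    public static void main(String[] args) {
        int[] gold = {0, 0, 1, 1};
        int[] pred = {0, 1, 1, 1};
        System.out.println(macroF1(gold, pred, 2));
    }
}
```

Some papers instead compute F1 from macro-averaged precision and recall, which can give a different number on the same predictions, so the definition is worth checking before comparing scores.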

Discrepancy from the paper

Hi,
The eta calculation in LoadDocs.java seems to differ from the description in the paper. Specifically, the code assigns identical (uniform) eta values only when sum == 0. As a result, whenever sum is nonzero, any category whose seed words do not appear gets an eta value of zero.

if (sum == 0) {

Below is a direct translation of the paper's formula, intended to replace lines 95-102.

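        // Additive smoothing: each category receives a pseudo-count of `smooth`,
        // so a category with no matching seed words still gets a nonzero eta.
        float sumSmooth = smooth * model.numCategories;
        for (int i = 0; i < raw.length; i++)
            raw[i] = (raw[i] + smooth) / (sum + sumSmooth);

        model.eta[index] = raw;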
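
To make the difference concrete, here is a minimal, self-contained sketch that compares the two behaviours on a toy count vector. The class name `EtaSmoothingSketch`, both method names, and the smoothing value 0.01 are hypothetical. The unsmoothed variant is only a reading of the behaviour described in this issue (uniform when sum == 0, otherwise raw[i] / sum), not a copy of LoadDocs.java.

```java
import java.util.Arrays;

public class EtaSmoothingSketch {
    /**
     * Assumed current behaviour: uniform eta only when no seed word
     * appears at all; otherwise plain normalization of the raw counts.
     */
    static float[] etaUnsmoothed(float[] raw) {
        float sum = 0f;
        for (float v : raw) sum += v;
        float[] eta = new float[raw.length];
        for (int i = 0; i < raw.length; i++)
            eta[i] = (sum == 0f) ? 1.0f / raw.length : raw[i] / sum;
        return eta;
    }

    /** Smoothed variant: add `smooth` to every count before normalizing. */
    static float[] etaSmoothed(float[] raw, float smooth) {
        float sum = 0f;
        for (float v : raw) sum += v;
        float sumSmooth = smooth * raw.length;
        float[] eta = new float[raw.length];
        for (int i = 0; i < raw.length; i++)
            eta[i] = (raw[i] + smooth) / (sum + sumSmooth);
        return eta;
    }

    public static void main(String[] args) {
        // Toy document: seed words of category 0 appear 3 times, category 1 never.
        float[] raw = {3f, 0f};
        System.out.println(Arrays.toString(etaUnsmoothed(raw))); // second entry is exactly 0
        System.out.println(Arrays.toString(etaSmoothed(raw, 0.01f))); // second entry is small but positive
    }
}
```

With smoothing, the result still sums to 1, and the sum == 0 case needs no special branch because it reduces to the uniform distribution on its own.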
