
SE-WRL

The code for the paper "Improved Word Representation Learning with Sememes" (ACL 2017).

How to Run

Use the following commands to train word-sense-sememe embeddings:

cp SSA.c word2vec/word2vec.c    # or MST.c, SAC.c, SAT.c
cd word2vec
make
./word2vec -train TrainFile -output vectors.bin -cbow 0 -size 200 -window 8 \
  -negative 25 -hs 0 -sample 1e-4 -threads 30 -binary 1 -iter 1 \
  -read-vocab VocabFile -read-meaning SememeFile -read-sense Word_Sense_Sememe_File \
  -min-count 1 -alpha 0.025

TrainFile is the training corpus. The other three files can be found in the datasets directory: VocabFile is the word vocabulary file, SememeFile is the sememe vocabulary file, and Word_Sense_Sememe_File records the word-sense-sememe group information (the senses of each word and the sememes annotating each sense).

Before training, replace word2vec/word2vec.c with one of the four files SSA.c, MST.c, SAC.c, or SAT.c, depending on which model you want to train.
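Because the command above passes -binary 1, vectors.bin is presumably written in word2vec's standard binary layout: a text header "<vocab size> <dimension>", then for each word its bytes, a space, <dimension> raw 32-bit floats in native byte order, and a newline. The minimal C sketch below reads that layout. It assumes the modified programs keep word2vec's output routine unchanged, which this README does not confirm; MAX_WORD_LEN, Embeddings, load_word2vec_bin, find_word and free_embeddings are hypothetical names, not part of the repository.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 100 /* hypothetical cap on stored word length, in bytes */

/* In-memory embedding table (hypothetical type, not from the repository). */
typedef struct {
    long long words; /* number of vectors in the file */
    long long dim;   /* vector size, e.g. the -size 200 used above */
    char *vocab;     /* words * MAX_WORD_LEN bytes, one NUL-terminated word per slot */
    float *vecs;     /* words * dim floats, row-major */
} Embeddings;

void free_embeddings(Embeddings *e) {
    free(e->vocab);
    free(e->vecs);
    e->vocab = NULL;
    e->vecs = NULL;
}

/* Reads "<words> <dim>" and then, per word, "<word> " followed by dim raw floats.
   Returns 0 on success and -1 on a malformed or truncated file. */
int load_word2vec_bin(FILE *f, Embeddings *e) {
    e->vocab = NULL;
    e->vecs = NULL;
    if (fscanf(f, "%lld %lld", &e->words, &e->dim) != 2 || e->words <= 0 || e->dim <= 0)
        return -1;
    e->vocab = calloc((size_t)e->words, MAX_WORD_LEN);
    e->vecs = malloc((size_t)e->words * (size_t)e->dim * sizeof(float));
    if (e->vocab == NULL || e->vecs == NULL) {
        free_embeddings(e);
        return -1;
    }
    for (long long b = 0; b < e->words; b++) {
        char *w = e->vocab + b * MAX_WORD_LEN;
        int c, len = 0;
        /* Skip the newline (or space) left after the header or the previous vector. */
        do {
            c = fgetc(f);
        } while (c == '\n' || c == ' ');
        /* Copy bytes up to the separating space; multi-byte words are copied byte for
           byte. Longer words are truncated to MAX_WORD_LEN - 1 bytes. */
        while (c != EOF && c != ' ') {
            if (len < MAX_WORD_LEN - 1) w[len++] = (char)c;
            c = fgetc(f);
        }
        w[len] = '\0';
        if (c == EOF ||
            fread(e->vecs + b * e->dim, sizeof(float), (size_t)e->dim, f) != (size_t)e->dim) {
            free_embeddings(e);
            return -1;
        }
    }
    return 0;
}

/* Linear search for a word; returns its row index, or -1 if it is absent. */
long long find_word(const Embeddings *e, const char *word) {
    assert(e != NULL && word != NULL);
    for (long long b = 0; b < e->words; b++)
        if (strcmp(e->vocab + b * MAX_WORD_LEN, word) == 0) return b;
    return -1;
}
```

In practice you would open the trained file with fopen("vectors.bin", "rb") and pass the FILE * to load_word2vec_bin; the linear find_word is enough for occasional lookups, while a hash table would suit large evaluation runs better.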

Data Set

HowNet.txt is a Chinese knowledge base annotated with word-sense-sememe information.

Sogou-T(sample).txt is a sample dataset extracted from Sogou-T.

The complete training dataset, Clean-SogouT, is available at https://pan.baidu.com/s/1kXgkyJ9 (password: f2ul).

Evaluation Set

wordsim-240.txt and wordsim-297.txt are used to evaluate the quality of word representations through word similarity.

analogy.txt is used to evaluate the models' capability for word analogy inference.

Annotation Information

The code comments (annotations) apply to all four files SSA.c, MST.c, SAC.c, and SAT.c; comments on the code the four files share appear only in SSA.c.

Revisions

We found bugs in the programs and have fixed them. The updated experimental results are released here on GitHub, and a revised version of the paper is available.

Word Similarity

Model       Wordsim-240   Wordsim-297
CBOW        57.7          61.1
GloVe       59.8          58.7
Skip-gram   58.5          63.3
SSA         58.9          64.0
MST         59.2          62.8
SAC         59.1          61.0
SAT         61.2          63.3

Word Analogy

Model       Capital   City   Relationship   All
CBOW        49.8      85.7   86.0           64.2
GloVe       57.3      74.3   81.6           65.8
Skip-gram   66.8      93.7   76.8           73.4
SSA         62.3      93.7   81.6           71.9
MST         65.7      95.4   82.7           74.5
SAC         79.2      97.7   75.0           81.0
SAT         82.6      98.9   80.1           84.5

Contributors

heylinsir, tsingularity