
DynamicWord2Vec

Paper title: Dynamic Word Embeddings for Evolving Semantic Discovery.

Paper links: https://dl.acm.org/citation.cfm?id=3159703 (ACM DL) and https://arxiv.org/abs/1703.00607 (arXiv)

Files:

/embeddings

  • embeddings in loadable MATLAB files. Index 0 corresponds to 1990, 1 to 1991, ..., 19 to 2009. To save space, each year's embedding is saved separately; before running the visualization code, first merge them into a single embedding file (a sketch follows).
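
    A minimal merge sketch. The per-file variable key 'U' and the output key 'emb_all' are assumptions, not names confirmed by the repo; check the real key with scipy.io.whosmat:

      import numpy as np
      import scipy.io as sio

      # Assumed layout: each embeddings_<t>.mat holds one (vocab x dim) matrix
      # under the key 'U'; verify with sio.whosmat('embeddings_0.mat').
      embs = [sio.loadmat('embeddings_%d.mat' % t)['U'] for t in range(20)]

      # Stack into a single (years x vocab x dim) array and save one file.
      sio.savemat('embeddings_all.mat', {'emb_all': np.stack(embs)})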

/train_model

  • contains code for training the dynamic word embeddings

/other_embeddings

  • contains code for training baseline embeddings

  • data file download: https://www.dropbox.com/s/tzkaoagzxuxtwqs/data.zip?dl=0

    /other_embeddings/staticw2v.py

    • static word2vec (Mikolov et al 2013)

    /other_embeddings/aw2v.py

    • aligned word2vec (Hamilton, Leskovec, Jurafsky 2016); see the alignment sketch after this list

    /other_embeddings/tw2v.py

    • transformed word2vec (Kulkarni, Al-Rfou, Perozzi, Skiena 2015)
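
    For reference, the core of aligned-word2vec-style baselines is an orthogonal Procrustes rotation that maps one year's vectors onto another's. A minimal sketch, assuming both matrices share the same vocabulary order (not the repo's exact code):

      import numpy as np

      def procrustes_align(base, other):
          # Solve min_R ||other @ R - base||_F over orthogonal R:
          # with U, _, Vt = svd(other.T @ base), the optimum is R = U @ Vt.
          u, _, vt = np.linalg.svd(other.T @ base)
          return other @ (u @ vt)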

/visualization

  • scripts for visualizations in paper

    /visualization/norm_plots.py

    • changepoint detection figures (a rough sketch of the underlying idea follows this list)

    /visualization/tsne_of_results.py

    • trajectory figures
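
    As a rough illustration of what the changepoint figures are after (not the repo's script; the per-year list layout is an assumption):

      import numpy as np
      import matplotlib.pyplot as plt

      def plot_drift(embs, word_idx, years):
          # embs: list of (vocab x dim) arrays, one per year. A sudden jump
          # in the distance between consecutive years' vectors for the same
          # word is a candidate semantic changepoint.
          drift = [np.linalg.norm(embs[t][word_idx] - embs[t - 1][word_idx])
                   for t in range(1, len(embs))]
          plt.plot(years[1:], drift, '-o')
          plt.xlabel('year')
          plt.ylabel('distance from previous year')
          plt.show()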

/distorted_smallNYT

/misc

  • contains general statistics and the word hash file

Contributors

yifan0sun

Issues

Broken URLs

The links to the data files in the README appear to be broken. Can you please update those?

Is there a separate calculation process for the embedding file (emb_frobreg~) used in the visualization?

Hi there. I am implementing the paper myself, using this repository as a reference. I have one question.

In the training code, I understand that you save Ulist and Vlist as the training results, but the visualization code loads an embedding weight file named 'emb_frobreg~'. I didn't see any code that produces that file. Is there a separate process for computing it?

Thank you for your time :)

FileNotFoundError: [Errno 2] No such file or directory: 'data/wordlist.txt'

I am trying to run your visualization code on your embeddings. I get this error:

  /DynamicWord2Vec/visualization$ python tsne_of_results.py
  Traceback (most recent call last):
    File "tsne_of_results.py", line 19, in <module>
      fid = open('data/wordlist.txt','r')
  FileNotFoundError: [Errno 2] No such file or directory: 'data/wordlist.txt'

Do you still have that file available somewhere?

Issues reproducing the results in Table 6 of the paper (alignment quality)

Step 1: I loaded the embeddings from the "./embeddings" folder. It contains only 26 files, while 27 are needed for the NYT alignment-quality test, so I copied the last file, embeddings_25.m, as embeddings_26.m to get 27 embeddings in total.
The alignment-quality results (MRR, P@1, P@3, P@5, P@10) are:

  test1 (mine):     0.1027  0.0494  0.1042  0.1340  0.1962
  test1 (reported): 0.4222  0.3306  0.4854  0.5488  0.6191
  test2 (mine):     0.1161  0.0449  0.1079  0.1775  0.2989
  test2 (reported): 0.1444  0.0764  0.1596  0.2202  0.3820

Step 2: I loaded the pre-trained static word embeddings and PMI matrices and trained with the provided code (train_time_CD_smallnyt.py). The hyperparameters are the same in the paper and in the code; I also tried a smaller batch size, which seems to be more stable. The best performance is a little better than in Step 1.

Step 3: I trained the word embeddings and calculated the PMI matrices myself. The performance did not improve.

Could you please provide some tips on these issues, or provide the evaluation code for the alignment-quality task so I can check whether I implemented the evaluation properly?

Best,
Benyou
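
For anyone attempting the same reproduction: a minimal sketch of how MRR and P@k are conventionally computed over ranked candidate lists. The query/gold data format here is an assumption, not the paper's evaluation code:

  import numpy as np

  def mrr_and_p_at_k(ranked, gold, ks=(1, 3, 5, 10)):
      # ranked: per-query candidate lists, best first; gold: per-query sets
      # of correct answers. Returns MRR and P@k averaged over queries.
      rr = []
      p_at_k = {k: [] for k in ks}
      for cands, answers in zip(ranked, gold):
          rank = next((i + 1 for i, c in enumerate(cands) if c in answers), 0)
          rr.append(1.0 / rank if rank else 0.0)
          for k in ks:
              p_at_k[k].append(float(any(c in answers for c in cands[:k])))
      return np.mean(rr), {k: np.mean(v) for k, v in p_at_k.items()}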

How to get the embeddings in the form of MATLAB files?

Hi,
Sorry for this question, but after training the model I get the U and V matrices as pickle files. However, the embeddings used in the visualization and other tasks are MATLAB data files. How do I convert from one to the other? Can you please point me to a resource for doing that?

Generating temporal embeddings on one's own data

Hi,

Is it possible to generate them on my own data if I have individual word2vec embeddings for each time slice? I have a total of ~10 embeddings. Which script should I use for this, and is it OK to modify it accordingly?

Thank you for your time.

Error while running the training code

Hi,
I am trying to get dynamic embeddings for my dataset. I have the data file in the format word,context,ppmi, but when I run the main training script I get the following error:

Traceback (most recent call last):
  File "train_time_CD_smallnyt.py", line 115, in <module>
    pmi_seg = pmi[:,ind].todense()
AttributeError: 'matrix' object has no attribute 'todense'

Can you help me understand why this is happening?
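
One likely cause: scipy.io.loadmat returns a dense numpy matrix when the saved PMI matrix was not stored in sparse form, and dense matrices have no todense() method. A defensive sketch that handles both cases (names are illustrative):

  import numpy as np
  import scipy.sparse as ss

  def dense_segment(pmi, ind):
      # Select columns and densify, whether pmi is scipy-sparse or dense.
      seg = pmi[:, ind]
      return np.asarray(seg.todense()) if ss.issparse(seg) else np.asarray(seg)

  # Works for both storage types:
  dense = np.random.rand(5, 5)
  assert np.allclose(dense_segment(dense, [0, 2]),
                     dense_segment(ss.csr_matrix(dense), [0, 2]))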

Issues reproducing the reported results in the paper

Hi, I am trying to reproduce the results in the paper, but a few issues are blocking me:

  1. The provided wordlist.txt does not match the evaluation sets: only 2294 out of 11028 words can be found for test1 (the alignment-quality task), and only 5 out of 445 words for test2. The vocabulary size in the scripts is 20936, but the provided emb_static.mat has 20000 words.
  2. The provided pickled files for the baselines have 20 time slices, but the NYT data itself has 27 time slices. This makes it difficult to reproduce the baseline results as well.

Thanks in advance for your attention.
