
DynamicWord2Vec

Paper title: Dynamic Word Embeddings for Evolving Semantic Discovery.

Paper links: https://dl.acm.org/citation.cfm?id=3159703 (ACM DL) and https://arxiv.org/abs/1703.00607 (arXiv)

Files:

/embeddings

  • embeddings in loadable MATLAB files. Index 0 corresponds to 1990, 1 to 1991, ..., 19 to 2009. To save space, each year's embedding is saved separately; before running the visualization code, first merge them into a single embedding file (a sketch follows).
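
    A minimal merge sketch. The per-file variable key 'U' and the output key 'emb_all' are assumptions, not names confirmed by the repo; check the real key with scipy.io.whosmat:

      import numpy as np
      import scipy.io as sio

      # Assumed layout: each embeddings_<t>.mat holds one (vocab x dim) matrix
      # under the key 'U'; verify with sio.whosmat('embeddings_0.mat').
      embs = [sio.loadmat('embeddings_%d.mat' % t)['U'] for t in range(20)]

      # Stack into a single (years x vocab x dim) array and save one file.
      sio.savemat('embeddings_all.mat', {'emb_all': np.stack(embs)})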

/train_model

  • contains code for training the dynamic word embeddings

/other_embeddings

  • contains code for training baseline embeddings

  • data file download: https://www.dropbox.com/s/tzkaoagzxuxtwqs/data.zip?dl=0

    /other_embeddings/staticw2v.py

    • static word2vec (Mikolov et al 2013)

    /other_embeddings/aw2v.py

    • aligned word2vec (Hamilton, Leskovec, Jurafsky 2016); see the alignment sketch after this list

    /other_embeddings/tw2v.py

    • transformed word2vec (Kulkarni, Al-Rfou, Perozzi, Skiena 2015)
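
    For reference, the core of aligned-word2vec-style baselines is an orthogonal Procrustes rotation that maps one year's vectors onto another's. A minimal sketch, assuming both matrices share the same vocabulary order (not the repo's exact code):

      import numpy as np

      def procrustes_align(base, other):
          # Solve min_R ||other @ R - base||_F over orthogonal R:
          # with U, _, Vt = svd(other.T @ base), the optimum is R = U @ Vt.
          u, _, vt = np.linalg.svd(other.T @ base)
          return other @ (u @ vt)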

/visualization

  • scripts for visualizations in paper

    /visualization/norm_plots.py

    • changepoint detection figures (a rough sketch of the underlying idea follows this list)

    /visualization/tsne_of_results.py

    • trajectory figures
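
    As a rough illustration of what the changepoint figures are after (not the repo's script; the per-year list layout is an assumption):

      import numpy as np
      import matplotlib.pyplot as plt

      def plot_drift(embs, word_idx, years):
          # embs: list of (vocab x dim) arrays, one per year. A sudden jump
          # in the distance between consecutive years' vectors for the same
          # word is a candidate semantic changepoint.
          drift = [np.linalg.norm(embs[t][word_idx] - embs[t - 1][word_idx])
                   for t in range(1, len(embs))]
          plt.plot(years[1:], drift, '-o')
          plt.xlabel('year')
          plt.ylabel('distance from previous year')
          plt.show()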

/distorted_smallNYT

/misc

  • contains general statistics and the word hash file

Contributors

yifan0sun

Issues

Broken URLs

The links to the data files in the README appear to be broken. Can you please update those?

Is there a separate calculation process for the embedding file (emb_frobreg~) used in the visualization?

Hi there. I am implementing the paper myself, using this repository as a reference. I have one question.

In the training code, I understand that you save Ulist and Vlist as the training results, but the visualization code loads an embedding weight file named 'emb_frobreg~'. I didn't see any code that produces that file. Is there a separate process for computing it?

Thank you for your time :)

FileNotFoundError: [Errno 2] No such file or directory: 'data/wordlist.txt'

I am trying to run your visualization code on your embeddings. I get this error:

  /DynamicWord2Vec/visualization$ python tsne_of_results.py
  Traceback (most recent call last):
    File "tsne_of_results.py", line 19, in <module>
      fid = open('data/wordlist.txt','r')
  FileNotFoundError: [Errno 2] No such file or directory: 'data/wordlist.txt'

Do you still have that file available somewhere?

Issues reproducing the results in Table 6 of the paper (alignment quality)

Step 1: I loaded the embeddings from the "./embeddings" folder. It contains only 26 files, while 27 are needed for the NYT alignment-quality test, so I copied the last file, embeddings_25.m, as embeddings_26.m to get 27 embeddings in total.
The alignment-quality results (MRR, P@1, P@3, P@5, P@10) are:

  test1 (mine):     0.1027  0.0494  0.1042  0.1340  0.1962
  test1 (reported): 0.4222  0.3306  0.4854  0.5488  0.6191
  test2 (mine):     0.1161  0.0449  0.1079  0.1775  0.2989
  test2 (reported): 0.1444  0.0764  0.1596  0.2202  0.3820

Step 2: I loaded the pre-trained static word embeddings and PMI matrices and trained with the provided code (train_time_CD_smallnyt.py). The hyperparameters are the same in the paper and in the code; I also tried a smaller batch size, which seems to be more stable. The best performance is a little better than in Step 1.

Step 3: I trained the word embeddings and calculated the PMI matrices myself. The performance did not improve.

Could you please provide some tips on these issues, or provide the evaluation code for the alignment-quality task so I can check whether I implemented the evaluation properly?

Best,
Benyou
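
For anyone attempting the same reproduction: a minimal sketch of how MRR and P@k are conventionally computed over ranked candidate lists. The query/gold data format here is an assumption, not the paper's evaluation code:

  import numpy as np

  def mrr_and_p_at_k(ranked, gold, ks=(1, 3, 5, 10)):
      # ranked: per-query candidate lists, best first; gold: per-query sets
      # of correct answers. Returns MRR and P@k averaged over queries.
      rr = []
      p_at_k = {k: [] for k in ks}
      for cands, answers in zip(ranked, gold):
          rank = next((i + 1 for i, c in enumerate(cands) if c in answers), 0)
          rr.append(1.0 / rank if rank else 0.0)
          for k in ks:
              p_at_k[k].append(float(any(c in answers for c in cands[:k])))
      return np.mean(rr), {k: np.mean(v) for k, v in p_at_k.items()}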

How to get the embeddings in the form of MATLAB files?

Hi,
Sorry for this question, but after training the model I get the U and V matrices as pickle files. However, the embeddings used in the visualization and other tasks are MATLAB data files. How do I convert from one to the other? Can you please point me to a resource for doing that?

Generating temporal embeddings on one's own data

Hi,

Is it possible to generate them on my own data if I have individual word2vec embeddings for each time slice? I have a total of ~10 embeddings. Which script should I use for this, and is it OK to modify it accordingly?

Thank you for your time.

Error while running the training code

Hi,
I am trying to get dynamic embeddings for my dataset. I have the data file in the format word,context,ppmi, but when I run the main training script I get the following error:

Traceback (most recent call last):
  File "train_time_CD_smallnyt.py", line 115, in <module>
    pmi_seg = pmi[:,ind].todense()
AttributeError: 'matrix' object has no attribute 'todense'

Can you help me understand why this is happening?
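
One likely cause: scipy.io.loadmat returns a dense numpy matrix when the saved PMI matrix was not stored in sparse form, and dense matrices have no todense() method. A defensive sketch that handles both cases (names are illustrative):

  import numpy as np
  import scipy.sparse as ss

  def dense_segment(pmi, ind):
      # Select columns and densify, whether pmi is scipy-sparse or dense.
      seg = pmi[:, ind]
      return np.asarray(seg.todense()) if ss.issparse(seg) else np.asarray(seg)

  # Works for both storage types:
  dense = np.random.rand(5, 5)
  assert np.allclose(dense_segment(dense, [0, 2]),
                     dense_segment(ss.csr_matrix(dense), [0, 2]))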

Issues reproducing the reported results in the paper

Hi, I am trying to reproduce the results in the paper, but a few issues are blocking me:

  1. The provided wordlist.txt does not match the evaluation sets: only 2294 out of 11028 words can be found for test1 (the alignment-quality task), and only 5 out of 445 words for test2. The vocabulary size in the scripts is 20936, but the provided emb_static.mat has 20000 words.
  2. The provided pickled files for the baselines have 20 time slices, but the NYT data itself has 27 time slices. This makes it difficult to reproduce the baseline results as well.

Thanks in advance for your attention.
