Unsupervised Language Learning

Code for the Unsupervised Language Learning (ULL) course at the University of Amsterdam (UvA), 2017/18

Instructions for running the code of Lab1

A single IPython notebook reproduces all of the results.

Instructions for running the code of Lab2

Training the models

There are two skipgram models: skipgram_with_negative_sampling_and_batches.py, which uses negative sampling and mini-batches, and skipgram.py, which uses neither. After downloading the entire folder, you can run either one with python3 filename.py. The file bayesian_skipgram.py trains the Bayesian skipgram model and embed_align.py trains the embed align model; both are run the same way and need no further setup. The number of epochs can be adjusted in the code, and the weights are stored automatically after training.
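For readers new to the negative-sampling objective, the following is a minimal pure-Python sketch of the standard skip-gram negative-sampling loss for one (center, context) pair and for a mini-batch. It shows the textbook formulation only and is not taken from skipgram_with_negative_sampling_and_batches.py; the function names (log_sigmoid, sgns_loss, batch_sgns_loss) are hypothetical, and how the repository samples negatives or averages over a batch is an assumption.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)); avoids overflow in exp for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(center_vec, context_vec, negative_vecs):
    """Negative-sampling loss for one (center, context) pair (hypothetical helper):

        -log sigma(u_o . v_c) - sum_k log sigma(-u_k . v_c)

    center_vec is v_c, context_vec is u_o, negative_vecs are the sampled u_k.
    """
    loss = -log_sigmoid(dot(context_vec, center_vec))
    for neg_vec in negative_vecs:
        # Push sampled "noise" words away from the center word.
        loss -= log_sigmoid(-dot(neg_vec, center_vec))
    return loss


def batch_sgns_loss(batch):
    """Mean loss over a mini-batch of (center, context, negatives) triples."""
    return sum(sgns_loss(c, o, negs) for c, o, negs in batch) / len(batch)
```

With all-zero vectors every term equals log 2, so a pair with k negatives has loss (k + 1) log 2; training lowers the loss by aligning the center vector with its true context and away from the negatives.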

Evaluating the models

As explained in the report, we use different methods to construct our output for the lexical substitution task (LST). make_lst_skipgram.py constructs the output for the skipgram model, and the make_lst_bayesian_##.py files do so for the Bayesian skipgram model, with each file name indicating which method it applies. Finally, lst_embed_align_KL.py constructs the LST output for the embed align model. To compute the generalized average precision (GAP), rename the output file to lst.out and move it into the /lst folder; the provided script can then be run as-is.
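To make the evaluation metric concrete, here is a minimal sketch of generalized average precision in one common formulation: the system's ranked candidates are scored against gold substitutes weighted by, for example, annotator counts, and the result is normalized by the score of the ideal ranking. The function name and input format are hypothetical; the script provided in the /lst folder remains the authoritative implementation and may differ in details such as tie handling or duplicate candidates.

```python
def generalized_average_precision(system_ranking, gold_weights):
    """GAP of a ranked candidate list against weighted gold substitutes.

    system_ranking: candidate substitutes, best first (assumed unique).
    gold_weights: dict mapping each gold substitute to its weight.
    """
    # Numerator: at each rank i holding a gold item, add the mean gold
    # weight of the top-i system candidates (non-gold items count as 0).
    numerator = 0.0
    cumulative = 0.0
    for i, candidate in enumerate(system_ranking, start=1):
        x_i = gold_weights.get(candidate, 0.0)
        cumulative += x_i
        if x_i > 0:
            numerator += cumulative / i

    # Denominator: the same quantity for the ideal ranking, i.e. the
    # gold substitutes sorted by descending weight.
    denominator = 0.0
    cumulative = 0.0
    ideal = sorted((w for w in gold_weights.values() if w > 0), reverse=True)
    for i, y_i in enumerate(ideal, start=1):
        cumulative += y_i
        denominator += cumulative / i

    return numerator / denominator if denominator > 0 else 0.0
```

For gold weights {"a": 3, "b": 1}, the ranking ["a", "b"] scores 1.0, while the reversed ranking ["b", "a"] scores 3/5 = 0.6, because the heavier substitute is found later.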

Extra files

Preprocess.py is used by all models to construct the vocabulary and the training pairs. find_most_similar_words.py uses cosine similarity over the trained embeddings to find the words most similar to a given word; it was used only to sanity-check the learned embeddings.
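The two extra steps can be illustrated with a short sketch: building a word-to-index vocabulary, extracting (center, context) skipgram pairs within a fixed window, and ranking words by cosine similarity. This is a generic illustration under assumed inputs (tokenized sentences, embeddings as a dict of lists), not the code in Preprocess.py or find_most_similar_words.py; all function names and the window and min_count defaults are hypothetical.

```python
import math
from collections import Counter


def build_vocab(sentences, min_count=1):
    """Map each word seen at least min_count times to an integer index."""
    counts = Counter(word for sentence in sentences for word in sentence)
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(words)}


def skipgram_pairs(sentences, word2idx, window=2):
    """(center, context) index pairs within +/- window positions.

    Out-of-vocabulary words are dropped before windowing, so the window
    spans the remaining in-vocabulary tokens.
    """
    pairs = []
    for sentence in sentences:
        ids = [word2idx[w] for w in sentence if w in word2idx]
        for i, center in enumerate(ids):
            lo, hi = max(0, i - window), min(len(ids), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, ids[j]))
    return pairs


def cosine(u, v):
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def most_similar(word, embeddings, topn=5):
    """Top-n (word, similarity) pairs by cosine similarity to `word`."""
    target = embeddings[word]
    scores = [(other, cosine(target, vec))
              for other, vec in embeddings.items() if other != word]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:topn]
```

For the sentence ["the", "cat", "sat"] with window=1, the vocabulary is {"cat": 0, "sat": 1, "the": 2} and the pairs are [(2, 0), (0, 2), (0, 1), (1, 0)].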

Instructions for running the code of Lab3

Training the skipgram model

The skipgram model is trained by running train_skipgram.py. This produces a .bin model file that can be loaded with the Gensim package.

Testing the models using SentEval

Open the IPython notebooks and run them to obtain the results, then save the results with pickle. To use word probabilities, call the get_word_probabilities function defined in the skipgram notebook; it stores a pickle file containing all the probabilities on disk. The notebooks assume that the hansards and SentEval folders are in the same directory as the notebook.
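As a rough illustration of what storing word probabilities with pickle involves, the sketch below computes maximum-likelihood unigram probabilities from tokenized sentences (for example, lines of the hansards corpus) and writes them to or reads them from a pickle file. The helper names are hypothetical and this is not the notebook's get_word_probabilities; the notebook may use a different corpus, tokenization, or smoothing.

```python
import pickle
from collections import Counter


def compute_word_probabilities(sentences):
    """Unigram probabilities p(w) = count(w) / total number of tokens."""
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def save_word_probabilities(probabilities, path):
    """Write the probability dict to disk as a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(probabilities, f)


def load_word_probabilities(path):
    """Read a probability dict previously written with pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For instance, save_word_probabilities(compute_word_probabilities(sentences), "word_probabilities.pkl") would write the dict under a hypothetical file name that a later notebook cell could load again.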
