Pytorch RNN Language Models

This repo shows how to train neural language models using PyTorch example code.

Requirements

  • This only works on a Unix-like system, with bash.

  • Python 3 must be installed on your system, i.e. the command python3 must be available.

  • Make sure virtualenv is installed on your system. To install, e.g.

    pip install virtualenv

Steps

Clone this repository in the desired place:

git clone https://github.com/bricksdont/pytorch-rnn-lm
cd pytorch-rnn-lm

Create a new virtualenv that uses Python 3. Please make sure to run this command outside of any virtual Python environment:

./scripts/make_virtualenv.sh

Important: Then activate the env by executing the source command that is output by the shell script above.
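For example, if the script created the environment at ./venv (a hypothetical location; use the exact path printed by the script), the activation command would be:

    source venv/bin/activate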

Download and install required software:

./scripts/install_packages.sh
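This script installs the Python packages needed for training into the active virtualenv. A minimal sketch of the kind of command it runs, assuming the only hard requirement is PyTorch (the exact packages and versions pinned in install_packages.sh may differ):

    pip install torch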

The steps from here on deviate slightly from those in the original repository from which this one was forked.

Download and preprocess data:

./scripts/download_data.sh

Both datasets have already been included in this repository as text files in the data folder (./data).

Below are the changes I made to the download_data.sh script (a shell sketch of these changes follows the list):

  1. Created a harry_potter folder in ./data.
  2. Created a raw folder inside ./data/harry_potter.
  3. Copied the Harry Potter dataset from ./data to ./data/harry_potter/raw.
  4. Preprocessed, tokenized and divided the data into training, validation and test sets by changing the input file path and the name and path of the output.

The Sherlock novels dataset can also be used by following the steps above and replacing all Harry Potter entries with Sherlock. I also played around with the vocabulary and made another version of both datasets with a vocabulary size of 10,000.
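A minimal shell sketch of the changes described above (the source filename is illustrative; step 4 reuses the preprocessing already present in the script):

    # steps 1 and 2: create data/harry_potter and its raw subfolder
    mkdir -p data/harry_potter/raw

    # step 3: copy the Harry Potter text shipped in ./data
    cp data/harry_potter.txt data/harry_potter/raw/

    # step 4: point the script's existing preprocessing (tokenization and the
    # training/validation/test split, with a vocabulary of 5000 or 10000) at
    # data/harry_potter/raw and write its output to data/harry_potter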

Train a model:

./scripts/train.sh

Since I conducted my experiments on a GPU, I had to modify some settings in the train.sh script: I changed CUDA_VISIBLE_DEVICES=0 and added the --cuda flag.

Make sure to change the data path/folder to the one you want to work on.

I played around with different hyperparameters by changing the values for --epochs, --emsize, --nhid and --dropout.

The naming convention for the model has also been modified to include information on dataset, vocabulary size, embedding, dropout and epochs. This makes it easier to identify later on which model was trained with which hyperparameters.
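For orientation, the training call inside my modified train.sh boils down to a command of roughly this form, assuming the script wraps the PyTorch example's main.py (the flag values show one of my configurations; the data path and model filename are illustrative and follow the naming convention above):

    CUDA_VISIBLE_DEVICES=0 python main.py \
        --data data/harry_potter \
        --cuda \
        --epochs 40 \
        --emsize 400 \
        --nhid 400 \
        --dropout 0.6 \
        --save models/harry_potter_vocab5000_emb400_drop0.6_epochs40.pt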

The training process can be interrupted at any time, and the best checkpoint will always be saved.

Generate (sample) some text from a trained model with:

./scripts/generate.sh

I also had to make the following modifications in generate.sh: I changed CUDA_VISIBLE_DEVICES=0 and added the --cuda flag.

Make sure to set the path/folder name of your data source.

Set checkpoint to the path of your selected language model.

I changed the number of words to be generated from 100 to 500.

Assign an easy-to-understand filename for your sample output file. I opted to follow the naming convention described above in order to know which sample was generated from which model.

I also tinkered with the temperature hyperparameter.


If the scripts are run without any changes, they will train a model with a vocabulary size of 5000, an embedding size of 400, dropout of 0.6 and 40 epochs.

The temperature is set to 0.6 in the generate.sh script.
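Putting the generation settings together, the call inside my modified generate.sh is roughly of this form, assuming the script wraps the PyTorch example's generate.py (the data path, checkpoint and output filenames are illustrative and follow the naming convention described above):

    CUDA_VISIBLE_DEVICES=0 python generate.py \
        --data data/harry_potter \
        --cuda \
        --checkpoint models/harry_potter_vocab5000_emb400_drop0.6_epochs40.pt \
        --words 500 \
        --temperature 0.6 \
        --outf samples/harry_potter_vocab5000_emb400_drop0.6_epochs40.txt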


I have run quite a lot of experiments and will only show 15 of them below, using the Harry Potter dataset. A PDF with all of my results is also included in this repository.

Harry Potter Dataset

Vocabulary size: 5000

Embedding | Dropout | Epoch | Perplexity
----------|---------|-------|-----------
200       | 0.5     | 40    | 101.64
300       | 0.5     | 40    | 101.53
400       | 0.5     | 40    | 104.49
400       | 0.6     | 40    | 100.71
400       | 0.65    | 40    | 101.76
100       | 0.3     | 40    | 104.93
300       | 0.6     | 40    | 101.08
1000      | 0.65    | 40    | 104.65
300       | 0.6     | 60    | 101.08

Vocabulary size: 10000

Embedding | Dropout | Epoch | Perplexity
----------|---------|-------|-----------
200       | 0.5     | 40    | 134.6
200       | 0.3     | 40    | 140.54
300       | 0.5     | 40    | 134.55
200       | 0.5     | 30    | 134.6
200       | 0.5     | 50    | 134.6
300       | 0.3     | 40    | 144.18
