Language Models with Transformers

Reference: C Wang, M Li, A Smola. "Language Models with Transformers". arXiv preprint arXiv:1904.09408 (2019).

Installation

pip install --pre --upgrade mxnet
pip install gluonnlp
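
After installing, the following minimal Python check (a sketch, not part of the repository) confirms that both packages import and reports whether a GPU is visible; mx.context.num_gpus requires a reasonably recent MXNet build.

# Minimal installation check (sketch, not part of the original scripts).
import mxnet as mx
import gluonnlp as nlp

print(mx.__version__)          # MXNet version installed by pip
print(nlp.__version__)         # GluonNLP version installed by pip
print(mx.context.num_gpus())   # number of visible GPUs; the commands below assume at least one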

Results

The models below are trained on the WikiText-2 and WikiText-103 datasets, as indicated by the suffix of each model name.
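
Both corpora can be fetched directly through the GluonNLP data API; the following is a minimal sketch (not part of the repository's scripts), assuming the segment names 'train', 'val' and 'test'.

# Sketch: load the evaluation splits of the two corpora used below.
import gluonnlp as nlp

# Each segment is downloaded and cached on first use.
wikitext2_val = nlp.data.WikiText2(segment='val')
wikitext103_val = nlp.data.WikiText103(segment='val')

print(len(wikitext2_val), len(wikitext103_val))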

The settings used to reproduce the WikiText-2 results with the corresponding pre-trained models are listed in the following table.

Model       | bert_lm_12_768_12_300_1150_wikitext2 | bert_lm_24_1024_16_300_1150_wikitext2
Val PPL     | 38.43                                | 37.79
Test PPL    | 34.64                                | 34.11
Command     | [1]                                  | [2]
Result logs | log                                  | log

[1] bert_lm_12_768_12_300_1150_wikitext2 (Val PPL 38.43 Test PPL 34.64)

$ cd scripts/language_model
$ python transformer_language_model.py --data wikitext2 --model bert_lm_12_768_12_300_1150 --val_batch_size 8 --test_batch_size 8 --bptt 128 --seed 1882 --batch_size 16 --gpus 0

[2] bert_lm_24_1024_16_300_1150_wikitext2 (Val PPL 37.79 Test PPL 34.11)

$ cd scripts/language_model
$ python transformer_language_model.py --data wikitext2 --model bert_lm_24_1024_16_300_1150 --val_batch_size 8 --test_batch_size 8 --bptt 128 --seed 1882 --batch_size 16 --gpus 0

The settings used to reproduce the WikiText-103 results with the corresponding pre-trained models are listed in the following table.

Model       | bert_lm_12_768_12_400_2500_wikitext103 | bert_lm_24_1024_16_400_2500_wikitext103
Val PPL     | 40.70                                  | 20.33
Test PPL    | 39.85                                  | 20.54
Command     | [1]                                    | [2]
Result logs | log                                    | log

[1] bert_lm_12_768_12_400_2500_wikitext103 (Val PPL 40.70 Test PPL 39.85)

$ cd scripts/language_model
$ python transformer_language_model.py --data wikitext103 --model bert_lm_12_768_12_400_2500 --val_batch_size 8 --test_batch_size 8 --bptt 64 --seed 1111 --batch_size 20 --gpus 0

[2] bert_lm_24_1024_16_400_2500_wikitext103 (Val PPL 20.33 Test PPL 20.54)

$ cd scripts/language_model
$ python transformer_language_model.py --data wikitext103 --model bert_lm_24_1024_16_400_2500 --val_batch_size 8 --test_batch_size 8 --bptt 64 --seed 1111 --batch_size 12 --gpus 0

Note that multi-GPU evaluation is also supported; several device ids can be passed to --gpus (for example --gpus 0,1,2,3, though the exact argument format should be checked against the script's --help). The pre-trained model bert_lm_24_1024_16_400_2500_wikitext103 will be updated soon.
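
For reference, the Val/Test PPL numbers above are perplexities, i.e. the exponential of the average per-token cross-entropy loss. The loss value in the sketch below is made up purely to illustrate the relationship; it is not taken from the result logs.

# Perplexity = exp(average per-token cross-entropy loss in nats).
import math

avg_cross_entropy = 3.03                 # hypothetical per-token loss in nats
perplexity = math.exp(avg_cross_entropy)
print(round(perplexity, 2))              # ~20.7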

Reference Paper

The BibTeX entry of the reference paper is:

@article{lmtransformer2019,
   title={Language Models with Transformers},
   author={Chenguang Wang and Mu Li and Alexander J. Smola},
   journal={ArXiv},
   year={2019},
   volume={abs/1904.09408}
}
