
gpt2's People

Contributors

affjljoo3581


gpt2's Issues

Is Apex useful for GPT-2?

Hi, does using Apex reduce the size of the GPT-2 model, and does it make inference faster?
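
A hedged note: Apex/AMP changes the precision of computation at run time, not the size of the checkpoint on disk, so the saved model stays the same while inference on Tensor Core GPUs can get faster. A minimal sketch of mixed-precision inference using PyTorch's native AMP (which covers the same O1-style use case as Apex); the model below is an illustrative stand-in, not this repository's GPT-2 class:

import torch
import torch.nn as nn

# Illustrative stand-in for a GPT-2 feed-forward block; any nn.Module behaves the same under autocast.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768)).cuda().eval()
x = torch.randn(1, 64, 768, device="cuda")

# Mixed-precision inference: weights stay fp32 in memory, matmuls run in fp16.
with torch.no_grad(), torch.cuda.amp.autocast():
    out = model(x)

# To also halve the memory footprint of the weights themselves, cast them explicitly:
model_fp16 = model.half()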

A question about the dataset

Hello. First of all, thank you very much for releasing the Korean GPT-2 pretrained model; it is a really impressive project. I have one question: among the datasets you mentioned on Facebook, could you tell me exactly what kind of data the 'web social data' refers to?

bidirectional training in GPT2

Thanks for sharing the code. May I ask whether there is a way to fine-tune the pre-trained GPT-2 with bidirectional training, or with a hybrid of uni- and bidirectional training? Thanks for any tips.
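
GPT-2 is pretrained with a causal (left-to-right) attention mask, so bidirectional or hybrid fine-tuning mainly comes down to changing that mask; whether the pretrained weights transfer well under a different mask is a separate empirical question. A minimal, repo-independent sketch of the three mask variants in PyTorch (True means "may attend"):

import torch

seq_len = 6

# Causal mask: position i attends only to positions <= i (standard GPT-2).
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Bidirectional mask: every position attends to every position (BERT-style).
bidirectional = torch.ones(seq_len, seq_len, dtype=torch.bool)

# Hybrid (prefix-LM) mask: a bidirectional prefix, causal everywhere after it.
prefix_len = 3
hybrid = causal.clone()
hybrid[:, :prefix_len] = True  # every position may see the whole prefix

print(causal.int(), bidirectional.int(), hybrid.int(), sep="\n\n")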

Training spec #2

Could you share more details about the training run mentioned in the issue comment that reports a loss of 3.2398?

For example: the scheduler, the optimizer betas (beta1, beta2), the dropout probability, gradient clipping, the learning rate, the warmup steps, the layer normalization settings, and so on.

I would just like to learn some tips for configuring these parameters.

Thank you.
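
Not an answer from the author, but for orientation, a sketch of values commonly used for GPT-2-scale training (assumed typical settings in the spirit of the GPT-2 paper, not this repository's confirmed configuration):

import torch

# Illustrative, commonly used GPT-2-style settings; the repo's actual values may differ.
config = {
    "optimizer": "AdamW",
    "betas": (0.9, 0.999),          # (0.9, 0.95) is also common at larger scale
    "weight_decay": 0.01,
    "learning_rate": 2.5e-4,
    "scheduler": "linear warmup, then cosine or linear decay",
    "warmup_steps": 2000,
    "dropout": 0.1,                 # embedding, attention, and residual dropout
    "grad_clip_norm": 1.0,
    "layer_norm_eps": 1e-5,
}

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    return torch.optim.AdamW(
        model.parameters(),
        lr=config["learning_rate"],
        betas=config["betas"],
        weight_decay=config["weight_decay"],
    )

# During training, gradient clipping is applied before optimizer.step():
#   torch.nn.utils.clip_grad_norm_(model.parameters(), config["grad_clip_norm"])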

Training spec

I have a question about the training spec of your model. I would like to know the sequence length, batch size, training time, GPU type, number of GPUs, number of training samples, and final loss.
You appear to have reached a loss of about 3.7. Could you describe the training parameters used to achieve that performance?

def add_subparser(subparsers: argparse._SubParsersAction):

Are the parameters defined in this function the ones used to reach that loss?
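
As a quick way to interpret the reported numbers: the token-level cross-entropy loss (in nats) maps directly to perplexity, so a loss of 3.7 corresponds to a perplexity of about 40, and the 3.2398 mentioned in the other issue to about 26:

import math

def perplexity(cross_entropy_loss: float) -> float:
    # Perplexity is the exponential of the mean per-token cross-entropy (in nats).
    return math.exp(cross_entropy_loss)

print(perplexity(3.7))     # ~40.4
print(perplexity(3.2398))  # ~25.5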

Multi GPU mode is stuck at the beginning

Hi @affjljoo3581,
Thank you very much for your work.
When I run the command below, it gets stuck. Without --gpus it works fine, but only on my first GPU:
[root@gpu02]:~/kb/src# python -m gpt2 train --train_corpus ../build/corpus.train.txt \

                 --eval_corpus            ../build/corpus.test.txt \
                 --vocab_path             ../build/vocab.txt \
                 --dims                   1024 \
                 --batch_train            128 \
                 --batch_eval             128 \
                 --seq_len                64 \
                 --total_steps            3000 \
                 --eval_steps             500 \
                 --save_steps             3000 \
                 --gpus                   4 \
                 --save_checkpoint_path   ckpt-gpt2.pth \
                 --save_model_path        gpt2-pretrained.pth

Train GPT-2 model: 0%| | 0/3000 [00:00<?, ?it/s]
How can I fix this so that training continues?
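
A hedged debugging suggestion, independent of this repository: when multi-GPU training hangs at 0%, the cause is often NCCL/process-group initialization rather than the model. Running with NCCL_DEBUG=INFO and trying a minimal all_reduce across the same four GPUs can tell the two apart; if the snippet below also hangs, the problem is the distributed setup (driver, PCIe P2P, ACS), and NCCL_P2P_DISABLE=1 is a common workaround:

# nccl_check.py: minimal NCCL sanity check, unrelated to this repo's code.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    x = torch.ones(1, device=rank)
    dist.all_reduce(x)                       # should print world_size on every rank
    print(f"rank {rank}: all_reduce -> {x.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4                           # matches --gpus 4 above
    mp.spawn(worker, args=(world_size,), nprocs=world_size)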

Activation Function

The paper Improving Language Understanding by Generative Pre-Training (GPT) says that GELU was used as the activation function.
Which activation function is used in this code?

Also, could you tell me the reason for adding Swish?
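
For reference (not a statement about how this repository wires them in), both activations are one formula each; a short PyTorch sketch of the tanh-approximated GELU used in the original GPT code and of Swish/SiLU:

import math
import torch

def gelu(x: torch.Tensor) -> torch.Tensor:
    # Tanh approximation of GELU, as used in the original GPT/GPT-2 code.
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def swish(x: torch.Tensor) -> torch.Tensor:
    # Swish / SiLU: x * sigmoid(x); PyTorch also ships this as torch.nn.SiLU.
    return x * torch.sigmoid(x)

x = torch.linspace(-3.0, 3.0, steps=7)
print(gelu(x))
print(swish(x))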

Confusions on Usage

Hi! I'm new to GPT-2 and to this project; thanks for sharing this awesome project! I ran into problems when trying to run the code by following the usage section.

After preparing the datasets, you can train GPT-2 as follows:
$ python -m gpt2 train --train_corpus build/corpus.train.txt \ ...

Here you run gpt2 as a Python module, which is not mentioned earlier in the usage section. What do I need to do to run this code and pretrain the GPT-2 model? Looking forward to your reply!
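
For context on the command: python -m gpt2 asks Python to execute the gpt2 package's __main__.py module, so it must be run from the directory that contains the gpt2/ package (no installation step is needed). A minimal sketch of how such an entry point is usually structured; the names below are illustrative, not copied from this repository:

# gpt2/__main__.py (illustrative sketch): "python -m gpt2 train ..." dispatches
# the "train" subcommand through an argparse subparser.
import argparse

def train(args: argparse.Namespace) -> None:
    print(f"would train on corpus: {args.train_corpus}")   # placeholder body

def main() -> None:
    parser = argparse.ArgumentParser(prog="gpt2")
    subparsers = parser.add_subparsers(dest="command", required=True)

    train_parser = subparsers.add_parser("train")
    train_parser.add_argument("--train_corpus", required=True)
    train_parser.set_defaults(func=train)

    args = parser.parse_args()
    args.func(args)

if __name__ == "__main__":
    main()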

Which kind of tokenizer do you use? It looks like WordPiece, not BPE.

OpenAI's GPT-2 implementation builds its tokenizer with BPE, which needs two files: a .json file containing the vocabulary and a .txt file containing the merges.
Your implementation uses only a single vocab.txt file, and some vocabulary entries start with '##', as implied by your tokenization.py.
So do you use WordPiece rather than BPE?
(I'm not a native English speaker; sorry for my poor English.)
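
The visible difference between the two schemes: WordPiece marks word-internal pieces with a '##' prefix and needs only a vocabulary file, while GPT-2's byte-level BPE needs both a vocabulary (.json) file and a merges (.txt) file. A small sketch, independent of this repo's tokenization.py, of how WordPiece-style tokens are rejoined:

def detokenize_wordpiece(tokens):
    # Pieces that start with '##' attach to the previous piece; others start a new word.
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)

print(detokenize_wordpiece(["un", "##believ", "##able", "results"]))
# -> "unbelievable results"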
