affjljoo3581 / gpt2
PyTorch Implementation of OpenAI GPT-2
License: Apache License 2.0
Hi, is there a reduction in the size of the GPT-2 model when using Apex, and is the model's inference faster?
Hello. First of all, thank you so much for releasing the Korean GPT-2 pretrained model. It is a really great project. I have one question: among the datasets you mentioned on Facebook, could you tell me exactly what the 'web social data' refers to?
Thanks for sharing the code. May I ask whether there is a way to fine-tune the pre-trained GPT-2 with bidirectional training, or a hybrid of uni- and bidirectional training together? Thanks for any tips.
Could you share more details of the training run that reached a loss of 3.2398, mentioned in the comment on that issue?
For example: the scheduler, optimizer beta1 and beta2, dropout probability, gradient clipping, learning rate, warmup steps, layer normalization, etc.
I only know some general training tips for parameter configuration.
Thank you.
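For context, GPT-2-style training commonly pairs AdamW with a linear-warmup learning-rate schedule. The sketch below illustrates such a schedule; every value in it (peak_lr, warmup_steps, total_steps) is an assumption for illustration, not this repository's actual configuration:

```python
def linear_warmup_lr(step, peak_lr=1e-4, warmup_steps=1000, total_steps=10000):
    """Linear warmup to peak_lr, then linear decay to zero.

    All hyperparameter values here are illustrative assumptions,
    not the ones used to reach the reported loss.
    """
    if step < warmup_steps:
        # Ramp up linearly over the first `warmup_steps` updates.
        return peak_lr * (step + 1) / warmup_steps
    # Decay linearly from the peak down to zero at `total_steps`.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))

print(linear_warmup_lr(0))    # tiny lr at the very first step
print(linear_warmup_lr(999))  # reaches the peak at the end of warmup
```

In a PyTorch loop this function would typically be wrapped in `torch.optim.lr_scheduler.LambdaLR`, with gradient clipping applied via `torch.nn.utils.clip_grad_norm_` before each optimizer step.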
I have a question about the training spec of your model. I would like to know the sequence length, batch size, training time, GPU type, number of GPUs, number of training samples, and the loss.
You seem to have achieved a loss of 3.7. Could you describe the training parameters used to reach that performance?
Line 93 in 71ebf91
Are these the parameters used to get that loss?
Hi @affjljoo3581,
Thank you very much for your work.
When I run the demo it gets stuck; without --gpus it works well (but only on my first GPU):
[root@gpu02]:~/kb/src# python -m gpt2 train --train_corpus ../build/corpus.train.txt \
    --eval_corpus ../build/corpus.test.txt \
    --vocab_path ../build/vocab.txt \
    --save_checkpoint_path ckpt-gpt2.pth \
    --save_model_path gpt2-pretrained.pth \
    --dims 1024 \
    --batch_train 128 \
    --batch_eval 128 \
    --seq_len 64 \
    --total_steps 3000 \
    --eval_steps 500 \
    --save_steps 3000 \
    --gpus 4
Train GPT-2 model: 0%| | 0/3000 [00:00<?, ?it/s]
How can I fix this so that the program continues?
The paper Improving Language Understanding by Generative Pre-Training (the original GPT paper) says that GELU was used as the activation function.
Which activation function is used in this code?
Also, can you tell me the reason for adding Swish?
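For reference, the two activations in question can be written down directly. This is a minimal math-level sketch (the tanh approximation of GELU from the GPT papers, and Swish/SiLU), not a claim about how this repository implements them:

```python
import math

def gelu(x: float) -> float:
    # tanh approximation of GELU, as used in the GPT papers
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def swish(x: float) -> float:
    # Swish (also known as SiLU): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))
```

Both are smooth, non-monotonic near zero, and behave almost identically in practice, which is likely why implementations sometimes offer either one.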
Hi! I'm new to GPT-2 and to this project. Thanks for sharing this awesome project! I ran into problems when trying to run the code following the usage section.
After preparing the datasets, you can train GPT-2 as follows:
$ python -m gpt2 train --train_corpus build/corpus.train.txt \ ...
Here you use gpt2 as a Python module, which is not mentioned in the earlier usage section. I would like to know what I need to do to run this code and pretrain the GPT-2 model. Looking forward to your reply!
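For what it's worth, `python -m gpt2` runs the `gpt2` package through its `__main__.py`, so the command has to be launched from the repository root (where the `gpt2/` directory lives) or with that directory on `PYTHONPATH`. The mechanism can be demonstrated with a throwaway package (`demo_pkg` is a hypothetical name used only for this sketch):

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a minimal package with a __main__.py, mirroring how
    # `python -m gpt2 train ...` dispatches to gpt2/__main__.py.
    pkg = os.path.join(tmp, "demo_pkg")  # hypothetical package name
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "__main__.py"), "w") as f:
        f.write("import sys; print('args:', sys.argv[1:])\n")

    # Running from the directory that CONTAINS the package is what matters.
    out = subprocess.run(
        [sys.executable, "-m", "demo_pkg", "train"],
        cwd=tmp, capture_output=True, text=True,
    )
    print(out.stdout.strip())  # args: ['train']
```

Run the training command from the cloned repository's top-level directory and the module should be found.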
OpenAI's GPT-2 implementation uses BPE for its tokenizer, which needs two files: a .json file containing the vocabulary and a .txt file containing the merges.
Your implementation uses only a single vocab.txt file, and some vocabulary entries start with '##', which is implied by your tokenization.py.
So do you use WordPiece rather than BPE?
(I'm not a native English speaker; sorry for my poor English...)
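As a point of reference, the '##' prefix is the WordPiece convention for marking non-initial subwords, which matches the vocab.txt entries described above. A minimal greedy longest-match-first sketch of that scheme (toy vocabulary; not the repository's actual tokenizer):

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first WordPiece tokenization (minimal sketch).

    Non-initial subwords are looked up with a '##' prefix, which is why a
    WordPiece vocab file contains entries like '##able'.
    """
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-of-word marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # shrink the window until a vocab entry matches
        if piece is None:
            return ["[UNK]"]  # no subword in the vocabulary covers this span
        tokens.append(piece)
        start = end
    return tokens

# Toy vocabulary for illustration only.
vocab = {"un", "aff", "##aff", "##able"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
```

BPE, by contrast, applies an ordered list of learned merges (the merges .txt file) rather than greedy dictionary matching, which is why it needs both files.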