
gpt2's People

Contributors

affjljoo3581


gpt2's Issues

Is Apex useful for GPT-2?

Hi, does using Apex reduce the size of the GPT-2 model, and does it make inference faster?
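
A hedged note: Apex/AMP changes the precision of computation at run time, not the size of the checkpoint on disk, so the saved model stays the same while inference on Tensor Core GPUs can get faster. A minimal sketch of mixed-precision inference using PyTorch's native AMP (which covers the same O1-style use case as Apex); the model below is an illustrative stand-in, not this repository's GPT-2 class:

import torch
import torch.nn as nn

# Illustrative stand-in for a GPT-2 feed-forward block; any nn.Module behaves the same under autocast.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768)).cuda().eval()
x = torch.randn(1, 64, 768, device="cuda")

# Mixed-precision inference: weights stay fp32 in memory, matmuls run in fp16.
with torch.no_grad(), torch.cuda.amp.autocast():
    out = model(x)

# To also halve the memory footprint of the weights themselves, cast them explicitly:
model_fp16 = model.half()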

A question about the dataset

Hello. First of all, thank you very much for releasing the Korean GPT-2 pretrained model; it is a really impressive project. I have one question: among the datasets you mentioned on Facebook, could you tell me exactly what kind of data the 'web social data' refers to?

bidirectional training in GPT2

Thanks for sharing the code. May I ask whether there is a way to fine-tune the pre-trained GPT-2 with bidirectional training, or with a hybrid of uni- and bidirectional training? Thanks for any tips.
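
GPT-2 is pretrained with a causal (left-to-right) attention mask, so bidirectional or hybrid fine-tuning mainly comes down to changing that mask; whether the pretrained weights transfer well under a different mask is a separate empirical question. A minimal, repo-independent sketch of the three mask variants in PyTorch (True means "may attend"):

import torch

seq_len = 6

# Causal mask: position i attends only to positions <= i (standard GPT-2).
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Bidirectional mask: every position attends to every position (BERT-style).
bidirectional = torch.ones(seq_len, seq_len, dtype=torch.bool)

# Hybrid (prefix-LM) mask: a bidirectional prefix, causal everywhere after it.
prefix_len = 3
hybrid = causal.clone()
hybrid[:, :prefix_len] = True  # every position may see the whole prefix

print(causal.int(), bidirectional.int(), hybrid.int(), sep="\n\n")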

Training spec #2

Could you share more details about the training run mentioned in the issue comment that reports a loss of 3.2398?

For example: the scheduler, the optimizer betas (beta1, beta2), the dropout probability, gradient clipping, the learning rate, the warmup steps, the layer normalization settings, and so on.

I would just like to learn some tips for configuring these parameters.

Thank you.
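
Not an answer from the author, but for orientation, a sketch of values commonly used for GPT-2-scale training (assumed typical settings in the spirit of the GPT-2 paper, not this repository's confirmed configuration):

import torch

# Illustrative, commonly used GPT-2-style settings; the repo's actual values may differ.
config = {
    "optimizer": "AdamW",
    "betas": (0.9, 0.999),          # (0.9, 0.95) is also common at larger scale
    "weight_decay": 0.01,
    "learning_rate": 2.5e-4,
    "scheduler": "linear warmup, then cosine or linear decay",
    "warmup_steps": 2000,
    "dropout": 0.1,                 # embedding, attention, and residual dropout
    "grad_clip_norm": 1.0,
    "layer_norm_eps": 1e-5,
}

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    return torch.optim.AdamW(
        model.parameters(),
        lr=config["learning_rate"],
        betas=config["betas"],
        weight_decay=config["weight_decay"],
    )

# During training, gradient clipping is applied before optimizer.step():
#   torch.nn.utils.clip_grad_norm_(model.parameters(), config["grad_clip_norm"])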

Training spec

I have a question about the training spec of your model. I would like to know the sequence length, batch size, training time, GPU type, number of GPUs, number of training samples, and final loss.
You appear to have reached a loss of about 3.7. Could you describe the training parameters used to achieve that performance?

def add_subparser(subparsers: argparse._SubParsersAction):

Are the parameters defined in this function the ones used to reach that loss?
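
As a quick way to interpret the reported numbers: the token-level cross-entropy loss (in nats) maps directly to perplexity, so a loss of 3.7 corresponds to a perplexity of about 40, and the 3.2398 mentioned in the other issue to about 26:

import math

def perplexity(cross_entropy_loss: float) -> float:
    # Perplexity is the exponential of the mean per-token cross-entropy (in nats).
    return math.exp(cross_entropy_loss)

print(perplexity(3.7))     # ~40.4
print(perplexity(3.2398))  # ~25.5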

Multi GPU mode is stuck at the beginning

Hi @affjljoo3581,
Thank you very much for your work.
When I run the command below, it gets stuck. Without --gpus it works fine, but only on my first GPU:
[root@gpu02]:~/kb/src# python -m gpt2 train --train_corpus ../build/corpus.train.txt \

                 --eval_corpus            ../build/corpus.test.txt \
                 --vocab_path             ../build/vocab.txt \
                 --dims                   1024 \
                 --batch_train            128 \
                 --batch_eval             128 \
                 --seq_len                64 \
                 --total_steps            3000 \
                 --eval_steps             500 \
                 --save_steps             3000 \
                 --gpus                   4 \
                 --save_checkpoint_path   ckpt-gpt2.pth \
                 --save_model_path        gpt2-pretrained.pth

Train GPT-2 model: 0%| | 0/3000 [00:00<?, ?it/s]
How can I fix this so that training continues?
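
A hedged debugging suggestion, independent of this repository: when multi-GPU training hangs at 0%, the cause is often NCCL/process-group initialization rather than the model. Running with NCCL_DEBUG=INFO and trying a minimal all_reduce across the same four GPUs can tell the two apart; if the snippet below also hangs, the problem is the distributed setup (driver, PCIe P2P, ACS), and NCCL_P2P_DISABLE=1 is a common workaround:

# nccl_check.py: minimal NCCL sanity check, unrelated to this repo's code.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    x = torch.ones(1, device=rank)
    dist.all_reduce(x)                       # should print world_size on every rank
    print(f"rank {rank}: all_reduce -> {x.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4                           # matches --gpus 4 above
    mp.spawn(worker, args=(world_size,), nprocs=world_size)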

Activation Function

The paper Improving Language Understanding by Generative Pre-Training (GPT) says that GELU was used as the activation function.
Which activation function is used in this code?

Also, could you tell me the reason for adding Swish?
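
For reference (not a statement about how this repository wires them in), both activations are one formula each; a short PyTorch sketch of the tanh-approximated GELU used in the original GPT code and of Swish/SiLU:

import math
import torch

def gelu(x: torch.Tensor) -> torch.Tensor:
    # Tanh approximation of GELU, as used in the original GPT/GPT-2 code.
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def swish(x: torch.Tensor) -> torch.Tensor:
    # Swish / SiLU: x * sigmoid(x); PyTorch also ships this as torch.nn.SiLU.
    return x * torch.sigmoid(x)

x = torch.linspace(-3.0, 3.0, steps=7)
print(gelu(x))
print(swish(x))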

Confusions on Usage

Hi! I'm new to GPT-2 and to this project; thanks for sharing this awesome project! I ran into problems when trying to run the code by following the usage section.

After preparing the datasets, you can train GPT-2 as follows:
$ python -m gpt2 train --train_corpus build/corpus.train.txt \ ...

Here you run gpt2 as a Python module, which is not mentioned earlier in the usage section. What do I need to do to run this code and pretrain the GPT-2 model? Looking forward to your reply!
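
For context on the command: python -m gpt2 asks Python to execute the gpt2 package's __main__.py module, so it must be run from the directory that contains the gpt2/ package (no installation step is needed). A minimal sketch of how such an entry point is usually structured; the names below are illustrative, not copied from this repository:

# gpt2/__main__.py (illustrative sketch): "python -m gpt2 train ..." dispatches
# the "train" subcommand through an argparse subparser.
import argparse

def train(args: argparse.Namespace) -> None:
    print(f"would train on corpus: {args.train_corpus}")   # placeholder body

def main() -> None:
    parser = argparse.ArgumentParser(prog="gpt2")
    subparsers = parser.add_subparsers(dest="command", required=True)

    train_parser = subparsers.add_parser("train")
    train_parser.add_argument("--train_corpus", required=True)
    train_parser.set_defaults(func=train)

    args = parser.parse_args()
    args.func(args)

if __name__ == "__main__":
    main()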

Which kind of tokenizer do you use? It looks like WordPiece, not BPE.

OpenAI's GPT-2 implementation builds its tokenizer with BPE, which needs two files: a .json file containing the vocabulary and a .txt file containing the merges.
Your implementation uses only a single vocab.txt file, and some vocabulary entries start with '##', as implied by your tokenization.py.
So do you use WordPiece rather than BPE?
(I'm not a native English speaker; sorry for my poor English.)
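
The visible difference between the two schemes: WordPiece marks word-internal pieces with a '##' prefix and needs only a vocabulary file, while GPT-2's byte-level BPE needs both a vocabulary (.json) file and a merges (.txt) file. A small sketch, independent of this repo's tokenization.py, of how WordPiece-style tokens are rejoined:

def detokenize_wordpiece(tokens):
    # Pieces that start with '##' attach to the previous piece; others start a new word.
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)

print(detokenize_wordpiece(["un", "##believ", "##able", "results"]))
# -> "unbelievable results"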
