
Comments (3)

rasbt commented on September 22, 2024

Thanks for reporting, and hm, yes, this is weird. I can reproduce it:

Pretraining

litgpt pretrain \
   --model_name pythia-14m \
   --tokenizer_dir checkpoints/EleutherAI/pythia-14m \
   --out_dir my_test_dir \
   --data TextFiles \
   --data.train_data_path custom_pretraining_data \
   --train.max_tokens 10_000
...
Seed set to 42
Time to instantiate model: 0.13 seconds.
Total parameters: 14,067,712
Validating ...
Measured TFLOPs: 0.10
Saving checkpoint to '/teamspace/studios/this_studio/my_test_dir/final/lit_model.pth'
Training time: 24.14s
Memory used: 1.44 GB

Continued Pretraining

litgpt pretrain \
   --model_name pythia-14m \
   --tokenizer_dir checkpoints/EleutherAI/pythia-14m \
   --out_dir my_test_dir_2 \
   --data TextFiles \
   --data.train_data_path custom_pretraining_data \
   --train.max_tokens 10_000 \
   --initial_checkpoint_dir /teamspace/studios/this_studio/my_test_dir/final/
RuntimeError: Error(s) in loading state_dict for GPT:
        Missing key(s) in state_dict: "lm_head.weight", "transformer.wte.weight", "transformer.h.0.norm_1.weight", "transformer.h.0.norm_1.bias", "transformer.h.0.attn.attn.weight", "transformer.h.0.attn.attn.bias", "transformer.h.0.attn.proj.weight", "transformer.h.0.attn.proj.bias", "transformer.h.0.norm_2.weight", "transformer.h.0.norm_2.bias", "transformer.h.0.mlp.fc.weight",
...

ls  /teamspace/studios/this_studio/my_test_dir/final

config.json  generation_config.json  hyperparameters.yaml  lit_model.pth  model_config.yaml  tokenizer.json  tokenizer_config.json

It did work a few months ago when I tested this for the tutorials, and at the moment I don't have a good explanation for why it would fail. Either I am doing something incorrectly above, or something has changed recently that is causing this. I will have to think more about this ...

Have you seen this before @awaelchli or @carmocca ?

Finetuning

Finetuning from the same checkpoint seems to work fine for me, though:

litgpt finetune full \
    --checkpoint_dir /teamspace/studios/this_studio/my_test_dir/final \
    --train.max_seq_length 64 \
    --train.max_steps 5
...
Epoch 1 | iter 73 step 4 | loss train: 10.978, val: n/a | iter time: 15.70 ms
Epoch 1 | iter 74 step 4 | loss train: 10.972, val: n/a | iter time: 15.56 ms
Epoch 1 | iter 75 step 4 | loss train: 10.967, val: n/a | iter time: 15.70 ms
Epoch 1 | iter 76 step 4 | loss train: 10.960, val: n/a | iter time: 16.08 ms
Epoch 1 | iter 77 step 4 | loss train: 10.961, val: n/a | iter time: 16.31 ms
Epoch 1 | iter 78 step 4 | loss train: 10.957, val: n/a | iter time: 16.12 ms
Epoch 1 | iter 79 step 4 | loss train: 10.944, val: n/a | iter time: 15.83 ms
Epoch 1 | iter 80 step 5 | loss train: 10.931, val: n/a | iter time: 18.52 ms (step)
Training time: 20.99s
Memory used: 0.31 GB

So I think the generated checkpoint file itself is fine; the problem is more likely in how the pretraining script loads the checkpoint.
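One way to narrow this down would be to compare the keys the model expects against the keys actually present in the loaded file. Below is a minimal sketch of that check in plain Python, using toy dictionaries in place of real tensors. The two key names are taken from the error above; the helper names `diff_keys` and `find_wrapper`, and the nested layout with a `"model"` entry, are hypothetical and only illustrate one possible cause, not a confirmed diagnosis.

```python
from typing import Any, Iterable, List, Mapping, Optional, Tuple


def diff_keys(expected: Iterable[str], found: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) keys, each as a sorted list."""
    expected_set, found_set = set(expected), set(found)
    return sorted(expected_set - found_set), sorted(found_set - expected_set)


def find_wrapper(checkpoint: Mapping[str, Any], expected: Iterable[str]) -> Optional[str]:
    """Return the top-level key whose value contains every expected key, if any."""
    expected_set = set(expected)
    for name, value in checkpoint.items():
        if isinstance(value, Mapping) and expected_set <= set(value):
            return name
    return None


# Toy stand-ins: two of the parameter names from the error message above.
expected_keys = ["lm_head.weight", "transformer.wte.weight"]

# A flat checkpoint maps parameter names directly to values.
flat = {key: 0 for key in expected_keys}

# Hypothetical nested layout (an assumption for illustration only).
nested = {"model": dict(flat), "iter_num": 0}

missing_flat, unexpected_flat = diff_keys(expected_keys, flat)
missing_nested, unexpected_nested = diff_keys(expected_keys, nested)

print("flat:", missing_flat, unexpected_flat)
print("nested:", missing_nested, unexpected_nested)
print("wrapper key:", find_wrapper(nested, expected_keys))
```

With a real checkpoint, `found` would be the keys of the object returned by `torch.load(path, map_location="cpu")` and `expected` the keys of `model.state_dict()`. If every expected key is reported as missing while a few unrelated top-level keys show up as unexpected, that would point to a layout mismatch at load time rather than a corrupted file.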
