Thanks for reporting, and hm, yes, this is weird. I can reproduce it:
Pretraining
litgpt pretrain \
--model_name pythia-14m \
--tokenizer_dir checkpoints/EleutherAI/pythia-14m \
--out_dir my_test_dir \
--data TextFiles \
--data.train_data_path custom_pretraining_data \
--train.max_tokens 10_000
...
Seed set to 42
Time to instantiate model: 0.13 seconds.
Total parameters: 14,067,712
Validating ...
Measured TFLOPs: 0.10
Saving checkpoint to '/teamspace/studios/this_studio/my_test_dir/final/lit_model.pth'
Training time: 24.14s
Memory used: 1.44 GB
Continued Pretraining
litgpt pretrain \
--model_name pythia-14m \
--tokenizer_dir checkpoints/EleutherAI/pythia-14m \
--out_dir my_test_dir_2 \
--data TextFiles \
--data.train_data_path custom_pretraining_data \
--train.max_tokens 10_000 \
--initial_checkpoint_dir /teamspace/studios/this_studio/my_test_dir/final/
RuntimeError: Error(s) in loading state_dict for GPT:
Missing key(s) in state_dict: "lm_head.weight", "transformer.wte.weight", "transformer.h.0.norm_1.weight", "transformer.h.0.norm_1.bias", "transformer.h.0.attn.attn.weight", "transformer.h.0.attn.attn.bias", "transformer.h.0.attn.proj.weight", "transformer.h.0.attn.proj.bias", "transformer.h.0.norm_2.weight", "transformer.h.0.norm_2.bias", "transformer.h.0.mlp.fc.weight",
...
ls /teamspace/studios/this_studio/my_test_dir/final
config.json generation_config.json hyperparameters.yaml lit_model.pth model_config.yaml tokenizer.json tokenizer_config.json
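One way to narrow this down would be to inspect the checkpoint's top-level keys directly. Below is a minimal diagnostic sketch (my own, not from the repo; the path is the one from the run above). The "Missing key(s)" error suggests the file may contain the full training state, with the weights nested under a wrapper key such as "model", rather than a bare state dict with keys like "lm_head.weight".
import torch

# Load the checkpoint produced by the pretraining run above
ckpt = torch.load(
    "/teamspace/studios/this_studio/my_test_dir/final/lit_model.pth",
    map_location="cpu",
)

# A bare state dict would show keys like "lm_head.weight" and
# "transformer.wte.weight"; a wrapper key such as "model" would mean
# the full training state was saved instead.
print(list(ckpt.keys())[:10])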
It did work a few months ago when I tested this for the tutorials, and I don't have a good explanation at the moment for why it would fail. Either I am doing something incorrectly above, or something has recently changed that's causing this. I will have to think more about this ...
Have you seen this before, @awaelchli or @carmocca?
Finetuning
Finetuning seems to work fine for me, though:
litgpt finetune full \
--checkpoint_dir /teamspace/studios/this_studio/my_test_dir/final \
--train.max_seq_length 64 \
--train.max_steps 5
...
Epoch 1 | iter 73 step 4 | loss train: 10.978, val: n/a | iter time: 15.70 ms
Epoch 1 | iter 74 step 4 | loss train: 10.972, val: n/a | iter time: 15.56 ms
Epoch 1 | iter 75 step 4 | loss train: 10.967, val: n/a | iter time: 15.70 ms
Epoch 1 | iter 76 step 4 | loss train: 10.960, val: n/a | iter time: 16.08 ms
Epoch 1 | iter 77 step 4 | loss train: 10.961, val: n/a | iter time: 16.31 ms
Epoch 1 | iter 78 step 4 | loss train: 10.957, val: n/a | iter time: 16.12 ms
Epoch 1 | iter 79 step 4 | loss train: 10.944, val: n/a | iter time: 15.83 ms
Epoch 1 | iter 80 step 5 | loss train: 10.931, val: n/a | iter time: 18.52 ms (step)
Training time: 20.99s
Memory used: 0.31 GB
So, I am thinking the generated checkpoint file itself is fine; the problem is more likely in how the checkpoint is loaded in the pretraining script.
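If the weights do turn out to be nested under a "model" key (again, an unverified assumption), a minimal workaround sketch would be to unwrap the state dict before loading it, along these lines:
import torch
from litgpt.config import Config
from litgpt.model import GPT

state_dict = torch.load(
    "/teamspace/studios/this_studio/my_test_dir/final/lit_model.pth",
    map_location="cpu",
)
# Unwrap the hypothetical "model" wrapper key if present; otherwise
# use the dict as-is.
state_dict = state_dict.get("model", state_dict)

model = GPT(Config.from_name("pythia-14m"))
model.load_state_dict(state_dict)
That said, the proper fix presumably belongs in the pretraining script's checkpoint-loading path rather than in user code.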