I fine-tuned llama3-8b with LoRA and followed the tutorial in the repository to conver

Failed to load the finetuned model with `AutoModelForCausalLM.from_pretrained(name, state_dict=state_dict)`

zhaosheng-thu commented on May 28, 2024

Failed to load the finetuned model with `AutoModelForCausalLM.from_pretrained(name, state_dict=state_dict)`

Comments (4)

zhaosheng-thu commented on May 28, 2024

I can load the weights using `model.load_state_dict()`, and then everything works fine, but I would really like to know why `from_pretrained(state_dict=state_dict)` does not work.
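One way to narrow down why the two loading paths behave differently is to compare the parameter names in the converted state dict with the names the Hugging Face model expects. The sketch below is a generic helper and not part of litgpt or transformers; `diff_state_dict_keys` is a hypothetical name. It works on plain dicts, so with a real model you would presumably pass `model.state_dict()` and the dict returned by `torch.load`.

```python
def diff_state_dict_keys(expected, provided):
    """Return (missing, unexpected) parameter names, each sorted.

    expected: mapping whose keys are the names the model defines
    provided: mapping whose keys are the names found in the checkpoint
    """
    expected_keys = set(expected)
    provided_keys = set(provided)
    missing = sorted(expected_keys - provided_keys)      # model has it, checkpoint does not
    unexpected = sorted(provided_keys - expected_keys)   # checkpoint has it, model does not
    return missing, unexpected


if __name__ == "__main__":
    # Toy stand-ins for real state dicts; the values are irrelevant here.
    model_sd = {"model.embed_tokens.weight": 0, "lm_head.weight": 0}
    ckpt_sd = {"model.embed_tokens.weight": 0, "transformer.wte.weight": 0}
    missing, unexpected = diff_state_dict_keys(model_sd, ckpt_sd)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

If both lists come back empty, the names line up and the difference between the two loading paths more likely lies elsewhere, for example in dtype handling.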

rasbt commented on May 28, 2024

Thanks for raising that. Maybe it's a HF thing. I will have to investigate.

rasbt commented on May 28, 2024

I could not reproduce it with another model when I gave it a quick try.

I am not sure if it's related because the differences are so big, but I wonder ~~what the precision of the tensors in your current state dict is. Could you print the precision of the state dict, and~~ could you also try to load it without `torch_dtype=torch.float16`?

EDIT: Never mind, I can see that the precision is bfloat16 in your screenshot.
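For anyone who wants to check the precision of a state dict without a screenshot, a minimal sketch follows. `summarize_dtypes` is a hypothetical helper, not a litgpt or PyTorch function; it only assumes that each value exposes a `.dtype` attribute, as `torch.Tensor` does. The demo uses stand-in objects so that it runs without PyTorch installed.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_dtypes(state_dict):
    """Count how many entries in a state dict have each dtype."""
    return Counter(str(tensor.dtype) for tensor in state_dict.values())


if __name__ == "__main__":
    # Stand-ins for tensors; a real state dict would come from torch.load(...).
    fake_sd = {
        "lm_head.weight": SimpleNamespace(dtype="torch.bfloat16"),
        "model.norm.weight": SimpleNamespace(dtype="torch.bfloat16"),
    }
    for dtype, count in summarize_dtypes(fake_sd).items():
        print(dtype, count)
```

A mix of dtypes in the summary, or a dtype that differs from the `torch_dtype` passed to `from_pretrained`, would be worth ruling out before looking further.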

rasbt commented on May 28, 2024

I tried this also with Llama 3 and it seemed to work fine for me there as well. Here are my steps:

litgpt download --repo_id meta-llama/Meta-Llama-3-8B-Instruct --access_token ...


litgpt finetune \
    --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct \
    --out_dir my_llama_model \
    --train.max_steps 1 \
    --eval.max_iter 1

litgpt convert from_litgpt \
    --checkpoint_dir my_llama_model/final \
    --output_dir out/converted_llama_model/

And then in a Python session:

and
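A minimal sketch of what the loading step in such a session could look like, based only on the call named in the issue title. It is not the code from this comment. It assumes that the conversion step writes the weights to a file named `model.pth` inside the output directory; `converted_checkpoint_path` and `load_converted_model` are hypothetical helper names.

```python
from pathlib import Path


def converted_checkpoint_path(output_dir):
    # Assumption: `litgpt convert from_litgpt` saves the weights as model.pth
    # inside the directory given by --output_dir.
    return Path(output_dir) / "model.pth"


def load_converted_model(output_dir, hf_name):
    """Load the converted weights into a Hugging Face model via state_dict."""
    # Imported lazily so the path helper above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM

    state_dict = torch.load(converted_checkpoint_path(output_dir))
    # Same call as in the issue title: the config comes from hf_name,
    # the weights come from the converted state dict.
    return AutoModelForCausalLM.from_pretrained(hf_name, state_dict=state_dict)
```

With the directories from the steps above, the call would be `load_converted_model("out/converted_llama_model/", "meta-llama/Meta-Llama-3-8B-Instruct")`, which needs access to the gated model repository and enough memory for an 8B model.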
