Comments (5)
Could you please clarify whether this discussion is about full-parameter fine-tuning or LoRA-based tuning? @bloc97
from yarn.
Thanks a lot. To confirm: are the A100s 40GB or 80GB for the 7B 64K fine-tuning?
We were able to train the 7B 64K model on a single 8x A100 node; all other models unfortunately require a multinode setup. We used 64 GPUs, but I expect 16 would suffice for the other models (7B 128K, 13B 64K, 13B 128K).
It is 8x 80GB for the 64K context size.
I ran finetune.py on 2x A100 GPUs. Both GPUs initially loaded about 14 GB of 80 GB; after the first batch, usage rose to 77 GB of 80 GB, and training went OOM at the start of the second batch.
Is this expected behavior?
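A pattern like the one described (modest usage on the first batch, a large jump afterward, then OOM) often comes from allocations that only happen lazily on the first optimizer step, such as optimizer states or cached activations. A minimal sketch for diagnosing this, using PyTorch's standard CUDA memory counters (this helper is not part of the YaRN repo; it is an illustrative assumption):

```python
# Hypothetical diagnostic helper: log CUDA memory after each training step
# so you can see whether usage keeps growing between batch 1 and batch 2.
import torch

def cuda_memory_report(step: int) -> str:
    """Return a one-line summary of allocated/reserved CUDA memory (GiB)."""
    if not torch.cuda.is_available():
        # Running on CPU: nothing to measure, but keep the call site uniform.
        return f"step {step}: CUDA not available"
    alloc = torch.cuda.memory_allocated() / 2**30      # tensors currently allocated
    reserved = torch.cuda.memory_reserved() / 2**30    # memory held by the caching allocator
    peak = torch.cuda.max_memory_allocated() / 2**30   # high-water mark since start/reset
    return (f"step {step}: alloc={alloc:.1f} GiB "
            f"reserved={reserved:.1f} GiB peak={peak:.1f} GiB")

# In a training loop, call this after optimizer.step(), e.g.:
#     print(cuda_memory_report(step))
print(cuda_memory_report(0))
```

If the peak keeps climbing after the first step, common mitigations are gradient checkpointing, a smaller micro-batch with gradient accumulation, or offloading optimizer states (e.g. via DeepSpeed ZeRO).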
Related Issues (20)
- context length and dataset size
- Inquiry Regarding Evaluation Metrics in Your Paper HOT 2
- RoPE scaling config confusing
- Question about Yarn environment configuration (v2) HOT 4
- Running Error HOT 2
- deepspeed config crashed for `auto` and OOM HOT 3
- cannot load safetensor: Trying to set a tensor of shape torch.Size([0]) in "weight" (which has shape torch.Size([32000, 4096])) HOT 4
- Unexpected larger perplexity on PG19 HOT 1
- OOM on two 80GB GPUs HOT 6
- Could this repository be used for sft based on YaRN?
- Phi 2
- An OOM error occurred while computing the perplexity of 128k Proof-pile documents with a maximum token count set to 128k.
- Questions about DynamicNTK
- How should I proceed with conducting an evaluation for lm-evaluation-harness?
- Can we run the replication of the results,8 * 80 A100 HOT 1
- Trying to set a tensor of shape torch.Size([257, 1024]) in "weight" (which has shape torch.Size([1226, 1024])), this look incorrect
- Why the updated cache is initialized with seqlen=256?
- cannot connect to hugging face
- OOM error of distributed training on 80GB GPUs with Mistral-7b HOT 2
- Question related to _yarn_linear_ramp_mask HOT 1