Performance on single GPU is much better than on Multi-GPUs (accelerate, open issue)

baicenxiao commented on June 2, 2024

Performance on single GPU is much better than on Multi-GPUs

Comments (3)

baicenxiao commented on June 2, 2024

Hi @muellerzr, thanks for the response!

For the experiments above, I have already disabled the learning rate scheduler.

In addition, I have tried adjusting the learning rate with learning_rate *= accelerator.num_processes, as given in the official performance guideline. I still see a significant difference in training performance.

FYI, here is the result after using learning_rate *= 4 when training with 4 GPUs:

(shadow) bxiao@ip-10-45-101-134:/sensei-fs/users/bxiao/test_multiGPUs$ accelerate launch --config_file config.yaml ./cv_example.py --data_dir ./images
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
0.17.1
0.17.1
0.17.1
0.17.1
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1478/1478 [00:35<00:00, 41.07it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 370/370 [00:10<00:00, 35.43it/s]
epoch 0: 75.24
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1478/1478 [00:34<00:00, 42.35it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 370/370 [00:10<00:00, 36.49it/s]
epoch 1: 76.52
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1478/1478 [00:34<00:00, 42.67it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 370/370 [00:10<00:00, 35.66it/s]
epoch 2: 77.33
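
For readers following along, the linear scaling adjustment described in this comment can be sketched in plain Python, with no dependency on Accelerate. The helper names `scale_learning_rate` and `effective_batch_size`, and all numeric values, are hypothetical and chosen only for illustration. In a real script the process count would come from accelerator.num_processes, as quoted in the comment.

```python
def scale_learning_rate(base_lr: float, num_processes: int) -> float:
    """Multiply the single-process learning rate by the number of processes.

    Hypothetical helper that mirrors `learning_rate *= accelerator.num_processes`.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return base_lr * num_processes


def effective_batch_size(per_device_batch_size: int, num_processes: int) -> int:
    """Number of samples consumed per optimizer step across all processes."""
    return per_device_batch_size * num_processes


if __name__ == "__main__":
    base_lr = 0.5      # hypothetical single-GPU learning rate
    per_device = 64    # hypothetical per-device batch size

    for n in (1, 4):
        print(
            f"processes={n} "
            f"lr={scale_learning_rate(base_lr, n)} "
            f"effective_batch={effective_batch_size(per_device, n)}"
        )
```

The sketch only shows the arithmetic. It does not explain the accuracy gap reported here, since the gap persists after applying this scaling.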

muellerzr commented on June 2, 2024

Have you also tried scaling the learning rate to account for the multiple GPUs? (What I mean is that in multi-GPU training the scheduler is stepped N times, which could account for some of this.)
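
To make the point about scheduler stepping concrete, here is a small simulation, assuming a simple linear decay schedule. It is a sketch and not Accelerate's implementation: the functions `linear_decay_lr` and `lr_after`, the schedule shape, and all numbers are hypothetical. It only illustrates how the learning rate falls faster when the scheduler advances N times per optimizer step.

```python
def linear_decay_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Learning rate under a linear decay from base_lr to 0 over total_steps."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining


def lr_after(
    optimizer_steps: int,
    scheduler_steps_per_update: int,
    base_lr: float,
    total_steps: int,
) -> float:
    """Learning rate after some optimizer steps when the scheduler is
    advanced `scheduler_steps_per_update` times per optimizer step."""
    scheduler_position = optimizer_steps * scheduler_steps_per_update
    return linear_decay_lr(base_lr, scheduler_position, total_steps)


if __name__ == "__main__":
    base_lr = 0.1        # hypothetical starting learning rate
    total_steps = 1000   # hypothetical schedule length

    for n in (1, 4):     # 1 = single GPU, 4 = scheduler stepped 4x per update
        lr = lr_after(100, n, base_lr, total_steps)
        print(f"scheduler steps per update={n}: lr after 100 updates = {lr:.4f}")
```

Under these assumptions, the same number of optimizer steps leaves the 4x-stepped schedule much further along its decay, which is the effect the question is probing for.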

muellerzr commented on June 2, 2024

Thanks, let me try running this today and see what happens
