Comments (7)
Thanks, I found the solution. This is a bug in PyTorch: torch.utils.data.sampler uses the torch.randperm method, which does not work in a multiprocess environment when passed a value greater than 2**15. So I reduced the size of my dataset, and it works!
from accelerate.
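A minimal sketch of the workaround described above: cap the dataset size so the random sampler's internal torch.randperm call stays below 2**15 elements. The dataset and the exact limit here are illustrative assumptions, not taken from the original thread.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Illustrative dataset; the thread does not describe the real one.
full_dataset = TensorDataset(torch.arange(100_000).float())

# Keep at most 2**15 - 1 samples, since the comment above reports the hang
# when randperm is asked for more than 2**15 indices in a subprocess.
limit = 2**15 - 1
dataset = Subset(full_dataset, range(min(len(full_dataset), limit)))

# shuffle=True makes DataLoader use a RandomSampler, which calls torch.randperm.
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=2)
print(len(dataset))  # 32767
```

Subsetting is a stopgap rather than a fix; it simply keeps the sampler under the size where the reported hang appears.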
When I use the debug mode of PyCharm, I find that the procedure gets stuck in the dataloader loop.
You may need to set the env variable inside the training function. Can you share your full notebook code?
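A minimal sketch of the suggestion above, assuming accelerate's notebook_launcher workflow. The thread does not name the environment variable, so the one below is purely illustrative.

```python
import os

def training_function():
    # Set environment variables here, inside the function that accelerate's
    # notebook_launcher spawns, so every worker process sees them.
    # TOKENIZERS_PARALLELISM is an illustrative choice; the thread does not
    # say which variable was actually meant.
    os.environ["TOKENIZERS_PARALLELISM"] = "false"
    # ... build the model, dataloaders, and Accelerator, then train ...

# In a notebook one would launch with:
#   from accelerate import notebook_launcher
#   notebook_launcher(training_function, num_processes=2)
training_function()
print(os.environ["TOKENIZERS_PARALLELISM"])  # false
```

Setting the variable before the launch call, in the parent notebook process, may not propagate to the spawned workers, which is why it belongs inside the function.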
Thank you for your prompt reply, but it doesn't work either. And the code above is all I have; I'm trying to standardize my code with torchkeras.
Not sure if we support torchkeras models. Does isinstance(model, torch.nn.Module) return True?
Yes, isinstance(model, torch.nn.Module) returns True. And I think we can treat torchkeras as a normal training program.
Why are we combining accelerate then? I'm not sure those two can exist at the same time. This feels like a torchkeras issue more than an accelerate one.