python main.py ucf101 RGB ucf101_trainlist01new.txt ucf101_testlist01new.txt --gpus 1

dataloader runtime errors about tsn-pytorch (CLOSED)

yjxiong commented on July 21, 2024

dataloader runtime errors

ntuyt commented on July 21, 2024

If I set workers = 0, the error disappears.

yjxiong commented on July 21, 2024

Are you using the latest version of PyTorch?

ntuyt commented on July 21, 2024

Yes, I use the latest version.

(Email reply quoting yjxiong’s message sent Tuesday, October 3, 2017 11:02:12 PM, subject "Re: [yjxiong/tsn-pytorch] dataloader runtime errors (#17)".)

yjxiong commented on July 21, 2024

You only have one GPU, so use -j 1. Too high a -j value or too little GPU memory may cause this error.

yjxiong commented on July 21, 2024

Also, you are using a batch size of 128 with only 1 GPU. I don’t think this is feasible; it typically needs 4 GPUs.
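
To make the batch-size arithmetic concrete: with multi-GPU training in PyTorch (for example via `torch.nn.DataParallel`), each batch is split along its first dimension across the GPUs, so a single GPU must hold all 128 samples while four GPUs hold 32 each. The helper below is a hypothetical sketch that assumes an even split and a batch size divisible by the GPU count; it is not part of tsn-pytorch.

```python
def per_gpu_batch(batch_size: int, num_gpus: int) -> int:
    """Samples each GPU processes per step, assuming the batch is split
    evenly along dimension 0 (hypothetical helper, not from tsn-pytorch)."""
    if num_gpus < 1 or batch_size % num_gpus != 0:
        raise ValueError("batch_size must be a positive multiple of num_gpus")
    return batch_size // num_gpus


for gpus in (1, 4):
    print(f"{gpus} GPU(s): {per_gpu_batch(128, gpus)} samples per GPU")
```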

ntuyt commented on July 21, 2024

@yjxiong Actually, I have 4 GPUs. If I set -j 1, it gets stuck. Only -j 0 works.

yjxiong commented on July 21, 2024

I have explained why your case failed. Try using 4 GPUs instead.

ntuyt commented on July 21, 2024

I removed the --gpus option so that it automatically uses all 4 GPUs. It does not help.

yjxiong commented on July 21, 2024

What is the memory size of your GPUs? If setting -j 4 or lower doesn’t work, then I don’t know what the error is. We run this setting on 4 Titan X GPUs without problems.

ntuyt commented on July 21, 2024

Memory size is not the problem: I tested with batch size = 1, which uses very little GPU memory, and the problem still exists.

(Email reply quoting yjxiong’s message sent Tuesday, October 3, 2017 11:26:48 PM, subject "Re: [yjxiong/tsn-pytorch] dataloader runtime errors (#17)".)

yjxiong commented on July 21, 2024

Then I don’t have an immediate idea. From the log, the multiprocessing library used by the data loaders cannot open or write to the shared memory used for communication between processes. It may be a problem with your permission settings or OS, which I cannot identify on my side.
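
A similar inter-process shared-memory path can be exercised without PyTorch as a quick sanity check. The sketch below uses Python's standard `multiprocessing.shared_memory` module (Python 3.8+) rather than PyTorch's own tensor-sharing code, and assumes a POSIX system where the `fork` start method is available; on Linux such blocks are backed by `/dev/shm`. It creates a block in the parent, lets a child process write into it, and reads the bytes back. The function names are hypothetical.

```python
import multiprocessing as mp
from multiprocessing import shared_memory


def _child_write(name: str, payload: bytes) -> None:
    # Attach to the existing block by name and write the payload into it.
    shm = shared_memory.SharedMemory(name=name)
    try:
        shm.buf[:len(payload)] = payload
    finally:
        shm.close()


def shm_roundtrip(payload: bytes = b"tsn-shm-check") -> bool:
    """Return True if a child process can write to a shared-memory block
    created by the parent and the parent reads the same bytes back."""
    try:
        shm = shared_memory.SharedMemory(create=True, size=len(payload))
    except OSError as exc:  # e.g. /dev/shm missing or permission denied
        print("cannot create shared memory:", exc)
        return False
    try:
        ctx = mp.get_context("fork")  # POSIX only; avoids re-importing __main__
        proc = ctx.Process(target=_child_write, args=(shm.name, payload))
        proc.start()
        proc.join()
        data = bytes(shm.buf[:len(payload)])
        return proc.exitcode == 0 and data == payload
    finally:
        shm.close()
        shm.unlink()  # remove the block so nothing is left in /dev/shm


if __name__ == "__main__":
    print("shared memory round trip ok:", shm_roundtrip())
```

If this returns `False` or the child dies, that points to the OS or container setup rather than the training code. A successful tiny round trip does not, however, rule out a `/dev/shm` that is too small for full-size batches.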

ntuyt commented on July 21, 2024

Okay, thanks for your reply anyway. I can still run your code by setting -j 0.

(Email reply quoting yjxiong’s message sent Tuesday, October 3, 2017 11:35:31 PM, subject "Re: [yjxiong/tsn-pytorch] dataloader runtime errors (#17)".)

yjxiong commented on July 21, 2024

To be honest, I don’t think this is a good choice. It wastes too much time on data loading.

I’d rather suggest you either figure out the problem or go with Caffe TSN instead. I don’t want you to blame the code for running super slow at the end of the day.

ntuyt commented on July 21, 2024

Okay. Thanks so much for your advice.

(Email reply quoting yjxiong’s message dated 17/10/3 23:48 (GMT+08:00), subject "Re: [yjxiong/tsn-pytorch] dataloader runtime errors (#17)".)

ntuyt commented on July 21, 2024

I found that it is caused by a lack of shared memory.
After increasing the shared memory size, I can train with 8 workers!
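
A rough way to see why the worker count matters: in PyTorch's `DataLoader`, each worker process collates its batches into shared memory, and up to `num_workers * prefetch_factor` batches can be buffered at once (`prefetch_factor` defaults to 2 in recent PyTorch versions). The sketch below estimates that footprint. The per-sample shape (3 segments of 3×224×224 float32 RGB crops) is an assumption about a typical TSN setup, not a value taken from this thread, and real usage can be higher.

```python
def estimate_shm_bytes(batch_size: int, num_workers: int,
                       prefetch_factor: int = 2, num_segments: int = 3,
                       channels: int = 3, height: int = 224, width: int = 224,
                       bytes_per_elem: int = 4) -> int:
    """Rough shared-memory footprint of batches buffered by DataLoader workers.

    Assumes each sample is num_segments stacked (channels, height, width)
    float32 crops. With num_workers == 0, batches are built in the main
    process, so no worker shared memory is counted.
    """
    per_sample = num_segments * channels * height * width * bytes_per_elem
    in_flight_batches = num_workers * prefetch_factor
    return batch_size * per_sample * in_flight_batches


for workers in (0, 1, 8):
    need = estimate_shm_bytes(batch_size=128, num_workers=workers)
    print(f"-j {workers}: ~{need / 2**30:.2f} GiB")
```

Under these assumptions the estimate grows from 0 GiB at `-j 0` to about 0.43 GiB at `-j 1` and about 3.45 GiB at `-j 8`, which is consistent with `-j 0` being the only setting that worked on a small `/dev/shm`.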

yjxiong commented on July 21, 2024

Good to know. This should lead to reasonable training speed.

SmartPorridge commented on July 21, 2024

I ran into the same problem. Could you explain in detail how to solve it? I don’t understand the "lack of shared memory" part.
Thank you.

ntuyt commented on July 21, 2024

@JiqiangZhou Please run the command "df -k /dev/shm".
It shows the shared memory size on your PC or server.
If it is small, you will need to enlarge it.
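
The same figures can also be read from Python. The sketch below is a hypothetical helper (not part of tsn-pytorch) built on `shutil.disk_usage`, which reports total, used and available bytes for the filesystem containing a path; dividing by 1024 gives numbers roughly comparable to the `1K-blocks`, `Used` and `Available` columns of `df -k`.

```python
import shutil


def shm_usage_kib(path: str = "/dev/shm") -> dict:
    """Report size, used and available space of the filesystem holding `path`, in KiB."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    return {
        "size_kib": usage.total // 1024,
        "used_kib": usage.used // 1024,
        "avail_kib": usage.free // 1024,
    }
```

On Linux hosts `/dev/shm` is usually a tmpfs mount, which can be enlarged with, for example, `sudo mount -o remount,size=16G /dev/shm` (the size is only an example, and the change does not survive a reboot unless it is also made in `/etc/fstab`). Inside a Docker container the limit is fixed when the container starts, via `docker run --shm-size=...`.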

SmartPorridge commented on July 21, 2024

@ntuyt Available is only 65536 (KB); I think it’s too small! I will try to increase it. Thank you.
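
`df -k` reports sizes in 1 KiB blocks, so 65536 corresponds to 64 MiB, which is also Docker's default `/dev/shm` size. The hypothetical helper below extracts the `Available` column from `df -k` text and converts it. The sample output is illustrative, built around the value quoted here, and the parser assumes one row per filesystem (as `df -kP` guarantees, since long device names can otherwise wrap onto a second line).

```python
def df_available_kib(df_output: str) -> int:
    """Return the `Available` column (in KiB) from `df -k` / `df -kP`
    output for a single filesystem (hypothetical helper)."""
    lines = [line for line in df_output.strip().splitlines() if line.strip()]
    header, row = lines[0].split(), lines[1].split()
    return int(row[header.index("Available")])


# Illustrative output, not copied from the thread.
sample = """\
Filesystem     1K-blocks  Used Available Use% Mounted on
shm                65536     0     65536   0% /dev/shm
"""
avail_kib = df_available_kib(sample)
print(f"{avail_kib} KiB = {avail_kib / 1024:.0f} MiB")
```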

SmartPorridge commented on July 21, 2024

I tried many times, and /dev/shm needs at least 35GB of space.
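
Going the other way, a measured amount of free shared memory can be turned into a rough upper bound on `-j`. The sketch below is a hypothetical helper under a simple model in which up to `workers * prefetch_factor` collated batches sit in shared memory at once (PyTorch's `DataLoader` defaults `prefetch_factor` to 2). It ignores every other consumer of shared memory, so treat the result as an optimistic ceiling; the batch shape in the example is an assumption.

```python
def max_workers_for_shm(free_bytes: int, batch_bytes: int, requested: int,
                        prefetch_factor: int = 2) -> int:
    """Largest worker count <= requested whose buffered batches
    (workers * prefetch_factor of them) fit in free_bytes (hypothetical helper)."""
    if batch_bytes <= 0 or prefetch_factor <= 0:
        raise ValueError("batch_bytes and prefetch_factor must be positive")
    fits = free_bytes // (batch_bytes * prefetch_factor)
    return max(0, min(requested, fits))


# Assumed batch: 128 clips of 3 x 3 x 224 x 224 float32 values.
batch_bytes = 128 * 3 * 3 * 224 * 224 * 4
print(max_workers_for_shm(64 * 2**20, batch_bytes, requested=8))  # 0
print(max_workers_for_shm(35 * 2**30, batch_bytes, requested=8))  # 8
```

Because the model leaves out other shared-memory use, observed behavior on the actual machine should take precedence over this estimate.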
