
Comments (14)

hkchengrex commented on June 2, 2024

Can your dataloaders catch up? i.e. are the GPUs at (almost) full load all the time?
The reported training time is very rough (we used a mix of hardware at different times). We will re-train again and give a better estimate in the next revision of the paper.

In any case, it should take much less than 100h for s0, even with 2x 1080Ti. The most probable reason is a dataloader bottleneck.

from stcn.

PinxueGuo commented on June 2, 2024

I think that's highly likely, since my GPU utilization is sometimes far below 100%.
Can you give me some suggestions to solve it? Should I increase OMP_NUM_THREADS (in the command) or num_workers (in the PyTorch DataLoader)?
Thank you!


PinxueGuo commented on June 2, 2024

I find that a bigger num_workers really speeds things up in my case.


hkchengrex commented on June 2, 2024

That's great to hear. I think the general wisdom is to use a higher OMP_NUM_THREADS and num_workers when you have more free CPU cores available.

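The tuning advice above can be sketched as a minimal PyTorch DataLoader setup. The dataset and the specific worker/thread counts here are hypothetical placeholders, not STCN's actual configuration:

```python
import os

# OMP_NUM_THREADS must be set before torch/numpy are imported to take
# effect, which is why it is usually passed on the command line.
os.environ.setdefault("OMP_NUM_THREADS", "4")

import torch
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    """Stand-in for the real video dataset (hypothetical)."""
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return torch.randn(3, 64, 64)

loader = DataLoader(
    DummyDataset(),
    batch_size=16,
    shuffle=True,
    num_workers=2,    # raise this while free CPU cores remain
    pin_memory=True,  # speeds up host-to-GPU transfer
)

batch = next(iter(loader))
print(batch.shape)  # torch.Size([16, 3, 64, 64])
```

The trade-off: each worker is a separate process loading batches in parallel, so more workers hide data-augmentation latency, but too many workers plus a high OMP_NUM_THREADS can oversubscribe the CPU, which matches the observation below that very high thread counts got slower.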

PinxueGuo commented on June 2, 2024

OK. More num_workers definitely helps, and OMP_NUM_THREADS=4, as in your original setting, is the fastest in my case (1, 8, and 16 are all slower).
Thank you for your great work and quick reply!


hkchengrex commented on June 2, 2024

BTW you can try adding the --benchmark flag.

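For context, my assumption is that the --benchmark flag in this codebase maps to the standard cuDNN autotune toggle; a one-line sketch of that effect:

```python
import torch

# cuDNN benchmark mode profiles several convolution algorithms on the
# first forward pass and caches the fastest one. It helps when input
# shapes are fixed across iterations and can hurt when they vary.
torch.backends.cudnn.benchmark = True

print(torch.backends.cudnn.benchmark)  # True
```

This is why its benefit depends on the workload: with fixed-size video crops the autotuned kernels are reused every step, but the first few iterations pay the profiling cost.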

PinxueGuo commented on June 2, 2024

Thank you. I tried it, but it was not really effective.
And a bigger num_workers only brings a 10% speed improvement.
Could you tell me what the time consumption is in your log "retrain_s0 - It ******* [TRAIN] [time ]: ?"? In my case, time ≈ 1.0+.


PinxueGuo commented on June 2, 2024

Sorry, it should be about a 25% speed improvement (num_workers=16, 1x 3090, --nproc_per_node=1, bs=16).
log: retrain_s0 - It 51300 [TRAIN] [time ]: 1.0771173


hkchengrex commented on June 2, 2024

With 1x 3090 I am getting around 0.7 for [time].
2x 2080Ti should be faster than 1x 3090.


hkchengrex commented on June 2, 2024

Hmm, it's actually 0.7 around the start of training and stabilizes around 0.5.


PinxueGuo commented on June 2, 2024

I compared 2x 2080Ti with 1x 3090, and the result is that 1x 3090 is a little faster than 2x 2080Ti.
If [time] is around 0.5, s0 needs 45 hours, right?
In my case, s3 takes exactly 30 hours, so I want to confirm: does "Regular training without BL30K takes around 30 hours" (in the paper) refer to s3 or s0+s3?

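A quick back-of-the-envelope check of the ~45-hour estimate. The iteration count below is a placeholder assumption, not the repo's actual schedule; the per-iteration time is the stabilized [time] value reported above:

```python
# Hypothetical iteration count for stage s0; the real value comes from
# the training schedule in the repo's configuration.
S0_ITERATIONS = 300_000
SEC_PER_ITER = 0.5  # stabilized [time] reported above

hours = S0_ITERATIONS * SEC_PER_ITER / 3600
print(f"{hours:.1f} h")  # 41.7 h
```

At ~1.0 s/iteration instead of 0.5, the same arithmetic doubles the wall-clock time, which is consistent with the slower runs discussed in this thread.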

hkchengrex commented on June 2, 2024

It refers to s0+s3. I guess hardware infrastructure affects the training speed a lot.


PinxueGuo commented on June 2, 2024

OK. Thank you!


zhouweii234 commented on June 2, 2024

May I ask what the training time for stage 0 was after you used a bigger num_workers? (num_workers=16, 1x 3090, --nproc_per_node=1, bs=16) @BWYWTB

