Hello, can you share your machines and training time on Waymo? Such as ...

Question about training time on waymo about voxelnext HOT 12 CLOSED

dvlab-research commented on May 31, 2024

Question about training time on waymo

from voxelnext.

Comments (12)

sky-fly97 commented on May 31, 2024

I ran experiments on 4 A100 GPUs with 20% of the Waymo data. The k3 version, voxelnext_ioubranch_large.yaml, takes about twice as long as the 2D version, voxelnext2d_ioubranch.yaml: 10 hours for the former versus 4.5 hours for the latter.

yukang2017 commented on May 31, 2024

Hi @sky-fly97 ,

My machine (4 V100 GPUs) is very old, and its CPU is painfully slow at loading data. It took me about 5 days to train voxelnext2d_ioubranch.yaml and 1 week to train voxelnext_ioubranch_large.yaml.

Regards,
Yukang Chen
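
To put the two reports above side by side, the short sketch below converts the quoted wall-clock times into GPU-hours and computes the large/2D ratio per machine. It uses only the figures stated in this thread. The dictionary layout and the `summarize` helper are illustrative assumptions, and the data fraction for the V100 runs is not stated, so the two machines are not compared against each other.

```python
# Wall-clock training times quoted in this thread, in hours.
# The V100 entries are "about 5 days" and "1 week" converted to hours;
# the data fraction used for those runs is not stated in the thread.
runs = {
    "4x A100, 20% Waymo": {
        "voxelnext2d_ioubranch": 4.5,
        "voxelnext_ioubranch_large": 10.0,
    },
    "4x V100": {
        "voxelnext2d_ioubranch": 5 * 24.0,
        "voxelnext_ioubranch_large": 7 * 24.0,
    },
}
NUM_GPUS = 4  # both reports mention 4 GPUs


def summarize(hours_by_config, num_gpus=NUM_GPUS):
    """Return (GPU-hours per config, large/2D wall-clock ratio)."""
    gpu_hours = {name: h * num_gpus for name, h in hours_by_config.items()}
    ratio = (hours_by_config["voxelnext_ioubranch_large"]
             / hours_by_config["voxelnext2d_ioubranch"])
    return gpu_hours, ratio


for machine, hours in runs.items():
    gpu_hours, ratio = summarize(hours)
    print(machine, gpu_hours, f"large/2D = {ratio:.2f}x")
```

The ratio comes out smaller on the V100 machine than on the A100 machine. That would be consistent with data loading, rather than the model, dominating the time there, but the thread does not confirm this.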

sky-fly97 commented on May 31, 2024

Thanks~

seonhoon1002 commented on May 31, 2024

May I ask how many epochs you used when training on 20% of the Waymo dataset?
In my case, convergence takes very long and the performance is NaN when I use 20% of the Waymo dataset.

yukang2017 commented on May 31, 2024

Hi,

In my 20% training, I used exactly the same settings as for the full dataset: 12 epochs in both cases (8 GPUs).
That is strange. The training results for 100% and 20% of the data should not show such a gap.

Regards,
Yukang Chen

seonhoon1002 commented on May 31, 2024

Hello,

I solved the problem.

I actually reimplemented your code based on mmdetection3D.

VoxelNeXt/pcdet/utils/loss_utils.py

Line 421 in b5b7d39

class FocalLossSparse(nn.Module):

This class might be the problem: unlike other loss classes such as RegLossSparse, it has no step that splits the input by batch.

I could not find a similar function for that in your code.

In the end, I got the following results on the Waymo dataset with your "voxelnext_ioubranch_large.yaml":

Vehicle/L2 mAPH: 0.6657
Pedestrian/L2 mAPH: 0.6599
Cyclist/L2 mAPH: 0.7042
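
The comment above does not show what "splitting batches" looked like in the reimplementation, so the following is only one possible reading, sketched in NumPy rather than PyTorch to stay dependency-light. It assumes a CenterNet-style penalty-reduced focal loss over a flat list of voxels, plus a `batch_index` array saying which sample each voxel belongs to. The function names `focal_loss_sparse` and `focal_loss_per_sample` are hypothetical and are not taken from VoxelNeXt or mmdetection3D.

```python
import numpy as np


def focal_loss_sparse(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss over a flat set of voxels.

    pred:   (num_voxels,) predicted heatmap scores in (0, 1)
    target: (num_voxels,) Gaussian heatmap targets in [0, 1], 1.0 at positives
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    pos = target == 1.0
    neg = ~pos
    pos_loss = (np.log(pred) * (1.0 - pred) ** alpha)[pos].sum()
    neg_loss = (np.log(1.0 - pred) * pred ** alpha
                * (1.0 - target) ** beta)[neg].sum()
    num_pos = pos.sum()
    if num_pos == 0:
        return float(-neg_loss)
    return float(-(pos_loss + neg_loss) / num_pos)


def focal_loss_per_sample(pred, target, batch_index, batch_size):
    """Assumed 'batch splitting': normalize inside each sample, then average."""
    losses = []
    for b in range(batch_size):
        mask = batch_index == b
        losses.append(focal_loss_sparse(pred[mask], target[mask]))
    return float(np.mean(losses))


# Two samples flattened into one voxel list, as sparse heads usually see them.
pred = np.array([0.9, 0.2, 0.1, 0.8, 0.7, 0.6, 0.05])
target = np.array([1.0, 0.5, 0.0, 1.0, 1.0, 1.0, 0.0])
batch_index = np.array([0, 0, 0, 1, 1, 1, 1])

print("global normalization:    ", focal_loss_sparse(pred, target))
print("per-sample normalization:", focal_loss_per_sample(pred, target, batch_index, 2))
```

With global normalization, a sample with many positives dominates the divisor; with per-sample normalization each sample contributes equally. Whether this difference explains the NaN results reported earlier is not established in the thread.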

csinsgcc commented on May 31, 2024

Hi,

I have also been reproducing this with mmdetection3d recently, and I have run into the same problem of long convergence time.

I analyzed the loss and suspect that the focal loss may be the problem.

May I ask how you resolved the problem with focal loss?

seonhoon1002 commented on May 31, 2024

Hello,

From my perspective, your problem is not caused by the focal loss.

To build the targets, VoxelNeXt uses a "for loop" over every ground-truth box,

and it gets worse when you use multi-task head groups such as ArgoverseV2, because the targets have to be built for each task per ground truth.

I tried to write a faster target-assignment module, but this repository uses Gaussian Focal Loss (GFL) and nearest assignment, and I don't know how to avoid the "for loop" when implementing them, so I could not resolve the convergence problem.

Maybe someday, if someone writes a "CUDA version" of the target-assignment module, training will converge faster.

Good luck
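
As a rough illustration of the "for loop" concern above, the sketch below builds Gaussian heatmap targets with nearest-voxel positives in two ways: once with a loop over ground-truth centers and once with NumPy broadcasting. It is a simplified stand-in, not the repository's target assignment. It assumes 2D voxel coordinates, one isotropic `sigma` per box, and a single task head; both function names are hypothetical.

```python
import numpy as np


def heatmap_targets_loop(voxel_xy, gt_xy, sigma):
    """Reference version: one pass per ground-truth center."""
    heat = np.zeros(len(voxel_xy))
    for center, s in zip(gt_xy, sigma):
        d2 = ((voxel_xy - center) ** 2).sum(axis=1)   # squared distance to every voxel
        heat = np.maximum(heat, np.exp(-d2 / (2.0 * s ** 2)))
        heat[d2.argmin()] = 1.0                        # nearest voxel is the positive
    return heat


def heatmap_targets_vectorized(voxel_xy, gt_xy, sigma):
    """Same result without a Python loop, at the cost of an (N, M) matrix."""
    if len(gt_xy) == 0:
        return np.zeros(len(voxel_xy))
    diff = voxel_xy[:, None, :] - gt_xy[None, :, :]   # (N, M, 2)
    d2 = (diff ** 2).sum(axis=2)                      # (N, M)
    heat = np.exp(-d2 / (2.0 * sigma[None, :] ** 2)).max(axis=1)
    heat[d2.argmin(axis=0)] = 1.0                     # nearest voxel per box
    return heat


rng = np.random.default_rng(0)
voxel_xy = rng.uniform(0.0, 100.0, size=(2000, 2))   # N occupied voxels
gt_xy = rng.uniform(0.0, 100.0, size=(30, 2))        # M box centers
sigma = rng.uniform(1.0, 3.0, size=30)

loop_heat = heatmap_targets_loop(voxel_xy, gt_xy, sigma)
vec_heat = heatmap_targets_vectorized(voxel_xy, gt_xy, sigma)
print("max abs difference:", np.abs(loop_heat - vec_heat).max())
```

The vectorized version trades memory for speed, since it materializes an N-by-M distance matrix. For large scenes or many task heads it may need chunking over boxes, and whether it actually helps training time here has not been measured.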

csinsgcc commented on May 31, 2024

Thank you very much for your detailed reply~

Rzlnmc commented on May 31, 2024

Hi!

I also have a question about evaluation time.
Could you please tell me roughly how long evaluation and generating the final results took for you? In my training, the evaluation time seems normal, but generating the final results takes several times longer than the evaluation itself.
I would like to know whether this is normal.

Thanks.

seonhoon1002 commented on May 31, 2024

Hello,

Unfortunately, I can't remember the exact evaluation time. It has been too long...

But it is strange that generating the final results takes several times longer than the evaluation.

I'm sorry I can't help.

Good luck!

Rzlnmc commented on May 31, 2024

Thank you for your prompt response!

I am using the official configuration files and code from OpenPCDet for training, but generating the results takes four times longer than validation.

I have found some answers, but they haven't resolved the issue.

Anyway, thank you very much.
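
One way to narrow down a gap like the one described above is to time the two phases separately. The sketch below uses only the Python standard library. `run_inference` and `generate_results` are hypothetical placeholders for whatever the evaluation script actually calls, not OpenPCDet functions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, store):
    """Accumulate wall-clock seconds spent inside the block under store[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[name] = store.get(name, 0.0) + time.perf_counter() - start


# Hypothetical stand-ins for the two phases being compared.
def run_inference():
    time.sleep(0.01)            # pretend to run the model over the val set
    return [{"score": 0.9}, {"score": 0.4}]


def generate_results(detections):
    time.sleep(0.02)            # pretend to compute and write the final metrics
    return len(detections)


timings = {}
with timed("inference", timings):
    detections = run_inference()
with timed("result_generation", timings):
    num_results = generate_results(detections)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Wrapping the real calls this way shows which phase dominates before looking for a cause.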
