
Comments (12)

sky-fly97 avatar sky-fly97 commented on May 31, 2024

I ran experiments on 4 A100 GPUs with 20% of the Waymo data. The k3 version, voxelnext_ioubranch_large.yaml, seems to take about twice as long as the 2D version, voxelnext2d_ioubranch.yaml: the former takes 10 hours, the latter 4.5 hours.

from voxelnext.

yukang2017 avatar yukang2017 commented on May 31, 2024

Hi @sky-fly97 ,

My machine (4 V100 GPUs) is very, very old... Its CPU, which handles the data loading, really drives me crazy. It took me about 5 days to train voxelnext2d_ioubranch.yaml and 1 week to train voxelnext_ioubranch_large.yaml.

Regards,
Yukang Chen


sky-fly97 avatar sky-fly97 commented on May 31, 2024

Thanks~


seonhoon1002 avatar seonhoon1002 commented on May 31, 2024

Can I ask how many epochs you set when training on 20% of the Waymo dataset?
In my case, convergence takes very long and the performance is NaN when I use 20% of the Waymo dataset.


yukang2017 avatar yukang2017 commented on May 31, 2024

Hi,

In my 20% training, I used exactly the same settings as for the full dataset: 12 epochs in both cases (8 GPUs).
That is weird. The training results for 100% and 20% of the data should not show such a gap.

Regards,
Yukang Chen


seonhoon1002 avatar seonhoon1002 commented on May 31, 2024

Hello,

I solved the problem.

I actually reimplemented your code on top of mmdetection3D.

class FocalLossSparse(nn.Module):

This class might be the source of the problem: unlike other loss classes such as RegLossSparse, it has no step that splits its inputs by batch.

I don't think I found a similar function for that in your code.
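
For readers following the batch-splitting point, here is a minimal sketch of what a focal loss with per-sample normalization over sparse voxel predictions could look like. It is plain Python for readability and is not the repository's FocalLossSparse, nor necessarily the change made in the mmdetection3D port described above. The function name, the `batch_index` argument, and the `alpha`/`beta` defaults are assumptions; the defaults follow the commonly used CenterNet-style penalty-reduced focal loss.

```python
import math


def focal_loss_sparse(pred, target, batch_index=None, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss over a flat list of voxel scores (hypothetical sketch).

    pred        : predicted probabilities in (0, 1), one per non-empty voxel
    target      : heatmap targets in [0, 1]; exactly 1.0 marks a positive voxel
    batch_index : sample id of each voxel; None treats all voxels as one sample
    """
    if batch_index is None:
        batch_index = [0] * len(pred)

    loss_sum = {}  # sample id -> summed loss of its voxels
    num_pos = {}   # sample id -> number of positive voxels
    for p, t, b in zip(pred, target, batch_index):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if t == 1.0:
            term = -math.log(p) * (1.0 - p) ** alpha
            num_pos[b] = num_pos.get(b, 0) + 1
        else:
            term = -math.log(1.0 - p) * p ** alpha * (1.0 - t) ** beta
        loss_sum[b] = loss_sum.get(b, 0.0) + term

    # Normalize each sample by its own positive count, then average over samples.
    per_sample = [loss_sum[b] / max(num_pos.get(b, 0), 1) for b in loss_sum]
    return sum(per_sample) / len(per_sample)
```

Without `batch_index`, every voxel in the batch shares one normalizer (the total number of positives). With it, each sample is normalized by its own positives before averaging, so the two calls generally return different values for the same predictions.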

In the end, I got the following results on the Waymo dataset with your "voxelnext_ioubranch_large.yaml":

Vehicle/L2 mAPH: 0.6657
Pedestrian/L2 mAPH: 0.6599
Cyclist/L2 mAPH: 0.7042


csinsgcc avatar csinsgcc commented on May 31, 2024

Hi,

I have also been using mmdetection3d to reproduce the results recently, and I have run into the same problem of long convergence time.

I analyzed the loss and my guess is that the focal loss may be the problem.

May I ask how you resolved the problem with the focal loss?


seonhoon1002 avatar seonhoon1002 commented on May 31, 2024

Hello,

From my perspective, your problem is not caused by the focal loss.

To build the targets, VoxelNeXt uses a "for loop" over all ground-truth boxes,

and it gets worse when you use multi-task head groups, as with ArgoverseV2, because targets have to be built for each task per ground truth.

I tried to write a faster target-assignment module, but this repository uses Gaussian Focal Loss (GFL) and nearest assignment, and I don't know how to avoid the "for loop" when implementing those, so I couldn't resolve the convergence problem.

Maybe someday, if someone writes a "CUDA version" of the target-assignment module, training will converge faster.

Good luck
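
As a rough illustration of the loop-free direction mentioned above, the following sketch does nearest assignment and Gaussian heatmap generation for one sample and one class with NumPy broadcasting instead of a loop over ground-truth boxes. It is an assumption-laden sketch, not VoxelNeXt's target assigner: the function name, the 2D centers, and the per-box `gt_sigma` (for example derived from box size) are all hypothetical, and it has not been shown to fix the convergence issue discussed here.

```python
import numpy as np


def assign_targets(voxel_xy, gt_xy, gt_sigma):
    """Vectorized target assignment for one sample and one class (hypothetical sketch).

    voxel_xy : (N, 2) centers of the non-empty voxels
    gt_xy    : (G, 2) centers of the ground-truth boxes
    gt_sigma : (G,)   Gaussian spread per box
    Returns the (N,) heatmap and the (G,) index of the nearest voxel per box.
    """
    if len(gt_xy) == 0:
        # No objects: all-zero heatmap and no assignments.
        return np.zeros(len(voxel_xy)), np.zeros(0, dtype=np.int64)

    # Pairwise squared distances between boxes and voxels, shape (G, N).
    diff = gt_xy[:, None, :] - voxel_xy[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)

    # Nearest assignment: each box picks its closest non-empty voxel.
    nearest = dist2.argmin(axis=1)

    # One Gaussian per box, merged with an element-wise maximum over boxes.
    gauss = np.exp(-dist2 / (2.0 * gt_sigma[:, None] ** 2))
    heatmap = gauss.max(axis=0)

    # The assigned voxel is the positive location, so its target is exactly 1.
    heatmap[nearest] = 1.0
    return heatmap, nearest
```

The trade-off is memory: the distance matrix has G x N entries, so for multi-task heads one would call this once per task on that task's boxes, and a CUDA kernel could still be preferable for very large scenes.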


csinsgcc avatar csinsgcc commented on May 31, 2024

Thank you very much for your detailed reply~


Rzlnmc avatar Rzlnmc commented on May 31, 2024

Hi!

I also have a question about evaluation time.
Could you please let me know approximately how much time you spent on evaluation and on generating the final results? In my training, the time spent on evaluation seems normal, but generating the final results takes several times longer than the evaluation itself.
I would like to understand whether this is normal.

Thanks.


seonhoon1002 avatar seonhoon1002 commented on May 31, 2024

Hello,

Unfortunately, I can't remember the exact evaluation time. It has been too long...

But it is weird that generating the final results takes several times longer than the evaluation.

I'm sorry I can't help.

Good luck!


Rzlnmc avatar Rzlnmc commented on May 31, 2024

Thank you for your prompt response!

I am using the official configuration files and code from OpenPCDet for training, but generating the results takes four times longer than the validation.

I have found some answers, but they haven't resolved the issue.

Anyway, thank you very much.
