Comments (8)

pengsongyou commented on May 30, 2024

Hi @csuzhuzhuxia ,

If you use the 1- or 3-plane model on the ShapeNet dataset, you should be able to run at least 300K iterations in a day. It is strange that training takes so long for you. Maybe you can check where the overhead is. I suspect it might be in the data loading.

Best,
Songyou

from convolutional_occupancy_networks.

Runsong123 commented on May 30, 2024

Thanks for the quick reply!!
I will check it to find the overhead.

pengsongyou commented on May 30, 2024

@csuzhuzhuxia you have closed the issue but I am still curious where the overhead was? Thanks!

Runsong123 commented on May 30, 2024

Sorry for the late reply. I have been busy with other things these days, so I just closed the issue even though I did not find the reason. And thank you for your concern.
I think you mean 300K iterations in a day, not 300K epochs (for ShapeNet, 1 epoch = 959 iterations). But when I run the experiment with the ShapeNet 1-plane config, I get only about 22K iterations in a day. I guess this is because the dataset contains too many small files and they are stored on an HDD, so the data loader is slow.
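
To put the two throughput figures side by side, here is a minimal sketch that converts iterations per day into seconds per iteration and epochs per day. The numbers (22K and 300K iterations per day, 959 iterations per epoch) come from this thread; the function names are only illustrative. It works out to roughly 3.9 s per iteration versus about 0.29 s per iteration, a gap of more than 13x.

```python
SECONDS_PER_DAY = 24 * 60 * 60
ITERS_PER_EPOCH = 959  # figure quoted in this thread for ShapeNet


def seconds_per_iteration(iters_per_day):
    """Average wall-clock time of one iteration at the given daily throughput."""
    return SECONDS_PER_DAY / iters_per_day


def epochs_per_day(iters_per_day):
    """Number of full passes over the dataset per day."""
    return iters_per_day / ITERS_PER_EPOCH


for label, iters in [("observed", 22_000), ("expected", 300_000)]:
    print(
        f"{label}: {seconds_per_iteration(iters):.3f} s/iter, "
        f"{epochs_per_day(iters):.1f} epochs/day"
    )
```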

Thank you!

pengsongyou commented on May 30, 2024

Still strange, because I was also keeping the dataset on an HDD (or actually an SSD?) and it worked well. But yup, thanks for letting me know!

Runsong123 commented on May 30, 2024

Hello, to verify the cause I trained the model on another machine that has an SSD. On the SSD, the speed reaches nearly 300K iterations a day with the ShapeNet 1-plane setting.

zhu

pengsongyou commented on May 30, 2024

I just checked the machine I used to run the model, and I also kept the ShapeNet dataset on an HDD. Therefore, this should not be the problem. For the 1-plane model on ShapeNet with a GTX 1080ti and an HDD, I checked my output files again and it really took only 8 hours to reach 300K iterations.

I suggest checking the following:

  • The GPU that you are using. I was using a 2080ti and a 1080ti (the 1080ti is 20-30% slower than the 2080ti, but not by too much).
  • The num_workers value that you use in the config file.
  • The Volatile GPU-Util column in nvidia-smi. If it often shows 0%, the GPU is waiting for data loading on the CPU. This sometimes happened to me as well, but it disappeared after simply running the model again.
  • Use time.time() in the code to check exactly where it slows down. For example, measure the time spent in train_step and the time spent fetching batches from train_loader.
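
The time.time() suggestion could be sketched roughly as follows. This is only an illustration, not the repository's training loop: profile_training is a hypothetical helper, and loader and train_step stand in for whatever iterable and step function your script uses. It separates the time spent waiting for the next batch from the time spent in the step itself.

```python
import time


def profile_training(loader, train_step, max_iters=100):
    """Time batch fetching and train_step separately.

    Hypothetical helper: `loader` is any iterable of batches and
    `train_step` is any callable taking one batch.
    Returns (seconds waiting for data, seconds in train_step, iterations).
    """
    data_time = 0.0
    step_time = 0.0
    n_iters = 0
    batches = iter(loader)
    while n_iters < max_iters:
        t0 = time.time()
        try:
            batch = next(batches)  # blocks while the loader prepares data
        except StopIteration:
            break
        t1 = time.time()
        # Note: CUDA kernels run asynchronously. If train_step does not
        # force a sync (for example by reading the loss as a Python float),
        # call torch.cuda.synchronize() before taking t2.
        train_step(batch)
        t2 = time.time()
        data_time += t1 - t0
        step_time += t2 - t1
        n_iters += 1
    return data_time, step_time, n_iters


if __name__ == "__main__":
    def slow_loader():
        # Stand-in for a disk-bound data loader.
        for i in range(10):
            time.sleep(0.01)
            yield i

    data_s, step_s, n = profile_training(slow_loader(), lambda batch: None)
    print(f"{n} iters: data {data_s:.3f} s, step {step_s:.3f} s")
```

If the data time dominates, the bottleneck is likely on the loading side (disk, num_workers, preprocessing) rather than the GPU.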

Best,
Songyou

Runsong123 commented on May 30, 2024

Thanks for your quick reply.
I also compared the running time on different disks (SSD and HDD) with the same config provided by this repo, and the difference is very large. Now that I have changed the disk, the training speed is fine for me. So I guess there is some problem with my old HDD, although your test shows that the difference between a normal SSD and a normal HDD is not large. Lastly, thank you again for your patient reply!!

-Zhu
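
For anyone who wants to compare disks directly, a rough check is to time how long it takes to read a sample of the dataset's small files. The sketch below is an assumption-laden illustration: measure_read_speed is a hypothetical helper, and the path you pass in is whatever directory holds your dataset. Keep in mind that the operating system caches recently read files, so only the first run on a cold cache reflects the disk itself.

```python
import os
import time


def measure_read_speed(root, max_files=1000):
    """Read up to max_files files under root.

    Hypothetical helper for comparing disks.
    Returns (files read, bytes read, elapsed seconds).
    """
    n_files = 0
    n_bytes = 0
    start = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if n_files >= max_files:
                break
            with open(os.path.join(dirpath, name), "rb") as f:
                n_bytes += len(f.read())
            n_files += 1
        if n_files >= max_files:
            break
    elapsed = time.time() - start
    return n_files, n_bytes, elapsed


if __name__ == "__main__":
    # "." is a placeholder; point this at the dataset directory instead.
    files, size, secs = measure_read_speed(".", max_files=200)
    print(f"read {files} files ({size} bytes) in {secs:.3f} s")
```

Running it once against the HDD copy and once against the SSD copy of the same data gives a files-per-second figure that can be compared with the iteration rates reported above.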
