Comments (8)
Hi @csuzhuzhuxia ,
If you use the 1- or 3-plane model on the ShapeNet dataset, you should be able to run at least 300K iterations in a day. It is strange that training takes so long for you. Maybe you can check where the overhead is. I suspect it might be in the data loading.
Best,
Songyou
from convolutional_occupancy_networks.
Thanks for the quick reply!!
I will check it to find the overhead.
@csuzhuzhuxia you have closed the issue but I am still curious where the overhead was? Thanks!
Sorry for the late reply. I have been busy with other things these days, so I just closed the issue even though I did not find the cause. Thank you for your concern.
I think you mean 300K iterations in a day, not 300K epochs (for ShapeNet, 1 epoch = 959 iterations). But when I run the experiment with the shapenet-1-plane config, I get only about 22K iterations in a day. I guess this is because the dataset contains many small files and is stored on an HDD, so the data loader is slow.
Thank you!
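To put the two figures side by side, here is a small back-of-the-envelope conversion. It uses only the numbers quoted in this thread (959 iterations per epoch, about 22K versus 300K iterations per day); the helper name `per_second` is made up for illustration.

```python
ITERS_PER_EPOCH = 959  # ShapeNet figure quoted in the comment above


def per_second(iters, hours):
    """Convert an iteration count over a number of hours to iterations/second."""
    return iters / (hours * 3600)


slow = per_second(22_000, 24)   # throughput observed on the slow machine
fast = per_second(300_000, 24)  # the "300K iterations in a day" figure

print(f"slow: {slow:.2f} it/s, {22_000 / ITERS_PER_EPOCH:.1f} epochs/day")
print(f"fast: {fast:.2f} it/s, {300_000 / ITERS_PER_EPOCH:.1f} epochs/day")
print(f"days for 300K iterations at the slow rate: {300_000 / 22_000:.1f}")
```

At the slow rate, 300K iterations would take roughly two weeks instead of one day, which is why a gap of this size points to a bottleneck rather than normal hardware variation.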
Still strange because I was also keeping the dataset in HDD (or actually SSD?) and it worked well. But yup, thanks for letting me know!
Hello, I trained the model on another machine that has an SSD to verify the cause. On the SSD, the speed reaches nearly 300K iterations a day with the shapenet-1-plane setting.
zhu
I just checked the machine I used to run the model, and I also kept the ShapeNet dataset on an HDD. Therefore, this should not be the problem. For the 1-plane model on ShapeNet, when I used a GTX 1080 Ti with an HDD, I checked my output files again and it really took only 8 hours to reach 300K iterations.
I suggest checking the following:
- The GPU that you are using. I was using a 2080 Ti and a 1080 Ti (the 1080 Ti is 20-30% slower than the 2080 Ti, but not by much).
- The `num_workers` that you use in the config file.
- The volatile GPU-Util in nvidia-smi. If it often shows 0%, the GPU is waiting for the data loading from the CPU. This sometimes happened to me as well, but it disappeared after simply running the model again.
- Use time.time() in the code to check exactly where it slows down. For example, measure the runtime of `train_step` and of fetching a batch from the `train_loader`.
Best,
Songyou
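The timing check suggested above can be sketched roughly as follows. This is a minimal illustration, not code from the repository: `profile_loop`, `slow_loader`, and `fast_step` are hypothetical names, the loader and train step are stand-ins, and `time.perf_counter()` is used in place of `time.time()` because it is better suited to measuring short intervals.

```python
import time


def profile_loop(loader, train_step, max_iters=100):
    """Split wall-clock time into data-loading wait and train-step compute."""
    load_time = 0.0
    step_time = 0.0
    n_iters = 0
    it = iter(loader)
    while n_iters < max_iters:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting for the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # time spent in the training step itself
        t2 = time.perf_counter()
        load_time += t1 - t0
        step_time += t2 - t1
        n_iters += 1
    return {"iters": n_iters, "load_s": load_time, "step_s": step_time}


# Stand-ins for illustration only: a slow "loader" and a cheap "train step".
def slow_loader(n, delay=0.01):
    for i in range(n):
        time.sleep(delay)  # simulates slow reads of many small files
        yield i


def fast_step(batch):
    return batch * 2


stats = profile_loop(slow_loader(20), fast_step)
print(stats)
```

If `load_s` dominates `step_s`, the bottleneck is in data loading (disk or `num_workers`) rather than in the GPU. When the real train step runs on a GPU with PyTorch, CUDA kernels execute asynchronously, so calling `torch.cuda.synchronize()` before reading the clock gives more trustworthy step times.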
Thanks for your quick reply.
I also compared the running time on different disks (SSD and HDD) with the same config provided by this repo, and I found that the difference is very large. Now that I have changed the disk, the training speed is fine for me. So I guess there is some problem with my old HDD, although your test shows that the difference between a normal SSD and a normal HDD is not large. Finally, thank you again for your patient reply!!
-Zhu