Comments (5)
That is way too slow. We use num_workers=10, but that alone should not make such a big difference. For us, each epoch takes about 30 minutes with a dataset of 1800 training trajectories, i.e. ~5000 batches. Can you profile your run and see where the current bottleneck is?
from vcd.
I found that the bottleneck is torch_geometric.data.DataLoader. I tried setting --num_workers to values from 0 to 10, but it is still slow.
So I checked my PC environment and pyFlex installation again.
pyFlex compilation: Ubuntu 16.04, CUDA 9.0
conda (execution) environment (on Ubuntu 16.04): torch==1.9, torch_geometric==2.0.2, cudatoolkit==10.2
One question: is it OK that the CUDA version differs between pyFlex and conda?
Also, can you tell me your pyFlex compilation environment and the Ubuntu, Torch, and CUDA versions you used for training?
from vcd.
I use Python 3.6.9 under Ubuntu 18, with torch==1.9 and torch_geometric==2.0.2. CUDA 11.1 is used for both PyTorch and pyFlex. I would guess that these version differences should not matter. Since the dataloader is slow, can you check which part is slow? Is it loading data from disk, or moving data from CPU to GPU? How long does each part take?
One caveat: when generating and loading data, we filter the data, which could cause problems if your dataset is very small and you get unlucky.
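To separate the time spent waiting for a batch from the time spent on later steps such as the CPU-to-GPU copy, a small timing wrapper can help. The following is only a minimal sketch using the standard library; `time_stages`, `slow_batches`, and the stage name `to_device` are hypothetical names for illustration and are not part of the vcd code.

```python
import time
from collections import defaultdict


def time_stages(batches, stages):
    """Accumulate wall-clock seconds per stage over an iterable of batches.

    Time spent waiting for the next batch is recorded under "load".
    `stages` is an ordered list of (name, fn) pairs; each fn receives the
    output of the previous stage.
    """
    totals = defaultdict(float)
    iterator = iter(batches)
    while True:
        start = time.perf_counter()
        try:
            batch = next(iterator)
        except StopIteration:
            break
        totals["load"] += time.perf_counter() - start
        for name, fn in stages:
            start = time.perf_counter()
            batch = fn(batch)
            totals[name] += time.perf_counter() - start
    return dict(totals)


def slow_batches(n, delay):
    """Hypothetical stand-in for a slow dataloader."""
    for i in range(n):
        time.sleep(delay)  # simulates slow loading or filtering
        yield [i] * 4


if __name__ == "__main__":
    report = time_stages(slow_batches(3, 0.05), [("to_device", lambda b: list(b))])
    for name, seconds in report.items():
        print(f"{name}: {seconds:.3f}s")
```

With a real training script, `batches` would be the dataloader and the `to_device` stage would be something like `lambda b: b.to(device)`. Because CUDA copies can be asynchronous, calling `torch.cuda.synchronize()` inside that stage is probably needed for the number to be meaningful.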
from vcd.
Can you recommend any profiling tools?
I profiled the processing time of the functions inside "def __getitem__():" in dataset.py:
prepare_transition (data filtering): 100~600 s ==> mostly the "filtering on the data" you mentioned
everything else: under 0.01 s
It seems that the dataloader calls __getitem__ 16 times (batch_size), one after another.
So it takes approx. 300 s per call * 16 (batch_size) = 4800 s = 80 minutes to load one batch... :(
Do you have any suggestions for reducing the data filtering time?
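On the profiling-tools question, one option from the Python standard library is cProfile, which reports cumulative time per function for a single call. Below is a rough sketch of that approach; `ToyDataset`, `build_graph`, and `profile_call` are hypothetical stand-ins, and only the name `prepare_transition` is taken from the thread.

```python
import cProfile
import io
import pstats
import time


class ToyDataset:
    """Hypothetical stand-in for the dataset class in dataset.py."""

    def prepare_transition(self, idx):
        time.sleep(0.05)  # stands in for the slow filtering step
        return idx

    def build_graph(self, sample):
        return {"x": sample}

    def __getitem__(self, idx):
        return self.build_graph(self.prepare_transition(idx))


def profile_call(fn, *args, top=10):
    """Run fn(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    dataset = ToyDataset()
    item, report = profile_call(dataset.__getitem__, 0)
    print(report)
```

Profiling a single `__getitem__` call with num_workers=0 keeps the report readable, since the work of worker processes would not show up in the parent process's profile.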
from vcd.
I think this issue should be resolved by our new release.
from vcd.