Comments (7)
It seems that you are using the Python version of CIFAR-10 data set.
INFO:tensorflow:data_dir_local: /home/zgz/project/data_set/cifar-10-batches-py
Please use the binary version instead. Same issue as this one.
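A quick way to confirm which CIFAR-10 variant a directory holds is to look at the batch file names. This is a minimal sketch, not part of PocketFlow: the helper name `cifar10_variant` is hypothetical, and it relies only on the standard archive layout (`data_batch_1.bin` in the binary version, `data_batch_1` without an extension in the Python version).

```python
import os

def cifar10_variant(data_dir):
    """Guess which CIFAR-10 release a directory holds from its file names.

    Hypothetical helper for illustration; not a PocketFlow function.
    """
    names = set(os.listdir(data_dir))
    if 'data_batch_1.bin' in names:
        return 'binary'   # layout of cifar-10-batches-bin
    if 'data_batch_1' in names:
        return 'python'   # layout of cifar-10-batches-py
    return 'unknown'
```

Calling it on the `data_dir_local` path from the log above should return `'python'` if the directory really is the Python release.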
from pocketflow.
Traceback (most recent call last):
  File "utils/get_idle_gpus.py", line 54, in <module>
    raise ValueError('not enough idle GPUs; idle GPUs are: {}'.format(idle_gpus))
ValueError: not enough idle GPUs; idle GPUs are: []
‘nets/resnet_at_cifar10_run.py’ -> ‘main.py’
and
2018-11-06 14:54:44.429596: E tensorflow/stream_executor/cuda/cuda_driver.cc:397] failed call to cuInit: CUDA_ERROR_NO_DEVICE
Why are these two errors reported?
It seems PocketFlow failed to find an idle GPU device. Can you post the result of nvidia-smi?
$ nvidia-smi
`(tf-1.10-cp3) [zgz@localhost models]$ nvidia-smi
Tue Nov 6 15:05:06 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26 Driver Version: 396.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:06:00.0 Off | 0 |
| N/A 81C P0 64W / 149W | 110MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:07:00.0 Off | 0 |
| N/A 61C P0 73W / 149W | 110MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:85:00.0 Off | 0 |
| N/A 79C P0 61W / 149W | 110MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:86:00.0 Off | 0 |
| N/A 59C P0 71W / 149W | 110MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 17631 C ...gz/anaconda2/envs/tf-1.8-cp3/bin/python 99MiB |
| 1 17631 C ...gz/anaconda2/envs/tf-1.8-cp3/bin/python 99MiB |
| 2 17631 C ...gz/anaconda2/envs/tf-1.8-cp3/bin/python 99MiB |
| 3 17631 C ...gz/anaconda2/envs/tf-1.8-cp3/bin/python 99MiB |
+-----------------------------------------------------------------------------+`
`>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2018-11-06 12:46:55.594800: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-06 12:46:55.832609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:06:00.0
totalMemory: 11.17GiB freeMemory: 11.00GiB
2018-11-06 12:46:56.019314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:07:00.0
totalMemory: 11.17GiB freeMemory: 11.00GiB
2018-11-06 12:46:56.188525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 2 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:85:00.0
totalMemory: 11.17GiB freeMemory: 11.00GiB
2018-11-06 12:46:56.388330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 3 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:86:00.0
totalMemory: 11.17GiB freeMemory: 11.00GiB
2018-11-06 12:46:56.388891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3
2018-11-06 12:46:57.834106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-06 12:46:57.834160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 1 2 3
2018-11-06 12:46:57.834170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N Y N N
2018-11-06 12:46:57.834178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: Y N N N
2018-11-06 12:46:57.834184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: N N N Y
2018-11-06 12:46:57.834190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: N N Y N
2018-11-06 12:46:57.835321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:0 with 10662 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:06:00.0, compute capability: 3.7)
2018-11-06 12:46:57.955038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:1 with 10662 MB memory) -> physical GPU (device: 1, name: Tesla K80, pci bus id: 0000:07:00.0, compute capability: 3.7)
2018-11-06 12:46:58.078889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:2 with 10662 MB memory) -> physical GPU (device: 2, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
2018-11-06 12:46:58.196860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:3 with 10662 MB memory) -> physical GPU (device: 3, name: Tesla K80, pci bus id: 0000:86:00.0, compute capability: 3.7)
`
In utils/get_idle_gpus.py, a GPU device is treated as idle if there is no process running on it. According to your nvidia-smi results, each of these four GPUs has a process running on it, so utils/get_idle_gpus.py cannot find an idle one.
To temporarily work around this, you may skip calling utils/get_idle_gpus.py and manually specify an idle GPU in scripts/run_local.sh.
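The "no process running" rule described above can be sketched as follows. This is an illustration under stated assumptions, not the actual contents of utils/get_idle_gpus.py: the function names `find_idle_gpus` and `query_idle_gpus` are hypothetical, and the sketch matches GPUs to processes by UUID using the CSV query options of nvidia-smi.

```python
import subprocess

def find_idle_gpus(gpu_csv, apps_csv):
    """Return indices of GPUs with no compute process attached.

    gpu_csv:  lines of "index, uuid" (one per GPU)
    apps_csv: lines of "gpu_uuid, pid" (one per compute process)
    Hypothetical helper; not PocketFlow's implementation.
    """
    # Collect the UUIDs of GPUs that host at least one compute process.
    busy_uuids = set()
    for line in apps_csv.strip().splitlines():
        if line.strip():
            busy_uuids.add(line.split(',')[0].strip())

    # A GPU counts as idle when its UUID never appears in the process list.
    idle = []
    for line in gpu_csv.strip().splitlines():
        if not line.strip():
            continue
        index, uuid = [field.strip() for field in line.split(',')]
        if uuid not in busy_uuids:
            idle.append(int(index))
    return idle

def query_idle_gpus():
    """Run nvidia-smi twice and apply find_idle_gpus to its CSV output."""
    def run(query):
        cmd = ['nvidia-smi', query, '--format=csv,noheader']
        return subprocess.check_output(cmd).decode()
    gpus = run('--query-gpu=index,uuid')
    apps = run('--query-compute-apps=gpu_uuid,pid')
    return find_idle_gpus(gpus, apps)
```

With the nvidia-smi output posted above, where PID 17631 holds all four GPUs, a rule like this yields an empty list, which matches the `idle GPUs are: []` message. One common way to pin a job to a chosen device is to set the `CUDA_VISIBLE_DEVICES` environment variable before launching it. Whether scripts/run_local.sh does this is an assumption here.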
OK, thanks
Closing this issue. Reopen it if there are any further questions.