Comments (7)

jiaxiang-wu commented on May 16, 2024

It seems that you are using the Python version of CIFAR-10 data set.

INFO:tensorflow:data_dir_local: /home/zgz/project/data_set/cifar-10-batches-py

Please use the binary version instead. Same issue as this one.
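As a rough way to check which variant a directory holds, here is a minimal sketch. It assumes the standard CIFAR-10 archive layouts: the Python version contains pickled files such as `data_batch_1`, while the binary version contains `data_batch_1.bin`, where each record is one label byte followed by 3072 pixel bytes. The function names `detect_cifar10_variant` and `read_labels` are hypothetical and are not part of PocketFlow.

```python
import os

# Binary-version record: 1 label byte + 32*32*3 pixel bytes.
RECORD_BYTES = 1 + 32 * 32 * 3


def detect_cifar10_variant(data_dir):
    """Return 'binary', 'python', or 'unknown' for an extracted CIFAR-10 directory."""
    names = set(os.listdir(data_dir))
    if 'data_batch_1.bin' in names:
        return 'binary'
    if 'data_batch_1' in names:
        return 'python'
    return 'unknown'


def read_labels(bin_path):
    """Read the label byte of every record in a binary-version batch file."""
    with open(bin_path, 'rb') as f:
        buf = f.read()
    if len(buf) % RECORD_BYTES != 0:
        raise ValueError('file size is not a multiple of the record size')
    # Indexing a bytes object yields an int in Python 3.
    return [buf[i] for i in range(0, len(buf), RECORD_BYTES)]
```

Calling `detect_cifar10_variant` on the `data_dir_local` path from the log above should report `'python'` if the diagnosis here is right.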

from pocketflow.

as754770178 commented on May 16, 2024

Traceback (most recent call last):
  File "utils/get_idle_gpus.py", line 54, in <module>
    raise ValueError('not enough idle GPUs; idle GPUs are: {}'.format(idle_gpus))
ValueError: not enough idle GPUs; idle GPUs are: []
‘nets/resnet_at_cifar10_run.py’ -> ‘main.py’

and

2018-11-06 14:54:44.429596: E tensorflow/stream_executor/cuda/cuda_driver.cc:397] failed call to cuInit: CUDA_ERROR_NO_DEVICE

Why are these two errors reported?

jiaxiang-wu commented on May 16, 2024

It seems PocketFlow failed to find an idle GPU device. Can you post the result of nvidia-smi?

$ nvidia-smi

as754770178 commented on May 16, 2024

`(tf-1.10-cp3) [zgz@localhost models]$ nvidia-smi
Tue Nov 6 15:05:06 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:06:00.0 Off |                    0 |
| N/A   81C    P0    64W / 149W |    110MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:07:00.0 Off |                    0 |
| N/A   61C    P0    73W / 149W |    110MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   79C    P0    61W / 149W |    110MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:86:00.0 Off |                    0 |
| N/A   59C    P0    71W / 149W |    110MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     17631      C   ...gz/anaconda2/envs/tf-1.8-cp3/bin/python    99MiB |
|    1     17631      C   ...gz/anaconda2/envs/tf-1.8-cp3/bin/python    99MiB |
|    2     17631      C   ...gz/anaconda2/envs/tf-1.8-cp3/bin/python    99MiB |
|    3     17631      C   ...gz/anaconda2/envs/tf-1.8-cp3/bin/python    99MiB |
+-----------------------------------------------------------------------------+`

`>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2018-11-06 12:46:55.594800: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-06 12:46:55.832609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:06:00.0
totalMemory: 11.17GiB freeMemory: 11.00GiB
2018-11-06 12:46:56.019314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:07:00.0
totalMemory: 11.17GiB freeMemory: 11.00GiB
2018-11-06 12:46:56.188525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 2 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:85:00.0
totalMemory: 11.17GiB freeMemory: 11.00GiB
2018-11-06 12:46:56.388330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 3 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:86:00.0
totalMemory: 11.17GiB freeMemory: 11.00GiB
2018-11-06 12:46:56.388891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3
2018-11-06 12:46:57.834106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-06 12:46:57.834160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 1 2 3
2018-11-06 12:46:57.834170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N Y N N
2018-11-06 12:46:57.834178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: Y N N N
2018-11-06 12:46:57.834184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: N N N Y
2018-11-06 12:46:57.834190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: N N Y N
2018-11-06 12:46:57.835321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:0 with 10662 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:06:00.0, compute capability: 3.7)
2018-11-06 12:46:57.955038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:1 with 10662 MB memory) -> physical GPU (device: 1, name: Tesla K80, pci bus id: 0000:07:00.0, compute capability: 3.7)
2018-11-06 12:46:58.078889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:2 with 10662 MB memory) -> physical GPU (device: 2, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
2018-11-06 12:46:58.196860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:3 with 10662 MB memory) -> physical GPU (device: 3, name: Tesla K80, pci bus id: 0000:86:00.0, compute capability: 3.7)
`

jiaxiang-wu commented on May 16, 2024

In utils/get_idle_gpus.py, a GPU device is treated as idle only if no process is running on it. According to your nvidia-smi output, each of the four GPUs has a process running on it (PID 17631), so utils/get_idle_gpus.py cannot find an idle one.

To temporarily override this, you may skip calling utils/get_idle_gpus.py and manually specify an idle GPU in scripts/run_local.sh.
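The rule described above, together with a manual override, might look roughly like the sketch below. It is not the actual code of utils/get_idle_gpus.py, which is not shown in this thread. The function names and the `manual` parameter are hypothetical. The sketch assumes the CSV query mode of nvidia-smi (`--query-gpu=index,uuid` and `--query-compute-apps=gpu_uuid,pid` with `--format=csv,noheader`).

```python
import subprocess


def query_nvidia_smi(query):
    """Run nvidia-smi with a CSV query flag, e.g. '--query-gpu=index,uuid',
    and return each output row as a list of stripped strings."""
    out = subprocess.check_output(
        ['nvidia-smi', query, '--format=csv,noheader'],
        universal_newlines=True)
    return [[cell.strip() for cell in line.split(',')]
            for line in out.splitlines() if line.strip()]


def find_idle_gpus(gpu_rows, app_rows):
    """Return indices of GPUs that have no compute process.

    gpu_rows: [index, uuid] rows from '--query-gpu=index,uuid'
    app_rows: [gpu_uuid, pid] rows from '--query-compute-apps=gpu_uuid,pid'
    """
    busy_uuids = {row[0] for row in app_rows}
    return [int(index) for index, uuid in gpu_rows if uuid not in busy_uuids]


def select_gpus(gpu_rows, app_rows, nb_gpus, manual=None):
    """Pick GPUs to use; `manual` (hypothetical) skips the idle check entirely."""
    if manual is not None:
        return list(manual)
    idle_gpus = find_idle_gpus(gpu_rows, app_rows)
    if len(idle_gpus) < nb_gpus:
        raise ValueError('not enough idle GPUs; idle GPUs are: {}'.format(idle_gpus))
    return idle_gpus[:nb_gpus]


# Example wiring (requires an NVIDIA driver, so it is left commented out):
# gpu_rows = query_nvidia_smi('--query-gpu=index,uuid')
# app_rows = query_nvidia_smi('--query-compute-apps=gpu_uuid,pid')
# gpus = select_gpus(gpu_rows, app_rows, nb_gpus=1, manual=[0])
# os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(str(g) for g in gpus)
```

The selected indices would normally be passed on through the `CUDA_VISIBLE_DEVICES` environment variable, which has to be set before TensorFlow initializes CUDA.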

as754770178 commented on May 16, 2024

OK, thanks

jiaxiang-wu commented on May 16, 2024

Closing this issue. Reopen it if you have any further questions.
