
GPUtil

GPUtil is a Python module for getting the GPU status from NVIDIA GPUs using nvidia-smi. GPUtil locates all GPUs on the computer, determines their availability and returns an ordered list of available GPUs. Availability is based upon the current memory consumption and load of each GPU. The module is written with GPU selection for Deep Learning in mind, but it is not task/library specific and can be applied to any task where it may be useful to identify available GPUs.

Table of Contents

  1. Requirements
  2. Installation
  3. Usage
    1. Main functions
    2. Helper functions
  4. Examples
    1. Select first available GPU in Caffe
    2. Occupy only 1 GPU in TensorFlow
    3. Monitor GPU in a separate thread
  5. License

Requirements

NVIDIA GPU with the latest NVIDIA driver installed. GPUtil uses the program nvidia-smi to get the GPU status of all available NVIDIA GPUs. nvidia-smi should be installed automatically when you install your NVIDIA driver.

Supports both Python 2.X and 3.X.

Tested on CUDA driver version 390.77 with Python 2.7 and 3.5.

Installation

  1. Open a terminal (Ctrl+Shift+T)
  2. Type pip install gputil
  3. Test the installation
    1. Open a terminal in a folder other than the GPUtil folder
    2. Start a python console by typing python in the terminal
    3. In the newly opened python console, type:
      import GPUtil
      GPUtil.showUtilization()
    4. Your output should look something like the following, depending on your number of GPUs and their current usage:
       ID  GPU  MEM
      --------------
        0    0%   0%
      

Old way of installation

  1. Download or clone repository to your computer
  2. Add the GPUtil folder to your PYTHONPATH via ~/.bashrc
    1. Open a new terminal (Press Ctrl+Alt+T)
    2. Open bashrc:
      gedit ~/.bashrc
      
    3. Add your GPUtil folder to the environment variable PYTHONPATH (replace <path_to_gputil> with your folder path):
      export PYTHONPATH="$PYTHONPATH:<path_to_gputil>"
      
      Example:
      export PYTHONPATH="$PYTHONPATH:/home/anderskm/github/gputil"
      
    4. Save ~/.bashrc and close gedit
    5. Restart your terminal
  3. Test the installation
    1. Open a terminal in a folder other than the GPUtil folder
    2. Start a python console by typing python in the terminal
    3. In the newly opened python console, type:
      import GPUtil
      GPUtil.showUtilization()
    4. Your output should look something like the following, depending on your number of GPUs and their current usage:
       ID  GPU  MEM
      --------------
        0    0%   0%
      

Usage

To include GPUtil in your Python code, all you have to do is include it at the beginning of your script:

import GPUtil

Once included, all functions are available. The functions, along with a short description of their inputs, outputs and functionality, can be found in the following two sections.

Main functions

deviceIDs = GPUtil.getAvailable(order = 'first', limit = 1, maxLoad = 0.5, maxMemory = 0.5, includeNan=False, excludeID=[], excludeUUID=[])

Returns a list of IDs of available GPUs. Availability is determined based on current memory usage and load. The order, maximum number of devices, their maximum load and maximum memory consumption are determined by the input arguments.

  • Inputs
    • order - Determines the order in which the available GPU device ids are returned. order should be specified as one of the following strings:
      • 'first' - orders available GPU device ids by ascending id (default)
      • 'last' - orders available GPU device ids by descending id
      • 'random' - orders the available GPU device ids randomly
      • 'load' - orders the available GPU device ids by ascending load
      • 'memory' - orders the available GPU device ids by ascending memory usage
    • limit - limits the number of GPU device ids returned to the specified number. Must be a positive integer. (default = 1)
    • maxLoad - Maximum current relative load for a GPU to be considered available. GPUs with a load larger than maxLoad are not returned. (default = 0.5)
    • maxMemory - Maximum current relative memory usage for a GPU to be considered available. GPUs with a current memory usage larger than maxMemory are not returned. (default = 0.5)
    • includeNan - True/false flag indicating whether to include GPUs where either load or memory usage is NaN (indicating usage could not be retrieved). (default = False)
    • excludeID - List of IDs, which should be excluded from the list of available GPUs. See GPU class description. (default = [])
    • excludeUUID - Same as excludeID except it uses the UUID. (default = [])
  • Outputs
    • deviceIDs - list of all available GPU device ids. A GPU is considered available if its current load and memory usage are less than maxLoad and maxMemory, respectively. The list is ordered according to order. The maximum number of returned device ids is limited by limit.
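
For example, a minimal usage sketch (assuming at least one NVIDIA GPU is visible to nvidia-smi; the thresholds below are illustrative):

import GPUtil

# Request up to 2 GPUs, ordered by lowest memory usage first,
# that are at most 20% loaded and use at most 20% of their memory
deviceIDs = GPUtil.getAvailable(order='memory', limit=2, maxLoad=0.2, maxMemory=0.2)
print(deviceIDs)  # e.g. [1, 0]; an empty list if no GPU satisfies the constraints
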
deviceID = GPUtil.getFirstAvailable(order = 'first', maxLoad=0.5, maxMemory=0.5, attempts=1, interval=900, verbose=False, includeNan=False, excludeID=[], excludeUUID=[])

Returns the first available GPU. Availability is determined based on current memory usage and load, and the ordering is determined by the specified order. If no available GPU is found, an error is thrown. With the default values, it is the same as getAvailable(order = 'first', limit = 1, maxLoad = 0.5, maxMemory = 0.5).

  • Inputs
    • order - See the description for GPUtil.getAvailable(...)
    • maxLoad - Maximum current relative load for a GPU to be considered available. GPUs with a load larger than maxLoad are not returned. (default = 0.5)
    • maxMemory - Maximum current relative memory usage for a GPU to be considered available. GPUs with a current memory usage larger than maxMemory are not returned. (default = 0.5)
    • attempts - Number of attempts the function should make before giving up finding an available GPU. (default = 1)
    • interval - Interval in seconds between each attempt to find an available GPU. (default = 900 --> 15 mins)
    • verbose - If True, prints the attempt number before each attempt and the GPU id if an available GPU is found. (default = False)
    • includeNan - See the description for GPUtil.getAvailable(...). (default = False)
    • excludeID - See the description for GPUtil.getAvailable(...). (default = [])
    • excludeUUID - See the description for GPUtil.getAvailable(...). (default = [])
  • Outputs
    • deviceID - list with 1 element containing the first available GPU device id. A GPU is considered available if its current load and memory usage are less than maxLoad and maxMemory, respectively. The order and limit are fixed to 'first' and 1, respectively.
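
A short usage sketch (the argument values are illustrative):

import GPUtil

# Retry up to 3 times, waiting 600 seconds between attempts;
# an error is raised if no GPU becomes available
deviceID = GPUtil.getFirstAvailable(order='first', maxLoad=0.5, maxMemory=0.5,
                                    attempts=3, interval=600, verbose=True)
print(deviceID)  # e.g. [0]
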
GPUtil.showUtilization(all=False, attrList=None, useOldCode=False)

Prints the current status (id, memory usage, uuid, load) of all GPUs.

  • Inputs
    • all - True/false flag indicating if all info on the GPUs should be shown. Overwrites attrList.
    • attrList - List of lists of GPU attributes to display. See code for more information/example.
    • useOldCode - True/false flag indicating if the old code to display GPU utilization should be used.
  • Outputs
    • None
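
For example, the two most common calls look like this:

import GPUtil

GPUtil.showUtilization()          # compact view: id, GPU load and memory usage in percent
GPUtil.showUtilization(all=True)  # extended view with all attributes (overrides attrList)
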

Helper functions

 class GPU

Helper class that handles the attributes of each GPU. Quoted descriptions are copied from the corresponding descriptions given by nvidia-smi.

  • Attributes for each GPU
    • id - "Zero based index of the GPU. Can change at each boot."
    • uuid - "This value is the globally unique immutable alphanumeric identifier of the GPU. It does not correspond to any physical label on the board. Does not change across reboots."
    • load - Relative GPU load. 0 to 1 (100%, full load). "Percent of time over the past sample period during which one or more kernels was executing on the GPU. The sample period may be between 1 second and 1/6 second depending on the product."
    • memoryUtil - Relative memory usage from 0 to 1 (100%, full usage). "Percent of time over the past sample period during which global (device) memory was being read or written. The sample period may be between 1 second and 1/6 second depending on the product."
    • memoryTotal - "Total installed GPU memory."
    • memoryUsed - "Total GPU memory allocated by active contexts."
    • memoryFree - "Total free GPU memory."
    • driver - "The version of the installed NVIDIA display driver."
    • name - "The official product name of the GPU."
    • serial - This number matches the serial number physically printed on each board. It is a globally unique immutable alphanumeric value.
    • display_mode - "A flag that indicates whether a physical display (e.g. monitor) is currently connected to any of the GPU's connectors. "Enabled" indicates an attached display. "Disabled" indicates otherwise."
    • display_active - "A flag that indicates whether a display is initialized on the GPU's (e.g. memory is allocated on the device for display). Display can be active even when no monitor is physically attached. "Enabled" indicates an active display. "Disabled" indicates otherwise."
GPUs = GPUtil.getGPUs()
  • Inputs
    • None
  • Outputs
    • GPUs - list of all GPUs. Each GPU corresponds to one GPU in the computer and contains a device id, relative load and relative memory usage.
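
A small sketch showing how to read a few of the GPU attributes listed above (assuming at least one NVIDIA GPU is present):

import GPUtil

for gpu in GPUtil.getGPUs():
    print('GPU {} ({}): load {:.0f}%, memory {:.0f}/{:.0f} MiB used'.format(
        gpu.id, gpu.name, gpu.load * 100, gpu.memoryUsed, gpu.memoryTotal))
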
GPUavailability = GPUtil.getAvailability(GPUs, maxLoad = 0.5, maxMemory = 0.5, includeNan=False, excludeID=[], excludeUUID=[])

Given a list of GPUs (see GPUtil.getGPUs()), return an equally sized list of ones and zeros indicating which of the corresponding GPUs are available.

  • Inputs
    • GPUs - List of GPUs. See GPUtil.getGPUs()
    • maxLoad - Maximum current relative load for a GPU to be considered available. GPUs with a load larger than maxLoad are not returned. (default = 0.5)
    • maxMemory - Maximum current relative memory usage for a GPU to be considered available. GPUs with a current memory usage larger than maxMemory are not returned. (default = 0.5)
    • includeNan - See the description for GPUtil.getAvailable(...). (default = False)
    • excludeID - See the description for GPUtil.getAvailable(...). (default = [])
    • excludeUUID - See the description for GPUtil.getAvailable(...). (default = [])
  • Outputs
    • GPUavailability - binary list indicating whether each GPU is available. A GPU is considered available if its current load and memory usage are less than maxLoad and maxMemory, respectively.
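
A short sketch combining GPUtil.getGPUs() and GPUtil.getAvailability():

import GPUtil

GPUs = GPUtil.getGPUs()
GPUavailability = GPUtil.getAvailability(GPUs, maxLoad=0.5, maxMemory=0.5)
for gpu, available in zip(GPUs, GPUavailability):
    print('GPU {} available: {}'.format(gpu.id, bool(available)))
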

See demo_GPUtil.py for examples and more details.

Examples

Select first available GPU in Caffe

In the Deep Learning library Caffe, the user can switch between using the CPU or GPU through their Python interface. This is done by calling the methods caffe.set_mode_cpu() and caffe.set_mode_gpu(), respectively. Below is a minimum working example for selecting the first available GPU with GPUtil to run a Caffe network.

# Import os, caffe and GPUtil
import os
import caffe
import GPUtil

# Set CUDA_DEVICE_ORDER so the IDs assigned by CUDA match those from nvidia-smi
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Get the first available GPU
DEVICE_ID_LIST = GPUtil.getFirstAvailable()
DEVICE_ID = DEVICE_ID_LIST[0] # grab first element from list

# Select GPU mode
caffe.set_mode_gpu()
# Select GPU id
caffe.set_device(DEVICE_ID)

# Initialize your network here

Note: At the time of writing this example, the Caffe Python wrapper only supports 1 GPU, although the underlying code supports multiple GPUs. Calling Caffe directly from the terminal allows for using multiple GPUs.

Occupy only 1 GPU in TensorFlow

By default, TensorFlow will occupy all available GPUs when using a GPU as a device (e.g. tf.device('/gpu:0')). By setting the environment variable CUDA_VISIBLE_DEVICES, the user can mask which GPUs should be visible to TensorFlow via CUDA (see CUDA_VISIBLE_DEVICES - Masking GPUs). Using GPUtil, CUDA_VISIBLE_DEVICES can be set programmatically based on the available GPUs. Below is a minimum working example of how to occupy only 1 GPU in TensorFlow using GPUtil. To run the code, copy it into a new Python file (e.g. demo_tensorflow_gputil.py) and run it (e.g. enter python demo_tensorflow_gputil.py in a terminal).

Note: Even if you set the device you run your code on to a CPU, TensorFlow will occupy all available GPUs. To avoid this, all GPUs can be hidden from TensorFlow with os.environ["CUDA_VISIBLE_DEVICES"] = ''.

# Import os to set the environment variable CUDA_VISIBLE_DEVICES
import os
import tensorflow as tf
import GPUtil

# Set CUDA_DEVICE_ORDER so the IDs assigned by CUDA match those from nvidia-smi
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Get the first available GPU
DEVICE_ID_LIST = GPUtil.getFirstAvailable()
DEVICE_ID = DEVICE_ID_LIST[0] # grab first element from list

# Set CUDA_VISIBLE_DEVICES to mask out all other GPUs than the first available device id
os.environ["CUDA_VISIBLE_DEVICES"] = str(DEVICE_ID)

# Since all other GPUs are masked out, the first available GPU will now be identified as GPU:0
device = '/gpu:0'
print('Device ID (unmasked): ' + str(DEVICE_ID))
print('Device ID (masked): ' + str(0))

# Run a minimum working example on the selected GPU
# Start a session
with tf.Session() as sess:
    # Select the device
    with tf.device(device):
        # Declare two numbers and add them together in TensorFlow
        a = tf.constant(12)
        b = tf.constant(30)
        result = sess.run(a+b)
        print('a+b=' + str(result))

Your output should look something like the code block below. Notice how only one of the GPUs is found and created as a TensorFlow device.

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Device: /gpu:0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:02:00.0
Total memory: 11.90GiB
Free memory: 11.76GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:02:00.0)
a+b=42

Comment out the os.environ["CUDA_VISIBLE_DEVICES"] = str(DEVICE_ID) line and compare the two outputs. Depending on your number of GPUs, your output should look something like the code block below. Notice how all 4 GPUs are found and created as TensorFlow devices, whereas when CUDA_VISIBLE_DEVICES was set, only 1 GPU was found and created.

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Device: /gpu:0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:02:00.0
Total memory: 11.90GiB
Free memory: 11.76GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2c8e400
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:03:00.0
Total memory: 11.90GiB
Free memory: 11.76GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2c92040
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:83:00.0
Total memory: 11.90GiB
Free memory: 11.76GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2c95d90
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:84:00.0
Total memory: 11.90GiB
Free memory: 11.76GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2:   N N Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3:   N N Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: TITAN X (Pascal), pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: TITAN X (Pascal), pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: TITAN X (Pascal), pci bus id: 0000:84:00.0)
a+b=42

Monitor GPU in a separate thread

If using GPUtil to monitor GPUs during training, it may show 0% utilization. A way around this is to use a separate monitoring thread.

import GPUtil
from threading import Thread
import time

class Monitor(Thread):
    def __init__(self, delay):
        super(Monitor, self).__init__()
        self.stopped = False
        self.delay = delay # Time between calls to GPUtil
        self.start()

    def run(self):
        while not self.stopped:
            GPUtil.showUtilization()
            time.sleep(self.delay)

    def stop(self):
        self.stopped = True
        
# Instantiate monitor with a 10-second delay between updates
monitor = Monitor(10)

# Train, etc.

# Close monitor
monitor.stop()

License

See LICENSE

gputil's People

Contributors

anderskm, bashbug, djsutherland, ifeherva, jfainberg, madsdyrmann, neilconway, tmshn, zijwang


gputil's Issues

showUtilization causes GPU stuttering

Running a simple looped call to this function (showUtilization) causes stuttering in games (recordable in frame times), as shown in the third-party testing below:

(GIF omitted; it was taken from another project, but the script below produces the same issue.)

To Reproduce
Steps to reproduce the behavior:

  1. Open a browser page to https://www.testufo.com/animation-time-graph
  2. Allow the test to settle
  3. Run test script

Test Script

import time
import GPUtil

while True:
    GPUtil.showUtilization()
    time.sleep(1)

Get GPUs that are not used by any other user

I often don't care about the memory usage of my GPUs, but I care a lot if someone else is using the GPUs.

Is there any way, like with the gpustat command, to call GPUtil.getFirstAvailable(order='memory', maxLoad=1, maxMemory=1) with something like isUsed=False?

I have processes running on 1000s of GPUs, and I don't want to run on GPUs already used by myself or other people.
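
Until something like isUsed is supported, one rough workaround is to treat low allocated memory as a proxy for "not used by anyone"; the 100 MiB threshold below is an arbitrary assumption, and this cannot tell your own processes apart from other users':

import GPUtil

# Approximate "unused" as "almost no memory allocated"; the threshold is a guess
unused_ids = [gpu.id for gpu in GPUtil.getGPUs() if gpu.memoryUsed < 100]
print(unused_ids)
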

GPUtil.showUtilization does not work for individual attrList

The showUtilization function offers the possibility to restrict the output given an attrList list in the parameters.

However, if such an attrList is defined in the parameter list, it never makes it to its processing. The function first decides whether "all" is set or not. In both cases, either the output ("oldCode") is printed directly or the attrList parameter is overwritten, regardless of whether it has been set or not.

It's just a small thing, but it would be convenient to be able to restrict the output to only the few fields one needs for debugging ...

Thanks,
Andre

ValueError when nvidia-smi finds no GPU

In Line 90:

lines = output.split(os.linesep)

returns [''] instead of [] when nvidia-smi finds no GPU, which then causes ValueError by the parser.

Suggested update:

lines = list(filter(None, output.split(os.linesep)))

Over 60 times slower than nvidia-smi to assess resource usage

Easiest way to replicate would be:

time:

import nvidia_smi
import numpy as np


nvidia_smi.nvmlInit()

for _ in range(50):
        gpus = [nvidia_smi.nvmlDeviceGetHandleByIndex(i) for i in range(nvidia_smi.nvmlDeviceGetCount())]
        res_arr = [nvidia_smi.nvmlDeviceGetUtilizationRates(handle) for handle in gpus]
        print('Usage with nvidia-smi: ', np.sum([res.gpu for res in res_arr]), '%')

Then time:

import GPUtil
import numpy as np

for _ in range(50):
        res_arr = GPUtil.getGPUs()
        print('Usage with GPUtil: ', np.sum([res.load for res in res_arr])*100, '%')

YMMV here but for the first one I get constant reports of 1% GPU utilization and runtime is:

real    0m0,179s
user    0m0,688s
sys     0m0,818s

For the second one, GPU utilization climbs to a whopping 93% by the 6th call and the runtime is:

real    0m11,267s
user    0m0,605s
sys     0m11,449s

getGPUs() seems to be fairly close to what nvidia-smi does with nvmlDeviceGetUtilizationRates, and quite frankly it being ~63x slower and consuming ~100% of my GPU (RTX 2080) to run, as opposed to 1%, seems a bit unreasonable.

Since many people use this library to figure out GPU utilization, it might be reasonable to try to have a more efficient version of getGPUs for that, or, if it provides some "extra" features (e.g. it samples 100 calls and averages them out), a way to control that behaviour might be welcome.

Or maybe I'm doing something completely wrong here, in which case, let me know.

GPUtil CPU Usage

Hi, I notice when using GPUtil that the CPU usage is much higher than with pynvml. Can anyone explain why or assist me?

Using GPUtil

#!/usr/bin/python
import GPUtil
gpu = GPUtil.getGPUs()[0]
gpu_util = int(gpu.load * 100)
gpu_temp = int(gpu.temperature)
$ /usr/bin/time -v ./GPUtil-test.py
        Command being timed: "./GPUtil-test.py"
        User time (seconds): 0.21
        System time (seconds): 0.43
        Percent of CPU this job got: 481%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.13
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 26088
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 7978
        Voluntary context switches: 32
        Involuntary context switches: 769
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Using pynvml

#!/usr/bin/python
import pynvml as nv
nv.nvmlInit()
handle = nv.nvmlDeviceGetHandleByIndex(0)
gpu_util = nv.nvmlDeviceGetUtilizationRates(handle).gpu
gpu_temp = nv.nvmlDeviceGetTemperature(handle, nv.NVML_TEMPERATURE_GPU)
nv.nvmlShutdown()
$ /usr/bin/time -v ./pynvml-test.py 
        Command being timed: "./pynvml-test.py "
        User time (seconds): 0.02
        System time (seconds): 0.01
        Percent of CPU this job got: 84%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 15732
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 2454
        Voluntary context switches: 2
        Involuntary context switches: 2
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Add Kubernetes support through device plugins

Hey,

I've recently been using Kubernetes on Azure through their AKS and have a couple of Python packages that use this project as a dependency. In order for Kubernetes to support a wide range of devices, they developed a standard device-plugin interface to get information about devices on the machine; see https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md and https://github.com/NVIDIA/k8s-device-plugin

Unfortunately, this interface replaces nvidia-smi when Kubernetes is running, and as such this project will report 0 GPUs found even when there may be a few attached to the machine.

Would it be possible to add support for finding GPUs through this interface to this project? I'm happy to give it a go and try to add support.

NameError: name 'unicode' is not defined

Hi,

On Windows 10 (64 bit), I'm getting the following error:

Python 3.6.5 | packaged by conda-forge | (default, Apr 6 2018, 16:13:55) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import GPUtil
In [2]: GPUtil.showUtilization()


NameError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 GPUtil.showUtilization()

~\Anaconda3\lib\site-packages\GPUtil\GPUtil.py in showUtilization(all, attrList, useOldCode)
    248             elif (isinstance(attr,str)):
    249                 attrStr = attr;
--> 250             elif (isinstance(attr,unicode)):
    251                 attrStr = attr.encode('ascii','ignore')
    252             else:

NameError: name 'unicode' is not defined

Any idea how to fix this?

Thanks a lot!

GPU util stuck at 0%?

I'm having a strange issue on various machines where every call to showUtilization() shows 0% GPU util, even though nvidia-smi at the same time reports 100%. It does, however, correctly show memory usage. Any idea why this might occur?

Thanks for writing this utility!

ValueError: invalid literal for int() with base 10: 'No devices were found'

When running GPUtil.getGPUs() with 0 available GPUs, I get an error on line 102 in GPUtil.py. Line 92 assumes the number of available devices is returned, but it doesn't account for the fact that you can get the string "No devices were found" as output and instead returns the number of devices as 1. This errors out on line 102, as we can't cast the string to an int.

Should be an easy enough fix; we would just need a check after line 92, if numDevices == 1, to make sure it's an actual number and not the string. A sketch is given below.
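
For illustration, a guard along these lines would work (a sketch only, not the actual GPUtil code; the query fields and names are illustrative):

import os
from subprocess import Popen, PIPE

def deviceLines():
    # Return the per-GPU CSV lines, or an empty list when nvidia-smi
    # reports "No devices were found" instead of data
    p = Popen(['nvidia-smi', '--query-gpu=index,uuid', '--format=csv,noheader,nounits'], stdout=PIPE)
    output = p.stdout.read().decode('UTF-8')
    if 'No devices were found' in output:
        return []
    return list(filter(None, output.split(os.linesep)))
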

Request: Add all query information from nvidia-smi

This is a really handy module. It would be even better if you could access more information available in nvidia-smi -q. For example:

nvidia-smi.exe -q -i 0                                                                 
==============NVSMI LOG==============

Timestamp                           : Wed Aug 12 20:36:37 2020
Driver Version                      : 442.92
CUDA Version                        : 10.2

Attached GPUs                       : 4
GPU 00000000:18:00.0
    Product Name                    : GeForce GTX 1080 Ti
    Product Brand                   : GeForce
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : N/A
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : WDDM
        Pending                     : WDDM
    Serial Number                   : N/A
    GPU UUID                        : GPU-95ef7c5d-fc11-835b-cd38-2020193cf8e0
    Minor Number                    : N/A
    VBIOS Version                   : 86.02.39.00.22
    MultiGPU Board                  : No
    Board ID                        : 0x1800
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.01.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x18
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1B0610DE
        Bus Id                      : 00000000:18:00.0
        Sub System Id               : 0x85E51043
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 172000 KB/s
        Rx Throughput               : 9000 KB/s
    Fan Speed                       : 36 %
    Performance State               : P2
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 11264 MiB
        Used                        : 458 MiB
        Free                        : 10806 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 229 MiB
        Free                        : 27 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 2 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 68 C
        GPU Shutdown Temp           : 96 C
        GPU Slowdown Temp           : 93 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 65.79 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 125.00 W
        Max Power Limit             : 300.00 W
    Clocks
        Graphics                    : 1480 MHz
        SM                          : 1480 MHz
        Memory                      : 5005 MHz
        Video                       : 1265 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 1911 MHz
        SM                          : 1911 MHz
        Memory                      : 5505 MHz
        Video                       : 1620 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 1628
            Type                    : C+G
            Name                    : Insufficient Permissions
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 10220
            Type                    : C+G
            Name                    : C:\Windows\explorer.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 10416
            Type                    : C+G
            Name                    : C:\Program Files\WindowsApps\Microsoft.Windows.Photos_2020.20070.10002.0_x64__8wekyb3d8bbwe\Microsoft.Photos.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 10988
            Type                    : C+G
            Name                    : C:\Windows\SystemApps\Microsoft.Windows.StartMenuExperienceHost_cw5n1h2txyewy\StartMenuExperienceHost.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 11376
            Type                    : C+G
            Name                    : C:\Windows\SystemApps\Microsoft.Windows.Cortana_cw5n1h2txyewy\SearchUI.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 11896
            Type                    : C+G
            Name                    : C:\Program Files\WindowsApps\Microsoft.YourPhone_1.20071.95.0_x64__8wekyb3d8bbwe\YourPhone.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 13212
            Type                    : C+G
            Name                    : C:\Program Files\WindowsApps\Microsoft.SkypeApp_15.63.76.0_x86__kzf8qxf38zg5c\Skype\Skype.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 14404
            Type                    : C+G
            Name                    : Insufficient Permissions
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 14516
            Type                    : C+G
            Name                    : C:\Windows\SystemApps\Microsoft.MicrosoftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 14568
            Type                    : C+G
            Name                    : C:\Windows\ImmersiveControlPanel\SystemSettings.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 15168
            Type                    : C+G
            Name                    : C:\Windows\SystemApps\InputApp_cw5n1h2txyewy\WindowsInternal.ComposableShell.Experiences.TextInput.InputApp.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 15536
            Type                    : C+G
            Name                    : C:\Windows\System32\MicrosoftEdgeCP.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 16048
            Type                    : C+G
            Name                    : C:\Windows\SystemApps\Microsoft.LockApp_cw5n1h2txyewy\LockApp.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 16564
            Type                    : C+G
            Name                    : C:\Windows\SystemApps\ShellExperienceHost_cw5n1h2txyewy\ShellExperienceHost.exe
            Used GPU Memory         : Not available in WDDM driver model

Please add a __version__ attribute.

Thank you for creating this module. If possible, please add __version__.

>>> GPUtil.__version__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'GPUtil' has no attribute '__version__'
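
For reference, a minimal sketch of what this could look like (hypothetical; not current GPUtil code):

# GPUtil/__init__.py (sketch)
__version__ = '1.4.0'  # example value, kept in sync with setup.py
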

GPU IDs mixed up when the slowest GPU is on the first bus

I understand that GPUtil infers the GPUs' attributes so that they match the nvidia-smi output.

The thing is that GPUtil is commonly used with TensorFlow or other GPU-utilizing frameworks, and these frameworks usually assign IDs sorted by GPU quality.
For example, in TensorFlow, if you set CUDA_VISIBLE_DEVICES = '0' in your environment variables, only the fastest GPU will be exposed to the library.

In my setup, I have two different GPUs on the same machine. During runtime I use GPUtil to figure out which GPU has the most memory available and, using that GPU ID, I designate a GPU to use. But since my slowest GPU is installed on the first bus, it shows up in GPUtil as 0 and the faster one as 1.

I would suggest adding a parameter to GPUtil.getGPUs() to help sort this out, so that any downstream frameworks that rely on CUDA_VISIBLE_DEVICES would be able to get the IDs right.
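
Until such a parameter exists, one workaround (also used in the README examples above) is to force CUDA to enumerate devices in PCI bus order, so the IDs reported by GPUtil/nvidia-smi line up with CUDA_VISIBLE_DEVICES; a sketch:

import os
import GPUtil

# Must be set before the framework (TensorFlow, PyTorch, ...) initializes CUDA
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'

# Pick the GPU with the lowest current memory usage and expose only that one
DEVICE_ID = GPUtil.getAvailable(order='memory', limit=1, maxLoad=1.0, maxMemory=1.0)[0]
os.environ['CUDA_VISIBLE_DEVICES'] = str(DEVICE_ID)
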

Very new to all of this, please help?

Hi there, I am running Automatic1111 and it seems very slow on my new laptop.

I am not usually a coder (at all) but I have been tinkering with this on my older laptop and am now creating on my new one.

The issue is this:

every time that I create an image I get this error:

[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified

More specifically I get this error:

Loading VAE weights specified in settings: C:\Users\xxxxxxx\stable-diffusion-webui-directml\models\VAE\klF8Anime2VAE_klF8Anime2VAE.safetensors
Applying attention optimization: InvokeAI... done.
Weights loaded in 3.6s (calculate hash: 2.3s, load weights from disk: 0.6s, apply weights to model: 0.4s, load VAE: 0.2s).
Calculating sha256 for C:\Users\xxxxxxx\stable-diffusion-webui-directml\models\Lora\add_detail.safetensors: 7c6bad76eb54e80ebe40f5a455b1cf7a743e09fe2fc1289cf333544e3aa071ce
0%| | 0/40 [00:00<?, ?it/s]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified
2%|██ | 1/40 [00:28<18:13, 28.03s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<48:17, 32.56s/it]
5%|████▏ | 2/40 [00:51<16:05, 25.40s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<39:59, 27.27s/it]
8%|██████▏ | 3/40 [01:13<14:34, 23.63s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<35:44, 24.64s/it]
10%|████████▎ | 4/40 [01:37<14:17, 23.82s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<35:01, 24.44s/it]
12%|██████████▍ | 5/40 [02:01<13:58, 23.97s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<34:30, 24.36s/it]
15%|████████████▍ | 6/40 [02:25<13:35, 23.98s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<33:55, 24.23s/it]
18%|██████████████▌ | 7/40 [02:48<13:00, 23.66s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<32:58, 23.83s/it]
20%|████████████████▌ | 8/40 [03:12<12:36, 23.65s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<32:28, 23.77s/it]
22%|██████████████████▋ | 9/40 [03:36<12:15, 23.74s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<32:09, 23.82s/it]
25%|████████████████████▌ | 10/40 [03:59<11:53, 23.80s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<31:48, 23.85s/it]
28%|██████████████████████▌ | 11/40 [04:24<11:33, 23.90s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<31:31, 23.94s/it]
30%|████████████████████████▌ | 12/40 [04:48<11:10, 23.96s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<31:11, 23.99s/it]
32%|██████████████████████████▋ | 13/40 [05:12<10:50, 24.08s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<30:55, 24.10s/it]
35%|████████████████████████████▋ | 14/40 [05:36<10:27, 24.15s/it]
Total progress: 16%|██████████▎ | 14/90 [05:41<30:36, 24.16s/it]

Does anyone know a solution to this?

Wrong kw parameter name used in README.md

Under "Main functions", README.md gives the following example:

deviceIDs = GPUtil.getAvailable(order = 'first', limit = 1, maxLoad = 0.5, maxMemory = 0.5, ignoreNan=False, excludeID=[], excludeUUID=[])

I believe ignoreNan is actually meant to be includeNan.

Handle nvidia-smi non-zero exit status

A common error is that nvidia-smi outputs an error instead of the expected data.

Example:

# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.161
# echo $?
18

This needs to be handled in the code.
(Pull request coming up)
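
A sketch of the kind of guard such a pull request could add (illustrative only, not the actual patch):

from subprocess import Popen, PIPE

def queryGPUsOrEmpty():
    # Check nvidia-smi's exit status before parsing; return an empty list on failure
    p = Popen(['nvidia-smi', '--query-gpu=index,uuid', '--format=csv,noheader,nounits'],
              stdout=PIPE, stderr=PIPE)
    stdout, stderr = p.communicate()
    if p.returncode != 0:
        # e.g. "Failed to initialize NVML: Driver/library version mismatch"
        return []
    return stdout.decode('UTF-8').splitlines()
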

Pyinstaller exe with console=False causes pop-up window every time nvidia-smi.exe is called

Hello,

Packaging gputil with Pyinstaller (console=False [pythonw.exe]) causes a pop-up window to open every time I want to access GPU stats.

I could suppress it by adding creationflags = subprocess.CREATE_NO_WINDOW in the Popen command.

p = Popen([nvidia_smi,"--query-gpu=index,uuid,utilization.gpu,memory.total,memory.used,memory.free,driver_version,name,gpu_serial,display_active,display_mode,temperature.gpu", "--format=csv,noheader,nounits"], stdout=PIPE)

Maybe it's of interest to future users, so I will leave it here :)
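
For reference, the same call with the suppression flag added would look roughly like this (Windows-only, Python 3.7+; nvidia_smi is assumed to hold the resolved path to nvidia-smi.exe):

import subprocess
from subprocess import Popen, PIPE

nvidia_smi = 'nvidia-smi'  # placeholder for the resolved nvidia-smi.exe path
p = Popen([nvidia_smi, "--query-gpu=index,uuid,utilization.gpu,memory.total,memory.used,memory.free,driver_version,name,gpu_serial,display_active,display_mode,temperature.gpu", "--format=csv,noheader,nounits"],
          stdout=PIPE, creationflags=subprocess.CREATE_NO_WINDOW)
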

Unable to find GPU on Windows

Hi,

I'd like to thank and commend you on putting this together!

I am running Windows and this is my output of nvidia-smi:

(base) PS C:\Users\sarth> nvidia-smi.exe
Tue Jul 28 16:16:35 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 451.77       Driver Version: 451.77       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208... WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P8     7W /  N/A |   4402MiB /  8192MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+

But, I am not able to detect the GPU from the GPUtil:

>>> os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
>>> GPUtil.getAvailable()
[]
>>> GPUtil.__version__
'1.4.0'

Is there something extra I need to add in the python code to get this working?

Thanks!

GPUtil doesn't find GPU

I am having an issue with this module.
It doesn't find my GPU, but when I go into the command line and run "nvidia-smi", everything seems to work.
I already reinstalled my NVIDIA drivers and the module, but nothing works.

nvidia-smi comes with NVIDIA drivers, and not CUDA

This is very useful module! Thank you!

I have a small suggestion wrt README.md which is somewhat misleading, as it says:

CUDA GPU with latest CUDA driver installed. GPUtil uses the program nvidia-smi to get the GPU status of all available CUDA GPUs. nvidia-smi should be installed automatically, when you install your CUDA driver.

But according to: https://developer.nvidia.com/nvidia-system-management-interface

NVIDIA-smi ships with NVIDIA GPU display drivers on Linux, and with 64bit Windows Server 2008 R2 and Windows 7.

so I think the correct description needs to replace CUDA with NVIDIA drivers.

The reason I find this important is that, for example, PyTorch started shipping its own CUDA libraries, so you no longer need to install CUDA system-wide. Yet the user will still have nvidia-smi if they installed the NVIDIA driver, but not via PyTorch. And currently your doc implies that a user must have CUDA installed to have nvidia-smi, which is not so.

I hope my communication was clear.

Thank you.

FileNotFoundError shows up whenever I try to use this package

This is the error it gives me whenever I try to run it:

GPUtil 1.3.0
Traceback (most recent call last):
  File "demo_GPUtil.py", line 10, in <module>
    GPU.showUtilization()
  File "C:\Users\dylan\AppData\Local\Programs\Python\Python36\Lib\site-packages\GPUtil\GPUtil.py", line 193, in showUtilization
    GPUs = getGPUs()
  File "C:\Users\dylan\AppData\Local\Programs\Python\Python36\Lib\site-packages\GPUtil\GPUtil.py", line 64, in getGPUs
    p = Popen(["nvidia-smi","--query-gpu=index,uuid,utilization.gpu,memory.total,memory.used,memory.free,driver_version,name,gpu_serial,display_active,display_mode", "--format=csv,noheader,nounits"], stdout=PIPE)
  File "C:\Users\dylan\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\dylan\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

ValueError

I have pip-installed GPUtil for the first time; upon running it I get an error previously described here:
#2

gpuUtil[g] = float(vals[i])/100 causes ValueError: could not convert string to float: '[Not Supported]'

I see from the issue thread that this should be fixed - has it not made its way into the version I get via pip?

`memoryUsed` is not realtime

code as below

g0 = GPUtil.getGPUs()[0]
g0.memoryUsed  # output 11768.0

but nvidia-smi shows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   33C    P8    16W / 250W |    171MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:82:00.0 Off |                  N/A |
| 23%   31C    P8    10W / 250W |     10MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

memoryUsed is not real-time; however, after I re-run g0 = GPUtil.getGPUs()[0], the output changes.
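
This matches how GPUtil works: each GPU object is a snapshot taken via an nvidia-smi query at the time getGPUs() is called, and its attributes are not updated afterwards. To get current numbers, query again:

import GPUtil

g0 = GPUtil.getGPUs()[0]   # snapshot taken now
# ... training / other work happens here ...
g0 = GPUtil.getGPUs()[0]   # re-query; attributes reflect the current nvidia-smi readings
print(g0.memoryUsed)
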

FileNotFoundError

hi,
I pip-installed the package and tried running GPUtil.getAvailable(), but got the message listed below. Any thoughts?

thank you very much for this package.

GPUtil.getAvailable()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\dkarl\AppData\Local\conda\conda\envs\dudy_test\lib\site-packages\GPUtil\GPUtil.py", line 123, in getAvailable
    GPUs = getGPUs()
  File "C:\Users\dkarl\AppData\Local\conda\conda\envs\dudy_test\lib\site-packages\GPUtil\GPUtil.py", line 64, in getGPUs
    p = Popen(["nvidia-smi","--query-gpu=index,uuid,utilization.gpu,memory.total,memory.used,memory.free,driver_version,name,gpu_serial,display_active,display_mode", "--format=csv,noheader,nounits"], stdout=PIPE)
  File "C:\Users\dkarl\AppData\Local\conda\conda\envs\dudy_test\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\dkarl\AppData\Local\conda\conda\envs\dudy_test\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

GPU Temperature

It would be nice if we were able to see the temperature of the GPU as well.

Crashing if nvidia-smi fails

It should not throw an exception if nvidia-smi fails. Instead it should return None or something to indicate that no GPUs were found. Maybe a print would be enough to signal that nvidia-smi is failing.
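
Until that is handled inside the library, callers can wrap the query themselves; a minimal sketch:

import GPUtil

def getGPUsOrNone():
    # Swallow failures from nvidia-smi (missing binary, driver mismatch, ...) and return None instead
    try:
        return GPUtil.getGPUs()
    except Exception as e:
        print('GPUtil/nvidia-smi failed: {}'.format(e))
        return None
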

Values of '[Not Supported]' are not handled properly.

Values of '[Not Supported]' are not handled properly.

In [1]: import GPUtil

In [2]: g = GPUtil.getGPUs()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-871afb3451f3> in <module>()
----> 1 g = GPUtil.getGPUs()

~\AppData\Local\Continuum\Anaconda3\envs\tensorflow\lib\site-packages\GPUtil\__init__.py in getGPUs()
     80                 deviceIds[g] = int(vals[i])
     81             elif (i == 1):
---> 82                 gpuUtil[g] = float(vals[i])/100
     83             elif (i == 2):
     84                 memTotal[g] = int(vals[i])

ValueError: could not convert string to float: '[Not Supported]'
