
nvidia-docker's Introduction

DEPRECATION NOTICE

This project has been superseded by the NVIDIA Container Toolkit.

The tooling provided by this repository has been deprecated and the repository archived.

The nvidia-docker wrapper is no longer supported, and the NVIDIA Container Toolkit has been extended to allow users to configure Docker to use the NVIDIA Container Runtime.

For further instructions, see the NVIDIA Container Toolkit documentation and specifically the install guide.

Issues and Contributing

Check out the Contributing document!

  • For questions, feature requests, or bugs, open an issue against the nvidia-container-toolkit repository.

nvidia-docker's People

Contributors

3xx0, aprowe, arangogutierrez, atomicvar, clnperez, creimers, dev-zero, dualvtable, edsonke, elezar, flx42, guptanswati, huan, jjacobelli, joel-hanson, klueska, lukeyeager, mvollrath, nvjmayo, rajatchopra, renaudwastaken, ruffsl, scpeters


nvidia-docker's Issues

libcuda.so can not be found in /usr/local/cuda/lib64 when building mxnet in nvidia/cuda docker

I am trying to make a Dockerfile that compiles mxnet using nvidia-docker, based on the nvidia/cuda image. mxnet uses variable

USE_CUDA_PATH

in its make script to set the location of the CUDA driver; it seems to ignore LD_LIBRARY_PATH.
Usually you would set this to /usr/local/cuda/lib64/; the libcudnn.so library, for example, can indeed be found there.

In the nvidia/cuda Docker image however there is no libcuda.so in /usr/local/cuda/lib64, instead it seems to be located in /usr/local/nvidia/lib64/

Funnily enough, when I ln -s this libcuda.so.1 into /usr/local/cuda/lib64, the build succeeds from within nvidia-docker run nvidia/cuda, but the same command gives a -lcuda not found error when run via "nvidia-docker build ..."

Is there a way to get libcuda.so in the /usr/local/cuda/lib64 directory during the nvidia-docker build?
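At build time no driver volume is mounted, so libcuda.so is genuinely absent; the CUDA toolkit images ship a stub libcuda.so for exactly this case. A hypothetical Dockerfile fragment, assuming a standard toolkit layout; the mxnet checkout path and the ADD_LDFLAGS variable are assumptions, only USE_CUDA_PATH comes from the report above:

```dockerfile
# Build-time sketch: no driver is mounted during `docker build`, so
# link against the toolkit's stub driver library instead.
FROM nvidia/cuda:7.5-devel
# /opt/mxnet is a placeholder for wherever the mxnet sources live.
RUN make -C /opt/mxnet USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda \
        ADD_LDFLAGS=-L/usr/local/cuda/lib64/stubs
```

At run time the real libcuda.so mounted by nvidia-docker takes precedence over the stub, so the resulting binary still works in the container.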

CuDNN library unpacked to wrong path

I noticed that the CuDNN libraries are put in the root directory although they should actually be under /usr/local/cuda/lib64. I tried to boil it down and the issue seems to be at the untar command:

tar -xzf cudnn-7.0-linux-x64-v3.0-prod.tgz --wildcards 'cuda/lib64/libcudnn.so*' -C /usr/local

I tried, but the files are untarred into the root, i.e. /cuda/lib64/... One workaround would be

cd /usr/local && \
tar -xzf /cudnn-7.0-linux-x64-v3.0-prod.tgz --wildcards 'cuda/lib64/libcudnn.so*'
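GNU tar applies -C positionally: it only changes directory for member names that appear after it on the command line, so a trailing -C has no effect on the names already given. Moving -C before the pattern fixes the destination; the miniature tarball below just mimics the cuDNN layout to show the behaviour:

```shell
# Recreate the cuDNN archive layout in a scratch directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/cuda/lib64" "$tmp/dest"
touch "$tmp/src/cuda/lib64/libcudnn.so.7.0.64"
tar -czf "$tmp/cudnn.tgz" -C "$tmp/src" cuda

# -C placed BEFORE the member pattern: extraction lands under dest/.
tar -xzf "$tmp/cudnn.tgz" -C "$tmp/dest" --wildcards 'cuda/lib64/libcudnn.so*'
ls "$tmp/dest/cuda/lib64"
```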

PS: Btw, a big thanks for making CUDA + CuDNN available on Docker Hub! That really was the missing piece to make my toolchain available on Docker Hub.

install issue

Running sudo make install gives me:

Removing intermediate container b9ace43118ae
Step 14 : RUN useradd --uid $UID build
 ---> Running in 1d2bc6d18b7e
useradd: UID 0 is not unique
The command '/bin/sh -c useradd --uid $UID build' returned a non-zero code: 4
make[1]: *** [/home/ubuntu/git/nvidia-docker/tools/bin] Error 1
make[1]: Leaving directory `/home/ubuntu/git/nvidia-docker/tools'
make: *** [install] Error 2
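The failure is consistent with running make under sudo: the Makefile's $UID then expands in root's environment to 0, and useradd rejects 0 as a duplicate of root's UID. A sketch of the idea; the UID make variable name is taken from the log above, and the override is only printed here, not executed:

```shell
# Under sudo, make runs as root, so a $UID reference in the Makefile
# resolves to 0 and `useradd --uid 0 build` collides with root.
# Overriding the variable with the invoking user's id avoids the clash.
invoking_uid=$(id -u)
echo "sudo make install UID=${invoking_uid}"
```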

Installation failing due to no write permissions on nvidia-docker/

Sorry, finding a separate issue from my other one (#49)...

Starting from a clean repo, I'm finding that sudo make install fails because root can't write to nvidia-docker/tools/. Specifically:

$ cd ~/src
$ git clone https://github.com/NVIDIA/nvidia-docker
$ cd nvidia-docker/
$ sudo make install
...
Successfully built 3ce152a99c3e
mkdir: cannot create directory ‘<home>/src/nvidia-docker/tools/bin’: Permission denied
make[1]: *** [build] Error 1
make[1]: Leaving directory `<home>/src/nvidia-docker/tools'
make: *** [install] Error 2

I was able to fix this by granting write permissions to all users for nvidia-docker/, recursively. The original permissions on the folder were rwxr-xr-x <myself> <myprimarygroup>.

Should I do something different to make this work out of the box?

[SOLVED] How to run nvidia-docker with TensorFlow GPU docker

Thanks for releasing the nvidia-docker repo, this is a really great idea and very useful!

What I've Done

I have set up the equivalent of an NVIDIA DIGITS machine (running Ubuntu 14.04 server), and am attempting to run everything in docker containers.

  1. I have docker installed, and have run nvidia-docker run nvidia/cuda nvidia-smi as described here, and I see my 4 TitanX graphics cards.
  2. I have also run the nvidia-docker-plugin as described here, via sudo -u nvidia-docker nvidia-docker-plugin -s /var/lib/nvidia-docker, and I get the output:
nvidia-docker-plugin | 2016/02/04 12:54:02 Loading NVIDIA management library
nvidia-docker-plugin | 2016/02/04 12:54:04 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/02/04 12:54:04 Discovering GPU devices
nvidia-docker-plugin | 2016/02/04 12:54:05 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2016/02/04 12:54:05 Serving plugin API at /var/lib/nvidia-docker
nvidia-docker-plugin | 2016/02/04 12:54:05 Serving remote API at localhost:3476

which signifies to me that it's working.

  3. I ran the tests here and they all passed.

My Problem

  1. When I try to run the TensorFlow GPU docker image using nvidia-docker

I first run sudo -u nvidia-docker nvidia-docker-plugin -s /var/lib/nvidia-docker in a tmux session.

Then I run nvidia-docker run -it -p 8888:8888 b.gcr.io/tensorflow/tensorflow-devel-gpu; it downloads everything and runs the docker container. Next I run ipython and try to import tensorflow, but I get the following errors:

In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:92] LD_LIBRARY_PATH: /usr/local/cuda/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:121] hostname: 16b84b6e71f9
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:146] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:257] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.79  Wed Jan 13 16:17:53 PST 2016
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:150] kernel reported version is: 352.79
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1054] LD_LIBRARY_PATH: /usr/local/cuda/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1055] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so; dlerror: libcuda.so: cannot open shared object file: No such file or directory

I think I just lack understanding of how I should run the TensorFlow container, or maybe I have to build the container using nvidia-docker.

Any ideas about how to do this, or general advice about what I'm doing wrong, would be amazing.
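For what it's worth, the error log shows LD_LIBRARY_PATH containing only /usr/local/cuda/lib64, while nvidia-docker mounts the driver's libcuda.so under a separate volume directory. A minimal sketch of extending the search path inside the container; the /usr/local/nvidia/lib64 location is an assumption based on the libcuda.so issue earlier in this document:

```shell
# TensorFlow dlopen()s libcuda.so via LD_LIBRARY_PATH; append the
# directory where nvidia-docker mounts the driver libraries.
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/nvidia/lib64"
echo "$LD_LIBRARY_PATH"
```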

Thanks so much.

Brad

Mesos REST API

There is a REST API for mesos slave nodes (/v1.0/mesos/cli). Is there any howto about using this API with mesos? I am using a mesos cluster for CUDA-based ML training and have developed some tricks to select GPUs for dockerized training processes. Investigation of /v1.0/mesos/cli showed that the API provides configuration parameters for mesos slaves (attributes and resources), but it is not clear how it can be used with mesos frameworks (marathon, singularity) and nvidia-docker. I'd be grateful if someone could provide additional information about using the nvidia-docker-plugin REST API with a mesos ecosystem.

GPU Docker Plugin

I've been looking at ways to use cuda containers at my workplace, as our lab shares a common Nvidia workstation, and I'd like to interact with this server in a more abstract manner so that 1) I can more readily port my robotics work to any nvidia workstation, and 2) I minimize the impact of changes affecting others using the shared research workstation.

One gap I'm wrestling with is how to incorporate the current NVIDIA Docker wrapper with the rest of the existing docker ecosystem: docker compose, machine, and swarm. The current drop-in replacement for the docker run|create CLI is awesome, but it only gets us so far. The moment we need any additional tooling for abstracting or scaling up our apps, or for avoiding direct interaction with the host, it's hard to get to that last step.

So I'm thinking this might be a case for making a relevant docker plugin, harking back to a recent post on the Docker blog, Extending Docker with Plugins. That post was perhaps geared more towards networking and storage drivers, but perhaps our issue here could be treated as custom volume management. I feel the same level of integration for GPU device options may be called for to achieve the desired user experience in cloud development or cluster computing with Nvidia. This will most likely call for something more demanding than shell scripts to extend the needed interfaces, so I'd like to hear the rest of the community's and Nvidia devs' take on this.

Where is nvcc inside nvidia/caffe?

I cannot find nvcc inside the nvidia/caffe image.

Would you please add it to the image and provide some explanation of how nvidia-docker works? I find it a little mysterious how nvidia-docker talks to the host GPU device.

Besides, I suggest installing caffe from source, because the caffe and cuda examples are valuable tools for beginners.

Thank you very much.

nvidia-docker does not pull missing images

$ nvidia-docker run 4catalyzer/theano nvidia-smi
nvidia-docker | 2016/01/11 13:07:47 Error: No such image or container: 4catalyzer/theano

whereas for regular docker:

$ docker run 4catalyzer/theano nvidia-smi
Unable to find image '4catalyzer/theano:latest' locally
latest: Pulling from 4catalyzer/theano
...

OpenGL Support

As mentioned by @3XX0 in #7, OpenGL is not supported by the current framework, but there is an ongoing discussion internally at Nvidia about enabling it. This issue serves as a means for users to track progress or propose recommendations. We may want to ask @jfrazelle for some ideas; I know she's experimented a good deal with OpenGL support for docker containers.

A Container with Centos 6.7

Would you guys happen to have one for Centos 6.7? That's what I have running on enterprise right now and would love to run a few ML docker containers.

nvidia-docker does not work with complex commands

When running the following command:

docker run --rm -ti nvidia/cuda:7.5-cudnn4-devel /bin/bash -ci 'groupadd -f -g 1000 dummy && useradd -u 1000 -g dummy dummy && mkdir --parent /home/dummy && chown -R dummy:dummy /home/dummy && sudo -u dummy HOME=/home/dummy bash'

The container starts as expected, and the prompt is shown as:

dummy@e7ffa1ea404f:/$

When I use the nvidia-docker wrapper, however:


GPU=0 nvidia-docker run --rm -ti nvidia/cuda:7.5-cudnn4-devel /bin/bash -ci 'groupadd -f -g 1000 dummy && useradd -u 1000 -g dummy dummy && mkdir --parent /home/dummy && chown -R dummy:dummy /home/dummy && sudo -u dummy HOME=/home/dummy bash'

the following output is shown:

[ NVIDIA ] =INFO= Driver version: 352.63
[ NVIDIA ] =INFO= CUDA image version: 7.5

Usage: groupadd [options] GROUP

Options:
  -f, --force                   exit successfully if the group already exists,
                                and cancel -g if the GID is already used
  -g, --gid GID                 use GID for the new group
  -h, --help                    display this help message and exit
  -K, --key KEY=VALUE           override /etc/login.defs defaults
  -o, --non-unique              allow to create groups with duplicate
                                (non-unique) GID
  -p, --password PASSWORD       use this encrypted password for the new group
  -r, --system                  create a system account
  -R, --root CHROOT_DIR         directory to chroot into

and the container is not started.

Changing the last line of the nvidia-docker wrapper to:
$DOCKER $DOCKER_ARGS $NV_DOCKER_ARGS "$@"
seems to address this.
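The fix works because of shell quoting: an unquoted $@ re-splits every argument on whitespace, which is exactly what tears apart the single-quoted bash -ci command string. A self-contained illustration (count_args is a throwaway helper):

```shell
# "$@" preserves argument boundaries; bare $@ re-splits on whitespace.
count_args() { echo "$#"; }
set -- /bin/bash -ci 'groupadd -f -g 1000 dummy && echo done'
count_args "$@"   # prints 3: the command string stays one argument
count_args $@     # prints 10: the string is split into words
```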

nvidia-docker-plugin exits with "Error: Not supported" after GPU detection.

I am trying to run the nvidia-docker-plugin locally, but get the following error:

$> sudo -u nvidia-docker nvidia-docker-plugin -s /var/lib/nvidia-docker
nvidia-docker-plugin | 2016/01/26 09:57:13 Loading NVIDIA management library
nvidia-docker-plugin | 2016/01/26 09:57:13 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/01/26 09:57:13 Discovering GPU devices
nvidia-docker-plugin | 2016/01/26 09:57:13 Error: Not Supported

(this is after following the other steps described in the wiki).

The output from nvidia-smi is as follows:

$> nvidia-smi
Tue Jan 26 09:56:53 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.63     Driver Version: 352.63         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2100M       Off  | 0000:01:00.0     Off |                  N/A |
| N/A   55C    P0    N/A /  N/A |    194MiB /  2047MiB |     13%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1084    G   /usr/bin/X                                     104MiB |
|    0      2054    G   /usr/bin/gnome-shell                            74MiB |
|    0      3097    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd     5MiB |
+-----------------------------------------------------------------------------+

Only works once

I'm not very familiar with docker volume, but it appears to be ONLY good for one use.

  1. sudo ./nvidia-docker volume setup

    nvidia_driver_352.55
    
  2. docker volume ls

    DRIVER              VOLUME NAME
    local               nvidia_driver_352.55
    
  3. ./nvidia-docker run --rm nvidia/cuda nvidia-smi

    +------------------------------------------------------+                       
    | NVIDIA-SMI 352.55     Driver Version: 352.55         |                       
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 680     Off  | 0000:01:00.0     N/A |                  N/A |
    | 34%   54C    P8    N/A /  N/A |    653MiB /  4093MiB |     N/A      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  GeForce GTX 580     Off  | 0000:02:00.0     N/A |                  N/A |
    | 46%   54C   P12    N/A /  N/A |      7MiB /  3071MiB |     N/A      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla K20c          Off  | 0000:03:00.0     Off |                  Off |
    | 37%   49C    P0    48W / 225W |     96MiB /  5119MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |    0                  Not Supported                                         |
    |    1                  Not Supported                                         |
    +-----------------------------------------------------------------------------+
    
  4. ./nvidia-docker run --rm nvidia/cuda nvidia-smi

    Error response from daemon: Error looking up volume plugin nvidia-docker: Plugin not found
    
  5. docker volume ls

    DRIVER              VOLUME NAME
    

Tested on Ubuntu 14.04 running Docker 1.9.1 and Centos 7 running Docker 1.9.0.

It just seems that if sudo nvidia-docker volume setup is in the "Initial setup" section, then it shouldn't need to be run every time I create a new container, or am I missing something?

failure running `sudo make install`

I'm running this on an EC2 instance.

make -C /home/ubuntu/nvidia-docker/tools install
make[1]: Entering directory `/home/ubuntu/nvidia-docker/tools'
flag provided but not defined: --build-arg
See 'docker build --help'.
make[1]: *** [build] Error 2
make[1]: Leaving directory `/home/ubuntu/nvidia-docker/tools'
make: *** [install] Error 2

Thu Mar 17 21:58:31 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   28C    P0    36W / 125W |     11MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Make'd nvidia-docker doesn't parse single-character options with additional parameters correctly

I have just pulled the repository and tried the nvidia-docker launcher created by sudo make install. I have been using nvidia-docker on other computers with the old script launcher.

I just installed Ubuntu on this computer, installed the NVIDIA drivers for my graphics card, CUDA, Docker, Sublime-Text and now this, so I feel confident that this is not a conflict with something else.

At the moment, if I run
nvidia-docker run -v /home/me:/home/me my_image
It tells me: Error parsing reference: "/home/me:/home/me" is not a valid repository/tag
However, if I run
nvidia-docker run --volume /home/me:/home/me my_image
It starts the image correctly.
Additionally:

  • docker run -v /home/me:/home/me my_image runs fine, and
  • I get the same error as above when I run the same command without the 'my_image' at the end.

Something similar happens for the '-p' command.

It seems to be parsing single-character options as if they take no additional parameters, so it treats the next argument as the image to start and the rest as the command for the image.

nvidia-docker appears to pass arguments through incorrectly.

I run the following code and get the resulting output.

nvidia-docker run -e UFORA_WORKER_OWN_ADDRESS=10.197.144.141 --workdir=/volumes/ufora -e AWS_ACCESS_KEY_ID=XXXXXXX -e AWS_SECRET_ACCESS_KEY=XXXXX -p 30000asdf:30000asdf -p 30002:30002 -p 30009:30009 -p 30010:30010 --privileged=true -v /mnt/ufora:/var/ufora -v /home/ubuntu/build:/volumes -e PYTHONPATH=/volumes/ufora -e ROOT_DATA_DIR=/mnt/ufora ufora/build:29df74040b988c35cdb94bd8a934958d pwd
Pulling repository docker.io/library/30000asdf
nvidia-docker | 2016/02/20 22:00:17 Error: image library/30000asdf not found

The "asdf" is there because I was trying to figure out why it asks for docker.io/library/30000; it must be munging the arguments incorrectly somewhere in the script.

I installed nvidia-docker from the binaries. perhaps this is fixed in the latest source and the binaries are out of date?

Making the nvidia/cuda automated repo

Thanks for making a CUDA docker repo! One suggestion I'd make would be to turn the nvidia/cuda Docker Hub repo into an automated repo. This repo could be useful as a testbed to test any future tags before official submission, but making it automated could really save time on maintenance in keeping the images up to date with the Dockerfiles. That's how we use them at osrf/ros. A neat thing also is to use the git repo's README.md to render the description in the docker repo; see Understand the build process.

Large Image Sizes

So I was probing the current cuda image, and I'm thinking it could stand to shed a little weight:

$ docker images
REPOSITORY                                TAG                    IMAGE ID            CREATED             VIRTUAL SIZE
cudnn                                     v2-7.0                 80a8e0efea8d        21 hours ago        2.041 GB
cuda                                      7.0                    f8abb195d52b        21 hours ago        2.012 GB
ubuntu                                    14.04                  e9ae3c220b23        9 days ago          187.9 MB

A quick check on just the size of the cuda libraries reveals:

$ docker run -it cuda:7.0 du -sh usr/local/cuda-7.0/
1.5G    usr/local/cuda-7.0/

$ docker run -it cuda:7.0 du -sh usr/local/cuda-7.0/*
4.0K    LICENSE
4.0K    README
43M     bin
170M    doc
5.9M    extras
0       include
0       lib64
180M    libnsight
122M    libnvvp
22M     nvvm
184M    samples
272K    share
592K    src
790M    targets
160K    tools

So we have about 324.1 megabytes coming in from elsewhere. I'm seeing an X server and perhaps other unessential deps installed that users could install themselves if need be. So perhaps we could cut a little fat by tweaking the call to apt-get install. Also, I'm not sure if the samples could be installed later, but that might be a nice thing to toss into a higher tag. Others not deploying cuda containers to cloud instances might still want them.
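For the apt-get side, the usual trick is to skip recommended packages and clear the package lists in the same RUN layer, since a deletion in a later layer does not shrink the image. A hypothetical Dockerfile fragment; the package name is illustrative, not taken from the actual image:

```dockerfile
# Install only what is asked for and drop the apt cache in the same
# layer, so the removal actually reduces the layer size.
RUN apt-get update && \
    apt-get install -y --no-install-recommends cuda-core-7-0 && \
    rm -rf /var/lib/apt/lists/*
```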

nvidia-docker not working correctly on machines where sudo is required to run docker

On machines where sudo permissions are required for docker (see for example the discussion of /etc/sudoers at http://www.projectatomic.io/blog/2015/08/why-we-dont-let-non-root-users-run-docker-in-centos-fedora-or-rhel/), the nvidia-docker script does not seem to work as expected.

I have tried the following:

  1. Added alias docker='sudo /usr/bin/docker' to ~/.bashrc
  2. Added alias docker='sudo /usr/bin/docker' to ~/.profile

But the alias is not correctly propagated to nvidia-docker.

Adding alias docker='sudo /usr/bin/docker' to the top of the nvidia-docker script directly allows for the script to work correctly in this setting.

Any pointers on how this could be addressed locally?
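Aliases only exist in the interactive shell that defined them; a child script like nvidia-docker never sees them, which is why editing the script itself worked. A PATH shim, by contrast, does survive into child processes. Sketch only; the shim directory here is a scratch location, in practice something like ~/bin on your PATH:

```shell
# Install a `docker` shim earlier on PATH that prepends sudo; scripts
# that invoke plain `docker` will then pick it up.
shimdir=$(mktemp -d)
cat > "$shimdir/docker" <<'EOF'
#!/bin/sh
exec sudo /usr/bin/docker "$@"
EOF
chmod +x "$shimdir/docker"
export PATH="$shimdir:$PATH"
command -v docker   # now resolves to the shim
```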

nvidia-docker | 2016/01/21 20:42:37 Error: link /usr/bin/nvidia-cuda-mps-control /home/docker/volumes/nvidia_driver_352.68/_data/bin/nvidia-cuda-mps-control: invalid cross-device link

Hi folks,

Been using nvidia-docker and noticed a new upgrade. I'm using a CentOS 7 box and I followed the instructions as follows:

542 sudo make install
543 sudo nvidia-docker volume setup

When I run volume setup, I'm met with this error:

nvidia-docker | 2016/01/21 20:42:37 Error: link /usr/bin/nvidia-cuda-mps-control /home/docker/volumes/nvidia_driver_352.68/_data/bin/nvidia-cuda-mps-control: invalid cross-device link

Any ideas?

Tagged image for cuDNN

I know accessing releases of the cuDNN GPU Accelerated Deep Learning currently requires membership to the Accelerated Computing Developer Program, but would it be possible to have a tag of the cuda image with cuDNN included? Or would this be done under the DIGITS image you where talking of doing, @flx42 ?

developer.download.nvidia.com: Host not found

Hi,

I cloned the project successfully with:

git clone https://github.com/NVIDIA/nvidia-docker

and I ran into trouble at the beginning steps of make install when retrieving some packages from the nvidia repositories:

Step 5 : RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/GPGKEY && apt-key adv --export --no-emit-version -a $NVIDIA_GPGKEY_FPR | tail -n +2 > cudasign.pub && echo "$NVIDIA_GPGKEY_SUM cudasign.pub" | sha256sum -c --strict - && rm cudasign.pub && echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64 /" > /etc/apt/sources.list.d/cuda.list
 ---> Running in 4f8f2eb021fe
Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --homedir /tmp/tmp.McNrxDbdy4 --no-auto-check-trustdb --trust-model always --primary-keyring /etc/apt/trusted.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-jessie-automatic.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-jessie-security-automatic.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-jessie-stable.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-squeeze-automatic.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-squeeze-stable.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-wheezy-automatic.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-wheezy-stable.gpg --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/GPGKEY
?: developer.download.nvidia.com: Host not found
gpgkeys: http fetch error 7: couldn't connect: Connection timed out
gpg: no valid OpenPGP data found.
gpg: Total number processed: 0
gpg: keyserver internal error
gpg: WARNING: unable to fetch URI http://developer.download.nvidia.com/compute/cuda/repos/GPGKEY: keyserver error
The command '/bin/sh -c apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/GPGKEY && apt-key adv --export --no-emit-version -a $NVIDIA_GPGKEY_FPR | tail -n +2 > cudasign.pub && echo "$NVIDIA_GPGKEY_SUM cudasign.pub" | sha256sum -c --strict - && rm cudasign.pub && echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64 /" > /etc/apt/sources.list.d/cuda.list' returned a non-zero code: 2
make[1]: *** [build] Error 1
make[1]: Leaving directory '/home/user/nvidia-docker/tools'
make: *** [tools] Error 2

I am quite new to docker, but I have seen similar problems when building Dockerfiles, where apt-get update failed to retrieve other online package repositories too. I guess this could be related to this problem.

nvidia-docker should be more obvious about no-op pass-through

Nvidia-docker wraps three different docker commands and passes the rest through. It would have saved me a bit of time to know that nvidia-docker was not mounting the GPU device and was performing a silent no-op.

Based on

if command != "create" && command != "run" && command != "volume" {
(the main branch as of this writing), the three commands that nvidia-docker intercepts are create / run / volume.

One way that this had bitten me was using a Dockerfile (which is its own issue).

I was trying to build a container image via nvidia-docker build ., where docker build . is a common pattern (afaict), and I had assertions during my build that would run tests that tested app functionality on the GPU.

Compare running nvidia-docker run nvidia/cuda:7.5-devel nvidia-smi versus starting a Dockerfile with
FROM nvidia/cuda:7.5-devel
RUN nvidia-smi
...

There are two workarounds for the Dockerfile issue: 1) rewrite the Dockerfile as a shell script, and pass it to run; 2) run the image build via straight docker, and run tests thereafter.

Support for Dockerfiles is useful and is orthogonal to transparency about an nvidia-docker no-op.

Thanks very much :)

Newbie question about CUDA containers and ssh

Hello,
I have a machine with a proper CUDA and Docker installation. When I start an interactive container and, for example, run nvidia-smi -l, everything looks fine. However, when I add an ssh server so that in the future other users can also use CUDA (without knowing about Docker), the same container fails when I run nvidia-smi, although the binary is there.
I read about the nvidia-docker-plugin, but I think I need something like step-by-step instructions on how to use it.
Regards,
Stefan

Environment variables not passed through

Compare:

$ THEANO_FLAGS='lib.cnmem=1' docker run -e THEANO_FLAGS -it master:5000/localrunner bash
root@fc14c3530051:~# echo $THEANO_FLAGS
lib.cnmem=1

with:

$ THEANO_FLAGS='lib.cnmem=1' nvidia-docker run -e THEANO_FLAGS -it master:5000/localrunner bash
root@394481a065ab:~# echo $THEANO_FLAGS

OpenCL support

It seems that the symlink to /usr/local/cuda/include/CL is missing; once it is manually set up, the OpenCL examples build properly.

Equally, /usr/local/cuda/lib64 must be added to /etc/ld.so.conf (or a corresponding file) to ensure that libOpenCL* can be found.

I guess it might be a good idea to try to build the CUDA/OpenCL examples as part of the update/release process?
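The two manual fixes described above could look like the following, run as root inside the container; a system-configuration sketch, with the symlink target assumed from the usual CUDA targets/ layout rather than taken from this report:

```shell
# Restore the missing CL include symlink (target path is an assumption).
ln -s /usr/local/cuda/targets/x86_64-linux/include/CL /usr/local/cuda/include/CL
# Make the dynamic loader aware of libOpenCL*, then rebuild its cache.
echo '/usr/local/cuda/lib64' > /etc/ld.so.conf.d/cuda.conf
ldconfig
```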

Thanks

Issues with Optimus

I've encountered the following problem on my laptop with Optimus. The first launch of the device-query container after a system reboot is successful, but all subsequent launches until the next reboot produce the following error:

[ NVIDIA ] =INFO= Driver version: 352.63
[ NVIDIA ] =INFO= CUDA image version: 7.5
[ NVIDIA ] =WARN= Could not find library: nvcuvid
[ NVIDIA ] =WARN= Could not find library: nvidia-encode
[ NVIDIA ] =WARN= Could not find binary: nvidia-cuda-mps-control
[ NVIDIA ] =WARN= Could not find binary: nvidia-cuda-mps-server

WARNING: Your kernel does not support memory swappiness capabilities, memory swappiness discarded.
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.2.0-1-amd64/modules.dep.bin'
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

Here is the command that I am using to launch the container:

GPU=0 PATH=$PATH:'/sbin' optirun ./nvidia-docker run --rm=true device_query

I am using Debian Sid with 4.2.0-1 kernel. Output of uname -a is

Linux kaliburn 4.2.0-1-amd64 #1 SMP Debian 4.2.6-1 (2015-11-10) x86_64 GNU/Linux

The Optimus is managed by Bumblebee, version 3.2.1-10.

Build fails behind a HTTP proxy

OS and docker version below:

toships@server-05:~/Documents/nvidia-docker$ uname -a
Linux server-05 3.16.0-52-generic #71~14.04.1-Ubuntu SMP Fri Oct 23 17:24:53 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
toships@server-05:~/Documents/nvidia-docker$ docker -v
Docker version 1.9.1, build a34a1d5
toships@server-05:~/Documents/nvidia-docker$

Error on install below:

toships@server-05:~/Documents/nvidia-docker$ sudo make install
make -C /home/toships/Documents/nvidia-docker/tools install
make[1]: Entering directory '/home/toships/Documents/nvidia-docker/tools'
Sending build context to Docker daemon 96.77 kB
Step 1 : FROM golang
---> cd6e9b146853
Step 2 : ENV NVIDIA_GPGKEY_SUM bd841d59a27a406e513db7d405550894188a4c1cd96bf8aa4f82f1b39e0b5c1c
---> Using cache
---> ab2714b30180
Step 3 : ENV NVIDIA_GPGKEY_FPR 889bee522da690103c4b085ed88c3d385c37d3be
---> Using cache
---> 1de66c070c4d
Step 4 : ENV NVIDIA_GDK_SUM 1e32e58f69fe29ee67b845233e7aa9347f37994463252bccbc8bfc8a7104ab5a
---> Using cache
---> 697201ceaa3d
Step 5 : RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/GPGKEY && apt-key adv --export --no-emit-version -a $NVIDIA_GPGKEY_FPR | tail -n +2 > cudasign.pub && echo "$NVIDIA_GPGKEY_SUM cudasign.pub" | sha256sum -c --strict - && rm cudasign.pub && echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64 /" > /etc/apt/sources.list.d/cuda.list
---> Using cache
---> cabc7dd1abe4
Step 6 : RUN apt-get update && apt-get install -y --no-install-recommends --force-yes cuda-cudart-dev-6-5=6.5-19 cuda-misc-headers-6-5=6.5-19 && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 784c8f082087
Step 7 : RUN objcopy --redefine-sym memcpy=memcpy@GLIBC_2.2.5 /usr/local/cuda-6.5/lib64/libcudart_static.a
---> Using cache
---> 717306ff1350
Step 8 : RUN wget -O gdk.run -q http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_352_39_gdk_linux.run && echo "$NVIDIA_GDK_SUM gdk.run" | sha256sum -c --strict - && chmod +x gdk.run && ./gdk.run --silent && rm gdk.run
---> Using cache
---> 3535f3a748da
Step 9 : COPY src /go/src
---> Using cache
---> fff8b44018c0
Step 10 : VOLUME /go/bin
---> Using cache
---> f879f9376c93
Step 11 : ENV CGO_CFLAGS "-I /usr/local/cuda-6.5/include -I /usr/include/nvidia/gdk"
---> Using cache
---> 1c8fce5a08a4
Step 12 : ENV CGO_LDFLAGS "-L /usr/local/cuda-6.5/lib64 -L /usr/src/gdk/nvml/lib -ldl -lrt"
---> Using cache
---> 8bdd291703ac
Step 13 : ARG UID
---> Using cache
---> 4f6319b1aa77
Step 14 : RUN useradd --non-unique --uid $UID build
---> Using cache
---> 6a24e18b8818
Step 15 : USER build
---> Using cache
---> 9d6085d78502
Step 16 : CMD go get -v -ldflags="-s" nvidia-docker nvidia-docker-plugin
---> Using cache
---> 4c0f4b8e2793
Successfully built 4c0f4b8e2793
Fetching https://golang.org/x/crypto/ssh?go-get=1
https fetch failed.
import "golang.org/x/crypto/ssh": https fetch: Get https://golang.org/x/crypto/ssh?go-get=1: dial tcp 216.58.192.17:443: i/o timeout
package golang.org/x/crypto/ssh: unrecognized import path "golang.org/x/crypto/ssh"
Fetching https://golang.org/x/crypto/ssh/agent?go-get=1
https fetch failed.
import "golang.org/x/crypto/ssh/agent": https fetch: Get https://golang.org/x/crypto/ssh/agent?go-get=1: dial tcp 216.58.192.49:443: i/o timeout
package golang.org/x/crypto/ssh/agent: unrecognized import path "golang.org/x/crypto/ssh/agent"
Fetching https://golang.org/x/crypto/ssh/terminal?go-get=1
https fetch failed.
import "golang.org/x/crypto/ssh/terminal": https fetch: Get https://golang.org/x/crypto/ssh/terminal?go-get=1: dial tcp 216.58.192.49:443: i/o timeout
package golang.org/x/crypto/ssh/terminal: unrecognized import path "golang.org/x/crypto/ssh/terminal"
github.com/justinas/alice (download)
# cd .; git clone https://github.com/justinas/alice /go/src/github.com/justinas/alice
Cloning into '/go/src/github.com/justinas/alice'...
fatal: unable to access 'https://github.com/justinas/alice/': Failed to connect to github.com port 443: Connection timed out
package github.com/justinas/alice: exit status 128
Fetching https://gopkg.in/tylerb/graceful.v1?go-get=1
https fetch failed.
import "gopkg.in/tylerb/graceful.v1": https fetch: Get https://gopkg.in/tylerb/graceful.v1?go-get=1: dial tcp 107.178.216.236:443: i/o timeout
package gopkg.in/tylerb/graceful.v1: unrecognized import path "gopkg.in/tylerb/graceful.v1"
make[1]: *** [build] Error 1
make[1]: Leaving directory '/home/toships/Documents/nvidia-docker/tools'
make: *** [install] Error 2

I checked "https://golang.org/x/crypto/ssh?go-get=1" and I get a "nothing to see here" message. Is this an issue or am I doing something wrong? Please help.
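Since the failures are all network timeouts from inside the build container, one possible workaround (an assumption, untested here) is to forward the host proxy settings into the build via Docker 1.9 build arguments:

```shell
# Hedged sketch: pass the host proxy environment through to `docker build`.
# BUILD_ARGS and the image tag are illustrative names, not project conventions.
BUILD_ARGS="--build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy"
if command -v docker >/dev/null 2>&1 && [ -f tools/Dockerfile ]; then
  docker build $BUILD_ARGS -t nvidia-docker-tools tools/
else
  # Outside the repo (or without Docker), just show what would run.
  echo "would run: docker build $BUILD_ARGS -t nvidia-docker-tools tools/"
fi
```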

Official Library Image in Docker Hub

It'd be great to get started on the review process to get an official CUDA image up on Docker Hub for researchers and developers to build from. If this is in line with what the maintainers here have in mind, I'd suggest that forks of docker-library/official-images and docker-library/docs be made on NVIDIA's GitHub org so that further PRs can be made.

Truth be told, I'd like to see official images for a few other projects, like TensorFlow; however, official images can only derive from other official images, and I think it makes sense that CUDA warrants its own official image, rather than roping these dependencies into other projects' Dockerfiles elsewhere.

Tagged image for DIGITS

As mentioned by @flx42 in #7, an image for DIGITS should be in the works at NVIDIA. This issue merely serves to track its progress and propose recommendations.

I just got around to watching some of the footage from the GTC 2015 expo floor. I'm looking forward to trying out the framework once the image comes around.

Volume setup fails across device boundaries

When your Docker graph is not mounted on the same device as /usr/bin etc., you see these sorts of errors during nvidia-docker volume setup:

Error: link /usr/bin/nvidia-cuda-mps-control /mnt/docker/volumes/nvidia_driver_352.79/_data/bin/nvidia-cuda-mps-control: invalid cross-device link

This is caused by hard-linking the binaries here:

if err := os.Link(f, l); err != nil {
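A possible fix (a sketch, not the project's actual code) is to fall back to a plain copy when the hard link fails across filesystems. The same idea expressed in shell:

```shell
# Mimic the volume setup: try a hard link, fall back to a copy when it fails
# (e.g. EXDEV across devices). link_or_copy is an illustrative helper name.
link_or_copy() {
  ln "$1" "$2" 2>/dev/null || cp -p "$1" "$2"
}

# Demo in a scratch directory.
d=$(mktemp -d)
echo hello > "$d/src"
link_or_copy "$d/src" "$d/dst"
cat "$d/dst"
```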

"CUDA driver version is insufficient" despite sufficient drivers

I am receiving the following error upon running deviceQuery in nvidia/cuda:7.0-cudnn3-devel:

./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

However, I have run containers with CUDA7 on the same system before. The host system is on 346.46, which should be sufficient. The container was started with

docker run --device /dev/nvidia-uvm:/dev/nvidia/uvm --device /dev/nvidia0:/dev/nvidia0 --device  \
/dev/nvidia1:/dev/nvidia1 --device /dev/nvidia2:/dev/nvidia2 --device /dev/nvidia3:/dev/nvidia3 --device \
/dev/nvidiactl:/dev/nvidiactl -it nvidia/cuda:7.0-cudnn3-devel bash

Any idea why that happens or what I should check? A big thanks in advance!

flag provided but not defined: --build-arg

$ make install
make -C /home/zichaoy/nvidia-docker/tools install
make[1]: Entering directory `/home/zichaoy/nvidia-docker/tools'
flag provided but not defined: --build-arg
See 'docker build --help'.
make[1]: *** [build] Error 2
make[1]: Leaving directory `/home/zichaoy/nvidia-docker/tools'
make: *** [install] Error 2

I am using

$ lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.7 (Final)
Release:    6.7
Codename:   Final
$ docker --version
Docker version 1.7.1, build 786b29d

nvidia-docker script doesn't work in daemon mode?

I tried ./nvidia-docker/nvidia-docker run deviceQuery and it works fine:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 970"
  CUDA Driver Version / Runtime Version          7.5 / 7.5
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 4082 MBytes (4279894016 bytes)
  (13) Multiprocessors, (128) CUDA Cores/MP:     1664 CUDA Cores
  GPU Max Clock rate:                            1367 MHz (1.37 GHz)
  Memory Clock rate:                             3505 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 1835008 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 970
Result = PASS

but when I tried sudo ./nvidia-docker/nvidia-docker -H 0.0.0.0:2375 -d to start a Docker daemon, and then used the Docker API to create and run the container, it failed with this error:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
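For context, the wrapper works by adding --device and --volume arguments to docker run; a daemon started through it, or the remote API, never receives those arguments. If nvidia-docker-plugin is running, its REST endpoint is supposed to return the same arguments so they can be appended to a plain docker run (the v1.0 path below is an assumption based on the plugin's documented interface):

```shell
# Hedged sketch: ask the plugin for the CLI arguments the wrapper would inject.
# Falls back to an empty string when the plugin (or curl) is unavailable.
CLI_ARGS=$(curl -s http://localhost:3476/v1.0/docker/cli 2>/dev/null || true)
echo "docker run $CLI_ARGS --rm device_query"
```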

Build fails on RH 6.5

I am going to preface this by saying that I am VERY new to Docker...

Currently using RH 6.5

Problem:

I went through the process of installing docker and confirmed it is working via their hello-world docker instance. Additionally, I cloned the nvidia-docker repo but when I went to build and install via the "sudo make install" command I got the following error:

make -C /home/user_name/Desktop/nvidia-docker/tools install
make[1]: Entering directory `/home/user_name/Desktop/nvidia-docker/tools'
flag provided but not defined: --build-arg
See 'docker build --help'.
make[1]: *** [/home/user_name/Desktop/nvidia-docker/tools/bin] Error 2
make[1]: Leaving directory `/home/user_name/Desktop/nvidia-docker/tools'
make: *** [install] Error 2

The "docker build --help" command does not show any reference to --build_arg but if I remove the argument the build will initiate but fail eventually fail indicating the following:

Step 12 : ARG
Unknown instruction: ARG
make[1]: *** [/home/user_name/Desktop/nvidia-docker/tools/bin] Error 1
make[1]: Leaving directory `/home/user_name/Desktop/nvidia-docker/tools'
make: *** [install] Error 2

Any advice or help would be greatly appreciated!

-Thanks

Installation tries to write to /go/bin

I'm getting the following error when I run sudo make install:

...
go install nvidia-docker-plugin: open /go/bin/nvidia-docker-plugin: permission denied
go install nvidia-docker: open /go/bin/nvidia-docker: permission denied
make[1]: *** [build] Error 1
make[1]: Leaving directory `<path to nvidia-docker>/nvidia-docker/tools'
make: *** [install] Error 2

It seems like it's trying to write to a directory that doesn't exist...

I eventually fixed the issue by editing the Makefile in tools/, where I removed :/go/bin at line 27 (as of hash e7b7922).

Am I doing something wrong? Am I the only person who experienced this?

Running on Ubuntu 14.04.

Secure and speedy builds

Looks like we are still using wget to fetch the GPG key (97ccc7e). Using a checksum is much better than before, but this still requires two apt-get update calls for every build just to install the wget prerequisite. I've suggested using gpg's keyserver function along with a high-availability subset of a public keyserver pool in ruffsl@2000a09 to circumvent this issue, as in Docker's docs, but no takers thus far. This is NVIDIA's key, correct? Is NVIDIA planning on running an independent keyserver?

cuDNN not loaded when using tensorflow with the runtime image

Hi, thank you for this great tool! I have been using the devel images in containers without major problems along with Theano/TensorFlow, but the runtime images seem to be missing libcudnn.so.

I searched in /usr/local/nvidia but I was not able to find it. Is it missing, located somewhere else, or not included?

Using nvidia-docker from third-party tools

It's very easy to use nvidia-docker when running individual containers, but is there a way to run nvidia-docker instead of docker from other Docker tools like docker-compose, Tutum, Rancher, etc.?

I am assuming one would just need to specify the nvidia-docker volume to be mounted in the container, but I couldn't find any documentation on the correct syntax.
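For docker-compose specifically, a hypothetical service entry might look like the following. The volume name nvidia_driver_352.79, the volume driver name, and the device list are assumptions based on what nvidia-docker-plugin creates on a typical host; they are not documented syntax.

```yaml
# Hypothetical docker-compose (v1) sketch; names are assumptions, adjust to
# match the volume that nvidia-docker-plugin actually created on your host.
cuda:
  image: nvidia/cuda:7.5-runtime
  devices:
    - /dev/nvidiactl
    - /dev/nvidia-uvm
    - /dev/nvidia0
  volumes:
    - nvidia_driver_352.79:/usr/local/nvidia:ro
  volume_driver: nvidia-docker
```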

Release file has an invalid format

I have used the *.deb file from https://developer.nvidia.com/cuda-downloads and got the following error:
(screenshot of the apt error message)

The message is in Japanese, but it says that the Release file downloaded from your repo server has an invalid format (it doesn't have the Date entry).
I had a look at the file http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1504/x86_64/Release

Origin: NVIDIA
Label: NVIDIA CUDA
Architecture: x86_64
MD5Sum:
 53fb7789d89de57b8b052e5b28254866           150643 Packages
 d4d3a64356184ac129fec95a6244e548            31840 Packages.gz

Compared to a normal Release file in http://ppa.launchpad.net/webupd8team/java/ubuntu/dists/xenial/Release

Origin: LP-PPA-webupd8team-java
Label: Oracle Java (JDK) 7 / 8 / 9 Installer PPA
Suite: xenial
Version: 16.04
Codename: xenial
Date: Thu, 10 Dec 2015 11:34:49 UTC
Architectures: amd64 arm64 armhf i386 powerpc ppc64el s390x
Components: main
Description: Ubuntu Xenial 16.04
MD5Sum:
 c5317774ad35c439d0dc00f34aafafbe              148 main/binary-amd64/Release
 d2ad9cb36753a85b34709643dfa292b2             3092 main/binary-amd64/Packages.gz
 28e0cfd043c5f191555eba324b6ee4e3            19522 main/binary-amd64/Packages
 34efda5be4bc4cc9356281fefd1df8b3             3376 main/binary-amd64/Packages.bz2
 b3303d9091b8b3837e1c19b2b7e2ea77              148 main/binary-arm64/Release
...

My environment is as follows:

OS: Ubuntu 16.04 Review
GPU: GeForce GTX 560 Ti
