Container Linux (aka CoreOS) NVIDIA Driver

Yet another NVIDIA driver container for Container Linux (aka CoreOS).

Many different solutions for loading the NVIDIA modules into a CoreOS kernel have been created over the last few years. This is just another one, built to fit the source{d} requirements:

  • Load the NVIDIA modules in the kernel of the host.
  • Make the NVIDIA libraries and binaries available to other containers.
  • Work with unmodified third-party containers.
  • Avoid permanent changes on the host system.

How it works

Running the srcd/coreos-nvidia image that matches your CoreOS version loads the NVIDIA modules into the host kernel and creates the devices in the rootfs.

source /etc/os-release
docker run --rm --privileged --volume /:/rootfs/ srcd/coreos-nvidia:${VERSION}

You can test the result by running the following command:

docker run --rm $(for d in /dev/nvidia*; do echo -n "--device $d "; done) \
    srcd/coreos-nvidia:${VERSION} nvidia-smi -L

// Outputs:
// GPU 0: Tesla K80 (UUID: GPU-d57ec7e8-ab97-8612-54ac-9d53a183f818)
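
The image tag has to match the host's Container Linux version exactly, so it can help to derive the image reference in one place. The following is a minimal sketch, not part of the project: the function name coreos_nvidia_image and its optional path argument are hypothetical, and it assumes the os-release file defines VERSION as shown in the commands above.

```shell
# Print the srcd/coreos-nvidia image reference for a given os-release file.
# Hypothetical helper: the name and the optional path argument are
# illustrative only. Pass an absolute path, or omit it for /etc/os-release.
coreos_nvidia_image() {
    os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak out.
    (
        unset VERSION
        . "$os_release" || exit 1
        if [ -z "$VERSION" ]; then
            echo "VERSION is not set in $os_release" >&2
            exit 1
        fi
        printf 'srcd/coreos-nvidia:%s\n' "$VERSION"
    )
}
```

With this in place, the first command above could be written as docker run --rm --privileged --volume /:/rootfs/ "$(coreos_nvidia_image)".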

Installation

The installation is done using a systemd unit. This unit has two goals:

  • Load the modules into the kernel on every startup, and unload them if the service is stopped.
  • Keep a Docker container called nvidia-driver running, so that other containers can access the libraries and binaries from the NVIDIA driver using --volumes-from.

Create the following systemd unit at /etc/systemd/system/coreos-nvidia.service:

[Unit]
Description=NVIDIA driver
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=20m
EnvironmentFile=/etc/os-release
ExecStartPre=-/usr/bin/docker rm nvidia-driver
ExecStartPre=/usr/bin/docker run --rm --privileged --volume /:/rootfs/ srcd/coreos-nvidia:${VERSION}
ExecStart=/usr/bin/docker run --rm --name nvidia-driver srcd/coreos-nvidia:${VERSION} sleep infinity
ExecStop=/usr/bin/docker stop nvidia-driver
ExecStop=-/sbin/rmmod nvidia_uvm nvidia

[Install]
WantedBy=multi-user.target

Now enable and start the unit:

sudo systemctl enable /etc/systemd/system/coreos-nvidia.service
sudo systemctl start coreos-nvidia.service

After starting the service, the modules should be loaded in the kernel:

lsmod | grep -i nvidia
nvidia_uvm            679936  0
nvidia              12980224  1 nvidia_uvm

And the nvidia-driver container should be running:

docker ps | grep -i nvidia-driver
8cea48f9d556   srcd/coreos-nvidia:1465.7.0   "sleep infinity"   11 hours ago   nvidia-driver
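
The module check above can also be scripted, for example as a health check. This is a sketch under assumptions: the function name nvidia_modules_loaded is hypothetical, and it only looks for the two module names shown in the lsmod output above (nvidia and nvidia_uvm), reading lsmod-style text from standard input.

```shell
# Succeed only if both the nvidia and nvidia_uvm modules are listed in
# lsmod-style output read from stdin (module name in the first column).
# Hypothetical helper, not part of the project.
nvidia_modules_loaded() {
    awk '
        $1 == "nvidia"     { core = 1 }
        $1 == "nvidia_uvm" { uvm = 1 }
        END { exit !(core && uvm) }
    '
}
```

A typical call would be lsmod | nvidia_modules_loaded && echo "NVIDIA modules loaded".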

Usage

To use the NVIDIA driver from other standard containers, we rely on --volumes-from. This requires a running container based on our image (the nvidia-driver container), passing the /dev/nvidia* devices, and setting the $PATH and $LD_LIBRARY_PATH variables so that the binaries and libraries are found.

A simple example running nvidia-smi in a bare fedora container:

docker run --rm -it \
    --volumes-from nvidia-driver \
    --env PATH=$PATH:/opt/nvidia/bin/ \
    --env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/lib \
    $(for d in /dev/nvidia*; do echo -n "--device $d "; done) \
    fedora:26 nvidia-smi

Running the GPU-enabled TensorFlow container and verifying the detected devices:

docker run --rm -it \
    --volumes-from nvidia-driver \
    --env PATH=$PATH:/opt/nvidia/bin/ \
    --env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/lib \
    $(for d in /dev/nvidia*; do echo -n "--device $d "; done) \
    gcr.io/tensorflow/tensorflow:latest-gpu \
        python -c "import tensorflow as tf;tf.Session(config=tf.ConfigProto(log_device_placement=True))"
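
The inline for loop used in the commands above passes a literal /dev/nvidia* to docker when no device node exists, because the shell leaves an unmatched glob untouched. A small wrapper can skip that case. This is a sketch only: the function name nvidia_device_flags and its optional directory argument are hypothetical and not part of the image.

```shell
# Print one "--device <path>" flag per NVIDIA device node found in a
# directory (default: /dev). Prints nothing when no node matches.
# Hypothetical helper; paths containing spaces are not supported.
nvidia_device_flags() {
    dev_dir="${1:-/dev}"
    for d in "$dev_dir"/nvidia*; do
        # An unmatched glob stays as the literal pattern; skip it.
        [ -e "$d" ] || continue
        printf '%s %s ' --device "$d"
    done
}
```

It is meant to be used unquoted in place of the loop, for example docker run --rm $(nvidia_device_flags) srcd/coreos-nvidia:${VERSION} nvidia-smi -L.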

Available Images

Eventually, an image should be available for every Container Linux version in every release channel. To ensure this, a Travis cron job runs every day and checks whether a new Container Linux version exists; if one does, a new image is built.

The list of images is available at: https://hub.docker.com/r/srcd/coreos-nvidia/tags/.

What can I do if I can't find an image for my version?

If your version was released today, you must wait until the nightly cron job runs. If it was not released today but was released after 11/Oct/2016, open an issue, because something has failed. If your version is older than that, you must build the image from the Dockerfile.

Custom images

The builds of the Docker image are managed by a Makefile.

To build an image for the latest stable version of Container Linux and the latest version of the NVIDIA driver, just execute:

make build

The configuration is done through environment variables. For example, to build the image for the latest alpha version, execute:

COREOS_RELEASE_CHANNEL=alpha make build

Variables:

  • COREOS_RELEASE_CHANNEL: Container Linux release channel: stable, beta or alpha. Defaults to stable.
  • COREOS_VERSION: Container Linux version. If empty, the latest available version for the given release channel is used. The version is retrieved by making a request to the release feed.
  • NVIDIA_DRIVER_VERSION: NVIDIA driver version. If empty, the latest available version is used. The version is retrieved from https://github.com/aaronp24/nvidia-versions/.
  • KERNEL_VERSION: Kernel version used in the given COREOS_VERSION. If empty, it is retrieved from the CoreOS release feed.

License

GPLv3, see LICENSE

Contributors

jacobstr, mcuadros, smeruelo, trevex, vmarkovtsev

coreos-nvidia's Issues

Installation fails with: Syntax error: "if" unexpected

Hi! I am trying to install the nvidia drivers on CoreOS using your container. However when running the container I get the following error:

core@marco-gpu1 ~ $ source /etc/os-release
core@marco-gpu1 ~ $ docker run --rm --privileged --volume /:/rootfs/ srcd/coreos-nvidia:${VERSION}
/bin/sh: 1: Syntax error: "if" unexpected

I think you are missing a ; or a && right after this line: https://github.com/src-d/coreos-nvidia/blob/master/Dockerfile#L103

insmod: ERROR: could not insert module /rootfs/usr/lib64/modules/4.14.19-coreos/kernel/drivers/char/ipmi/ipmi_msghandler.ko: File exists

# cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1632.3.0
VERSION_ID=1632.3.0
BUILD_ID=2018-02-14-0338
PRETTY_NAME="Container Linux by CoreOS 1632.3.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
# modprobe ipmi_devintf
# source /etc/os-release
# docker run --rm --privileged --volume /:/rootfs/ srcd/coreos-nvidia:${VERSION}
insmod: ERROR: could not insert module /rootfs/usr/lib64/modules/4.14.19-coreos/kernel/drivers/char/ipmi/ipmi_msghandler.ko: File exists

Following the recommendation given in issue #3, I tried to start the container. Previously it failed with the message mentioned in #3, but now I get the error reported here and have no idea how to proceed. I tried both as a normal user and as root, without success.
Any advice will be welcome.

Thanks.

insmod: ERROR: could not insert module: Unknown symbol in module

Hi! I'm using CoreOS 1632.2.1 and can't insert the kernel module. It fails with this error:

core@localhost ~ $ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1632.2.1
VERSION_ID=1632.2.1
BUILD_ID=2018-02-01-2053
PRETTY_NAME="Container Linux by CoreOS 1632.2.1 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
core@localhost ~ $ docker run --rm -it --privileged --volume /:/rootfs/ srcd/coreos-nvidia:1632.2.1
insmod: ERROR: could not insert module /opt/nvidia/lib/modules/4.14.16/video/nvidia.ko: Unknown symbol in module
