
container-canary's Introduction

Container Canary


A little bird to validate your container images.

$ canary validate --file examples/awesome.yaml your/container:latest
Validating your/container:latest against awesome
 📦 Required packages are installed                  [passed]
 🤖 Expected services are running                    [passed]
 🎉 Your container is awesome                        [passed]
validation passed

Many modern compute platforms support bring-your-own-container models where the user can provide container images with their custom software environment. However, platforms commonly have a set of requirements that the container must conform to, such as using a non-root user, placing the home directory in a specific location, having certain packages installed, or running web applications on specific ports.

Container Canary is a tool for recording those requirements as a manifest that can be versioned and then validating containers against that manifest. This is particularly useful in CI environments to avoid regressions in containers.

Installation

You can find binaries and instructions on our releases page.

Example (Kubeflow)

The Kubeflow documentation has a list of requirements for container images that can be used in the Kubeflow Notebooks service.

That list looks like this:

  • expose an HTTP interface on port 8888:
    • kubeflow sets an environment variable NB_PREFIX at runtime with the URL path we expect the container to be listening under
    • kubeflow uses IFrames, so ensure your application sets Access-Control-Allow-Origin: * in HTTP response headers
  • run as a user called jovyan:
    • the home directory of jovyan should be /home/jovyan
    • the UID of jovyan should be 1000
  • start successfully with an empty PVC mounted at /home/jovyan:
    • kubeflow mounts a PVC at /home/jovyan to keep state across Pod restarts

With Container Canary we could write this list as the following YAML spec.

# examples/kubeflow.yaml
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: kubeflow
description: Kubeflow notebooks
env:
  - name: NB_PREFIX
    value: /hub/jovyan/
ports:
  - port: 8888
    protocol: TCP
volumes:
  - mountPath: /home/jovyan
checks:
  - name: user
    description: 👩 User is jovyan
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "[ $(whoami) = jovyan ]"
  - name: uid
    description: 🆔 User ID is 1000
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "id | grep uid=1000"
  - name: home
    description: 🏠 Home directory is /home/jovyan
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "[ $HOME = /home/jovyan ]"
  - name: http
    description: 🌐 Exposes an HTTP interface on port 8888
    probe:
      httpGet:
        path: /
        port: 8888
      initialDelaySeconds: 10
  - name: NB_PREFIX
    description: 🧭 Correctly routes the NB_PREFIX
    probe:
      httpGet:
        path: /hub/jovyan/lab
        port: 8888
      initialDelaySeconds: 10
  - name: allow-origin-all
    description: "🔓 Sets 'Access-Control-Allow-Origin: *' header"
    probe:
      httpGet:
        path: /
        port: 8888
        responseHttpHeaders:
          - name: Access-Control-Allow-Origin
            value: "*"
      initialDelaySeconds: 10

The Canary Validator spec reuses parts of the Kubernetes configuration API, including probes. In Kubernetes, probes are used to check on the health of a pod, but in Container Canary we use them to validate whether the container meets our specification.

We can then run our specification against any desired container image to see a pass/fail breakdown of requirements. We can test one of the default images that ships with Kubeflow as that should pass.

$ canary validate --file examples/kubeflow.yaml public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-scipy:v1.5.0-rc.1
Validating public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-scipy:v1.5.0-rc.1 against kubeflow
 👩 User is jovyan                                   [passed]
 🆔 User ID is 1000                                  [passed]
 🏠 Home directory is /home/jovyan                   [passed]
 🌐 Exposes an HTTP interface on port 8888           [passed]
 🧭 Correctly routes the NB_PREFIX                   [passed]
 🔓 Sets 'Access-Control-Allow-Origin: *' header     [passed]
validation passed

For more examples see the examples directory.

Validator reference

Validator manifests are YAML files that describe how to validate a container image. Check out the examples directory for real-world applications.

Metadata

Each manifest starts with some metadata.

# Manifest versioning
apiVersion: container-canary.nvidia.com/v1
kind: Validator

# Metadata
name: foo  # The name of the platform that this manifest validates for
description: Foo runs containers for you  # A description of that platform
documentation: https://example.com  # A link to the documentation that defines the container requirements in prose

Runtime options

Next you can set runtime configuration for the container you are validating. You should set these to mimic the environment that the compute platform will create. When you validate a container it will be run locally using Docker.

Environment variables

A list of environment variables that should be set on the container.

env:
  - name: HELLO
    value: world
  - name: FOO
    value: bar

Ports

Ports that need to be exposed on the container. These need to be configured in order for Container Canary to perform connectivity tests.

ports:
  - port: 8888
    protocol: TCP

Volumes

Volumes to be mounted to the container. This is useful if the compute platform will always mount an empty volume to a specific location.

volumes:
  - mountPath: /home/jovyan

Command

You can specify a custom command to be run inside the container.

command:
 - foo
 - --bar=true

Checks

Checks are the tests that we want to run against the container to ensure it is compliant. Each check contains a probe; these probes are a superset of the Kubernetes probes API, so any valid Kubernetes probe can be used in a check.

checks:
  - name: mycheck  # Name of the check
    description: Ensuring a thing  # Description of what is being checked (will be used in output)
    probe:
      ...  # A probe to run

Exec

An exec check runs a command inside the running container. If the command exits with 0 the check will pass.

checks:
  - name: uid
    description: User ID is 1234
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "id | grep uid=1234"

HTTPGet

An HTTP Get check will perform an HTTP GET request against your container. If the response code is below 300 and the optional response headers match, the check will pass.

checks:
  - name: http
    description: Exposes an HTTP interface on port 80
    probe:
      httpGet:
        path: /
        port: 80
        httpHeaders:  # Optional, headers to set in the request
          - name: Foo-Header
            value: "myheader"
        responseHttpHeaders:  # Optional, headers that you expect to see in the response
          - name: Access-Control-Allow-Origin
            value: "*"

TCPSocket

A TCP Socket check will ensure something is listening on a specific TCP port.

checks:
  - name: tcp
    description: Is listening via TCP on port 80
    probe:
      tcpSocket:
        port: 80

Delays, timeouts, periods and thresholds

Checks also support the same delays, timeouts, periods and thresholds that Kubernetes probes do.

checks:
  - name: uid
    description: User ID is 1234
    probe:
      exec:
        command: [...]
      initialDelaySeconds: 0  # Delay after starting the container before the check should be run
      timeoutSeconds: 30  # Overall timeout for the check
      successThreshold: 1  # Number of times the check must pass before moving on
      failureThreshold: 1  # Number of times the check is allowed to fail before giving up
      periodSeconds: 1  # Interval between runs if thresholds are >1

Contributing

Contributions are very welcome, be sure to review the contribution guidelines.

Maintaining

Maintenance steps can be found here.

License

Apache License Version 2.0, see LICENSE.

container-canary's People

Contributors

bashbunni, dependabot[bot], jacobtomlinson, jameslamb, kylefromnvidia


container-canary's Issues

Implement gRPC check

The gRPC liveness check is in alpha in Kubernetes v1.23 and behind a feature gate. Once it is not behind a gate it needs to be supported here in order for the checks to continue to be a superset of the Kubernetes probes.

It could also be implemented sooner and placed in a similar alpha state.

feature request: make container startup timeout configurable

Description

canary validate fails if the container takes more than 10 seconds to start up.

That 10 second threshold is currently hard-coded.

if time.Since(startTime) > (time.Second * 10) {
    err := c.Remove()
    if err != nil {
        return err
    }
    return errors.New("container failed to start after 10 seconds")
}

That timeout should be configurable.

Benefits of this work

Would allow the use of canary validate with images that take longer than 10 seconds to start up, for example:

  • containers that do some heavy work on startup, like starting a process and polling until it passes health checks
  • a Docker daemon that is running slowly, e.g. because it's performing I/O on a network filesystem or is in some other way resource-constrained

Acceptance Criteria

  • it's possible to modify, via command-line argument(s), how long canary validate waits for a container to start up before timing out

Approach

Modify this

if time.Since(startTime) > (time.Second * 10) {
    err := c.Remove()
    if err != nil {
        return err
    }
    return errors.New("container failed to start after 10 seconds")
}

such that the 10 second timeout can be altered via configuration.

For example, I'd like to be able to run the following:

canary validate \
   --file ./checks.yaml \
   --startup-timeout 30 \
   ${IMAGE_URI}

Notes

Created this after observing this exact timeout while testing RAPIDS images over in rapidsai/docker#670.

feature request: allow overriding command without modifying checks config file

Description

container-canary should support setting the command: for running an image without modifying the file containing validation checks.

Benefits of this work

Would separate details of how the validation checks are run from what is being validated, making it easier to share validation manifests across many different images. See "Motivation" below for details.

Acceptance Criteria

  • it is possible to set / override the command: used when container-canary starts up a container without modifying the file that check configurations are stored in

Approach

This might be accomplished by adding new command-line arguments to container-canary, like --cmd to set the command.

canary validate \
    --file tests.yaml \
    --cmd "sh -c 'sleep 9999'" \
    ${IMAGE_URI}

The more Kubernetes-thonic (is that a word?) but also more difficult to implement approach would be to allow providing multiple files and merging them all together.

For example, something like this:

canary validate \
    --file https://raw.githubusercontent.com/NVIDIA/container-canary/main/examples/kubeflow.yaml \
    --file ./override-command.yaml \
    ubuntu:22.04

Where override-command.yaml just contains something like this:

command:
  - /bin/sh
  - -c
  - "sleep 1234"

And where the implication is that later files are patched on top of earlier files.
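
A minimal sketch of that "later files win" patching over a couple of fields might look like the following (hypothetical `Validator` struct and `merge` function, not the project's actual types):

```go
package main

import "fmt"

// Validator holds a small subset of manifest fields for illustration.
type Validator struct {
	Name    string
	Command []string
	Env     []string
}

// merge returns base with any non-empty fields of overlay patched on top.
func merge(base, overlay Validator) Validator {
	out := base
	if overlay.Name != "" {
		out.Name = overlay.Name
	}
	if len(overlay.Command) > 0 {
		out.Command = overlay.Command
	}
	if len(overlay.Env) > 0 {
		out.Env = overlay.Env
	}
	return out
}

func main() {
	base := Validator{Name: "kubeflow"}
	override := Validator{Command: []string{"/bin/sh", "-c", "sleep 1234"}}
	fmt.Println(merge(base, override).Command) // [/bin/sh -c sleep 1234]
}
```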

Following the way that some of these other tools work:

Notes

Motivation

container-canary starts up a container based on the image passed to it, then runs checks inside that running container.

If that image's default CMD doesn't result in starting a process (e.g. like a webserver), then container-canary will fail to run on it.

Consider the following:

canary validate \
    --file https://raw.githubusercontent.com/NVIDIA/container-canary/main/examples/kubeflow.yaml \
    ubuntu:22.04

That kubeflow.yaml file doesn't have a command: entry, so the container will run whatever the default CMD is on the image.

In this case, it's just a shell.

docker inspect ubuntu:22.04 \
| jq '.[0].ContainerConfig.Cmd'
[
  "/bin/sh",
  "-c",
  "#(nop) ",
  "CMD [\"/bin/bash\"]"
]

And as a result, container-canary fails to run any checks.

Error: container failed to start

In the current state of container-canary, you have the following options:

  1. change the default CMD to something long-running, like python -m http.server 80
  2. copy the contents of that remote config and add a command: block tacking on some long-running command

If instead you could override just the command: but still reuse the remote config file, then that one config could be the source of truth for something like "what characteristics does a valid Kubeflow notebook image have" for many different images spread over many repositories, without them needing to manually hold their own copies of it or add unnecessary CMD entries in their images just for the sake of this validation.

Add optional conditions

It would be useful to make some conditions as optional and add a flag to enable them if desired.

For example, in the Databricks example folks may want Python OR R but not both. They may also want to check for the CUDA toolkit for use on GPU nodes, but not everyone will want to test for that.

It would be nice to add an optional setting which still runs them but doesn't fail the test if they fail, but have a CLI flag that makes them mandatory.

E.g

# foo.yaml
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: foo
command:
  - /bin/sh
  - -c
  - "sleep 3600"
checks:
  - name: bash
    description: Has bash installed
    optional: true
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "which bash"
$ canary validate --file foo.yaml --check-optional "bash" somecontainer

--debug flag causes panic

$ canary version                                              
Container Canary
 Version:         v0.2.1
 Go Version:      go1.17.8
 Commit:          d97ec23
 OS/Arch:         linux/amd64
 Built:           2022-04-14T10:03:44Z

$ canary validate --file examples/awesome.yaml ubuntu --debug     
Validating ubuntu against awesome
Running container with command 'docker run -d --name canary-runner-f716bacd ubuntu sleep 30'
 📦 Required packages are installed                  [passed]
 🤖 Expected services are running                    [passed]
 🎉 Your container is awesome                        [passed]
validation passed
Caught panic:

runtime error: invalid memory address or nil pointer dereference

Restoring terminal...

goroutine 1 [running]:
runtime/debug.Stack()
        /opt/hostedtoolcache/go/1.17.8/x64/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /opt/hostedtoolcache/go/1.17.8/x64/src/runtime/debug/stack.go:16 +0x19
github.com/charmbracelet/bubbletea.(*Program).StartReturningModel.func3()
        /home/runner/go/pkg/mod/github.com/charmbracelet/[email protected]/tea.go:359 +0x95
panic({0xb3e660, 0x11f1150})
        /opt/hostedtoolcache/go/1.17.8/x64/src/runtime/panic.go:1047 +0x266
github.com/nvidia/container-canary/internal/validator.model.View({0xc000048360, {0xd116d8, 0xc00035a000}, 0x1, {0xc0001dc8c0, 0x3, 0x4}, 0x1, {{{0x11f6e60, 0x4, ...}, ...}, ...}, ...})
        /home/runner/work/container-canary/container-canary/internal/validator/validator.go:184 +0x185
github.com/charmbracelet/bubbletea.(*Program).StartReturningModel(0xc0001b0200)
        /home/runner/go/pkg/mod/github.com/charmbracelet/[email protected]/tea.go:549 +0x1438
github.com/nvidia/container-canary/internal/validator.Validate({0x7fff7bb98156, 0x6}, {0x7fff7bb98140, 0x15}, 0xc0000dbdd0, 0x1)
        /home/runner/work/container-canary/container-canary/internal/validator/validator.go:239 +0x545
github.com/nvidia/container-canary/cmd.glob..func1(0x11f91c0, {0xc00009ac40, 0x1, 0x4})
        /home/runner/work/container-canary/container-canary/cmd/validate.go:50 +0xd1
github.com/spf13/cobra.(*Command).execute(0x11f91c0, {0xc00009ac00, 0x4, 0x4})
        /home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:856 +0x60e
github.com/spf13/cobra.(*Command).ExecuteC(0x11f8f40)
        /home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
        /home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
github.com/nvidia/container-canary/cmd.Execute()
        /home/runner/work/container-canary/container-canary/cmd/root.go:44 +0x25
main.main()
        /home/runner/work/container-canary/container-canary/main.go:23 +0x17
Error: program returned unknown model

`tcpSocket` doesn't actually test TCP ports inside container

Consider the following:

#!/bin/sh

set -ex

cat > phony-tcp.Dockerfile <<EOF
FROM ubuntu:22.04

# It succeeds even without the EXPOSE command
# EXPOSE 8080

CMD /bin/bash -c 'while true; do sleep 60; done'
EOF

cat > phony-tcp.yaml <<EOF
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: phony-tcp
description: phony-tcp checks
ports:
  - port: 8080
    protocol: tcp
checks:
  - name: tcp
    probe:
      tcpSocket:
        port: 8080
EOF

docker build -t phony-tcp -f phony-tcp.Dockerfile .

container-canary validate --file phony-tcp.yaml phony-tcp

The check succeeds even though the container is clearly not listening on port 8080, because container-canary is connecting to the Docker proxy rather than the actual process inside the container.

Unfortunately, I'm not sure how to actually fix this. We may have to simply issue a warning for this particular check.

provide linux arm64 binaries

Description

For Linux, this project is currently only publishing amd64 binaries on releases.

We should consider also publishing arm64 binaries.

Benefits of this work

  • allows install-from-releases workflow to work in arm64 Linux environment

That would be useful because it allows use of this project without needing to set up Go and do go install.

Acceptance Criteria

  • releases contain a canary_linux_arm64 binary

Approach

As described in https://www.digitalocean.com/community/tutorials/building-go-applications-for-different-operating-systems-and-architectures#using-your-local-goos-and-goarch-environment-variables, Go has builtin support for cross-compiling, so I think this should be achievable on the GitHub-hosted ubuntu-latest runner, without requiring an arm64 runner.

Like this:

GOOS=linux GOARCH=arm64 go build

Notes

Writing this up specifically from the perspective of RAPIDS. This would be helpful (but not critical) for rapidsai/docker#667, as RAPIDS builds both amd64 and arm64 images there and it's helpful to run those images on amd64 and arm64 runners directly (instead of using emulation).

Container fails to start if port already in use

If canary tries to expose a container port for testing and that port is already in use, the container fails to start and canary fails to validate.

Works

# check-port.yaml
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: check-port
description: Check port
env: []
ports:
  - port: 80
    protocol: TCP
volumes: []
checks:
  - name: http
    description: Check port 80
    probe:
      httpGet:
        path: /
        port: 80
      failureThreshold: 30
$ canary validate --file check-port.yaml nginx
Validating nginx against check-port
 Check port 80                                      [passed]
validation passed

Reproducer

$ docker run -p 80:80 nginx  # Start a process that binds to port 80 in another terminal
$ canary validate --file check-port.yaml nginx
Validating nginx against check-port
\ Starting container
Error: container failed to start after 10 seconds

The container also doesn't get cleaned up.

$ docker ps -a             
CONTAINER ID   IMAGE                                  COMMAND                  CREATED         STATUS         PORTS                               NAMES
e8d32b8f45aa   nginx                                  "/docker-entrypoint.…"   2 minutes ago   Created                                            canary-runner-d43137e8
