kube-runner

This repository provides tools and instructions for running nextflow pipelines on a Kubernetes cluster. These scripts have been tested for the following pipelines:

(In Progress)

Dependencies

To get started, all you need is nextflow, kubectl, and access to a Kubernetes cluster (in the form of ~/.kube/config). If you want to test Docker images on your local machine, you will also need docker and nvidia-docker (for GPU-enabled Docker images).

Configuration

A few administrative tasks must be completed before nextflow can run properly on the Kubernetes cluster. These tasks only need to be done once, but they may require administrative access to the cluster, so you may need to ask your system administrator to handle them for you.

  • Nextflow needs a service account with the edit and view cluster roles:
kubectl create rolebinding default-edit --clusterrole=edit --serviceaccount=<namespace>:default
kubectl create rolebinding default-view --clusterrole=view --serviceaccount=<namespace>:default
  • Nextflow needs access to shared storage in the form of a Persistent Volume Claim (PVC) with the ReadWriteMany access mode. The process for provisioning a PVC depends on what types of storage are available. The kube-create-pvc.sh script provides an example of creating a PVC for CephFS storage, but it may not apply to your particular cluster. Consult your system administrator for assistance if necessary. There may already be a PVC available for you. You can check using the following command:
kubectl get pvc
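
If no suitable PVC exists and your cluster offers a storage class that supports ReadWriteMany, a claim could be requested roughly as in the sketch below. This is not the content of kube-create-pvc.sh: the function name print_pvc_manifest, the claim name, the storage class name, and the size are hypothetical placeholders.

```shell
# Print a PersistentVolumeClaim manifest with ReadWriteMany access.
# All arguments are placeholders; ask your administrator which storage
# class (if any) supports ReadWriteMany on your cluster.
print_pvc_manifest() {
    pvc_name="$1"        # name of the claim to create
    storage_class="$2"   # storage class that supports ReadWriteMany
    size="$3"            # requested capacity, e.g. 10Gi

    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${pvc_name}
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ${storage_class}
  resources:
    requests:
      storage: ${size}
EOF
}

# Example (requires cluster access):
#   print_pvc_manifest my-pvc my-storage-class 10Gi | kubectl apply -f -
```

Printing the manifest instead of applying it directly lets you review it, or hand it to your administrator, before anything is created on the cluster.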

NOTE: If you are a user of the NRP from the Feltus lab, there is already a PVC available for you called deepgtex-prp.
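
Once the role bindings from the first task are in place, one way to confirm that they took effect is to ask the API server on behalf of the default service account. The sketch below uses kubectl auth can-i with impersonation; the helper names sa_subject and check_sa_permission are hypothetical, and impersonation itself requires permission, so an administrator may need to run the check.

```shell
# Build the impersonation subject for a namespace's default service account.
sa_subject() {
    printf 'system:serviceaccount:%s:default\n' "$1"
}

# Ask the API server whether that service account may perform an action.
# kubectl prints "yes" or "no".
check_sa_permission() {
    namespace="$1"   # namespace that holds the role bindings
    verb="$2"        # e.g. create, list, delete
    resource="$3"    # e.g. pods, persistentvolumeclaims

    kubectl auth can-i "$verb" "$resource" \
        --namespace "$namespace" \
        --as "$(sa_subject "$namespace")"
}

# Example (requires cluster access):
#   check_sa_permission my-namespace create pods
#   check_sa_permission my-namespace list persistentvolumeclaims
```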

Usage

Consult the examples folder for examples of running nextflow pipelines on a Kubernetes cluster. Consult the Nextflow Kubernetes documentation for more general information on using Nextflow and Kubernetes together.

This repository provides two scripts, kube-load.sh and kube-save.sh, for transferring data between your local machine and your Kubernetes cluster. In general, to run a nextflow pipeline with Kubernetes, you will need to transfer your input data beforehand using kube-load.sh and transfer your output data afterward using kube-save.sh:

./kube-load.sh <pvc-name> <input-dir>

nextflow [-C nextflow.config] kuberun <pipeline> -v <pvc-name> [options]

./kube-save.sh <pvc-name> <output-dir>
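
The three steps above can be chained so that a failed upload or pipeline run stops the sequence. The following is a minimal sketch, not a script shipped with this repository: run_step, run_on_cluster, and the DRY_RUN variable are hypothetical names, and the optional -C flag and pipeline options are omitted for brevity.

```shell
# Run a command, or just print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Load inputs, run the pipeline, then save outputs.
# Each step runs only if the previous one succeeded.
run_on_cluster() {
    pvc_name="$1"; pipeline="$2"; input_dir="$3"; output_dir="$4"

    run_step ./kube-load.sh "$pvc_name" "$input_dir" &&
    run_step nextflow kuberun "$pipeline" -v "$pvc_name" &&
    run_step ./kube-save.sh "$pvc_name" "$output_dir"
}

# Example: preview the commands without touching the cluster.
#   DRY_RUN=1 run_on_cluster my-pvc my-org/my-pipeline input output
```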

NOTE: If you use kube-load.sh to upload a directory when that directory already exists remotely, kube-load.sh will not overwrite the remote directory. Instead, it will copy the local directory into the remote directory. For example, if you try to upload a directory called input and that directory already exists remotely, the local input directory will be copied to input/input. Keep this in mind whenever you try to update an existing directory! You must delete or rename the remote directory before copying the new directory.
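
The nesting described in the note can be reproduced locally without a cluster. The sketch below assumes that kube-load.sh follows copy semantics similar to cp -r, which matches the behavior described above but is not confirmed from the script itself; the directory and file names are placeholders.

```shell
# Reproduce the nesting behavior locally with cp -r (no cluster needed).
workdir="$(mktemp -d)"
mkdir -p "$workdir/local/input" "$workdir/remote"
echo "data" > "$workdir/local/input/sample.txt"

# First copy: remote/input does not exist yet, so it is created.
cp -r "$workdir/local/input" "$workdir/remote/input"

# Second copy: remote/input now exists, so the directory is nested inside it.
cp -r "$workdir/local/input" "$workdir/remote/input"

# Lists both remote/input/sample.txt and remote/input/input/sample.txt.
find "$workdir/remote" -type f | sort
```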

The nextflow kuberun command will automatically create a pod that runs your pipeline. Alternatively, you can provide your own pod spec. The kube-run.sh script can generate a pod spec and launch it using the same parameters as nextflow kuberun:

# transfer local nextflow.config if necessary
./kube-load.sh <pvc-name> nextflow.config

# run pipeline
./kube-run.sh <pvc-name> <pipeline> [options]

As you run pipelines, nextflow will create pods to perform the work. Some pods may not be cleaned up properly because of errors or other issues, so it is important to clean up your pods periodically. You can list all of the pods in your namespace using kubectl:

kubectl get pods

You can use the kube-clean.sh script in this repository to clean up dangling pods:

./kube-clean.sh
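
To review which pods have finished before removing anything, the output of kubectl get pods can be filtered by its STATUS column. This sketch is independent of kube-clean.sh and does not describe how that script works; the helper name finished_pods is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS column is Completed or Error.
finished_pods() {
    awk '$3 == "Completed" || $3 == "Error" { print $1 }'
}

# Example (requires cluster access); review the list first:
#   kubectl get pods --no-headers | finished_pods
#
# Then delete the listed pods one by one:
#   for pod in $(kubectl get pods --no-headers | finished_pods); do
#       kubectl delete pod "$pod"
#   done
```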

Lastly, there are a few additional scripts you can use to manage the pods in your namespace:

./kube-logs.sh
./kube-pods.sh

Appendix

Working with Docker images

NOTE: Generally speaking, Docker requires admin privileges in order to run. On Linux, for example, you may need to run Docker commands with sudo. Alternatively, if you add your user to the docker group then you will be able to run docker without sudo.

Build a Docker image:

docker build -t <tag> <build-directory>

Run a Docker container:

docker run [--runtime=nvidia] --rm -it <tag> <command>

List the Docker images on your machine:

docker images

Push a Docker image to Docker Hub:

docker push <tag>

Remove old Docker data:

docker system prune

Interacting with a Kubernetes cluster

Test your Kubernetes configuration:

kubectl config view

View the physical nodes on your cluster:

kubectl get nodes --show-labels

Check the status of your pods:

kubectl get pods -o wide

Get information on a pod:

kubectl describe pod <pod-name>

Get an interactive shell into a pod:

kubectl exec -it <pod-name> -- bash

Delete a pod:

kubectl delete pod <pod-name>

Using Nextflow with Kubernetes

Create a pod with an interactive terminal on a Kubernetes cluster:

nextflow kuberun login -v <pvc-name>

Run a nextflow pipeline on a Kubernetes cluster:

nextflow [-C nextflow.config] kuberun <pipeline> -v <pvc-name>

NOTE: If you create your own nextflow.config in your current directory then nextflow will use that config file instead of the default.

Contributors

bentsherman, feltus
