Cain

Cain is a backup and restore tool for Cassandra on Kubernetes. It is named after the DC Comics superhero Cassandra Cain.

Cain supports the following cloud storage services:

  • AWS S3
  • Minio S3
  • Azure Blob Storage
  • Google Cloud Storage

Cain is now an official part of the Helm incubator/cassandra chart!

Install

Prerequisites

  1. git
  2. dep

From a release

Download the latest release from the Releases page, or use the Docker image.

From source

mkdir -p $GOPATH/src/github.com/maorfr && cd $_
git clone https://github.com/maorfr/cain.git && cd cain
make

Commands

Backup Cassandra cluster to cloud storage

Cain performs a backup in the following way:

  1. Back up the keyspace schema (using cqlsh).
  2. Take a snapshot of the keyspace with nodetool snapshot on every Cassandra pod in the given namespace that matches the selector.
  3. Copy the snapshot files in parallel to cloud storage using Skbn. Files are written to the specified dst, under namespace/<cassandraClusterName>/keyspace/<keyspaceSchemaHash>/tag/.
  4. Clear all snapshots.
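
The destination layout from step 3 can be sketched as a small shell helper. This only illustrates the path structure described above and is not Cain's implementation; the function name backup_prefix and the sample schema hash abc123 are hypothetical.

```shell
# Hypothetical helper: builds the prefix under which a backup is stored.
# Layout (from the steps above): dst/namespace/cluster/keyspace/schema-hash/tag/
backup_prefix() {
    dst=$1
    namespace=$2
    cluster=$3
    keyspace=$4
    schema_hash=$5
    tag=$6
    # ${dst%/} drops one trailing slash so the result never contains "//"
    printf '%s/%s/%s/%s/%s/%s/\n' "${dst%/}" "$namespace" "$cluster" "$keyspace" "$schema_hash" "$tag"
}

backup_prefix s3://db-backup/cassandra default ring01 keyspace abc123 20180903091624
```

With these sample values the helper prints s3://db-backup/cassandra/default/ring01/keyspace/abc123/20180903091624/.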

Usage

$ cain backup --help
backup cassandra cluster to cloud storage

Usage:
  cain backup [flags]

Flags:
  -b, --buffer-size float           in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
      --cassandra-data-dir string   cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
  -c, --container string            container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
      --dst string                  destination to backup to. Example: s3://bucket/cassandra. Overrides $CAIN_DST
  -k, --keyspace string             keyspace to act on. Overrides $CAIN_KEYSPACE
  -n, --namespace string            namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
  -p, --parallel int                number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
  -l, --selector string             selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")

Examples

Backup to AWS S3

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst s3://db-backup/cassandra

Backup to Azure Blob Storage

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst abs://my-account/db-backup-container/cassandra

Backup to Google Cloud Storage

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst gcs://db-backup/cassandra

Restore Cassandra backup from cloud storage

Cain performs a restore in the following way:

  1. Restore schema if schema is specified.
  2. Truncate all tables in keyspace.
  3. Copy files from the specified src (under keyspace/<keyspaceSchemaHash>/tag/) - restore is only possible for the same keyspace schema.
  4. Load new data using nodetool refresh.
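
Because backups are written under namespace/<cluster name>/ inside dst, the src given to restore is the backup dst extended with those two components. A minimal sketch of that relationship, assuming a hypothetical helper named restore_src:

```shell
# Hypothetical helper: derives the restore source from the backup destination.
# Cain then looks under src for keyspace/<keyspaceSchemaHash>/tag/.
restore_src() {
    dst=$1
    namespace=$2
    cluster=$3
    printf '%s/%s/%s\n' "${dst%/}" "$namespace" "$cluster"
}

restore_src s3://db-backup/cassandra default ring01
```

For a backup taken with --dst s3://db-backup/cassandra from cluster ring01 in namespace default, this prints s3://db-backup/cassandra/default/ring01.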

Usage

$ cain restore --help
restore cassandra cluster from cloud storage

Usage:
  cain restore [flags]

Flags:
  -b, --buffer-size float           in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
      --cassandra-data-dir string   cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
  -c, --container string            container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
  -k, --keyspace string             keyspace to act on. Overrides $CAIN_KEYSPACE
  -n, --namespace string            namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
  -p, --parallel int                number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
  -s, --schema string               schema version to restore (optional). Overrides $CAIN_SCHEMA
  -l, --selector string             selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
      --src string                  source to restore from. Example: s3://bucket/cassandra/namespace/cluster-name. Overrides $CAIN_SRC
  -t, --tag string                  tag to restore. Overrides $CAIN_TAG
      --user-group string           user and group who should own restored files. Overrides $CAIN_USER_GROUP (default "cassandra:cassandra")

Examples

Restore from S3

cain restore \
    --src s3://db-backup/cassandra/default/ring01 \
    -n default \
    -k keyspace \
    -l release=cassandra \
    -t 20180903091624

Restore from Azure Blob Storage

cain restore \
    --src abs://my-account/db-backup-container/cassandra/default/ring01 \
    -n default \
    -k keyspace \
    -l release=cassandra \
    -t 20180903091624

Restore from Google Cloud Storage

cain restore \
    --src gcs://db-backup/cassandra/default/ring01 \
    -n default \
    -k keyspace \
    -l release=cassandra \
    -t 20180903091624

Describe keyspace schema

Cain describes the keyspace schema using cqlsh. It can return the schema itself, or a checksum of the schema file (used by backup and restore).
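
The sketch below illustrates why a checksum works as a schema identifier: identical schema text always yields the same sum, and any change yields a different one. The hash algorithm Cain uses is not stated here, so sha256sum (GNU coreutils) is an assumption, as are the function name schema_sum and the sample CQL statements.

```shell
# Hypothetical helper: reads a schema dump on stdin and prints a checksum.
# Assumption: sha256sum stands in for whatever hash Cain actually uses.
schema_sum() {
    sha256sum | cut -d ' ' -f 1
}

v1=$(printf 'CREATE TABLE ks.t (id int PRIMARY KEY);' | schema_sum)
v2=$(printf 'CREATE TABLE ks.t (id int PRIMARY KEY, name text);' | schema_sum)

# Adding a column changes the schema text, so the two sums differ.
if [ "$v1" = "$v2" ]; then
    echo "same schema"
else
    echo "schema changed"
fi
```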

Usage

$ cain schema --help
get schema of cassandra cluster

Usage:
  cain schema [flags]

Flags:
  -c, --container string   container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
  -k, --keyspace string    keyspace to act on. Overrides $CAIN_KEYSPACE
  -n, --namespace string   namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
  -l, --selector string    selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
      --sum                print only checksum. Overrides $CAIN_SUM

Examples

cain schema \
    -n default \
    -l release=cassandra \
    -k keyspace

cain schema \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --sum

Environment variables support

Cain commands accept environment variables in place of flags. For example, the backup command can be run with flags:

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst s3://db-backup/cassandra

You can also set the appropriate environment variables (CAIN_FLAG, with _ instead of -):

export CAIN_NAMESPACE=default
export CAIN_SELECTOR=release=cassandra
export CAIN_KEYSPACE=keyspace
export CAIN_DST=s3://db-backup/cassandra

cain backup
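
The naming rule can be sketched as a shell function that maps a long flag to its variable name. The helper flag_to_env is hypothetical and only mirrors the convention visible in the --help output (for example, --buffer-size and $CAIN_BUFFER_SIZE).

```shell
# Hypothetical helper: maps a long flag name to its environment variable.
flag_to_env() {
    name=${1#--}    # strip the leading "--"
    # upper-case the letters and turn "-" into "_", then add the CAIN_ prefix
    printf 'CAIN_%s\n' "$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')"
}

flag_to_env --buffer-size
flag_to_env --cassandra-data-dir
```

The two calls print CAIN_BUFFER_SIZE and CAIN_CASSANDRA_DATA_DIR.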

Support for additional storage services

Since Cain uses Skbn, adding support for additional storage services is simple. Read this post for more information.

Skbn compatibility matrix

Cain version    Skbn version
0.6.0           0.5.0
0.5.1           0.4.2
0.5.0           0.4.1
0.4.2           0.4.1
0.4.1           0.4.1
0.4.0           0.4.0
0.3.0           0.3.0
0.2.0           0.2.0
0.1.0           0.1.1

Credentials

Kubernetes

Cain tries to get credentials in the following order:

  1. If the KUBECONFIG environment variable is set, Cain uses the current context from that config file.
  2. If ~/.kube/config exists, Cain uses the current context from that config file with an out-of-cluster client configuration.
  3. If ~/.kube/config does not exist, Cain assumes it is running inside a pod and uses an in-cluster client configuration.
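
The lookup order above can be approximated with a short shell check, which is handy for predicting which configuration Cain would pick in a given environment. This is a sketch of the documented order only, not Cain's code; the function name kube_config_source and its return labels are hypothetical.

```shell
# Hypothetical helper: reports which credential source the order above selects.
kube_config_source() {
    if [ -n "${KUBECONFIG:-}" ]; then
        echo "env"          # 1. config file named by $KUBECONFIG
    elif [ -f "${HOME}/.kube/config" ]; then
        echo "home"         # 2. ~/.kube/config, out-of-cluster client
    else
        echo "in-cluster"   # 3. assume running inside a pod
    fi
}

kube_config_source
```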

AWS

Cain uses the default AWS credentials chain.

Azure Blob Storage

Cain uses AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY environment variables for authentication.

Google Cloud Storage

Cain uses Google Application Default Credentials. It first looks for the GOOGLE_APPLICATION_CREDENTIALS environment variable. If that is not defined, it looks for the default service account, and returns an error if none is configured.

Examples

  1. Helm example
  2. Code example
