Read this in other languages: 한국어.
This project demonstrates the deployment of a multi-node scalable Cassandra cluster on Kubernetes. Apache Cassandra is a massively scalable open source NoSQL database. Cassandra is perfect for managing large amounts of structured, semi-structured, and unstructured data across multiple datacenters and the cloud.
In this journey we show a cloud native Cassandra deployment on Kubernetes. Cassandra understands that it is running within a cluster manager, and uses this cluster management infrastructure to help implement the application.
Leveraging Kubernetes concepts such as Replication Controllers and StatefulSets, we provide step-by-step instructions to deploy either non-persistent or persistent Cassandra clusters on top of the Bluemix Container Service using a Kubernetes cluster.
Create a Kubernetes cluster with either:
- Minikube for local testing
- IBM Bluemix Container Service to deploy in the cloud, or
- IBM Cloud Private for either scenario. For deploying on IBM Cloud Private, follow the instructions here.
The code here is regularly tested against a Kubernetes cluster from the Bluemix Container Service using Travis.
If you want to deploy the Cassandra nodes directly to Bluemix, click the 'Deploy to Bluemix' button below to create a Bluemix DevOps service toolchain and pipeline for deploying the sample; otherwise, jump to Steps.
You will need to create your Kubernetes cluster first and make sure it is fully deployed in your Bluemix account.
Please follow the Toolchain instructions to complete your toolchain and pipeline.
The Cassandra cluster will not be exposed on the public IP of the Kubernetes cluster. You can still access the nodes by exporting your Kubernetes cluster configuration using bx cs cluster-config <your-cluster-name> and doing Step 5, or simply check their status with kubectl exec <POD-NAME> -- nodetool status.
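For example (the export path below is a placeholder; use the exact line printed by cluster-config in your environment):
$ bx cs cluster-config <your-cluster-name>
# the command downloads the cluster configuration and prints an export line; run it, e.g.
$ export KUBECONFIG=/path/to/downloaded/kube-config.yml
$ kubectl exec <POD-NAME> -- nodetool status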
- Create a Replication Controller
- Validate the Replication Controller
- Scale the Replication Controller
- Using Cassandra Query Language (CQL)
- Create Local Volumes
- Create a StatefulSet
- Validate the StatefulSet
- Scale the StatefulSet
- Using Cassandra Query Language(CQL)
In this sample app you don’t need load balancing or a single service IP. In this case, you can create a “headless” service by specifying None for the clusterIP. We'll need the headless service so the Pods can discover the IP address of the Cassandra seed. Here is the Service description for the headless Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
    - port: 9042
  selector:
    app: cassandra
You can create the headless service using the provided yaml file:
$ kubectl create -f cassandra-service.yaml
service "cassandra" created
If you want to create a persistent Cassandra cluster using StatefulSets, please jump to Step 6.
The Replication Controller is responsible for creating or deleting Pods to ensure that the number of Pods matches the number defined in "replicas". The Pod template is defined inside the Replication Controller. Inside the template you can set how many resources each Pod requests and limit the resources it can use. Here is the Replication Controller description:
apiVersion: v1
kind: ReplicationController
metadata:
  name: cassandra
  # The labels will be applied automatically
  # from the labels in the pod template, if not set
  # labels:
  #   app: cassandra
spec:
  replicas: 1
  # The selector will be applied automatically
  # from the labels in the pod template, if not set.
  # selector:
  #   app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - env:
            - name: CASSANDRA_SEED_DISCOVERY
              value: cassandra
              # CASSANDRA_SEED_DISCOVERY should match the name of the service in cassandra-service.yaml
            - name: CASSANDRA_CLUSTER_NAME
              value: Cassandra
            - name: CASSANDRA_DC
              value: DC1
            - name: CASSANDRA_RACK
              value: Rack1
            - name: CASSANDRA_ENDPOINT_SNITCH
              value: GossipingPropertyFileSnitch
          image: docker.io/anthonyamanse/cassandra-demo:7.0
          name: cassandra
          ports:
            - containerPort: 7000
              name: intra-node
            - containerPort: 7001
              name: tls-intra-node
            - containerPort: 7199
              name: jmx
            - containerPort: 9042
              name: cql
          volumeMounts:
            - mountPath: /var/lib/cassandra/data
              name: data
      volumes:
        - name: data
          emptyDir: {}
You can create a Replication Controller using the provided yaml file with 1 replica:
$ kubectl create -f cassandra-controller.yaml
replicationcontroller "cassandra" created
You can view a list of Replication Controllers using this command:
$ kubectl get rc
NAME        DESIRED   CURRENT   READY     AGE
cassandra   1         1         1         1m
If you view the list of the Pods, you should see 1 Pod running. Use this command to view the Pods created by the Replication Controller:
$ kubectl get pods -o wide
NAME              READY     STATUS    RESTARTS   AGE       IP                NODE
cassandra-xxxxx   1/1       Running   0          1m        172.xxx.xxx.xxx   169.xxx.xxx.xxx
To check if the Cassandra node is up, perform a nodetool status:
You may not be able to run this command for some time if the Pod hasn't created the container yet or the Cassandra instance hasn't finished setting up.
$ kubectl exec -ti cassandra-xxxxx -- nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens   Owns (effective)   Host ID                                Rack
UN  172.xxx.xxx.xxx  109.28 KB  256      100.0%             6402e90d-7995-4ee1-bb9c-36097eb2c9ec   Rack1
To increase the number of Pods, you can scale the Replication Controller to as many replicas as the available resources can accommodate. Proceed to the next step.
To scale the Replication Controller, use this command:
$ kubectl scale rc cassandra --replicas=4
replicationcontroller "cassandra" scaled
After scaling, you should see that your desired number has increased.
$ kubectl get rc
NAME        DESIRED   CURRENT   READY     AGE
cassandra   4         4         4         3m
You can view the list of the Pods again to confirm that your Pods are up and running.
$ kubectl get pods -o wide
NAME              READY     STATUS    RESTARTS   AGE       IP                NODE
cassandra-1lt0j   1/1       Running   0          13m       172.xxx.xxx.xxx   169.xxx.xxx.xxx
cassandra-vsqx4   1/1       Running   0          38m       172.xxx.xxx.xxx   169.xxx.xxx.xxx
cassandra-jjx52   1/1       Running   0          38m       172.xxx.xxx.xxx   169.xxx.xxx.xxx
cassandra-wzlxl   1/1       Running   0          38m       172.xxx.xxx.xxx   169.xxx.xxx.xxx
You can check that the Pods are visible to the Service using the following service endpoints query:
$ kubectl get endpoints cassandra -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2017-03-15T19:53:09Z
  labels:
    app: cassandra
  name: cassandra
  namespace: default
  resourceVersion: "10591"
  selfLink: /api/v1/namespaces/default/endpoints/cassandra
  uid: 03e992ca-09b9-11e7-b645-daaa1d04f9b2
subsets:
- addresses:
  - ip: 172.xxx.xxx.xxx
    nodeName: 169.xxx.xxx.xxx
    targetRef:
      kind: Pod
      name: cassandra-xp2jx
      namespace: default
      resourceVersion: "10583"
      uid: 4ee1d4e2-09b9-11e7-b645-daaa1d04f9b2
  - ip: 172.xxx.xxx.xxx
    nodeName: 169.xxx.xxx.xxx
    targetRef:
      kind: Pod
      name: cassandra-gs64p
      namespace: default
      resourceVersion: "10589"
      uid: 4ee2025b-09b9-11e7-b645-daaa1d04f9b2
  - ip: 172.xxx.xxx.xxx
    nodeName: 169.xxx.xxx.xxx
    targetRef:
      kind: Pod
      name: cassandra-g5wh8
      namespace: default
      resourceVersion: "109410"
      uid: a39ab3ce-0b5a-11e7-b26d-665c3f9e8d67
  - ip: 172.xxx.xxx.xxx
    nodeName: 169.xxx.xxx.xxx
    targetRef:
      kind: Pod
      name: cassandra-gf37p
      namespace: default
      resourceVersion: "109418"
      uid: a39abcb9-0b5a-11e7-b26d-665c3f9e8d67
  ports:
  - port: 9042
    protocol: TCP
You can perform a nodetool status to check whether the other Cassandra nodes have joined and formed a Cassandra cluster. Substitute the Pod name with one of your own:
$ kubectl exec -ti cassandra-xxxxx -- nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens   Owns (effective)   Host ID                                Rack
UN  172.xxx.xxx.xxx  109.28 KB  256      50.0%              6402e90d-7995-4ee1-bb9c-36097eb2c9ec   Rack1
UN  172.xxx.xxx.xxx  196.04 KB  256      51.4%              62eb2a08-c621-4d9c-a7ee-ebcd3c859542   Rack1
UN  172.xxx.xxx.xxx  114.44 KB  256      46.2%              41e7d359-be9b-4ff1-b62f-1d04aa03a40c   Rack1
UN  172.xxx.xxx.xxx  79.83 KB   256      52.4%              fb1dd881-0eff-4883-88d0-91ee31ab5f57   Rack1
Note: It can take around 5 minutes for the Cassandra database to finish its setup.
You can check whether the Cassandra nodes are up and running by using this command (substitute the Pod name with one of your own):
$ kubectl exec cassandra-xxxxx -- nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens   Owns (effective)   Host ID                                Rack
UN  172.xxx.xxx.xxx  109.28 KB  256      50.0%              6402e90d-7995-4ee1-bb9c-36097eb2c9ec   Rack1
UN  172.xxx.xxx.xxx  196.04 KB  256      51.4%              62eb2a08-c621-4d9c-a7ee-ebcd3c859542   Rack1
UN  172.xxx.xxx.xxx  114.44 KB  256      46.2%              41e7d359-be9b-4ff1-b62f-1d04aa03a40c   Rack1
UN  172.xxx.xxx.xxx  79.83 KB   256      52.4%              fb1dd881-0eff-4883-88d0-91ee31ab5f57   Rack1
You will need to wait for the status of the nodes to be Up and Normal (UN) to execute the commands in the next steps.
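For example, you can poll the status until every node reports UN (watch is a standard Linux utility that re-runs the command every two seconds):
$ watch kubectl exec cassandra-xxxxx -- nodetool status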
You can access the cassandra container using the following command:
$ kubectl exec -it cassandra-xxxxx /bin/bash
root@cassandra-xxxxx:/# ls
bin boot dev docker-entrypoint.sh etc home initial-seed.cql lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
Now run the sample .cql file to create and populate the employee table in the Cassandra keyspace using the following commands:
You only need to run the .cql file once, on ONE Cassandra node. The other Pods will also have access to the sample table created by the .cql file.
root@cassandra-xxxxx:/# cqlsh -f initial-seed.cql
root@cassandra-xxxxx:/# cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> DESCRIBE TABLES
Keyspace my_cassandra_keyspace
------------------------------
employee
Keyspace system_schema
----------------------
tables triggers views keyspaces dropped_columns
functions aggregates indexes types columns
Keyspace system_auth
--------------------
resource_role_permissons_index role_permissions role_members roles
Keyspace system
---------------
available_ranges peers batchlog transferred_ranges
batches compaction_history size_estimates hints
prepared_statements sstable_activity built_views
"IndexInfo" peer_events range_xfers
views_builds_in_progress paxos local
Keyspace system_distributed
---------------------------
repair_history view_build_status parent_repair_history
Keyspace system_traces
----------------------
events sessions
cqlsh> SELECT * FROM my_cassandra_keyspace.employee;
 emp_id | emp_city | emp_name | emp_phone  | emp_sal
--------+----------+----------+------------+---------
      1 |       SF |    David | 9848022338 |   50000
      2 |      SJC |    Robin | 9848022339 |   40000
      3 |   Austin |      Bob | 9848022330 |   45000
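For reference, a seed script like initial-seed.cql typically contains statements along these lines. This is a sketch reconstructed from the output above; the exact replication settings and column types in the provided file may differ:
CREATE KEYSPACE IF NOT EXISTS my_cassandra_keyspace
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS my_cassandra_keyspace.employee (
  emp_id int PRIMARY KEY,
  emp_name text,
  emp_city text,
  emp_phone bigint,
  emp_sal int
);

INSERT INTO my_cassandra_keyspace.employee (emp_id, emp_name, emp_city, emp_phone, emp_sal)
  VALUES (1, 'David', 'SF', 9848022338, 50000);
INSERT INTO my_cassandra_keyspace.employee (emp_id, emp_name, emp_city, emp_phone, emp_sal)
  VALUES (2, 'Robin', 'SJC', 9848022339, 40000);
INSERT INTO my_cassandra_keyspace.employee (emp_id, emp_name, emp_city, emp_phone, emp_sal)
  VALUES (3, 'Bob', 'Austin', 9848022330, 45000);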
You now have a non-persistent Cassandra cluster ready!
If you want to create persistent Cassandra clusters, please move forward. Before proceeding to the next steps, delete your Cassandra Replication Controller.
$ kubectl delete rc cassandra
If you have not done it before, please create a Cassandra Headless Service before moving forward.
To create persistent Cassandra nodes, we need to provision Persistent Volumes. There are two ways to provision PVs: dynamically and statically.
For Dynamic provisioning, you'll need StorageClasses and a paid Kubernetes cluster service. If you have one and choose to use Dynamic provisioning, specify the StorageClass you'd like to use in the cassandra-statefulset.yaml file, or comment out the StorageClass annotations to use the default StorageClass set in your Kubernetes cluster.
In this journey, we will use Static provisioning, where we create the volumes manually using the provided yaml files. You'll need the same number of Persistent Volumes as Cassandra nodes.
Example: If you are expecting to have 4 Cassandra nodes, you'll need to create 4 Persistent Volumes.
The provided yaml file already has 4 Persistent Volumes defined. Configure it to add more if you expect to have more than 4 Cassandra nodes.
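Each entry in local-volumes.yaml defines one hostPath-backed volume. A single entry looks roughly like the sketch below; the name, capacity, and path are illustrative assumptions, while the type: local label matches the label used in the cleanup commands later in this document:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: cassandra-data-1
  labels:
    type: local
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/data/cassandra-data-1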
$ kubectl create -f local-volumes.yaml
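You can confirm that the volumes were created and are in the Available state:
$ kubectl get pv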
You will use the same service you created earlier.
Make sure you have deleted your Replication Controller if it is still running.
kubectl delete rc cassandra
The StatefulSet is responsible for creating the Pods. It provides ordered deployment, ordered termination, and unique network names. You will start with a single Cassandra node using a StatefulSet. Run the following command:
$ kubectl create -f cassandra-statefulset.yaml
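For reference, the key parts of such a StatefulSet look roughly like the sketch below. This is not the exact contents of cassandra-statefulset.yaml: the apiVersion, storage size, and commented StorageClass annotation are assumptions, and the env and ports settings (similar to the Replication Controller above) are omitted:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra        # must match the headless service created earlier
  replicas: 1
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: docker.io/anthonyamanse/cassandra-demo:7.0
          # env and ports similar to the Replication Controller above (omitted here)
          volumeMounts:
            - name: cassandra-data
              mountPath: /var/lib/cassandra/data
  volumeClaimTemplates:
    - metadata:
        name: cassandra-data
        # annotations:
        #   volume.beta.kubernetes.io/storage-class: <your-storage-class>   # uncomment for dynamic provisioning
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi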
You can check if your StatefulSet has deployed using the command below.
$ kubectl get statefulsets
NAME        DESIRED   CURRENT   AGE
cassandra   1         1         2h
If you view the list of the Pods, you should see 1 Pod running. Your Pod name should be cassandra-0, and subsequent Pods follow the ordinal numbering (cassandra-1, cassandra-2, ...). Use this command to view the Pods created by the StatefulSet:
$ kubectl get pods -o wide
NAME          READY     STATUS    RESTARTS   AGE       IP                NODE
cassandra-0   1/1       Running   0          1m        172.xxx.xxx.xxx   169.xxx.xxx.xxx
To check if the Cassandra node is up, perform a nodetool status:
$ kubectl exec -ti cassandra-0 -- nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens   Owns (effective)   Host ID                                Rack
UN  172.xxx.xxx.xxx  109.28 KB  256      100.0%             6402e90d-7995-4ee1-bb9c-36097eb2c9ec   Rack1
To increase or decrease the size of your StatefulSet, use this command:
$ kubectl edit statefulset cassandra
You should be redirected to an editor in your terminal. You need to edit the line that says replicas: 1
and change it to replicas: 4
Save it, and the StatefulSet should now have 4 Pods.
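Alternatively, you can scale the StatefulSet without opening an editor:
$ kubectl scale statefulset cassandra --replicas=4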
After scaling, you should see that your desired number has increased.
$ kubectl get statefulsets
NAME        DESIRED   CURRENT   AGE
cassandra   4         4         2h
If you watch the Cassandra pods deploy, they should be created sequentially.
You can view the list of the Pods again to confirm that your Pods are up and running.
$ kubectl get pods -o wide
NAME          READY     STATUS    RESTARTS   AGE       IP                NODE
cassandra-0   1/1       Running   0          13m       172.xxx.xxx.xxx   169.xxx.xxx.xxx
cassandra-1   1/1       Running   0          38m       172.xxx.xxx.xxx   169.xxx.xxx.xxx
cassandra-2   1/1       Running   0          38m       172.xxx.xxx.xxx   169.xxx.xxx.xxx
cassandra-3   1/1       Running   0          38m       172.xxx.xxx.xxx   169.xxx.xxx.xxx
You can perform a nodetool status to check whether the other Cassandra nodes have joined and formed a Cassandra cluster.
$ kubectl exec -ti cassandra-0 -- nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens   Owns (effective)   Host ID                                Rack
UN  172.xxx.xxx.xxx  109.28 KB  256      50.0%              6402e90d-7995-4ee1-bb9c-36097eb2c9ec   Rack1
UN  172.xxx.xxx.xxx  196.04 KB  256      51.4%              62eb2a08-c621-4d9c-a7ee-ebcd3c859542   Rack1
UN  172.xxx.xxx.xxx  114.44 KB  256      46.2%              41e7d359-be9b-4ff1-b62f-1d04aa03a40c   Rack1
UN  172.xxx.xxx.xxx  79.83 KB   256      52.4%              fb1dd881-0eff-4883-88d0-91ee31ab5f57   Rack1
You can do Step 5 again to use CQL in your Cassandra Cluster deployed with StatefulSet.
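For example, assuming the image used by the StatefulSet also contains initial-seed.cql, you can run the same seed file on the first Pod of the StatefulSet:
$ kubectl exec -it cassandra-0 /bin/bash
root@cassandra-0:/# cqlsh -f initial-seed.cql
root@cassandra-0:/# cqlsh -e 'SELECT * FROM my_cassandra_keyspace.employee;'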
- If your Cassandra instance is not running properly, you may check the logs using
kubectl logs <your-pod-name>
- To clean/delete your data on your Persistent Volumes, delete your PVCs using
kubectl delete pvc -l app=cassandra
- If your Cassandra nodes are not joining, delete your controller/statefulset, then delete your Cassandra service:
kubectl delete rc cassandra (if you created the Cassandra Replication Controller)
kubectl delete statefulset cassandra (if you created the Cassandra StatefulSet)
kubectl delete svc cassandra
- To delete everything:
kubectl delete rc,statefulset,pvc,svc -l app=cassandra
kubectl delete pv -l type=local
- This Cassandra example is based on the Kubernetes example Cloud Native Deployments of Cassandra using Kubernetes.