How to Set Up and Run a MongoDB Cluster via Kubernetes on AWS
License: MIT License
Why Kubernetes
Kubernetes makes a MongoDB cluster highly available: if a pod fails, the system reschedules and restarts it, effectively adding redundancy with minimal human intervention.
MongoDB Cluster via Kubernetes on AWS
Docker containers are stateless by default, whereas MongoDB database nodes must be stateful.
In the event that a container fails and is rescheduled, it's undesirable for the data to be lost.
To solve this, features such as the Volume abstraction in Kubernetes can be used to map what would otherwise be an ephemeral MongoDB data directory in the container to a persistent location where the data survives container failure and rescheduling.
Persistence: In AWS, this data persistence is achieved by using Elastic Block Store (EBS).
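A minimal sketch of EBS-backed persistence, using a StorageClass and a PersistentVolumeClaim that a MongoDB pod can mount at /data/db. All names (mongo-ebs, mongo-pvc) and the volume size are illustrative:

```yaml
# StorageClass backed by AWS EBS (name and volume type are illustrative)
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mongo-ebs
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
---
# PersistentVolumeClaim for the MongoDB data directory
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mongo-pvc
spec:
  storageClassName: mongo-ebs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Because the claim is backed by an EBS volume, the data survives container failure and rescheduling as described above.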
MongoDB database nodes within a replica set must communicate with each other, including after rescheduling.
All of the nodes within a replica set must know the addresses of all of their peers. However, when a container is rescheduled, it is likely to be restarted with a different IP address. For example, all containers within a Kubernetes Pod share a single IP address, which changes when the pod is rescheduled. With Kubernetes, this can be handled by associating a Kubernetes Service with each MongoDB node, which uses the Kubernetes DNS service to provide a hostname for the service that remains constant through rescheduling.
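One way to realize the per-node Service described above is a Service whose selector matches exactly one MongoDB pod, giving that pod a DNS name that survives rescheduling. A sketch, assuming illustrative labels replicaset: rs0 and instance: mongo-node1:

```yaml
# One Service per MongoDB node; the name becomes a stable DNS hostname
apiVersion: v1
kind: Service
metadata:
  name: mongo-node1
spec:
  selector:
    replicaset: rs0
    instance: mongo-node1
  ports:
    - port: 27017
      targetPort: 27017
```

Other replica set members can then reach this node at mongo-node1 (resolved by the Kubernetes DNS service) regardless of which IP the pod currently has.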
Once each of the individual MongoDB nodes is running (each within its own container), the replica set must be initialized and each node added. This is likely to require some additional logic beyond that offered by off the shelf orchestration tools. Specifically, one MongoDB node within the intended replica set must be used to execute the rs.initiate and rs.add commands.
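The extra initialization logic can be sketched in the mongo shell, run against one of the intended members. The hostnames assume per-node Services as above and are illustrative:

```javascript
// Run in the mongo shell on one intended replica set member
rs.initiate()
// Add the remaining members by their stable service hostnames
rs.add("mongo-node2:27017")
rs.add("mongo-node3:27017")
// Verify the replica set state
rs.status()
```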
Availability Zones: If you want to distribute the masters or nodes across multiple availability zones, you need to add some parameters to the kops create cluster CLI.
For masters in multiple AZs specify --master-zones=us-west-2a,us-west-2b,us-west-2c. For nodes in multiple AZs specify for instance --zones=us-west-2a,us-west-2b,us-west-2c.
You can then change the number of nodes you run total by passing --node-count. However, the nodes are not guaranteed to be distributed across all AZs.
If you want to guarantee a count per zone, you need to create multiple instance groups for your nodes,
one per AZ (instance groups map to ASGs). You can tweak your nodes at any time (e.g. after cluster creation).
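Putting the flags above together, a kops invocation might look like the following. The cluster name and state store are placeholders, not values from this guide:

```shell
# Illustrative kops invocation; cluster name and state store are placeholders
kops create cluster \
  --name=k8s.example.com \
  --state=s3://example-kops-state \
  --master-zones=us-west-2a,us-west-2b,us-west-2c \
  --zones=us-west-2a,us-west-2b,us-west-2c \
  --node-count=3
```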
Get VPC_ID and NETWORK_CIDR from AWS Console. Details here.
You don't need a replication controller as mentioned here, since Deployments already take care of that.
Deployments subsume the role of RCs.
This is what it looks like: instead of a ReplicationController, think of a Deployment.
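A sketch of such a Deployment for one MongoDB node. The image tag, labels, and claim name are illustrative assumptions, not values from this guide:

```yaml
# Deployment for a single MongoDB replica set member (names are illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo-node1
spec:
  replicas: 1
  selector:
    matchLabels:
      instance: mongo-node1
  template:
    metadata:
      labels:
        replicaset: rs0
        instance: mongo-node1
    spec:
      containers:
        - name: mongo
          image: mongo:4.4
          command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-data
              mountPath: /data/db
      volumes:
        - name: mongo-data
          persistentVolumeClaim:
            claimName: mongo-pvc
```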
* After adding 3 replicas, the cluster looks like this:
Enabling Replica Set Members to Talk to Each Other
Connect to any one of the MongoDB instances. Either use the public IP (run kubectl get svc to list the services and their public IPs),
or, if the cluster is not externally visible, log into a pod with kubectl exec -it $POD_ID bash
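The two ways of connecting can be sketched as follows; service and pod names are illustrative:

```shell
# Option 1: connect via the service's external IP
kubectl get svc                  # note the EXTERNAL-IP of the mongo service
mongo --host <EXTERNAL-IP> --port 27017

# Option 2: exec into the pod when the cluster is not externally visible
kubectl get pods                 # find the mongo pod name
kubectl exec -it $POD_ID -- bash
mongo                            # start the mongo shell inside the container
```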
Run the initiate command:
rs.initiate()
We need to edit the config of the replica set. Run conf=rs.conf().
This member should be known by its external IP address and port. Add the primary as such:
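A hedged sketch of that reconfiguration in the mongo shell; the external address 54.0.0.1:27017 is an illustrative placeholder for your primary's actual public IP and port:

```javascript
// Fetch the current replica set configuration
conf = rs.conf()
// Point member 0 (the primary) at its externally reachable address
conf.members[0].host = "54.0.0.1:27017"   // placeholder address
// Apply the updated configuration
rs.reconfig(conf)
```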