aws-amazon-eks-ansible-example's Introduction

Status: Work-in-progress. Please create issues or pull requests if you have ideas for improvement.

Amazon EKS full automated deployment with Ansible

Example deployment of Amazon EKS using Ansible and CloudFormation.

Summary

This project demonstrates the deployment of an Amazon EKS cluster and basic services using Ansible and AWS CloudFormation. The deployment is fully automated and lets you start learning and testing the elasticity and agility of AWS services used with Kubernetes-based microservice architectures.

High-Level Architecture

[Figure: high-level architecture diagram]

Disclaimer

This project is an example of a deployment and is meant to be used for testing and learning purposes only. Do not use it in production.

Be aware that the deployment is not covered by the AWS free tier. Please use the AWS pricing calculator to estimate the costs beforehand.

Table of Contents

  1. Getting started
  2. Prerequisites
  3. Parameters
  4. Templates
  5. Testing
  6. Resources
  7. Security
  8. License

Getting started

Just a few steps are needed to get started with the example deployment. The deployment process is separated into a cluster deployment, which creates the Amazon EKS cluster itself and various cluster extensions, and optional demo/example deployments that show the Amazon EKS cluster and the extensions in action.

You may use the deployment playbook for the automatic deployment of Amazon EKS via Ansible. To destroy/remove the Amazon EKS deployment from your AWS Account you can use the destroy playbook.

Prerequisites

To run the Ansible based deployment you need to have some software installed and configured on your device:

For the DNS automation to work, you need a hosted zone set up in your AWS account.

Parameters

Deployment parameters:

Some static deployment variables need to be adjusted in vars/static/definitions.yaml:

| Parameter Name | Default Value | Description | Comment |
|---|---|---|---|
| eksexample_region | eu-central-1 | the region to be used for deployments | Defines the region to be used. Be aware that Amazon EKS on AWS Fargate, which is used in some examples, is not available in all regions. |
| eksexample_worker_desiredcount | 2 | desired number of worker nodes to start with | |
| eksexample_worker_maxcount | 10 | maximum number of workers to be provisioned when scaling out the cluster | |
| eksexample_worker_mincount | 2 | minimum number of worker nodes | At least 2 are recommended. |
| eksexample_worker_instancetype | t3a.medium | instance size of the worker nodes | |
| eksexample_bastion_instancetype | t3a.small | instance size of the bastion host | |
| eksexample_clustername | ansible-eks-testcluster | name of the Amazon EKS cluster | |
| eksexample_clusterversion | 1.18 | version of the Amazon EKS cluster | Versions <1.16 are not tested with this automation. |
| eksexample_aws_profilename | ansible | the profile name set up for the local awscli | Setting up a local profile is recommended. If you decide to shift to instance_profile based EC2 admin instances, alter the Ansible module parameters to not use "profile:". |
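Because the playbooks rely on the local AWS CLI profile named in eksexample_aws_profilename, it can help to confirm that the profile exists before starting a run. The following is a minimal sketch, not part of the repository: the function name aws_profile_exists is hypothetical, and it assumes the AWS CLI's default file locations (~/.aws/config and ~/.aws/credentials) unless other paths are passed in.

```shell
# Sketch: check whether a named AWS CLI profile is defined locally.
# Usage: aws_profile_exists NAME [CONFIG_FILE] [CREDENTIALS_FILE]
# The function name and the optional path arguments are illustrative only.
aws_profile_exists() {
  profile="$1"
  config_file="${2:-$HOME/.aws/config}"
  credentials_file="${3:-$HOME/.aws/credentials}"

  # The config file names non-default profiles as "[profile NAME]".
  grep -qsxF "[profile $profile]" "$config_file" && return 0
  # The credentials file uses the bare form "[NAME]".
  grep -qsxF "[$profile]" "$credentials_file" && return 0
  return 1
}
```

For example, `aws_profile_exists ansible || echo "profile 'ansible' is missing"` prints a hint if the default profile name has not been configured yet; `aws configure --profile ansible` is one way to create it.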

Rename vars/static/custom_definitions.yaml and alter the parameters according to your needs:

| Parameter Name | Default Value | Description | Comment |
|---|---|---|---|
| eksexample_hostedzoneid | | the ID of your Route53 zone that the DNS automation should work on | |
| eksexample_hostedzonename | | the domain name of the hosted zone on Route53 | |

Deployment

The deployment takes approximately 30-45 minutes.

Template structure and deployment workflow

The deployment consists of one main playbook triggering multiple tasks, CloudFormation templates and Kubernetes manifests.

Playbooks

  • eks-deploy-cluster.playbook.yaml: starts the overall deployment of the Amazon EKS cluster and also triggers the deployment of all extensions. Start it with ansible-playbook ./eks-deploy-cluster.playbook.yaml
  • eks-destroy-cluster.playbook.yaml: destroys the whole deployment. If you deployed the example deployments into the cluster, make sure they are destroyed first using the ./docs/examples/destroy-examples.playbook.yaml playbook.
  • ./docs/examples/deploy-examples.playbook.yaml: deploys some microservice and other example deployments to demonstrate the functionality of the extensions.
  • ./docs/examples/destroy-examples.playbook.yaml: removes the example deployments from the cluster (but leaves the cluster intact).
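The ordering described in the list above (examples are removed before the cluster is destroyed) could be wrapped in two small helper functions. This is only a sketch: the function names and the ANSIBLE_PLAYBOOK override variable are hypothetical, and it assumes the commands are run from the repository root and that the examples were actually deployed.

```shell
# Command used to run playbooks; overridable, e.g. with "echo" for a dry run.
# (ANSIBLE_PLAYBOOK is a hypothetical variable, not defined by the repository.)
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

# Deploy the cluster with all extensions, then the optional examples.
deploy_all() {
  "$ANSIBLE_PLAYBOOK" ./eks-deploy-cluster.playbook.yaml &&
    "$ANSIBLE_PLAYBOOK" ./docs/examples/deploy-examples.playbook.yaml
}

# Tear down in reverse order: examples first, then the cluster itself.
destroy_all() {
  "$ANSIBLE_PLAYBOOK" ./docs/examples/destroy-examples.playbook.yaml &&
    "$ANSIBLE_PLAYBOOK" ./eks-destroy-cluster.playbook.yaml
}
```

Chaining with `&&` stops the sequence as soon as one playbook fails, so the cluster is not destroyed while example resources may still be present.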

Tasks

  • acm.tasks.yaml: Sets up ACM and adds the validation Records into the defined Hosted Zone.
  • bastion.tasks.yaml: Sets up a bastion host used for the Amazon EKS deployment as well as all Kubernetes deployments. By using the bastion, no ports or services of the Amazon EKS cluster backend have to be exposed to the public internet.
  • eks-cluster.task.yaml: deploys the Amazon EKS cluster and enables logging for the control plane
  • eks-cluster-autoscaler.task.yaml: setup of the cluster-autoscaler
  • eks-container-insights.task.yaml: enable container insights for the Amazon EKS cluster
  • eks-external-dns.task.yaml: setup of the Route53 automation via external-dns
  • eks-ingress-controller.task.yaml: setup of the aws-load-balancer-controller to automate service exposure
  • eks-metrics-server.task.yaml: setup of the metrics server used by the Horizontal Pod Autoscaler
  • eks-storage-provider-ebscsi.task.yaml: setup of the Amazon Elastic Block Store (EBS) CSI driver. The driver will ensure automatic provisioning of persistent block storage volumes for workloads
  • eks-storage-provider-efscsi.task.yaml: setup of the Amazon EFS CSI driver. The driver will ensure automatic provisioning of persistent shared storage volumes for workloads
  • eks-xray.task.yaml: setup of the X-Ray daemon to trace application deployments

Cloudformation Templates

  • eks-bastion.template.yaml
  • eks-cluster-autoscaler-iam.template.yaml: provisioning of the IAM Policy granting access for the cluster autoscaler to Amazon EC2 and EC2 Autoscaling groups.
  • eks-container-insights-iam.template.yaml: provisioning of the IAM Policy allowing Amazon Cloudwatch Access via the Worker Nodes
  • eks-external-dns-iam.template.yaml: provisioning of the IAM Policy granting access for the external-dns pods to Route53
  • eks-ingress-controller-iam.template.yaml: provisioning of the IAM Policy granting access for the aws-load-balancer-controller towards Elastic Load Balancing
  • eks-storage-provider-ebscsi-iam.template.yaml: IAM Policies to Allow EBS Access via the CSI Driver Deployment
  • eks-storage-provider-efscsi-storage.template.yaml: provisioning of the EFS FileSystem, Mountpoints and related Securitygroups

Kubernetes Manifests

All manifests used for the Kubernetes service and application deployments are separated into subfolders of either ./k8s/ or ./docs/examples/k8s. These manifests are applied by the Ansible tasks using the community.kubernetes modules.

Additional Files

  • ansible.cfg: contains some tweaks for the SSH connectivity, logging and so on.
  • ansible.log: will be written on each run but not checked into the git repo (-> .gitignore )
  • ./vars/dynamic/*.yaml: Variable files written by the playbooks. Contain information about SSL Arns and Bastion Host IP/Name Details.
  • ./secrets/id_rsa_eks: contains the private SSH key used to log into the bastion host. It is used by the Ansible playbooks, but you may also use it to log into the bastion for testing. This file will not be checked into git (-> .gitignore )

Testing

When the deployment of the cluster has finished, you may test your setup. Here are some useful commands to see what actually got deployed.

First, log into your bastion host. There you will find a ready-to-use kubeconfig in .kube/config

To see whether the nodes are there and your access to the Amazon EKS cluster is working, type:

kubectl get nodes

This should give you something like:

NAME                                              STATUS   ROLES    AGE   VERSION
ip-192-168-43-140.eu-central-1.compute.internal   Ready    <none>   15m   v1.17.9-eks-4c6976
ip-192-168-93-136.eu-central-1.compute.internal   Ready    <none>   15m   v1.17.9-eks-4c697
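To compare the node listing against the configured worker counts without reading it by eye, the output can be piped through a small filter. This is a sketch under the assumption that the output has the column layout shown above; the function name count_ready_nodes is hypothetical.

```shell
# Count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes` output (including the header line) on stdin.
# Nodes with a combined status such as "Ready,SchedulingDisabled" are not counted.
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Example (run on the bastion):
#   kubectl get nodes | count_ready_nodes
```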

To see whether the pods of the extensions are up and running, you may use:

kubectl get pod -o=wide -n kube-system

which should show something like:

NAME                                      READY   STATUS    RESTARTS   AGE     IP               NODE                                              NOMINATED NODE   READINESS GATES
aws-load-balancer-controller-7568799df8-pnch4   1/1     Running   0          7m30s   192.168.45.158   ip-192-168-43-140.eu-central-1.compute.internal   <none>           <none>
aws-node-7vg6h                            1/1     Running   0          17m     192.168.93.136   ip-192-168-93-136.eu-central-1.compute.internal   <none>           <none>
aws-node-scl29                            1/1     Running   0          17m     192.168.43.140   ip-192-168-43-140.eu-central-1.compute.internal   <none>           <none>
cluster-autoscaler-7884f5ff6d-k6vpw       1/1     Running   0          8m44s   192.168.68.113   ip-192-168-93-136.eu-central-1.compute.internal   <none>           <none>
coredns-5fdf64ff8-7m4bd                   1/1     Running   0          21m     192.168.70.235   ip-192-168-93-136.eu-central-1.compute.internal   <none>           <none>
coredns-5fdf64ff8-cnwlc                   1/1     Running   0          21m     192.168.52.7     ip-192-168-43-140.eu-central-1.compute.internal   <none>           <none>
ebs-csi-controller-668bbc964-2ttdt        4/4     Running   0          13m     192.168.88.219   ip-192-168-93-136.eu-central-1.compute.internal   <none>           <none>
ebs-csi-controller-668bbc964-pt8n4        4/4     Running   0          13m     192.168.41.152   ip-192-168-43-140.eu-central-1.compute.internal   <none>           <none>
ebs-csi-node-qq5r8                        3/3     Running   0          13m     192.168.43.140   ip-192-168-43-140.eu-central-1.compute.internal   <none>           <none>
ebs-csi-node-zsxs9                        3/3     Running   0          13m     192.168.93.136   ip-192-168-93-136.eu-central-1.compute.internal   <none>           <none>
ebs-snapshot-controller-0                 1/1     Running   0          13m     192.168.32.10    ip-192-168-43-140.eu-central-1.compute.internal   <none>           <none>
efs-csi-node-dzz5j                        3/3     Running   0          11m     192.168.93.136   ip-192-168-93-136.eu-central-1.compute.internal   <none>           <none>
...

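You can check whether, for instance, the ingress controller is working as expected. The pattern below matches pods whose names start with alb-ingress; if the controller pod in your cluster is named differently (the listing above shows aws-load-balancer-controller-...), adjust the pattern accordingly:

kubectl logs -n kube-system $(kubectl get po -n kube-system | grep -Eo 'alb-ingress[a-zA-Z0-9-]+')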

If everything is fine, it should look like:

-------------------------------------------------------------------------------
AWS ALB Ingress controller
  Release:    v1.1.8
  Build:      git-ec387ad1
  Repository: https://github.com/kubernetes-sigs/aws-load-balancer-controller.git
-------------------------------------------------------------------------------

W0813 14:32:48.050307       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0813 14:32:48.096117       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="aws-load-balancer-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0813 14:32:48.096518       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="aws-load-balancer-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0813 14:32:48.096622       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="aws-load-balancer-controller" "source"=
I0813 14:32:48.096910       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="aws-load-balancer-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0813 14:32:48.096963       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="aws-load-balancer-controller" "source"=
I0813 14:32:48.097188       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="aws-load-balancer-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0813 14:32:48.098011       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="aws-load-balancer-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"daemonEndpoints":{"kubeletEndpoint":{"Port":0}},"nodeInfo":{"machineID":"","systemUUID":"","bootID":"","kernelVersion":"","osImage":"","containerRuntimeVersion":"","kubeletVersion":"","kubeProxyVersion":"","operatingSystem":"","architecture":""}}}}
I0813 14:32:48.103658       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="aws-load-balancer-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"containers":null},"status":{}}}
I0813 14:32:48.105447       1 leaderelection.go:205] attempting to acquire leader lease  kube-system/ingress-controller-leader-alb...
I0813 14:32:48.119414       1 leaderelection.go:214] successfully acquired lease kube-system/ingress-controller-leader-alb
I0813 14:32:48.119775       1 recorder.go:53] kubebuilder/manager/events "level"=1 "msg"="Normal"  "message"="aws-load-balancer-controller-7568799df8-pnch4_dc09d9d6-dd71-11ea-a82f-6e94ec7ac6f2 became leader" "object"={"kind":"ConfigMap","namespace":"kube-system","name":"ingress-controller-leader-alb","uid":"5e3275ce-3936-411a-9de5-4503c2223c8b","apiVersion":"v1","resourceVersion":"3156"} "reason"="LeaderElection"
I0813 14:32:48.222253       1 controller.go:134] kubebuilder/controller "level"=0 "msg"="Starting Controller"  "controller"="aws-load-balancer-controller"
I0813 14:32:48.322547       1 controller.go:154] kubebuilder/controller "level"=0 "msg"="Starting workers"  "controller"="aws-load-balancer-controller" "worker count"=1

If you replace alb-ingress in the pattern with external-dns or cluster-autoscaler, you can use the same command to get the logs of these extensions as well.
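Some extensions run more than one pod (the listing above shows two ebs-csi-controller pods), in which case passing all matches to a single kubectl logs call does not work. A possible variation, sketched here with a hypothetical helper named pods_with_prefix, is to select pod names by prefix and fetch the logs one pod at a time.

```shell
# Print the names of pods whose name starts with the given prefix.
# Reads `kubectl get pod` output (including the header line) on stdin.
pods_with_prefix() {
  awk -v prefix="$1" 'NR > 1 && index($1, prefix) == 1 { print $1 }'
}

# Example (run on the bastion):
#   kubectl get pod -n kube-system | pods_with_prefix external-dns |
#     while read -r pod; do kubectl logs -n kube-system "$pod"; done
```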

Testing the example deployments

If you deployed the example(s) via ./docs/examples/deploy-examples.playbook.yaml, you can check some neat details.

To check the Horizontal Pod Autoscaler that comes with the example microservice, use:

watch -n 1 kubectl get hpa -n eksdemo

which will give you:

NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
eksdemo-crystal    Deployment/eksdemo-crystal    1%/30%    2         10        1          17s
eksdemo-frontend   Deployment/eksdemo-frontend   3%/30%    2         10        0          14s
eksdemo-nodejs     Deployment/eksdemo-nodejs     2%/30%    2         10        1          20s
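In the sample output above, taken seconds after creation, the REPLICAS column is still below MINPODS for every deployment. A small filter can list such autoscalers while the deployment settles. This is a sketch with a hypothetical function name, and it assumes the single-metric column layout shown above (a TARGETS value containing spaces would shift the columns).

```shell
# Print the names of HPAs whose REPLICAS (column 6) is below MINPODS (column 4).
# Reads `kubectl get hpa` output (including the header line) on stdin.
hpa_below_min() {
  awk 'NR > 1 && ($6 + 0) < ($4 + 0) { print $1 }'
}

# Example (run on the bastion):
#   kubectl get hpa -n eksdemo | hpa_below_min
```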

Then open the test web page in your browser of choice. The web address is "https://eksdemo." followed by the value you used for the variable "eksexample_hostedzonename". Opening the page in multiple tabs should generate some load that you will recognize in the HPA output. You can also use, for instance, curl as a load generator:

watch -n 0.1 curl -v https://eksdemo.example.com

There are many other things to try out. Feel free to share your ideas :)


Resources

Security

See CONTRIBUTING for more information.

License

This Example is licensed under the MIT-0 License. See the LICENSE file.

Individual files may contain code by other authors under other licenses. See their license headers for details.

aws-amazon-eks-ansible-example's People

Contributors

couchgott, cuongnc01655, dale-c-anderson, jwmatthews

aws-amazon-eks-ansible-example's Issues

fatal: [localhost]: FAILED! => {"changed": false, "msg": "The config profile (ansible) could not be found"}

TASK [Create EC2 KeyPair] ******************************************************
Monday 02 May 2022 15:40:26 +0000 (0:00:00.023) 0:00:00.325 ************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "The config profile (ansible) could not be found"}
PLAY RECAP *********************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=1 skipped=1 rescued=0 ignored=0

eks-container-insights.task.yaml still has an example name hardwired in it

The Problem

When eksexample_clustername in /vars/static/defaults.yaml is changed from its default, the task "create Roles and Policies for container insights" always fails to create its CloudFormation stack.

Root Cause

The default value of the stack name is wired into the shell command in the get notegroups rolename task.

So instead of describing the stack that was created, the command tries to describe a stack that doesn't exist, causing the next task to fail.

The fix

Replace the static portion of the stack name being queried with the portion that's expected from the variables file:

-  shell: > 
-    aws cloudformation describe-stack-resources --region {{ eksexample_region }} --stack-name eksctl-{{ eksexample_clustername }}-nodegroup-ansible-eks-testcluster-linux-nodes \
+  shell: >
+    aws cloudformation describe-stack-resources --region {{ eksexample_region }} --stack-name eksctl-{{ eksexample_clustername }}-nodegroup-{{ eksexample_clustername }}-linux-nodes \
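The naming pattern used in the fix can be expressed as a small helper, which makes it easy to see that the cluster name appears twice in the stack name. This is only an illustration of the pattern from the diff above; the function name nodegroup_stack_name is hypothetical and not part of the repository.

```shell
# Build the nodegroup stack name queried by the task, following the
# pattern from the diff: eksctl-<cluster>-nodegroup-<cluster>-linux-nodes
nodegroup_stack_name() {
  cluster="$1"
  printf 'eksctl-%s-nodegroup-%s-linux-nodes\n' "$cluster" "$cluster"
}

# Example:
#   nodegroup_stack_name my-cluster
```

With the default cluster name ansible-eks-testcluster, the helper yields the same stack name that was previously hardwired, which is why the problem only shows up once the name is changed.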

SSH connection to the bastion can fail if the operator's agent tries too many other keys first

Original problem
The task Get AWS Account Information in the deploy playbook fails to connect to the bastion, with the SSH error Too many authentication failures.

Root cause
I have 7 ssh keys loaded in my agent:

$ ssh-add -l|wc -l
7

Evidently too many of these were tried before the playbook's own generated key was tried. Retrying the playbook without any adjustments to SSH configuration results in the same error. Attempting to SSH manually to the bastion also results in the same error.

My Fix
Appending -o IdentitiesOnly=yes to the ssh_args in the playbook's ansible.cfg file ensures the playbook only tries the single key that was generated, thus allowing the playbook to connect to the bastion and continue executing.
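The same option should also help when connecting to the bastion manually. The sketch below only prints the command line so it can be inspected first; the function name is hypothetical, the key path is the ./secrets/id_rsa_eks file generated by the playbooks, and the login user and host are placeholders that have to be supplied.

```shell
# Print an ssh command line that offers only the generated key to the bastion.
# Usage: bastion_ssh_command USER HOST   (both are placeholders to fill in)
bastion_ssh_command() {
  printf 'ssh -o IdentitiesOnly=yes -i %s %s@%s\n' "./secrets/id_rsa_eks" "$1" "$2"
}

# Example:
#   bastion_ssh_command <user> <bastion-host>
```

With IdentitiesOnly=yes, ssh offers only the identity given via -i (or configured for the host) instead of every key loaded in the agent, which avoids running into the server's limit on authentication attempts.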

The check for cluster existence is not compatible with new version of eksctl

The 0.38.0 release of the eksctl utility has started to always output timestamp and version information, which breaks all the stdout checks for No clusters found strings in tasks/eks-cluster.task.yaml and eks-destroy-cluster.playbook.yaml.

Old/Working:

$ eksctl version
0.36.2

$ eksctl get cluster --region us-west-2
No clusters found

New/Noisy/Breaky:

$ eksctl version 
0.38.0

$ eksctl get cluster --region us-west-2 
2021-02-25 09:54:00 [ℹ]  eksctl version 0.38.0
2021-02-25 09:54:00 [ℹ]  using region us-west-2
No clusters found

To Reproduce
Use the repo to create a new cluster (which pulls the latest version of eksctl), or upgrade an existing bastion's eksctl version to 0.38.0

Possible fix
eksctl has a --verbose 0 flag that silences all the extra stuff, which will get the checks working again, but this still seems somewhat brittle; maybe there's a better way to check.
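One possibly less brittle variant, sketched here as an assumption rather than a tested change to the playbooks, is to search the output for the message instead of comparing the output as a whole, so that additional log lines do not matter. The function name no_clusters_found is hypothetical.

```shell
# Succeeds if the eksctl output on stdin reports that no clusters exist,
# regardless of any log lines printed before the message.
no_clusters_found() {
  grep -q 'No clusters found'
}

# Example (stderr is merged in case log lines are written there):
#   eksctl get cluster --region us-west-2 2>&1 | no_clusters_found
```

This still depends on the exact wording of the message, so it remains sensitive to future changes in eksctl's output.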
