cos's Introduction

Basic Cluster Orchestration System

Making use of Terraform and Nomad to set up a cluster orchestration system. This repository provides an extended example based on the main nomad terraform module.

Architecture

The COS (Cluster Orchestration System) consists of three core components.

  1. A cluster of nomad servers (NMS, the leaders).
  2. Several nomad clients (NMC).
  3. A cluster of consul servers used as the service registry. Together with the consul agents on each of the instances, consul provides the service discovery system.

The nomad instances are organized in so-called data-centers. A data-center is a group of nomad instances and can be specified as the destination when deploying a nomad job; a short example follows the list below.

The COS organizes its nodes in five different data-centers.

  1. DC leader: Contains the nomad servers (NMS).
  2. DC backoffice: Contains those nomad clients (NMC) that provide basic functionality needed to run services. For example, the prometheus servers and Grafana run there.
  3. DC public-services: Contains public-facing services. These are the services that clients interact with directly; they process ingress traffic. Thus an ingress load-balancer like fabio runs on those nodes.
  4. DC private-services: Contains services which are used internally. These services do the real work but need no access from or to the internet.
  5. DC content-connector: Contains services which are used to obtain/scrape data from external sources. They usually load data from content-providers.
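As an illustration (the job file name is hypothetical), the data-center a node belongs to can be checked with the nomad CLI, and a job is pinned to a data-center via the datacenters attribute in its job file:

# List all client nodes together with the data-center they belong to
nomad node status

# Run a job whose job file pins it to a data-center, e.g. datacenters = ["public-services"]
nomad job run ping_service.nomad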

The data-centers of the COS live in three different subnets.

  1. Backoffice: This is the most important subnet, since it contains the most important instances, like the nomad servers. Thus it is restricted the most and has no access to the internet (neither ingress nor egress).
  2. Services: This subnet contains the services that need no egress access to the internet. Ingress access is only granted for some of them over an ALB, not directly.
  3. Content-Connector: This subnet contains services that need egress access to the internet in order to obtain data from content-providers.

Docker Registry

This Cluster Orchestration System allows pulling docker images from public docker registries like Docker Hub and from AWS ECR.

Regarding AWS ECR, it is only possible to pull from the registry of the AWS account and region this COS is deployed to. Thus you have to create an ECR repository in the same region and account and push your docker images there.
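A sketch of that workflow (repository name, account id and region are placeholders, and the login command assumes a recent AWS CLI):

# Create the repository in the same account and region the COS runs in
aws ecr create-repository --repository-name service/ping-service --region us-east-1

# Authenticate the local docker daemon against ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Tag and push the image
docker tag ping-service:0.0.7 123456789012.dkr.ecr.us-east-1.amazonaws.com/service/ping-service:0.0.7
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/service/ping-service:0.0.7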

HA-Setup

The consul-servers as well as the nomad-servers are set up in a high-availability configuration. At least three consul and three nomad servers are deployed across different availability-zones. The nomad-clients are spread across three different AZs as well.

Structure

_docs

Providing detailed documentation for this module.

examples

Provides example instantiations of this module. The root example builds up a fully working nomad cluster including the underlying networking, the nomad servers and clients, and a consul cluster for service discovery.

modules

Terraform modules for separate aspects of the cluster orchestration system.

  • nomad: Module that creates a cluster of nomad masters.
  • nomad-datacenter: Module that creates a cluster of nomad clients for a specific data-center.
  • consul: Module building up a consul cluster.
  • ui-access: Module building up ALBs to grant access to the nomad, consul and fabio UIs.
  • sgrules: Module connecting the security groups of the instances appropriately to grant the minimally needed access.
  • ami: Module for creating an AMI having nomad, consul and docker installed (based on Amazon Linux AMI 2017.09.1).
  • ami2: Module for creating an AMI having nomad, consul and docker installed (based on Amazon Linux 2 LTS Candidate AMI 2017.12.0).
  • networking: This module is only used to support the examples. It is not part of the main cos module.

Module Dependencies

The picture shows the dependencies within the modules of the cos-stack and the dependencies to the networking-stack.

Troubleshooting

Monitoring Server and Nodes

  • nomad monitor -log-level error|warn|info|debug|trace -node-id <node_id> | -server-id <server_id>
  • supported since nomad 0.10.2

Nomad CLI complains about invalid Certificate

If you have deployed the cluster with https endpoints for the ui-ALBs and have created a self-signed certificate, you might get errors from the nomad CLI complaining about an invalid certificate (x509: certificate is..). To fix this you have to integrate the custom root-CA you used for signing your certificate appropriately into your system.

Provide access to CA cert-file

To do so, store the PEM-encoded CA cert-file locally and tell nomad where to find it; an example follows the options below.

There are two options:

  1. -ca-cert=<path> flag or NOMAD_CACERT environment variable
  2. -ca-path=<path> flag or NOMAD_CAPATH environment variable
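A minimal example, assuming the CA certificate was stored at the hypothetical path /etc/ssl/certs/my-root-ca.pem:

# Point the nomad CLI at the custom root CA via the environment ...
export NOMAD_CACERT=/etc/ssl/certs/my-root-ca.pem
nomad status

# ... or pass the flag explicitly per call
nomad status -ca-cert=/etc/ssl/certs/my-root-ca.pem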

Disable Certificate verification

To overcome certificate verification issues you can also, although not recommended, temporarily skip certificate verification when using the nomad CLI.

  1. -tls-skip-verify: pass it as an additional parameter in your CLI calls, e.g. nomad plan -tls-skip-verify jobfile.nomad
  2. NOMAD_SKIP_VERIFY: set the environment variable to 1 (export NOMAD_SKIP_VERIFY=1) and then call your CLI commands as usual, e.g. nomad plan jobfile.nomad

References

License

cos's People

Contributors

1nd2rd3st, ddragoti, dependabot[bot], elumalainarasimman, fossabot, matthiasscholz, matthiasscholztw, thomasobenaus

cos's Issues

Remove instance key as required parameter

Currently it is required to have an instance key configured and available in AWS.
This was done to ease debugging and issue investigation during development.

With the AWS Systems Manager another alternative exists which avoids the use of SSH keys.
Furthermore no bastion setup is needed in order to interact with the instance.

Nevertheless the AWS Systems Manager has to be supported by the AMI.

Provide possibility to inject userdata

Some use-cases (e.g. mounting an EFS mount-target) are best implemented via user-data when the instance is created.
With the current module API it is not possible to add steps to the user-data of the nomad nodes.

It would be nice to have this option.
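As an illustration of such a use-case (the file system id and mount point are placeholders, and the amazon-efs-utils package is assumed to be available on the AMI), an injected user-data step could look like this:

# Hypothetical extra user-data step: mount an EFS file system during instance creation
yum install -y amazon-efs-utils      # assumption: package available on Amazon Linux 2
mkdir -p /mnt/efs
mount -t efs fs-12345678:/ /mnt/efs  # fs-12345678 is a placeholder file system id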

Upgrade to nomad 1.0.2

Time is passing and new versions are available for all of our dependencies.

Update:

  • nomad
  • consul
  • fabio
  • terraform modules
  • testing

Update Nomad/Consul ami to use consul 1.3.1

The newest released consul version is 1.3.1.
Released on November 13.

With this upgrade, besides bugfixes, we also get features like Connect Envoy support (part of v1.3.0).

EBS attachment support

We need to be able to attach user-defined EBS volumes to the nomad nodes.
It should be possible to attach more than one EBS to specific data-center nodes.
Mounting of the attached volumes should happen automatically during instance creation.
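A rough sketch of that mounting step (device name, file system type and mount point are assumptions; the actual device naming depends on the instance type):

# Hypothetical user-data step: format (only if needed) and mount an attached EBS volume
DEVICE=/dev/xvdf     # placeholder device name
MOUNTPOINT=/data
blkid "$DEVICE" || mkfs -t xfs "$DEVICE"   # create a file system only if none exists yet
mkdir -p "$MOUNTPOINT"
mount "$DEVICE" "$MOUNTPOINT"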

Packer build of Amazon Linux 2 fails

Calling packer build -var 'aws_region=us-east-1' -var 'ami_regions=us-east-1' nomad-consul-docker-ecr.json

Fails with:

==> amazon-linux-ami2: Error modify AMI attributes: InvalidAMIAttributeItemValue: Invalid attribute item value " " for userId item type.
==> amazon-linux-ami2:  status code: 400, request id: 672be7fd-f89b-49ba-a76c-426341955d8d

Upgrade: Restart Policy for systemd services

Summary

To support rolling upgrades of the cluster orchestration system the restart policy for the nomad and consul services needs to be enhanced.

Expectation

In case of a normal shutdown of the application the service should not restart automatically.
In case of an abnormal shutdown of the application the service should be restarted automatically by systemd.

Hint
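A minimal sketch of such a policy, assuming the services are managed through systemd units named nomad.service and consul.service, would be a drop-in override with Restart=on-failure (systemd then restarts only after abnormal exits, not after a clean shutdown):

# Sketch: restart nomad only after abnormal exits, not after a clean shutdown
sudo mkdir -p /etc/systemd/system/nomad.service.d
printf '[Service]\nRestart=on-failure\nRestartSec=5\n' | sudo tee /etc/systemd/system/nomad.service.d/restart.conf
sudo systemctl daemon-reload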

In nw-separation setup nomad fails to download the images

04/14/18 08:58:48 UTC Restarting Task restarting in 30.185113191s
04/14/18 08:58:48 UTC Driver Failure failed to initialize task "ping_service_task" for alloc "b9f34abf-f20c-27ff-6341-5c1040f9476f": Failed to pull 307557990628.dkr.ecr.us-east-1.amazonaws.com/service/ping-service:0.0.7: API error (500): {"message":"Get https://307557990628.dkr.ecr.us-east-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"}

Every container is accessible over ingress ALB

All traffic on the ingress ALB is routed to the services subnet, to the public-services data-center. There, fabio runs on each node.
Currently fabio is able to route the traffic to every location inside the VPC, and it will do so as soon as a job registers itself at consul using the tagging-mechanism.

For example, prometheus is running on the backoffice nodes. Fabio will route traffic to them since prometheus currently registers at consul using the tagging-mechanism.

Add possibility to tag instances

Having the ability to tag data-center nodes makes it easier to distinguish between the different types of nodes.
This helps especially if you want to find the right target nodes for node draining in a scripted manner, e.g. for a version upgrade.
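A hedged sketch of how such tags could be used in a draining script (the tag key/value and the mapping from instance to nomad node id are assumptions):

# Find the instances of one data-center by a hypothetical tag
aws ec2 describe-instances \
  --filters "Name=tag:nomad-datacenter,Values=public-services" "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].PrivateIpAddress" --output text

# Look up the matching nomad node ids (e.g. via 'nomad node status -verbose')
# and drain them one by one before the upgrade
nomad node drain -enable -yes <node_id>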

UDP blocked for docker ports

Some services (e.g. thanos) need to be able to communicate with their cluster components using UDP.
Currently only TCP traffic is allowed across the docker ports.

--> UDP has to be opened as well.

Remove EFS and S3 dependencies in CloudInit of Nomad Client Nodes

Issue

Currently there are two variables that are used to modify the cloud-init script from the outside.
This adds an unnecessary dependency on components that are not needed for the COS deployment. Thus the COS is less usable and harder to set up, since all dependencies have to be satisfied.

Goal

Remove the unneeded dependencies and the variables.
https://github.com/MatthiasScholz/cos/blob/master/modules/nomad-datacenter/user-data-nomad-client.sh#L53

Names of nomad and consul instances are unclear

Current state

nomad-client (dc public-services): COS-public-services-shiner
consul: COS-consul-shiner
nomad-server: nomad-example-server

Problem

  1. The names can't be long, since they are used inside the official nomad-module as name-prefix, which introduces size limits for names (32 for target-groups, 64 for security-groups)
  2. To avoid collisions the name should contain a random part.

Proposal

  1. Keep random part at the end
  2. Restrict the data-center name (only in these names) to a maximum of 10 characters.
  3. Add abbrev. for node-type
  • nomad-client: NMC
  • nomad-server: NMS
  • consul: SDCFG (Service Discovery + Configuration)

Examples

COS-SDCFG-consul-shiner
COS-NMC-public-ser-shiner
COS-NMC-private-se-shiner
COS-NMC-backoffice-shiner
COS-NMC-content-se-shiner
COS-NMS-leader-shiner

Missing network device dependency

Due to the missing network device dependency of the nomad service during system startup, the application cannot connect to the cluster.

Support AWS System Manager Session Manager

Summary

Making use of the AWS Session Manager allows deprecating the bastion setup used to debug cluster issues where direct instance access is needed; a usage example follows the details below.

Using the AWS Session Manager provides better security and less infrastructure to maintain and pay for.

Details

  • ensure AWS Session Manager is installed on the instances ( by default for AWS AmazonLinux 2 )
  • ensure instance is allowed to interact with AWS Session Manager ( instance profile )
  • cleanup documentation to advertise AWS Session Manager over Bastion setup ( +sshuttle )
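For illustration, once the agent and the instance profile are in place, a session could be opened like this (the instance id is a placeholder):

# Check that the instance is registered with AWS Systems Manager
aws ssm describe-instance-information

# Open an interactive shell on the instance without SSH keys or a bastion host
aws ssm start-session --target i-0123456789abcdef0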

Provide ingress ALB connection to backoffice DC nodes.

It makes sense to separate user ingress-traffic from ops requests (e.g. to the monitoring system).
A natural separation can be done via load-balancer.
To enable this it has to be possible to connect an ALB to the ASG of the backoffice nodes.

Upgrade nomad to 0.8.0

Released just a few days ago ... we should upgrade.
Changelog: https://github.com/hashicorp/nomad/blob/v0.8.0/CHANGELOG.md

Interesting improvements:

  • core: Servers can now service client HTTP endpoints [GH-3892] ... would solve #9
  • cli: Node status and filesystem related commands do not require direct network access to the Nomad client nodes [GH-3892]
  • ui: All views poll for changes using long-polling via blocking queries [GH-3936]

Script to calculate cidr-blocks for egress_aws NatGW

Why

We want to restrict access of the nomad-masters (leader) to the internet. That's why they are inside a subnet that only has access to AWS services. This restriction is made by allowing only routes to the AWS services specified at: https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

Problem - access to ECR needs a lot of the IP ranges specified at https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

This results in more than 50 route entries for a route-table, while the limit per route-table is 50.
Of course a limit increase can be requested, but due to the potential performance impact it is not recommended to do so.

With #6 we solved the issue by widening the CIDRs to /8. But as a long-term solution we need more restrictive CIDRs (e.g. /16).
To generate these correctly (and merge them) in an optimal way (least number of rules possible) we need a sophisticated script.
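A starting point (not the merging/optimization script itself) could be to extract the relevant prefixes from the machine-readable list AWS publishes; region and service below are just examples:

# Download the official AWS ip-ranges and print the IPv4 prefixes for one region/service
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json | \
  jq -r '.prefixes[] | select(.region == "us-east-1" and .service == "AMAZON") | .ip_prefix'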

Cleanup ami module

Recommended, mainly used and tested is the module ami2.
There are no known use cases that require preserving AMI creation via the module ami.

Hence this issue should be used to remove the unused module.

Cached Consul Node Id

This issue is similar to this one: #25.

The data cached by consul is baked into the snapshot and hence reused when the snapshot gets instantiated a second time.

Docker ports open to 0.0.0.0/0

There is no need to have the docker ports (20000...32000) open to the "world".
At a minimum they can be restricted to the CIDR of the used VPC, or better, the SGs of the nodes can simply be connected to each other accordingly.

Refactoring

cidr_blocks -> source_security_group_id

Fix get_nomad_client_info.sh

Currently it returns only the public-services client node since it works on instance-tags. But these are no longer valid.

get_nomad_client_info.sh
2018-04-14 10:52:29 [INFO] [get_nomad_client_info.sh] aws ec2 describe-instances --region us-east-1 --profile playground --filter Name=tag:Name,Values=COS-NMC-public-ser-coyote Name=instance-state-name,Values=running
ISTANCE_ID INSTANCE_IP INSTANCE_IP (private)
52.91.138.241 i-0263e643f56bf7faa 10.128.50.55

Prometheus Port on Backoffice is Closed

Root-Cause

The backoffice data-center nodes are not allowed to communicate with each other over port 4646.
But this port is needed in order to be able to scrape metrics from them.

Task

  • Allow the backoffice-nodes to access each other over port 4646.

Add https listener for ui-ALB's

Problem

  • Currently the communication over the UI-ALBs to the COS components uses HTTP. Thus no encryption in transit is in place.

Don't use external repositories to get images/ binaries for Nomad

Why

We want to restrict access of the nomad-masters (leader) to the internet. That's why they are inside a subnet that only has access to AWS services. This restriction is made by allowing only routes to the AWS services specified at: https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

Problem - binaries/images from non-ECR sources

The fabio binary is loaded directly from GitHub, but there is no route that allows egress access to GitHub.

Refactor amazon-ecr-credential-helper usage

The configuration is scattered across several places and it is unclear what is necessary and actually used.

The docker config.json is at two places:

  • ~/.docker/config.json
  • /etc/docker/config.json

In addition, a yum package is now available for Amazon Linux 2 LTS.

This will simplify the binary installation.
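A sketch of that installation on Amazon Linux 2 (the exact package name and whether the docker extras topic has to be enabled first are assumptions):

# Assumption: the ECR credential helper is shipped as a yum package on Amazon Linux 2
sudo amazon-linux-extras enable docker
sudo yum install -y amazon-ecr-credential-helper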

Restrict access to nomad-servers

see nomad/servers.tf

 # HACK: Still everything open for the nomad-servers. Has to be closed.
  allowed_inbound_cidr_blocks = ["0.0.0.0/0"]

Unable to build AMI with packer

When installing consul, sudo yum update -y is called.
During this process the OS is not able to load some packages.

amazon-linux-ami2: Updated:
amazon-linux-ami2: amazon-linux-extras.noarch 0:1.4-1.amzn2
amazon-linux-ami2: aws-cfn-bootstrap.noarch 0:1.4-30.amzn2
amazon-linux-ami2: dhclient.x86_64 12:4.2.5-58.amzn2.3.2
amazon-linux-ami2: dhcp-common.x86_64 12:4.2.5-58.amzn2.3.2
amazon-linux-ami2: dhcp-libs.x86_64 12:4.2.5-58.amzn2.3.2
amazon-linux-ami2: dotnet-host.x86_64 0:2.1.0_preview2_26411_07-1
amazon-linux-ami2: ec2-utils.noarch 0:0.5-1.amzn2.0.1
amazon-linux-ami2: kernel-tools.x86_64 0:4.14.33-59.34.amzn2
amazon-linux-ami2: mssql-server.x86_64 0:14.0.3025.34-3
amazon-linux-ami2:
amazon-linux-ami2: Failed:
amazon-linux-ami2: msodbcsql17.x86_64 0:17.0.1.1-1 msodbcsql17.x86_64 0:17.1.0.1-1
amazon-linux-ami2: mssql-tools.x86_64 0:17.0.1.1-1 mssql-tools.x86_64 0:17.1.0.1-1

Establish CI/CD System

Overview

Running a CI/CD system will improve the confidence in changes introduced into the repository, due to the automated execution of verification steps like validity checks, linting, test execution, etc.

Details

Two systems are currently in evaluation:

Nomad is not able to pull from DockerHub

When trying to deploy a docker image from docker-hub nomad responds with the following error message:

failed to initialize task "ping_service_task" for alloc "8f46a473-90de-3e96-71bb-149ad2916453": Failed to find docker auth for repo "thobe/ping_service": docker-credential-ecr-login with input "thobe/ping_service" failed with stderr: credentials not found in native keychain

Example job file:

# job>group>task>service
# container for tasks or task-groups that nomad should run
job "ping_service" {
  datacenters = ["public-services"]
  #,"private-services","content-connector","backoffice"]
  type = "service"

  meta {
    my-key = "example"
  }

  # The group stanza defines a series of tasks that should be co-located on the same Nomad client.
  # Any task within a group will be placed on the same client.
  group "ping_service_group" {
    count = 1

    # restart-policy
    restart {
      attempts = 10
      interval = "5m"
      delay = "25s"
      mode = "delay"
    }

     ephemeral_disk {
      migrate = false
      size    = "50"
      sticky  = false
    }

    # The task stanza creates an individual unit of work, such as a Docker container, web application, or batch processing.
    task "ping_service_task" {
      driver = "docker"
      config {
        # Docker Hub:
        image = "thobe/ping_service:0.0.9"

        port_map = {
          http = 8080
        }
      }

      logs {
        max_files     = 2
        max_file_size = 10
      }

      resources {
        cpu    = 100 # MHz
        memory = 20 # MB
        network {
          mbits = 10
          port "http" {
          }
        }
      }

      # The service stanza instructs Nomad to register the task as a service using the service discovery integration
      service {
        name = "ping-service"
        tags = ["urlprefix-/ping"] # fabio
        port = "http"
        check {
          name     = "Ping-Service Alive State"
          port     = "http"
          type     = "http"
          method   = "GET"
          path     = "/ping"
          interval = "10s"
          timeout  = "2s"
        }
       }

      env {
        SERVICE_NAME       = "${NOMAD_DC}"
        PROVIDER           = "ping-service"
        # uncomment to enable sd over consul
        CONSUL_SERVER_ADDR = "172.17.0.1:8500"
        #PROVIDER_ADDR = "ping-service:25000"
      }
    }
  }
}

Make instance count configurable on DC level

It would be nice if it were possible to adjust the number of nodes per data-center.
For example, the backoffice dc might only need a fraction of the nodes required for the private- or public-services dc.

Access to logs does not work

Problem

Calling nomad logs -stderr -f -job ping_service locally (even with sshuttle active) does not show any logs.
When the command is executed directly on the server, it works.

dnsmasq not used for DNS based service-discovery

Since not all services support consul-based service-discovery, we added dnsmasq to the instances.
As specified at https://www.consul.io/docs/guides/forwarding.html, dnsmasq can be used to forward all queries for the consul domain to 127.0.0.1:8600. There the consul agent is listening, which provides service-discovery based on the service catalog.

But dnsmasq is not configured correctly to intercept these local calls.
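A minimal sketch of the missing piece, following the linked consul guide (the drop-in file name is just an example):

# Forward all queries for the .consul domain to the local consul agent's DNS endpoint
echo 'server=/consul/127.0.0.1#8600' | sudo tee /etc/dnsmasq.d/10-consul
sudo systemctl restart dnsmasq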

Nomad does not start with newest AMI version

Problem

The Amazon Linux 2 AMI built from the current state of master is buggy.
On instances based on this AMI nomad does not start at all.

● nomad.service - Nomad
   Loaded: loaded (/etc/systemd/system/nomad.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mi 2018-09-26 09:25:15 UTC; 2h 9min ago
     Docs: https://nomadproject.io/docs/
  Process: 4462 ExecStart=/opt/nomad/bin/nomad agent -config /opt/nomad/config -data-dir /opt/nomad/data (code=exited, status=1/FAILURE)
 Main PID: 4462 (code=exited, status=1/FAILURE)

Sep 26 09:25:15 ip-10-124-52-121.us-east-2-integration systemd[1]: Started Nomad.
Sep 26 09:25:15 ip-10-124-52-121.us-east-2-integration systemd[1]: Starting Nomad...
Sep 26 09:25:15 ip-10-124-52-121.us-east-2-integration nomad[4462]: No configuration loaded from /opt/nomad/config
Sep 26 09:25:15 ip-10-124-52-121.us-east-2-integration nomad[4462]: ==> Must specify either server, client or dev mode for the agent.
Sep 26 09:25:15 ip-10-124-52-121.us-east-2-integration systemd[1]: nomad.service: main process exited, code=exited, status=1/FAILURE
Sep 26 09:25:15 ip-10-124-52-121.us-east-2-integration systemd[1]: Unit nomad.service entered failed state.
Sep 26 09:25:15 ip-10-124-52-121.us-east-2-integration systemd[1]: nomad.service failed.
