ecs-appmesh-task-helper

What this is

This repo defines a small event-driven Python app that orchestrates the lifecycle stages of an AWS ECS task in an AppMesh environment. The stages we are interested in are startup and shutdown. In an ECS service with a load balancer attached, ECS ensures during a task definition update that the new tasks are registered with the LB and in service before it drains and stops the old tasks. No equivalent native mechanism exists for AppMesh, so this task helper was written to bridge that gap.

Operation

When ECS asks a task in an AppMesh mesh to stop, the task needs to tell its downstreams that it is stopping, so that they stop sending it traffic before it goes away. This prevents the downstreams from receiving 5xx or timeout errors, which could impact service performance. The task helper does this by signalling the Envoy in the task to go into healthcheck fail mode, using the Envoy admin API endpoint that controls this (healthcheck/fail).

The task helper is also designed to delay the termination of the application and Envoy containers in the task, so that the downstream Envoys can complete their upstream health check cycle. When ECS terminates a task, it executes "docker stop" against each container in the reverse order of the dependencies defined in the task, and "docker stop" sends a SIGTERM signal to the process inside the container. Because the task helper DependsOn the Envoy and application containers, it receives its SIGTERM first. It then calls the admin API and pauses for a configurable drain timeout period, which delays the termination of the containers it depends on.

Finally, there is a race condition that often sees retiring tasks stopped before new tasks are in service during a task definition update. To avoid it, the task helper waits for a delay period before calling the admin API, so that new tasks have plenty of time to come into service before the retiring tasks are drained and stopped. This overlap between new and retiring tasks allows updates to be non-disruptive.
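The shutdown sequence described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this repo: the Envoy admin address (127.0.0.1:9901), the function names, and the treatment of DRAIN_DELAY and DRAIN_TIMEOUT as seconds are all assumptions.

```python
import os
import signal
import threading
import time
import urllib.request

# Assumption: the Envoy admin interface is reachable on localhost port 9901.
ENVOY_ADMIN_URL = "http://127.0.0.1:9901"


def read_seconds(name, default, env=os.environ):
    """Read a non-negative duration from the environment (assumed to be seconds)."""
    return max(0.0, float(env.get(name, default)))


def fail_envoy_healthcheck(admin_url=ENVOY_ADMIN_URL):
    """POST to Envoy's healthcheck/fail admin endpoint; return the HTTP status or None."""
    request = urllib.request.Request(
        admin_url + "/healthcheck/fail", data=b"", method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except OSError:
        # Envoy may already be unreachable; keep going so the drain pause still happens.
        return None


def drain(drain_delay, drain_timeout,
          fail_healthcheck=fail_envoy_healthcheck, sleep=time.sleep):
    """Run the three shutdown steps in order: delay, fail health checks, drain."""
    sleep(drain_delay)      # let new tasks come into service first
    fail_healthcheck()      # tell downstream Envoys to stop sending traffic
    sleep(drain_timeout)    # give downstreams time to notice before exiting


def main():
    """Block until SIGTERM arrives, then run the drain sequence and return."""
    stop_requested = threading.Event()
    signal.signal(signal.SIGTERM, lambda signum, frame: stop_requested.set())
    stop_requested.wait()
    drain(read_seconds("DRAIN_DELAY", 40), read_seconds("DRAIN_TIMEOUT", 40))
```

In a container, the entrypoint would call main(). Passing the sleep and healthcheck functions as parameters keeps the ordering easy to test without real delays or a running Envoy.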

How to use this

A container image containing the application can be built by running "make build" in the project folder. The build uses Docker with python:slim and python:alpine source images. Include the resulting image in the container definitions for your ECS task, with DependsOn dependencies on your application and Envoy containers. The task helper container should have a StopTimeout of 120s, so that ECS gives it enough time to finish the delay and drain periods before force-stopping it. Two optional environment variables can be passed to the container to adjust its operation.

  • DRAIN_DELAY: The time period to wait after the SIGTERM but before calling the healthcheck/fail admin endpoint. Defaults to 40
  • DRAIN_TIMEOUT: The time period to wait after calling the healthcheck/fail admin endpoint. Defaults to 40
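As a rough illustration of the configuration described above, the helper's entry in an ECS task definition could be built like this. The container names ("task-helper", "app", "envoy"), the image name, the "START" dependency condition, and the assumption that both variables are in seconds are hypothetical choices for this sketch. The field names (dependsOn, stopTimeout, environment) are those of an ECS container definition.

```python
def task_helper_container(image, depends_on, drain_delay=40,
                          drain_timeout=40, stop_timeout=120):
    """Build a container definition dict for the task helper.

    Raises ValueError if the delay plus the drain period would not fit
    inside the stop timeout (all values assumed to be in seconds).
    """
    if drain_delay + drain_timeout >= stop_timeout:
        raise ValueError(
            "DRAIN_DELAY + DRAIN_TIMEOUT must be shorter than the stop timeout"
        )
    return {
        "name": "task-helper",  # hypothetical container name
        "image": image,
        # The helper depends on the other containers, so it is stopped first.
        "dependsOn": [
            {"containerName": name, "condition": "START"} for name in depends_on
        ],
        "stopTimeout": stop_timeout,
        # ECS expects environment values as strings.
        "environment": [
            {"name": "DRAIN_DELAY", "value": str(drain_delay)},
            {"name": "DRAIN_TIMEOUT", "value": str(drain_timeout)},
        ],
    }


# Hypothetical image and container names, for illustration only.
helper = task_helper_container(
    "example/ecs-appmesh-task-helper:latest", ["app", "envoy"]
)
```

With the defaults, the delay and drain periods add up to 80, which leaves headroom inside the 120s StopTimeout.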

License

This code is open source software licensed under the Apache 2.0 License.

Contributors

sreddel, adam-nx, hmrc-web-operations, craigjbass, webit4me, cjhodges77, fruiti-lewis, neilmillard
