

Buildkite Agent Scaler

An AWS Lambda function that scales an Amazon Auto Scaling Group (ASG) based on metrics from the Buildkite Agent Metrics API.

In practice, we've seen 300% faster initial scale-ups with this lambda compared with native AutoScaling rules. 🚀

Why?

The Elastic Stack depends on being able to scale up quickly from zero instances in response to scheduled Buildkite jobs. Amazon's AutoScaling primitives have several limitations, and we wanted more granular control:

  • The median time for a scaling event to be triggered was 2 minutes, because it needed two samples taken at least 60 seconds apart.
  • Scaling can be by a fixed rate, a fixed step size, or target tracking, but target tracking doesn't work well with custom metrics like the ones we use.

How does it work?

The lambda (or the CLI version) polls the Buildkite Metrics API every 10 seconds and, based on the results, sets the ASG's DesiredCapacity to exactly what is needed. This allows much faster scale-up.
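To make "exactly what is needed" concrete, here is a minimal sketch of how a desired capacity could be derived from the job counts. It assumes each agent runs one job at a time, that both scheduled and running jobs need an agent, and that the result is clamped to the group's min/max size; the `desiredCapacity` function and its parameters are hypothetical and not taken from the scaler's source.

```go
package main

import "fmt"

// desiredCapacity is a hypothetical helper: it assumes each agent runs one
// job at a time and that both scheduled and running jobs need an agent.
// The result is clamped to the ASG's minimum and maximum size.
func desiredCapacity(scheduled, running, agentsPerInstance, minSize, maxSize int64) int64 {
	if agentsPerInstance < 1 {
		agentsPerInstance = 1 // guard against division by zero
	}
	totalJobs := scheduled + running
	// Integer ceiling division: a partially used instance still counts.
	needed := (totalJobs + agentsPerInstance - 1) / agentsPerInstance
	if needed < minSize {
		needed = minSize
	}
	if needed > maxSize {
		needed = maxSize
	}
	return needed
}

func main() {
	// 7 scheduled + 3 running jobs at 4 agents per instance.
	fmt.Println(desiredCapacity(7, 3, 4, 0, 10)) // prints 3
}
```

Because an absolute value is set rather than a fixed step, a sudden burst of jobs can be reflected in a single update instead of several alarm-driven steps.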

Gracefully scaling in

Whilst the lambda does support scaling in by lowering DesiredCapacity, Amazon ASGs do not appear to send Lifecycle Hooks before terminating instances, so jobs in progress are interrupted.

Instead, in the Elastic Stack we run the scaler with scale-in disabled (DISABLE_SCALE_IN) and rely on --disconnect-after-idle-timeout, added to the agent in buildkite-agent v3.10.0, combined with a systemd PostStop script. Once the agent has been idle for the configured time period it disconnects, and the script terminates the instance and atomically decreases the DesiredCapacity. We've found this works really well, and it is less complicated than relying on lifecycled and Lifecycle Hooks.

See the forum post for more details.
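With scale-in disabled, the scaler only ever raises the desired capacity; lowering it is left to the idle instances themselves. A minimal sketch of that decision, assuming a boolean that mirrors DISABLE_SCALE_IN (the `nextDesired` function is hypothetical, not the scaler's actual code):

```go
package main

import "fmt"

// nextDesired is a hypothetical helper that decides which DesiredCapacity
// to set. When disableScaleIn is true (mirroring DISABLE_SCALE_IN), the
// count is never lowered here; idle instances terminate themselves and
// decrement the count instead. The bool reports whether a change is needed.
func nextDesired(current, computed int64, disableScaleIn bool) (int64, bool) {
	if computed > current {
		return computed, true // scale out
	}
	if computed < current && !disableScaleIn {
		return computed, true // scale in (only when allowed)
	}
	return current, false // leave the ASG untouched
}

func main() {
	fmt.Println(nextDesired(5, 2, true)) // prints 5 false
	fmt.Println(nextDesired(5, 8, true)) // prints 8 true
}
```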

Publishing CloudWatch Metrics

The scaler collects its own metrics and doesn't require buildkite-agent-metrics. It can optionally publish the metrics it collects back to CloudWatch, although it supports only a subset of the metrics that the buildkite-agent-metrics binary collects:

  • Buildkite > (Org, Queue) > ScheduledJobsCount
  • Buildkite > (Org, Queue) > RunningJobCount
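The "Buildkite > (Org, Queue) > Metric" notation above reads as namespace, dimensions, and metric name. The sketch below shows that shape using simplified stand-in types; `Dimension`, `Datum`, and `buildDatums` are hypothetical, and a real implementation would build the AWS SDK's PutMetricData input instead of these structs.

```go
package main

import "fmt"

// Dimension and Datum are simplified, hypothetical stand-ins for the
// CloudWatch PutMetricData types, used here to keep the sketch stdlib-only.
type Dimension struct {
	Name, Value string
}

type Datum struct {
	MetricName string
	Dimensions []Dimension
	Value      float64
}

// namespace matches the "Buildkite" prefix in the list above.
const namespace = "Buildkite"

// buildDatums tags each metric with the Org and Queue dimensions.
func buildDatums(org, queue string, scheduled, running int64) []Datum {
	dims := []Dimension{{"Org", org}, {"Queue", queue}}
	return []Datum{
		{MetricName: "ScheduledJobsCount", Dimensions: dims, Value: float64(scheduled)},
		{MetricName: "RunningJobCount", Dimensions: dims, Value: float64(running)},
	}
}

func main() {
	for _, d := range buildDatums("my-org", "default", 7, 3) {
		fmt.Printf("%s > %v > %s = %v\n", namespace, d.Dimensions, d.MetricName, d.Value)
	}
}
```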

Running as an AWS Lambda

An AWS Lambda bundle is created and published as part of the build process. The lambda requires the following IAM permissions:

  • cloudwatch:PutMetricData
  • autoscaling:DescribeAutoScalingGroups
  • autoscaling:SetDesiredCapacity

Its entrypoint is handler. It requires a go1.x runtime and the following environment variables:

  • BUILDKITE_AGENT_TOKEN or BUILDKITE_AGENT_TOKEN_SSM_KEY
  • BUILDKITE_QUEUE
  • AGENTS_PER_INSTANCE
  • ASG_NAME

If BUILDKITE_AGENT_TOKEN_SSM_KEY is set, the token is read from AWS Systems Manager Parameter Store using GetParameter, which can also reference secrets stored in AWS Secrets Manager.
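A minimal sketch of validating these environment variables at startup. The `config` type and `loadConfig` function are hypothetical; taking `getenv` as a parameter is a design choice that keeps the check testable without touching the real environment. The SSM lookup is only indicated, not implemented.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// config is a hypothetical struct holding the variables listed above.
type config struct {
	AgentToken        string
	AgentTokenSSMKey  string
	Queue             string
	AgentsPerInstance int
	ASGName           string
}

// loadConfig reads and validates the variables via the supplied getenv.
func loadConfig(getenv func(string) string) (config, error) {
	c := config{
		AgentToken:       getenv("BUILDKITE_AGENT_TOKEN"),
		AgentTokenSSMKey: getenv("BUILDKITE_AGENT_TOKEN_SSM_KEY"),
		Queue:            getenv("BUILDKITE_QUEUE"),
		ASGName:          getenv("ASG_NAME"),
	}
	if c.AgentToken == "" && c.AgentTokenSSMKey == "" {
		return c, errors.New("one of BUILDKITE_AGENT_TOKEN or BUILDKITE_AGENT_TOKEN_SSM_KEY is required")
	}
	if c.Queue == "" || c.ASGName == "" {
		return c, errors.New("BUILDKITE_QUEUE and ASG_NAME are required")
	}
	n, err := strconv.Atoi(getenv("AGENTS_PER_INSTANCE"))
	if err != nil || n < 1 {
		return c, errors.New("AGENTS_PER_INSTANCE must be a positive integer")
	}
	c.AgentsPerInstance = n
	return c, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	if cfg.AgentTokenSSMKey != "" {
		// Placeholder: the token would be fetched with SSM GetParameter here.
		fmt.Println("token will be read from SSM parameter", cfg.AgentTokenSSMKey)
	}
	fmt.Printf("queue=%s asg=%s agents/instance=%d\n", cfg.Queue, cfg.ASGName, cfg.AgentsPerInstance)
}
```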

aws lambda create-function \
  --function-name buildkite-agent-scaler \
  --memory-size 128 \
  --role arn:aws:iam::account-id:role/execution_role \
  --runtime go1.x \
  --zip-file fileb://handler.zip \
  --handler handler

Running locally for development

$ aws-vault exec my-profile -- go run . \
  --asg-name elastic-runners-AgentAutoScaleGroup-XXXXX \
  --agent-token "$BUILDKITE_AGENT_TOKEN"

Copyright

Copyright (c) 2014-2019 Buildkite Pty Ltd. See LICENSE for details.
