
spot-termination-exporter's Introduction

Spot instance termination exporter

Prometheus exporters expose metrics from third-party systems in the Prometheus format. This exporter, built for Hollowtrees, scrapes the instance metadata for AWS spot instance termination notices.

Spot instance lifecycle

  • User submits a bid to run a desired number of EC2 instances of a particular type. The bid includes the price that the user is willing to pay to use the instance for an hour.
  • If the bid price exceeds the current spot price (that is determined by AWS based on current supply and demand) the instances are started.
  • If the current spot price rises above the bid price or there is no available capacity, the spot instance is interrupted and reclaimed by AWS. Two minutes before the interruption, the internal metadata endpoint on the instance is updated with the termination info.
  • If the instance is interrupted, the action taken by AWS depends on the interruption behaviour (terminate, stop or hibernate) and the request type (one-time or persistent). Both can be configured when requesting the instance. See the AWS documentation on Spot Instance interruptions for details.

Spot instance termination notice

The Termination Notice is accessible to code running on the instance via the instance’s metadata at http://169.254.169.254/latest/meta-data/spot/termination-time. This field becomes available when the instance has been marked for termination and will contain the time when a shutdown signal will be sent to the instance’s operating system. At that time, the Spot Instance Request’s bid status will be set to marked-for-termination.
The bid status is accessible via the DescribeSpotInstanceRequests API for use by programs that manage Spot bids and instances.
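
As an illustration of the paragraph above, the following Go sketch reads the termination notice and converts it to the number of seconds remaining. It is not the exporter's actual code. The function names are hypothetical, and it assumes the field holds an RFC 3339 timestamp (for example 2015-01-05T18:02:00Z) and that the endpoint answers with HTTP 404 while no notice is present.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// secondsUntilTermination parses the timestamp served at spot/termination-time
// (assumed to be RFC 3339) and returns the seconds remaining, measured from
// nowUnix (a Unix timestamp in seconds).
func secondsUntilTermination(body string, nowUnix int64) (float64, error) {
	t, err := time.Parse(time.RFC3339, strings.TrimSpace(body))
	if err != nil {
		return 0, fmt.Errorf("unexpected termination-time %q: %w", body, err)
	}
	return t.Sub(time.Unix(nowUnix, 0)).Seconds(), nil
}

// fetchTerminationTime queries <baseURL>/spot/termination-time.
// The boolean is false when the endpoint returns 404, which is assumed
// here to mean that the instance is not marked for termination.
func fetchTerminationTime(client *http.Client, baseURL string) (string, bool, error) {
	resp, err := client.Get(strings.TrimRight(baseURL, "/") + "/spot/termination-time")
	if err != nil {
		return "", false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNotFound {
		return "", false, nil
	}
	if resp.StatusCode != http.StatusOK {
		return "", false, fmt.Errorf("unexpected status %s", resp.Status)
	}
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", false, err
	}
	return string(b), true, nil
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	body, found, err := fetchTerminationTime(client, "http://169.254.169.254/latest/meta-data/")
	if err != nil {
		fmt.Println("metadata service not reachable:", err)
		return
	}
	if !found {
		fmt.Println("no termination notice")
		return
	}
	secs, err := secondsUntilTermination(body, time.Now().Unix())
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("termination in %.0f seconds\n", secs)
}
```

Keeping the parsing separate from the HTTP call makes the time arithmetic easy to check without access to an EC2 instance.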

Quick start

The project is built with promu, the Prometheus build utility, which must be installed first. To install promu and build the exporter:

go get github.com/prometheus/promu
promu build

The following options can be configured when starting the exporter:

./spot-termination-exporter --help
Usage of ./spot-termination-exporter:
  -bind-addr string
        bind address for the metrics server (default ":9189")
  -log-level string
        log level (default "info")
  -metadata-endpoint string
        metadata endpoint to query (default "http://169.254.169.254/latest/meta-data/")
  -metrics-path string
        path to metrics endpoint (default "/metrics")
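
For readers who want to see how options like these map onto Go's standard flag package, here is a minimal sketch. The flag names and defaults are copied from the help output above. The config type and the parseFlags function are hypothetical and not taken from the exporter's source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the four documented options (hypothetical type).
type config struct {
	BindAddr         string
	LogLevel         string
	MetadataEndpoint string
	MetricsPath      string
}

// parseFlags registers the documented flags with their documented defaults
// and parses args. The flag package accepts both -name and --name.
func parseFlags(name string, args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet(name, flag.ContinueOnError)
	fs.StringVar(&c.BindAddr, "bind-addr", ":9189", "bind address for the metrics server")
	fs.StringVar(&c.LogLevel, "log-level", "info", "log level")
	fs.StringVar(&c.MetadataEndpoint, "metadata-endpoint",
		"http://169.254.169.254/latest/meta-data/", "metadata endpoint to query")
	fs.StringVar(&c.MetricsPath, "metrics-path", "/metrics", "path to metrics endpoint")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseFlags(os.Args[0], os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```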

Test locally

The AWS instance metadata is available at http://169.254.169.254/latest/meta-data/, and this is the endpoint the exporter queries by default. Because it is hard to reproduce a termination notice on a real AWS instance, the metadata endpoint can be changed in the configuration for testing. A test server in the util directory mocks the behavior of the metadata endpoint: it listens on port 9092 and provides dummy responses for /instance-id and /spot/instance-action. It can be started with:

go run util/test_server.go

The exporter can be started with this configuration to query this endpoint locally:

./spot-termination-exporter --metadata-endpoint http://localhost:9092/latest/meta-data/ --log-level debug
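
To give an idea of what such a mock involves, here is a minimal sketch of a handler that serves the two paths mentioned above. It is not the contents of util/test_server.go. The function names are hypothetical, and the JSON shape of the instance-action response (an "action" and a "time" field) follows the AWS documentation rather than this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMockMetadataHandler returns a handler with dummy responses for
// /latest/meta-data/instance-id and /latest/meta-data/spot/instance-action.
func newMockMetadataHandler(instanceID, action, at string) http.Handler {
	const prefix = "/latest/meta-data/"
	mux := http.NewServeMux()
	mux.HandleFunc(prefix+"instance-id", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, instanceID)
	})
	mux.HandleFunc(prefix+"spot/instance-action", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"action": action, "time": at})
	})
	return mux
}

// probe sends an in-memory GET request to the handler and returns the
// status code and body, so the mock can be checked without opening a port.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := newMockMetadataHandler("i-0d2aab13057917887", "stop", "2021-01-01T00:02:00Z")
	code, body := probe(h, "/latest/meta-data/spot/instance-action")
	fmt.Print(code, " ", body)
	// To serve it on the port used above instead:
	//   http.ListenAndServe(":9092", h)
}
```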

Metrics

# HELP aws_instance_metadata_service_available Metadata service available
# TYPE aws_instance_metadata_service_available gauge
aws_instance_metadata_service_available{instance_id="i-0d2aab13057917887"} 1
# HELP aws_instance_termination_imminent Instance is about to be terminated
# TYPE aws_instance_termination_imminent gauge
aws_instance_termination_imminent{instance_action="stop",instance_id="i-0d2aab13057917887"} 1
# HELP aws_instance_termination_in Instance will be terminated in
# TYPE aws_instance_termination_in gauge
aws_instance_termination_in{instance_id="i-0d2aab13057917887"} 119.888545
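
The listing above is in the Prometheus text exposition format. The sketch below shows, using only the standard library, how one gauge sample in that format is put together. The exporter itself presumably relies on the Prometheus Go client library for this, so renderGauge is a hypothetical helper for illustration only. Label escaping is simplified by using Go string quoting.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// renderGauge formats a single gauge sample with its HELP and TYPE lines.
// Labels are emitted in alphabetical order, as in the listing above.
func renderGauge(name, help string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q is a simplification of the format's own escaping rules.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}

	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s gauge\n", name)
	fmt.Fprintf(&b, "%s{%s} %s\n", name, strings.Join(pairs, ","),
		strconv.FormatFloat(value, 'g', -1, 64))
	return b.String()
}

func main() {
	fmt.Print(renderGauge(
		"aws_instance_termination_imminent",
		"Instance is about to be terminated",
		map[string]string{"instance_id": "i-0d2aab13057917887", "instance_action": "stop"},
		1,
	))
}
```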

Default Hollowtrees node exporters associated to alerts:

spot-termination-exporter's People

Contributors

kozmagabor, martonsereg, matyix, mvisonneau, waynz0r

spot-termination-exporter's Issues

Instance scheduled maintenance

Is your feature request related to a problem? Please describe.
I'm looking for a metric that reports EC2 scheduled maintenance events on on-demand instances. I tried this exporter, but it currently only works for spot instances. Supporting on-demand instances would require parsing the http://169.254.169.254/latest/meta-data/events/maintenance/scheduled entry. I do not know whether spot instances may also use this entry for underlying host problems.

Describe the solution you'd like to see
A new metric for aws_scheduled_maintenance_seconds

$ curl http://169.254.169.254/latest/meta-data/events/maintenance/scheduled
[ {
  "NotBefore" : "1 Jan 2021 02:00:00 GMT",
  "Code" : "instance-stop",
  "Description" : "The instance is running on degraded hardware",
  "EventId" : "instance-event-07a1c1c08b77d4bf1",
  "State" : "active"
} ]
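
A possible way to derive the requested metric from this response is sketched below in Go. This is not existing exporter code. The type and function names are hypothetical, the time layout is inferred from the single sample above, and the choice to report only the first event in the "active" state is an assumption about how the metric might be defined.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// scheduledEvent mirrors the fields in the sample response above.
type scheduledEvent struct {
	NotBefore   string `json:"NotBefore"`
	Code        string `json:"Code"`
	Description string `json:"Description"`
	EventID     string `json:"EventId"`
	State       string `json:"State"`
}

// eventTimeLayout matches timestamps such as "1 Jan 2021 02:00:00 GMT".
const eventTimeLayout = "2 Jan 2006 15:04:05 MST"

// secondsUntilMaintenance returns the seconds between nowUnix and the
// NotBefore time of the first active event. The boolean is false when
// the response contains no active event.
func secondsUntilMaintenance(body string, nowUnix int64) (float64, bool, error) {
	var events []scheduledEvent
	if err := json.Unmarshal([]byte(body), &events); err != nil {
		return 0, false, err
	}
	for _, e := range events {
		if e.State != "active" {
			continue
		}
		t, err := time.ParseInLocation(eventTimeLayout, e.NotBefore, time.UTC)
		if err != nil {
			return 0, false, fmt.Errorf("event %s: %w", e.EventID, err)
		}
		return t.Sub(time.Unix(nowUnix, 0)).Seconds(), true, nil
	}
	return 0, false, nil
}

func main() {
	sample := `[ {
	  "NotBefore" : "1 Jan 2021 02:00:00 GMT",
	  "Code" : "instance-stop",
	  "Description" : "The instance is running on degraded hardware",
	  "EventId" : "instance-event-07a1c1c08b77d4bf1",
	  "State" : "active"
	} ]`
	secs, found, err := secondsUntilMaintenance(sample, time.Now().Unix())
	fmt.Println(secs, found, err)
}
```

The value would become negative once the NotBefore time has passed. Whether to clamp it at zero is left open here.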

Describe alternatives you've considered
Right now we manage this via the email notifications, which are far from an ideal way to find out which instances may have problems, what the deadline for releasing the host is, and whether the problem has already been solved.

Additional context
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/monitoring-instances-status-check_sched.html#viewing_scheduled_events

Optionally expose new metrics for AWS Spot Rebalance Recommendations

Is your feature request related to a problem? Please describe.

AWS now provides a new metadata endpoint that can give early warning of a likely spot interruption on an instance. Metrics on these rebalance recommendations, along with the time they were generated, are likely to be useful for cluster operators.

Describe the solution you'd like to see

The spot termination exporter could expose (possibly as an opt-in) a number of new metrics:

aws_instance_metadata_service_events_available     Metadata service events endpoint available
aws_instance_rebalance_recommended                 Instance rebalance is recommended
aws_instance_rebalance_recommended_at              Unix epoch rebalance recommendation was exposed at
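
As a rough sketch of how the last metric could be computed: according to the AWS documentation, the rebalance recommendation is served at /latest/meta-data/events/recommendations/rebalance as a JSON object with a noticeTime field. The issue itself does not spell out the path or the payload, so treat both as assumptions. The function name below is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// rebalanceNotice models the assumed payload, e.g.
// {"noticeTime": "2020-10-27T08:22:00Z"}.
type rebalanceNotice struct {
	NoticeTime time.Time `json:"noticeTime"`
}

// rebalanceRecommendedAt returns the Unix epoch (in seconds) at which the
// recommendation was issued, i.e. a candidate value for
// aws_instance_rebalance_recommended_at.
func rebalanceRecommendedAt(body string) (int64, error) {
	var n rebalanceNotice
	if err := json.Unmarshal([]byte(body), &n); err != nil {
		return 0, err
	}
	if n.NoticeTime.IsZero() {
		return 0, errors.New("response contains no noticeTime")
	}
	return n.NoticeTime.Unix(), nil
}

func main() {
	at, err := rebalanceRecommendedAt(`{"noticeTime": "2020-10-27T08:22:00Z"}`)
	fmt.Println(at, err)
}
```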

Describe alternatives you've considered

A completely separate component that scrapes the relevant metadata endpoint (it differs from the spot termination endpoint that is already scraped). However, this would mean running another daemonset alongside the existing one.

Additional context

I've already done most of the work for this scraping and metrics exposition on an internal fork of the project, and I'm happy to raise a PR to add this functionality to the wider project.

Is this still maintained?

Describe the bug

It doesn't support IMDSv2.
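
For context, IMDSv2 requires a session token: the client first sends a PUT to /latest/api/token with an X-aws-ec2-metadata-token-ttl-seconds header, then passes the returned token in an X-aws-ec2-metadata-token header on every metadata request. The Go sketch below illustrates that flow. It is not a patch for the exporter: imdsv2Get, fakeIMDS and demo are hypothetical names, and the fake server only emulates enough of the service to exercise the code locally.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

const (
	tokenTTLHeader = "X-aws-ec2-metadata-token-ttl-seconds"
	tokenHeader    = "X-aws-ec2-metadata-token"
)

// do executes the request and returns the trimmed body on HTTP 200.
func do(client *http.Client, req *http.Request) (string, error) {
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("%s %s: %s", req.Method, req.URL.Path, resp.Status)
	}
	return strings.TrimSpace(string(b)), nil
}

// imdsv2Get fetches a session token, then reads path with that token.
func imdsv2Get(client *http.Client, host, path string) (string, error) {
	req, err := http.NewRequest(http.MethodPut, host+"/latest/api/token", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set(tokenTTLHeader, "21600") // token lifetime in seconds
	token, err := do(client, req)
	if err != nil {
		return "", fmt.Errorf("requesting token: %w", err)
	}
	req, err = http.NewRequest(http.MethodGet, host+path, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set(tokenHeader, token)
	return do(client, req)
}

// fakeIMDS is a local stand-in that rejects requests without a token.
func fakeIMDS() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/latest/api/token", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPut || r.Header.Get(tokenTTLHeader) == "" {
			http.Error(w, "bad token request", http.StatusBadRequest)
			return
		}
		fmt.Fprint(w, "fake-token")
	})
	mux.HandleFunc("/latest/meta-data/instance-id", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get(tokenHeader) != "fake-token" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, "i-0d2aab13057917887")
	})
	return httptest.NewServer(mux)
}

// demo runs the token flow against the fake server.
func demo() (string, error) {
	srv := fakeIMDS()
	defer srv.Close()
	client := &http.Client{Timeout: 2 * time.Second}
	return imdsv2Get(client, srv.URL, "/latest/meta-data/instance-id")
}

func main() {
	fmt.Println(demo())
}
```

On a real instance the host would be http://169.254.169.254, and a long-running process would cache the token and refresh it before the TTL expires instead of requesting one per call.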

formatting issue and go 1.11 behavior

Hello!

I ran into several issues while installing your exporter.

  1. It seems the Go version entry for promu needs a different format:
    go: version: 1.11.4

Example: https://github.com/prometheus/promu/blob/master/.promu.yml#L4

  2. Running go get github.com/prometheus/promu && promu build fails with: -bash: promu: command not found.

Here are my instructions for installing it:

  • Install Go.
  • Clone the github.com/prometheus/promu repo.
  • Install cmake and gcc. For example, on CentOS you can run: sudo yum install cmake gcc -y
  • cd promu && make
  • sudo mv promu /usr/local/bin
  • Clone the github.com/banzaicloud/spot-termination-exporter repo.
  • Change .promu.yml as described in point 1.

Environment: CentOS-7

If you grant me access, I can send an MR with those changes.

Thanks!

Missing support for annotation + nodeselector issue

The current daemonset.yaml does not support adding annotations.

I had to update the file as follows:
{{- with .Values.Annotations }}
annotations:
{{ toYaml . | nindent 8 }}
{{- end }}

Another bug I found: when a node selector is used, useHostNetwork fails. I fixed it with:
{{ toYaml .Values.resources | indent 10 }}
hostNetwork: {{ .Values.useHostNetwork }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{ toYaml . | indent 8 }}
{{- end }}
