Prometheus exporter for AWS CloudWatch - Discovers services through AWS tags, gets CloudWatch metrics data and provides them as Prometheus metrics with AWS tags as labels

License: Apache License 2.0

Go 92.44% Dockerfile 0.11% Makefile 0.08% Jsonnet 7.36%

cloudwatch cloudwatch-metrics prometheus prometheus-exporter

yet-another-cloudwatch-exporter's Issues

many-to-many error

Sometimes, after the exporter restarts, I get this error:
many-to-many matching not allowed: matching labels must be unique on one side
I guess this happens because one of the metrics in the query is missing at that moment. Do you have any ideas on how to handle this?
Here is the query I use:
aws_ec2_cpuutilization_average + on (name) group_left(tag_Product, tag_Name) aws_ec2_info
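
A possible way to narrow this down, sketched in Go: with group_left, the right-hand side (aws_ec2_info here) has to be unique per value of the join label, so the error would appear if two info series share the same name. The helper duplicateKeys and the sample label sets are hypothetical and only illustrate the check; they are not part of yace or Prometheus.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateKeys returns the values of the join label that occur on more
// than one series. Any value returned here would make the "one" side of
// a group_left join ambiguous.
func duplicateKeys(series []map[string]string, label string) []string {
	counts := map[string]int{}
	for _, s := range series {
		counts[s[label]]++
	}
	var dups []string
	for k, n := range counts {
		if n > 1 {
			dups = append(dups, k)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	// Made-up info series: "i-a" appears twice with different tag values.
	info := []map[string]string{
		{"name": "i-a", "tag_Product": "old"},
		{"name": "i-a", "tag_Product": "new"},
		{"name": "i-b", "tag_Product": "new"},
	}
	fmt.Println(duplicateKeys(info, "name"))
}
```

If the list is non-empty, the join cannot be resolved for those names until only one info series per name remains.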

Returned metrics are inconsistent and average crashes the service

I have the following config:

discovery:
  - region: us-west-2
    type: 'ec2'
    searchTags:
      - Key: 'aws:elasticmapreduce:instance-group-role'
        Value: 'MASTER'
    metrics:
      - name: CPUUtilization
        statistics:
        - 'Average'
        period: 300
        length: 60
      - name: StatusCheckFailed
        statistics:
        - 'Maximum'
        period: 300
        length: 60

My goal is to return all my EMR master nodes and get their CPU usage, as well as whether they have had a status check failure.

Whenever I do this a couple of unexpected things happen:

YACE crashes with the following message

2018/10/22 17:27:38 Parse config..
2018/10/22 17:27:38 Startup completed
2018/10/22 17:27:41 Not implemented statisticsAverage

When I change the CPUUtilization statistic to Maximum it doesn't crash, but sometimes it returns only one or two instances and sometimes none (instead of the 11 instances I expect).

I've tested with both 0.6.1 and 0.7.0-alpha

Add availability zone for ec2 instances metrics

Hello. Is it possible to add the availability zone as a label for ec2 instance metrics?

Custom labels for static metrics

Hey,

Is it somehow possible to add custom labels when using static metrics without changing the Go code?

Thanks,
Jonas

Migrating to aws sdk interface calls

This should help to add tests and refactor smaller parts of code afterwards.

YACE manages to get metrics but crashes at the end

Hi,

I've got a weird crash. Please have a look and comment on whether this is a bug or a configuration problem:

./yet-another-cloudwatch-exporter -debug
2019/05/22 16:31:30 Parse config..
2019/05/22 16:31:30 Startup completed
2019/05/22 16:31:35 CLI helper - aws cloudwatch get-metric-statistics --metric-name SuccessfulCalls --dimensions --namespace AWS/EC2/API --statistics Sum --period 60 --start-time 2019-05-22T16:26:35Z --end-time 2019-05-22T16:31:35Z
2019/05/22 16:31:35 {
EndTime: 2019-05-22 16:31:35.95210587 +0000 UTC m=+5.863628358,
MetricName: "SuccessfulCalls",
Namespace: "AWS/EC2/API",
Period: 60,
StartTime: 2019-05-22 16:26:35.952108304 +0000 UTC m=-294.136369495,
Statistics: ["Sum"]
}
2019/05/22 16:31:35 {
EndTime: 2019-05-22 16:31:35.95210587 +0000 UTC m=+5.863628358,
MetricName: "SuccessfulCalls",
Namespace: "AWS/EC2/API",
Period: 60,
StartTime: 2019-05-22 16:26:35.952108304 +0000 UTC m=-294.136369495,
Statistics: ["Sum"]
}
2019/05/22 16:31:36 {
Datapoints: [
{
Sum: 227,
Timestamp: 2019-05-22 16:30:00 +0000 UTC,
Unit: "None"
},
{
Sum: 146,
Timestamp: 2019-05-22 16:26:00 +0000 UTC,
Unit: "None"
},
{
Sum: 128,
Timestamp: 2019-05-22 16:27:00 +0000 UTC,
Unit: "None"
},
{
Sum: 173,
Timestamp: 2019-05-22 16:28:00 +0000 UTC,
Unit: "None"
},
{
Sum: 201,
Timestamp: 2019-05-22 16:29:00 +0000 UTC,
Unit: "None"
}
],
Label: "SuccessfulCalls"
}
2019/05/22 16:31:36 http: panic serving 10.96.98.133:59700: descriptor Desc{fqName: "aws_ec2/api_successful_calls_sum", help: "Help is not implemented yet.", constLabels: {}, variableLabels: []} is invalid: "aws_ec2/api_successful_calls_sum" is not a valid metric name
goroutine 4 [running]:
net/http.(*conn).serve.func1(0xc0001dc140)
/usr/lib/golang/src/net/http/server.go:1746 +0xd0
panic(0x95b9c0, 0xc00044f470)
/usr/lib/golang/src/runtime/panic.go:513 +0x1b9
github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0xc00036c0a0, 0xc0000dbcc0, 0x1, 0x1)
/root/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:391 +0xad
main.metricsHandler(0xae4f80, 0xc000212000, 0xc0001d4400)
/root/yet-another-cloudwatch-exporter/main.go:50 +0x2e1
net/http.HandlerFunc.ServeHTTP(0xa1a7b8, 0xae4f80, 0xc000212000, 0xc0001d4400)
/usr/lib/golang/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0xe895c0, 0xae4f80, 0xc000212000, 0xc0001d4400)
/usr/lib/golang/src/net/http/server.go:2361 +0x127
net/http.serverHandler.ServeHTTP(0xc0001c8680, 0xae4f80, 0xc000212000, 0xc0001d4400)
/usr/lib/golang/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc0001dc140, 0xae5800, 0xc000066700)
/usr/lib/golang/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
/usr/lib/golang/src/net/http/server.go:2851 +0x2f5

As you can see, I get datapoints, but unfortunately yace crashes without moving forward. Please advise.

My config:

static:
  - namespace: 'AWS/EC2/API'
    region: redacted
    metrics:
      - name: SuccessfulCalls
        statistics:
        - 'Sum'
        period: 60
        length: 300

Thanks,
Jonas
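
The panic above comes from the "/" that the namespace AWS/EC2/API carries into the metric name; Prometheus metric names may only contain letters, digits, underscores and colons, and must not start with a digit. A minimal sketch of a sanitizing step, assuming it would run on the name before the descriptor is registered (sanitizeMetricName is a hypothetical helper, not yace's actual code):

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidChars matches every character that is not allowed in a
// Prometheus metric name.
var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9_:]`)

// sanitizeMetricName replaces invalid characters with underscores and
// prefixes an underscore if the name would start with a digit.
func sanitizeMetricName(name string) string {
	name = invalidChars.ReplaceAllString(name, "_")
	if name != "" && name[0] >= '0' && name[0] <= '9' {
		name = "_" + name
	}
	return name
}

func main() {
	fmt.Println(sanitizeMetricName("aws_ec2/api_successful_calls_sum"))
}
```

With the name from the log this yields aws_ec2_api_successful_calls_sum, which the client library should accept.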

alb type discovers Network load balancers too but does not collect metrics

I have this simple configuration.

discovery:
  exportedTagsOnMetrics:
    alb:
      - Name
  jobs:
  - type: "alb"
    region: us-west-2
    metrics:
      - name: UnHealthyHostCount 
        statistics: [Maximum]
        period: 60
        length: 600

It works great and discovers the 12 load balancers that are running. Three of these are Network Load Balancers. When it comes to metrics, it only exposes metrics for the Application Load Balancers.

Is there a way to collect metrics for NLBs, or am I missing something here?

Improve documentation for static metrics by some examples e.g. for s3

Hello, the project itself is very interesting, but for some reason I cannot get any metrics at all. I started with this simple config:

static:
  - namespace: AWS/S3
    region: us-east-1
    metrics:
      - name: BucketSizeBytes
        statistics:
          - Average
        period: 86400
        length: 172800
        disableTimestamp: true

but I do not get any metrics. The user policy and permissions are fine and there is nothing in the logs. When I enable debug I get this:

2019/02/13 08:24:20 {
  EndTime: 2019-02-13 08:24:20.022910123 +0000 UTC m=+23.127186598,
  MetricName: "BucketSizeBytes",
  Namespace: "AWS/S3",
  Period: 86400,
  StartTime: 2019-02-11 08:24:20.022910469 +0000 UTC m=-172776.872813072,
  Statistics: ["Average"]
}
2019/02/13 08:24:20 {
  EndTime: 2019-02-13 08:24:20.022910123 +0000 UTC m=+23.127186598,
  MetricName: "BucketSizeBytes",
  Namespace: "AWS/S3",
  Period: 86400,
  StartTime: 2019-02-11 08:24:20.022910469 +0000 UTC m=-172776.872813072,
  Statistics: ["Average"]
}

There are a lot of buckets in this region. I tried changing the region but had no luck. Can you point out what I am doing wrong?

Gap in the prometheus time series

As already pointed out in the documentation (https://github.com/ivx/yet-another-cloudwatch-exporter#help-my-metrics-are-intermittent), the length setting can cause intermittent metrics.

My thinking is the following:

Why would I choose a length of 600 when I can set it to 1800? With 1800, the yace scraper looks back 1800s (= 30m) and takes the latest data point (as far as I have understood).

Isn't that always better than length: 600, which only looks back 10m and delivers a missing value if nothing is found (most likely because the CloudWatch API lags)?

The only drawback I can think of is that yace or Prometheus has to process three times the amount of data, but that is negligible in most cases.

no token found on elb sum metrics

Hey! Thanks for the great tool :)

Prometheus (we run 2.3.2) has some issues with scraping elb metrics with _sum suffix. My config looks like this:

  config.yml: |-
    discovery:
      exportedTagsOnMetrics:
        elb:
        - kubernetes.io/service-name
        - kubernetes.io/cluster/CLUSTER_NAME
      jobs:
      - type: "elb"
        region: eu-west-1
        searchTags:
          - Key: kubernetes.io/cluster/CLUSTER_NAME
            Value: owned
        metrics:
          - name: HealthyHostCount
            statistics:
            - 'Average'
            period: 60
            length: 300
          - name: UnHealthyHostCount
            statistics:
            - 'Average'
            period: 60
            length: 300     
          - name: HTTPCode_Backend_3XX
            statistics:
            - 'Sum'
            period: 60
            length: 900
            delay: 300
            nilToZero: true                   
          - name: HTTPCode_Backend_2XX
            statistics:
            - 'Sum'
            period: 60
            length: 900
            delay: 300
            nilToZero: true
          - name: HTTPCode_Backend_4XX
            statistics:
            - 'Sum'
            period: 60
            length: 900
            delay: 300
            nilToZero: true
          - name: HTTPCode_Backend_5XX
            statistics:
            - 'Sum'
            period: 60
            length: 900
            delay: 300
            nilToZero: true
          - name: HTTPCode_ELB_5XX
            statistics:
            - 'Sum'
            period: 60
            length: 900
            delay: 300
            nilToZero: true
          - name: RequestCount
            statistics:
            - 'Sum'
            period: 60
            length: 900
            delay: 300
            nilToZero: true
          - name: BackendConnectionErrors
            statistics:
            - 'Sum'
            period: 60
            length: 900
            delay: 300
            nilToZero: true
          - name: EstimatedProcessedBytes
            statistics:
            - 'Sum'
            period: 60
            length: 900
            delay: 300
            nilToZero: true
          - name: Latency
            statistics:
            - 'Maximum'
            - 'Average'
            - 'Minimum'
            period: 60
            length: 900
            delay: 300
            nilToZero: true

You can check the resulting metrics with:

cat metrics.txt | promtool check-metrics

which reveals this issue:

aws_elb_backend_connection_errors_sum: non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_estimated_processed_bytes_sum: non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_httpcode_backend_2_xx_sum: non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_httpcode_backend_3_xx_sum: non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_httpcode_backend_4_xx_sum: non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_httpcode_backend_5_xx_sum: non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_httpcode_elb_5_xx_sum: non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_request_count_sum: non-histogram and non-summary metrics should not have "_sum" suffix

Can we append some static suffix via config, or always append something to _sum*?

Evaluation of new cloudwatch search expression

I am not 100% sure whether this is already in the API, but it could make getting data easier.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/search-expression-examples.html

Log error when credentials can't be used

I made a mistake configuring the IAM permissions for yace, yet it ran without logging anything and did nothing.
Instead, it should log an error if the credentials aren't set or can't be retrieved from the metadata service.

Tags on individual metrics

Could yace be modified to add tags as labels on the individual metrics instead of (or as well as) the aws_<service>_info meta-metric?

For yace, graphing e.g. the request rate for all ELBs in a service would currently look like:

rate(aws_elb_requestcount_sum[5m]) + on (name) group_left(tag_kubernetes_io_service_name) aws_elb_info{tag_service="something"}

This is quite a long query to do filtering on a tag, and I expect that once there's a large amount of cloudwatch metrics ingested into Prometheus, this will be much slower than a query like rate(aws_elb_requestcount_sum{tag_service="something"}[5m]), since Prometheus will calculate the 5-minute rate for every ELB in its database before filtering down to a specific service.

(We had previously been using a machine_info metric similar to the aws_<service>_info metric for a while, but noticed there were performance problems when trying to graph metrics filtering by tags - e.g. graphing rate(node_vmstat_pgpgin{service="something",otherlabel="somethingelse"}[5m]) is an order of magnitude or 2 faster than graphing rate(node_vmstat_pgpgin[5m]) * on (instance) machine_info{service="something",otherlabel="somethingelse"}. It seems like when Prometheus is asked to join 2 metrics and only filter one of them, it isn't smart enough to optimise that join and it ends up reading data for every metric that exists for the unfiltered side.)

I could solve this with a recording rule in Prometheus for each metric that yace returns that joins the metric to the aws_<service>_info metric, but it would be cleaner if the metrics started out with the labels.

Is this a change you'd be interested in?

Improve documentation regarding AWS ENVS

Does yace support IAM user authentication by passing AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables?

Feature request: Support for aws custom metrics

We want to export custom CloudWatch metrics into Prometheus through yace.

Custom metrics have two properties that make this difficult:

Their dimension values are undetermined and do not come from a resource ARN, so they cannot be exported through the "discovery" block.
There are too many possible dimension values, so it is hard to register them manually in the "static" block.

So, as far as I understand, yace needs a new mechanism to export custom metrics.

Fix ci issues raised by new version handling

Currently release process does not work.

This blocks release of 0.4.0-alpha version to test.

New Discovery Type for yace

We forked yace for an internal project. Because our use case requires more discovery types, we implemented the type sqs. This raises the following questions:

Would it be appropriate to contribute our work to this project?
Is it correct that only services supported by "AWS Resource Group Tagging" (https://docs.aws.amazon.com/resourcegroupstagging/latest/APIReference/Welcome.html) can be added to the list of supported services?

Thanks a lot

Use GetMetricData instead of GetMetricStatistics?

Currently yace with a relatively simple configuration makes 300 requests to cloudwatch for my 20 loadbalancers, retrieving each metric for each loadbalancer individually.

Since AWS supports bulk metric retrieval with its GetMetricData API, could this be used instead of GetMetricStatistics? It would reduce the number of API calls from 300 down to about 3 (there is a limit of 100 metrics per request)

My simple configuration:

discovery:
  - type: elb
    region: us-west-2
    searchTags:
      - Key: Service
        Value: abc
    metrics:
      - name: BackendConnectionErrors
        statistics: [Sum]
        period: 60
        length: 600
      - name: HealthyHostCount
        statistics: [Average]
        period: 60
        length: 600
      - name: HTTPCode_Backend_2XX
        statistics: [Sum]
        period: 60
        length: 600
      - name: HTTPCode_Backend_3XX
        statistics: [Sum]
        period: 60
        length: 600
      - name: HTTPCode_Backend_4XX
        statistics: [Sum]
        period: 60
        length: 600
      - name: HTTPCode_Backend_5XX
        statistics: [Sum]
        period: 60
        length: 600
      - name: HTTPCode_ELB_4XX
        statistics: [Sum]
        period: 60
        length: 600
      - name: HTTPCode_ELB_5XX
        statistics: [Sum]
        period: 60
        length: 600
      - name: Latency
        statistics: [Average]
        period: 60
        length: 600
      - name: RequestCount
        statistics: [Sum]
        period: 60
        length: 600
      - name: SpilloverCount
        statistics: [Sum]
        period: 60
        length: 600
      - name: SurgeQueueLength
        statistics: [Maximum, Sum]
        period: 60
        length: 600
      - name: UnHealthyHostCount
        statistics: [Average]
        period: 60
        length: 600
  - type: alb
    region: us-west-2
    searchTags:
      - Key: Service
        Value: abc
    metrics:
      - name: ActiveConnectionCount
        statistics: [Sum]
        period: 60
        length: 600
      - name: UnHealthyHostCount
        statistics: [Average]
        period: 60
        length: 600
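
The request-count arithmetic from this issue can be sketched in Go. The limit of 100 metrics per request is the figure quoted above; batch is a hypothetical helper and no AWS SDK call is made here.

```go
package main

import "fmt"

// batch splits queries into groups of at most size elements, in order.
// It returns nil when size is not positive or there are no queries.
func batch(queries []string, size int) [][]string {
	var out [][]string
	for size > 0 && len(queries) > 0 {
		n := size
		if len(queries) < n {
			n = len(queries)
		}
		out = append(out, queries[:n])
		queries = queries[n:]
	}
	return out
}

func main() {
	// 300 placeholder queries, one per metric and load balancer.
	queries := make([]string, 300)
	for i := range queries {
		queries[i] = fmt.Sprintf("m%d", i)
	}
	fmt.Println(len(batch(queries, 100)))
}
```

Each batch would then go into one bulk request, so the 300 individual calls collapse into 3.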

Feature request: Add type ecs

We would really appreciate this type being added. We run many ECS/Fargate services and it would be amazing to auto-discover them.

Not getting BucketSizeBytes from dynamic S3 metrics (but getting NumberOfObjects OK)

Config file:

    discovery:
      exportedTagsOnMetrics:
        ec2:
          - Name
        elb:
          - Name
        es:
          - Name
      jobs:
      - region: us-east-2
        type: "ec2"
        searchTags:
          - Key: environment
            Value: ^ci$
        metrics:
          - name: CPUUtilization
            statistics:
            - 'Average'
            period: 60
            length: 600
          - name: DiskReadOps
            statistics:
            - 'Average'
            period: 60
            length: 600
          - name: DiskWriteOps
            statistics:
            - 'Average'
            period: 60
            length: 600
          - name: StatusCheckFailed
            statistics:
            - 'Maximum'
            period: 60
            length: 600
      - region: us-east-2
        type: "elb"
        searchTags:
          - Key: KubernetesCluster
            Value: devtest.k8s.local
        metrics:
          - name: BackendConnectionErrors
            statistics:
            - Sum
            period: 60
            length: 600
          - name: HealthyHostCount
            statistics:
            - Average
            period: 60
            length: 600
          - name: HTTPCode_Backend_5XX
            statistics:
            - Sum
            period: 60
            length: 600
          - name: HTTPCode_ELB_5XX
            statistics:
            - Sum
            period: 60
            length: 600
          - name: Latency
            statistics:
            - Average
            - Maximum
            period: 60
            length: 600
          - name: RequestCount
            statistics:
            - Sum
            period: 60
            length: 600
          - name: SpilloverCount
            statistics:
            - Sum
            period: 60
            length: 600
          - name: SurgeQueueLength
            statistics:
            - Maximum
            - Sum
            period: 60
            length: 600
      - region: us-east-2
        type: "es"
        searchTags:
          - Key: Name
            Value: ci-logs
        metrics:
          - name: CPUUtilization
            statistics:
            - Average
            period: 60
            length: 600
          - name: FreeStorageSpace
            statistics:
            - Minimum
            period: 60
            length: 600
          - name: ClusterIndexWritesBlocked
            statistics:
            - Maximum
            period: 60
            length: 600
          - name: JVMMemoryPressure
            statistics:
            - Maximum
            period: 60
            length: 600
      - region: us-east-2
        type: "s3"
        searchTags:
          - Key: Environment
            Value: ci
        metrics:
          - name: BucketSizeBytes
            statistics:
            - Average
            period: 86400
            length: 172800
            disableTimestamp: true
          - name: NumberOfObjects
            statistics:
            - Average
            period: 86400
            length: 172800
            disableTimestamp: true

I am getting this in the metrics (as well as all the other ones requested):

# HELP aws_s3_info Help is not implemented yet.
# TYPE aws_s3_info gauge
aws_s3_info{name="arn:aws:s3:::<redacted 1>",tag_Environment="ci"} 0
aws_s3_info{name="arn:aws:s3:::<redacted 2>",tag_Environment="ci"} 0
# HELP aws_s3_number_of_objects_average Help is not implemented yet.
# TYPE aws_s3_number_of_objects_average gauge
aws_s3_number_of_objects_average{name="arn:aws:s3:::<redacted 1>"} 10468
aws_s3_number_of_objects_average{name="arn:aws:s3:::<redacted 2>"} 411004

The logs look clean:

2019/02/26 18:44:25 Parse config..
2019/02/26 18:44:25 Startup completed

Set timestamps for the exported metrics

We plan to add support to expose the exact timestamps for the exported metrics.

For this to work we will:

upgrade the go client library to at least 0.9.0 (they added support for timestamped metrics)
add a configuration option to set the timestamps (defaults to true)

Is this something you would be happy with?
Should we open separate PRs to upgrade the go client and add the timestamp feature?
Do we need the config option or would it be fine to always expose the timestamps?

Helm chart

This looks epic!

Could we get a helm chart to make installation easy?

Implement dynamodb

Released in 0.13.0-alpha @prakharjoshi

Only some Application ELB Metrics collected

Hey,

We've been using the tool to get our classic ELB CloudWatch metrics into Prometheus. Works great thanks a lot!! 👏
As our architecture evolves we now have application ELBs as well whose metrics we want to track.

In order to collect those I noticed an additional IAM permission had to be granted (could adjust that in the Readme via PR if this is intended, just let me know)
Unfortunately this still did not quite fix it for us. Only a few of the desired metrics are exported.

In the AWS CloudWatch dashboard I do see values for all the configured metrics. I added the ConfigMap below. Is there anything we are doing wrong?

Best regards and keep up the good work!
Johannes

apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudwatch-exporter-config
data:
  config.yml: |-
    discovery:
      exportedTagsOnMetrics:
        elb:
        - kubernetes.io/service-name
        - kubernetes.io/cluster/my-cluster
        alb:
        - kubernetes.io/service-name
        - kubernetes.io/cluster/my-cluster
      jobs:
      - type: "elb" 
        region: eu-west-1
        searchTags:
          - Key: kubernetes.io/cluster/my-cluster
            Value: owned
        metrics:
          - name: RequestCount
            statistics:
            - 'Sum'
            period: 60
            length: 300
          - name: BackendConnectionErrors
            statistics:
            - 'Sum'
            period: 60
            length: 300
      - type: "alb" 
        region: eu-west-1
        searchTags:
          - Key: kubernetes.io/cluster/my-cluster
            Value: owned
        metrics:
          - name: ActiveConnectionCount
            statistics:
            - 'Sum'
            period: 60
            length: 600
          - name: HTTPCode_ELB_5XX_Count
            statistics:
            - 'Sum'
            period: 60
            length: 300
          - name: HTTPCode_ELB_4XX_Count
            statistics:
            - 'Sum'
            period: 60
            length: 300
          - name: HTTPCode_ELB_3XX_Count
            statistics:
            - 'Sum'
            period: 60
            length: 300
          - name: RejectedConnectionCount
            statistics:
            - 'Sum'
            period: 60
            length: 300
          - name: TargetResponseTime
            statistics:
            - 'Sum'
            period: 60
            length: 300
          - name: RequestCount
            statistics:
            - 'Sum'
            period: 60
            length: 300
          - name: ClientTLSNegotiationErrorCount
            statistics:
            - 'Sum'
            period: 60
            length: 300

Turn off timestamp as default to remove ux bug

With yace version 0.12 and Prometheus 2.7.2 I get the scrape error expected timestamp or new record, got "MNAME". Yace runs without any message or warning and everything looks fine, but Prometheus fails to scrape and marks the target as down.

With yace version 0.11 and Prometheus 2.7.2 everything is fine.

Nothing changed in our yace-config.yml between yace version 0.11 and yace version 0.12.

Note: promtool check metrics reports some metric name warnings:

curl http://172.31.68.219:5000/metrics | /opt/prometheus/prometheus-2.7.2.linux-amd64/promtool check metrics
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3261 100 3261 0 0 74113 0 --:--:-- --:--:-- --:--:-- 74113
aws_elb_backend_connection_errors_sum non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_httpcode_backend_2_xx_sum non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_httpcode_backend_4_xx_sum non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_httpcode_backend_5_xx_sum non-histogram and non-summary metrics should not have "_sum" suffix
aws_elb_request_count_sum non-histogram and non-summary metrics should not have "_sum" suffix

Add Dimensions as Labels

Hey, I’m looking at switching from cloudwatch_exporter to yace to be able to get tags into our metrics (thanks btw!). I’m just comparing the differences between them and one potential issue is that yace doesn’t add dimensions as labels like cloudwatch_exporter.

Labels for EC2 CPUCreditBalance:

cloudwatch_exporter

{job="aws_ec2",instance="",instance_id="some-instance-id",}

yace

{name="arn:aws:ec2:eu-west-1:some-owner:instance/some-instance-id", tag_Foo="bar"}

Labels for ES CPUUtilization

cloudwatch_exporter

{job="aws_es",instance="",domain_name="some-domain-name",client_id="some-client-id",}

yace

{name="arn:aws:es:eu-west-1:some-client-id:domain/some-domain-name", tag_Foo="bar"}

The data is there in the name label but unless you know how to break it down it’s not clear.

Is this something you'd be happy to add? I'm also happy to raise a PR. The labels could be prefixed with dimension_, similar to tag_.
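
As a rough illustration of how the name label could be broken down, here is a Go sketch that splits an ARN of the form arn:partition:service:region:account:resource. The type arnParts and the function parseARN are hypothetical; which pieces should become dimension_ labels for which service is left open.

```go
package main

import (
	"fmt"
	"strings"
)

// arnParts holds the fields of an ARN; ResourceType stays empty when
// the resource part has no "type/id" form.
type arnParts struct {
	Service, Region, Account, ResourceType, ResourceID string
}

// parseARN splits an ARN into its fields. The resource part is split
// once more at the first "/" into type and id.
func parseARN(arn string) (arnParts, error) {
	f := strings.SplitN(arn, ":", 6)
	if len(f) != 6 || f[0] != "arn" {
		return arnParts{}, fmt.Errorf("not an ARN: %q", arn)
	}
	p := arnParts{Service: f[2], Region: f[3], Account: f[4], ResourceID: f[5]}
	if i := strings.Index(f[5], "/"); i >= 0 {
		p.ResourceType, p.ResourceID = f[5][:i], f[5][i+1:]
	}
	return p, nil
}

func main() {
	p, err := parseARN("arn:aws:ec2:eu-west-1:some-owner:instance/some-instance-id")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(p.Service, p.ResourceType, p.ResourceID)
}
```

For the ES example, the account field would carry some-client-id and the resource id some-domain-name, which are the two values cloudwatch_exporter exposes as client_id and domain_name.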

avoid collecting data without full coverage

Hi,

We are using yace to collect ELB request sum metrics, scraping every 30s.
Many times over the past few days we have noticed values that do not match the values in the CloudWatch monitoring dashboard for the same period (i.e. 60s).

E.g. CloudWatch shows 120k for every data point with a 60s period, but from yace we get values like 70k in between, and the next value is 120k again.

It seems CloudWatch aggregates data before showing it on the graph, and hence its value is higher than the one we get from yace.

So I think this problem could be solved by adding some kind of delay before fetching the data for a particular timestamp?

I have tested it with lengths of 60s and 300s.

Also, if I specify a length of 300s, yace should return 5 data points, but I see only one value on the /metrics path?

Add rds provisioned storage size as own metric

Hello,

Is there any way to get some metrics directly from RDS, for example the size of the provisioned storage? If we knew it, we could calculate the percentage of RDS storage in use.

EC2 metrics intermittent

for some reason EC2 metrics are intermittent:

at the same time metrics for RDS behave as expected:

here is the config for both resources:
jobs:
  - region: {{ .Values.exporter.region }}
    type: "rds"
    searchTags:
      - Key: Product
        Value: .*
    metrics:
      - name: CPUUtilization
        statistics:
        - 'Average'
        period: 60
        length: 300
      - name: FreeStorageSpace
        statistics:
        - 'Minimum'
        period: 60
        length: 300
      - name: DatabaseConnections
        statistics:
        - 'Maximum'
        period: 60
        length: 300

  - region: {{ .Values.exporter.region }}
    type: "ec2"
    searchTags:
      - Key: Product
        Value: .*
    metrics:
      - name: CPUUtilization
        statistics:
        - 'Average'
        period: 60
        length: 300
      - name: NetworkIn
        statistics:
        - 'Average'
        period: 60
        length: 300
      - name: NetworkOut
        statistics:
        - 'Average'
        period: 60
        length: 300

I use version 0.10.0 of the exporter. Could you please advise where I'm going wrong?

Missing differences against prometheus/cloudwatch_exporter

Add some "TODO" items to README.md to help people see whether they can already use this project.

Feature request: Add api data to metrics

This would allow e.g. adding availability zone, instance size as labels.

The downside is more API calls. Maybe better than the tagging API.

#58

Or even export new metrics based on API data to allow easy monitoring
#48

Add discovery through tags for AWS VPN

I used prometheus/cloudwatch_exporter before and came across this project by chance. I am going to use it but need support for CloudWatch VPN metrics.

[panic serving] [ALB] inconsistent label names or help strings for the same fully-qualified name

Hello Thomas

I've got a weird crash on an ALB metric.
This problem is present in versions 0.13.3, 0.13.4 and 0.13.6.
Please have a look and comment on whether this is a bug or a configuration problem:

The probe crashes completely and is unable to serve any metrics.

An "up" or "info" metric for yace would be appreciated, to make it easier to alert on the availability of yace itself.

Error message

Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: 2019/07/02 08:29:45 http: panic serving 172.21.3.213:36324: descriptors reported by collector have inconsistent label names or help strings for the same fully-qualified name, offender is Desc{fqName: "aws_albtg_request_count_sum", help: "Help is not implemented yet.", constLabels: {dimension_LoadBalancer="app/awseb-AWSEB-B8MZZJGQT11I/cceca96b84631d38",dimension_TargetGroup="targetgroup/awseb-AWSEB-XE5JR40S8X8U/77238dfc7754bac2",name="arn:aws:elasticloadbalancing:eu-west-1:687944316825:targetgroup/awseb-AWSEB-XE5JR40S8X8U/77238dfc7754bac2",tag_Name="vc-pp-pim",tag_Project="pim"}, variableLabels: []}
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: goroutine 23360 [running]:
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: net/http.(*conn).serve.func1(0xc003bd80a0)
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]:         /usr/local/go/src/net/http/server.go:1769 +0x139
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: panic(0xa18bc0, 0xc002f16d20)
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]:         /usr/local/go/src/runtime/panic.go:522 +0x1b5
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0xc001d6e230, 0xc001fb1c48, 0x1, 0x1)
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]:         /go/pkg/mod/github.com/prometheus/client_golang@…/prometheus/registry.go:391 +0xad
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: main.metricsHandler(0xc23fc0, 0xc00027e2a0, 0xc002b10900)
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]:         /github/workspace/main.go:51 +0x32f
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: net/http.HandlerFunc.ServeHTTP(0xaf4e58, 0xc23fc0, 0xc00027e2a0, 0xc002b10900)
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]:         /usr/local/go/src/net/http/server.go:1995 +0x44
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: net/http.(*ServeMux).ServeHTTP(0x107a7e0, 0xc23fc0, 0xc00027e2a0, 0xc002b10900)
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]:         /usr/local/go/src/net/http/server.go:2375 +0x1d6
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: net/http.serverHandler.ServeHTTP(0xc00017e750, 0xc23fc0, 0xc00027e2a0, 0xc002b10900)
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]:         /usr/local/go/src/net/http/server.go:2774 +0xa8
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: net/http.(*conn).serve(0xc003bd80a0, 0xc26440, 0xc002bd2840)
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]:         /usr/local/go/src/net/http/server.go:1878 +0x851
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]: created by net/http.(*Server).Serve
Jul 02 08:29:45 vc-prod-infra-exporter-04 yace_exporter[29797]:         /usr/local/go/src/net/http/server.go:2884 +0x2f4

Configuration

discovery:
  exportedTagsOnMetrics:
    vpn:
      - Project
      - Name
    alb:
      - Project
      - Name
    elb:
      - Project
      - Name
    rds:
      - Project
      - Name
    ec:
     - Project
     - Name
    lambda:
     - Project
     - Name
    kinesis:
     - Project
     - Name
    s3:
     - Project
     - Name
    dynamodb:
     - Project
     - Name
  jobs:
  - region: {{ ec2_region }}
    type: "vpn"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: TunnelState
        statistics:
        - 'Maximum'
        period: 60
        length: 60
        nilToZero: true
      - name: TunnelDataOut
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: TunnelDataIn
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
  - region: {{ ec2_region }}
    type: "lambda"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: Throttles
        statistics:
        - 'Average'
        period: 60
        length: 60
        nilToZero: true
      - name: Invocations
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: Errors
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: Duration
        statistics:
        - 'Average'
        period: 60
        length: 60
        nilToZero: true
      - name: Duration
        statistics:
        - 'Minimum'
        period: 60
        length: 60
        nilToZero: true
      - name: Duration
        statistics:
        - 'Maximum'
        period: 60
        length: 60
        nilToZero: true
  - region: {{ ec2_region }}
    type: "ec"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: NewConnections
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: GetTypeCmds
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: SetTypeCmds
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: CacheMisses
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: CacheHits
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
  - region: {{ ec2_region }}
    type: "rds"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: DatabaseConnections
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: ReadIOPS
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: WriteIOPS
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
  - region: {{ ec2_region }}
    type: "alb"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: RequestCount
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: HealthyHostCount
        statistics:
        - 'Minimum'
        period: 60
        length: 60
        nilToZero: true
      - name: UnHealthyHostCount
        statistics:
        - 'Maximum'
        period: 60
        length: 60
        nilToZero: true
      - name: Latency
        statistics:
        - 'Average'
        period: 60
        length: 60
        nilToZero: true
      - name: BackendConnectionErrors
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: HTTPCode_Backend_2XX
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: HTTPCode_Backend_5XX
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: HTTPCode_Backend_4XX
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
  - region: {{ ec2_region }}
    type: "elb"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: RequestCount
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: HealthyHostCount
        statistics:
        - 'Minimum'
        period: 60
        length: 60
        nilToZero: true
      - name: UnHealthyHostCount
        statistics:
        - 'Maximum'
        period: 60
        length: 60
        nilToZero: true
      - name: Latency
        statistics:
        - 'Average'
        period: 60
        length: 60
        nilToZero: true
      - name: BackendConnectionErrors
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: HTTPCode_Backend_2XX
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: HTTPCode_Backend_5XX
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
      - name: HTTPCode_Backend_4XX
        statistics:
        - 'Sum'
        period: 60
        length: 60
        nilToZero: true
  - region: {{ ec2_region }}
    type: "kinesis"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: PutRecords.Success
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: GetRecords.Success
        statistics:
        - 'Sum'
        period: 60
        length: 300    
  - region: eu-west-1
    type: "s3"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    additionalDimensions:
      - name: StorageType
        value: StandardStorage
    metrics:
      - name: NumberOfObjects
        statistics:
          - Average
        period: 86400
        length: 172800
        addCloudwatchTimestamp: true
      - name: BucketSizeBytes
        statistics:
          - Average
        period: 86400
        length: 172800
        addCloudwatchTimestamp: true
  - region: {{ ec2_region }}
    type: "dynamodb"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: ReturnedItemCount
        statistics:
        - 'Sum'
        period: 60
        length: 300
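
The panic in this report is raised by the Prometheus Go client when samples that share a metric name do not all carry the same label names (for example, if one resource has a Project tag and another does not). One possible mitigation is to pad every sample with the union of label names seen for its metric name. The sketch below is an assumption-laden illustration, not YACE's actual code: the `sample` type and `ensureLabelConsistency` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical, simplified stand-in for one exported metric.
type sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// ensureLabelConsistency returns copies of the samples in which all samples
// sharing a metric name expose exactly the same set of label names.
// Labels a sample did not have are filled with an empty string.
func ensureLabelConsistency(samples []sample) []sample {
	// Collect the union of label names per metric name.
	keys := map[string]map[string]struct{}{}
	for _, s := range samples {
		if keys[s.Name] == nil {
			keys[s.Name] = map[string]struct{}{}
		}
		for k := range s.Labels {
			keys[s.Name][k] = struct{}{}
		}
	}
	// Copy each sample and add the labels it is missing.
	out := make([]sample, 0, len(samples))
	for _, s := range samples {
		labels := make(map[string]string, len(keys[s.Name]))
		for k := range keys[s.Name] {
			labels[k] = s.Labels[k] // "" when the label is absent
		}
		out = append(out, sample{Name: s.Name, Labels: labels, Value: s.Value})
	}
	return out
}

// labelNames returns the sorted label names of a sample.
func labelNames(s sample) []string {
	names := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		names = append(names, k)
	}
	sort.Strings(names)
	return names
}

func main() {
	in := []sample{
		{Name: "aws_albtg_request_count_sum", Labels: map[string]string{"name": "a", "tag_Project": "pim"}, Value: 1},
		{Name: "aws_albtg_request_count_sum", Labels: map[string]string{"name": "b"}, Value: 2},
	}
	for _, s := range ensureLabelConsistency(in) {
		fmt.Println(s.Name, labelNames(s))
	}
}
```

Padding with empty values keeps the scrape alive at the cost of some empty labels. Whether that matches the real cause of this crash would still need to be confirmed against the exporter's code.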

Labels with colons are not sanitized

Discovering EC2 instances that belong to an Auto Scaling group panics with the error message:
is invalid: "tag_aws:autoscaling:groupName" is not a valid label name

The Prometheus Go library expects labels to be sanitized and match "^[a-zA-Z_][a-zA-Z0-9_]*$".
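
A minimal sketch of such sanitizing, using only the pattern quoted above. The function name `sanitizeLabelName` is hypothetical, and replacing invalid characters with underscores is one possible choice rather than the exporter's confirmed behaviour.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidLabelChars matches every character that is not allowed anywhere
// in a Prometheus label name.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitizeLabelName replaces invalid characters with underscores and
// prefixes the result with an underscore if it is empty or starts with
// a digit, so that it matches ^[a-zA-Z_][a-zA-Z0-9_]*$.
func sanitizeLabelName(name string) string {
	s := invalidLabelChars.ReplaceAllString(name, "_")
	if s == "" || (s[0] >= '0' && s[0] <= '9') {
		s = "_" + s
	}
	return s
}

func main() {
	fmt.Println(sanitizeLabelName("tag_aws:autoscaling:groupName"))
	// prints "tag_aws_autoscaling_groupName"
}
```

Note that two different tag keys (for example `a:b` and `a_b`) map to the same label name after sanitizing, so a real implementation would have to decide how to handle such collisions.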

Throttling: Rate exceeded

Hello,

I ran into a panic in one of the regions:
`panic: Throttling: Rate exceeded
status code: 400, request id: 2eac2d08-4014-11e9-8a89-991f83fb4554

goroutine 8238 [running]:
main.cloudwatchInterface.get(0xa3d040, 0xc420276018, 0xc4215abe80, 0xc422c00c60, 0xc42009f150, 0xe)
/go/src/yace/aws_cloudwatch.go:112 +0x17d
main.scrapeDiscoveryJob.func1.1(0xc42025d680, 0xc420d00410, 0xc4259bba00, 0x2, 0x2, 0xc422c3d6d0, 0x1, 0x1, 0xa3d040, 0xc420276018, ...)
/go/src/yace/abstract.go:140 +0x1dd
created by main.scrapeDiscoveryJob.func1
/go/src/yace/abstract.go:123 +0x15d`

It would probably make sense to handle this via a configuration option, such as "max concurrent requests".
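
One way such a limit could work is a counting semaphore built from a buffered channel. This is only a sketch: `runLimited` and its parameters are hypothetical names, and the callback stands in for whatever issues the CloudWatch request.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited calls fn once per item with at most maxConcurrent calls in
// flight. It returns the highest number of calls observed running at once.
func runLimited(items []string, maxConcurrent int, fn func(string)) int32 {
	if maxConcurrent < 1 {
		maxConcurrent = 1 // avoid blocking forever on an unbuffered channel
	}
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	var inFlight, peak int32

	for _, item := range items {
		sem <- struct{}{} // blocks while maxConcurrent calls are running
		wg.Add(1)
		go func(it string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			n := atomic.AddInt32(&inFlight, 1)
			defer atomic.AddInt32(&inFlight, -1)
			// Record the peak concurrency for inspection.
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			fn(it)
		}(item)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	metrics := []string{"RequestCount", "Latency", "HealthyHostCount", "UnHealthyHostCount"}
	peak := runLimited(metrics, 2, func(name string) {
		time.Sleep(10 * time.Millisecond) // placeholder for an API call
	})
	fmt.Println("peak concurrency:", peak)
}
```

A limit like this reduces the request rate but does not rule out throttling entirely, so returning the error instead of panicking would still be needed.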

Feature request: Add task role as credential provider for secure ecs usage

Hello,

YACE runs in a Docker container on AWS ECS. The container can obtain credentials through the instance profile. For security reasons, however, the container should not use the instance profile. Instead, it should use a task role that is attached to the ECS task. This configuration results in the following error:

Couldn't describe resources: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors

It seems the task role is not checked as a credential provider.

Could you add the task role as a possible credential provider?
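
As a diagnostic aid, the sketch below checks whether a task role is visible inside the container at all. It relies on two facts about ECS: the agent sets the `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` environment variable for tasks with a task role, and the credentials are served from the link-local address 169.254.170.2. The function name `taskRoleCredentialsURL` is hypothetical, and this is not how the exporter or the AWS SDK resolves credentials internally.

```go
package main

import (
	"fmt"
	"os"
)

// ecsCredentialsHost is the link-local address from which the ECS agent
// serves task role credentials.
const ecsCredentialsHost = "http://169.254.170.2"

// taskRoleCredentialsURL returns the credentials endpoint for the task role,
// or false if the container does not advertise one. The lookup function is
// injected so that the logic can be exercised without touching the real
// environment.
func taskRoleCredentialsURL(lookup func(string) (string, bool)) (string, bool) {
	rel, ok := lookup("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
	if !ok || rel == "" {
		return "", false
	}
	return ecsCredentialsHost + rel, true
}

func main() {
	if url, ok := taskRoleCredentialsURL(os.LookupEnv); ok {
		fmt.Println("task role credentials endpoint:", url)
	} else {
		fmt.Println("no task role is visible in this container")
	}
}
```

If the variable is present but the exporter still reports NoCredentialProviders, the AWS SDK version it was built with would be the next thing to check.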

Best regards
Matthias

Help: Docker config file location

I'm getting this error:
Couldn't read config.yml:open config.yml: no such file or directory
Where should I mount or copy config.yml in the Docker container?

Missing dimension label on static metric

Hi,

It seems that static metrics don't contain dimension labels. To reproduce this, I'm using the AWS/AutoScaling namespace as in the documentation example (I renamed my ASG to Test in the output for consistency).

config.yml

static:
  - namespace: AWS/AutoScaling
    region: ap-southeast-2
    dimensions:
     - name: AutoScalingGroupName
       value: Test
    metrics:
      - name: GroupInServiceInstances
        statistics:
        - 'Minimum'
        period: 60
        length: 300

yace metrics output

# HELP aws_auto_scaling_group_in_service_instances_minimum Help is not implemented yet.
# TYPE aws_auto_scaling_group_in_service_instances_minimum gauge
aws_auto_scaling_group_in_service_instances_minimum{name=""} 1
# HELP yace_cloudwatch_requests_total Help is not implemented yet.
# TYPE yace_cloudwatch_requests_total counter
yace_cloudwatch_requests_total 3

If I understand correctly, I would expect aws_auto_scaling_group_in_service_instances_minimum to include a dimension_AutoScalingGroupName label.
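
The expected behaviour can be sketched as follows. The `dimension` type and `dimensionLabels` function are hypothetical; only the `dimension_` label prefix is taken from the exporter's output for discovered resources.

```go
package main

import "fmt"

// dimension is a hypothetical, simplified stand-in for a CloudWatch dimension.
type dimension struct {
	Name  string
	Value string
}

// dimensionLabels returns one "dimension_<Name>" label per dimension.
func dimensionLabels(dims []dimension) map[string]string {
	labels := make(map[string]string, len(dims))
	for _, d := range dims {
		labels["dimension_"+d.Name] = d.Value
	}
	return labels
}

func main() {
	labels := dimensionLabels([]dimension{{Name: "AutoScalingGroupName", Value: "Test"}})
	fmt.Println(labels["dimension_AutoScalingGroupName"]) // prints "Test"
}
```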

Debug log

2019/06/28 15:16:48 Parse config..
2019/06/28 15:16:48 Startup completed
Name=AutoScalingGroupName,Value=Test
2019/06/28 15:17:18 CLI helper - aws cloudwatch get-metric-statistics --metric-name GroupInServiceInstances --dimensions Name=AutoScalingGroupName,Value=Test --namespace AWS/AutoScaling --statistics Minimum --period 60 --start-time 2019-06-28T15:12:18+12:00 --end-time 2019-06-28T15:17:18+12:00
2019/06/28 15:17:18 {
  Dimensions: [{
      Name: "AutoScalingGroupName",
      Value: "Test"
    }],
  EndTime: 2019-06-28 15:17:18.126139411 +1200 NZST m=+29.407366351,
  MetricName: "GroupInServiceInstances",
  Namespace: "AWS/AutoScaling",
  Period: 60,
  StartTime: 2019-06-28 15:12:18.12614161 +1200 NZST m=-270.592631493,
  Statistics: ["Minimum"]
}
2019/06/28 15:17:18 {
  Dimensions: [{
      Name: "AutoScalingGroupName",
      Value: "Test"
    }],
  EndTime: 2019-06-28 15:17:18.126139411 +1200 NZST m=+29.407366351,
  MetricName: "GroupInServiceInstances",
  Namespace: "AWS/AutoScaling",
  Period: 60,
  StartTime: 2019-06-28 15:12:18.12614161 +1200 NZST m=-270.592631493,
  Statistics: ["Minimum"]
}
2019/06/28 15:17:18 {
  Datapoints: [
    {
      Minimum: 1,
      Timestamp: 2019-06-28 03:13:00 +0000 UTC,
      Unit: "None"
    },
    {
      Minimum: 1,
      Timestamp: 2019-06-28 03:14:00 +0000 UTC,
      Unit: "None"
    },
    {
      Minimum: 1,
      Timestamp: 2019-06-28 03:15:00 +0000 UTC,
      Unit: "None"
    },
    {
      Minimum: 1,
      Timestamp: 2019-06-28 03:16:00 +0000 UTC,
      Unit: "None"
    },
    {
      Minimum: 1,
      Timestamp: 2019-06-28 03:12:00 +0000 UTC,
      Unit: "None"
    }
  ],
  Label: "GroupInServiceInstances"
}

PS Thank you for this awesome exporter!

[0.13.0-alpha] IAM Policies

Since 0.13.0-alpha, the following IAM policy is required:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "tag:GetResources",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics"
            ],
            "Resource": "*"
        }
    ]
}

Without it, you get this error message:

panic: AccessDenied: User: arn:aws:sts::687944316825:assumed-role/vc-prod-setting-role/i-0beed10f8be074647 is not authorized to perform: cloudwatch:ListMetrics
	status code: 403, request id: 5b0b4c8f-513e-11e9-9035-1bb5a5f5a512


goroutine 161 [running]:
main.getResourceValue(0x994f55, 0xc, 0xc4201ae448, 0x1, 0x1, 0xc4201eaf40, 0xa3d040, 0xc42000c358, 0x8d00e0)
	/go/src/yace/aws_cloudwatch.go:177 +0xfa
main.queryAvailableDimensions(0xc420067924, 0x36, 0xc4201eaf40, 0xa3d040, 0xc42000c358, 0x14, 0xc42006790d, 0x9)
	/go/src/yace/aws_cloudwatch.go:193 +0x453
main.getDimensions(0xc4200692d0, 0xc4201ea7c0, 0xa3d040, 0xc42000c358, 0x47f378, 0xc4201bcaf8, 0x1)
	/go/src/yace/aws_cloudwatch.go:218 +0x110c
main.scrapeDiscoveryJob.func1(0xc420229a10, 0xa3d040, 0xc42000c358, 0xc420023d00, 0x9, 0xc420023cfc, 0x3, 0x0, 0x0, 0xc4201ec640, ...)
	/go/src/yace/abstract.go:120 +0x66
created by main.scrapeDiscoveryJob
	/go/src/yace/abstract.go:119 +0x247

List of dimensions to be fetched with metrics

I need the ELB and ALB metrics with the load balancer name and the AvailabilityZone dimension.
According to the docs, I cannot list a dimension without specifying its value.

Even with a value, if I specify the same dimension with multiple values, I only get metrics for the last value; the rest are ignored.
In the example below, I only get metrics for the dimension value that is specified last (ap-south-1a is ignored).

 discovery:
   jobs:
     - type: "elb"
       region: ap-south-1
       searchTags:
         - Key: kubernetes.io/service-name
           Value: .*
       metrics:
         - name: HealthyHostCount
           statistics:
           - 'Minimum'
           period: 60
           length: 60
           delay: 120
           additionalDimensions:
             - name: AvailabilityZone
               value: "ap-south-1a"
             - name: AvailabilityZone
               value: "ap-south-1b"
         - name: UnHealthyHostCount
           statistics:
           - 'Maximum'
           period: 60
           length: 60
           delay: 120
         - name: RequestCount
           statistics:
           - 'Sum'
           period: 60
           length: 60
           delay: 120
           nilToZero: true

aws_elb_healthy_host_count_minimum{dimension_AvailabilityZone="ap-south-1b",dimension_LoadBalancerName="xxxxx",name="arn:aws:elasticloadbalancing:ap-south-1:xxxx:loadbalancer/xxxx"} 29

One more thing: according to the docs, additional dimensions should work at the job level, but they only work at the level of each metric.

This works:

       metrics:
         - name: HealthyHostCount
           statistics:
           - 'Minimum'
           period: 60
           length: 60
           delay: 120
           additionalDimensions:
             - name: AvailabilityZone
               value: "ap-south-1a"
             - name: AvailabilityZone
               value: "ap-south-1b"

This doesn't work:

  jobs:
    - type: "elb"
      region: ap-south-1
      searchTags:
        - Key: kubernetes.io/service-name
          Value: .*
      additionalDimensions:
        - name: AvailabilityZone
          value: "ap-south-1a"
        - name: AvailabilityZone
          value: "ap-south-1b"
      metrics:
.....
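
One way to support several values for the same dimension would be to expand the configured list into one dimension set per combination and issue a separate query for each. The sketch below illustrates that idea only; `dimension` and `expandDimensions` are hypothetical names, and it makes no claim about why the exporter currently keeps only the last value.

```go
package main

import "fmt"

// dimension is a hypothetical, simplified stand-in for a configured dimension.
type dimension struct {
	Name  string
	Value string
}

// expandDimensions groups the configured values by dimension name and
// returns one dimension set per combination of values (a cartesian product).
// The order of names and values follows the order in the input.
func expandDimensions(dims []dimension) [][]dimension {
	var names []string
	values := map[string][]string{}
	for _, d := range dims {
		if _, seen := values[d.Name]; !seen {
			names = append(names, d.Name)
		}
		values[d.Name] = append(values[d.Name], d.Value)
	}

	sets := [][]dimension{{}} // start with a single empty set
	for _, n := range names {
		var next [][]dimension
		for _, set := range sets {
			for _, v := range values[n] {
				combo := append(append([]dimension{}, set...), dimension{Name: n, Value: v})
				next = append(next, combo)
			}
		}
		sets = next
	}
	return sets
}

func main() {
	sets := expandDimensions([]dimension{
		{Name: "AvailabilityZone", Value: "ap-south-1a"},
		{Name: "AvailabilityZone", Value: "ap-south-1b"},
	})
	for _, set := range sets {
		fmt.Println(set) // one line per query that would be issued
	}
}
```

Each extra value multiplies the number of CloudWatch requests, so this approach trades API cost for completeness.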

Add aws cli command to debug output to test easily why queries are not working

Debugging is currently difficult. If we output the corresponding aws cli command, people could debug their queries more easily themselves.
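
A minimal sketch of such a helper. The `query` type and `cliCommand` function are hypothetical. The flags are those of `aws cloudwatch get-metric-statistics`, and multiple dimensions are separated by spaces, as the CLI expects.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// dimension is a hypothetical, simplified stand-in for a CloudWatch dimension.
type dimension struct {
	Name  string
	Value string
}

// query holds the hypothetical inputs of one GetMetricStatistics request.
type query struct {
	Namespace  string
	MetricName string
	Statistic  string
	Dimensions []dimension
	Period     int
	Start, End time.Time
}

// cliCommand renders the query as an equivalent aws cli invocation.
func cliCommand(q query) string {
	dims := make([]string, 0, len(q.Dimensions))
	for _, d := range q.Dimensions {
		dims = append(dims, fmt.Sprintf("Name=%s,Value=%s", d.Name, d.Value))
	}
	return fmt.Sprintf(
		"aws cloudwatch get-metric-statistics --metric-name %s --dimensions %s --namespace %s --statistics %s --period %d --start-time %s --end-time %s",
		q.MetricName, strings.Join(dims, " "), q.Namespace, q.Statistic, q.Period,
		q.Start.Format(time.RFC3339), q.End.Format(time.RFC3339),
	)
}

func main() {
	end := time.Now().UTC()
	fmt.Println(cliCommand(query{
		Namespace:  "AWS/AutoScaling",
		MetricName: "GroupInServiceInstances",
		Statistic:  "Minimum",
		Dimensions: []dimension{{Name: "AutoScalingGroupName", Value: "Test"}},
		Period:     60,
		Start:      end.Add(-5 * time.Minute),
		End:        end,
	}))
}
```

Dimension values that contain spaces or shell metacharacters would need quoting before the command can be pasted into a shell; the sketch leaves that out.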

incorrect dimension value in case of alb in discovery config

Hi,
In the discovery config, when ALB instances are found by tag and dimensions are built from the discovered resources, the dimension value that is passed is slightly incorrect. I made a change that works for me and created a pull request for it: #41
Please let me know what you think.

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html#load-balancer-metric-dimensions-alb
According to this page, the value of the 'LoadBalancer' dimension should be 'app/load-balancer-name/1234567890123456'.
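
A small sketch of deriving that value from a load balancer ARN, assuming the ARN has the form `...:loadbalancer/app/<name>/<id>`. The function name `albDimensionFromARN` is hypothetical, the account ID in the example is a placeholder, and this is not necessarily what pull request #41 does.

```go
package main

import (
	"fmt"
	"strings"
)

// albDimensionFromARN returns the part of a load balancer ARN that follows
// ":loadbalancer/", which is the form the LoadBalancer dimension expects.
// It reports false if the ARN does not contain that marker.
func albDimensionFromARN(arn string) (string, bool) {
	const marker = ":loadbalancer/"
	i := strings.Index(arn, marker)
	if i < 0 {
		return "", false
	}
	return arn[i+len(marker):], true
}

func main() {
	// 123456789012 is a placeholder account ID.
	arn := "arn:aws:elasticloadbalancing:eu-west-1:123456789012:loadbalancer/app/load-balancer-name/1234567890123456"
	value, ok := albDimensionFromARN(arn)
	fmt.Println(value, ok) // prints "app/load-balancer-name/1234567890123456 true"
}
```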

[0.13.0-alpha] ALB panic

Hello

With version 0.13.0-alpha we get this error message for ALB, and the probe fails:

Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: 2019/03/27 16:49:34 http: panic serving 172.21.32.190:42236: descriptors reported by collector have inconsistent label names or help strings for the same fully-qualified name, offender is Desc{fqName: "aws_alb_request_count_sum", help: "Help is not implemented yet.", constLabels: {dimension_LoadBalancer="app/awseb-AWSEB-W3MG4A5Z5SW0/7978fae386c94fa3",name="arn:aws:elasticloadbalancing:eu-west-1:687944316825:loadbalancer/app/awseb-AWSEB-W3MG4A5Z5SW0/7978fae386c94fa3",tag_Name="vc-prod-rum",tag_Project="rum"}, variableLabels: []}
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: goroutine 1208 [running]:
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: net/http.(*conn).serve.func1(0xc4209f2140)
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]:         /usr/local/go/src/net/http/server.go:1726 +0xd0
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: panic(0x8f4d40, 0xc420a36a60)
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]:         /usr/local/go/src/runtime/panic.go:502 +0x229
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: yace/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0xc420a640a0, 0xc420667cd0, 0x1, 0x1)
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]:         /go/src/yace/vendor/github.com/prometheus/client_golang/prometheus/registry.go:391 +0x9e
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: main.metricsHandler(0xa340e0, 0xc42044a540, 0xc420176400)
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]:         /go/src/yace/main.go:50 +0x2e2
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: net/http.HandlerFunc.ServeHTTP(0x9b3580, 0xa340e0, 0xc42044a540, 0xc420176400)
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]:         /usr/local/go/src/net/http/server.go:1947 +0x44
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: net/http.(*ServeMux).ServeHTTP(0xcaec60, 0xa340e0, 0xc42044a540, 0xc420176400)
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]:         /usr/local/go/src/net/http/server.go:2340 +0x130
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: net/http.serverHandler.ServeHTTP(0xc420173040, 0xa340e0, 0xc42044a540, 0xc420176400)
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]:         /usr/local/go/src/net/http/server.go:2697 +0xbc
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: net/http.(*conn).serve(0xc4209f2140, 0xa35ae0, 0xc420238040)
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]:         /usr/local/go/src/net/http/server.go:1830 +0x651
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]: created by net/http.(*Server).Serve
Mar 27 16:49:34 vc-prod-rum-exporter-01 yace_exporter[29952]:         /usr/local/go/src/net/http/server.go:2798 +0x27b

Our setting for ALB is

discovery:
  exportedTagsOnMetrics:
    vpn:
      - Project
      - Name
    elb:
      - Project
      - Name
    rds:
      - Project
      - Name
    ec:
     - Project
     - Name
    lambda:
     - Project
     - Name
    alb:
     - Project
     - Name

  jobs:
  - region: {{ ec2_region }}
    type: "vpn"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: TunnelState
        statistics:
        - 'Maximum'
        period: 60
        length: 300
      - name: TunnelDataOut
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: TunnelDataIn
        statistics:
        - 'Sum'
        period: 60
        length: 300
  - region: {{ ec2_region }}
    type: "lambda"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: Throttles
        statistics:
        - 'Average'
        period: 60
        length: 300
      - name: Invocations
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: Errors
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: Duration
        statistics:
        - 'Average'
        period: 60
        length: 300
      - name: Duration
        statistics:
        - 'Minimum'
        period: 60
        length: 300
      - name: Duration
        statistics:
        - 'Maximum'
        period: 60
        length: 300
  - region: {{ ec2_region }}
    type: "ec"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: NewConnections
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: GetTypeCmds
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: SetTypeCmds
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: CacheMisses
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: CacheHits
        statistics:
        - 'Sum'
        period: 60
        length: 300
  - region: {{ ec2_region }}
    type: "rds"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: DatabaseConnections
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: ReadIOPS
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: WriteIOPS
        statistics:
        - 'Sum'
        period: 60
        length: 300
  - region: {{ ec2_region }}
    type: "elb"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: RequestCount
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: HealthyHostCount
        statistics:
        - 'Minimum'
        period: 60
        length: 300
      - name: UnHealthyHostCount
        statistics:
        - 'Maximum'
        period: 60
        length: 300
      - name: HTTPCode_Backend_2XX
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: HTTPCode_Backend_5XX
        statistics:
        - 'Sum'
        period: 60
        length: 300
      - name: HTTPCode_Backend_4XX
        statistics:
        - 'Sum'
        period: 60
        length: 300
  - region: {{ ec2_region }}
    type: "alb"
    searchTags:
      - Key: {{ prometheus_yace_tag_filter }}
        Value: {{ prometheus_yace_tag_value }}
    metrics:
      - name: RequestCount
        statistics:
        - 'Sum'
        period: 600
        length: 600
      - name: HealthyHostCount
        statistics:
        - 'Minimum'
        period: 600
        length: 600
      - name: UnHealthyHostCount
        statistics:
        - 'Maximum'
        period: 600
        length: 600
      - name: HTTPCode_Target_2XX
        statistics:
        - 'Sum'
        period: 600
        length: 600
      - name: HTTPCode_Target_5XX
        statistics:
        - 'Sum'
        period: 600
        length: 600
      - name: HTTPCode_Target_4XX
        statistics:
        - 'Sum'
        period: 60
        length: 60

It's more stable when we use

        period: 60
        length: 60

However, in that case we get gaps in the data.

Statsd support

Is statsd support going to be made available?
Would you accept PRs for it?

regards

RequestError: send request failed

Hey,
I've tried yace as an alternative to prom. I launched it in a Docker container on an AWS instance. Although I get an HTTP response:

# HELP yace_cloudwatch_requests_total Help is not implemented yet.
# TYPE yace_cloudwatch_requests_total counter
yace_cloudwatch_requests_total 0

it still doesn't work, and the logs show:

2018/09/25 05:50:41 Couldn't describe resources: RequestError: send request failed
caused by: Post https://tagging.eu-west-1.amazonaws.com/: dial tcp 52.94.216.112:443: i/o timeout
2018/09/25 05:51:01 Couldn't describe resources: RequestError: send request failed
caused by: Post https://tagging.eu-west-1.amazonaws.com/: dial tcp 52.94.220.75:443: i/o timeout
2018/09/25 05:51:21 Couldn't describe resources: RequestError: send request failed
caused by: Post https://tagging.eu-west-1.amazonaws.com/: dial tcp 52.94.220.75:443: i/o timeout
2018/09/25 05:51:37 Couldn't describe resources: RequestError: send request failed
caused by: Post https://tagging.eu-west-1.amazonaws.com/: dial tcp 52.94.220.75:443: i/o timeout
2018/09/25 05:51:41 Couldn't describe resources: RequestError: send request failed
caused by: Post https://tagging.eu-west-1.amazonaws.com/: dial tcp 52.94.220.75:443: i/o timeout

I tried to start with a simple config. Can you suggest what is wrong?
discovery:
  - region: eu-west-1
    type: "elb"
    searchTags:
      - Key: KubernetesCluster
        Value: kubernetes
    metrics:
      - name: HealthyHostCount
        statistics:
        - 'Minimum'
        period: 60
        length: 300
      - name: UnHealthyHostCount
        statistics:
        - 'Minimum'
        period: 60
        length: 300
      - name: RequestCount
        statistics:
        - 'Sum'
        period: 60
        length: 900
        nilToZero: true
      - name: Latency
        statistics:
        - 'Sum'
        period: 60
        length: 900
        nilToZero: true

Support for changing tag_ prefix

Hi!

In our integration of the exporter, we need tags to be exported without the tag_ prefix.

What do you think of supporting an environment variable, a parameter in config.yml, or something similar for changing the prefix to an arbitrary value?
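
As a rough sketch of the environment-variable variant: the variable name `YACE_TAG_PREFIX` and both function names are hypothetical and do not exist in the exporter. The lookup function is injected so that an explicitly empty prefix can be told apart from an unset variable.

```go
package main

import (
	"fmt"
	"os"
)

// defaultTagPrefix is the prefix the exporter currently uses for tag labels.
const defaultTagPrefix = "tag_"

// tagLabelPrefix returns the prefix configured through the hypothetical
// YACE_TAG_PREFIX variable. An unset variable yields the default; a variable
// that is set to the empty string disables the prefix.
func tagLabelPrefix(lookup func(string) (string, bool)) string {
	if p, ok := lookup("YACE_TAG_PREFIX"); ok {
		return p
	}
	return defaultTagPrefix
}

// tagLabel builds the label name for a tag key.
func tagLabel(prefix, key string) string {
	return prefix + key
}

func main() {
	prefix := tagLabelPrefix(os.LookupEnv)
	fmt.Println(tagLabel(prefix, "Project"))
}
```

With an empty prefix, a tag key such as `name` would clash with the exporter's own `name` label, so some collision handling would be needed.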

Add gometalinter as CI step and fix all issues thrown by it

This should improve Code quality a lot.

Feature Request: Add type emr

Dear yace team,

We are trying to extend yace with an additional type, "emr". However, EMR seemingly has no resource ARN. Is that the reason why you haven't included it as a type?

Is it possible to implement the emr type elegantly on top of yace?

Thanks a lot for the help

Not getting BucketSizeBytes from dynamic S3 metrics

With YACE version 0.13.6

Config

discovery:
  exportedTagsOnMetrics:
    s3:
     - Project
     - Name
  jobs:
  - region: eu-west-1
    type: "s3"
    searchTags:
      - Key: Project
        Value: gitlab
    additionalDimensions:
      - name: StorageType
        value: StandardStorage
    metrics:
      - name: NumberOfObjects
        statistics:
          - Average
        period: 86400
        length: 172800
        # addCloudwatchTimestamp: true
      - name: BucketSizeBytes
        statistics:
          - Average
        period: 86400
        length: 172800
        addCloudwatchTimestamp: true

Debug

Name=BucketName,Value=S3-BUCKET
Name=BucketName,Value=S3-BUCKETName=StorageType,Value=AllStorageTypes
2019/07/08 15:32:38 CLI helper - aws cloudwatch get-metric-statistics --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=S3-BUCKETName=StorageType,Value=AllStorageTypes --namespace AWS/S3 --statistics Average --period 86400 --start-time 2019-07-06T15:32:38Z --end-time 2019-07-08T15:32:38Z
Name=BucketName,Value=S3-BUCKET
Name=BucketName,Value=S3-BUCKETName=StorageType,Value=AllStorageTypes
2019/07/08 15:32:38 CLI helper - aws cloudwatch get-metric-statistics --metric-name NumberOfObjects --dimensions Name=BucketName,Value=S3-BUCKETName=StorageType,Value=AllStorageTypes --namespace AWS/S3 --statistics Average --period 86400 --start-time 2019-07-06T15:32:38Z --end-time 2019-07-08T15:32:38Z
2019/07/08 15:32:38 {
  Dimensions: [{
      Name: "BucketName",
      Value: "S3-BUCKET"
    },{
      Name: "StorageType",
      Value: "AllStorageTypes"
    }],
  EndTime: 2019-07-08 15:32:38.455505365 +0000 UTC m=+142.172302489,
  MetricName: "BucketSizeBytes",
  Namespace: "AWS/S3",
  Period: 86400,
  StartTime: 2019-07-06 15:32:38.455505729 +0000 UTC m=-172657.827697241,
  Statistics: ["Average"]
}
2019/07/08 15:32:38 {
  Dimensions: [{
      Name: "BucketName",
      Value: "S3-BUCKET"
    },{
      Name: "StorageType",
      Value: "AllStorageTypes"
    }],
  EndTime: 2019-07-08 15:32:38.455519422 +0000 UTC m=+142.172316473,
  MetricName: "NumberOfObjects",
  Namespace: "AWS/S3",
  Period: 86400,
  StartTime: 2019-07-06 15:32:38.455519594 +0000 UTC m=-172657.827683375,
  Statistics: ["Average"]
}
2019/07/08 15:32:38 {
  Dimensions: [{
      Name: "BucketName",
      Value: "S3-BUCKET"
    },{
      Name: "StorageType",
      Value: "AllStorageTypes"
    }],
  EndTime: 2019-07-08 15:32:38.455519422 +0000 UTC m=+142.172316473,
  MetricName: "NumberOfObjects",
  Namespace: "AWS/S3",
  Period: 86400,
  StartTime: 2019-07-06 15:32:38.455519594 +0000 UTC m=-172657.827683375,
  Statistics: ["Average"]
}
2019/07/08 15:32:38 {
  Datapoints: [{
      Average: 4105,
      Timestamp: 2019-07-06 15:32:00 +0000 UTC,
      Unit: "Count"
    },{
      Average: 4105,
      Timestamp: 2019-07-07 15:32:00 +0000 UTC,
      Unit: "Count"
    }],
  Label: "NumberOfObjects"
}
2019/07/08 15:32:38 {
  Datapoints: [{
      Average: 13430,
      Timestamp: 2019-07-06 15:32:00 +0000 UTC,
      Unit: "Count"
    },{
      Average: 13128,
      Timestamp: 2019-07-07 15:32:00 +0000 UTC,
      Unit: "Count"
    }],
  Label: "NumberOfObjects"
}
2019/07/08 15:32:38 {
  Label: "BucketSizeBytes"
}
2019/07/08 15:32:38 {
  Label: "BucketSizeBytes"
}

CLI

From the helper

aws cloudwatch get-metric-statistics --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=S3-BUCKET  Name=StorageType,Value=AllStorageTypes  --namespace AWS/S3 --statistics Average --period 172800 --start-time 2019-07-05T14:23:58Z --end-time 2019-07-08T14:23:58Z --unit Bytes --region eu-west-1
BucketSizeBytes

From the helper with correction

aws cloudwatch get-metric-statistics --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=S3-BUCKET  Name=StorageType,Value=StandardStorage  --namespace AWS/S3 --statistics Average --period 172800 --start-time 2019-07-05T14:23:58Z --end-time 2019-07-08T14:23:58Z --unit Bytes --region eu-west-1
BucketSizeBytes
DATAPOINTS  264621376991.0  2019-07-05T14:23:00Z  Bytes
DATAPOINTS  236125209092.0  2019-07-07T14:23:00Z  Bytes

Remark

The debug output shows

{
      Name: "StorageType",
      Value: "AllStorageTypes"
    }

While the configuration specifies:

    additionalDimensions:
      - name: StorageType
        value: StandardStorage

Problem: AddCloudwatchTimestamp null pointer

I got the following error:

2019/05/08 10:23:49 http: panic serving 192.168.64.14:60592: runtime error: invalid memory address or nil pointer dereference
goroutine 5 [running]:
net/http.(*conn).serve.func1(0xc0001c00a0)
        /usr/local/go/src/net/http/server.go:1769 +0x139
panic(0xafe9e0, 0x124f330)
        /usr/local/go/src/runtime/panic.go:522 +0x1b5
main.migrateCloudwatchToPrometheus(0xc0005b4000, 0x8a5, 0xa00, 0x12, 0xc0001b4340, 0x3)
        /opt/aws_cloudwatch.go:435 +0x1203
main.metricsHandler(0xd34ca0, 0xc000266000, 0xc0001bea00)
        /opt/main.go:59 +0xa9
net/http.HandlerFunc.ServeHTTP(0xbf0178, 0xd34ca0, 0xc000266000, 0xc0001bea00)
        /usr/local/go/src/net/http/server.go:1995 +0x44
net/http.(*ServeMux).ServeHTTP(0x125d8a0, 0xd34ca0, 0xc000266000, 0xc0001bea00)
        /usr/local/go/src/net/http/server.go:2375 +0x1d6
net/http.serverHandler.ServeHTTP(0xc0001a68f0, 0xd34ca0, 0xc000266000, 0xc0001bea00)
        /usr/local/go/src/net/http/server.go:2774 +0xa8
net/http.(*conn).serve(0xc0001c00a0, 0xd38ca0, 0xc00024e500)
        /usr/local/go/src/net/http/server.go:1878 +0x851
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:2884 +0x2f4

It turns out that c.AddCloudwatchTimestamp == nil is true. Apparently AddCloudwatchTimestamp must be set explicitly, because its default value is nil.

This is because it is defined as a pointer:

type cloudwatchData struct {
	ID                     *string
	Metric                 *string
	Service                *string
	Statistics             []string
	Points                 []*cloudwatch.Datapoint
	NilToZero              *bool
	AddCloudwatchTimestamp *bool
	CustomTags             []tag
	Tags                   []tag
	Dimensions             []*cloudwatch.Dimension
}

Would things be easier if it were of type bool instead of *bool?
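
For illustration, here is a minimal sketch of one way to keep the *bool fields but avoid the panic: read them through a helper that falls back to a default when the pointer is nil. The helper boolOrDefault and the trimmed-down metricOptions struct are hypothetical names made up for this example, not part of the exporter's code, and the default of false is an assumption.

```go
package main

import "fmt"

// boolOrDefault returns the value p points to, or def when p is nil.
// Hypothetical helper for illustration only.
func boolOrDefault(p *bool, def bool) bool {
	if p == nil {
		return def
	}
	return *p
}

// metricOptions is a hypothetical, trimmed-down stand-in for the
// optional fields of cloudwatchData shown above.
type metricOptions struct {
	NilToZero              *bool
	AddCloudwatchTimestamp *bool
}

func main() {
	enabled := true
	unset := metricOptions{}                             // field left out of the config
	set := metricOptions{AddCloudwatchTimestamp: &enabled} // field set explicitly

	// Dereferencing unset.AddCloudwatchTimestamp directly would panic;
	// the helper returns the assumed default instead.
	fmt.Println(boolOrDefault(unset.AddCloudwatchTimestamp, false))
	fmt.Println(boolOrDefault(set.AddCloudwatchTimestamp, false))
}
```

Keeping the pointer lets the code tell "not configured" apart from an explicit false, whereas a plain bool cannot make that distinction but can never be nil.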

P.S. There is a "hole" in the README: the "AddCloudwatchTimestamp" metric attribute is called "disableTimestamp" in config.yml.

@tsupertramp
