
nagios-cloudwatch-metrics's Introduction

Nagios Cloudwatch Metrics plugin

This plugin allows you to check AWS CloudWatch metrics and set alerts on their values.

The script is written in bash. It is tested on OSX and Ubuntu 16.04 and 18.04.

This plugin fetches the data from X minutes back until now.

Dependencies

To run this script you should have installed the following packages:

  • jq - json processor
  • awscli - AWS command line interface
  • bc - used for working with floating point numbers

We assume that the user who executes this script has configured their account so that they can connect to Amazon.

If not, please do this first. See here for more info: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
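
On a Debian or Ubuntu system, installing the dependencies and configuring the credentials could look roughly like this (package names may differ on other distributions):

    # Install the required tools
    sudo apt-get install -y jq awscli bc

    # Configure the AWS credentials and default region for the user that will run the checks
    aws configure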

Parameters

See the help message:

    -h or --help           Show this message

    -v or --verbose        Optional: Show verbose output

    --profile=x            Optional: Which AWS profile should be used to connect to AWS?

    --namespace=x          Required: Enter the AWS namespace for which you want to check your metrics. The "AWS/" prefix can be
                           left out; it is the default namespace prefix. See below. Example: "CloudFront", "EC2" or "Firehose".
                           More information: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/aws-namespaces.html

    --namespace-prefix=x   The namespace prefix which should be used. By default this is "AWS". If you do not want to use this prefix,
                           you should pass this parameter with an empty value.
                           Default: AWS

    --mins=x               Required: Supply the time window in minutes for which you want to check the AWS metrics. We will fetch the data
                           between NOW-%mins and NOW.

    --region=x             Required: Enter the AWS region which we need to use. For example: "eu-west-1"

    --metric=x             Required: The metric name which you want to check. For example "IncomingBytes"

    --timeout=x            Optional: Specify the max duration in seconds of this script.
                           When the timeout is reached, we will return an UNKNOWN alert status.

    --statistics=x         Required: The statistics which you want to fetch.
                           Possible values: Sum, Average, Maximum, Minimum, SampleCount
                           Default: Average

    --dimensions=x         Required: The dimensions which you want to fetch.
                           Examples:
                              Name=DBInstanceIdentifier,Value=i-1235534
                              Name=DeliveryStreamName,Value=MyStream
                           See also: http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#dimension-combinations

    --warning=x:x          Required: The warning threshold. You can supply min:max or just a max value. Use the format: [@]min:max
                           When no minimal value is given, a default min value of 0 is used.
                           By default we will raise a warning alert when the value is outside the given range. You can start the range
                           with an @ sign to invert this logic; we will then alert when the value is inside the range.
                           See below for some examples.

    --critical=x:x         Required: The critical threshold. You can supply min:max or just a max value. Use the format: [@]min:max
                           When no minimal value is given, a default min value of 0 is used.
                           By default we will raise a critical alert when the value is outside the given range. You can start the range
                           with an @ sign to invert this logic; we will then alert when the value is inside the range.
                           See below for some examples.

    --default="x"          When no data points are returned, it could be because there is no data. By default this script will return
                           the nagios state UNKNOWN. You could also supply a default value here (like 0). In that case we will work
                           with that value when no data points are returned.


    --http_proxy="x"       When you use a proxy to connect to the AWS CLI, you can use this option. See this link for more
                           information: http://docs.aws.amazon.com/cli/latest/userguide/cli-http-proxy.html

    --https_proxy="x"      When you use a proxy to connect to the AWS CLI, you can use this option. See this link for more
                           information: http://docs.aws.amazon.com/cli/latest/userguide/cli-http-proxy.html

    --last-known           When given, we will fetch the last known value, looking back up to 20 minutes, because CloudWatch metrics
                           are not always up to date. With this option we walk back in 1-minute steps when no data is known, for a
                           maximum of 20 minutes.
                     


Example threshold values:

--critical=10
We will raise an alert when the value is < 0 or > 10

--critical=5:10
We will raise an alert when the value is < 5 or > 10

--critical=@5:10
We will raise an alert when the value is >= 5 and <= 10

--critical=~:10
We will raise an alert when the value is > 10 (there is no lower limit)

--critical=10:~
We will raise an alert when the value is < 10 (there is no upper limit)

--critical=10:
(Same as above) We will raise an alert when the value is < 10 (there is no upper limit)

--critical=@1:~
Alert when the value is >= 1. Zero is OK.

--critical=@~:0
Alert when the value is <= 0. So 0.1 or higher is okay.


See for more info: https://www.monitoring-plugins.org/doc/guidelines.html#THRESHOLDFORMAT
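
For illustration, here is a minimal bash sketch of how a value can be tested against this [@]min:max format. It is a simplified approximation for documentation purposes, not the plugin's actual shouldAlert implementation, and it assumes plain decimal values (bc cannot handle scientific notation):

    #!/usr/bin/env bash
    # Simplified sketch of the [@]min:max threshold logic (not the plugin's own code).
    # Prints "alert" or "ok" for a given threshold and value.
    check_threshold() {
        local threshold=$1 value=$2 inside=0 min max

        # A leading @ inverts the logic: alert when the value is INSIDE the range.
        if [[ ${threshold:0:1} == "@" ]]; then
            inside=1
            threshold=${threshold:1}
        fi

        if [[ $threshold == *:* ]]; then
            min=${threshold%%:*}
            max=${threshold##*:}
        else
            min=0               # "10" means 0:10
            max=$threshold
        fi
        [[ -z $min ]] && min=0  # ":10" also means 0:10

        # "~" (or an empty side) means no limit on that side.
        local below=0 above=0
        [[ -n $min && $min != "~" ]] && below=$(echo "$value < $min" | bc)
        [[ -n $max && $max != "~" ]] && above=$(echo "$value > $max" | bc)

        local outside=0
        [[ $below -eq 1 || $above -eq 1 ]] && outside=1

        if [[ $inside -eq 1 ]]; then
            [[ $outside -eq 0 ]] && echo alert || echo ok
        else
            [[ $outside -eq 1 ]] && echo alert || echo ok
        fi
    }

    check_threshold "10"    15    # alert (outside 0:10)
    check_threshold "@5:10" 7     # alert (inside 5:10)
    check_threshold "~:10"  5     # ok    (no lower limit, value below 10)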

AWS Credentials

This plugin uses the AWS Command Line Interface to retrieve the metrics data from Amazon. To make this plugin work, you should make sure that the user who executes this plugin can use the Amazon CLI.

The AWS CLI will automatically search for your credentials (access key id and secret access key) in a few places. See also here: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html#config-settings-and-precedence

I would suggest that you add the credentials in a file like ~/.aws/credentials, where ~ is the home directory of the user who will execute the plugin. This will likely be the nagios user, so then the file will be ~nagios/.aws/credentials.
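
For reference, a minimal credential setup for the nagios user could look like this (placeholder values shown; the region entry in ~/.aws/config is optional because the plugin passes --region itself):

    # ~nagios/.aws/credentials
    [default]
    aws_access_key_id     = AKIAXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    # ~nagios/.aws/config  (optional)
    [default]
    region = eu-west-1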

If you run nagios on an EC2 machine, you can also apply an IAM role with the correct permissions to the machine.

Installation

To make use of this plugin, you should check out this script in your nagios plugins directory.

cd /usr/lib/nagios/plugins/
git clone https://github.com/level23/nagios-cloudwatch-metrics.git

Then you should define a command in your nagios configuration. Some example commands:

#
# Generic cloudwatch_check
# $ARG1$: Namespace (i.e., ELB, EC2, RDS, etc.)
# $ARG2$: Metric
# $ARG3$: Dimension (i.e., InstanceId)
# $ARG4$: Dimension Value (i.e., i-1a2b3c4d)
# $ARG5$: Warning Level
# $ARG6$: Critical Level
# $ARG7$: Time Interval
# $ARG8$: Default (0 if null)
define command {
       command_name	check_aws
       command_line     $USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --region=us-east-1 --namespace="$ARG1$" --metric="$ARG2$" --statistics="Average" --mins=$ARG7$ --dimensions="Name=$ARG3$,Value=$ARG4$" --warning=$ARG5$ --critical=$ARG6$ --default=$ARG8$
}

#
# Check check_aws_firehose
# $ARG1$: Metric, for example: IncomingBytes
# $ARG2$: DeliveryStreamName
# $ARG3$: Warning value
# $ARG4$: Critical value
define command {
	command_name	check_aws_firehose
	command_line	$USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --region=eu-west-1 --namespace="Firehose" --metric="$ARG1$" --statistics="Average" --mins=15 --dimensions="Name=DeliveryStreamName,Value=$ARG2$" --warning=$ARG3$ --critical=$ARG4$
}

#
# Check check_aws_lambda
# $ARG1$: Metric, for example: Duration
# $ARG2$: FunctionName
# $ARG3$: Warning value
# $ARG4$: Critical value
define command {
	command_name	check_aws_lambda
	command_line	$USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --region=eu-west-1 --namespace="Lambda" --metric="$ARG1$" --statistics="Average" --mins=15 --dimensions="Name=FunctionName,Value=$ARG2$" --warning=$ARG3$ --critical=$ARG4$
}


#
# Check check_aws_sqs
# $ARG1$: Metric, for example: NumberOfMessagesReceived
# $ARG2$: QueueName
# $ARG3$: Warning value
# $ARG4$: Critical value
define command {
	command_name	check_aws_sqs
	command_line	$USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --region=eu-west-1 --namespace="SQS" --metric="$ARG1$" --statistics="Sum" --mins=15 --dimensions="Name=QueueName,Value=$ARG2$" --warning=$ARG3$ --critical=$ARG4$
}


#
# Check check_aws_sns
# $ARG1$: Metric, for example: NumberOfNotificationsFailed
# $ARG2$: TopicName
# $ARG3$: Warning value
# $ARG4$: Critical value
define command {
	command_name	check_aws_sns
	command_line	$USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --region=eu-west-1 --namespace="SNS" --metric="$ARG1$" --statistics="Sum" --mins=15 --dimensions="Name=TopicName,Value=$ARG2$" --warning=$ARG3$ --critical=$ARG4$
}

#
# Check check_aws_elb
# $ARG1$: Metric, for example: UnHealthyHostCount or HTTPCode_ELB_5XX
# $ARG2$: LoadBalancerName
# $ARG3$: Warning value
# $ARG4$: Critical value
define command {
	command_name	check_aws_elb
	command_line	$USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --region=eu-west-1 --namespace="ELB" --metric="$ARG1$" --statistics="Maximum" --mins=1 --dimensions="Name=LoadBalancerName,Value=$ARG2$" --warning=$ARG3$ --critical=$ARG4$
}

#
# Check check_aws_elasticache
# $ARG1$: Metric, for example: CPUUtilization
# $ARG2$: CacheClusterId
# $ARG3$: Warning
# $ARG4$: Critical value
define command {
	command_name	check_aws_elasticache
	command_line	$USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --region=eu-west-1 --namespace="ElastiCache" --metric="$ARG1$" --statistics="Average" --mins=15 --dimensions="Name=CacheClusterId,Value=$ARG2$" --warning=$ARG3$ --critical=$ARG4$
}

#
# Check check_aws_cloudfront
# $ARG1$: Metric, for example: 4xxErrorRate
# $ARG2$: DistributionId
# $ARG3$: Warning
# $ARG4$: Critical value
define command {
	command_name	check_aws_cloudfront
	command_line	$USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --region=us-east-1 --namespace="CloudFront" --metric="$ARG1$" --statistics="Average" --mins=15 --dimensions="Name=DistributionId,Value=$ARG2$ Name=Region,Value=Global" --warning=$ARG3$ --critical=$ARG4$
}

#
# Check check_aws_rds over last 5 minutes
# $ARG1$: Metric, for example: CPUUtilization
# $ARG2$: ClusterId
# $ARG3$: READER/WRITER
# $ARG4$: Warning value
# $ARG5$: Critical value
define command {
	command_name	check_aws_rds
	command_line	$USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --region=eu-west-1 --namespace="RDS" --metric="$ARG1$" --statistics="Average" --mins=5 --dimensions="Name=DBClusterIdentifier,Value=$ARG2$ Name=Role,Value=$ARG3$" --warning=$ARG4$ --critical=$ARG5$
}

# Defined in HOST configuration are:
# $HOSTALIAS$ = InstanceId
# $HOSTNOTES$ = region
#
# Check check_aws_ec2
# $ARG1$: Metric, for example CPUUtilization
# $ARG2$: Data statistics. Possible values: Maximum, Minimum, Sum, Average
# $ARG3$: Minutes timewindow
# $ARG4$: Warning value
# $ARG5$: Critical value
define command {
	command_name	check_aws_ec2
	command_line	$USER1$/nagios-cloudwatch-metrics/check_cloudwatch.sh --timeout=30 --region="$HOSTNOTES$" --namespace="EC2" --metric="$ARG1$" --statistics="$ARG2$" --mins="$ARG3$" --dimensions="Name=InstanceId,Value=$HOSTALIAS$" --warning=$ARG4$ --critical=$ARG5$
}

In these examples we have hard-coded the region and the time window in minutes.

Then, you can configure your nagios services like this:

#
# We assume that there is at least an average of 100 bytes per minute for myStream. If it is lower, a warning is raised.
# If it is lower than 50 bytes, it's critical and we should receive an SMS!
#
define service {
        use                         generic-service
        hostgroup_name              cloudwatch
        service_description         Firehose: Incoming Bytes for myStream
        max_check_attempts          2
        normal_check_interval       5
        retry_check_interval        5
        contact_groups              group_sms
        notification_interval       30
        check_command               check_aws_firehose!IncomingBytes!myStream!100!50
}

#
# We assume that myFunction does not run longer than 60000 ms (60s). If so, trigger a warning.
# If it runs longer than 120000 ms (120s), trigger a critical notification.
#
define service {
        use                         generic-service
        hostgroup_name              cloudwatch
        service_description         Lambda: duration of myFunction
        max_check_attempts          2
        normal_check_interval       5
        retry_check_interval        5
        contact_groups              group_sms
        notification_interval       30
        check_command               check_aws_lambda!Duration!myFunction!0:60000!0:120000
}

# etc.

nagios-cloudwatch-metrics's People

Contributors

binbash486, edwardofclt, gmqnqwo5kfjjdt, pkrohn-sfdc, sherwind, strofimovsky, teyeheimans


nagios-cloudwatch-metrics's Issues

Scientific values displayed for ELB Latency metrics

We have created an alarm for the ELB Latency metric and we see that the value returned from AWS CloudWatch via this script is represented in scientific notation (9.610487596832052e-06), because the actual ELB Latency value is very close to 0 and the unit defaults to seconds.

Warning: >0.2
Critical: > 0.5

As a result the script throws a critical alarm all the time and does not work with very low Latency values. The script also does not allow specifying the metric unit in milliseconds.

Below is the command we are using in Nagios:
bash -x check_cloudwatch.sh --region=eu-west-1 --namespace="ELB" --metric="Latency" --statistics="Average" --mins=5 --dimensions="Name=LoadBalancerName,Value=Test-PREPROD-endpoints-SinatraLB" --warning=:0.2 --critical=:0.5 --https_proxy=http://85.205.94.78:8085

Below is the snippet of the debug output for the nagios-cloudwatch plugin script:

++ aws cloudwatch get-metric-statistics --region eu-west-1 --namespace AWS/ELB --metric-name Latency --output json --start-time 2018-04-19T08:13:29 --end-time 2018-04-19T08:18:00 --period 300 --statistics Average --dimensions Name=LoadBalancerName,Value=Test-PREPROD-endpoints-SinatraLB

+ RESULT='{
    "Label": "Latency",
    "Datapoints": [
        {
            "Timestamp": "2018-04-19T08:13:00Z",
            "Average": 9.610487596832052e-06,
            "Unit": "Seconds"
        }
    ]
}'
++ echo '{' '"Label":' '"Latency",' '"Datapoints":' '[' '{' '"Timestamp":' '"2018-04-19T08:13:00Z",' '"Average":' 9.610487596832052e-06, '"Unit":' '"Seconds"' '}' ']' '}'
++ jq '.Datapoints[0].Average'
+ METRIC_VALUE=9.610487596832052e-06
++ jq -r '.Datapoints[0].Unit'
++ echo '{' '"Label":' '"Latency",' '"Datapoints":' '[' '{' '"Timestamp":' '"2018-04-19T08:13:00Z",' '"Average":' 9.610487596832052e-06, '"Unit":' '"Seconds"' '}' ']' '}'
+ UNIT=Seconds
+ verbose 'Raw result: {
    "Label": "Latency",
    "Datapoints": [
        {
            "Timestamp": "2018-04-19T08:13:00Z",
            "Average": 9.610487596832052e-06,
            "Unit": "Seconds"
        }
    ]
}'
+ [[ 0 -eq 1 ]]
+ verbose 'Unit: Seconds'
+ [[ 0 -eq 1 ]]
++ echo '{' '"Label":' '"Latency",' '"Datapoints":' '[' '{' '"Timestamp":' '"2018-04-19T08:13:00Z",' '"Average":' 9.610487596832052e-06, '"Unit":' '"Seconds"' '}' ']' '}'
++ jq .Label
+ DESCRIPTION='"Latency"'
+ verbose 'Metric value: 9.610487596832052e-06'
+ [[ 0 -eq 1 ]]
+ [[ 9.610487596832052e-06 == \n\u\l\l ]]
+ MESSAGE='All ok. '
+ EXIT=0
+ shouldAlert '~:0.5' 9.610487596832052e-06
+ THRESHOLD='~:0.5'
+ METRIC_VALUE=9.610487596832052e-06
+ THRESHOLD_MIN=0
+ THRESHOLD_MAX=0
+ THRESHOLD_INSIDE=0
+ MESSAGE=Unknown
+ EXIT=0
+ verbose ''
+ [[ 0 -eq 1 ]]
+ verbose '--- ~:0.5, test with value: 9.610487596832052e-06 ---'
+ [[ 0 -eq 1 ]]
+ [[ -9.610487596832052e-06- == -\n\u\l\l- ]]
+ [[ -9.610487596832052e-06- == -- ]]
++ echo '~:0.5'
++ head -c 1
+ [[ ~ == @ ]]
+ [[ ! :0.5 = ^([0-9.]+:?|:)([0-9.]*)?$ ]]
+ [[ :0.5 == : ]]
++ echo '
:0.5'
++ awk -F: '{print $1}'
+ THRESHOLD_MIN=''
++ echo '
:0.5'
++ awk -F: '{print $2}'
+ THRESHOLD_MAX=0.5
+ [[ -z ~ ]]
+ [[ -z 0.5 ]]
+ [[ 0 == \1 ]]
+ verbose 'Running in OUTSIDE mode (alert if value is outside range {~ ... 0.5})'
+ [[ 0 -eq 1 ]]
+ [[ 0.5 == ~ ]]
+ [[ 0.5 == ~ ]]
+ [[ ~ == ~ ]]
++ echo '9.610487596832052e-06 <= 0.5'
++ bc
(standard_in) 1: syntax error
+ [[ 1 -eq '' ]]
+ MESSAGE='VALUE is too high. The value SHOULD BE <= 0.5'
+ EXIT=1
+ verbose 'Should alert: 1 - VALUE is too high. The value SHOULD BE <= 0.5'
+ [[ 0 -eq 1 ]]
+ return 1
+ crit=1
+ [[ 1 == \1 ]]
+ EXIT=2
+ [[ 0 -eq 1 ]]
+ PERFDATA='9.610487596832052e-06Seconds;:0.2;:0.5;0.000000'
+ BODY='Name=LoadBalancerName,Value=Test-PREPROD-endpoints-SinatraLB Latency (5 min Average): 9.610487596832052e-06 Seconds - VALUE is too high. The value SHOULD BE <= 0.5 | perf=9.610487596832052e-06Seconds;:0.2;:0.5;0.000000'
+ verbose 'Name=LoadBalancerName,Value=Test-PREPROD-endpoints-SinatraLB Latency (5 min Average): 9.610487596832052e-06 Seconds - VALUE is too high. The value SHOULD BE <= 0.5 | perf=9.610487596832052e-06Seconds;:0.2;:0.5;0.000000'
+ [[ 0 -eq 1 ]]
+ case ${EXIT} in
+ echo 'CRITICAL - Name=LoadBalancerName,Value=Test-PREPROD-endpoints-SinatraLB Latency (5 min Average): 9.610487596832052e-06 Seconds - VALUE is too high. The value SHOULD BE <= 0.5 | perf=9.610487596832052e-06Seconds;:0.2;:0.5;0.000000'
CRITICAL - Name=LoadBalancerName,Value=Test-PREPROD-endpoints-SinatraLB Latency (5 min Average): 9.610487596832052e-06 Seconds - VALUE is too high. The value SHOULD BE <= 0.5 | perf=9.610487596832052e-06Seconds;:0.2;:0.5;0.000000
+ exit 2
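
The trace shows the underlying problem: bc cannot parse the exponent notation, hence the "(standard_in) 1: syntax error" and the bogus comparison result. A possible workaround outside the plugin is to normalize the value to plain decimal notation before it reaches bc, for example:

    # Workaround sketch: normalize scientific notation before handing the value to bc
    METRIC_VALUE="9.610487596832052e-06"
    METRIC_VALUE=$(printf '%.12f' "$METRIC_VALUE")   # -> 0.000009610488
    echo "$METRIC_VALUE <= 0.5" | bc                 # now prints 1 instead of a syntax error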

Usage() called when parameters are present

I found that, when running from Nagios with parameters, I was still getting the Usage() output.

In my version of the script I removed line 203:

	usage ;

which fixed the issue.

Also, there's a typo on line 18; it should read "This script is meant for Nagios".

empty result produces confusing message

The API is returning an empty result:

{
    "Datapoints": [], 
    "Label": "CPUCreditBalance"
}

Yet check_cloudwatch.sh reports 'VALUE is ok. It is inside the range'

./check_cloudwatch.sh --region=xyz --metric="CPUCreditBalance" \ 
--namespace="EC2"  --dimensions="Name=DEVWEB,Value=i-xxxxxxxxxx" \
--warning=100  --critical=50 --mins=15

UNKNOWN - Name=DEVWEB,Value=i-xxxxxxxxxx CPUCreditBalance (15 min Average): null null - VALUE is ok. It is inside the range {0 ... 100} | perf=nullnull;100;50;0.000000

The UNKNOWN result is correct, but it'd be good to change the message to something else.
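
One way to make the message explicit is to check for an empty Datapoints array before any threshold comparison; a sketch, assuming the raw JSON response is held in a variable such as RESULT (as seen in the debug trace above):

    # Bail out with a clear UNKNOWN message when CloudWatch returns an empty Datapoints array
    DATAPOINT_COUNT=$(echo "$RESULT" | jq '.Datapoints | length')
    if [[ "$DATAPOINT_COUNT" -eq 0 ]]; then
        echo "UNKNOWN - no data points returned for this metric in the requested time window"
        exit 3   # 3 = UNKNOWN in the Nagios plugin convention
    fi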

Support for disk space metrics?

Hi guys,

Do you provide support for System/Linux namespace metrics such as DiskSpaceUtilization, e.g.:

./check_cloudwatch.sh --region=ap-southeast-2 \
    --namespace-prefix="System" \
    --namespace="Linux" \
    --metric="DiskSpaceUtilization" \
    --statistics="Average" \
    --mins=15 \
    --dimensions="Name=InstanceId,Value=i-082e14ae3ebad7a07 Name=MountPoint,Value=/ Name=FileSystem,Value=/dev/nvme0n1p1" \
    --warning=90 \
    --critical=95 \
    --verbose

This currently doesn't work; it looks like your script doesn't support this type of metric. Is there any chance of support in a new version? Cheers

THRESHOLD_MIN and THRESHOLD_MAX values not set properly, causing a CRITICAL error

Hi,

This script is giving an error for the ELB metric (HealthyHostCount) for which we have configured Nagios alarms. Even though the metric value is well within the threshold values, we are getting a CRITICAL alarm in Nagios.

We tried to debug the script and found that the THRESHOLD_MIN and THRESHOLD_MAX values are not set correctly even though the correct metric values are fetched from CloudWatch.

Metric value: 2
Warning: <2
Critical: 0

Command:
bash -x check_cloudwatch.sh --region=eu-west-1 --namespace="ELB" --metric="HealthyHostCount" --statistics="Average" --mins=5 --dimensions="Name=LoadBalancerName,Value=Test-PREPROD-endpoints-Clb" --warning=2:~ --critical=0 --https_proxy=http://xx.xx.xx.xx:8085

Attached is the verbose output of the script.

how to select the unit (GB, MB, Bytes) for each metric output?

Hello,
Thanks for this script, it works like a charm.
I have a question about how to change the unit size coming from the CloudWatch metrics.

For example:
FreeableMemory (5 min Average): 15606355558.400000000 Bytes - VALUE is wrong. It SHOULD BE inside the range {50 ...50}

I would like to get gigabytes rather than bytes... is there a way to change the unit for the output?
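
The plugin itself has no unit-conversion option; a small wrapper could rescale the reported value for display, using bc, which is already a dependency. A sketch:

    # Convert a byte value reported by CloudWatch to gigabytes for readability
    BYTES="15606355558.400000000"
    GB=$(echo "scale=2; $BYTES / 1024 / 1024 / 1024" | bc)
    echo "FreeableMemory: ${GB} GB"   # prints roughly 14.53 GB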

handle critical less than warning

When checking, for example EC2 instance CPUCreditBalance, the critical threshold will be a lower number than the warning threshold e.g.

./check_cloudwatch.sh --region=xyz --metric="CPUCreditBalance" \
--namespace="EC2"  --dimensions="Name=InstanceId,Value=i-xxxxxxxxx" \
--warning=100  --critical=50 --mins=5

OK - Name=InstanceId,Value=i-xxxxxxxxxx CPUCreditBalance (5 min Average): 44.92 Count - VALUE is ok. It is inside the range {0 ... 100} | perf=44.92Count;100;50;0.000000

This should instead produce a CRITICAL result.
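
Until the plugin handles inverted thresholds itself, the documented [@]min:max format can already express this check by giving each threshold only a lower bound (alert when the credit balance drops below it, with no upper limit), assuming the parsing behaves as documented above:

    ./check_cloudwatch.sh --region=xyz --metric="CPUCreditBalance" \
        --namespace="EC2" --dimensions="Name=InstanceId,Value=i-xxxxxxxxx" \
        --warning=100:~ --critical=50:~ --mins=5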

Occasional UNKNOWN state because the variable name SECONDS is used

As SECONDS is a special variable in bash which reflects the time since script start (or since assignment to SECONDS), it can happen (and happens in our case) that the value changes between the assignment of the desired value and the building of the aws command. AWS CloudWatch then returns an error because the period is not an exact multiple of 30.
So I suggest renaming the variable to PERIOD. (Alternatively it should be possible to unset SECONDS before using it as a normal variable, since unsetting removes the special meaning, but I prefer the approach of using an unreserved name.)
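
For reference, a short demonstration of why SECONDS cannot safely be used as an ordinary variable in bash:

    SECONDS=300       # intended period in seconds
    sleep 2           # any work done in between advances SECONDS automatically
    echo "$SECONDS"   # prints 302, not 300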

Dimension names with white spaces.

Hello,

the script doesn't seem to work with the AWS/Kafka namespace because the dimension names contain white spaces.

$ aws cloudwatch list-metrics --namespace AWS/Kafka  
...
"Dimensions": [
                {
                    "Name": "Cluster Name",
                    "Value": "cluster-x"
                },
                {
                    "Name": "Broker ID",
                    "Value": "1"
                }
            ]
...

Example:

$ ./cloudwatch-metrics.sh --region="eu-west-1" --namespace="Kafka" --metric="KafkaDataLogsDiskUsed" --statistics="Average" --dimensions='Name="Cluster Name",Value=cluster-x Name="Broker ID",Value=1' --warning=50 --critical=100 --verbose --mins=30 

Namespace: AWS/Kafka
Metric name: KafkaDataLogsDiskUsed
Period (Seconds): 0
Dimensions: Name="Cluster Name",Value=cluster-x Name="Broker ID",Value=1
---- ATTEMPT 1 ----
Start time: 2020-02-20T14:26:00
Stop time: 2020-02-20T14:56:00
Minutes window: 30
COMMAND: aws cloudwatch get-metric-statistics --region eu-west-1 --namespace AWS/Kafka --metric-name KafkaDataLogsDiskUsed --output json --start-time 2020-02-20T14:26:00 --end-time 2020-02-20T14:56:00 --period 1800 --statistics Average --dimensions Name="Cluster Name",Value=cluster-x Name="Broker ID",Value=1
----------------

Error parsing parameter '--dimensions': Expected: '<double quoted>', received: '<none>' for input:
Name="Cluster
     ^
Raw result:
Unit:
Metric value: 0,000000000

--- 100, test with value: 0,000000000 ---
Running in OUTSIDE mode (alert if value is outside range {0 ... 100})
(standard_in) 1: syntax error
(standard_in) 1: syntax error
Should alert: 0 - VALUE is ok. It is inside the range {0 ... 100}

--- 50, test with value: 0,000000000 ---
Running in OUTSIDE mode (alert if value is outside range {0 ... 50})
(standard_in) 1: syntax error
(standard_in) 1: syntax error
Should alert: 0 - VALUE is ok. It is inside the range {0 ... 50}
Name="Cluster Name",Value=cluster-x Name="Broker ID",Value=1 KafkaDataLogsDiskUsed (30 min Average): 0,000000000 - VALUE is ok. It is inside the range {0 ... 50} | perf=0,000000000;50;100;0.000000
OK - Name="Cluster Name",Value=cluster-x Name="Broker ID",Value=1 KafkaDataLogsDiskUsed (30 min Average): 0,000000000  - VALUE is ok. It is inside the range {0 ... 50} | perf=0,000000000;50;100;0.000000%                           

but running the command manually works:

$ aws cloudwatch get-metric-statistics --region eu-west-1 --namespace AWS/Kafka --metric-name KafkaDataLogsDiskUsed --output json --start-time 2020-02-20T14:26:00 --end-time 2020-02-20T14:56:00 --period 1800 --statistics Average --dimensions Name="Cluster Name",Value=cluster-x Name="Broker ID",Value=1

{
    "Label": "KafkaDataLogsDiskUsed",
    "Datapoints": [
        {
            "Timestamp": "2020-02-20T13:54:00Z",
            "Average": 0.42896826666666676,
            "Unit": "Percent"
        }
    ]
}
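
The manual invocation works because the shell delivers each dimension as a single, intact argument. A sketch of how the call could be built with a bash array so that dimension names containing spaces survive word splitting (this is not the plugin's current implementation):

    # Build the aws invocation as an array; each element stays one argument, spaces included
    CMD=(aws cloudwatch get-metric-statistics
         --region eu-west-1
         --namespace AWS/Kafka
         --metric-name KafkaDataLogsDiskUsed
         --output json
         --start-time 2020-02-20T14:26:00
         --end-time 2020-02-20T14:56:00
         --period 1800
         --statistics Average
         --dimensions "Name=Cluster Name,Value=cluster-x" "Name=Broker ID,Value=1")

    RESULT=$("${CMD[@]}")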

custom namespaces and metrics

I have deployed a custom namespace that has some custom metrics.
(It is the result of what is described here: https://github.com/aws-samples/amazon-cloudwatch-monitor-for-sap-netweaver )

Now if I run the command
"aws cloudwatch list-metrics --profile="
I get results like the following - a new custom namespace and new custom metrics:

[...]
{
            "Namespace": "sap-monitor",
            "MetricName": "ST03_RFC_CPU_TIME_PERC_SNAP",
            "Dimensions": [
                {
                    "Name": "by SID",
                    "Value": "BWD"
                }
            ]
        },
        {
            "Namespace": "sap-monitor",
            "MetricName": "SM37_CANCELLED_JOBS",
            "Dimensions": [
                {
                    "Name": "by SID",
                    "Value": "BWD"
                }
            ]
        },
        {
            "Namespace": "sap-monitor",
            "MetricName": "ST03_RFC_AVG_DB_SEQ_AVG_SNAP",
            "Dimensions": [
                {
                    "Name": "by SID",
                    "Value": "BWD"
[...]

How can I use this info to build a command line that fetches data for these metrics?
I tried something like this:

./check_cloudwatch.sh --region=eu-south-1 --namespace="sap-monitor" --metric="SM37_CANCELLED_JOBS" --statistics="Maximum" --mins=180 --dimensions="Name=by SID,Value=BWD" --warning=1 --critical=10 --profile=TEST

But I receive this error:

Error parsing parameter '--dimensions': Expected: '=', received: ',' for input:
SID,Value=BWD
   ^
OK - Name=by SID,Value=BWD SM37_CANCELLED_JOBS (180 min Maximum): 0.000000000  - VALUE is ok. It is inside the range {0 ... 1} | perf=0.000000000;1;10;0.000000

ELB Issue

Hello, I am getting the below error while executing the command:

root@tools-server:/usr/local/nagios/libexec#./check_cloudwatch.sh --region=us-east-2 --namespace="ELB" --metric="RequestCount" --statistics="Sum" --mins=5 --dimensions="Name=LoadBalancerName,Value=deploy" --warning=5 --critical=10

Error:
UNKNOWN - Name=LoadBalancerName,Value=deploy RequestCount (5 min Sum): null null - No metric value known. | perf=nullnull;5;10;0.000000

How do I resolve this error? The AWS credentials are already present, yet I still get this error.
My load balancer name is deploy.
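
The "No metric value known" message means the plugin received no data points for that load balancer in the 5-minute window. If an empty result should simply count as zero requests, the --default option documented above can be added:

    ./check_cloudwatch.sh --region=us-east-2 --namespace="ELB" --metric="RequestCount" \
        --statistics="Sum" --mins=5 --dimensions="Name=LoadBalancerName,Value=deploy" \
        --warning=5 --critical=10 --default=0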

Bin script grammar problem that always arises

All configuration and settings are fine, but the following error still appears:
(No output on stdout) stderr: /bin/sh: -c: line 0: unexpected EOF while looking for matching `"'

2 spelling errors in script

Script line 18: spelling, change to 'meant' for Nagios
Script line 150: comment spelling, change 'compair' to 'compare'
