
cloudwatch-fluent-metrics's Introduction

FluentMetrics

IMPORTANT: When using unique stream IDs, you have the potential to create a large number of metrics. Please make sure to review the current AWS CloudWatch Custom Metrics pricing before proceeding.

Overview

FluentMetrics is an easy-to-use Python module that makes logging CloudWatch custom metrics a breeze. The goal is to provide a framework for logging detailed metrics with a minimal footprint. When you look at your code logic, you want to see your actual code logic, not line after line of metrics logging. FluentMetrics lets you maximize your metrics footprint while minimizing your metrics code footprint.

Installation

You can install directly from PyPI:

pip install cloudwatch-fluent-metrics

'Fluent' . . . what is that?

Fluent describes an easy-to-read programming style. The goal of fluent development is to make code easier to read and to reduce the amount of code required to build objects. It's easiest to look at a comparison between fluent and non-fluent styles.

Non-Fluent Example

g = Game()
f = Frame(Name='Tom')
f.add_score(7)
f.add_score(3)
g.add_frame(f)
f = Frame(Name='Tom')
f.add_strike()
g.add_frame(f)

Non-Fluent Example with Constructor

g = Game()
g.add_frame(Frame(Name='Tom', Score1=7, Score2=3))
g.add_frame(Frame(Name='Tom', Score1=10))

Fluent Example

g = Game()
g.add_frame(Frame().with_name('Tom').score(7).spare())
g.add_frame(Frame().with_name('Tom').strike())

While the difference may seem like nitpicking, a frame is really just a constructed object. In the first example, we take three lines of code to create the object--there's nothing wrong with that. In the second example, we use constructors; this is slightly more readable, but a great deal of logic is bulked up in the constructor. In the third example, we use fluent-style code: it starts by creating the frame and fluently chains calls until the entire frame is built in a single line. More importantly, it's readable. We're not creating an object with a massive constructor or spending several lines of code just to create a single object.

Terminology Quickstart

Namespaces

Every metric needs to live in a namespace. Since you are logging your own custom metrics, you need to provide a custom namespace for your metric. See the AWS documentation for a list of the standard AWS namespaces. Example: In this example, we're creating a simple FluentMetric in a namespace called Performance. This means that every time we log a metric with m, it will automatically be logged to the Performance namespace.

from fluentmetrics import FluentMetric
m = FluentMetric().with_namespace('Performance')

Metric Names

The metric name is the thing you are actually logging. Each value that you log must be tied to a metric name. When you log a custom metric with a new metric name, the name is automatically created if it doesn't already exist. See the AWS documentation for existing metrics that can help you define names for your custom metrics. Example: In this example, we're logging two metrics called StartupTime and StuffTime to the Performance namespace (we only needed to define the namespace once).

m = FluentMetric().with_namespace('Performance')
m.log(MetricName='StartupTime', Value=27, Unit='Seconds')
do_stuff()
m.log(MetricName='StuffTime', Value=12000, Unit='Milliseconds')

Values

Obviously we need to log a value with each metric. This needs to be a number, since we convert the value to a float before sending it to CloudWatch. IMPORTANT: When logging multiple values for the same custom metric within a minute, CloudWatch aggregates them into an average over that minute. See the AWS documentation for more details.
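For example (a minimal sketch; QueueDepth is a hypothetical metric name), two values logged for the same metric within the same minute are stored by CloudWatch as a single aggregated data point at standard resolution:

from fluentmetrics import FluentMetric
m = FluentMetric().with_namespace('Performance')
m.log(MetricName='QueueDepth', Value=12, Unit='Count')
m.log(MetricName='QueueDepth', Value=18, Unit='Count')
# At standard resolution, CloudWatch reports these as one data point
# for the minute (average 15, minimum 12, maximum 18, sum 30).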

Storage Resolution

The PutMetricData function accepts an optional StorageResolution parameter. Set this parameter to 1 to publish high-resolution metrics; omit it (or set it to 60) to publish at the standard one-minute resolution. Example: In this example, we're logging a metric at one-second resolution:

m = FluentMetric().with_namespace('Application/MyApp') \
                  .with_storage_resolution(1)
m.log(MetricName='Transactions/Sec', Value=trans_count, Unit='Count/Sec')
Dimensions

A dimension defines how you want to slice and dice the metric. Dimensions are simply name-value pairs, and you can define up to 10 per metric. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html#usingDimensions for more details on using dimensions.

IMPORTANT: When you define multiple dimensions, CloudWatch attaches all of those dimensions to the metric as a single combined dimension set--think of them as an aggregate primary key. For example, if you log a metric with the dimensions os='linux' and flavor='ubuntu', you will only be able to aggregate by both os and flavor. You cannot aggregate by just os or just flavor. FluentMetrics solves this problem by automatically logging three metrics--one for os, one for flavor, and one for the combined dimensions--giving you maximum flexibility.

Example: In this example, we're logging boot/restart time metrics. When this code executes, we will end up with six metrics:

* BootTime and RestartTime for os
* BootTime and RestartTime for instance-id
* BootTime and RestartTime for os and instance-id

m = FluentMetric().with_namespace('Performance/EC2') \
                  .with_dimension('os', 'linux') \
                  .with_dimension('instance-id', 'i-123456')
boot_time = start_instance()
m.log(MetricName='BootTime', Value=boot_time, Unit='Milliseconds')
restart_time = restart_instance()
m.log(MetricName='RestartTime', Value=restart_time, Unit='Milliseconds')

Units

CloudWatch has built-in logic to provide meaning to the metric values. We're not just logging a value--we're logging a value of some unit. By defining the unit type, CloudWatch knows how to properly present, aggregate and compare that value with other values. For example, if you submit a value with unit Milliseconds, CloudWatch can properly aggregate it up to seconds, minutes or hours. Below is the current list of valid units; a more up-to-date list should be available in the PutMetricData documentation under the Unit parameter.

"Seconds"|"Microseconds"|"Milliseconds"|"Bytes"|"Kilobytes"|"Megabytes"|
"Gigabytes"|"Terabytes"|"Bits"|"Kilobits"|"Megabits"|"Gigabits"|"Terabits"|
"Percent"|"Count"|"Bytes/Second"|"Kilobytes/Second"|"Megabytes/Second"|
"Gigabytes/Second"|"Terabytes/Second"|"Bits/Second"|"Kilobits/Second"|
"Megabits/Second"|"Gigabits/Second"|"Terabits/Second"|"Count/Second"|"None"
Unit Shortcut Methods

If you don't want to type out the individual unit name, there are shortcut methods for each unit.

m = FluentMetric().with_namespace('Performance/EC2') \
                  .with_dimension('os', 'linux') \
                  .with_dimension('instance-id', 'i-123456')
m.seconds(MetricName='CompletionInSeconds', Value='1000')
m.microseconds(MetricName='CompletionInMicroseconds', Value='1000')
m.milliseconds(MetricName='CompletionInMilliseconds', Value='1000')
m.bytes(MetricName='SizeInBytes', Value='1000')
m.kb(MetricName='SizeInKb', Value='1000')
m.mb(MetricName='SizeInMb', Value='1000')
m.gb(MetricName='SizeInGb', Value='1000')
m.tb(MetricName='SizeInTb', Value='1000')
m.bits(MetricName='SizeInBits', Value='1000')
m.kbits(MetricName='SizeInKilobits', Value='1000')
m.mbits(MetricName='SizeInMegabits', Value='1000')
m.gbits(MetricName='SizeInGigabits', Value='1000')
m.tbits(MetricName='SizeInTerabits', Value='1000')
m.pct(MetricName='Percent', Value='20')
m.count(MetricName='ItemCount', Value='20')
m.bsec(MetricName='BandwidthBytesPerSecond', Value='1000')
m.kbsec(MetricName='BandwidthKilobytesPerSecond', Value='1000')
m.mbsec(MetricName='BandwidthMegabytesPerSecond', Value='1000')
m.gbsec(MetricName='BandwidthGigabytesPerSecond', Value='1000')
m.tbsec(MetricName='BandwidthTerabytesPerSecond', Value='1000')
m.bitsec(MetricName='BandwidthBitsPerSecond', Value='1000')
m.kbitsec(MetricName='BandwidthKilobitsPerSecond', Value='1000')
m.mbitsec(MetricName='BandwidthMegabitsPerSecond', Value='1000')
m.gbitsec(MetricName='BandwidthGigabitsPerSecond', Value='1000')
m.tbitsec(MetricName='BandwidthTerabitsPerSecond', Value='1000')
m.countsec(MetricName='ItemCountsPerSecond', Value='1000')

Timers

One of the most common uses of logging is measuring performance. FluentMetrics allows you to activate multiple built-in timers by name and log the elapsed time in a single line of code. NOTE: The elapsed time value is automatically stored as unit Milliseconds. Example: In this example, we're starting timers workflow and job1 at the same time. Timers start as soon as you create them and never stop running. When you call elapsed, FluentMetrics will log the number of elapsed milliseconds with the MetricName.

m = FluentMetric()
m.with_timer('workflow').with_timer('job1')
do_job1()
m.elapsed(MetricName='Job1CompletionTime', TimerName='job1')
m.with_timer('job2')
do_job2()
m.elapsed(MetricName='Job2CompletionTime', TimerName='job2')
finish_workflow()
m.elapsed(MetricName='WorkflowCompletionTime', TimerName='workflow')

Metric Stream ID

A key feature of FluentMetrics is the metric stream ID. This ID is added as a dimension and logged with every metric. The benefit of this dimension is that it provides a distinct stream of metrics for an end-to-end operation. When you create a new instance of FluentMetric, you can either pass in your own value or let FluentMetrics generate a GUID. In CloudWatch, you can then see all of the metrics for a particular stream ID in chronological order. A metric stream can be a job, a server, or any other way you want to uniquely group a contiguous stream of metrics. Example: In this example, we log two metrics to the Performance namespace, each with a metric stream ID of abc-123. We can then go to CloudWatch and filter by that stream ID to see the performance of the entire operation at a glance.

m = FluentMetric().with_namespace('Performance').with_stream_id('abc-123')
m.log(MetricName='StartupTime', Value=100, Unit='Seconds')
do_work()
m.log(MetricName='WorkCompleted', Value=1000, Unit='Milliseconds')

Use Case Quickstart

#1: Least Amount of Code Required to Log a Metric

This is the minimal amount of work you need to log a metric--create a FluentMetric with a namespace, then log a value. Result: This code will log a single value 100 for ActiveServerCount in the Stats namespace.

from fluentmetrics import FluentMetric
m = FluentMetric().with_namespace('Stats')
m.log(MetricName='ActiveServerCount', Value='100', Unit='Count')

#2: Logging Multiple Metrics to the Same Namespace

If you are logging multiple metrics to the same namespace, this is a great use case for FluentMetrics. You only need to create one instance of FluentMetric and specify a different metric name each time you call log. Result: This code will log four counts--ActiveServerCount, StoppedServerCount, ActiveLinuxCount and ActiveWindowsCount--in the Stats namespace.

from fluentmetrics import FluentMetric   
m = FluentMetric().with_namespace('Stats')
m.log(MetricName='ActiveServerCount', Value='10', Unit='Count') \
 .log(MetricName='StoppedServerCount', Value='20', Unit='Count') \
 .log(MetricName='ActiveLinuxCount', Value='50', Unit='Count') \
 .log(MetricName='ActiveWindowsCount', Value='50', Unit='Count')

#3: Logging Counts

In the previous example, we logged a metric and identified the unit Count. Instead of specifying the unit, you can use the count shortcut method. Result: This code will log a single value 10 for ActiveServerCount in the Stats namespace.

from fluentmetrics import FluentMetric
m = FluentMetric().with_namespace('Stats')
m.count(MetricName='ActiveServerCount', Value='10')

BufferedFluentMetric

Normally, with FluentMetric, metrics are sent immediately when log is called (or count, milliseconds, etc.). This can result in a lot of put_metric_data calls to CloudWatch that are not full. When you use BufferedFluentMetric instead of FluentMetric, it waits until it has the maximum number of metrics per call (20) before calling put_metric_data. This optimizes traffic to CloudWatch.

In general, BufferedFluentMetric behaves identically to FluentMetric, except that now it is possible to "forget" to send some metrics. The BufferedFluentMetric.flush() method pushes out all metrics immediately (clears the buffer). It is often best to do this at the end of a request (or some other obviously bounded interval).
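For instance, here is a minimal sketch of the buffer-and-flush pattern (records and process() are hypothetical):

from fluentmetrics import BufferedFluentMetric

m = BufferedFluentMetric().with_namespace('BatchJobs')
for record in records:
    process(record)
    # Buffered: nothing is sent until 20 metrics accumulate or flush() is called
    m.count(MetricName='RecordsProcessed', Value=1)
m.flush()  # push any remaining buffered metrics to CloudWatch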

Here is an example of how it works in Flask:

from flask import g
from fluentmetrics import BufferedFluentMetric

@app.before_request
def start_request():
	g.metrics = BufferedFluentMetric()
	g.metrics.with_namespace('MyApp')
	g.metrics.with_timer('RequestLatency')

@app.after_request
def end_request(response):
	def error_counter(hundred):
		# Returns 1 if the status code falls in the given hundred-block (e.g. 400-499)
		if response.status_code // 100 == hundred // 100:
			return 1
		else:
			return 0

	g.metrics.count(MetricName='4xxError', Value=error_counter(400))
	g.metrics.count(MetricName='5xxError', Value=error_counter(500))
	g.metrics.count(MetricName='Availability', Value=(1 - error_counter(500)))
	g.metrics.elapsed(MetricName='RequestLatency', TimerName='RequestLatency')

	# Finally, ensure that all metrics end up in CloudWatch before this request ends.
	g.metrics.flush()
	# after_request handlers must return the response
	return response

License

This library is licensed under the Apache 2.0 License.

cloudwatch-fluent-metrics's People

Contributors

charleswhchan, hyandell, jamesiri, johnius, soloman1124, troylar, zach-data


cloudwatch-fluent-metrics's Issues

Is this project actively maintained and usable?

I have a use case to publish custom metrics to CloudWatch and found this library. It looks like there is not much activity, and some reported issues and PRs remain open. Is this a reliable library to use?

AttributeError: 'str' object has no attribute 'put_metric_data'

When I try to put data points to a CW metric using m.log(), I get the following error:
m.log(MetricName='CookieCountTest_Pass', Value=1, Unit='Count')
File "/usr/local/lib/python3.7/site-packages/fluentmetrics/metric.py", line 361, in log
self._record_metric(md)
File "/usr/local/lib/python3.7/site-packages/fluentmetrics/metric.py", line 366, in _record_metric
self.client.put_metric_data(
AttributeError: 'str' object has no attribute 'put_metric_data'

I am not sure why it complains about a str object, as client should be a boto client object.

Also, is there an easy way to specify the region where a metric will be created? For example, the same way boto does it: region_name="us-east-1".
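One workaround I can think of (untested; it relies on the client attribute visible in the traceback above and is not a documented API) is to swap in a region-specific boto3 client:

import boto3
from fluentmetrics import FluentMetric

m = FluentMetric().with_namespace('Stats')
# Replace the metric's CloudWatch client with one pinned to a region
m.client = boto3.client('cloudwatch', region_name='us-east-1')
m.log(MetricName='CookieCountTest_Pass', Value=1, Unit='Count')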

MissingRequiredParameterException: An error occurred (MissingParameter) when calling the PutMetricData operation: The parameter MetricData is required.

Hi there,

I'm getting:

An error occurred (MissingParameter) when calling the PutMetricData operation: The parameter MetricData is required.: MissingRequiredParameterException
Traceback (most recent call last):
File "/var/task/spire/__init__.py", line 13, in decorated_function
res = func(*args, **kwargs)
File "/var/task/program_engine/user_event_engine.py", line 51, in handle
program_engine.check_events(list(user_events))
File "/var/task/program_engine/engine.py", line 194, in check_events
self.metric.flush()
File "/var/task/libs/fluentmetrics/buffer.py", line 73, in flush
FluentMetric._record_metric(self, page)
File "/var/task/libs/fluentmetrics/metric.py", line 368, in _record_metric
MetricData=metric_data,
File "/var/runtime/botocore/client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 612, in _make_api_call
raise error_class(parsed_response, operation_name)
MissingRequiredParameterException: An error occurred (MissingParameter) when calling the PutMetricData operation: The parameter MetricData is required.

When using BufferedFluentMetric, my guess at the moment is that all the data in the buffer has already been sent, i.e. page is empty here:

"libs/fluentmetrics/buffer.py" line 26 of 87 --29%-- col 1
   53     def flush(self, send_partial = True):
   54         '''Sends as much data as possible to CloudWatch. If send_partial is set to False,
   55         this only sends full pages. This way, it minimizes the API usage at the cost of
   56         delaying data.
   57         '''
   58         for namespace, buffer in self.buffers.items():
   59             full_pages = len(buffer) / PAGE_SIZE
   60             for i in range(full_pages):
   61                 start = i * PAGE_SIZE
   62                 end = (i + 1) * PAGE_SIZE
   63                 page = buffer[start:end]
   64
   65                 # ship it
   66                 FluentMetric._record_metric(self, page)
   67
   68             start = full_pages * PAGE_SIZE
   69             end = len(buffer) % PAGE_SIZE
   70             if send_partial:
   71                 # ship remaining items
   72                 page = buffer[start:end]
   73                 FluentMetric._record_metric(self, page)

and

"libs/fluentmetrics/metric.py" line 368 of 374 --98%-- col 14
364     def _record_metric(self, metric_data):
365         logger.debug('log: {}'.format(metric_data))
366         self.client.put_metric_data(
367                 Namespace=self.namespace,
368                 MetricData=metric_data,
369         )

MetricData=metric_data is then empty, which is probably why the API complains back.
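One possible fix (an untested sketch against the flush() shown above) is to only ship non-empty partial pages:

# Sketch: guard the partial-page send so MetricData is never empty.
remainder = len(buffer) % PAGE_SIZE
if send_partial and remainder:
    page = buffer[start:start + remainder]  # note: the start:end slice above also looks off
    FluentMetric._record_metric(self, page)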

Got any better ideas about this?

Thanks!

Luis

Hangs when run from lambda

I run this code from my desktop and it works great, but the same code run from Lambda, even with an Admin role, hangs.

Below are the lines where I use FluentMetric

    m = FluentMetric().with_namespace(CWNAMESPACE).with_stream_id(streamid)
    m.log(MetricName='HealthyServerCount-Port'+str(port), Value=numpassed, Unit='Count')

python3 support

Please support Python 3. It should be straightforward.

full_pages = len(buffer) / PAGE_SIZE
for i in range(full_pages):

->

fluentmetrics/buffer.py in flush at line 60:
'float' object cannot be interpreted as an integer
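A likely fix (an untested sketch) is floor division, which yields an int on both Python 2 and 3:

full_pages = len(buffer) // PAGE_SIZE  # floor division returns an int
for i in range(full_pages):
    page = buffer[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]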

Provide a Way to customize log stream name with Lambda Cloudwatch Logger

I apologize that this issue is not strictly on topic for this project. Perhaps it could be forwarded to the right CloudWatch devs?

I posted this on the AWS Developer Forums but didn't receive any replies.

I have copied it below:

From my understanding anything written to stdout/stderr during the invocation of a Lambda will be written to a log stream corresponding to that invocation and a log group corresponding to what lambda is running.

Currently the format of the default log stream name looks like:

"2020/08/21/($LATEST)579c5fcbef68455492a8a1a034475fdd"

Which is probably some type of template like:

"$DATE/($VERSION)/$HASH"

(For some reason square brackets were being rendered as a URL so I replaced them with parentheses)

It would be super helpful if there was a way, possibly only when using a custom runtime if that makes it easier to implement, that one could specify the log stream name template that the lambda auto-generates during an invocation. I don't think that the async lambda log aggregator/uploader that reads stdout/stderr is open source, so I am not sure how it actually works but maybe it could be configured early on in the custom runtime initialization using something like a system wide environment variable?

When browsing logs it would be so awesome to at-a-glance be able to see certain identifiers in the log stream name to narrow down what I'm looking for.

Another possible way to implement this would be to allow for adding "tags" to a log stream like we can currently do for log groups.

If anyone knows a way to implement something like this currently, let me know. I don't think there is though and so this is more of a feature request.

Thanks very much.

Does cloudwatch-fluent-metrics supports High-Resolution Custom Metrics?

Announcement:
https://aws.amazon.com/about-aws/whats-new/2017/07/amazon-cloudwatch-introduces-high-resolution-custom-metrics-and-alarms/

I want to use cloudwatch-fluent-metrics to log Kinesis consumer stats at 1s resolution, but then I saw the following in README:

Values

Obviously we need to log a value with each metric. This needs to be a number since we convert this value to a float before sending to CloudWatch. IMPORTANT: When logging multiple values for the same custom metric within a minute, CloudWatch aggregates an average over a minute. Click here for more details.

I see boto3's CloudWatch put_metric_data() has a StorageResolution field

StorageResolution (integer) --
Valid values are 1 and 60. Setting this to 1 specifies this metric as a high-resolution metric, so that CloudWatch stores the metric with sub-minute resolution down to one second. Setting this to 60 specifies this metric as a regular-resolution metric, which CloudWatch stores at 1-minute resolution. Currently, high resolution is available only for custom metrics. For more information about high-resolution metrics, see High-Resolution Metrics in the Amazon CloudWatch User Guide .

This field is optional, if you do not specify it the default of 60 is used.

Does cloudwatch-fluent-metrics support metrics @ 1s resolution?

Thanks
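Update: per the Storage Resolution section earlier in this README, it looks like this is supported; a sketch (namespace and metric name here are hypothetical):

m = FluentMetric().with_namespace('Kinesis/Consumer') \
                  .with_storage_resolution(1)
m.log(MetricName='RecordsPerSecond', Value=records_per_sec, Unit='Count/Second')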

Documentation doesn't match source

Hi there,

I notice the documentation and the source are not in line.
For example, the elapsed helper checks TimerName, where your example uses JobTimerName.
Same with count, where the documentation uses Value but the source expects Count.

Regards

Call out that unique StreamID is treated as unique CloudWatch metric in README

I would suggest calling out in the README that a unique StreamID (a unique dimension) is treated as a unique CloudWatch metric, which will incur CloudWatch costs.
Currently, if the user does not override the StreamID, FluentMetric generates a unique ID for the StreamID dimension of each metric it puts. In that case, it generates far more metrics than the user intends to create.

typo in readme

In the units section: "We're not just logging a value--we're looking a value of some unit."

"looking" should be "logging"
