awslabs / amazon-kinesis-aggregators
Amazon Kinesis Aggregators provides a simple way to create real-time aggregations of data on Amazon Kinesis.
License: Other
The README documents the lastWriteSeq field in the aggregated DynamoDB table, so my impression was that this consumer application implements an idempotency mechanism to handle the scaling and failover cases in a KCL application.
However, while browsing the source, I noticed that DefaultIdempotencyCheck's doProcess method returns true unconditionally, so it looks like a placeholder implementation of IIdempotencyCheck.
I'm not clear if a full implementation of IIdempotencyCheck was simply omitted by mistake, or if the framework intends for the user to provide her own implementation. If a full implementation that uses lastWriteSeq was omitted, it should be added. If the framework intends for the user to supply an implementation, it should be documented as such.
Without one or the other, users may be led to believe the aggregators are safe to use in scaling and failover scenarios when they are not.
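To make the request concrete, here is a minimal sketch of the kind of check a lastWriteSeq-based implementation might perform. It is not the project's code: the class name SequenceNumberGuard and the method tryApply are hypothetical, it does not implement IIdempotencyCheck (whose signature is not shown in this issue), and it keeps state in an in-memory map where a real implementation would presumably use a conditional write against the lastWriteSeq attribute in DynamoDB. It also assumes that sequence numbers are decimal strings that only increase for a given aggregate key, which may not hold across shards after a reshard.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of amazon-kinesis-aggregators. */
final class SequenceNumberGuard {
    // Last applied sequence number per aggregate key.
    // In-memory stand-in for the lastWriteSeq attribute (assumption).
    private final Map<String, BigInteger> lastWriteSeq = new HashMap<>();

    /**
     * Returns true and records the sequence number when the record is newer
     * than the last one applied to this aggregate key. Returns false for a
     * replayed or older record, so the caller can skip it.
     */
    synchronized boolean tryApply(String aggregateKey, String sequenceNumber) {
        // Sequence numbers are too large for a long, so compare as BigInteger.
        BigInteger incoming = new BigInteger(sequenceNumber);
        BigInteger previous = lastWriteSeq.get(aggregateKey);
        if (previous != null && incoming.compareTo(previous) <= 0) {
            return false;
        }
        lastWriteSeq.put(aggregateKey, incoming);
        return true;
    }
}
```

With a guard like this, a worker that takes over a shard and replays records from the last checkpoint would skip any record whose sequence number is not greater than the stored value, instead of counting it twice.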
The examples in the README, such as https://s3-eu-west-1.amazonaws.com/meyersi-ire-aws/KinesisDynamicAggregators/sample/regex-aggregator.json, don't appear to be public and are returning 403.
I have an EC2 instance running an aggregator app. It reads fine from a Kinesis Stream but the aggregated tables are generated in an incorrect region.
How do I specify the region of the DynamoDB tables?
BTW, the KCL app is creating tables in the right region.
I cloned the repo yesterday and am trying to run the JAR file directly from the command line on my local laptop (for testing).
I have configured the JSON config file to aggregate on MINUTE, HOUR and FOREVER (see below):
[
  {
    "namespace": "TestJsonConfigApp",
    "labelItems": ["EmailAddress"],
    "type": "COUNT",
    "timeHorizons": ["MINUTE", "HOUR", "FOREVER"],
    "dataExtractor": "JSON",
    "dateItem": "EventDateTime",
    "tableName": "TestTable",
    "emitMetrics": true,
    "readIOPS": 20,
    "writeIOPS": 40,
    "IDataStore": "com.amazonaws.services.kinesis.aggregators.datastore.DevNullDataStore"
  }
]
I then added some events through the KCL libraries (which works fine), and now I see errors in the aggregator logs saying:
Unable to Parse Date Value H-1970-01-18 02:00:00
I can see this value in the DynamoDB table, but it looks like the data that the aggregator adds, not from me... what am I doing wrong?
The same happens for the minute aggregations, such as m-1970-01-18 ...
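One possible explanation for the 1970 dates, not confirmed anywhere in this issue, is a unit mismatch: if EventDateTime holds epoch seconds but is read as epoch milliseconds, a recent timestamp collapses to mid-January 1970. The sketch below only illustrates that arithmetic with java.time; the class EpochUnitCheck and the sample value 1476000000 are made up for the illustration and say nothing about how the aggregator parses dates internally. It also does not explain the parse failure on the H- prefix itself.

```java
import java.time.Instant;

/** Hypothetical helper for illustration; not part of amazon-kinesis-aggregators. */
final class EpochUnitCheck {
    /** Interprets the value as milliseconds since 1970-01-01T00:00:00Z. */
    static Instant asMillis(long value) {
        return Instant.ofEpochMilli(value);
    }

    /** Interprets the same value as seconds since 1970-01-01T00:00:00Z. */
    static Instant asSeconds(long value) {
        return Instant.ofEpochSecond(value);
    }

    public static void main(String[] args) {
        long sample = 1_476_000_000L; // illustrative value only
        System.out.println(asMillis(sample));  // 1970-01-18T02:00:00Z
        System.out.println(asSeconds(sample)); // 2016-10-09T08:00:00Z
    }
}
```

If the event producer writes seconds, multiplying by 1000 before sending, or writing the date in a format the extractor is configured for, would be the thing to try first.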
As far as I can see, the current distribution JARs are not published to the Maven Central repository. Are there any plans to do this?
From https://forums.aws.amazon.com/thread.jspa?messageID=585971:
Generating Sensor 1
Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.transform.JsonUnmarshallerContext.getCurrentToken()Lcom/fasterxml/jackson/core/JsonToken;
at com.amazonaws.services.kinesis.model.transform.PutRecordResultJsonUnmarshaller.unmarshall(PutRecordResultJsonUnmarshaller.java:40)
at com.amazonaws.services.kinesis.model.transform.PutRecordResultJsonUnmarshaller.unmarshall(PutRecordResultJsonUnmarshaller.java:31)
at com.amazonaws.http.JsonResponseHandler.handle(JsonResponseHandler.java:104)
at com.amazonaws.http.JsonResponseHandler.handle(JsonResponseHandler.java:41)
at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:730)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:417)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:245)
at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2326)
at com.amazonaws.services.kinesis.AmazonKinesisClient.putRecord(AmazonKinesisClient.java:557)
at producer.SensorReadingProducer.run(SensorReadingProducer.java:151)
at producer.SensorReadingProducer.main(SensorReadingProducer.java:167)
Add support for Aggregators to read not only from a Kinesis Stream, but also from a DynamoDB Update Stream. Support the existing serialisation models for the content of DynamoDB Stream images.
Hi,
Is there a built-in class for working with Base64-encoded objects? If not, how would we go about supporting that?
Regards
Paul
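Whether the project ships such a class is not answered in this issue. As a starting point, the decoding step itself can be sketched with the JDK's java.util.Base64. The class Base64Payloads and its method are hypothetical, and how the decoded text would be handed to an aggregator's data extractor is deliberately left out because that API is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for illustration; not part of amazon-kinesis-aggregators. */
final class Base64Payloads {
    /**
     * Decodes a Base64-encoded record payload and returns it as UTF-8 text,
     * for example a JSON document that can then be parsed as usual.
     * Throws IllegalArgumentException if the input is not valid Base64.
     */
    static String decodeToUtf8(byte[] recordData) {
        byte[] decoded = Base64.getDecoder().decode(recordData);
        return new String(decoded, StandardCharsets.UTF_8);
    }
}
```

A custom extractor could call a helper like this on the raw record bytes and then apply the same JSON or delimited-text handling as for unencoded events.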
A couple of ideas regarding what you describe as the more common "Querying for Data by Date" case:
What do you think about an optional consistent read on the aggregate data, such as dateQuery?consistent=true?
Any thoughts on being able to define additional indexes in the configuration, so the data can be queried externally?
Or do you have related suggestions for coordinating additional processing on the aggregate data? Just wondering...
17-Jun-2015 20:19:15.970 SEVERE [pool-1-thread-1] com.amazonaws.services.kinesis.aggregators.cache.AggregateCache.flush Metrics Emitter Exception - Aggregate Cache will NOT terminate
17-Jun-2015 20:19:15.970 SEVERE [pool-1-thread-1] com.amazonaws.services.kinesis.aggregators.cache.AggregateCache.flush java.text.ParseException: Unparseable date: "*"