arithmetric / lambda-stash
AWS Lambda script to ship data from S3 to other data stores, like Elasticsearch
License: MIT License
Since AWS does not yet support Logstash agents, the only way to use Logstash's many filters is by running an agent on an EC2 machine.
I've seen Java-based Lambda functions that parse logs and extract many business-specific attributes, but I believe Logstash is much more maintainable.
I'm now wondering whether lambda-stash combines the benefits of both worlds. Does it?
I'm looking for grok filters, custom grok patterns, and matching/tagging routes for business cases.
This is not an issue but a question!
I've got this working perfectly for CloudTrail logs - it's great! Now I'm trying to find a way to ship S3 bucket access logs with it. I've tried a combination of parseSpaces with the CloudFront type and no unzipping; this does push the data into ES, but there's no real handling of the format to speak of. I've also tried forcing it through an index template in ES, but that's pretty messy too.
Is there an optimal way of shipping S3 access logs with lambda-stash, or am I out of luck?
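One workaround, since custom handlers are mentioned later in this thread, would be a small processor that tokenizes the space-delimited S3 server access log format yourself. The sketch below is only illustrative: `parseS3AccessLine` is a hypothetical helper (not part of lambda-stash), and the field names follow AWS's documented S3 access log layout. Bracketed timestamps and quoted strings are kept together as single tokens.

```javascript
// Hypothetical parser for one S3 server access log line.
// Field names follow the documented S3 access log format.
var S3_FIELDS = ['owner', 'bucket', 'time', 'remoteIp', 'requester',
  'requestId', 'operation', 'key', 'requestUri', 'status', 'errorCode',
  'bytesSent', 'objectSize', 'totalTime', 'turnAroundTime', 'referrer',
  'userAgent', 'versionId'];

function parseS3AccessLine(line) {
  // Tokens are [...] groups, "..." groups, or bare words.
  var tokens = line.match(/\[[^\]]*\]|"[^"]*"|\S+/g) || [];
  var record = {};
  tokens.forEach(function (token, i) {
    if (i < S3_FIELDS.length) {
      // Strip the surrounding brackets or quotes.
      record[S3_FIELDS[i]] = token.replace(/^[\["]|[\]"]$/g, '');
    }
  });
  return record;
}

// Example line in the documented S3 access log format.
var sample = '79a59df9example awsexamplebucket1 ' +
  '[06/Feb/2019:00:00:38 +0000] 192.0.2.3 79a59df9example ' +
  '3E57427F3EXAMPLE REST.GET.VERSIONING - ' +
  '"GET /awsexamplebucket1?versioning HTTP/1.1" 200 - 113 - 7 - ' +
  '"-" "S3Console/0.4" -';
var record = parseS3AccessLine(sample);
```

A processor wrapping this could then hand the named fields to shipElasticsearch instead of an unstructured blob.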
Hi,
I'm starting out with Lambda functions, and I want to stream CloudTrail logs (stored in an S3 bucket) to Elasticsearch (same AWS account). I found your project and did the following:
1 - Cloned your Git repository
2 - Modified example/index.js as follows:
var shipper = require('lambda-stash');
var config = {
elasticsearch: {
host: 'https://my-elasticsearch-endpoint....es.amazon.com',
region: 'us-east-1',
useAWS: true
},
mappings: [
{
bucket: 'my-bucket-name-that-contain-cloudtrail-logs',
processors: [
'decompressGzip',
'parseJson',
'formatCloudtrail',
'shipElasticsearch'
],
elasticsearch: {
index: 'logs',
type: 'cloudtrail'
},
dateField: 'date'
}
]
};
shipper.handler(config, event, context, callback);
3 - Created a zip file named package.zip (including the clone of your repo with the custom example/index.js file shown above):
├── index.js
├── package.json
├── test
├── handlers
├── ...
├── example
│ ├── index.js
│ ├── package.json
4 - Created a blank Lambda function and added an S3 trigger with event type "Object Created (All)"
5 - Chose Node.js 4.3 and uploaded the zip file created earlier
6 - Left the remaining options at their defaults and clicked Create
7 - Created a role giving Lambda access to S3, CloudWatch, and Elasticsearch, and attached it to the freshly created Lambda function
8 - My Lambda function is created, but I'm unsure what I'm supposed to do next.
I don't see new indices being created on my ES cluster. I've probably missed a ton of things - for example, the handler, which I left at the default index.handler. Any chance you could give me a little hint?
EDIT: In my CloudWatch logs, /aws/lambda/myfirstTest gives the following result:
Unable to import module 'index': Error
at Function.Module._resolveFilename (module.js:325:15)
at Function.Module._load (module.js:276:25)
at Module.require (module.js:353:17)
at require (internal/module.js:12:17)
at Object.<anonymous> (/var/task/index.js:1:71)
at Module._compile (module.js:409:26)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
at Function.Module._load (module.js:300:12)
at Module.require (module.js:353:17)
EDIT 3:
1 - I cloned your repo
2 - Navigated to example and changed index.js as follows:
var shipper = require('lambda-stash');
var config = {
elasticsearch: {
host: 'https://my-elasticsearch-endpoint....es.amazon.com',
region: 'us-east-1',
useAWS: true
},
mappings: [
{
bucket: 'my-bucket-name-that-contain-cloudtrail-logs',
processors: [
'decompressGzip',
'parseJson',
'formatCloudtrail',
'shipElasticsearch'
],
elasticsearch: {
index: 'logs',
type: 'cloudtrail'
},
dateField: 'date'
}
]
};
shipper.handler(config, event, context, callback);
3 - Ran npm install (it downloaded all the dependencies)
4 - Created a zip file and uploaded it to Lambda
The error is now:
{
"errorMessage": "event is not defined",
"errorType": "ReferenceError",
"stackTrace": [
"Object.Module._extensions..js (module.js:416:10)",
"Module.load (module.js:343:32)",
"Function.Module._load (module.js:300:12)",
"Module.require (module.js:353:17)",
"require (internal/module.js:12:17)"
]
}
Still no luck :(
Thanks a ton in advance
Would it be difficult to implement support for ELB logs? This would be a great addition to the types of AWS logs supported.
When talking to AWS Elasticsearch: Error occurred while preparing to ship data: { [Error: Content-Type header [application/x-ldjson] is not supported]. This is with elasticsearch.js 11.0.1.
application/json works fine. It's just a question of how to get lambda-stash to use it. I tried changing shipElasticsearch.js to set the headers, but it didn't work:
var esConfig = {
host: config.elasticsearch.host,
requestTimeout: config.elasticsearch.requestTimeout,
headers: {
'Accept': 'application/json',
'Content-Type': 'application/json'
}
};
For now, I've modified the bottom of node_modules/elasticsearch/src/lib/serializers/json.js to use application/json.
Hi,
I need to implement a custom handler for processing CloudWatch logs.
For example, if I need to add metadata to CloudWatch logs, I do this:
var addMetadata = function(config) {
console.log('addMetadata');
config.environment = "qa";
return Promise.resolve(config);
};
and then add it to the processors list:
mappings: [
{
processors: [
'formatCloudwatchLogs',
addMetadata,
'outputJsonLines',
'shipTcp'
],
dateField: 'date',
type: 'CloudWatch'
}
]
But I got this error: Cannot find module './handlers/addMetadata', because it's not in the handlers folder.
How can I tell lambda-stash to load my custom handler from another folder?
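As reported elsewhere in this thread, the 1.0.2 npm release predates custom handler support, so a build that includes it is needed (e.g. a git dependency on the repository). String entries in processors are what get resolved against ./handlers; passing the function itself should skip that module lookup entirely. A minimal sketch of the mapping, assuming such a build:

```javascript
// Hypothetical custom processor: tags each event with an environment label.
var addMetadata = function(config) {
  config.environment = 'qa';
  return Promise.resolve(config);
};

// String entries resolve to built-in handlers in ./handlers; the function
// reference is used as-is (assumes a lambda-stash build with custom
// handler support, which is not in the 1.0.2 npm release).
var mapping = {
  processors: [
    'formatCloudwatchLogs',
    addMetadata,          // function reference, not a string
    'outputJsonLines',
    'shipTcp'
  ],
  dateField: 'date',
  type: 'CloudWatch'
};
```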
I think that if lambda-stash could parse fields like 'time-duration' and 'time' as a number and a timestamp, the logs would be more useful in Elasticsearch. Users might want to see 'time-duration' broken down by region, for example. I can make some changes to achieve this.
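As a sketch of the kind of change this would involve, a processor could coerce numeric-looking string fields before shipping (timestamps are already covered by dateField). Whether the parsed records live on config.data is an assumption about lambda-stash internals, not confirmed API:

```javascript
// Hypothetical processor: convert numeric-looking string fields so
// Elasticsearch maps them as numbers instead of strings.
var NUMERIC_FIELDS = ['time-duration'];  // assumed field names to coerce

var coerceNumericFields = function(config) {
  // Assumption: parsed records are an array on config.data.
  (config.data || []).forEach(function(doc) {
    NUMERIC_FIELDS.forEach(function(field) {
      if (doc[field] !== undefined && doc[field] !== '' && !isNaN(doc[field])) {
        doc[field] = Number(doc[field]);
      }
    });
  });
  return Promise.resolve(config);
};
```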
If I'm not mistaken, lambda-stash doesn't currently support SSL/TLS. It would be great if the shippers supported it.
Supporting HTTP basic authentication is also needed (e.g. for use with the Logstash HTTP input).
First, I'm not sure the following script is correct for shipping logs from CloudWatch Logs to Elasticsearch. After I run the Lambda script, it logs "Handling event for CloudWatch logs", but then I get this error: "Event did not match any mappings".
shipper = require('lambda-stash');
exports.handler = function(event, context, callback) {
var config = {
elasticsearch: {
host: 'something.us-west-2.es.amazonaws.com',
index: 'logs',
region: 'us-west-2',
useAWS: true
},
mappings: [
{
processors: [
'formatCloudwatchLogs',
'shipElasticsearch'
],
elasticsearch: {
type: 'test'
}
}
]
};
shipper.handler(config, event, context, callback);
};
Then I'd like to know whether there is a way to ship the logs to different Elasticsearch indices based on their log group.
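"Event did not match any mappings" suggests the incoming CloudWatch event never matches the mapping. In the custom-handler example earlier in this thread, the CloudWatch mapping carries type: 'CloudWatch', which the script above lacks, so adding it is worth trying. For per-log-group indices, a custom processor could rewrite the index from the event's log group; the cloudwatch property name below is an assumption about lambda-stash internals, not confirmed API:

```javascript
// Sketch: choose the Elasticsearch index from the CloudWatch log group.
// Assumes the decoded payload is available on config.cloudwatch (the exact
// property name is an assumption about lambda-stash internals).
var indexByLogGroup = function(config) {
  var logGroup = (config.cloudwatch && config.cloudwatch.logGroup) || 'unknown';
  // e.g. '/aws/lambda/myfirstTest' -> 'logs-aws-lambda-myfirsttest'
  config.elasticsearch.index =
    'logs' + logGroup.toLowerCase().replace(/[^a-z0-9]+/g, '-');
  return Promise.resolve(config);
};

var mapping = {
  type: 'CloudWatch',            // lets the CloudWatch event match
  processors: [
    'formatCloudwatchLogs',
    indexByLogGroup,
    'shipElasticsearch'
  ],
  elasticsearch: { type: 'test' }
};
```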
It seems "http-aws-es" has been moved somewhere else, so installation is failing.
I was able to install everything manually, but it takes much more time.
Can you fix it?
RomanCRS:lambda rchyr$ npm i lambda-stash
npm ERR! Error while executing:
npm ERR! /usr/local/bin/git ls-remote -h -t ssh://git@github.com/tphummel/http-aws-es.git
npm ERR!
npm ERR! ERROR: Repository not found.
npm ERR! fatal: Could not read from remote repository.
npm ERR!
npm ERR! Please make sure you have the correct access rights
npm ERR! and the repository exists.
npm ERR!
npm ERR! exited with error code: 128
npm ERR! A complete log of this run can be found in:
npm ERR! /Users/rchyr/.npm/_logs/2018-05-18T08_47_40_421Z-debug.log
I used the following template to push CloudTrail logs stored in S3 to ES.
var shipper = require('lambda-stash');
exports.handler = function(event, context, callback) {
var config = {
elasticsearch: {
host: 'https://my-elastic-search-endpoint.es.amazonaws.com',
region: 'us-east-1',
useAWS: true
},
mappings: [
{
bucket: 'my-bucket-name',
processors: [
'decompressGzip',
'parseJson',
'formatCloudtrail',
'shipElasticsearch'
],
elasticsearch: {
index: 'logs',
type: 'cloudtrail'
},
dateField: 'date'
}
]
};
shipper.handler(config, event, context, callback);
};
But all I got in the CloudWatch logs is:
Task timed out after 3.00 seconds
Am I missing something?
Thanks a lot :)
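"Task timed out after 3.00 seconds" is the Lambda function's own execution timeout (3 seconds is the default), not a lambda-stash error; decompressing and bulk-indexing CloudTrail logs usually needs longer. Raising the timeout in the function configuration is the first thing to try, e.g. with the AWS CLI (the function name is a placeholder):

```shell
# Raise the Lambda execution timeout to 60 seconds (default is 3).
aws lambda update-function-configuration \
  --function-name my-cloudtrail-shipper \
  --timeout 60
```

The same setting is available in the console under the function's basic settings.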
I don't see how lambda-stash is licensed. If this is open source, please add a statement; otherwise the "default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work." -- https://help.github.com/articles/licensing-a-repository/
Hi @arithmetric,
We're using the lambda-stash npm package to send logs from CloudWatch Logs to Elasticsearch via Logstash.
To do this, we created a custom handler (as described in https://www.npmjs.com/package/lambda-stash#features), but were surprised that it doesn't work.
After investigating, I found that version 1.0.2 doesn't include the latest commit, which implements custom handlers, so we have to reference the lambda-stash npm package like this: "lambda-stash": "git+https://github.com/arithmetric/lambda-stash.git#d07ff21"
instead of this: "lambda-stash": "^1.0.2"
(which would be far more convenient).
Could you please release version 1.0.3 based on latest source code?
When processing larger files, I was getting client timeouts; the default of 30 seconds wasn't enough. I ended up adding requestTimeout: Infinity to esConfig in shipElasticsearch.js:
var esConfig = {
host: config.elasticsearch.host,
requestTimeout: Infinity
};
I figure I don't really care about the client's timeout, since Lambda has its own timeout.
Maybe this could be exposed as a config option?
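Exposing it as a config option seems straightforward; the headers experiment quoted earlier in this thread already reads config.elasticsearch.requestTimeout. A sketch of the esConfig change in shipElasticsearch.js, falling back to the ES client's 30-second default when the option is unset (the host value is a placeholder):

```javascript
// User-facing config: requestTimeout becomes an opt-in per deployment.
var config = {
  elasticsearch: {
    host: 'https://example.es.amazonaws.com',   // placeholder endpoint
    requestTimeout: Infinity                    // e.g. for large files
  }
};

// In shipElasticsearch.js: pass the user-supplied timeout through,
// otherwise keep the elasticsearch.js client default of 30000 ms.
var esConfig = {
  host: config.elasticsearch.host,
  requestTimeout: config.elasticsearch.requestTimeout || 30000
};
```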