aws-vpc-flow-log-appender

aws-vpc-flow-log-appender is a sample project that enriches AWS VPC Flow Log data with additional information, primarily the Security Groups associated with the instances to which requests are flowing.
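To make the idea of enrichment concrete, here is a minimal sketch, not the project's actual decorator. It assumes the default (version 2) flow log field order; the `eniToSecurityGroups` lookup, the `security-group-ids` output field, and the sample values are hypothetical and chosen only for illustration.

```javascript
// Minimal sketch: parse one default-format VPC Flow Log line and attach the
// security groups found for its ENI. Not the project's implementation.
const FIELDS = [
  'version', 'account-id', 'interface-id', 'srcaddr', 'dstaddr',
  'srcport', 'dstport', 'protocol', 'packets', 'bytes',
  'start', 'end', 'action', 'log-status'
];

function parseFlowLog(line) {
  const parts = line.trim().split(/\s+/);
  if (parts.length !== FIELDS.length) {
    return null; // not a default-format record
  }
  const record = {};
  FIELDS.forEach((name, i) => { record[name] = parts[i]; });
  return record;
}

// `eniToSecurityGroups` is a hypothetical map of ENI ID to security group IDs.
function decorate(record, eniToSecurityGroups) {
  const groups = eniToSecurityGroups[record['interface-id']] || [];
  return Object.assign({}, record, { 'security-group-ids': groups });
}

const sample = '2 123456789012 eni-1a2b3c4d 203.0.113.12 10.100.2.48 45928 6379 6 1 600 1493070293 1493070332 ACCEPT OK';
const enriched = decorate(parseFlowLog(sample), { 'eni-1a2b3c4d': ['sg-0123abcd'] });
console.log(enriched['security-group-ids']);
```

In the deployed sample the ENI-to-security-group mapping would come from your AWS account rather than a hard-coded object.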

This project makes use of several AWS services, including Elasticsearch, Lambda, and Kinesis Firehose. These must be set up and configured in the proper sequence for the sample to work as expected. Here, we describe deployment of the Lambda components only. For details on deploying and configuring other services, please see the accompanying blog post.

The following diagram is a representation of the AWS services and components involved in this sample:

[Diagram: VPC Flow Log Appender Services]

NOTE: This project makes use of the free tier of the ipstack geolocation service, which enforces a monthly limit of 10,000 requests. It is not intended for use in a production environment. We recommend using one of ipstack's paid plans or another commercial source of IP geolocation data if you wish to run this code in such an environment.

Getting Started

To get started, clone this repository locally:

$ git clone https://github.com/awslabs/aws-vpc-flow-log-appender

The repository contains CloudFormation templates and source code to deploy and run the sample application.

Prerequisites

To run the vpc-flow-log-appender sample, you will need to:

  1. Select an AWS Region into which you will deploy services. Be sure that all required services (AWS Lambda, Amazon Elasticsearch Service, Amazon CloudWatch, and Amazon Kinesis Firehose) are available in the Region you select.
  2. Confirm that the latest AWS CLI is installed and properly configured with credentials that have appropriate access to your account.
  3. Install aws-sam-cli.
  4. Install Node.js and NPM.

Configure Geolocation

If you would like to geolocate the source IP address of traffic in your VPC flow logs, you can configure a free account at ipstack.com. Note that the free tier of this service is not intended for production use.

To sign up for a free account and obtain an API key, visit https://ipstack.com/signup/free.

Once you have obtained your API key, store it in EC2 Systems Manager Parameter Store as follows (replace MY_API_KEY with your own):

$ aws ssm put-parameter \
      --name ipstack-api-key \
      --value MY_API_KEY \
      --type SecureString
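
Because the free tier caps the number of requests, one way to conserve quota is to cache lookups for source IPs that have already been seen. The following is a sketch under assumptions: `lookupFn` is a hypothetical stand-in for whatever function actually calls the geolocation service, and the cache only lives as long as the process that holds it.

```javascript
// Sketch: wrap any geolocation lookup with an in-memory cache so repeated
// source IPs do not trigger additional requests.
// `lookupFn` is hypothetical: (ip) => result, or a Promise of a result.
function makeCachedLookup(lookupFn) {
  const cache = new Map();
  return function cachedLookup(ip) {
    if (!cache.has(ip)) {
      cache.set(ip, lookupFn(ip));
    }
    return cache.get(ip);
  };
}

// Usage with a stand-in lookup that counts how often it is called.
let calls = 0;
const fakeLookup = (ip) => { calls += 1; return { ip, country: 'unknown' }; };
const geolocate = makeCachedLookup(fakeLookup);
geolocate('203.0.113.12');
geolocate('203.0.113.12');
console.log(calls); // 1
```

If `lookupFn` returns a Promise, the Promise itself is cached, so a failed lookup would also be remembered; a real implementation would likely want to evict failures.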

Preparing to Deploy Lambda

Before deploying the sample, install several dependencies using NPM:

$ cd decorator && npm install
$ cd ../ingestor && npm install && cd ..

Deploy Lambda Functions

The deployment of our AWS resources is managed by the AWS SAM CLI using the AWS Serverless Application Model (SAM).

  1. Create a new S3 bucket from which to deploy our source code (ensure that the bucket is created in the same AWS Region in which your network and services will be deployed):

    $ aws s3 mb s3://<MY_BUCKET_NAME>
  2. Using the Serverless Application Model, package your source code and serverless stack:

    $ sam package --template-file template.yaml \
                  --s3-bucket <MY_BUCKET_NAME> \
                  --output-template-file packaged.yaml
  3. Once packaging is complete, deploy the stack:

    $ sam deploy --template-file packaged.yaml \
                 --stack-name vpc-flow-log-appender \
                 --capabilities CAPABILITY_IAM

    Or to deploy with the geolocation feature turned on:

    $ sam deploy --template-file packaged.yaml \
                 --stack-name vpc-flow-log-appender \
                 --capabilities CAPABILITY_IAM \
                 --parameter-overrides GeolocationEnabled=true
  4. Once we have deployed our Lambda functions, configure CloudWatch logs to stream VPC Flow Logs to Elasticsearch as described here.

Testing

In addition to running aws-vpc-flow-log-appender using live VPC Flow Log data from your own environment, we can also leverage the Kinesis Data Generator to send mock flow log data to our Kinesis Firehose instance.

To get started, review the Kinesis Data Generator Help and use the included CloudFormation template to create necessary resources.

When ready:

  1. Navigate to your Kinesis Data Generator and log in.

  2. Select the Region to which you deployed aws-vpc-flow-log-appender and select the appropriate Stream (e.g. "VPCFlowLogsToElasticSearch"). Set Records per Second to 50.

  3. Next, we will use the AWS CLI to retrieve several values specific to your AWS Account to generate feasible VPC Flow Log data:

    # ACCOUNT_ID
    $ aws sts get-caller-identity --query 'Account'
    
    # ENI_ID (e.g. "eni-1a2b3c4d")
    $ aws ec2 describe-instances \
              --query 'Reservations[0].Instances[0].NetworkInterfaces[0].NetworkInterfaceId'
    
  4. Finally, we can build a template for KDG using the following. Be sure to replace <<ACCOUNT_ID>> and <<ENI_ID>> with the values you captured in step 3 (do not include quotes).

    2 <<ACCOUNT_ID>> <<ENI_ID>> {{internet.ip}} 10.100.2.48 45928 6379 6 {{random.number(1)}} {{random.number(600)}} 1493070293 1493070332 ACCEPT OK
    
  5. Returning to KDG, copy and paste the mock VPC Flow Log data into Template 1. Then click the "Send data" button.

  6. Stop KDG after a few seconds by clicking "Stop" in the popup.

  7. After a few minutes, check CloudWatch Logs and your Elasticsearch cluster for data.

A few notes on the above test procedure:

  • While our example utilizes the ENI ID of an EC2 instance, you may use any ENI available in the AWS Region in which you deployed the sample code.
  • Feel free to tweak the mock data template if needed; it is only intended as an example.
  • Do not modify values in double curly braces. These are part of the KDG template and will be filled in automatically.
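
If you prefer to fill in the account-specific placeholders programmatically, the substitution can be sketched as below. The function name `buildKdgTemplate` and the sample account and ENI values are hypothetical; the placeholders in double curly braces are deliberately left untouched for KDG to resolve.

```javascript
// Sketch: substitute the account-specific placeholders in the KDG template.
// Values in double curly braces are left as-is for KDG to fill in.
const TEMPLATE = '2 <<ACCOUNT_ID>> <<ENI_ID>> {{internet.ip}} 10.100.2.48 45928 6379 6 ' +
  '{{random.number(1)}} {{random.number(600)}} 1493070293 1493070332 ACCEPT OK';

function buildKdgTemplate(accountId, eniId) {
  return TEMPLATE
    .replace('<<ACCOUNT_ID>>', accountId)
    .replace('<<ENI_ID>>', eniId);
}

// Example values only; use the output of the CLI commands from step 3.
console.log(buildKdgTemplate('123456789012', 'eni-1a2b3c4d'));
```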

Cleaning Up

To clean up the Lambda functions when you are finished with this sample:

$ aws cloudformation delete-stack --stack-name vpc-flow-log-appender

Updates

  • Aug 2 2018 - Updated decorator function and geocode module to use ipstack, as the previous service is now defunct. Amended README to include new instructions on using ipstack.
  • Jun 9 2017 - Fixed issue in which the decorator did not return all records to Firehose when the geocoder exceeded its limit of 15,000 requests per hour. It now returns blank geo data instead. Added test methodology.

Authors

  • Josh Kahn - Initial work

aws-vpc-flow-log-appender's People

Contributors

chriscoombs, jkahn117, prindlefly

aws-vpc-flow-log-appender's Issues

Copy another region logs

Kudos to the team/project. It's really useful. This currently works for only one region, and maintaining a separate setup for each region in order to monitor logs from multiple regions would be difficult.

Instead, could a single Lambda function copy flow logs from a VPC in another region into one region and then load them? Would that work, or do you have any other suggestions for this implementation?

Downtime Handling Part

First of all, I'm using this entire setup to monitor AWS VPC flow logs, and ES makes it easy to get data insights. We use AWS ES and everything seems to be fine. A few questions in this regard:

  1. What if the ES cluster is red due to a space issue or any other issue? The retry duration is 60 seconds, but my ES downtime could last hours. What will happen to my flow logs? Please provide some details on what can be done.

Unable to package from Windows

I have installed the prerequisites.
Below is the error I get when packaging the source code:

Traceback (most recent call last):
  File "D:\Users\ABC\AppData\Roaming\Python\Scripts\sam-script.py", line 11, in <module>
    load_entry_point('aws-sam-cli==0.6.0', 'console_scripts', 'sam')()
  File "c:\python27\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\python27\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\python27\lib\site-packages\click\core.py", line 1066, in invoke
    return process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\python27\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\python27\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\python27\lib\site-packages\click\decorators.py", line 64, in new_func
    return ctx.invoke(f, obj, *args[1:], **kwargs)
  File "c:\python27\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "D:\Users\ABC\AppData\Roaming\Python\Python27\site-packages\samcli\commands\package\__init__.py", line 22, in cli
    do_cli(args) # pragma: no cover
  File "D:\Users\ABC\AppData\Roaming\Python\Python27\site-packages\samcli\commands\package\__init__.py", line 26, in do_cli
    execute_command("package", args)
  File "D:\Users\ABC\AppData\Roaming\Python\Python27\site-packages\samcli\lib\samlib\cloudformation_command.py", line 17, in execute_command
    subprocess.check_call([aws_cmd, 'cloudformation', command] + list(args))
  File "c:\python27\lib\subprocess.py", line 181, in check_call
    retcode = call(*popenargs, **kwargs)
  File "c:\python27\lib\subprocess.py", line 168, in call
    return Popen(*popenargs, **kwargs).wait()
  File "c:\python27\lib\subprocess.py", line 390, in __init__
    errread, errwrite)
  File "c:\python27\lib\subprocess.py", line 640, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

Failing due to "body size too big"

When working with larger networks the decorator fails due to the response "body size is too long" (response over 6MB).

I've observed this when decorating 11K + items
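
As a rough way to detect this condition before the response is returned, the serialized size can be measured up front. This is only a sketch: the 6 MB figure is taken from the description above, the `{ records: [...] }` response shape is the usual Firehose transformation format, and the helper names and sample record are hypothetical.

```javascript
// Sketch: estimate the serialized size of a transformation response so that
// oversized batches can be detected before they are returned.
const MAX_RESPONSE_BYTES = 6 * 1024 * 1024; // limit quoted in the issue text

function responseSizeBytes(records) {
  return Buffer.byteLength(JSON.stringify({ records }), 'utf8');
}

function exceedsLimit(records) {
  return responseSizeBytes(records) > MAX_RESPONSE_BYTES;
}

// Hypothetical single-record batch, well under the limit.
const small = [{ recordId: '1', result: 'Ok', data: 'aGVsbG8=' }];
console.log(exceedsLimit(small)); // false
```

If the check trips, reducing the number of records delivered per invocation is one option worth investigating.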

Support for additional Elasticsearch Types like 'IP'

Hi there,

I can see the index types are defined in this block:

    if (match) {
      let matched = {
        // default vpc flow log data
        '@timestamp': new Date(),
        'version': Number(match[1]),
        'account-id': Number(match[2]),
        'interface-id': match[3],
        'srcaddr': match[4],
        'destaddr': match[5],
        'srcport': Number(match[6]),
        'dstport': Number(match[7]),
        'protocol': Number(match[8]),
        'packets': Number(match[9]),
        'bytes': Number(match[10]),
        'start': Number(match[11]),
        'end': Number(match[12]),
        'action': match[13],
        'log-status': match[14]
      }

Kibana supports IP CIDR filtering/mapping, so would it be possible to export these address fields as the IP type?
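
For illustration, an index mapping that declares the address fields with the Elasticsearch `ip` type might look like the sketch below. The field names follow the block quoted above; the variable name `flowLogMapping` is hypothetical, and the exact layout of a mapping depends on the Elasticsearch version in use, so treat this as an assumption rather than something the project ships.

```javascript
// Sketch: mapping properties that declare the address fields as type `ip`.
// The surrounding mapping structure varies between Elasticsearch versions.
const flowLogMapping = {
  properties: {
    'srcaddr': { type: 'ip' },
    'destaddr': { type: 'ip' },
    'srcport': { type: 'integer' },
    'dstport': { type: 'integer' }
  }
};

console.log(JSON.stringify(flowLogMapping, null, 2));
```

With such a mapping, documents whose address fields do not contain a valid IP would be rejected unless the mapping is configured to tolerate malformed values.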

Decorator Function Error

When I stream from Firehose, the function is invoked and reports an error. This is what I see in the logs:


23:04:08
START RequestId: 867b3d2a-12a4-11e8-bf8e-1928728ff215 Version: $LATEST

23:04:08
2018-02-15T23:04:08.716Z 867b3d2a-12a4-11e8-bf8e-1928728ff215 Received 11375 records for processing

23:04:18
2018-02-15T23:04:18.857Z 867b3d2a-12a4-11e8-bf8e-1928728ff215 Finished building ENI to Security Group Mappig and Extracting Records

23:04:18
2018-02-15T23:04:18.857Z 867b3d2a-12a4-11e8-bf8e-1928728ff215 Decorating 11375 records

23:04:46
END RequestId: 867b3d2a-12a4-11e8-bf8e-1928728ff215

23:04:46
REPORT RequestId: 867b3d2a-12a4-11e8-bf8e-1928728ff215 Duration: 38059.39 ms Billed Duration: 38100 ms Memory Size: 128 MB Max Memory Used: 129 MB

23:04:46
RequestId: 867b3d2a-12a4-11e8-bf8e-1928728ff215 Process exited before completing request

Logs not flowing to ES

I tried troubleshooting the issue, but logs are still not flowing to Elasticsearch. Here is the error:

Finished building ENI to Security Group Mappig and Extracting Records
Decorating 633 records
No ENI data found for interface undefined
TypeError: Cannot read property 'match' of undefined
{
"errorMessage": "Cannot read property 'match' of undefined",
"errorType": "TypeError",
"stackTrace": [
"isRfc1918Address (/var/task/index.js:152:21)",
"decorateRecords (/var/task/index.js:178:20)",
"Promise.all.then (/var/task/index.js:249:14)",
"",
"process._tickDomainCallback (internal/process/next_tick.js:228:7)"
]
}
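
The trace suggests that `isRfc1918Address` called `.match` on an undefined address. A defensive variant could be sketched as follows. The function name is taken from the stack trace, but the body is an illustration written for this example and is not the project's implementation.

```javascript
// Sketch of a defensive RFC 1918 (private address) check.
// Returns false instead of throwing when the input is missing or malformed.
function isRfc1918Address(ip) {
  if (typeof ip !== 'string') {
    return false; // undefined or null input no longer throws
  }
  const parts = ip.split('.');
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    return false;
  }
  const [a, b] = parts.map(Number);
  return a === 10 ||                      // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168);             // 192.168.0.0/16
}

console.log(isRfc1918Address('10.100.2.48')); // true
console.log(isRfc1918Address(undefined));     // false
```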

Error deploying CloudFormation at step 3 of Deploy Lambda Functions

The error message is as follows:

$ aws cloudformation deploy --template-file app-sam-output.yaml --stack-name vpc-flow-log-appender-dev --capabilities CAPABILITY_IAM --region eu-west-1
Waiting for changeset to be created..

Failed to create the changeset: Waiter ChangeSetCreateComplete failed: Waiter encountered a terminal failure state Status: FAILED. Reason: User: arn:aws:iam::XXXX:user/YYY is not authorized to perform: cloudformation:CreateChangeSet on resource: arn:aws:cloudformation:eu-west-1:aws:transform/Serverless-2016-10-31

Skip Geo Location Data

Hi,

Since https://freegeoip.net/ has a limit of 15K requests per hour, can we skip the geolocation check and run the Lambda function without errors?

When we deploy this Lambda function in production, we get a limit exceeded error, which causes invocation errors.
