Athena CloudTrail Partitioner

AWS Athena is a serverless query service that helps you query your unstructured S3 data without all the ETL.

Athena allows you to query your CloudTrail log data from your S3 bucket on demand. However, it can be challenging to maintain sensible partitioning on the database over time.

This project helps you periodically add partitions to your Athena/Glue database for each day/month/year/region/account added to your CloudTrail log bucket.
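To make the partition scheme concrete, here is a minimal sketch of how one partition per account/region/year/month/day could be derived from a CloudTrail object key and turned into an Athena `ALTER TABLE ... ADD PARTITION` statement. It is not the project's own code: the function names, the table name `cloudtrail_logs`, the bucket name, and the partition column names are all assumptions chosen for illustration. The key layout it relies on is the standard CloudTrail one, `AWSLogs/[<org-id>/]<account-id>/CloudTrail/<region>/<yyyy>/<mm>/<dd>/`.

```javascript
// Sketch only: function, table, and partition column names are assumptions,
// not taken from this project's source.
// CloudTrail writes objects under:
//   [prefix/]AWSLogs/[<org-id>/]<account-id>/CloudTrail/<region>/<yyyy>/<mm>/<dd>/<file>
const KEY_PATTERN =
  /^(.*AWSLogs\/(?:o-[a-z0-9]+\/)?(\d{12})\/CloudTrail\/([a-z0-9-]+)\/(\d{4})\/(\d{2})\/(\d{2})\/)/;

// Extract the partition values and the day-level S3 prefix from an object key.
function parsePartition(key) {
  const match = KEY_PATTERN.exec(key);
  if (!match) return null; // not a CloudTrail log object (e.g. a digest file)
  const [, prefix, account, region, year, month, day] = match;
  return { prefix, account, region, year, month, day };
}

// Build the DDL that registers one day-level partition with Athena.
function addPartitionStatement(table, bucket, partition) {
  return (
    `ALTER TABLE ${table} ADD IF NOT EXISTS ` +
    `PARTITION (account='${partition.account}', region='${partition.region}', ` +
    `year='${partition.year}', month='${partition.month}', day='${partition.day}') ` +
    `LOCATION 's3://${bucket}/${partition.prefix}'`
  );
}
```

Because the statement uses `ADD IF NOT EXISTS`, running it repeatedly for the same day is harmless, which suits a job that runs on a schedule.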

Read more about why we built this, and how it can be used, in this blog post.

Prerequisite - Enable CloudTrail

CloudTrail is an audit log of every action that occurs in your AWS account. It should be on all the time.

You can now enable CloudTrail at the AWS Organization level, which means that CloudTrail for each account will be centrally logged and automatically enabled for all new accounts.

Read about how to create your organization CloudTrail here.

Installation

Install the Athena CloudTrail Partitioner through CloudFormation, either through the AWS CLI:

aws cloudformation deploy \
  --stack-name athena-cloudtrail-partitioner \
  --region ${AWS_DEFAULT_REGION} \
  --template-file cf/template.yml \
  --force-upload \
  --parameter-overrides \
    "OrganizationId=${ORGANIZATION_ID}" \
    "S3BucketName=${S3_BUCKET_NAME}" \
  --capabilities CAPABILITY_NAMED_IAM \
  --no-fail-on-empty-changeset

or click this button to deploy through the AWS Console:

Launch Stack

athena-cloudtrail-partitioner's People

Contributors

dependabot[bot], em0ney

athena-cloudtrail-partitioner's Issues

Lambda permissions policy missing

Hi,

I set up the Athena CloudTrail Partitioner and a bit after I noticed FailedInvocations in the CloudWatch Events metrics for the scheduled rule. Per https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CWE_Troubleshooting.html#LAMfunctionNotInvoked it seems that there was a permissions policy missing on the Lambda function. I added the following CloudFormation resource to the stack and all appears to be well now.

  # Permissions for CloudWatch Events to invoke the Lambda function
  PartitionLambdaPolicy:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref PartitionLambda
      Principal: events.amazonaws.com
      SourceArn: !GetAtt LambdaSchedule.Arn

Throwing this out there in the hope that it might help anyone else.

Ryan

Lambda times out before getting all partitions in CloudTrail bucket

The code doesn't work on a CloudTrail bucket with lots of data, for example a CloudTrail bucket with a year's data from 100+ accounts and all regions.

The Lambda function reaches its execution time limit before the call on line 17 of handler.js finishes:
const partitionTree = await getAllParitions(bucket, path);

Also, please note that there is a minor typo in the getAllParitions method name. It should probably be getAllPartitions, but since the method is spelled the same way in s3.js it doesn't really matter.

What does matter is that it can take more than 15 minutes to enumerate a CloudTrail bucket with lots of data. Is there a way you can store the enumeration data in DynamoDB as well, so that multiple runs of the Lambda could pick up where the previous run left off?
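One possible shape for that resumable enumeration is sketched below. It is only an illustration of the idea, not the project's code: `enumerateWithCheckpoint`, `listPage`, and `store` are hypothetical names, with `listPage` standing in for a paginated S3 listing call and `store` for a DynamoDB-backed checkpoint that exposes `load`, `save`, and `clear`. The loop lists one page at a time, and once a time budget is used up it saves the continuation token and the prefixes found so far, so that the next invocation can continue from there.

```javascript
// Sketch only: listPage and store are hypothetical stand-ins for an S3
// listing call and a DynamoDB-backed checkpoint; neither is the project's code.
async function enumerateWithCheckpoint({ listPage, store, budgetMs, now = Date.now }) {
  const startedAt = now();

  // Resume from the previous run's checkpoint, or start from scratch.
  const saved = (await store.load()) || { token: null, prefixes: [] };
  let token = saved.token;
  const prefixes = new Set(saved.prefixes);

  while (true) {
    // listPage(token) resolves to { prefixes: string[], nextToken: string | null }
    const page = await listPage(token);
    page.prefixes.forEach((prefix) => prefixes.add(prefix));
    token = page.nextToken;

    if (token === null) {
      // Listing is complete: drop the checkpoint and return everything found.
      await store.clear();
      return { done: true, prefixes: [...prefixes] };
    }

    if (now() - startedAt >= budgetMs) {
      // Out of time: persist progress so the next invocation can continue.
      await store.save({ token, prefixes: [...prefixes] });
      return { done: false, prefixes: [...prefixes] };
    }
  }
}
```

In a Lambda handler, the budget could be derived from `context.getRemainingTimeInMillis()` minus a safety margin. A real implementation would also need to keep the checkpoint under DynamoDB's 400 KB item size limit, for example by storing the discovered prefixes as separate items.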

Deploy fails outside of ap-southeast-2

It seems that it can't be deployed in regions other than ap-southeast-2 because the code S3 bucket is in that region. I'm going to try and copy the code to a bucket in us-east-1 and use the parameters to override this, but at the very least that process should be documented.
