
terraform-aws-snowflake-loader-ec2

A Terraform module which deploys the Snowplow Snowflake Loader on an EC2 node.

Telemetry

This module by default collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us - only very simple information about which modules and applications are deployed and active.

If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to a valid email address at which we can reach you.

How do I disable it?

To disable telemetry, simply set the variable telemetry_enabled = false.
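
As a minimal sketch (the module block is abbreviated and the email address is a placeholder), both telemetry-related settings are ordinary inputs on this module:

module "sf_loader" {
  source = "snowplow-devops/snowflake-loader-ec2/aws"

  # ... other required inputs ...

  # Opt out of telemetry collection entirely
  telemetry_enabled = false

  # Or leave telemetry on and share an email address for update and
  # security notices (placeholder value)
  # user_provided_id = "you@example.com"
}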

What are you collecting?

For details on what information is collected, please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry

Usage

The Snowflake Loader loads transformed events from an S3 bucket into Snowflake.

For more information on how it works, see this overview.

To configure Snowflake, please refer to the quick start guide.

Duration settings such as folder_monitoring_period or retry_period should be given in the documented duration format.
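
As a short illustration (abbreviated module block; the values shown are simply the module defaults), duration settings are plain strings of the form "<amount> <unit>":

module "sf_loader" {
  source = "snowplow-devops/snowflake-loader-ec2/aws"

  # ... other required inputs ...

  # Duration settings are human-readable strings
  folder_monitoring_period = "8 hours"
  retry_period             = "10 min"
  health_check_freq        = "1 hour"
}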

Example

Normally, this module would be used as part of our quick start guide. However, you can also use it standalone for a custom setup.

See the example below:

# Note: This should be the same bucket that is used by the transformer to produce data to load
module "s3_pipeline_bucket" {
  source = "snowplow-devops/s3-bucket/aws"

  bucket_name = "your-bucket-name"
}

# Note: This should be the same queue that is passed to the transformer to produce data to load
resource "aws_sqs_queue" "sf_message_queue" {
  content_based_deduplication = true
  kms_master_key_id           = "alias/aws/sqs"
  name                        = "sf-loader.fifo"
  fifo_queue                  = true
}

module "transformer_wrj" {
  source  = "snowplow-devops/transformer-kinesis-ec2/aws"

  name       = "transformer-server-wrj"
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  stream_name             = module.enriched_stream.name
  s3_bucket_name          = module.s3_pipeline_bucket.id
  s3_bucket_object_prefix = "transformed/good/widerow/json"
  window_period_min       = 1
  sqs_queue_name          = aws_sqs_queue.sf_message_queue.name

  transformation_type = "widerow"
  widerow_file_format = "json"

  ssh_key_name     = "your-key-name"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}

module "sf_loader" {
  source = "snowplow-devops/snowflake-loader-ec2/aws"

  name       = "sf-loader-server"
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  sqs_queue_name = aws_sqs_queue.sf_message_queue.name

  snowflake_loader_user        = "<USER>"
  snowflake_password           = "<PASSWORD>"
  snowflake_warehouse          = "<WAREHOUSE>"
  snowflake_database           = "<DATABASE>"
  snowflake_schema             = "<SCHEMA>"
  snowflake_region             = "<REGION>"
  snowflake_account            = "<ACCOUNT>"
  snowflake_aws_s3_bucket_name = module.s3_pipeline_bucket.id

  ssh_key_name     = "your-key-name"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}

Requirements

Name Version
terraform >= 1.0.0
aws >= 3.72.0

Providers

Name Version
aws >= 3.72.0

Modules

Name Source Version
instance_type_metrics snowplow-devops/ec2-instance-type-metrics/aws 0.1.2
service snowplow-devops/service-ec2/aws 0.2.0
telemetry snowplow-devops/telemetry/snowplow 0.4.0

Resources

Name Type
aws_cloudwatch_log_group.log_group resource
aws_iam_instance_profile.instance_profile resource
aws_iam_policy.iam_policy resource
aws_iam_policy.sts_credentials_policy resource
aws_iam_role.iam_role resource
aws_iam_role.sts_credentials_role resource
aws_iam_role_policy_attachment.policy_attachment resource
aws_iam_role_policy_attachment.sts_credentials_policy_attachement resource
aws_security_group.sg resource
aws_security_group_rule.egress_tcp_443 resource
aws_security_group_rule.egress_tcp_80 resource
aws_security_group_rule.egress_udp_123 resource
aws_security_group_rule.egress_udp_statsd resource
aws_security_group_rule.ingress_tcp_22 resource
aws_caller_identity.current data source
aws_iam_policy_document.sts_credentials_role data source
aws_region.current data source

Inputs

Name Description Type Default Required
name A name which will be prepended to the resources created string n/a yes
snowflake_account Snowflake account string n/a yes
snowflake_aws_s3_bucket_name AWS bucket name where data to load is stored string n/a yes
snowflake_database Snowflake database name string n/a yes
snowflake_loader_user Snowflake username used by loader to perform loading string n/a yes
snowflake_password Password for snowflake_loader_user used by loader to perform loading string n/a yes
snowflake_region Snowflake region string n/a yes
snowflake_schema Snowflake schema name string n/a yes
snowflake_warehouse Snowflake warehouse name string n/a yes
sqs_queue_name SQS queue name string n/a yes
ssh_key_name The name of the SSH key-pair to attach to all EC2 nodes deployed string n/a yes
subnet_ids The list of subnets to deploy Loader across list(string) n/a yes
vpc_id The VPC to deploy Loader within string n/a yes
amazon_linux_2_ami_id The AMI ID to use, which must be based on Amazon Linux 2; by default the latest community version is used string "" no
associate_public_ip_address Whether to assign a public IP address to this instance bool true no
cloudwatch_logs_enabled Whether application logs should be reported to CloudWatch bool true no
cloudwatch_logs_retention_days The length of time in days to retain logs for number 7 no
custom_iglu_resolvers The custom Iglu Resolvers that will be used by the Snowflake Loader
list(object({
name = string
priority = number
uri = string
api_key = string
vendor_prefixes = list(string)
}))
[] no
default_iglu_resolvers The default Iglu Resolvers that will be used by the Snowflake Loader
list(object({
name = string
priority = number
uri = string
api_key = string
vendor_prefixes = list(string)
}))
[
{
"api_key": "",
"name": "Iglu Central",
"priority": 10,
"uri": "http://iglucentral.com",
"vendor_prefixes": []
},
{
"api_key": "",
"name": "Iglu Central - Mirror 01",
"priority": 20,
"uri": "http://mirror01.iglucentral.com",
"vendor_prefixes": []
}
]
no
folder_monitoring_enabled Whether folder monitoring should be activated or not (see the sketch after this table) bool false no
folder_monitoring_period How often the folder should be checked by folder monitoring string "8 hours" no
folder_monitoring_since Specifies since when folder monitoring will check string "14 days" no
folder_monitoring_until Specifies until when folder monitoring will check string "6 hours" no
health_check_enabled Whether health check should be enabled or not bool false no
health_check_freq Frequency of health check string "1 hour" no
health_check_timeout How long to wait for a response to the health check query string "1 min" no
iam_permissions_boundary The permissions boundary ARN to set on IAM roles created string "" no
instance_type The instance type to use string "t3a.micro" no
java_opts Custom JAVA Options string "-Dorg.slf4j.simpleLogger.defaultLogLevel=info -XX:MinRAMPercentage=50 -XX:MaxRAMPercentage=75" no
retry_period How often a batch of failed folders should be pulled into the discovery queue string "10 min" no
retry_queue_enabled Whether retry queue should be enabled or not bool false no
retry_queue_interval Artificial pause after each failed folder is added to the queue string "10 min" no
retry_queue_max_attempt How many attempts to make for each folder number -1 no
retry_queue_size How many failures should be kept in memory number -1 no
sentry_dsn DSN for Sentry instance string "" no
sentry_enabled Whether Sentry should be enabled or not bool false no
snowflake_aws_s3_folder_monitoring_stage_url AWS bucket URL of folder monitoring stage - must be within 'snowflake_aws_s3_bucket_name' (NOTE: must be set if 'folder_monitoring_enabled' is true) string "" no
snowflake_aws_s3_folder_monitoring_transformer_output_stage_url AWS bucket URL of transformer output stage - must be within 'snowflake_aws_s3_bucket_name' (NOTE: must be set if 'folder_monitoring_enabled' is true) string "" no
sp_tracking_app_id App id for Snowplow tracking string "" no
sp_tracking_collector_url Collector URL for Snowplow tracking string "" no
sp_tracking_enabled Whether Snowplow tracking should be activated or not bool false no
ssh_ip_allowlist The list of CIDR ranges to allow SSH traffic from list(any) ["0.0.0.0/0"] no
statsd_enabled Whether Statsd should be enabled or not bool false no
statsd_host Hostname of StatsD server string "" no
statsd_port Port of StatsD server number 8125 no
stdout_metrics_enabled Whether logging metrics to stdout should be activated or not bool false no
tags The tags to append to this resource map(string) {} no
telemetry_enabled Whether or not to send telemetry information back to Snowplow Analytics Ltd bool true no
user_provided_id An optional unique identifier to identify the telemetry events emitted by this stack string "" no
webhook_collector URL of webhook collector string "" no
webhook_enabled Whether webhook should be enabled or not bool false no
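
As a sketch of how the folder monitoring inputs fit together (abbreviated module block; bucket name and prefixes are placeholders), both stage URLs must point inside snowflake_aws_s3_bucket_name whenever folder_monitoring_enabled is true:

module "sf_loader" {
  source = "snowplow-devops/snowflake-loader-ec2/aws"

  # ... other required inputs ...

  snowflake_aws_s3_bucket_name = "your-bucket-name"

  # Folder monitoring is disabled by default; when enabled, both stage
  # URLs below must live within the bucket named above (placeholder paths)
  folder_monitoring_enabled                                        = true
  snowflake_aws_s3_folder_monitoring_stage_url                     = "s3://your-bucket-name/monitoring"
  snowflake_aws_s3_folder_monitoring_transformer_output_stage_url  = "s3://your-bucket-name/transformed/good/widerow/json"
}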

Outputs

Name Description
asg_id ID of the ASG
asg_name Name of the ASG
sg_id ID of the security group attached to the Snowflake Loader servers

Copyright and license

The Terraform AWS Snowflake Loader on EC2 project is Copyright 2022-2023 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
