
terraform-aws-cloudbees-ci-eks-addon's Issues

[Blueprints, 02-at-scale]: Kube-Prometheus-Stack: Adding a Modern Dashboard to Exploit Node Exporter Data in K8s

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

What is the outcome that you are trying to reach?

Add modern dashboards to exploit Node Exporter metrics.

Describe the solution you would like

Add the mentioned dashboard to the Grafana Helm chart.

Describe alternatives you have considered


Additional context


[Blueprints, 03-dr] New blueprint for Disaster Recovery


What is the outcome that you are trying to reach?

Include a Disaster Recovery Blueprint here

Describe the solution you would like

Integrate the content from the referenced repository as a new blueprint in this repository. Write all the code in Terraform and add it to this workflow.

Describe alternatives you have considered

Additional context

[Blueprints, all] Using Shared Libraries


What is the outcome that you are trying to reach?

Adding Shared Libraries in the Monorepo including Pipeline Templates Catalogs like in

Describe the solution you would like

Describe alternatives you have considered

Additional context

[ci, all] Adding retry mechanism when terraform command fails


1/ Terragrunt comes with extra features that help to solve transient errors during pipeline execution, like:

It requires adding the tool to the Docker image, like:


RUN curl -sLO${TG_VERSION}/terragrunt_linux_${ARCH} && \
    mv terragrunt_linux_${ARCH} /usr/bin/terragrunt && \
    chmod +x /usr/bin/terragrunt

It also requires restructuring the code using terragrunt.hcl (see QuickStart), which seems very time-consuming.

2/ A Bash approach, as explained in this article, although it does not seem to be compatible with Terraform.
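For comparison with option 2, a generic retry wrapper in Bash could look like the sketch below. It is not the article's method. It assumes that a transient failure surfaces as a non-zero exit code and that re-running the same terraform command is safe, which is exactly the compatibility concern raised above (for example, a stale state lock or a partially applied resource). The `retry` helper and the attempt/delay values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry <max_attempts> <delay_seconds> <command...>
# Re-runs the command until it exits 0 or max_attempts is reached.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: '$*' failed after ${attempt} attempts" >&2
      return 1
    fi
    echo "retry: attempt ${attempt} of '$*' failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Hypothetical usage in the CI pipeline:
# retry 3 30 terraform apply -auto-approve
```

Because the command runs in the current shell, the wrapper also works with shell functions, not only with binaries such as terraform.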

[Blueprints, 02-at-scale]: Openldap: Use

Using the following configuration inside the EKS blueprints add-ons

  helm_releases = {
    helm-openldap = {
      namespace        = "openldap-stack-ha"
      create_namespace = true
      chart            = "openldap-stack-ha"
      chart_version    = "4.2.2"
      repository       = ""
      values           = [file("k8s/helm-openldap-values.yml")]
    }
  }

Values files


The Test LDAP Validation in Operations Center fails with:

Authentication: failed for user "Jean Dupond"
User lookup: user "Jean Dupond" may or may not exist.
Does the Manager DN have permissions to perform user lookup?
LDAP Group lookup: could not verify.
Please try with a user that is a member of at least one LDAP group.
The user "Jean Dupond" will be unable to login with the supplied password.
If this is your own account this would mean you would be locked out!
Are you sure you want to save this configuration?
Advanced Configuration

[Blueprints, 02-at-scale] Log Recorders select creationTime > `CREATION_IN_MILLISECONDS_EKS_ADDONS`

Validate that the Log Recorder only selects log streams whose creationTime is later than that timestamp, for example:

aws logs describe-log-streams --log-group-name /aws/containerinsights/cbci-bp02-eks/application --order-by LastEventTime --region us-east-1 --no-descending --query 'logStreams[?creationTime > `xxxxxxxxxx` ]' | jq .

Ref: aws/aws-cli#4227 (comment)
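The query above can be parameterized so that the pipeline passes the timestamp in instead of hard-coding it. This is a sketch, assuming the timestamp is captured in epoch milliseconds (the unit CloudWatch Logs uses for creationTime) once the EKS add-ons are created; `streams_created_after_query` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JMESPath filter for log streams created
# after the given epoch-milliseconds timestamp.
streams_created_after_query() {
  local since_ms="$1"
  # JMESPath numeric literals are wrapped in backticks.
  printf 'logStreams[?creationTime > `%s`]' "$since_ms"
}

# Hypothetical usage: record the timestamp right after the add-ons are created.
# since_ms=$(( $(date +%s) * 1000 ))
# aws logs describe-log-streams \
#   --log-group-name /aws/containerinsights/cbci-bp02-eks/application \
#   --order-by LastEventTime --region us-east-1 --no-descending \
#   --query "$(streams_created_after_query "$since_ms")" | jq .
```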

Terraform implementation

[CI, CloudBees platform] Replace Credentials by Configuring CloudBees OIDC with AWS


What is the outcome that you are trying to reach?

Replace credentials by Configuring CloudBees OIDC with AWS

See CloudBees Internal Doc:

Describe the solution you would like

Describe alternatives you have considered

Additional context

[Blueprints, 02-at-scale] ALB stickiness enabled on OC

Annotations: stickiness.enabled=true
seems odd. We automatically set this attribute on ALB ingresses for HA MCs, because it is a requirement for HA generally that the load balancer be configured with sticky sessions. But that is only because there are multiple backends (replicas) to the service. Why would it need to be set on OC, which does not support multiple replicas?

[CI] Run blueprints only when tf files are updated

Community Note

What is the outcome that you are trying to reach?

Run the blueprint pipelines ONLY IF:

  • The files from the main module are updated
  • The files from the same blueprint are updated

Describe the solution you would like

modified_tf_files=$(git show --name-only --oneline HEAD | tail -n +2 | grep '\.tf$')

if [ -n "$modified_tf_files" ]; then
  : # Run terraform phase
fi

It also needs to distinguish where the modification happened: at the root, in bp01 or in bp02.
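To distinguish where the modification happened, the idea could be extended along these lines. This is a sketch, assuming blueprints live under blueprints/<name>/ (for example blueprints/02-at-scale) and that any other .tf file belongs to the main module; `affected_targets` is a hypothetical helper. It prints `root` when the main module changed (meaning every blueprint pipeline should run), plus the name of each blueprint whose .tf files changed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read changed paths (one per line) on stdin and print
# each affected target once: "root" or a blueprint directory name.
affected_targets() {
  grep '\.tf$' | while IFS= read -r path; do
    case "$path" in
      blueprints/*/*)
        path="${path#blueprints/}"   # strip the leading "blueprints/"
        echo "${path%%/*}"           # keep the blueprint directory name
        ;;
      *)
        echo "root"
        ;;
    esac
  done | LC_ALL=C sort -u
}

# Hypothetical usage in CI:
# git show --name-only --pretty=format: HEAD | affected_targets
```

Non-Terraform files (README, Helm values, CasC bundles) are ignored, matching the "only tf files" rule in the title.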

Describe alternatives you have considered

No other alternatives yet

Additional context

No additional context

[Blueprints, 02-at-scale] EKS Blueprints Node Termination Handler add-on fails

I would like to use the Node Termination Handler for the at-scale blueprint, but I am getting the following error:

│ Error: creating IAM Policy (aws-node-termination-handler-20231218170201623700000007): MalformedPolicyDocument: Policy statement must contain resources.
│       status code: 400, request id: cc751055-d447-4206-9046-0afc3546c91c
│   with module.eks_blueprints_addons.module.aws_node_termination_handler.aws_iam_policy.this[0],
│   on .terraform/modules/eks_blueprints_addons.aws_node_termination_handler/ line 242, in resource "aws_iam_policy" "this":
│  242: resource "aws_iam_policy" "this" {

The pod is in RUNNING status, and the logs show the following:

2023/12/18 17:07:50 WRN There was a problem monitoring for events error="AccessDenied: User: arn:aws:sts::324005994172:assumed-role/aws-node-termination-handler-20231218170135041900000006/1702918930359930054 is not authorized to perform: sqs:receivemessage on resource: arn:aws:sqs:us-east-1:324005994172:aws-nth-cbci-bp02-i318-eks because no identity-based policy allows the sqs:receivemessage action\n\tstatus code: 403, request id: 8b956325-4f0e-5082-8761-3edf31a835b4" event_type=SQS_MONITOR

[CI] Terraform Randomly fails with creating KMS Alias (alias/eks/cbci-bpxx-ci-xx-eks): AlreadyExistsException


From time to time, the terraform command fails with

Error: creating KMS Alias (alias/eks/cbci-bp01-ci-v2-eks): AlreadyExistsException: An alias with the name arn:aws:kms:us-east-1:324005994172:alias/eks/cbci-bp01-ci-v2-eks already exists

with module.eks.module.kms.aws_kms_alias.this["cluster"],
on .terraform/modules/eks.kms/ line 255, in resource "aws_kms_alias" "this":
255: resource "aws_kms_alias" "this" {

It only happens in the CI pipeline, which uses S3 as the backend.

If your request is for a new feature, please use the Feature request template.

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (! ONLY if state is stored remotely, which hopefully you are following that best practice!): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists


  • Module version [Required]:

  • Terraform version:

  • Provider version(s):

Reproduction Code [Required]

Steps to reproduce the behavior:

Expected behavior

Actual behavior

Terminal Output Screenshot(s)

Additional context

It is similar to what is explained in the linked reference. There are two hypotheses for this behaviour:

  • the KMS key is in the Pending Deletion state (not fully removed, so it still exists with the same name/path/ARN)
  • the state file was not updated correctly (it might have been overwritten by an older version, or it might have failed to be updated despite Terraform applying the changes correctly, ...)
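To check the first hypothesis, one option is to ask KMS for the state of the key behind the alias; a state of `PendingDeletion` would support it. This is a minimal sketch, assuming the AWS CLI is configured for the same account as the pipeline; the helper names and the default region are hypothetical (the region is taken from the ARN in the error).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the KeyState of the KMS key an alias points to.
kms_alias_key_state() {
  local alias_name="$1" region="${2:-us-east-1}"
  aws kms describe-key --key-id "$alias_name" --region "$region" \
    --query 'KeyMetadata.KeyState' --output text
}

# Hypothetical helper: succeed if the alias points to a key scheduled for deletion.
is_stale_alias() {
  [ "$(kms_alias_key_state "$@")" = "PendingDeletion" ]
}

# Hypothetical usage before `terraform apply`:
# if is_stale_alias alias/eks/cbci-bp01-ci-v2-eks; then
#   echo "alias points to a key pending deletion" >&2
# fi
```

The check is diagnostic only; how to clean up a stale alias is left to the pipeline owners.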

[Blueprints, 02-at-scale] Replace Instance Profile by Pod Identities

[Blueprints, 02-at-scale]: EKS 1.28 fails


Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration. The reproduction MUST be executable by running terraform init && terraform apply without any further changes.



While upgrading to 3.17108.0

Reproduction Code [Required]

Set the K8s version to 1.28 for Blueprint 02.

Expected behavior

The cluster comes up without issues.

Actual behavior

Node groups using Graviton do not complete the creation process. They pick up an AMI type for 1.27.


Terminal Output Screenshot(s)

It never finishes; it keeps waiting.

Additional context

[Blueprints, 02-at-scale] Adding Windows Node Pool for Agents

[Blueprints, 02-at-scale]: Casc: Install Jenkins Health Advisor


What is the outcome that you are trying to reach?

Install it as an at-scale plugin.
Use Kubernetes Secrets to configure the plugin with the same email address as the license activation.
This will increase the telemetry data on usage.

Describe the solution you would like

Describe alternatives you have considered

Additional context

[Doc] Documented simplest example contains wrong variable names


The currently documented simple example uses the variables hostname and temp_license:

module "eks_blueprints_addon_cbci" {
  source = "REPLACE_ME"

  hostname     = ""
  cert_arn     = "arn:aws:acm:us-east-1:0000000:certificate/0000000-aaaa-bbb-ccc-thisIsAnExample"
  temp_license = {
    first_name = "Foo"
    last_name  = "Bar"
    email      = "[email protected]"
    company    = "Acme Inc."
  }
}


But the variable names are actually hosted_zone and trial_license.

[Blueprints, all] Load balancer not being deleted causes `tf destroy` to fail

This could be a race condition.

│ Warning: EC2 Default Network ACL (acl-0bb1c751ce6d3b468) not deleted, removing from state
│ Error: deleting EC2 Subnet (subnet-0d21d5729923852e9): DependencyViolation: The subnet 'subnet-0d21d5729923852e9' has dependencies and cannot be deleted.
│ 	status code: 400, request id: f381ee78-69dc-4c57-940b-f3aac1ad945f
│ Error: deleting EC2 Subnet (subnet-03d24ee9f4bd6e5be): DependencyViolation: The subnet 'subnet-03d24ee9f4bd6e5be' has dependencies and cannot be deleted.
│ 	status code: 400, request id: 8dbec214-976f-417e-ae73-eeca60abac76
│ Error: deleting EC2 Subnet (subnet-08b859259dae46484): DependencyViolation: The subnet 'subnet-08b859259dae46484' has dependencies and cannot be deleted.
│ 	status code: 400, request id: 349641d0-3c79-431a-9171-673fc1ed45c5
│ Error: deleting EC2 Internet Gateway (igw-0e686cb558c91368b): detaching EC2 Internet Gateway (igw-0e686cb558c91368b) from VPC (vpc-018c607b6927cd288): DependencyViolation: Network vpc-018c607b6927cd288 has some mapped public address(es). Please unmap those public address(es) before detaching the gateway.
│ 	status code: 400, request id: e9835b5c-ea6e-4d94-a0c5-1771e8820b47
│ Error: uninstallation completed with 1 error(s): context deadline exceeded

After deleting the load balancer, tf destroy fails with:

│ Error: deleting EC2 VPC (vpc-018c607b6927cd288): operation error EC2: DeleteVpc, https response error StatusCode: 400, RequestID: c73aca26-db62-4cb2-bd82-ee1cda52a579, api error DependencyViolation: The vpc 'vpc-018c607b6927cd288' has dependencies and cannot be deleted.

The dependencies are two security groups:

  • [k8s] Shared Backend SecurityGroup for LoadBalancer
  • [k8s] Managed SecurityGroup for LoadBalancer

After those are deleted, tf destroy worked.
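The manual cleanup described above could be scripted roughly as follows. This is a sketch, assuming the leftover groups are exactly those whose description starts with `[k8s]` (as in the two descriptions listed above) and that it runs only after the load balancers themselves are gone; the helper names are hypothetical and the VPC ID in the usage comes from the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list IDs of security groups in the VPC whose
# description starts with "[k8s]" (AWS filters support the * wildcard).
leftover_lb_security_groups() {
  local vpc_id="$1"
  aws ec2 describe-security-groups \
    --filters "Name=vpc-id,Values=${vpc_id}" "Name=description,Values=[k8s]*" \
    --query 'SecurityGroups[].GroupId' --output text
}

# Hypothetical helper: delete each leftover group found above.
delete_leftover_lb_security_groups() {
  local sg
  # Intentionally unquoted: --output text separates IDs with tabs.
  for sg in $(leftover_lb_security_groups "$1"); do
    echo "deleting ${sg}" >&2
    aws ec2 delete-security-group --group-id "$sg"
  done
}

# Hypothetical usage before re-running tf destroy:
# delete_leftover_lb_security_groups vpc-018c607b6927cd288
```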

[Blueprints, 02-at-scale]: Casc: Credential Example


What is the outcome that you are trying to reach?

  • Include a credential definition in the OC CasC bundle using Kubernetes Secrets
  • Add Kubernetes Secrets to the K8s diagram for the at-scale blueprint

Describe the solution you would like

Describe alternatives you have considered

Additional context

[Blueprints, 02-at-scale]: Using Fargate Profile for Agent Nodes + Karpenter


What is the outcome that you are trying to reach?

Combine in the same EKS cluster (see example):

  • MNG + Autoscaler
  • Fargate + Karpenter (especially for Linux Spot Instances), only for agents

Fargate References:

Describe the solution you would like

Describe alternatives you have considered

Additional context

[Blueprints, 02-at-scale]: Velero: Complete Recommendation from CloudBees


What is the outcome that you are trying to reach?

  • Use the OSS inject-metadata-velero-plugin to automatically set the RESTORED_FROM_BACKUP environment variable during a restore operation.
  • For information on restarting builds after a restore, refer to Restarting builds after a restore.
  • When creating a backup and a schedule, include --snapshot-volumes.
  • For backing up the cluster-scoped resources, use --include-cluster-resources.
  • When creating a restore, use --restore-volumes.
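The recommended flags map onto the Velero CLI roughly as shown below. This is a minimal sketch: the helper names and the cron expression in the usage comment are hypothetical, and the inject-metadata-velero-plugin setup is not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the Velero CLI with the recommended flags.

# Backup: snapshot volumes and include cluster-scoped resources.
cbci_backup() {
  velero backup create "$1" --snapshot-volumes --include-cluster-resources
}

# Schedule: same flags as the backup; $2 is a cron expression.
cbci_schedule() {
  velero schedule create "$1" --schedule "$2" \
    --snapshot-volumes --include-cluster-resources
}

# Restore: restore the volumes from the given backup.
cbci_restore() {
  velero restore create --from-backup "$1" --restore-volumes
}

# Hypothetical usage:
# cbci_schedule cbci-every-6h "0 */6 * * *"
# cbci_restore <backup-name>
```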

Describe the solution you would like

Describe alternatives you have considered

Additional context

[Blueprints, all] Tag EBS volumes

  enable_amazon_eks_aws_ebs_csi_driver = true
  amazon_eks_aws_ebs_csi_driver_config = {
    configuration_values = jsonencode({
      controller = {
        extraVolumeTags = {
          cb-environment = "demo"
          cb-owner       = "devops-consultants"
          cb-user        = local.derived_user
        }
      }
    })
  }

[Blueprints, 02-at-scale] KMS Encryption

[CI, GHA] Terraform.yaml


What is the outcome that you are trying to reach?

Remove continue-on-error: true for terraform fmt

Describe the solution you would like

Describe alternatives you have considered

Additional context

[Blueprints, 02-at-scale] Set Topology for 1 AZ in Storage Class

There are two options to prevent possible node affinity conflicts during controller restarts when using EBS volumes: make the volumes topology-aware so they stay in the same AZ, or design the Auto Scaling Groups as explained in the AWS article Creating Kubernetes Auto Scaling Groups for Multiple Availability Zones (one ASG per AZ for EBS volumes and a single ASG spanning multiple AZs for EFS volumes). At the time of publishing this blueprint, terraform-aws-modules/eks/aws does not support the availability_zones attribute for the embedded aws_autoscaling_group resource, so the first option is applied in the gp3 Storage Class.

Setting the topology in the gp3 Storage Class alone is not enough: the Pods must also be assigned to a single AZ.
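A single-AZ Storage Class for the EBS CSI driver could be rendered as below. This is a sketch, assuming the gp3 volume type and the driver's `topology.ebs.csi.aws.com/zone` topology key; the helper name and the `gp3-<zone>` class name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a StorageClass that only provisions gp3
# volumes in the given AZ.
render_single_az_gp3_storage_class() {
  local zone="$1"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-${zone}
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values:
          - ${zone}
EOF
}

# Hypothetical usage:
# render_single_az_gp3_storage_class us-east-1a | kubectl apply -f -
```

For the Pod side, a nodeSelector on the well-known label `topology.kubernetes.io/zone` set to the same AZ is one way to keep controllers in that zone.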

[Blueprints, 01-getting-started] Tag Invalid for_each argument

On the first apply, I encountered this error:

│ Error: Invalid for_each argument
│   on .terraform/modules/eks/ line 97, in resource "aws_ec2_tag" "cluster_primary_security_group":
│   97:   for_each = { for k, v in merge(var.tags, var.cluster_tags) :
│   98:     k => v if local.create && k != "Name" && var.create_cluster_primary_security_group_tags && v != null
│   99:   }
│     ├────────────────
│     │ local.create is true
│     │ var.cluster_tags is empty map of string
│     │ var.create_cluster_primary_security_group_tags is true
│     │ var.tags is map of string with 2 elements
│ The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of keys that will
│ identify the instances of this resource.
│ When working with unknown values in for_each, it's better to define the map keys statically in your configuration and place apply-time results only in the map values.
│ Alternatively, you could use the -target planning option to first apply only the resources that the for_each value depends on, and then apply a second time to fully converge.

I noticed that the tags map is built in locals:

locals {
  name   = "cbci-bp01-i${random_integer.ramdom_id.result}"
  tags = merge(var.tags, {
    "tf:blueprint"  =
    "tf:repository" = ""

During the first apply, the value of random_integer.ramdom_id.result is not yet known, which triggers the error above.
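Until the tags no longer depend on a value that is unknown at plan time, the workaround the error message itself suggests (apply the dependency with -target first, then run a full apply) could be wrapped as below. This is a sketch: `two_phase_apply` is a hypothetical helper, the resource address `random_integer.ramdom_id` is taken as-is from the locals above, and -auto-approve assumes a non-interactive CI run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first create only the random id that the tags depend
# on, then converge the rest of the configuration.
two_phase_apply() {
  terraform apply -auto-approve -target=random_integer.ramdom_id &&
    terraform apply -auto-approve
}
```

The second apply only runs if the targeted one succeeds, so a failure in the first phase stops the pipeline early.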
