managedkube / kubernetes-ops
Running Kubernetes in production
License: Apache License 2.0
How to monitor and alert on k8s audit logs?
Simple example where policy docs can be defined in the repo and a role created/applied for specified pods to access an S3 bucket. Similar to the EFS example: https://github.com/ManagedKube/kubernetes-ops/blob/main/terraform-modules/aws/eks-efs-csi-driver/main.tf#L45
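A rough sketch of the shape this could take, mirroring the EFS example linked above (the bucket name, namespace, and service account below are placeholders, and the exact inputs would come from the module's variables):

```hcl
# Policy document defined in the repo; the bucket ARN is a placeholder.
data "aws_iam_policy_document" "s3_access" {
  statement {
    actions = ["s3:ListBucket", "s3:GetObject", "s3:PutObject"]
    resources = [
      "arn:aws:s3:::my-app-bucket",
      "arn:aws:s3:::my-app-bucket/*",
    ]
  }
}

resource "aws_iam_policy" "s3_access" {
  name   = "eks-pod-s3-access"
  policy = data.aws_iam_policy_document.s3_access.json
}

# IRSA role bound to a specific service account, so only the specified
# pods can assume it and reach the bucket.
module "iam_assumable_role_s3" {
  source           = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  create_role      = true
  role_name        = "eks-pod-s3-access"
  provider_url     = replace(var.eks_cluster_oidc_issuer_url, "https://", "")
  role_policy_arns = [aws_iam_policy.s3_access.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:my-namespace:my-service-account"]
}
```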
Hi, I tried to upgrade EKS Kubernetes to 1.22. The autoscaler and cert-manager were timing out, but after updating the aws and helm Terraform providers to their latest versions and changing the API version from alpha to beta, they started working:
api_version = "client.authentication.k8s.io/v1beta1"
Now I'm stuck with the NGINX ingress, possibly because of the announced ingress-nginx changes:
apiVersion: networking.k8s.io/v1
Here is the error I receive:
module.ingress-nginx-external.helm_release.helm_chart: Destroying... [id=ingress-nginx]
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 10s elapsed]
...
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 1m20s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Destruction complete after 1m28s
module.ingress-nginx-external.helm_release.helm_chart: Creating...
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [10s elapsed]
...
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [5m30s elapsed]
╷
│ Warning: Helm release "ingress-nginx" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│
│ with module.ingress-nginx-external.helm_release.helm_chart,
│ on .terraform/modules/ingress-nginx-external/terraform-modules/aws/helm/helm_generic/main.tf line 1, in resource "helm_release" "helm_chart":
│ 1: resource "helm_release" "helm_chart" {
│
╵
Any suggestions? If it is just a matter of changing the API version, how can I do that?
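If the failure is due to the Ingress API moving to networking.k8s.io/v1, the usual remedy is to move to a 4.x release of the ingress-nginx chart, which targets the v1 API and supports Kubernetes 1.22. A sketch only — the variable name helm_version is an assumption based on the helm_generic module in this repo, and the exact chart version should be checked against the ingress-nginx release notes:

```hcl
# Assumption: the ingress-nginx module passes this through to the
# helm_release "version" argument, as the helm_generic module does.
module "ingress-nginx-external" {
  source       = "..."   # unchanged from your existing config
  helm_version = "4.1.4" # a 4.x chart release; 3.x charts predate the v1 Ingress API
  # ...other existing inputs unchanged...
}
```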
Make it easy to see how to set up ingress using:
external ELB
internal ELB
external NLB
internal NLB
external/internal ALB
Graviton instances are based on the 64-bit ARM architecture and offer a great price/performance ratio.
I tried adding a new node group (ng2) for Graviton instances:
node_groups = {
ng1 = {
disk_size = 20
desired_capacity = 2
max_capacity = 4
min_capacity = 1
instance_types = ["t3.small"]
capacity_type = "SPOT"
additional_tags = var.tags
k8s_labels = {}
}
ng2 = {
disk_size = 20
desired_capacity = 1
max_capacity = 4
min_capacity = 1
instance_types = ["t4g.small"]
capacity_type = "SPOT"
additional_tags = var.tags
k8s_labels = {}
}
}
Applying the Terraform code results in an error. The error message shows it tries to use the x86 Amazon Linux 2 AMI, which is not valid, since t4g instances need the ARM64 AMI:
│ Error: error creating EKS Node Group (staging:staging-ng2-enhanced-grubworm): InvalidParameterException: [t4g.small] is not a valid instance type for requested amiType AL2_x86_64
│ {
│ RespMetadata: {
│ StatusCode: 400,
│ RequestID: "73318df5-e6c3-4e1e-ad3b-7b209bc182f6"
│ },
│ ClusterName: "staging",
│ Message_: "[t4g.small] is not a valid instance type for requested amiType AL2_x86_64",
│ NodegroupName: "staging-ng2-enhanced-grubworm"
│ }
│
│ with module.eks.module.eks.module.node_groups.aws_eks_node_group.workers["ng2"],
│ on .terraform/modules/eks.eks/modules/node_groups/node_groups.tf line 1, in resource "aws_eks_node_group" "workers":
│ 1: resource "aws_eks_node_group" "workers" {
│
Thank you!
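A likely fix, assuming the module passes ami_type through per node group (the upstream terraform-aws-eks node_groups submodule does): request the ARM64 AMI for ng2 explicitly.

```hcl
ng2 = {
  disk_size        = 20
  desired_capacity = 1
  max_capacity     = 4
  min_capacity     = 1
  instance_types   = ["t4g.small"]
  capacity_type    = "SPOT"
  ami_type         = "AL2_ARM_64" # Graviton (ARM64) AMI instead of the AL2_x86_64 default
  additional_tags  = var.tags
  k8s_labels       = {}
}
```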
Turn this into a module for AWS so it can get the ACM cert. Make this optional for people who want to use ACM instead of cert-manager.
Could be interesting to use this for ssh access into private VM instances
https://cloud.google.com/solutions/building-internet-connectivity-for-private-vms
Can we make these clusters and processes SOC 2 compliant from the start, and produce evidence that the infrastructure and process are compliant?
A guide to being compliant:
https://pages.datree.io/hubfs/SOC2-compliance-Git-guide-Datree.pdf
Are there open source tools out there to help us here?
Components like Prometheus do not ship with authentication. We can use oauth2-proxy for authentication:
https://github.com/helm/charts/tree/master/stable/oauth2-proxy
How to monitor and dashboard VPC flow logs?
Create a module to run containers on Fargate within the cluster
This project is very interesting in that it has a bunch of GitHub Actions for TF:
https://github.com/dflook/terraform-github-actions
The linter apparently posts a comment at the line number that had the offense: https://github.com/dflook/terraform-github-actions#linting
https://github.com/dflook/terraform-github-actions#checking-for-drift
Add a multi_az variable to the module so that it can be false for dev environments.
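A sketch of what that could look like (the variable and local names here are assumptions; the module's actual AZ input may differ):

```hcl
# Hypothetical variable for the VPC module.
variable "multi_az" {
  description = "Deploy subnets/NAT across multiple AZs; set to false for dev environments"
  type        = bool
  default     = true
}

variable "availability_zones" {
  description = "AZs available to this environment"
  type        = list(string)
}

# When multi_az is false, collapse down to a single AZ to save cost.
locals {
  effective_azs = var.multi_az ? var.availability_zones : [var.availability_zones[0]]
}
```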
This is pretty cool. Should set this up.
We should make these clusters secure by default and have reasonable security measures in place from the start.
For example, this analysis is a good start on what we should do and enable: https://blog.cloudsploit.com/a-technical-analysis-of-the-capital-one-hack-a9b43d7c8aea
For AWS:
Move this module out to its own repository: https://github.com/ManagedKube/kubernetes-ops/tree/main/terraform-modules/aws/vpc
This will allow us to more cleanly update this module's lifecycle and provide clear releases for it.
Todo
This looks like a very robust and flexible logging-operator that can handle various scenarios, like routing the logs from each namespace to a different logging backend (ES, Loki, S3, etc.).
https://github.com/banzaicloud/logging-operator
Add this into the kubernetes-ops and operationalize it.
Show how to use dynamic secrets and create a production workflow for MySQL RDS.
https://github.com/falcosecurity/falco
k8s audit log config: https://github.com/falcosecurity/falco/tree/dev/examples/k8s_audit_config
Build Kibana dashboards from this data?
Send Prometheus remote_write data to ES
We can probably replicate SumoLogic's Kubernetes offering.
HashiCorp has released their version of the Helm chart. Is this better than the stable/vault-operator Helm chart by CoreOS?
https://github.com/hashicorp/vault-helm
We should investigate.
Has anyone had this issue on AWS EKS Kubernetes 1.23 in the staging environment?
kubernetes-ops/terraform-environments/aws/staging/helm/external-dns
kubernetes-ops/terraform-environments/aws/staging/helm/ingress-nginx-external
Elasticsearch has an operator for Kubernetes that looks very interesting.
https://www.elastic.co/guide/en/cloud-on-k8s/current/index.html
This seems to be a widely asked question that I get.
How do we get traffic into the cluster?
How do the containers inside find other containers like consul/redis/mysql?
Use draw.io to create a doc with diagrams explaining how traffic gets in. Also check in the draw.io XML file.
This would be interesting to look at if most people are on Github anyways.
We would like to add Digital Ocean to the kubernetes-ops so that it can be managed in the same way as AWS.
We would like the same "ease" of management for Digital Ocean.
We should use this to check for drift on a continuous basis: https://github.com/dflook/terraform-github-actions#checking-for-drift
Then it should update a markdown page with the status on every run.
This would be cool because then we know what the status of the cluster is.
Add the k8s spot rescheduler usage into kubernetes-ops. This can be a very good tool to use when spot instances are used.
From Craig
Wondering about installation order - it was easier for me to go
ingress-nginx
external-dns
cert-manager
THEN
kube-prometheus-stack
grafana-loki-stack
my-application
Keep the numbering scheme the same in the repos but make a note of the order, or just leave it alone? I changed the docs back to the original, but maybe we just change kube-prometheus-stack to 55 or 60 and grafana-loki to 56 or 61.
Enable ModSecurity by default
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#lua-resty-waf
Version:
Terraform cli: 1.3.2
kubernetes-ops release: v2.0.47
module: eks
Issue:
When trying to run a plan, it errors out with error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.
I've updated my VPC to match the release versions, and plans for other modules in the same tenant are working (credentials are shared across, so it shouldn't be a credentials issue). I do see a call to the metadata API IP that is getting a connection refused. I don't think I have anything 'incorrect', but I can't seem to get it to work either.
Full Error message:
Terraform v1.3.2
on linux_amd64
Initializing plugins and modules...
data.terraform_remote_state.vpc: Reading...
data.terraform_remote_state.vpc: Read complete after 1s
╷
│ Warning: Redundant ignore_changes element
│
│ on .terraform/modules/eks.eks/main.tf line 305, in resource "aws_eks_addon" "this":
│ 305: resource "aws_eks_addon" "this" {
│
│ Adding an attribute name to ignore_changes tells Terraform to ignore future
│ changes to the argument in configuration after the object has been created,
│ retaining the value originally configured.
│
│ The attribute modified_at is decided by the provider alone and therefore
│ there can be no configured value to compare with. Including this attribute
│ in ignore_changes has no effect. Remove the attribute from ignore_changes
│ to quiet this warning.
╵
╷
│ Error: error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.
│
│ Please see https://registry.terraform.io/providers/hashicorp/aws
│ for more information about providing credentials.
│
│ Error: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, request send failed, Get "http://169.254.169.254/latest/meta-data/iam/security-credentials/": dial tcp 169.254.169.254:80: i/o timeout
│
│
│ with provider["registry.terraform.io/hashicorp/aws"],
│ on main.tf line 35, in provider "aws":
│ 35: provider "aws" {
│
╵
Operation failed: failed running terraform plan (exit 1)
This seems like a good tool for network sniffing in Kubernetes. We should investigate its usage and write a doc or link to this blog here.
For metrics-server, is there a different way it should be added to the cluster than just running
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
I'm wondering how the state of that is tracked in TF, if at all. This was needed for HPA to work properly when I deployed my app.
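One way to have Terraform track metrics-server state, rather than applying the raw manifest with kubectl, is a helm_release of the community metrics-server chart. A sketch only — it assumes a helm provider is already configured against the cluster, and uses the community chart repository rather than anything this repo ships:

```hcl
# Community metrics-server chart, managed by Terraform instead of
# `kubectl apply`, so drift and upgrades show up in plans.
resource "helm_release" "metrics_server" {
  name       = "metrics-server"
  repository = "https://kubernetes-sigs.github.io/metrics-server/"
  chart      = "metrics-server"
  namespace  = "kube-system"
}
```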
EKS autoscaler
config:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.37.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.5.0"
    }
  }
}

module "cluster-autoscaler" {
  source = "github.com/ManagedKube/kubernetes-ops//terraform-modules/aws/cluster-autoscaler?ref=v2.0.82"
}
error:
Error: Unsupported attribute
│
│ on main.tf line 58, in provider "helm":
│ 58: host = data.terraform_remote_state.eks.outputs.cluster_endpoint
│ ├────────────────
│ │ data.terraform_remote_state.eks.outputs is object with no attributes
│
│ This object does not have an attribute named "cluster_endpoint".
╵
╷
│ Error: Unsupported attribute
│
│ on main.tf line 59, in provider "helm":
│ 59: cluster_ca_certificate = base64decode(data.terraform_remote_state.eks.outputs.cluster_certificate_authority_data)
│ ├────────────────
│ │ data.terraform_remote_state.eks.outputs is object with no attributes
│
│ This object does not have an attribute named
│ "cluster_certificate_authority_data".
╵
╷
│ Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
│
│ with module.cluster-autoscaler.module.cluster-autoscaler.helm_release.helm_chart,
│ on .terraform/modules/cluster-autoscaler.cluster-autoscaler/terraform-modules/aws/helm/helm_generic/main.tf line 1, in resource "helm_release" "helm_chart":
│ 1: resource "helm_release" "helm_chart" {
│
╵
Operation failed: failed running terraform apply (exit 1)
kube-prometheus-stack
config:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.37.0"
    }
    random = {
      source = "hashicorp/random"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.5.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.7.0"
    }
  }
}

module "kube-prometheus-stack" {
  source = "github.com/ManagedKube/kubernetes-ops//terraform-modules/aws/helm/kube-prometheus-stack?ref=v2.0.82"
}
error:
Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
│
│ with module.kube-prometheus-stack.helm_release.helm_chart,
│ on .terraform/modules/kube-prometheus-stack/terraform-modules/aws/helm/kube-prometheus-stack/main.tf line 1, in resource "helm_release" "helm_chart":
│ 1: resource "helm_release" "helm_chart" {
│
╵
Operation failed: failed running terraform apply (exit 1)
It would be nice to have the cost of the TF plan/apply in the PRs.
We can do this via Infracost: https://github.com/infracost/infracost-gh-action
Can we do it with just their open source version, without having to sign up and get an API key?
An interesting way to secure an internal application that doesn't have its own auth, like Prometheus.
Do you already have something to copy containers from the dev ECR to stage, then stage to prod? Basically, the container gets built and tested in dev; once it's good, there's no need to touch it, just move the container into the other repos for deployment.
https://seethatgo.slack.com/archives/C023NPLHCJD/p1628008537040700
"Secrets and service discovery are always something I'm interested in seeing how people approach with containers/kubernetes. From what I can tell, it looks like you're using vault for secrets (love hashicorp) and standard kube services with selectors for discovery. All in all, pretty solid."
Security scanners flag a security group that can pass all inbound or outbound traffic as a hole: the default security group, even though it is not used, allows all traffic in and out of the VPC.
We should make sure the VPC module of this project creates a locked-down default security group by default.
vpc module: https://github.com/ManagedKube/kubernetes-ops/tree/main/terraform-modules/aws/vpc
On an apply, there should be a set of E2E tests that run to make sure everything still works.
https://github.com/ManagedKube/kubernetes-ops/actions/runs/1321582892
Something simple like hitting Grafana's URL is fine.
There will be times when there is a need to connect two clusters together, possibly for DR purposes. This looks like an interesting way of doing it:
Prototype out this operator. Looks to be very complete on ops aspects.
https://github.com/banzaicloud/kafka-operator/blob/master/README.md
In most situations Karpenter will give more optimal cluster autoscaling than CA. Source: https://towardsdev.com/karpenter-vs-cluster-autoscaler-dd877b91629b
The current AWS VPC Terraform is using version 0.11.x. Using the newer version gets us a good path forward and some more parameterization use cases with Terragrunt that will make using everything easier.
This seems like a good tool to simulate traffic:
https://github.com/BuoyantIO/bb
This can help with testing things out.
In the kube-prometheus-stack TF module - https://github.com/ManagedKube/kubernetes-ops/tree/main/terraform-modules/aws/helm/kube-prometheus-stack
There is a syntax issue that causes Terraform to fail if the module is included as a dependency in another module, i.e.:
module "kube-prometheus-stack" {
source = "github.com/ManagedKube/kubernetes-ops/terraform-modules/aws/helm/kube-prometheus-stack"
helm_values = file("${path.module}/values.yaml")
depends_on = [
data.terraform_remote_state.eks
]
}
The error that occurs is this:
Waiting for the plan to start...
Terraform v1.2.6
on linux_amd64
Initializing plugins and modules...
╷
│ Error: Invalid function argument
│
│ on .terraform/modules/kube-prometheus-stack/terraform-modules/aws/helm/kube-prometheus-stack/main.tf line 17, in resource "helm_release" "helm_chart":
│ 17: templatefile("./values_local.yaml", {
│ 18: enable_grafana_aws_role = var.enable_iam_assumable_role_grafana
│ 19: aws_account_id = var.aws_account_id
│ 20: role_name = local.k8s_service_account_name
│ 21: }),
│
│ Invalid value for "path" parameter: no file exists at
│ "./values_local.yaml"; this function works only with files that are
│ distributed as part of the configuration source code, so if this file will
│ be created by a resource in this configuration you must instead obtain this
│ result from an attribute of that resource.
╵
Operation failed: failed running terraform plan (exit 1)
This is due to this line of code: https://github.com/ManagedKube/kubernetes-ops/blob/main/terraform-modules/aws/helm/kube-prometheus-stack/main.tf#L17
resource "helm_release" "helm_chart" {
chart = "kube-prometheus-stack"
namespace = var.namespace
create_namespace = "true"
name = var.chart_name
version = var.helm_version
verify = var.verify
repository = "https://prometheus-community.github.io/helm-charts"
values = [
# templatefile("${path.module}/values.yaml", {
--> templatefile("./values_local.yaml", {
enable_grafana_aws_role = var.enable_iam_assumable_role_grafana
aws_account_id = var.aws_account_id
role_name = local.k8s_service_account_name
}),
var.helm_values,
]
}
Changing this line to:
templatefile("${path.module}/values_local.yaml", {
fixes the issue.
How can we incorporate gVisor into the Kubernetes clusters?
For external-dns, it would be nice to have an example of using sealed secrets for the base64-encoded block of credentials for the Route53 IAM user. A step-by-step for setting that up in the context of this module would be nice, since it's something we're applying.
Let's give the IAM permission to the external-dns pod so it has access.
Review Access: a kubectl plugin to show an access matrix for k8s server resources.
https://github.com/corneliusweig/rakkess
Create a doc on how to use this with our cluster.
Create a script to facilitate the creation of different Terraform Cloud workspaces, with the associated sensitive environment variables for AWS access keys and secrets.
How to monitor logins?