
kubernetes-ops's People

Contributors

bcarranza, coltonmcewen, dafoo, eddiesimeon, garland-kan-sage, grebois, jasonboukheir, mybarretto, olamide005, sakruthijupalli, seethatgo, sekka1, shockleyje, tomoyaogura, topagae

kubernetes-ops's Issues

ingress nginx and cert manager timeout issue - K8s 1.22

Hi, I tried to upgrade EKS Kubernetes to 1.22. The autoscaler and cert-manager were timing out, but after updating the aws and helm Terraform providers to their latest versions and changing the API version from alpha to beta, they started working:

api_version = "client.authentication.k8s.io/v1beta1"
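For reference, this is roughly what the provider wiring looks like after the change (a sketch from my config; the data source names here are illustrative and not necessarily what the module uses):

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.this.name]
    }
  }
}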

Now I'm stuck with the Nginx ingress, possibly because of the announced Ingress API changes:

apiVersion: networking.k8s.io/v1

Here is the error I receive:


module.ingress-nginx-external.helm_release.helm_chart: Destroying... [id=ingress-nginx]
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 10s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 20s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 30s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 40s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 50s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 1m0s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 1m10s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still destroying... [id=ingress-nginx, 1m20s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Destruction complete after 1m28s
module.ingress-nginx-external.helm_release.helm_chart: Creating...
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [10s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [20s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [30s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [40s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [50s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [1m0s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [1m10s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [1m20s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [1m30s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [1m40s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [1m50s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [2m0s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [2m10s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [2m20s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [2m30s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [2m40s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [2m50s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [3m0s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [3m10s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [3m20s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [3m30s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [3m40s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [3m50s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [4m0s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [4m10s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [4m20s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [4m30s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [4m40s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [4m50s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [5m0s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [5m10s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [5m20s elapsed]
module.ingress-nginx-external.helm_release.helm_chart: Still creating... [5m30s elapsed]
╷
│ Warning: Helm release "ingress-nginx" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│ 
│   with module.ingress-nginx-external.helm_release.helm_chart,
│   on .terraform/modules/ingress-nginx-external/terraform-modules/aws/helm/helm_generic/main.tf line 1, in resource "helm_release" "helm_chart":
│    1: resource "helm_release" "helm_chart" {
│ 
╵

Any suggestions? If it is just a matter of changing the API version, how can I do that?

Ingress Examples

Make it easy to see how to set up ingress using the following (see the sketch after this list for an internal NLB example):
external ELB
internal ELB
external NLB
internal NLB
external/internal ALB
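
As a starting point, here is a rough sketch of what the internal NLB case could look like with the existing helm_generic pattern (the other required inputs - chart name, repository, version, ref pin - are omitted, the helm_values input name is borrowed from the kube-prometheus-stack module, and the annotation values should be double-checked against the ingress-nginx and AWS docs):

module "ingress-nginx-internal" {
  source = "github.com/ManagedKube/kubernetes-ops//terraform-modules/aws/helm/helm_generic"

  helm_values = <<-EOT
    controller:
      service:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
  EOT
}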

Add support for Graviton instances

Graviton instances are based on the 64-bit ARM architecture and offer a great price/performance ratio.
I tried adding a new node group (ng2) for Graviton instances:

  node_groups = {
    ng1 = {
      disk_size        = 20
      desired_capacity = 2
      max_capacity     = 4
      min_capacity     = 1
      instance_types   = ["t3.small"]
      capacity_type    = "SPOT"
      additional_tags  = var.tags
      k8s_labels       = {}
    }

    ng2 = {
      disk_size        = 20
      desired_capacity = 1
      max_capacity     = 4
      min_capacity     = 1
      instance_types   = ["t4g.small"]
      capacity_type    = "SPOT"
      additional_tags  = var.tags
      k8s_labels       = {}
    }

  }

Applying the Terraform code results in an error. The error message shows that it tries to use the x86 Amazon Linux 2 AMI, which is not valid since Graviton needs the ARM64 image:

│ Error: error creating EKS Node Group (staging:staging-ng2-enhanced-grubworm): InvalidParameterException: [t4g.small] is not a valid instance type for requested amiType AL2_x86_64
│ {
│ RespMetadata: {
│ StatusCode: 400,
│ RequestID: "73318df5-e6c3-4e1e-ad3b-7b209bc182f6"
│ },
│ ClusterName: "staging",
│ Message_: "[t4g.small] is not a valid instance type for requested amiType AL2_x86_64",
│ NodegroupName: "staging-ng2-enhanced-grubworm"
│ }

│ with module.eks.module.eks.module.node_groups.aws_eks_node_group.workers["ng2"],
│ on .terraform/modules/eks.eks/modules/node_groups/node_groups.tf line 1, in resource "aws_eks_node_group" "workers":
│ 1: resource "aws_eks_node_group" "workers" {
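
I suspect the fix is to set the ARM AMI type explicitly, since EKS managed node groups default to AL2_x86_64. Something like this (not sure whether the module passes ami_type through under this exact key in the version I'm using):

    ng2 = {
      disk_size        = 20
      desired_capacity = 1
      max_capacity     = 4
      min_capacity     = 1
      instance_types   = ["t4g.small"]
      capacity_type    = "SPOT"
      ami_type         = "AL2_ARM_64" # ARM64 Amazon Linux 2 AMI for Graviton
      additional_tags  = var.tags
      k8s_labels       = {}
    }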

Thank you!

Nginx-ingress module

Turn this into a module for AWS so it can get the ACM cert. Make this optional for people who want to use ACM instead of cert-manager.
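
A rough sketch of what the optional ACM lookup could look like (the variable names and the enable flag are illustrative):

data "aws_acm_certificate" "this" {
  count    = var.enable_acm ? 1 : 0
  domain   = var.acm_domain_name
  statuses = ["ISSUED"]
}

# The certificate ARN (data.aws_acm_certificate.this[0].arn) would then be fed
# into the controller's service annotations
# (service.beta.kubernetes.io/aws-load-balancer-ssl-cert) via helm values.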

Fargate Module

Create a module to run containers on Fargate within the cluster.
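
A minimal sketch of the core resource such a module would wrap (standard aws_eks_fargate_profile arguments; the variable names are assumptions):

resource "aws_eks_fargate_profile" "this" {
  cluster_name           = var.cluster_name
  fargate_profile_name   = var.fargate_profile_name
  pod_execution_role_arn = var.pod_execution_role_arn
  subnet_ids             = var.private_subnet_ids

  # Pods matching this namespace/label selector get scheduled onto Fargate.
  selector {
    namespace = var.namespace
    labels    = var.pod_labels
  }
}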

Secure by default - AWS

We should make these clusters secure by default and have reasonable security measures in place from the start.

For example, this analysis is a good starting point for what we should do and enable: https://blog.cloudsploit.com/a-technical-analysis-of-the-capital-one-hack-a9b43d7c8aea

For AWS:
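
One concrete measure, given the SSRF-against-the-metadata-service angle in that write-up, is requiring IMDSv2 on the worker nodes. A sketch, assuming the nodes launch from a launch template we control:

resource "aws_launch_template" "workers" {
  name_prefix = "eks-workers-" # illustrative

  metadata_options {
    http_tokens                 = "required" # enforce IMDSv2 (session-token requests)
    http_put_response_hop_limit = 1          # keep credentials from being relayed off the node
  }
}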

Install order changes

From Craig

Wondering about installation order - it was easier for me to go:
ingress-nginx
external-dns
cert-manager
THEN
kube-prometheus-stack
grafana-loki-stack
my-application

Keep the numbering scheme the same in the repos but make a note of the order, or just leave it alone? I changed the docs back to the original order, but maybe we just change kube-prometheus-stack to 55 or 60 and grafana-loki to 56 or 61.

terraform plan fails with " no valid credential sources for Terraform AWS Provider found."

Version:
Terraform cli: 1.3.2
kubernetes-ops release: v2.0.47
module: eks
Issue:
When trying to run a plan, it errors out with "error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found." I've updated my VPC to match the release versions, and plans for other modules in the same tenant are working (credentials are shared across them, so it shouldn't be a credentials issue). I do see that there's a call to the metadata API IP that's getting a connection refused. I don't think I have anything 'incorrect', however I can't seem to get it to work either.
Full Error message:

Terraform v1.3.2
on linux_amd64
Initializing plugins and modules...
data.terraform_remote_state.vpc: Reading...
data.terraform_remote_state.vpc: Read complete after 1s
╷
│ Warning: Redundant ignore_changes element
│
│   on .terraform/modules/eks.eks/main.tf line 305, in resource "aws_eks_addon" "this":
│  305: resource "aws_eks_addon" "this" {
│
│ Adding an attribute name to ignore_changes tells Terraform to ignore future
│ changes to the argument in configuration after the object has been created,
│ retaining the value originally configured.
│
│ The attribute modified_at is decided by the provider alone and therefore
│ there can be no configured value to compare with. Including this attribute
│ in ignore_changes has no effect. Remove the attribute from ignore_changes
│ to quiet this warning.
╵
╷
│ Error: error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.
│
│ Please see https://registry.terraform.io/providers/hashicorp/aws
│ for more information about providing credentials.
│
│ Error: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, request send failed, Get "http://169.254.169.254/latest/meta-data/iam/security-credentials/": dial tcp 169.254.169.254:80: i/o timeout
│
│
│   with provider["registry.terraform.io/hashicorp/aws"],
│   on main.tf line 35, in provider "aws":
│   35: provider "aws" {
│
╵
Operation failed: failed running terraform plan (exit 1)
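
For comparison, a minimal sketch of pointing the provider at credentials explicitly (the profile name is hypothetical). The IMDS call in the error above is only the provider's last-resort fallback, so if credentials are meant to come from the environment they need to be visible to whatever process actually runs the plan:

provider "aws" {
  region  = var.region
  profile = "my-profile" # hypothetical named profile from ~/.aws/credentials
  # Alternatively, export AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
  # (and AWS_SESSION_TOKEN for temporary credentials) in that environment.
}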

Metric server example

For metrics-server, is there a different way it should be added to the cluster than just running:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Wondering how the state of that is tracked in TF if at all. This was needed for HPA to work properly when I deployed my app.
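
One way to keep it tracked in Terraform state would be installing it through a helm_release instead of kubectl apply - a sketch, using the upstream metrics-server chart:

resource "helm_release" "metrics_server" {
  name       = "metrics-server"
  repository = "https://kubernetes-sigs.github.io/metrics-server/"
  chart      = "metrics-server"
  namespace  = "kube-system"
}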

Versions not compatible

eks autoscaler
config:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.37.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.5.0"
    }
  }
}

module "cluster-autoscaler" {
  source = "github.com/ManagedKube/kubernetes-ops//terraform-modules/aws/cluster-autoscaler?ref=v2.0.82"
  # ...
}

error:

Error: Unsupported attribute

│ on main.tf line 58, in provider "helm":
│ 58: host = data.terraform_remote_state.eks.outputs.cluster_endpoint
│ ├────────────────
│ │ data.terraform_remote_state.eks.outputs is object with no attributes

│ This object does not have an attribute named "cluster_endpoint".


│ Error: Unsupported attribute

│ on main.tf line 59, in provider "helm":
│ 59: cluster_ca_certificate = base64decode(data.terraform_remote_state.eks.outputs.cluster_certificate_authority_data)
│ ├────────────────
│ │ data.terraform_remote_state.eks.outputs is object with no attributes

│ This object does not have an attribute named
│ "cluster_certificate_authority_data".


│ Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"

│ with module.cluster-autoscaler.module.cluster-autoscaler.helm_release.helm_chart,
│ on .terraform/modules/cluster-autoscaler.cluster-autoscaler/terraform-modules/aws/helm/helm_generic/main.tf line 1, in resource "helm_release" "helm_chart":
│ 1: resource "helm_release" "helm_chart" {


Operation failed: failed running terraform apply (exit 1)

kube prometheus stack

config:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.37.0"
    }
    random = {
      source = "hashicorp/random"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.5.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.7.0"
    }
  }
}

module "kube-prometheus-stack" {
  source = "github.com/ManagedKube/kubernetes-ops//terraform-modules/aws/helm/kube-prometheus-stack?ref=v2.0.82"
  # ...
}

error:
Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"

│ with module.kube-prometheus-stack.helm_release.helm_chart,
│ on .terraform/modules/kube-prometheus-stack/terraform-modules/aws/helm/kube-prometheus-stack/main.tf line 1, in resource "helm_release" "helm_chart":
│ 1: resource "helm_release" "helm_chart" {


Operation failed: failed running terraform apply (exit 1)
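
The policy/v1beta1 errors suggest the pinned chart versions still render PodDisruptionBudget/PodSecurityPolicy manifests that newer clusters no longer accept, and the "object with no attributes" errors suggest the eks remote state being read has no outputs (wrong backend key/workspace, or the eks layer not applied yet). A sketch of pinning a newer chart, assuming the module forwards a helm_version input to helm_generic the way kube-prometheus-stack does:

module "cluster-autoscaler" {
  source       = "github.com/ManagedKube/kubernetes-ops//terraform-modules/aws/cluster-autoscaler?ref=v2.0.82"
  # Hypothetical version - pick a cluster-autoscaler chart release that renders
  # policy/v1 PodDisruptionBudgets for your Kubernetes version.
  helm_version = "9.25.0"
  # ...
}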

Convert AWS VPC Terraform to version 0.12.x

The current AWS VPC Terraform is using version 0.11.x. Moving to the newer version gives us a good path forward and more parameterization options with Terragrunt that will make everything easier to use.
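
Most of the mechanical conversion can be done per module directory with the built-in upgrade tool (terraform 0.12upgrade); the main visible change is that 0.11-style interpolation wrappers become first-class expressions, e.g.:

# 0.11.x
vpc_id = "${module.vpc.vpc_id}"

# 0.12.x
vpc_id = module.vpc.vpc_id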

Syntax issue in kube-prometheus-stack module

In the kube-prometheus-stack TF module - https://github.com/ManagedKube/kubernetes-ops/tree/main/terraform-modules/aws/helm/kube-prometheus-stack

There is a syntax issue that causes Terraform to fail when the module is included as a dependency in another module, e.g.:

module "kube-prometheus-stack" {
  source = "github.com/ManagedKube/kubernetes-ops/terraform-modules/aws/helm/kube-prometheus-stack"

  helm_values = file("${path.module}/values.yaml")

  depends_on = [
    data.terraform_remote_state.eks
  ]
}

The error that occurs is this:

Waiting for the plan to start...

Terraform v1.2.6
on linux_amd64
Initializing plugins and modules...

│ Error: Invalid function argument

│ on .terraform/modules/kube-prometheus-stack/terraform-modules/aws/helm/kube-prometheus-stack/main.tf line 17, in resource "helm_release" "helm_chart":
│ 17: templatefile("./values_local.yaml", {
│ 18: enable_grafana_aws_role = var.enable_iam_assumable_role_grafana
│ 19: aws_account_id = var.aws_account_id
│ 20: role_name = local.k8s_service_account_name
│ 21: }),

│ Invalid value for "path" parameter: no file exists at
│ "./values_local.yaml"; this function works only with files that are
│ distributed as part of the configuration source code, so if this file will
│ be created by a resource in this configuration you must instead obtain this
│ result from an attribute of that resource.

Operation failed: failed running terraform plan (exit 1)

This is due to this line of code: https://github.com/ManagedKube/kubernetes-ops/blob/main/terraform-modules/aws/helm/kube-prometheus-stack/main.tf#L17

resource "helm_release" "helm_chart" {
  chart            = "kube-prometheus-stack"
  namespace        = var.namespace
  create_namespace = "true"
  name             = var.chart_name
  version          = var.helm_version
  verify           = var.verify
  repository       = "https://prometheus-community.github.io/helm-charts"

  values = [
    # templatefile("${path.module}/values.yaml", {
--> templatefile("./values_local.yaml", {
      enable_grafana_aws_role = var.enable_iam_assumable_role_grafana
      aws_account_id          = var.aws_account_id
      role_name               = local.k8s_service_account_name
    }),
    var.helm_values,
  ]
}

Changing this line to:

templatefile("${path.module}/values_local.yaml", {

fixes the issue.

External DNS - IAM role access to route53 example

For External DNS, it would be nice to have an example of using sealed secrets for that base64-encoded block of credentials for the Route53 IAM user - a step-by-step for setting that up in the context of this module would be nice, since it's something we're applying.

Let's give the IAM permission to the external-dns pod so it has access.
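
For reference, a sketch of the Route53 permissions external-dns documents for its policy, which could be attached to a role the pod assumes (the policy name is illustrative):

data "aws_iam_policy_document" "external_dns" {
  statement {
    actions   = ["route53:ChangeResourceRecordSets"]
    resources = ["arn:aws:route53:::hostedzone/*"]
  }
  statement {
    actions   = ["route53:ListHostedZones", "route53:ListResourceRecordSets"]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "external_dns" {
  name   = "external-dns-route53" # illustrative
  policy = data.aws_iam_policy_document.external_dns.json
}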
