Comments (17)

Jberlinsky commented on August 16, 2024

@czka Could you kindly run the tests again and see if you're able to consistently reproduce this error? I just ran the stub-domains test and was unable to reproduce this issue.

czka commented on August 16, 2024

Tried the kitchen destroy -> create -> converge -> verify cycle two more times (once for stub-domains-local alone, and once more for the whole set of tests). The same issue keeps cropping up:

expected: {"horizontalPodAutoscaling"=>{}, "httpLoadBalancing"=>{}, "kubernetesDashboard"=>{"disabled"=>true}, "networkPolicyConfig"=>{}}
     got: {"horizontalPodAutoscaling"=>{}, "httpLoadBalancing"=>{}, "kubernetesDashboard"=>{"disabled"=>true}, "networkPolicyConfig"=>{"disabled"=>true}}
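
For reference, one full cycle was roughly as follows (a sketch; stub-domains-local is the suite name mentioned above):

$ kitchen destroy stub-domains-local
$ kitchen create stub-domains-local
$ kitchen converge stub-domains-local
$ kitchen verify stub-domains-local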

czka commented on August 16, 2024

@Jberlinsky

I have removed .kitchen/ and all the test/fixtures/*/.terraform/ dirs to start afresh. make test_integration_docker completed without errors in around 90 minutes.

But then I was able to reproduce the error again by running make docker_run, kitchen create stub-domains, kitchen converge stub-domains, kitchen verify stub-domains.
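
That is, inside the test container (assuming make docker_run drops you into a shell there):

$ kitchen create stub-domains
$ kitchen converge stub-domains
$ kitchen verify stub-domains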

Jberlinsky commented on August 16, 2024

@czka Thanks for the update; I'll continue to try to reproduce. Can you tell me if running kitchen converge stub-domains twice, instead of just once, resolves the problem?

czka commented on August 16, 2024

BTW, is it expected that root owns the following dirs created during the tests? I ran them as a regular user.

$ ls -ld .kitchen
drwxr-xr-x 3 root root 4096 Jan 14 16:41 .kitchen

$ find . -type d -name .terraform | xargs ls -ld
drwxr-xr-x 4 root root 4096 Jan 14 13:41 ./test/fixtures/deploy_service/.terraform
drwxr-xr-x 4 root root 4096 Jan 14 13:41 ./test/fixtures/node_pool/.terraform
drwxr-xr-x 4 root root 4096 Jan 14 13:41 ./test/fixtures/shared_vpc/.terraform
drwxr-xr-x 4 root root 4096 Jan 14 13:41 ./test/fixtures/simple_regional/.terraform
drwxr-xr-x 4 root root 4096 Jan 14 13:41 ./test/fixtures/simple_zonal/.terraform
drwxr-xr-x 4 root root 4096 Jan 14 13:41 ./test/fixtures/stub_domains/.terraform

$ ls -ld test/fixtures/stub_domains/terraform.tfstate.d/
drwxr-xr-x 3 root root 4096 Jan 14 16:41 test/fixtures/stub_domains/terraform.tfstate.d/
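
Presumably this is because make docker_run starts the container as root with the repository bind-mounted, so anything Kitchen or Terraform writes from inside ends up root-owned on the host (an assumption on my part, not something I've verified). If so, ownership can be reclaimed afterwards with something like:

$ sudo chown -R "$(id -u):$(id -g)" .kitchen test/fixtures/*/.terraform test/fixtures/*/terraform.tfstate.d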

czka commented on August 16, 2024

@Jberlinsky kitchen verify stub-domains passed without errors after running kitchen converge stub-domains a second time.

aaron-lane commented on August 16, 2024

This seems like an issue within the API rather than the Terraform configuration. We may simply need to emphasize that a configuration using the module must be applied twice to obtain the expected results. We should also consider raising a ticket against the provider.

Jberlinsky commented on August 16, 2024

Agreed -- I'll file a PR today/tomorrow to emphasize the need to run kitchen converge twice.

Thanks for reporting this, @czka!

morgante commented on August 16, 2024

I'd like to do some more digging into why the converge needs to happen twice; it's probably not an issue with the API so much as the provider and/or our config.

In particular, we should note what the plan actually shows for the second converge.

czka commented on August 16, 2024

@morgante Only now did I notice that a double kitchen converge is already hardcoded in the Makefile: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blame/master/Makefile#L79. A well-known issue with a well-known workaround ;).
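
The relevant part of that target reads roughly as follows (paraphrased from the linked Makefile, not copied verbatim):

bundle exec kitchen create
bundle exec kitchen converge
bundle exec kitchen converge # second time to enable network policy
bundle exec kitchen verify
bundle exec kitchen destroy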

morgante commented on August 16, 2024

@Jberlinsky Do you know why we need to converge twice though?

Jberlinsky commented on August 16, 2024

I don't recall the specific reason offhand, but the double converge has been present in this repository for quite some time; the Makefile line even carries a comment:

bundle exec kitchen converge # second time to enable network policy

I'll dig into this a bit.

Jberlinsky commented on August 16, 2024

For the google_container_cluster.primary resource, the initial plan is as follows:

  + module.example.module.gke.google_container_cluster.primary
      id:                                                         <computed>
      additional_zones.#:                                         <computed>
      addons_config.#:                                            "1"
      addons_config.0.horizontal_pod_autoscaling.#:               "1"
      addons_config.0.horizontal_pod_autoscaling.0.disabled:      "false"
      addons_config.0.http_load_balancing.#:                      "1"
      addons_config.0.http_load_balancing.0.disabled:             "false"
      addons_config.0.kubernetes_dashboard.#:                     "1"
      addons_config.0.kubernetes_dashboard.0.disabled:            "true"
      addons_config.0.network_policy_config.#:                    "1"
      addons_config.0.network_policy_config.0.disabled:           "false"
      cluster_ipv4_cidr:                                          <computed>
      enable_binary_authorization:                                "false"
      enable_kubernetes_alpha:                                    "false"
      enable_legacy_abac:                                         "false"
      enable_tpu:                                                 "false"
      endpoint:                                                   <computed>
      instance_group_urls.#:                                      <computed>
      ip_allocation_policy.#:                                     "1"
      ip_allocation_policy.0.cluster_ipv4_cidr_block:             <computed>
      ip_allocation_policy.0.cluster_secondary_range_name:        "${var.ip_range_pods}"
      ip_allocation_policy.0.services_ipv4_cidr_block:            <computed>
      ip_allocation_policy.0.services_secondary_range_name:       "${var.ip_range_services}"
      logging_service:                                            "logging.googleapis.com"
      maintenance_policy.#:                                       "1"
      maintenance_policy.0.daily_maintenance_window.#:            "1"
      maintenance_policy.0.daily_maintenance_window.0.duration:   <computed>
      maintenance_policy.0.daily_maintenance_window.0.start_time: "05:00"
      master_auth.#:                                              <computed>
      master_version:                                             <computed>
      min_master_version:                                         "1.11.5-gke.5"
      monitoring_service:                                         "monitoring.googleapis.com"
      name:                                                       "${var.name}"
      network:                                                    "${replace(data.google_compute_network.gke_network.self_link, \"https://www.googleapis.com/compute/v1/\", \"\")}"
      network_policy.#:                                           <computed>
      node_config.#:                                              <computed>
      node_pool.#:                                                "1"
      node_pool.0.initial_node_count:                             <computed>
      node_pool.0.instance_group_urls.#:                          <computed>
      node_pool.0.management.#:                                   <computed>
      node_pool.0.max_pods_per_node:                              <computed>
      node_pool.0.name:                                           "default-pool"
      node_pool.0.name_prefix:                                    <computed>
      node_pool.0.node_config.#:                                  "1"
      node_pool.0.node_config.0.disk_size_gb:                     <computed>
      node_pool.0.node_config.0.disk_type:                        <computed>
      node_pool.0.node_config.0.guest_accelerator.#:              <computed>
      node_pool.0.node_config.0.image_type:                       <computed>
      node_pool.0.node_config.0.local_ssd_count:                  <computed>
      node_pool.0.node_config.0.machine_type:                     <computed>
      node_pool.0.node_config.0.oauth_scopes.#:                   <computed>
      node_pool.0.node_config.0.preemptible:                      "false"
      node_pool.0.node_config.0.service_account:                  "project-service-account@berlinsky-pf-gke-fixture-f466.iam.gserviceaccount.com"
      node_pool.0.node_count:                                     <computed>
      node_pool.0.version:                                        <computed>
      node_version:                                               <computed>
      private_cluster:                                            "false"
      project:                                                    "berlinsky-pf-gke-fixture-f466"
      region:                                                     "us-east4"
      remove_default_node_pool:                                   "false"
      subnetwork:                                                 "${replace(data.google_compute_subnetwork.gke_subnetwork.self_link, \"https://www.googleapis.com/compute/v1/\", \"\")}"
      zone:                                                       <computed>

After the first terraform apply, the relevant terraform plan is as follows:

  ~ module.example.module.gke.google_container_cluster.primary
      addons_config.#:                                       "1" => "1"
      addons_config.0.horizontal_pod_autoscaling.#:          "1" => "1"
      addons_config.0.horizontal_pod_autoscaling.0.disabled: "false" => "false"
      addons_config.0.http_load_balancing.#:                 "1" => "1"
      addons_config.0.http_load_balancing.0.disabled:        "false" => "false"
      addons_config.0.kubernetes_dashboard.#:                "1" => "1"
      addons_config.0.kubernetes_dashboard.0.disabled:       "true" => "true"
      addons_config.0.network_policy_config.#:               "1" => "1"
      addons_config.0.network_policy_config.0.disabled:      "true" => "false"

This change takes a fairly long time to apply (~13 minutes when I just ran it), and does not result in a permadiff.

I'm continuing to dig in a bit, but it's looking like an API-level problem.
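
One quick way to confirm the diff is fully resolved after the second apply (a sketch, relying on terraform plan's documented exit codes):

$ terraform plan -detailed-exitcode
$ echo $?   # 0 = no changes, 2 = changes pending, 1 = error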

Jberlinsky commented on August 16, 2024

I've created a cluster via the API with the following payload:

{
  "cluster": {
    "addonsConfig": {
      "horizontalPodAutoscaling": {
        "disabled": false
      },
      "httpLoadBalancing": {
        "disabled": false
      },
      "kubernetesDashboard": {
        "disabled": true
      },
      "networkPolicyConfig": {
        "disabled": false
      }
    },
    "binaryAuthorization": {
      "enabled": false
    },
    "initialClusterVersion": "1.11.6-gke.2",
    "ipAllocationPolicy": {
      "clusterSecondaryRangeName": "cft-gke-test-pods-938k",
      "servicesSecondaryRangeName": "cft-gke-test-services-938k",
      "useIpAliases": true
    },
    "legacyAbac": {
      "enabled": false
    },
    "locations": [
      "us-east4-a",
      "us-east4-c",
      "us-east4-b"
    ],
    "loggingService": "logging.googleapis.com",
    "maintenancePolicy": {
      "window": {
        "dailyMaintenanceWindow": {
          "startTime": "05:00"
        }
      }
    },
    "monitoringService": "monitoring.googleapis.com",
    "name": "stub-domains-cluster-12s2",
    "network": "projects/berlinsky-pf-gke-fixture-f466/global/networks/cft-gke-test-938k",
    "nodePools": [
      {
        "config": {
          "serviceAccount": "project-service-account@berlinsky-pf-gke-fixture-f466.iam.gserviceaccount.com"
        },
        "name": "default-pool"
      }
    ],
    "subnetwork": "projects/berlinsky-pf-gke-fixture-f466/regions/us-east4/subnetworks/cft-gke-test-938k"
  }
}
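
The request itself would look something like this (a sketch, assuming the payload above is saved as cluster.json and POSTed to the v1 clusters.create endpoint):

$ curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d @cluster.json \
    "https://container.googleapis.com/v1/projects/berlinsky-pf-gke-fixture-f466/locations/us-east4/clusters"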

Once the cluster is created, I query it with gcloud and find the same problem:

$ gcloud --project=berlinsky-pf-gke-fixture-f466 container clusters --zone=us-east4 describe stub-domains-cluster-12s2 --format=json | jq '.addonsConfig.networkPolicyConfig'
{
  "disabled": true
}

Looks like an API-level problem to me, unfortunately.

morgante commented on August 16, 2024

@Jberlinsky Can you file an internal bug with the details and I'll route it appropriately?

Jberlinsky commented on August 16, 2024

I've filed an internal bug, and submitted #71 to make the README more explicit on this matter.

morgante commented on August 16, 2024

Closing in favor of #72.
