Hi, the metrics-server in my cluster is unable to scrape metrics fro

Metrics-Server unable to scrape metrics about terraform-hcloud-kube-hetzner HOT 24 CLOSED

kube-hetzner commented on August 24, 2024

Metrics-Server unable to scrape metrics

from terraform-hcloud-kube-hetzner.

Comments (24)

phaer commented on August 24, 2024

@mysticaltech Could be, I am mostly guessing at this point!

x509: certificate is valid for 127.0.0.1, 88.198.105.71, not 10.2.0.1" node="agent-big-0"

I interpret this error as saying that the metrics-server (on agent-big-0) is trying to contact the API server on 10.2.0.1, but its certificate is only valid for localhost and the external IP, not the internal control plane IP (10.2.0.1). So the SAN sounds suspicious to me.

EDIT: No, that's wrong. It's not about the API server (6443); see port 10250 (the kubelet port).
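
The SAN mismatch in the quoted log line can be checked mechanically. A minimal Python sketch, assuming the message always has the shape `certificate is valid for A, B, not C`; `parse_san_error` is a hypothetical helper name and not part of k3s or metrics-server:

```python
import re

def parse_san_error(message):
    """Return (valid_names, requested) from an x509 SAN mismatch message,
    or None when the message has a different shape."""
    match = re.search(r"certificate is valid for (.+?), not ([0-9A-Fa-f.:]+)", message)
    if match is None:
        return None
    valid_names = [name.strip() for name in match.group(1).split(",")]
    return valid_names, match.group(2)

# Log fragment quoted in this thread.
line = 'x509: certificate is valid for 127.0.0.1, 88.198.105.71, not 10.2.0.1" node="agent-big-0"'
valid_names, requested = parse_san_error(line)
print(f"requested {requested}, certificate covers {valid_names}")
print("covered" if requested in valid_names else "not covered")
```

This only restates what the error already says: the address being contacted is missing from the certificate's list. It does not tell you which component presented the certificate.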

phaer commented on August 24, 2024

Can confirm that the metrics-server seems to be working in my just-deployed cluster (name-suffixes branch, but that shouldn't matter here)

mysticaltech commented on August 24, 2024

My bad, all IPs that are not private must be changed to private ones.

All should be 10.X.0.X...

As soon as you open the file, you will know!

mysticaltech commented on August 24, 2024

The node-ip in config.yaml is wrong; it should be the private IP of your server, so of the form 10.2.0.X.
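
A quick way to tell a public node-ip from a private one, sketched in Python. The 10.0.0.0/8 range is an assumption inferred from the "10.X.0.X" pattern mentioned in this thread, and `is_cluster_private` is a hypothetical helper name:

```python
import ipaddress

# Assumption: the cluster's private range is 10.0.0.0/8 (from "10.X.0.X").
PRIVATE_RANGE = ipaddress.ip_network("10.0.0.0/8")

def is_cluster_private(ip):
    """True when the address lies inside the assumed private range."""
    return ipaddress.ip_address(ip) in PRIVATE_RANGE

# The public IP from the certificate error and the agent's private IP.
for candidate in ("88.198.105.71", "10.2.0.1"):
    print(candidate, "private" if is_cluster_private(candidate) else "NOT private")
```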

MartiniMoe commented on August 24, 2024

Thank you so much! Now it looks okay :)

phaer commented on August 24, 2024

Hi @MartiniMoe,

Looks like your control plane's certificate does not include its internal IP. This looks like a bug, but I don't have time to investigate properly atm. If you do, please try adding the following line to your control plane's k3s config in https://github.com/kube-hetzner/kube-hetzner/blob/master/control_planes.tf#L45

tls-san = module.control_planes[0].private_ipv4_address

and then re-provision. This should include the control plane's private IP in the cert, but it is currently untested.

(We had this in an earlier version of kube-hetzner, but there's been a lot going on in this repo lately, hope that it's going to stabilize soon ;))

MartiniMoe commented on August 24, 2024

Thanks, I will try that!
How can I re-provision easily? Or do I have to tear everything down and start over?

phaer commented on August 24, 2024

Should be sufficient to taint your first control node in this case.

MartiniMoe commented on August 24, 2024

Thanks. I added the line and let terraform recreate control-node-0, but the problem persists :/

phaer commented on August 24, 2024

Does the resulting k3s config look correct? Did you check whether the certificate is generated correctly? Did you try re-creating the cluster?
I sadly can't provide step-by-step instructions, you need to do some of the digging yourself (or wait until someone else does) ;)

mysticaltech commented on August 24, 2024

@phaer The tls-san defaults to the node-ip; that's why I removed it while fixing another certificate issue, which was just the agent nodes using their public IP as node-ip. So I believe that tls-san is not the problem, or is it?

mysticaltech commented on August 24, 2024

Ahhh.... Yes you are right!

mysticaltech commented on August 24, 2024

@MartiniMoe you are probably using master from 48h ago... Just pull the latest changes, this is fixed already!

MartiniMoe commented on August 24, 2024

@mysticaltech Thanks, I already pulled and did a terraform apply. Can I fix this without recreating the cluster?

mysticaltech commented on August 24, 2024

@MartiniMoe Yes, probably. Log in via SSH to each agent (see the readme).

Then:

systemctl stop k3s-agent

Then edit /etc/rancher/k3s/config.yaml

Change the server IP, basically all IPs, to the private IP.

systemctl start k3s-agent
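
The edit step can also be scripted. A minimal Python sketch that rewrites only the `node-ip` line of the config text; it assumes the flat `"key": "value"` layout that kube-hetzner writes, `set_node_ip` is a hypothetical helper, and it works on a string rather than on the file itself, so reading and writing /etc/rancher/k3s/config.yaml (with the agent stopped) is left to the caller:

```python
import re

# Matches a line such as:  "node-ip": "10.1.0.1"
NODE_IP_LINE = re.compile(r'^("node-ip":[ \t]*)"[^"\n]*"[ \t]*$', re.MULTILINE)

def set_node_ip(config_text, new_ip):
    """Return config_text with the node-ip value replaced by new_ip.

    Raises ValueError unless exactly one node-ip line is found, so an
    unexpected file layout is not silently left unchanged.
    """
    updated, count = NODE_IP_LINE.subn(lambda m: f'{m.group(1)}"{new_ip}"', config_text)
    if count != 1:
        raise ValueError(f"expected exactly one node-ip line, found {count}")
    return updated

# Illustrative input only; the IPs are placeholders for your own values.
sample = '"node-ip": "88.198.105.71"\n"node-name": "agent-big-0"\n'
print(set_node_ip(sample, "10.2.0.1"))
```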

MartiniMoe commented on August 24, 2024

@mysticaltech Do I change "server": or "node-ip": or both?

mysticaltech commented on August 24, 2024

Maybe you need to drain the node beforehand and uncordon it afterwards. I think that's best.

MartiniMoe commented on August 24, 2024

I'm not sure what happened here. My agent has the private IP "10.1.0.1" in the config file, but actually it has "10.2.0.1" 😕

phaer commented on August 24, 2024

What are your network_ipv4_subnets and agent_nodepools?

MartiniMoe commented on August 24, 2024

network_ipv4_subnets = {
  control_plane = "10.1.0.0/16"
  agent_big     = "10.2.0.0/16"
#  agent_small   = "10.3.0.0/16"
}

agent_nodepools = {
  agent-big = {
    server_type = "cpx21",
    count       = 1,
    subnet      = "agent_big",
  }
#  agent-small = {
#    server_type = "cpx11",
#    count       = 2,
#    subnet      = "agent_small",
#  }
}
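
With these subnets, the two addresses discussed in this thread (10.1.0.1 in the agent's config file, 10.2.0.1 on the agent itself) fall into different pools. A small Python sketch, assuming only the two uncommented subnets are active; `subnet_for` is a hypothetical helper:

```python
import ipaddress

# Copied from network_ipv4_subnets above (commented-out entries omitted).
SUBNETS = {
    "control_plane": "10.1.0.0/16",
    "agent_big": "10.2.0.0/16",
}

def subnet_for(ip, subnets=SUBNETS):
    """Return the name of the subnet containing ip, or None if none matches."""
    address = ipaddress.ip_address(ip)
    for name, cidr in subnets.items():
        if address in ipaddress.ip_network(cidr):
            return name
    return None

for ip in ("10.1.0.1", "10.2.0.1"):
    print(ip, "->", subnet_for(ip))
```

So a `node-ip` of 10.1.0.1 on agent-big-0 points into the control plane subnet, not into the subnet its node pool is assigned to.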

MartiniMoe commented on August 24, 2024

If I change the agent's IP in its config to its actual IP, the node is shown as not ready afterwards in kubectl get nodes.

mysticaltech commented on August 24, 2024

Please post your config.yaml file here. Your server IP should be 10.1.0.1. Also check the node events with kubectl describe. What does it say? And did you drain and cordon it before?

mysticaltech commented on August 24, 2024

Any updates on this @MartiniMoe?

MartiniMoe commented on August 24, 2024

Ah yes, sorry for the late reply, I was a little busy.

So, this is my network

This is the config.yaml on agent-big-0:

static:~ # cat /etc/rancher/k3s/config.yaml
"flannel-iface": "eth1"
"kubelet-arg": "cloud-provider=external"
"node-ip": "10.1.0.1"
"node-label":
- "k3s_upgrade=true"
"node-name": "agent-big-0"
"server": "https://10.1.0.1:6443"
"token": "<token>"
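
One thing stands out in this file: `node-ip` and the host in `server` are the same address, which cannot be right on an agent, since the server is a different machine. A rough Python sketch of that check, assuming the flat quoted layout shown above; the helper names are hypothetical, and a real YAML parser would be the more robust choice:

```python
import re
from urllib.parse import urlsplit

# The config.yaml from agent-big-0, as posted above.
CONFIG_TEXT = '''\
"flannel-iface": "eth1"
"kubelet-arg": "cloud-provider=external"
"node-ip": "10.1.0.1"
"node-label":
- "k3s_upgrade=true"
"node-name": "agent-big-0"
"server": "https://10.1.0.1:6443"
"token": "<token>"
'''

def read_scalar_keys(text):
    """Collect the simple `"key": "value"` lines; list entries are skipped."""
    pairs = {}
    for line in text.splitlines():
        match = re.fullmatch(r'"([^"]+)":\s*"([^"]*)"', line.strip())
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs

def node_ip_equals_server_host(text):
    """True when the agent's node-ip is the same address as the server host."""
    config = read_scalar_keys(text)
    return config["node-ip"] == urlsplit(config["server"]).hostname

if node_ip_equals_server_host(CONFIG_TEXT):
    print("node-ip matches the server address; it should be the agent's own private IP")
```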

There are no events:

~ ❯ kubectl describe node agent-big-0
[...]
Events:              <none>

Regarding draining and cordoning: I tried to do it, but it did not really work because I had too much workload in the cluster. I then deleted some deployments, tried again, and made the changes to config.yaml. To be honest, at this point I'm not exactly sure whether the node was drained then 😕
