Comments (22)

ferpizza commented on May 10, 2024

Hi,

I've been dealing with these false positives on GKE. After investigating a little, I realized that GKE exposes neither the Kubernetes Scheduler nor the Controller Manager to end users.

Since these components are invisible to us, there is no need to deploy the Scheduler scraper, the Controller Manager scraper, or their respective alerts.

The easiest way to deal with these false positive alerts is to disable the scraping and alerts for GKE-managed components in the Helm chart's values file.

kubeControllerManager:
  enabled: false

kubeScheduler:
  enabled: false
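
For anyone applying this with the kube-prometheus-stack Helm chart, a minimal sketch; the `kubeControllerManager`/`kubeScheduler` sections are standard chart values, while the release and namespace names below are examples, not taken from this thread:

```shell
# Save the overrides to a file so they can be version-controlled
# and reused on future upgrades.
cat > gke-values.yaml <<'EOF'
kubeControllerManager:
  enabled: false

kubeScheduler:
  enabled: false
EOF

# Then apply them to an existing release, e.g. (release name and
# namespace are placeholders; adjust to your setup):
# helm upgrade kube-prometheus prometheus-community/kube-prometheus-stack \
#   -n monitoring --reuse-values -f gke-values.yaml
```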

This is probably the case for other cloud providers, although I'm not sure about it.

Cheers,

from kube-prometheus.

sandromello commented on May 10, 2024

I had a similar issue, but I used kubeadm to install the cluster. I fixed those alerts by editing the selectors of those services.

If you run the Kubernetes core components as pods in the kube-system namespace, make sure the label selectors of those services match the labels on the pods.

kubectl get svc kube-prom-exporter-kube-scheduler kube-prom-exporter-kube-controller-manager -n kube-system -o wide
NAME                                         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE       SELECTOR
kube-prom-exporter-kube-scheduler            ClusterIP   None         <none>        10251/TCP   3h        component=kube-scheduler
kube-prom-exporter-kube-controller-manager   ClusterIP   None         <none>        10252/TCP   3h        component=kube-controller-manager
kubectl get po -l component -n kube-system --show-labels
NAME                                                 READY     STATUS    RESTARTS   AGE       LABELS
(...)
kube-apiserver-ip-10-0-41-71.ec2.internal            1/1       Running   0          3h        component=kube-apiserver,tier=control-plane
kube-controller-manager-ip-10-0-41-71.ec2.internal   1/1       Running   0          3h        component=kube-controller-manager,tier=control-plane
kube-scheduler-ip-10-0-41-71.ec2.internal            1/1       Running   0          3h        component=kube-scheduler,tier=control-plane

If any of those components was started bound to 127.0.0.1, you need to change that; please take a look at the notes on kubeadm with Prometheus for more information.

hamid2013 commented on May 10, 2024

I am also facing the same issue, but in my case I used Azure acs-engine to launch the cluster.

I keep getting the Scheduler and Controller alerts.

I can see the pods are running, but there is no corresponding service for them.

chris530 commented on May 10, 2024

I noticed the labels the services were selecting on did not match any pods. After adding the label k8s-app=kube-controller-manager to the controller manager and k8s-app=kube-scheduler to the scheduler, the alerts cleared up, as the services could now find the pods.
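
A hypothetical helper capturing that fix, assuming the control-plane pods carry kubeadm-style `component=...` labels as in the output earlier in this thread; review before running against a real cluster:

```shell
# Write the labeling commands to a small script rather than running them
# blindly; for static pods the label can also be added in the manifests
# under /etc/kubernetes/manifests so it survives pod restarts.
cat > add-k8s-app-labels.sh <<'EOF'
#!/bin/sh
set -e
kubectl -n kube-system label pod -l component=kube-scheduler \
  k8s-app=kube-scheduler
kubectl -n kube-system label pod -l component=kube-controller-manager \
  k8s-app=kube-controller-manager
EOF
chmod +x add-k8s-app-labels.sh
```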

domcar commented on May 10, 2024

If it helps, it looks like some services have no endpoints:

kubectl get endpoints --all-namespaces
NAMESPACE     NAME                                        ENDPOINTS   AGE
kube-system   kube-controller-manager                     <none>      19h
kube-system   kube-prometheus-exporter-kube-scheduler     <none>      24m

domcar commented on May 10, 2024

@sandromello The problem is that I don't have the kube-scheduler or controller-manager pods; I think that's why it doesn't work.

ScottBrenner commented on May 10, 2024

This is a known issue with GKE prometheus-operator/prometheus-operator#355 prometheus-operator/prometheus-operator#845. I ended up just deleting the two alerts.

hameno commented on May 10, 2024

This also seems to be the case for https://github.com/rancher/rke deployments (at least it is happening on my dev cluster)

gianrubio commented on May 10, 2024

@domcar one way to avoid this issue is to have a flag controlling whether certain kube-prometheus dependencies are deployed. Look at the alertmanager example for how to skip the installation of a dependency.

PRs are always welcome :)

commented on May 10, 2024

I don't have any endpoints for the kube-controller-manager and the scheduler, so how can I monitor them using Prometheus and the Prometheus Operator?

Alerts are being triggered from the Alertmanager.

bonovoxly commented on May 10, 2024

@ScottBrenner what's the best way to delete an alert using helm? Is it possible to cherry-pick out the alerts, or would I need to recreate them all (minus the non-working alerts for GKE)?

ScottBrenner commented on May 10, 2024

@bonovoxly Was using kube-prometheus, never touched Helm.

ne1000 commented on May 10, 2024

@domcar @ScottBrenner I also ran into the same issue, but in my case I used binary packages to install the cluster. Can you give me some advice on fixing it?

phyllisstein commented on May 10, 2024

I ran into this issue with a cluster deployed through kops in AWS. The solution that worked for me was sitting in an old version of the repo: I had to deploy the services listed here to kube-system. With that done, the alerts went green.

Edit: N.B. that I think you can also generate the requisite files by adding (import 'kube-prometheus/kube-prometheus-kops.libsonnet') to your jsonnet config:

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-kops.libsonnet') +
  {
    _config+:: {
      namespace: 'monitoring',
      /* ...etc. */
    },
  };

commented on May 10, 2024

Same issue with AWS EKS.

vrathore18 commented on May 10, 2024

I am facing the same issue. I don't have the kube-scheduler or controller-manager pods. @domcar, how did you fix the issue?

P.S. I used Helm for installation. Cloud: AWS.

rpf3 commented on May 10, 2024

@chris530 I had to do something very similar with the service selectors: basically null out the component label and add the k8s-app label to the selector for those two services.
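
A sketch of that selector change as a JSON merge patch, where a `null` value removes the old key; the service name follows the kubeadm example earlier in the thread and may differ in your cluster:

```shell
# Build the patch: "component": null drops the old selector key and
# k8s-app replaces it. Repeat with the controller-manager service,
# swapping in k8s-app=kube-controller-manager.
cat > scheduler-selector-patch.json <<'EOF'
{"spec": {"selector": {"component": null, "k8s-app": "kube-scheduler"}}}
EOF

# Then apply it, e.g.:
# kubectl -n kube-system patch svc kube-prom-exporter-kube-scheduler \
#   --type merge -p "$(cat scheduler-selector-patch.json)"
```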

flogfy commented on May 10, 2024

@chris530 how were you able to add these labels to the controller manager and the kube scheduler? I don't have the pods or services associated with either kube-scheduler or kube-controller-manager. My Kubernetes is installed with RKE.

woody3549 commented on May 10, 2024

Hello,

I am currently using kube-prometheus-stack version 20.0.1.
The KubeSchedulerDown and KubeControllerManagerDown alerts are currently being raised for no apparent reason.
Is that also a label issue?
How did you solve it?

Thanks for your help.
Regards,

woody3549 commented on May 10, 2024

Hi @ferpizza,

Now I no longer receive alerts for KubeScheduler and KubeControllerManager.
Thanks.

However, a new KubeProxyDown alert now appears.
Can you please point out what GKE exposes?
I might have to disable it as well.

Cheers

ferpizza commented on May 10, 2024

Hello @woody3549,

I haven't found official documentation that distinguishes the k8s components exposed to end users from the ones kept private for Google's management. You can make an educated guess based on whether a given component is critical to how GKE operates the cluster.

kube-proxy is one of those components, being a critical piece in the networking of your cluster.

When I wrote my first comment I was on version 18.1.1 of the Kube Prometheus Stack helm chart, and that version did not include the kube-proxy alerts or scraper.

Since then I have updated to version 27.1.0, which includes the kube-proxy alert, and was confronted with the same issue regarding false positives.

We can solve this, and the two prior alerts, by adding the following lines to our Values file.

kubeControllerManager:
  enabled: false

kubeScheduler:
  enabled: false

kubeProxy:
  enabled: false

woody3549 commented on May 10, 2024

Hello,

Ok thanks. This makes sense and is very helpful.

Regards
