master-dns-operator's Introduction

OpenShift Master DNS Operator

What is this?

This is an operator that manages etcd DNS entries for masters in an OpenShift cluster.

Why is it needed?

Part of the work of recovering a master that has been destroyed or restarted is updating the DNS entry that is used to identify it as a member of the etcd cluster. These are entries in the form of [cluster-name]-etcd-[index].[domain]. In AWS, these are records in the private Route53 zone that belongs to the cluster. Automatically updating these makes the job of recovering the master much simpler.
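To make the naming pattern concrete, here is a minimal sketch that builds record names of the form [cluster-name]-etcd-[index].[domain]. The function name `etcd_record_name` and the sample values (`mycluster`, `example.com`) are illustrative assumptions, not taken from the operator's code.

```python
def etcd_record_name(cluster_name: str, index: int, domain: str) -> str:
    """Build a name of the form [cluster-name]-etcd-[index].[domain]."""
    if index < 0:
        raise ValueError("index must be non-negative")
    # Drop stray leading/trailing dots so the result has no empty labels.
    return f"{cluster_name}-etcd-{index}.{domain.strip('.')}"


if __name__ == "__main__":
    # Hypothetical three-master cluster.
    for i in range(3):
        print(etcd_record_name("mycluster", i, "example.com"))
```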

How does it do it?

The operator leverages external-dns to update DNS records in the cloud provider. It watches cluster-api Machine resources and obtains internal IPs from them. It then uses those IPs to create a custom resource with machine names and addresses.
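The flow described above can be sketched roughly as follows. This is not the operator's implementation: it assumes Machine objects are available as plain dictionaries whose `status.addresses` entries carry `type` and `address` fields, and the output shape (`name` plus `addresses`) is a hypothetical stand-in for the custom resource the operator creates.

```python
from typing import Any, Dict, List


def internal_ips(machine: Dict[str, Any]) -> List[str]:
    """Return the InternalIP addresses listed in a Machine's status."""
    addresses = machine.get("status", {}).get("addresses") or []
    return [a["address"] for a in addresses if a.get("type") == "InternalIP"]


def build_entries(machines: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Pair each machine name with its internal IPs (hypothetical shape)."""
    entries = []
    for machine in machines:
        ips = internal_ips(machine)
        if ips:  # skip machines that have not reported an address yet
            entries.append({"name": machine["metadata"]["name"], "addresses": ips})
    return entries


if __name__ == "__main__":
    sample = [{
        "metadata": {"name": "master-0"},
        "status": {"addresses": [
            {"type": "InternalIP", "address": "10.0.0.5"},
            {"type": "InternalDNS", "address": "master-0.internal"},
        ]},
    }]
    print(build_entries(sample))
```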

master-dns-operator's People

Contributors

csrwng, enj, staebler, sttts, thrasher-redhat

master-dns-operator's Issues

coredns-monitor creates Corefile with 127.0.0.53 forwarder

In my OKD cluster, the Corefile on node master-1 is faulty: its forward declaration points to 127.0.0.53 instead of a DNS resolver external to the cluster.

I am running OKD 4.9.0 IPI on vSphere 6.7:

[root@localhost ocp-install]# oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.okd-2021-11-28-035710   True        False         45d     Cluster version is 4.9.0-0.okd-2021-11-28-035710

One of my customers has the very same symptom (wrong Corefile on master-1) in their cluster and experiences very high CPU load (~2.3 cores) for this exact pod with frequent "i/o timeout" messages in coredns container logs.

After I manually corrected the Corefile by replacing 127.0.0.53 with an actual DNS resolver IP (in my case 10.1.0.1), these messages disappeared and the CPU load dropped to 0.002 cores.
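For anyone who wants to check other nodes for the same symptom, here is a small sketch that flags loopback upstreams in a Corefile's forward lines. 127.0.0.53 is a loopback address (it is commonly the systemd-resolved stub listener). The function name `loopback_forwarders` is made up for this example, and the parsing only handles bare IP upstreams written on the same line as `forward`.

```python
import ipaddress
from typing import List


def loopback_forwarders(corefile: str) -> List[str]:
    """Return upstreams on `forward` lines that are loopback addresses."""
    bad = []
    for line in corefile.splitlines():
        tokens = line.split()
        # A forward line looks like: forward FROM TO... [{]
        if len(tokens) < 3 or tokens[0] != "forward":
            continue
        for token in tokens[2:]:
            if token == "{":
                break
            try:
                addr = ipaddress.ip_address(token)
            except ValueError:
                continue  # not a bare IP (e.g. host:port or a file path); ignored here
            if addr.is_loopback:
                bad.append(token)
    return bad


if __name__ == "__main__":
    with open("/etc/coredns/Corefile") as f:  # path taken from the logs in this report
        print(loopback_forwarders(f.read()))
```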

Related to okd-project/okd/issues/978.

master-1 (bad config)

The pod logs of coredns-monitor on master-1 show that its runtimecfg utility is rendering a faulty Corefile with 127.0.0.53 in the forward rule.

$ oc logs coredns-lab4-h9zq6-master-1 coredns-monitor
time="2022-01-12T12:59:19Z" level=info msg="Runtimecfg rendering template" path=/etc/coredns/Corefile
time="2022-01-12T13:08:20Z" level=info msg="Node change detected, rendering Corefile" Node Addresses="[{10.1.2.189 lab4-h9zq6-master-0 false} {10.1.2.190 lab4-h9zq6-master-1 false} {10.1.2.188 lab4-h9zq6-master-2 false} {10.1.2.205 lab4-h9zq6-worker-dlr5x false} {10.1.2.203 lab4-h9zq6-worker-k8dfd false} {10.1.2.207 lab4-h9zq6-worker-m5lqk false} {10.1.2.209 lab4-h9zq6-worker-v95g9 false}]"
time="2022-01-12T13:08:20Z" level=info msg=". {"
time="2022-01-12T13:08:20Z" level=info msg="    errors"
time="2022-01-12T13:08:20Z" level=info msg="    bufsize 512"
time="2022-01-12T13:08:20Z" level=info msg="    health :18080"
time="2022-01-12T13:08:20Z" level=info msg="    forward . 127.0.0.53 {"
time="2022-01-12T13:08:20Z" level=info msg="        policy sequential"
time="2022-01-12T13:08:20Z" level=info msg="    }"
time="2022-01-12T13:08:20Z" level=info msg="    cache 30"
time="2022-01-12T13:08:20Z" level=info msg="    reload"
time="2022-01-12T13:08:20Z" level=info msg="    template IN A lab4.company.corp {"
time="2022-01-12T13:08:20Z" level=info msg="        match .*.apps.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        answer \"{{ .Name }} 60 in {{ .Type }} 10.1.4.2\""
time="2022-01-12T13:08:20Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:20Z" level=info msg="    }"
time="2022-01-12T13:08:20Z" level=info msg="    template IN AAAA lab4.company.corp {"
time="2022-01-12T13:08:20Z" level=info msg="        match .*.apps.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:20Z" level=info msg="    }"
time="2022-01-12T13:08:20Z" level=info msg="    template IN A lab4.company.corp {"
time="2022-01-12T13:08:20Z" level=info msg="        match api.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        answer \"{{ .Name }} 60 in {{ .Type }} 10.1.4.1\""
time="2022-01-12T13:08:20Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:20Z" level=info msg="    }"
time="2022-01-12T13:08:20Z" level=info msg="    template IN AAAA lab4.company.corp {"
time="2022-01-12T13:08:20Z" level=info msg="        match api.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:20Z" level=info msg="    }"
time="2022-01-12T13:08:20Z" level=info msg="    template IN A lab4.company.corp {"
time="2022-01-12T13:08:20Z" level=info msg="        match api-int.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        answer \"{{ .Name }} 60 in {{ .Type }} 10.1.4.1\""
time="2022-01-12T13:08:20Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:20Z" level=info msg="    }"
time="2022-01-12T13:08:20Z" level=info msg="    template IN AAAA lab4.company.corp {"
time="2022-01-12T13:08:20Z" level=info msg="        match api-int.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:20Z" level=info msg="    }"
time="2022-01-12T13:08:20Z" level=info msg="    hosts {"
time="2022-01-12T13:08:20Z" level=info msg="        10.1.2.189 lab4-h9zq6-master-0 lab4-h9zq6-master-0.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        10.1.2.190 lab4-h9zq6-master-1 lab4-h9zq6-master-1.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        10.1.2.188 lab4-h9zq6-master-2 lab4-h9zq6-master-2.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        10.1.2.205 lab4-h9zq6-worker-dlr5x lab4-h9zq6-worker-dlr5x.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        10.1.2.203 lab4-h9zq6-worker-k8dfd lab4-h9zq6-worker-k8dfd.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        10.1.2.207 lab4-h9zq6-worker-m5lqk lab4-h9zq6-worker-m5lqk.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        10.1.2.209 lab4-h9zq6-worker-v95g9 lab4-h9zq6-worker-v95g9.lab4.company.corp"
time="2022-01-12T13:08:20Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:20Z" level=info msg="    }"
time="2022-01-12T13:08:20Z" level=info msg="}"
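Since coredns-monitor logs the rendered Corefile one line per log entry, the file can be reconstructed from the log for easier comparison between nodes. The following is a rough sketch under the assumption that log lines have the `msg="..."` shape shown above and that each rendering starts with `. {` and ends with an unindented `}`. The helper name `corefiles_from_log` is hypothetical.

```python
import re
from typing import List

# Matches the quoted msg field, allowing backslash-escaped characters inside.
MSG_RE = re.compile(r'msg="((?:[^"\\]|\\.)*)"')
# Undoes the two escapes seen in these logs: \" and \\
UNESCAPE_RE = re.compile(r'\\(["\\])')


def corefiles_from_log(log_text: str) -> List[str]:
    """Return each Corefile rendering found in the log as a text block."""
    renderings: List[str] = []
    current = None
    for raw in log_text.splitlines():
        match = MSG_RE.search(raw)
        if not match:
            continue
        msg = UNESCAPE_RE.sub(r"\1", match.group(1))
        if current is None:
            if msg == ". {":  # start of the server block
                current = [msg]
        else:
            current.append(msg)
            if msg == "}":  # unindented brace closes the server block
                renderings.append("\n".join(current))
                current = None
    return renderings


if __name__ == "__main__":
    import sys
    for block in corefiles_from_log(sys.stdin.read()):
        print(block)
        print("---")
```

Usage would be along the lines of piping the output of the `oc logs` command above into the script.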

When I run the coredns-monitor pod's render command manually inside the running container, it renders a Corefile with the correct forwarder:

$ oc exec -it coredns-lab4-h9zq6-master-1 -c coredns-monitor -- bash
[root@lab4-h9zq6-master-1 /]# runtimecfg render --verbose /var/lib/kubelet/kubeconfig  --api-vip 10.1.4.1 --ingress-vip 10.1.4.2 /config --out-dir /tmp/test/
INFO[0000] . {
INFO[0000]     errors
INFO[0000]     bufsize 512
INFO[0000]     health :18080
INFO[0000]     forward . 10.1.0.1 {
INFO[0000]         policy sequential
INFO[0000]     }
INFO[0000]     cache 30
INFO[0000]     reload
INFO[0000]     template IN A lab4.company.corp {
INFO[0000]         match .*.apps.lab4.company.corp
INFO[0000]         answer "{{ .Name }} 60 in {{ .Type }} 10.1.4.2"
INFO[0000]         fallthrough
INFO[0000]     }
INFO[0000]     template IN AAAA lab4.company.corp {
INFO[0000]         match .*.apps.lab4.company.corp
INFO[0000]         fallthrough
INFO[0000]     }
INFO[0000]     template IN A lab4.company.corp {
INFO[0000]         match api.lab4.company.corp
INFO[0000]         answer "{{ .Name }} 60 in {{ .Type }} 10.1.4.1"
INFO[0000]         fallthrough
INFO[0000]     }
INFO[0000]     template IN AAAA lab4.company.corp {
INFO[0000]         match api.lab4.company.corp
INFO[0000]         fallthrough
INFO[0000]     }
INFO[0000]     template IN A lab4.company.corp {
INFO[0000]         match api-int.lab4.company.corp
INFO[0000]         answer "{{ .Name }} 60 in {{ .Type }} 10.1.4.1"
INFO[0000]         fallthrough
INFO[0000]     }
INFO[0000]     template IN AAAA lab4.company.corp {
INFO[0000]         match api-int.lab4.company.corp
INFO[0000]         fallthrough
INFO[0000]     }
INFO[0000]     hosts {
INFO[0000]         fallthrough
INFO[0000]     }
INFO[0000] }
INFO[0000]
INFO[0000] Runtimecfg rendering template                 path=/tmp/test/Corefile

master-0 (good config) (same for master-2)

For comparison, this is what the logs tell me for coredns-monitor on the other masters. The configuration looks good.

$ oc logs coredns-lab4-h9zq6-master-0 coredns-monitor
time="2022-01-12T12:59:43Z" level=info msg="Runtimecfg rendering template" path=/etc/coredns/Corefile
time="2022-01-12T13:08:43Z" level=info msg="Node change detected, rendering Corefile" Node Addresses="[{10.1.2.189 lab4-h9zq6-master-0 false} {10.1.2.190 lab4-h9zq6-master-1 false} {10.1.2.188 lab4-h9zq6-master-2 false} {10.1.2.205 lab4-h9zq6-worker-dlr5x false} {10.1.2.203 lab4-h9zq6-worker-k8dfd false} {10.1.2.207 lab4-h9zq6-worker-m5lqk false} {10.1.2.209 lab4-h9zq6-worker-v95g9 false}]"
time="2022-01-12T13:08:43Z" level=info msg=". {"
time="2022-01-12T13:08:43Z" level=info msg="    errors"
time="2022-01-12T13:08:43Z" level=info msg="    bufsize 512"
time="2022-01-12T13:08:43Z" level=info msg="    health :18080"
time="2022-01-12T13:08:43Z" level=info msg="    forward . 10.1.0.1 {"
time="2022-01-12T13:08:43Z" level=info msg="        policy sequential"
time="2022-01-12T13:08:43Z" level=info msg="    }"
time="2022-01-12T13:08:43Z" level=info msg="    cache 30"
time="2022-01-12T13:08:43Z" level=info msg="    reload"
time="2022-01-12T13:08:43Z" level=info msg="    template IN A company.corp {"
time="2022-01-12T13:08:43Z" level=info msg="        match .*.apps.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        answer \"{{ .Name }} 60 in {{ .Type }} 10.1.4.2\""
time="2022-01-12T13:08:43Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:43Z" level=info msg="    }"
time="2022-01-12T13:08:43Z" level=info msg="    template IN AAAA company.corp {"
time="2022-01-12T13:08:43Z" level=info msg="        match .*.apps.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:43Z" level=info msg="    }"
time="2022-01-12T13:08:43Z" level=info msg="    template IN A company.corp {"
time="2022-01-12T13:08:43Z" level=info msg="        match api.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        answer \"{{ .Name }} 60 in {{ .Type }} 10.1.4.1\""
time="2022-01-12T13:08:43Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:43Z" level=info msg="    }"
time="2022-01-12T13:08:43Z" level=info msg="    template IN AAAA company.corp {"
time="2022-01-12T13:08:43Z" level=info msg="        match api.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:43Z" level=info msg="    }"
time="2022-01-12T13:08:43Z" level=info msg="    template IN A company.corp {"
time="2022-01-12T13:08:43Z" level=info msg="        match api-int.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        answer \"{{ .Name }} 60 in {{ .Type }} 10.1.4.1\""
time="2022-01-12T13:08:43Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:43Z" level=info msg="    }"
time="2022-01-12T13:08:43Z" level=info msg="    template IN AAAA company.corp {"
time="2022-01-12T13:08:43Z" level=info msg="        match api-int.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:43Z" level=info msg="    }"
time="2022-01-12T13:08:43Z" level=info msg="    hosts {"
time="2022-01-12T13:08:43Z" level=info msg="        10.1.2.189 lab4-h9zq6-master-0 lab4-h9zq6-master-0.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        10.1.2.190 lab4-h9zq6-master-1 lab4-h9zq6-master-1.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        10.1.2.188 lab4-h9zq6-master-2 lab4-h9zq6-master-2.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        10.1.2.205 lab4-h9zq6-worker-dlr5x lab4-h9zq6-worker-dlr5x.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        10.1.2.203 lab4-h9zq6-worker-k8dfd lab4-h9zq6-worker-k8dfd.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        10.1.2.207 lab4-h9zq6-worker-m5lqk lab4-h9zq6-worker-m5lqk.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        10.1.2.209 lab4-h9zq6-worker-v95g9 lab4-h9zq6-worker-v95g9.company.corp"
time="2022-01-12T13:08:43Z" level=info msg="        fallthrough"
time="2022-01-12T13:08:43Z" level=info msg="    }"
time="2022-01-12T13:08:43Z" level=info msg="}"
time="2022-01-12T13:08:43Z" level=info
time="2022-01-12T13:08:43Z" level=info msg="Runtimecfg rendering template" path=/etc/coredns/Corefile

Assign a priority class to pods

Priority classes docs:
https://docs.openshift.com/container-platform/3.11/admin_guide/scheduling/priority_preemption.html#admin-guide-priority-preemption-priority-class

Example: https://github.com/openshift/cluster-monitoring-operator/search?q=priority&unscoped_q=priority

Notes: The pre-configured system priority classes (system-node-critical and system-cluster-critical) can only be assigned to pods in kube-system or openshift-* namespaces. Most likely, core operators and their pods should be assigned system-cluster-critical. Please do not assign system-node-critical (the highest priority) unless you are really sure about it.
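The namespace rule in the note above can be written down as a quick check. This is only a sketch of the rule as stated here, not the cluster's actual admission logic, and the function name `may_use_priority_class` is invented for illustration.

```python
SYSTEM_PRIORITY_CLASSES = {"system-node-critical", "system-cluster-critical"}


def may_use_priority_class(namespace: str, priority_class: str) -> bool:
    """Apply the note's rule: system classes only in kube-system or openshift-*."""
    if priority_class not in SYSTEM_PRIORITY_CLASSES:
        return True  # this sketch only models the rule for the system classes
    return namespace == "kube-system" or namespace.startswith("openshift-")


if __name__ == "__main__":
    # Hypothetical namespaces, for illustration only.
    print(may_use_priority_class("openshift-example", "system-cluster-critical"))
    print(may_use_priority_class("default", "system-cluster-critical"))
```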

Missing OWNER file

This repo needs an OWNER file. It is unclear who to ping in urgent situations.
