
dns's Introduction

Kubernetes (K8s)

Kubernetes, also known as K8s, is an open source system for managing containerized applications across multiple hosts. It provides basic mechanisms for the deployment, maintenance, and scaling of applications.

Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.

Kubernetes is hosted by the Cloud Native Computing Foundation (CNCF). If your company wants to help shape the evolution of technologies that are container-packaged, dynamically scheduled, and microservices-oriented, consider joining the CNCF. For details about who's involved and how Kubernetes plays a role, read the CNCF announcement.


To start using K8s

See our documentation on kubernetes.io.

Take a free course on Scalable Microservices with Kubernetes.

To use Kubernetes code as a library in other applications, see the list of published components. Use of the k8s.io/kubernetes module or k8s.io/kubernetes/... packages as libraries is not supported.
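
As an illustration of the supported route, a program that only needs to talk to the API server can depend on a published component such as k8s.io/client-go instead of k8s.io/kubernetes. The sketch below is a minimal example; exact call signatures vary between client-go releases, so treat it as a starting point rather than canonical usage.

package main

import (
    "fmt"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // Use in-cluster service account credentials; outside a cluster you
    // would build the config from a kubeconfig file instead.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    // Ask the API server for its version as a simple smoke test.
    ver, err := clientset.Discovery().ServerVersion()
    if err != nil {
        panic(err)
    }
    fmt.Println("connected to API server", ver.GitVersion)
}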

To start developing K8s

The community repository hosts all information about building Kubernetes from source, how to contribute code and documentation, who to contact about what, etc.

If you want to build Kubernetes right away there are two options:

Option 1: You have a working Go environment.

    git clone https://github.com/kubernetes/kubernetes
    cd kubernetes
    make

Option 2: You have a working Docker environment.

    git clone https://github.com/kubernetes/kubernetes
    cd kubernetes
    make quick-release

For the full story, head over to the developer's documentation.

Support

If you need support, start with the troubleshooting guide, and work your way through the process that we've outlined.

That said, if you have questions, reach out to us one way or another.

Community Meetings

The Calendar has the list of all the meetings in the Kubernetes community in a single location.

Adopters

The User Case Studies website has real-world use cases of organizations across industries that are deploying/migrating to Kubernetes.

Governance

The Kubernetes project is governed by a framework of principles, values, policies and processes to help our community and constituents towards our shared goals.

The Kubernetes Community is the launching point for learning about how we organize ourselves.

The Kubernetes Steering community repo is used by the Kubernetes Steering Committee, which oversees governance of the Kubernetes project.

Roadmap

The Kubernetes Enhancements repo provides information about Kubernetes releases, as well as feature tracking and backlogs.

dns's People

Contributors

andy2046, andyxning, artfulcoder, bowei, champtar, damiansawicki, dims, dpasiukevich, feiskyer, grayluck, jingyuanliang, johnbelamaric, jonohart, k8s-ci-robot, kl52752, liggitt, madhusudancs, mikedanese, mml, mrhohn, pacoxu, pnovotnak, prameshj, rramkumar1, spiffxp, tangenti, thockin, wewark, xialonglee, yuwenma



dns's Issues

stale service A records returned by dnsmasq?

kubedns: gcr.io/google_containers/kubedns-amd64:1.8
dnsmasq: gcr.io/google_containers/kube-dnsmasq-amd64:1.4

A Service, say qa-svc1, was created and then deleted after some time. If the same qa-svc1 is recreated and gets assigned a different ClusterIP, pods using the kube-dns/ClusterFirst DNS policy continue to see the older ClusterIP when resolving qa-svc1. I believe this comes from the dnsmasq cache. Should a max-cache-ttl setting be applied to all dnsmasq cached records, or can kube-dns invalidate the dnsmasq cache?

@bowei @thockin

kubedns 1.13.0 failed to start on v1.6.0-alpha.3 (s390x)

Just created a new v1.6.0-alpha.3 cluster on s390x and tried to deploy kubedns to the cluster, but it failed. See the kubectl get pods and kubectl describe pods output below. Also attached, in the zip file, are my kubedns-controller.yaml and my shell script to start the Kubernetes cluster.
kubedns-failed-to-start-s390x.zip

root@test-k8s-16-alpha3:/etc/kubernetes/server/bin# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
kube-dns-3552530395-h7s8k 2/3 CrashLoopBackOff 3 1m
root@test-k8s-16-alpha3:/etc/kubernetes/server/bin# kubectl describe pods kube-dns-3552530395-h7s8k -n kube-system
Name: kube-dns-3552530395-h7s8k
Namespace: kube-system
Node: 127.0.0.1/127.0.0.1
Start Time: Sat, 18 Feb 2017 13:54:06 +0000
Labels: k8s-app=kube-dns
pod-template-hash=3552530395
Status: Running
IP: 172.17.0.2
Controllers: ReplicaSet/kube-dns-3552530395
Containers:
kubedns:
Container ID: docker://3a3462b3d280c271f141292961654a70fabf0d6ae199c31739f37b4d84c5cd67
Image: gcr.io/google_containers/k8s-dns-kube-dns-s390x:1.13.0
Image ID: docker-pullable://gcr.io/google_containers/k8s-dns-kube-dns-s390x@sha256:49a499ddc7e5ad4ef317cb7a136b033e64f55c191b511926151e344e31fc418a
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-dir=/kube-dns-config
--v=2
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Sat, 18 Feb 2017 13:54:50 +0000
Ready: False
Restart Count: 3
Liveness: http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Volume Mounts:
/kube-dns-config from kube-dns-config (rw)
Environment Variables from:
Environment Variables:
PROMETHEUS_PORT: 10055
dnsmasq:
Container ID: docker://480ed642a294e518821292cb8d74645035d317f59f200d23c5cae9aa9a6a0359
Image: gcr.io/google_containers/k8s-dns-dnsmasq-s390x:1.13.0
Image ID: docker-pullable://gcr.io/google_containers/k8s-dns-dnsmasq-s390x@sha256:1eb57c914d85af5a77a9af9632ad144106b3e12f68a8e8a734c5657c917753fd
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--server=/cluster.local/127.0.0.1#10053
--server=/in-addr.arpa/127.0.0.1#10053
--server=/ip6.arpa/127.0.0.1#10053
--log-facility=-
Requests:
cpu: 150m
memory: 10Mi
State: Running
Started: Sat, 18 Feb 2017 13:54:08 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
Environment Variables from:
Environment Variables:
sidecar:
Container ID: docker://90b09674ecc9ff3ba06701d0f5a12240cb7c685e4b8fb9430289199a22806173
Image: gcr.io/google_containers/k8s-dns-sidecar-s390x:1.13.0
Image ID: docker-pullable://gcr.io/google_containers/k8s-dns-sidecar-s390x@sha256:6b03af9d65be38542ff6df0c9a569e36e81aa9ee808dbef3a00b58d436455c02
Port: 10054/TCP
Args:
--v=2
--logtostderr
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
Requests:
cpu: 10m
memory: 20Mi
State: Running
Started: Sat, 18 Feb 2017 13:54:08 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
Environment Variables from:
Environment Variables:
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
kube-dns-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-dns
Optional: true
QoS Class: Burstable
Node-Selectors:
Tolerations: CriticalAddonsOnly=:Exists
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message


1m 1m 1 kubelet, 127.0.0.1 Normal SandboxReceived Pod sandbox received, it will be created.
1m 1m 1 default-scheduler Normal Scheduled Successfully assigned kube-dns-3552530395-h7s8k to 127.0.0.1
1m 1m 1 kubelet, 127.0.0.1 spec.containers{sidecar} Normal Created Created container with id 90b09674ecc9ff3ba06701d0f5a12240cb7c685e4b8fb9430289199a22806173
1m 1m 1 kubelet, 127.0.0.1 spec.containers{kubedns} Normal Created Created container with id 78672547a5404160b7cb483dda74487fe8e32d9a43657577df99b77af8e83042
1m 1m 1 kubelet, 127.0.0.1 spec.containers{kubedns} Normal Started Started container with id 78672547a5404160b7cb483dda74487fe8e32d9a43657577df99b77af8e83042
1m 1m 1 kubelet, 127.0.0.1 spec.containers{dnsmasq} Normal Pulled Container image "gcr.io/google_containers/k8s-dns-dnsmasq-s390x:1.13.0" already present on machine
1m 1m 1 kubelet, 127.0.0.1 spec.containers{dnsmasq} Normal Created Created container with id 480ed642a294e518821292cb8d74645035d317f59f200d23c5cae9aa9a6a0359
1m 1m 1 kubelet, 127.0.0.1 spec.containers{dnsmasq} Normal Started Started container with id 480ed642a294e518821292cb8d74645035d317f59f200d23c5cae9aa9a6a0359
1m 1m 1 kubelet, 127.0.0.1 spec.containers{sidecar} Normal Pulled Container image "gcr.io/google_containers/k8s-dns-sidecar-s390x:1.13.0" already present on machine
1m 1m 1 kubelet, 127.0.0.1 spec.containers{sidecar} Normal Started Started container with id 90b09674ecc9ff3ba06701d0f5a12240cb7c685e4b8fb9430289199a22806173
1m 1m 1 kubelet, 127.0.0.1 spec.containers{kubedns} Normal Created Created container with id 85fa416009ee095b281e4bdacb633346e4483c79e8727061662c7aaf25befe0e
1m 1m 1 kubelet, 127.0.0.1 spec.containers{kubedns} Normal Started Started container with id 85fa416009ee095b281e4bdacb633346e4483c79e8727061662c7aaf25befe0e
1m 1m 3 kubelet, 127.0.0.1 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "kubedns" with CrashLoopBackOff: "Back-off 10s restarting failed container=kubedns pod=kube-dns-3552530395-h7s8k_kube-system(b6b4170e-f5e1-11e6-9192-fa163ee87680)"

1m 1m 1 kubelet, 127.0.0.1 spec.containers{kubedns} Normal Created Created container with id 96ef011f27e7b00a8cdff6842228246fa29854de1f9c1a62ec0a3166cc8f1542
1m 1m 1 kubelet, 127.0.0.1 spec.containers{kubedns} Normal Started Started container with id 96ef011f27e7b00a8cdff6842228246fa29854de1f9c1a62ec0a3166cc8f1542
59s 53s 2 kubelet, 127.0.0.1 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "kubedns" with CrashLoopBackOff: "Back-off 20s restarting failed container=kubedns pod=kube-dns-3552530395-h7s8k_kube-system(b6b4170e-f5e1-11e6-9192-fa163ee87680)"

1m 40s 4 kubelet, 127.0.0.1 spec.containers{kubedns} Normal Pulled Container image "gcr.io/google_containers/k8s-dns-kube-dns-s390x:1.13.0" already present on machine
40s 40s 1 kubelet, 127.0.0.1 spec.containers{kubedns} Normal Created Created container with id 3a3462b3d280c271f141292961654a70fabf0d6ae199c31739f37b4d84c5cd67
39s 39s 1 kubelet, 127.0.0.1 spec.containers{kubedns} Normal Started Started container with id 3a3462b3d280c271f141292961654a70fabf0d6ae199c31739f37b4d84c5cd67
1m 6s 9 kubelet, 127.0.0.1 spec.containers{kubedns} Warning BackOff Back-off restarting failed container
38s 6s 4 kubelet, 127.0.0.1 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "kubedns" with CrashLoopBackOff: "Back-off 40s restarting failed container=kubedns pod=kube-dns-3552530395-h7s8k_kube-system(b6b4170e-f5e1-11e6-9192-fa163ee87680)"

kubectl version
Client Version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.0-alpha.3", GitCommit:"5802799e56c7fcd1638e5848a13c5f3b0b1479ab", GitTreeState:"clean", BuildDate:"2017-02-16T19:27:36Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/s390x"}
Server Version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.0-alpha.3", GitCommit:"5802799e56c7fcd1638e5848a13c5f3b0b1479ab", GitTreeState:"clean", BuildDate:"2017-02-16T19:17:01Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/s390x"}

Are these peaks normal?

We are seeing high response times for kube-dns in New Relic under transactions. It's around 4-5 seconds per lookup.

Please see the graphs below.
How can we debug and fix these kube-dns issues? We are running around 8-10 kube-dns pods with a replication controller.

@bowei Added details below -
We are running Kubernetes 1.4.5 on AWS. Kubernetes was installed in our production using kube-up.sh; at that time we did not have expertise with kops private clusters. It uses AWS routing for networking. The issue happens only when the request rate (rpm) is high and when we switch traffic to our newly deployed pods (it's a blue/green deployment).

screen shot 2017-05-20 at 10 10 37 am

screen shot 2017-05-20 at 10 11 01 am

screen shot 2017-05-20 at 10 11 14 am

nslookup kubernetes.default is stuck whereas nslookup kubernetes works fine

Hi,

I am facing weird issues with Kubernetes 1.5.2.

It's a single cluster with 2 nodes, set up using "https://kubernetes.io/docs/getting-started-guides/centos/centos_manual_config/".

Then I created the SkyDNS pods.

[root@masterhost ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
kube-dns-v20-jm7x7 3/3 Running 0 2h
kubernetes-dashboard-255567031-7dn4l 1/1 Running 44 11d

[root@mwhlvatd2d2 ~]# kubectl get services -n kube-system
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns 10.254.0.10 53/UDP,53/TCP 2h
kubelet None 10250/TCP 8d
kubernetes-dashboard 10.254.25.228 80:31000/TCP 24d

I created one busybox pod and tried to do nslookup for kubernetes service as follows

It worked fine.
[root@masterhost ~]# kubectl exec -ti busybox1 -- nslookup kubernetes
Server: 10.254.0.10
Address 1: 10.254.0.10 kube-dns.kube-system.svc.cluster.local

Name: kubernetes
Address 1: 10.254.0.1 kubernetes.default.svc.cluster.local

Then i did nslookup for kubernetes.default
[root@masterhost ~]# kubectl exec -ti busybox1 -- nslookup kubernetes.default
Server: 10.254.0.10
Address 1: 10.254.0.10 kube-dns.kube-system.svc.cluster.local

---- Here it got stuck for some time and then returned the following error:

nslookup: can't resolve 'kubernetes.default'

Expected result

[root@masterhost ~]# kubectl exec -ti busybox1 -- nslookup kubernetes.default.svc.cluster.local
Server: 10.254.0.10
Address 1: 10.254.0.10 kube-dns.kube-system.svc.cluster.local

Name: kubernetes.default.svc.cluster.local
Address 1: 10.254.0.1 kubernetes.default.svc.cluster.local

Any help here is appreciated.

build/dep.sh and Godeps do not properly import some dependencies

There are 3 dependencies that are not pulled in properly with Godep. I can only get a successful build by adding the dependent libraries to build/dep.sh required packages.

REQUIRED_PKGS="cloud.google.com/go/internal/... golang.org/x/text/internal/... ./pkg/... ./cmd/..."

Packages not pulled in:
github.com/googleapis/gax-go
golang.org/x/text/internal

Steps to reproduce:

  1. Enter container ./build/dep.sh enter -u
  2. Pull latest of a given DEP.
  3. Remove the Godep and vendor folders
  4. Run ./build/dep.sh save
  5. Exit container assuming no complaints from Godep
  6. make build
root@ubuntu-xenial:/vagrant# make build
building : bin/amd64/e2e
vendor/cloud.google.com/go/internal/retry.go:21:2: cannot find package "github.com/googleapis/gax-go" in any of:
	/go/src/k8s.io/dns/vendor/github.com/googleapis/gax-go (vendor tree)
	/usr/local/go/src/github.com/googleapis/gax-go (from $GOROOT)
	/go/src/github.com/googleapis/gax-go (from $GOPATH)
test/e2e/kubedns/kubedns.go:23:2: cannot find package "github.com/onsi/ginkgo" in any of:
	/go/src/k8s.io/dns/vendor/github.com/onsi/ginkgo (vendor tree)
	/usr/local/go/src/github.com/onsi/ginkgo (from $GOROOT)
	/go/src/github.com/onsi/ginkgo (from $GOPATH)
vendor/golang.org/x/text/cases/map.go:16:2: cannot find package "golang.org/x/text/internal" in any of:
	/go/src/k8s.io/dns/vendor/golang.org/x/text/internal (vendor tree)
	/usr/local/go/src/golang.org/x/text/internal (from $GOROOT)
	/go/src/golang.org/x/text/internal (from $GOPATH)

Current dnsmasq metrics seem off the hook.

I have started creating a dashboard for the new dnsmasq metrics and I would like to understand the different values I am getting a bit better.

The misses and insertions are almost the same, while the hits seem very high. I have a 3-node cluster with only the kube-system namespace provisioned.

screen shot 2017-03-28 at 12 50 44

kube-dns never resolves if a domain returns NOERROR with 0 answer records once

tl;dr If a nameserver replies status=NOERROR with no answer section to a DNS A question, kube-dns always caches this result. If the domain name actually gets an A record after it's queried through kube-dns, it never (I waited a few days) resolves from the pods, but does resolve outside the container (e.g. on my laptop) just fine.

Repro steps

Prerequisites

  • Have a domain name alp.im and the nameservers are pointed to CloudFlare.
  • Have nslookup/dig installed on your workstation.
  • Have a minikube cluster ready on your workstation
    • running kubernetes v1.6.0
    • kube-dns comes by default, running gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1

Step 1: Domain does not exist, query from your laptop

Note ANSWER: 0, and status: NOERROR

$ dig A z.alp.im

; <<>> DiG 9.8.3-P1 <<>> A z.alp.im
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64978
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;z.alp.im.			IN	A

;; AUTHORITY SECTION:
alp.im.			1799	IN	SOA	ivan.ns.cloudflare.com. dns.cloudflare.com. 2025042470 10000 2400 604800 3600

;; Query time: 196 msec
;; SERVER: 2401:fa00:fa::1#53(2401:fa00:fa::1)
;; WHEN: Thu Jun 29 10:51:35 2017
;; MSG SIZE  rcvd: 99

Step 2: Domain does not exist, query from Pod on Kubernetes

Start a toolbelt/dig container with shell and run the same query:

⚠️ Do not exit this container as you will reuse it later.

Note the response is the same, ANSWER: 0 and NOERROR.

$ kubectl run -i -t --rm --image=toolbelt/dig dig --command -- sh
If you don't see a command prompt, try pressing enter.
/ # dig A z.alp.im

; <<>> DiG 9.11.1-P1 <<>> A z.alp.im
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11209
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;z.alp.im.			IN	A

;; AUTHORITY SECTION:
alp.im.			1724	IN	SOA	ivan.ns.cloudflare.com. dns.cloudflare.com. 2025042470 10000 2400 604800 3600

;; Query time: 74 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Thu Jun 29 17:55:46 UTC 2017
;; MSG SIZE  rcvd: 99

(Also note that SERVER: 10.0.0.10#53 which is kube-dns.)

Step 3: Create an A record for the domain

Here I use CloudFlare as it manages my DNS.

image

Step 4: Test DNS record from your laptop

Run dig on your laptop (note ;; ANSWER SECTION: and 8.8.8.8 answer):

$ dig A z.alp.im

; <<>> DiG 9.8.3-P1 <<>> A z.alp.im
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37570
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;z.alp.im.			IN	A

;; ANSWER SECTION:
z.alp.im.		299	IN	A	8.8.8.8

;; Query time: 196 msec
;; SERVER: 2401:fa00:fa::1#53(2401:fa00:fa::1)
;; WHEN: Thu Jun 29 10:54:44 2017
;; MSG SIZE  rcvd: 53

Step 5: Test DNS record from Pod on Kubernetes

Run the same command again:

/ # dig A z.alp.im

; <<>> DiG 9.11.1-P1 <<>> A z.alp.im
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45420
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;z.alp.im.			IN	A

;; Query time: 0 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Thu Jun 29 18:00:24 UTC 2017
;; MSG SIZE  rcvd: 37

Note the diff:

  • still ANSWER: 0 and status: NOERROR (but it resolves just fine outside the cluster)
  • ;; AUTHORITY SECTION: disappeared and AUTHORITY: changed to 0 from the previous time we ran this.
  • ;; Query time: 0 msec (was 79 ms) –I assume it means it's just a cached response.
    • Query time stays as 0 ms no matter how many times I run the same command.

What else I tried

  • Try it on GKE: I tried with k8s v1.5.x and v1.6.4. → Same issue. (cc: @bowei)

  • Query from a different pod on minikube: I started a new Pod and queried from there → Same issue.

  • Restart kube-dns Pod → This worked on GKE, but not on minikube.

    $ kubectl delete pods -n kube-system -l k8s-app=kube-dns
    pod "kube-dns-268032401-69xk5" deleted
    

Impact

I am not sure why this has not been discovered before. I noticed this behavior while using kube-lego on GKE. Once kube-lego applies for a TLS certificate, it polls the domain name of the service (e.g. example.com/.well-known/<token>) before asking Let's Encrypt to validate it. Before I create an Ingress with kube-lego annotation, I don't have the external IP yet so I can't configure the domain, but the kube-lego Pod already picks it up and starts querying my domain in an infinite loop. It never succeeds because first time it looked up the hostname, the A record didn't exist, so that result is cached forever. After I add A record, it still can't resolve. The moment I delete kube-dns Pods and they get recreated, it immediately starts working and resolves the hostname and completes the kube-lego challenge.
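
One way to narrow down where the empty answer comes from is to send the same A query both to kube-dns and to an upstream resolver and compare the responses. Below is a rough diagnostic sketch using github.com/miekg/dns (the library kube-dns itself builds on); the addresses are the ones from the repro above and would need adjusting for another cluster.

package main

import (
    "fmt"

    "github.com/miekg/dns"
)

// answers sends a single A query to the given server and returns the
// number of answer records plus the response code as a string.
func answers(server, name string) (int, string, error) {
    c := new(dns.Client)
    m := new(dns.Msg)
    m.SetQuestion(dns.Fqdn(name), dns.TypeA)
    resp, _, err := c.Exchange(m, server)
    if err != nil {
        return 0, "", err
    }
    return len(resp.Answer), dns.RcodeToString[resp.Rcode], nil
}

func main() {
    // 10.0.0.10 is the kube-dns ClusterIP from the repro; 8.8.8.8 stands in
    // for "any resolver outside the cluster".
    for _, server := range []string{"10.0.0.10:53", "8.8.8.8:53"} {
        n, rcode, err := answers(server, "z.alp.im")
        fmt.Printf("%s -> answers=%d rcode=%s err=%v\n", server, n, rcode, err)
    }
}

If kube-dns keeps reporting zero answers while the outside resolver returns one, the stale entry is being served from inside the cluster, which matches the behaviour described above.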

dnsmasq --server flags causing upstream name server calls to hang and time out sporadically

Hello. I had a problem recently where external names, e.g. www.google.com, would not resolve. From the perspective of the client (nslookup) the call would just time out. From looking at the dnsmasq logs, it appears the calls were hanging. Every now and then they would work and cache. Once the TTL expired they would fail, and then sporadically work and cache again.

I validated that calls directly to the upstream nameserver worked fine, e.g. nslookup www.google.com <NAME_SERVER_IP>. In fact, they never failed. I tested this from inside a pod, and on the host where the upstream name servers are configured in /etc/resolv.conf. I then tried configuring the name servers using the new ConfigMap. Same result.

Odd thing was, this only happened in the 1.6.x release of K8s. Not the 1.5.4 release that we are currently using.

I looked at the dnsmasq flags in the 1.6.x release and noticed there were three --server flags:

--server=/cluster.local/127.0.0.1#10053
--server=/in-addr.arpa/127.0.0.1#10053
--server=/in6.arpa/127.0.0.1#10053

As opposed to a single --server flag in the 1.5.4 release, which just references the kubedns address.

--server=127.0.0.1#10053

I removed the three --server flags, and replaced them with the single --server flag which simply referenced the kubedns ... and suddenly everything worked!

I do not understand what those three --server flags are doing differently from the single one (stubDomains?), but they DID NOT WORK when trying to resolve names outside the cluster.

Again the workaround was to just remove them and replace them with one server flag referencing kubedns.

My cluster was created using kops 1.6.0-alpha2. Tagging @justinsb

Let me know if you need any other information.

Thanks.
M

UPDATE: Also, cluster name lookups ALWAYS worked, regardless of the number of dots, e.g. kubernetes, kubernetes.default, kubernetes.default.svc, etc. No issues there.

stubDomains problem - upstream server is tcp only thanks to aws elb

My ConfigMap for configuring kube-dns is loading, with a stubDomain. Because I am working in EC2, I have to use an ELB, which does not support UDP, only TCP.

Layout

  • k8s 1.6.1 Two clusters
  • one in us-east-1, and one in us-west-2
  • a separate dnsmasq server is exposing the internal kube-dns on an ELB
  • each cluster has a "dig" container pod setup for testing
  • clusters are setup with different domains based on the region

Logs

I0504 00:03:32.944847       1 sync.go:167] Updated stubDomains to map[us-east-1-cluster.local:[52.20.35.216]]
I0504 00:03:32.945039       1 nanny.go:186] Restarting dnsmasq with new configuration
I0504 00:03:32.945063       1 nanny.go:135] Killing dnsmasq
I0504 00:03:32.945092       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/us-west-2-cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/in6.arpa/127.0.0.1#10053 --server /us-east-1-cluster.local/52.20.35.216]
I0504 00:03:32.945491       1 nanny.go:111]
W0504 00:03:32.945534       1 nanny.go:112] Got EOF from stderr
I0504 00:03:33.082404       1 nanny.go:111]
W0504 00:03:33.082500       1 nanny.go:112] Got EOF from stdout
I0504 00:03:33.082561       1 nanny.go:108] dnsmasq[11]: started, version 2.76 cachesize 1000
I0504 00:03:33.082626       1 nanny.go:108] dnsmasq[11]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0504 00:03:33.082662       1 nanny.go:108] dnsmasq[11]: using nameserver 52.20.35.216#53 for domain us-east-1-cluster.local
I0504 00:03:33.082699       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain in6.arpa
I0504 00:03:33.082725       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0504 00:03:33.082749       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain us-west-2-cluster.local
I0504 00:03:33.082822       1 nanny.go:108] dnsmasq[11]: reading /etc/resolv.conf
I0504 00:03:33.082851       1 nanny.go:108] dnsmasq[11]: using nameserver 52.20.35.216#53 for domain us-east-1-cluster.local
I0504 00:03:33.082885       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain in6.arpa
I0504 00:03:33.082910       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0504 00:03:33.082958       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain us-west-2-cluster.local
I0504 00:03:33.082985       1 nanny.go:108] dnsmasq[11]: using nameserver 172.60.0.205#53
I0504 00:03:33.083012       1 nanny.go:108] dnsmasq[11]: using nameserver 172.60.0.2#53
I0504 00:03:33.083073       1 nanny.go:108] dnsmasq[11]: read /etc/hosts - 7 addresses

Diagnostics

  1. login to the dig pod in west
  2. dig us-east-1-cluster.local - fails
  3. dig +tcp @52.20.35.216 elb - success (52.20.35.216 is an elb / LoadBalancer service in east fronting dnsmasq)

Ideas?
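
One more thing worth ruling out: dnsmasq forwards stub-domain queries it received over UDP as UDP queries, so a strictly TCP-only upstream would never answer them. The sketch below mirrors the dig / dig +tcp comparison from the diagnostics, using github.com/miekg/dns; the ELB address is the one from the logs, and the query name is only an example of something the remote cluster would serve.

package main

import (
    "fmt"

    "github.com/miekg/dns"
)

func main() {
    m := new(dns.Msg)
    // Example name in the stub domain; replace with a real record in the
    // us-east-1 cluster.
    m.SetQuestion("kubernetes.default.svc.us-east-1-cluster.local.", dns.TypeA)

    for _, transport := range []string{"udp", "tcp"} {
        c := &dns.Client{Net: transport}
        resp, rtt, err := c.Exchange(m, "52.20.35.216:53")
        if err != nil {
            fmt.Printf("%s: error: %v\n", transport, err)
            continue
        }
        fmt.Printf("%s: rcode=%s answers=%d rtt=%v\n",
            transport, dns.RcodeToString[resp.Rcode], len(resp.Answer), rtt)
    }
}

If the UDP attempt times out while the TCP one succeeds, the stub domain as currently configured cannot work through that ELB.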

kubedns dnsmasq pod fails with blank IP for the DNS host

Can't get a cluster started using kubeadm on v1.6.0-beta1 (kube-dns v1.14.1). The dnsmasq pod fails with:

I0308 16:30:25.353537       1 nanny.go:108] dnsmasq[11]: bad address at /etc/hosts line 7
I0308 16:30:25.353540       1 nanny.go:108] dnsmasq[11]: read /etc/hosts - 6 addresses

/etc/hosts contains:

127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
	kube-dns-1630391940-grt67

So presumably it can't get the information it needs from the ConfigMap?

Add Vagrantfile to ease development on DNS

I have a Vagrantfile that sets up a Ubuntu 16.04 box to build/test this repository. I've been using it since this repository contains scripts that don't play well on Darwin.

I can open a PR to add this to the root path if we think that it would be beneficial.

fatal error: concurrent map writes

version

1.14.1

fatal output

fatal error: concurrent map writes

goroutine 69 [running]:
runtime.throw(0x162294a, 0x15)
	/usr/local/go/src/runtime/panic.go:566 +0x95 fp=0xc4206335c8 sp=0xc4206335a8
runtime.mapassign1(0x1465c20, 0xc4202ada10, 0xc420633820, 0xc4206337d8)
	/usr/local/go/src/runtime/hashmap.go:458 +0x8ef fp=0xc4206336b0 sp=0xc4206335c8
k8s.io/dns/pkg/dns.(*KubeDNS).generateRecordsForHeadlessService(0xc42018c100, 0xc420675798, 0xc420044cb0, 0x0, 0x0)
	/go/src/k8s.io/dns/pkg/dns/dns.go:504 +0x860 fp=0xc420633850 sp=0xc4206336b0
k8s.io/dns/pkg/dns.(*KubeDNS).addDNSUsingEndpoints(0xc42018c100, 0xc420675798, 0xc420122a50, 0xc420122a48)
	/go/src/k8s.io/dns/pkg/dns/dns.go:420 +0xc2 fp=0xc420633890 sp=0xc420633850
k8s.io/dns/pkg/dns.(*KubeDNS).handleEndpointAdd(0xc42018c100, 0x15e9060, 0xc420675798)
	/go/src/k8s.io/dns/pkg/dns/dns.go:319 +0x52 fp=0xc4206338c0 sp=0xc420633890
k8s.io/dns/pkg/dns.(*KubeDNS).handleEndpointUpdate(0xc42018c100, 0x15e9060, 0xc420675798, 0x15e9060, 0xc420675798)
	/go/src/k8s.io/dns/pkg/dns/dns.go:382 +0x4f8 fp=0xc420633b28 sp=0xc4206338c0
k8s.io/dns/pkg/dns.(*KubeDNS).(k8s.io/dns/pkg/dns.handleEndpointUpdate)-fm(0x15e9060, 0xc420675798, 0x15e9060, 0xc420675798)
	/go/src/k8s.io/dns/pkg/dns/dns.go:246 +0x52 fp=0xc420633b60 sp=0xc420633b28
k8s.io/dns/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(0xc4204ad310, 0xc4204ad320, 0xc4204ad330, 0x15e9060, 0xc420675798, 0x15e9060, 0xc420675798)
	/go/src/k8s.io/dns/vendor/k8s.io/client-go/tools/cache/controller.go:180 +0x5d fp=0xc420633b90 sp=0xc420633b60
k8s.io/dns/vendor/k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate(0xc4203db280, 0x15e9060, 0xc420675798, 0x15e9060, 0xc420675798)
	<autogenerated>:51 +0x8c fp=0xc420633bd8 sp=0xc420633b90
k8s.io/dns/vendor/k8s.io/client-go/tools/cache.NewInformer.func1(0x148db60, 0xc4201a92a0, 0xc4201a92a0, 0x148db60)
	/go/src/k8s.io/dns/vendor/k8s.io/client-go/tools/cache/controller.go:246 +0x335 fp=0xc420633ca8 sp=0xc420633bd8
k8s.io/dns/vendor/k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop(0xc420132370, 0xc4202adb60, 0x0, 0x0, 0x0, 0x0)
	/go/src/k8s.io/dns/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:420 +0x22a fp=0xc420633d80 sp=0xc420633ca8
k8s.io/dns/vendor/k8s.io/client-go/tools/cache.(*Controller).processLoop(0xc420462150)
	/go/src/k8s.io/dns/vendor/k8s.io/client-go/tools/cache/controller.go:131 +0x3c fp=0xc420633dc0 sp=0xc420633d80
k8s.io/dns/vendor/k8s.io/client-go/tools/cache.(*Controller).(k8s.io/dns/vendor/k8s.io/client-go/tools/cache.processLoop)-fm()
	/go/src/k8s.io/dns/vendor/k8s.io/client-go/tools/cache/controller.go:102 +0x2a fp=0xc420633dd8 sp=0xc420633dc0
k8s.io/dns/vendor/k8s.io/client-go/pkg/util/wait.JitterUntil.func1(0xc420633f70)
	/go/src/k8s.io/dns/vendor/k8s.io/client-go/pkg/util/wait/wait.go:96 +0x5e fp=0xc420633e10 sp=0xc420633dd8
k8s.io/dns/vendor/k8s.io/client-go/pkg/util/wait.JitterUntil(0xc420633f70, 0x3b9aca00, 0x0, 0x13e8501, 0xc4203f02a0)
	/go/src/k8s.io/dns/vendor/k8s.io/client-go/pkg/util/wait/wait.go:97 +0xad fp=0xc420633ed8 sp=0xc420633e10
k8s.io/dns/vendor/k8s.io/client-go/pkg/util/wait.Until(0xc420633f70, 0x3b9aca00, 0xc4203f02a0)
	/go/src/k8s.io/dns/vendor/k8s.io/client-go/pkg/util/wait/wait.go:52 +0x4d fp=0xc420633f10 sp=0xc420633ed8
k8s.io/dns/vendor/k8s.io/client-go/tools/cache.(*Controller).Run(0xc420462150, 0xc4203f02a0)
	/go/src/k8s.io/dns/vendor/k8s.io/client-go/tools/cache/controller.go:102 +0x1af fp=0xc420633f90 sp=0xc420633f10
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc420633f98 sp=0xc420633f90
created by k8s.io/dns/pkg/dns.(*KubeDNS).Start
	/go/src/k8s.io/dns/pkg/dns/dns.go:148 +0x9f

cause

func (kd *KubeDNS) generateRecordsForHeadlessService(e *v1.Endpoints, svc *v1.Service) error {
	subCache := treecache.NewTreeCache()
	glog.V(4).Infof("Endpoints Annotations: %v", e.Annotations)
	for idx := range e.Subsets {
		for subIdx := range e.Subsets[idx].Addresses {
			address := &e.Subsets[idx].Addresses[subIdx]
			endpointIP := address.IP
			recordValue, endpointName := util.GetSkyMsg(endpointIP, 0)
			if hostLabel, exists := getHostname(address); exists {
				endpointName = hostLabel
			}
			subCache.SetEntry(endpointName, recordValue, kd.fqdn(svc, endpointName))
			for portIdx := range e.Subsets[idx].Ports {
				endpointPort := &e.Subsets[idx].Ports[portIdx]
				if endpointPort.Name != "" && endpointPort.Protocol != "" {
					srvValue := kd.generateSRVRecordValue(svc, int(endpointPort.Port), endpointName)
					glog.V(2).Infof("Added SRV record %+v", srvValue)

					l := []string{"_" + strings.ToLower(string(endpointPort.Protocol)), "_" + endpointPort.Name}
					subCache.SetEntry(endpointName, srvValue, kd.fqdn(svc, append(l, endpointName)...), l...)
				}
			}

			// Generate PTR records only for Named Headless service.
			if _, has := getHostname(address); has {
				reverseRecord, _ := util.GetSkyMsg(kd.fqdn(svc, endpointName), 0)
				kd.reverseRecordMap[endpointIP] = reverseRecord // concurrent map writes
			}
		}
	}
	subCachePath := append(kd.domainPath, serviceSubdomain, svc.Namespace)
	kd.cacheLock.Lock()
	defer kd.cacheLock.Unlock()
	kd.cache.SetSubCache(svc.Name, subCache, subCachePath...)
	return nil
}
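
The write to kd.reverseRecordMap above happens outside any lock, while the cache update at the bottom of the function is guarded by kd.cacheLock. A minimal sketch of one possible fix, reusing the names from the snippet above and assuming reverseRecordMap is meant to be protected by the same lock (an illustration only, not the actual patch):

// Generate PTR records only for Named Headless service, but take the
// cache lock before touching the shared map.
if _, has := getHostname(address); has {
    reverseRecord, _ := util.GetSkyMsg(kd.fqdn(svc, endpointName), 0)
    kd.cacheLock.Lock()
    kd.reverseRecordMap[endpointIP] = reverseRecord
    kd.cacheLock.Unlock()
}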

dns diagnosis tool proposal: test

Hello everyone, this is a first proposal for implementing the idea suggested in kubernetes/kubernetes#45934, describing the tests to be implemented by the tool. I will probably miss things to check; any ideas are more than welcome! :) I'm a newbie to the internals of Kubernetes DNS but very keen to learn, so please do not hesitate to correct me if I propose silly things :)

Tests to be run

  • Related to endpoints

    • Number of endpoints
    • Replicas behind each endpoint
    • Restarts of each replica
  • Related to lookups

    • In-cluster

      • For each service (without an associated port) and endpoint, query for the A record and measure latency and dropped packets (a sketch of such a check follows this list)
      • For each service (with an associated port) and endpoint, query for the SRV record and measure latency and dropped packets
      • For each service and endpoint, do a reverse IP address lookup and measure latency and dropped packets
    • Out-of-cluster

      • For each domain name in a predetermined external set, and for each endpoint, measure latency and dropped packets
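
A rough sketch of what one of the in-cluster A-record checks could look like, using github.com/miekg/dns; the server address and the query name are placeholders (a kube-dns ClusterIP and a well-known service name) and would be parameters of the real tool.

package main

import (
    "fmt"
    "time"

    "github.com/miekg/dns"
)

// probeA sends one A query and reports the round-trip time; errors and
// timeouts are counted as dropped for the purposes of the check.
func probeA(server, name string) (time.Duration, bool) {
    c := &dns.Client{Timeout: 2 * time.Second}
    m := new(dns.Msg)
    m.SetQuestion(dns.Fqdn(name), dns.TypeA)
    _, rtt, err := c.Exchange(m, server)
    if err != nil {
        return 0, true
    }
    return rtt, false
}

func main() {
    rtt, dropped := probeA("10.96.0.10:53", "kubernetes.default.svc.cluster.local")
    fmt.Printf("rtt=%v dropped=%v\n", rtt, dropped)
}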

Questions:

  • In order to make sure that SRV records are checked, should specific services with named ports be created and removed after the test is done? In general, if any of the items to be checked is not present, should the tool set up the check conditions and tear them down after the checks finish?
  • Should pod's A records be checked?

So, what do you think? Should any of these be done differently, or totally removed? Are there any relevant checks missing?

Cheers,

Unable to build Kubernetes DNS

Hello,
Currently, I receive the following errors while attempting to run the make build target:
...
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.6/main: temporary error (try again later)
WARNING: Ignoring APKINDEX.84815163.tar.gz: No such file or directory
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
v3.6.1-61-gc32140e9a2 [http://dl-cdn.alpinelinux.org/alpine/v3.6/community]
1 errors; 2892 distinct packages available
make[3]: *** [_output/amd64/dnsmasq] Error 1
make[3]: Leaving directory `/tmp/dns/images/dnsmasq'
make[2]: *** [build-dnsmasq] Error 2
make[2]: Leaving directory `/tmp/dns/images'
make[1]: *** [build] Error 2
make[1]: Leaving directory `/tmp/dns/images'
make: *** [images-build] Error 2

Does anyone have any idea how to resolve this problem? Thank you for your help!

images for s390x

When will the DNS images for s390x be ready? Along with the Kubernetes 1.6.0 release?

use flannel and the kube-dns add-on, but kube-dns doesn't work

Here is my procedure for setting up the kube cluster:

  • system env:

    • HypriotOS (Debian GNU/Linux 8)
    • use wifi iface wlan0
      • kube version:
        image
  • On master

    • kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address 192.168.31.199
    • kubectl create -f kube-flannel-rbac.yml
    • kubectl create --namespace kube-system -f kube-flannel.yml
      • specifically, the nodes use wifi to connect, so I modified the kube-flannel.yml config params as below:
image: quay.io/coreos/flannel:v0.7.0-arm
command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr", "--iface=wlan0"]
  • On slaves

    • kubeadm join --token 8bbadd.3a118002a3e82964 192.168.31.199:6443
  • On each node (include master)

    • use the flannel subnet config /run/flannel/subnet.env to restart dockerd; an example is below:
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.2.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
  • so dockerd is run as: /usr/bin/dockerd --bip=10.244.2.1/24 --ip-masq=false --mtu=1450 --storage-driver overlay -H fd://

question

image

Based on the screenshot above, you will notice that the kube-dns 'READY' column is always 2/3,

and the kube-dns pod includes 3 containers: kubedns, dnsmasq, sidecar.

Here are the logs from the kubedns container:

image

So, the kube-dns pod just keeps restarting over and over again.

I have been struggling with this problem for a couple of days.

Is there any mistake I have made that leads to this problem?
Does anyone else have the same issue?

Add support for TXT entries

gRPC has a new proposal out for storing service configurations in TXT records:

grpc/proposal#5

If you use gRPC under Kubernetes, you will now have two choices:

  1. use parallel DNS hierarchies, one managed by kube-dns, one by a more traditional DNS server just for the TXT records
  2. extend kube-dns to support custom TXT records

I would prefer the second, by far. This could be accomplished through either an annotation or a new field on the service. Since I have some experience with kube-dns, having implemented ExternalName support, I can help with some/all the code, too. Of course, gRPC is just going to be one use case for the feature.

This seems to be related to, but still independent from kubernetes/kubernetes#6437
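
For context, a custom TXT entry would ultimately be served as an ordinary TXT resource record. With github.com/miekg/dns that record looks roughly like the sketch below; the annotation name and value are purely hypothetical illustrations of option 2, not an existing kube-dns feature.

package main

import (
    "fmt"

    "github.com/miekg/dns"
)

func main() {
    // Hypothetical: the value would be copied from a service annotation
    // (e.g. something like "dns.alpha.kubernetes.io/txt") and served under
    // the service's DNS name.
    rr := &dns.TXT{
        Hdr: dns.RR_Header{
            Name:   "my-svc.default.svc.cluster.local.",
            Rrtype: dns.TypeTXT,
            Class:  dns.ClassINET,
            Ttl:    30,
        },
        Txt: []string{"some-service-config"},
    }
    fmt.Println(rr.String())
}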

Publish metrics of cached entries

It would be helpful if we could get the number of dnsmasq cached entries from its metrics endpoint.

If this metric is already published, I apologize and please let me know the name of it.

Thank you.

Kube DNS Latency

We have DNS pods running in our cluster (cluster details below).
The issue is that 2-3 requests out of every 5 have a latency of 5 seconds because of DNS.

root@my-nginx-858393261-m3bnl:/# time curl http://myservice.central:8080/status
{
  "host": "myservice-3af719a-805113283-x35p1",
  "status": "OK"
}

real	0m5.523s
user	0m0.004s
sys	0m0.000s
root@my-nginx-858393261-m3bnl:/# time curl http://myservice.central:8080/status
{
  "host": "myservice-3af719a-805113283-x35p1",
  "status": "OK"
}

real	0m0.013s
user	0m0.000s
sys	0m0.004s

Cluster details: We are running the latest Kubernetes version, 1.6.4, installed using kops. It's a multi-AZ cluster in AWS.

Below are the kube dns details

  • kubedns: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
  • dnsmasq: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
  • sidecar: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1

Our kube-dns is running with the resource requests below:

cpu	:	200m	
memory	:	70Mi

Please let us know what the issue is and how to fix it.
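
To confirm the 5-second spikes come from name resolution rather than from the service itself, it can help to time the lookup separately from the HTTP request. A quick sketch (the service name is the one from the transcript above):

package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    start := time.Now()
    addrs, err := net.LookupHost("myservice.central")
    fmt.Printf("DNS lookup took %v (addrs=%v, err=%v)\n", time.Since(start), addrs, err)
}

A lookup that reliably lands at about 5 seconds (or a multiple of it) usually points at a dropped query plus the resolver's retry timeout rather than a slow DNS server.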

nslookup: can't resolve 'kubernetes.default'

Hello, I hope this is the right place to post my issue. Forgive me if it isn't, and please redirect me to the right place.

I am trying to install a cluster with one master (server-1) and one minion (server-2) running on Ubuntu, using flannel for networking and kubeadm to install the master and minion. I am trying to run the dashboard from the minion server-2 as discussed here. I am very new to Kubernetes and not an expert on Linux networking setup, so any help would be appreciated. The dashboard is not working, and after some investigation it seems to be a DNS issue.

kubectl and kubeadm : 1.6.6
Docker: 17.03.1-ce

My DNS service is up and exposing endpoints

ubuntu@server-1:~$ kubectl get svc --all-namespaces
NAMESPACE     NAME                   CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       kubernetes             10.96.0.1       <none>        443/TCP         20h
kube-system   kube-dns               10.96.0.10      <none>        53/UDP,53/TCP   20h
kube-system   kubernetes-dashboard   10.97.135.242   <none>        80/TCP          3h
ubuntu@server-1:~$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                     AGE
kube-dns   10.244.0.4:53,10.244.0.4:53   17h

I created a busybox pod, and when I do an nslookup from it I get the following errors. Note that the command hangs for some time before returning the error.

ubuntu@server-1:~$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'

ubuntu@server-1:~$ kubectl exec -ti busybox -- nslookup kubernetes.local
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.local'

ubuntu@server-1:~$ kubectl exec -ti busybox -- nslookup kubernetes
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes'

ubuntu@server-1:~$ kubectl exec -ti busybox -- nslookup 10.96.0.1
Server:    10.96.0.10
Address 1: 10.96.0.10

Name:      10.96.0.1
Address 1: 10.96.0.1

Resolv.conf seems properly configured

ubuntu@server-1:~$ kubectl exec busybox cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local local
options ndots:5

DNS pod is running

ubuntu@server-1:~$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY     STATUS    RESTARTS   AGE
kube-dns-692378583-5zj21   3/3       Running   0          17h

Here are the iptables rules from server-1

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-ISOLATION  all  --  anywhere             anywhere            
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere            

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             10.103.141.154       /* kube-system/kubernetes-dashboard: has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable

Here are the iptables rules from server-2

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-ISOLATION  all  --  anywhere             anywhere            
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere            

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             10.103.141.154       /* kube-system/kubernetes-dashboard: has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable

Extending CI support to s390x

I need to set up Kube-DNS CI for s390x and am looking for some guidelines so that I can do so.
What are the prerequisites and hardware configuration needed for this job?

kubedns panic "fatal error: concurrent map writes"

See linked issue for stacktraces:
kubernetes/kubernetes#45593

In my case the cluster is very quiet; it's just a test cluster with no other users, and my kubedns is a bit overscaled at the moment. However, over two days I'm seeing several crashes (I've maybe applied half a dozen services a few times):

kube-dns-1251125892-0172d                                       3/3       Running   4          2d
kube-dns-1251125892-bmbm7                                       3/3       Running   1          2d
kube-dns-1251125892-ns3qg                                       3/3       Running   4          2d
kube-dns-1251125892-xrm3p                                       3/3       Running   7          2d

No PTR records generated for headless service if hostname was not set.

If there is no hostname entry on an endpoint of a headless service, a DNS record is created with a hostname part based on some sort of hash, ultimately derived from the pod ip:
https://github.com/kubernetes/dns/blob/master/pkg/dns/dns.go#L486

The PTR record is then explicitly not generated:
https://github.com/kubernetes/dns/blob/master/pkg/dns/dns.go#L503

The spec states:

Given a ready endpoint with hostname of <hostname> and IP address <a>.<b>.<c>.<d>, a PTR record of the following form must exist.

hostname is defined earlier as the value of the hostname field on the endpoint or a "unique, system-assigned identifier", which in this case is the hash generated on line 486.

I would expect either of the following:

  • A PTR record is always generated, dropping the conditional on line 503, satisfying the spec (a sketch of this option follows the list).
  • No DNS records are generated in these cases and the spec is updated to reflect this behaviour.
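
A sketch of the first option (always emitting the PTR record), reusing the names from the generateRecordsForHeadlessService snippet quoted in the concurrent-map-writes issue earlier on this page; an illustration of the proposal, not an actual patch:

// Always generate the PTR record, whether or not the endpoint carries an
// explicit hostname; endpointName is the hostname label or the
// system-assigned hash, exactly as the spec allows.
reverseRecord, _ := util.GetSkyMsg(kd.fqdn(svc, endpointName), 0)
kd.reverseRecordMap[endpointIP] = reverseRecord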

k8s-dns-dnsmasq-nanny-arm:1.14.3 broken for 'arm'

/ # uname -a
Linux 26e00a3161d1 4.9.20-std-1 #1 SMP Wed Apr 5 15:38:34 UTC 2017 armv7l GNU/Linux

# docker run -it   --entrypoint=/bin/sh gcr.io/google_containers/k8s-dns-dnsmasq-nanny-arm:1.14.3
/ # /dnsmasq-nanny 
F0622 23:21:33.323491       7 nanny.go:173] Could not start dnsmasq with initial configuration: fork/exec /usr/sbin/dnsmasq: no such file or directory
goroutine 1 [running]:
k8s.io/dns/vendor/github.com/golang/glog.stacks(0x131b000, 0x0, 0x97, 0xcb)
	/go/src/k8s.io/dns/vendor/github.com/golang/glog/glog.go:769 +0x84
k8s.io/dns/vendor/github.com/golang/glog.(*loggingT).output(0x130a968, 0x3, 0x11698dc0, 0x12a53d1, 0x8, 0xad, 0x0)
	/go/src/k8s.io/dns/vendor/github.com/golang/glog/glog.go:720 +0x2f8
k8s.io/dns/vendor/github.com/golang/glog.(*loggingT).printf(0x130a968, 0x3, 0xcb7f76, 0x36, 0x1171bee4, 0x1, 0x1)
	/go/src/k8s.io/dns/vendor/github.com/golang/glog/glog.go:655 +0x10c
k8s.io/dns/vendor/github.com/golang/glog.Fatalf(0xcb7f76, 0x36, 0x1171bee4, 0x1, 0x1)
	/go/src/k8s.io/dns/vendor/github.com/golang/glog/glog.go:1148 +0x4c
k8s.io/dns/pkg/dnsmasq.RunNanny(0x12c4560, 0x118d12a0, 0xc90bb1, 0x11, 0x131b138, 0x0, 0x0, 0x0)
	/go/src/k8s.io/dns/pkg/dnsmasq/nanny.go:173 +0x1dc
main.main()
	/go/src/k8s.io/dns/cmd/dnsmasq-nanny/main.go:80 +0x21c

Noisy Logs in sidecar 1.14.2 logs

seems to be missing the fix applied in dns.go

ERROR: logging before flag.Parse: I0525 13:43:00.569730       1 main.go:48] Version v1.14.1-16-gff416ee
ERROR: logging before flag.Parse: I0525 13:43:00.570013       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
ERROR: logging before flag.Parse: I0525 13:43:00.570090       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: I0525 13:43:00.570258       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}

Failed to create a kubernetes client

Hello, this looks like the new home for kube-dns, so I hope I'm posting in the right place.
I'm getting the following error [in the kubedns container] when trying to create the kube-dns addon:

2017-01-18T04:12:37.756554582Z ERROR: logging before flag.Parse: F0118 04:12:37.756422       1 server.go:52] Failed to create a kubernetes client: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

Some details about my setup:

  • Vagrant: 1.9.1
  • Container Linux (CoreOS): 1235.6.0 (Stable)
  • Docker: 1.12.3
  • Flannel: v0.6.0
  • Kubernetes: v1.5.2
  • kube-dns: 1.11.0
    Note: no config-map, no federation options.

To implement kube-dns I'm using the yaml's provided in /cluster/addons/dns

The error suggests the token is missing so I confirmed the service-account/secret exists
(and includes ca.crt):

Name:		default-token-tqc73
Namespace:	kube-system
Labels:		<none>
Annotations:	kubernetes.io/service-account.name=default
		kubernetes.io/service-account.uid=6b323b3c-dd33-11e6-8e80-080027acfaf0
Type:	kubernetes.io/service-account-token

Data
====
token:		eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkZWZhdWx0LXRva2VuLXRxYzczIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImRlZmF1bHQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI2YjMyM2IzYy1kZDMzLTExZTYtOGU4MC0wODAwMjdhY2ZhZjAiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCJ9.A3jyD08wFKBwBlnrVCw87uPOv-65km9H17uUgQhw6tUrJIFiISkC8FfFccz5UlqerlIgeiIqjXIP5Hf8V0QvvQMrh9gk7DEEuMUXerSJwnfMWwTWsE53BVg_ErTU0A4xsdW5f2d7WhbcW8FxbpnZNCUAOFbCI99boinFR3Zn89LSilp5f5kJNI_19WAbmXAiNNACsi9GZisD_w3rps9WiwEBU3v5CFjefu7Ph_oR0R6Zzk9LMAt_1izEXHFKb3TBVQTb7uip-Tc3m0ouMtqr7ltGRe2o7Wv_coQBB_TcXUVF2F0lBvtNa3ndO-VeW7fXKcRdOsrWz6MTSOCYiqI6uw
ca.crt:		1363 bytes
namespace:	11 bytes

Note: To rule out any race condition (pod created before secret)
I booted the kubernetes cluster to idle, then manually ran create -f <kube-dns-yamls>


Inspecting the pod shows the token mounted:

Name:		kube-dns-2829910835-xhtrv
Namespace:	kube-system
Node:		172.17.8.11/172.17.8.11
Start Time:	Wed, 18 Jan 2017 04:12:32 +0000
Labels:		k8s-app=kube-dns
		pod-template-hash=2829910835
Status:		Running
IP:		10.13.176.2
Controllers:	ReplicaSet/kube-dns-2829910835
Containers:
  kubedns:
    Container ID:	docker://8a657cfdc2fad409ca0075655b4bd6636899f5b68707270e3c277f772d6d7f6a
    Image:		gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.11.0
    Image ID:		docker://sha256:b1e4978ccc41b11ce1c94b95dd807c3cb8d83fc57085b83d5f80607e4d923266
    Ports:		10053/UDP, 10053/TCP, 10055/TCP
    Args:
      --domain=cluster.local.
      --dns-port=10053
      --v=2
    Limits:
      memory:	170Mi
    Requests:
      cpu:		100m
      memory:		70Mi
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Error
      Exit Code:	255
      Started:		Wed, 18 Jan 2017 04:12:55 +0000
      Finished:		Wed, 18 Jan 2017 04:12:56 +0000
    Ready:		False
    Restart Count:	2
    Liveness:		http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:		http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tqc73 (ro)
    Environment Variables:
      PROMETHEUS_PORT:	10055
  dnsmasq:
    Container ID:	docker://42052c081c038110a6ea09888a6fc11a25f63384b52c6e6d28da8c7ed3fa7b23
    Image:		gcr.io/google_containers/k8s-dns-dnsmasq-amd64:1.11.0
    Image ID:		docker://sha256:721bf2add40b598b9deb978a7ce65c3f3e650f634ed40d0c665888e4108b72cd
    Ports:		53/UDP, 53/TCP
    Args:
      --cache-size=1000
      --no-resolv
      --server=127.0.0.1#10053
      --log-facility=-
    Requests:
      cpu:		150m
      memory:		10Mi
    State:		Running
      Started:		Wed, 18 Jan 2017 04:12:35 +0000
    Ready:		True
    Restart Count:	0
    Liveness:		http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tqc73 (ro)
    Environment Variables:	<none>
  sidecar:
    Container ID:	docker://d169091ab2d0581ade25cb6e6c5edc7da98d877ff64d3623b1f4209865475e24
    Image:		gcr.io/google_containers/k8s-dns-sidecar-amd64:1.11.0
    Image ID:		docker://sha256:cbae2d53df65429a0e131fbe140fd7c66d6d1059b3359b9e5b5e4e5b341d250b
    Port:		10054/TCP
    Args:
      --v=2
      --logtostderr
      --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
      --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
    Requests:
      cpu:		10m
      memory:		20Mi
    State:		Running
      Started:		Wed, 18 Jan 2017 04:12:35 +0000
    Ready:		True
    Restart Count:	0
    Liveness:		http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tqc73 (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True
  Ready 	False
  PodScheduled 	True
Volumes:
  default-token-tqc73:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-tqc73
QoS Class:	Burstable
Tolerations:	CriticalAddonsOnly=:Exists
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath			Type		Reason		Message
  ---------	--------	-----	----			-------------			--------	------		-------
  41s		41s		1	{default-scheduler }					Normal		Scheduled	Successfully assigned kube-dns-2829910835-xhtrv to 172.17.8.11
  39s		39s		1	{kubelet 172.17.8.11}	spec.containers{kubedns}	Normal		Created		Created container with docker id a39c0d802043; Security:[seccomp=unconfined]
  39s		39s		1	{kubelet 172.17.8.11}	spec.containers{kubedns}	Normal		Started		Started container with docker id a39c0d802043
  39s		39s		1	{kubelet 172.17.8.11}	spec.containers{dnsmasq}	Normal		Pulled		Container image "gcr.io/google_containers/k8s-dns-dnsmasq-amd64:1.11.0" already present on machine
  39s		39s		1	{kubelet 172.17.8.11}	spec.containers{dnsmasq}	Normal		Created		Created container with docker id 42052c081c03; Security:[seccomp=unconfined]
  38s		38s		1	{kubelet 172.17.8.11}	spec.containers{sidecar}	Normal		Started		Started container with docker id d169091ab2d0
  38s		38s		1	{kubelet 172.17.8.11}	spec.containers{dnsmasq}	Normal		Started		Started container with docker id 42052c081c03
  38s		38s		1	{kubelet 172.17.8.11}	spec.containers{sidecar}	Normal		Pulled		Container image "gcr.io/google_containers/k8s-dns-sidecar-amd64:1.11.0" already present on machine
  38s		38s		1	{kubelet 172.17.8.11}	spec.containers{sidecar}	Normal		Created		Created container with docker id d169091ab2d0; Security:[seccomp=unconfined]
  36s		36s		1	{kubelet 172.17.8.11}	spec.containers{kubedns}	Normal		Created		Created container with docker id b90d8e58c16d; Security:[seccomp=unconfined]
  36s		36s		1	{kubelet 172.17.8.11}	spec.containers{kubedns}	Normal		Started		Started container with docker id b90d8e58c16d
  35s		30s		3	{kubelet 172.17.8.11}					Warning		FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "kubedns" with CrashLoopBackOff: "Back-off 10s restarting failed container=kubedns pod=kube-dns-2829910835-xhtrv_kube-system(55ad8e0e-dd34-11e6-8e80-080027acfaf0)"

  40s	18s	3	{kubelet 172.17.8.11}	spec.containers{kubedns}	Normal	Pulled		Container image "gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.11.0" already present on machine
  18s	18s	1	{kubelet 172.17.8.11}	spec.containers{kubedns}	Normal	Created		Created container with docker id 8a657cfdc2fa; Security:[seccomp=unconfined]
  18s	18s	1	{kubelet 172.17.8.11}	spec.containers{kubedns}	Normal	Started		Started container with docker id 8a657cfdc2fa
  35s	10s	5	{kubelet 172.17.8.11}	spec.containers{kubedns}	Warning	BackOff		Back-off restarting failed docker container
  16s	10s	2	{kubelet 172.17.8.11}					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "kubedns" with CrashLoopBackOff: "Back-off 20s restarting failed container=kubedns pod=kube-dns-2829910835-xhtrv_kube-system(55ad8e0e-dd34-11e6-8e80-080027acfaf0)"

I'm at a loss as to how to troubleshoot further.
I'm using the provided files, which require only minimal changes.
Please let me know if you need more info, thank you!

Only one A record set for a headless service whose pods share a single hostname.

/kind bug

What happened
When a headless service is created to point to pods that share a single hostname (which happens, for example, when the hostname field is set in the pod template of a Deployment/ReplicaSet):

  • Only one A record is returned for the service DNS name
  • A pod DNS name is generated based on this host name, which points to a single pod

What was expected to happen

  • Return A records for all available endpoints on the service DNS name
  • It's not clear what the correct behaviour should be for the pod DNS name: either also return multiple A records, or don't create the record at all.

This seems to be caused by the following code:
https://github.com/kubernetes/dns/blob/master/pkg/dns/dns.go#L490

Here the endpointName will be the same for every pod in the service that shares that hostname, so the entry in subCache is overwritten.
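For illustration, here is a minimal sketch (not the actual kube-dns code; the type and field names are assumptions) of why keying the endpoint sub-cache by hostname alone drops records when several endpoints share a hostname, and how keeping all addresses per hostname preserves them:

package main

import "fmt"

// endpoint is a stand-in for the per-endpoint data kube-dns stores;
// the real structure lives in pkg/dns/dns.go.
type endpoint struct {
    Hostname string
    IP       string
}

func main() {
    endpoints := []endpoint{
        {Hostname: "depl-1-host", IP: "10.56.0.140"},
        {Hostname: "depl-1-host", IP: "10.56.0.141"},
    }

    // Keyed by hostname alone: the second endpoint overwrites the first,
    // so only one A record survives.
    subCache := map[string]endpoint{}
    for _, ep := range endpoints {
        subCache[ep.Hostname] = ep
    }
    fmt.Println("records kept:", len(subCache)) // 1

    // Keyed by hostname with all addresses retained: both records survive.
    multiCache := map[string][]endpoint{}
    for _, ep := range endpoints {
        multiCache[ep.Hostname] = append(multiCache[ep.Hostname], ep)
    }
    fmt.Println("records kept:", len(multiCache["depl-1-host"])) // 2
}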

How to reproduce

Apply the following spec:

apiVersion: v1
kind: List
items:
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    name: depl-1
  spec:
    replicas: 2
    template:
      metadata:
        labels:
          app: depl-1
      spec:
        hostname: depl-1-host
        subdomain: depl-1-service
        containers:
        - name: test
          args:
          - bash
          stdin: true
          tty: true
          image: debian:jessie
- apiVersion: v1
  kind: Service
  metadata:
    name: depl-1-service
  spec:
    clusterIP: None
    selector:
      app: depl-1
    ports:
    - port: 5000

Resolving the hostnames returns only a single A record.

# host depl-1-host.depl-1-service.default.svc.cluster.local
depl-1-host.depl-1-service.default.svc.cluster.local has address 10.56.0.140
# host depl-1-service.default.svc.cluster.local
depl-1-service.default.svc.cluster.local has address 10.56.0.140

PTR records ARE being created for all the pods, all resolving back to the single hostname. This is expected behaviour.

dnsmasq container fails to build on OS X (incompatible with BSD sed)

The Makefile for the dnsmasq container contains invocations in the style of:

sed -i "s/pattern/replacement/" Dockerfile

This is incompatible with BSD sed, whose -i flag takes a required argument, so the command produces the following error:

sed: 1: "Dockerfile": extra characters at the end of D command

Note that it's trying to parse the string Dockerfile as a sed command.

I'll have a patch shortly.

DNS flag kube-master-url is not respected

Using gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
K8s: v1.6.0-beta.4

It looks like the value in the kubeconfig overrides the value provided in the kube-master-url flag, and the DNS pod doesn't start up.

Setting the flags below when starting the DNS pod:

- --kubecfg-file=/root/.kube/config
- --kube-master-url=https://34.205.140.27:6443

kubeconfig content:

apiVersion: v1
kind: Config
clusters:
- name: kubernetes
  cluster:
    certificate-authority: /etc/kubernetes/ca.pem
    server: "https://127.0.0.1:6443"
users:
- name: admin
  user:
    username: admin
    password: "abbazabba"
contexts:
- name: kubernetes
  context:
    cluster: kubernetes
    user: admin
current-context: kubernetes
[root@ip-10-0-3-130 specs]# docker logs e919ff944998
I0327 23:42:02.206667       1 dns.go:49] version: v1.5.2-beta.0+$Format:%h$
I0327 23:42:02.230408       1 server.go:74] ConfigMap and ConfigDir not configured, using values from command line flags
I0327 23:42:02.230455       1 server.go:112] FLAG: --alsologtostderr="false"
I0327 23:42:02.230480       1 server.go:112] FLAG: --config-dir=""
I0327 23:42:02.230485       1 server.go:112] FLAG: --config-map=""
I0327 23:42:02.230488       1 server.go:112] FLAG: --config-map-namespace="kube-system"
I0327 23:42:02.230492       1 server.go:112] FLAG: --config-period="10s"
I0327 23:42:02.230497       1 server.go:112] FLAG: --dns-bind-address="0.0.0.0"
I0327 23:42:02.230500       1 server.go:112] FLAG: --dns-port="10053"
I0327 23:42:02.230506       1 server.go:112] FLAG: --domain="cluster.local."
I0327 23:42:02.230514       1 server.go:112] FLAG: --federations=""
I0327 23:42:02.230519       1 server.go:112] FLAG: --healthz-port="8081"
I0327 23:42:02.230522       1 server.go:112] FLAG: --initial-sync-timeout="1m0s"
I0327 23:42:02.230525       1 server.go:112] FLAG: --kube-master-url="https://34.205.140.27:6443"
I0327 23:42:02.230530       1 server.go:112] FLAG: --kubecfg-file="/root/.kube/config"
I0327 23:42:02.230533       1 server.go:112] FLAG: --log-backtrace-at=":0"
I0327 23:42:02.230538       1 server.go:112] FLAG: --log-dir=""
I0327 23:42:02.230543       1 server.go:112] FLAG: --log-flush-frequency="5s"
I0327 23:42:02.230546       1 server.go:112] FLAG: --logtostderr="true"
I0327 23:42:02.230550       1 server.go:112] FLAG: --nameservers=""
I0327 23:42:02.230552       1 server.go:112] FLAG: --stderrthreshold="2"
I0327 23:42:02.230556       1 server.go:112] FLAG: --v="2"
I0327 23:42:02.230559       1 server.go:112] FLAG: --version="false"
I0327 23:42:02.230564       1 server.go:112] FLAG: --vmodule=""
I0327 23:42:02.230610       1 server.go:175] Starting SkyDNS server (0.0.0.0:10053)
I0327 23:42:02.230722       1 server.go:199] Skydns metrics not enabled
I0327 23:42:02.230732       1 dns.go:147] Starting endpointsController
I0327 23:42:02.230735       1 dns.go:150] Starting serviceController
I0327 23:42:02.231403       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0327 23:42:02.231421       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
E0327 23:42:02.231652       1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://127.0.0.1:6443/api/v1/endpoints?resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused
E0327 23:42:02.231698       1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://127.0.0.1:6443/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused
I0327 23:42:02.731122       1 dns.go:174] DNS server not ready, retry in 500 milliseconds
I0327 23:42:03.230954       1 dns.go:174] DNS server not ready, retry in 500 milliseconds
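The log above shows both flags being parsed, yet the client still dials the kubeconfig's 127.0.0.1 server. For comparison, here is a minimal sketch of how client-go's clientcmd package is commonly wired so that a master URL flag overrides the server from the kubeconfig; whether kube-dns builds its client config this way in this version is exactly what is in question here, so treat this as an assumption rather than a description of the kube-dns code:

package main

import (
    "flag"
    "log"

    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    kubecfg := flag.String("kubecfg-file", "", "path to a kubeconfig file")
    masterURL := flag.String("kube-master-url", "", "URL of the API server")
    flag.Parse()

    // BuildConfigFromFlags layers the master URL over the kubeconfig, so a
    // non-empty --kube-master-url takes precedence over the server field.
    cfg, err := clientcmd.BuildConfigFromFlags(*masterURL, *kubecfg)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("using API server %s", cfg.Host)
}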

kubedns nslookup: can't resolve

I have successfully set up a Kube cluster and am now trying to set up KubeDNS.

Version:

kubectl version

Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

When I deploy kubedns, I get this error in journalctl -xe

"Error syncing deployment kube-system/kube-dns-v20: Operation cannot be fulfilled on deployments.extensions "kube-dns-v20": the object has been modified; please apply your changes to the latest version and try again"

But I didn't find enough info on that, so I ignored it to at least see how far I could get.

kubectl cluster-info

Kubernetes master is running at http://192.168.6.101:8080 KubeDNS is running at http://192.168.6.101:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns

mongodb and webapp are my pods

kubectl exec mongodb cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local     federated.fds
nameserver 10.254.0.2
nameserver 8.8.8.8
nameserver 8.8.4.4
options ndots:5

[osboxes@kubemaster kubernetes]$ kubectl exec webapp cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local federated.fds
nameserver 10.254.0.2
nameserver 8.8.8.8
nameserver 8.8.4.4
options ndots:5

On the master and the node, in /etc/kubernetes/config, I have this:

ENABLE_CLUSTER_DNS="${KUBE_ENABLE_CLUSTER_DNS:-true}"
DNS_SERVER_IP="10.254.0.2"
DNS_DOMAIN="cluster.local"
DNS_REPLICAS=1

I have passed the required arguments to the kubelet on the node at startup: --cluster_dns=10.254.0.2 --cluster_domain=cluster.local

kubectl exec webapp -- cat /etc/hosts
127.0.0.1   localhost
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.17.23.2 webapp

When I do an nslookup in the container:


kubectl exec -it webapp -- nslookup kubernetes.default.svc.cluster.local localhost
;; connection timed out; no servers could be reached

error: error executing remote command: error executing command in container: Error executing in Docker Container: 1

But this works,

kubectl --namespace=kube-system exec -ti kube-dns-v20-2502365070-2rmzl -c     kubedns --  nslookup kubernetes.default.svc.cluster.local localhost
Server:    127.0.0.1
Address 1: 127.0.0.1 localhost

Name:      kubernetes.default.svc.cluster.local
Address 1: 10.254.0.1 kubernetes.default.svc.cluster.local

No other errors in the kubedns, dnsmasq, or healthz logs. I can

Error with pod hostname query

A query for <pod-hostname>.<ns>.pod.cluster.local does not return an A record for the pod:

/ # dig customer-deployment-onuwixzygi4tkn3fgztgeylfgnrdm-62180010musmy.staging.pod.cluster.local.

kube-dns logs

I0316 16:18:30.889074       1 logs.go:41] skydns: error from backend: Invalid IP Address customer.deployment.onuwixzygi4tkn3fgztgeylfgnrdm.62180010musmy
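For context, names under pod.cluster.local are normally a pod IP with the dots replaced by dashes (e.g. 10-2-3-4.staging.pod.cluster.local), not the pod's hostname; kube-dns turns the first label back into an IP, which appears to be what produces the "Invalid IP Address" error above. A minimal sketch of that conversion, assuming this is the only transformation applied to the label:

package main

import (
    "fmt"
    "net"
    "strings"
)

// ipFromPodLabel mimics the conversion applied to the first label of a
// <name>.<namespace>.pod.<cluster-domain> query.
func ipFromPodLabel(label string) (net.IP, error) {
    candidate := strings.Replace(label, "-", ".", -1)
    ip := net.ParseIP(candidate)
    if ip == nil {
        return nil, fmt.Errorf("invalid IP address %s", candidate)
    }
    return ip, nil
}

func main() {
    fmt.Println(ipFromPodLabel("10-2-3-4"))                      // 10.2.3.4 <nil>
    fmt.Println(ipFromPodLabel("customer-deployment-abc-62180")) // error: not an IP
}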

stubDomains should also support nameserver with port

@bowei, thanks for the feature to add nameservers to kubedns via stubDomains; it's really useful.
I find that the nameserver needs to be a plain IP and does not support an appended port.

It would be really useful to support a nameserver with an appended port, for example if somebody wants to deploy a DNS server in Kubernetes as a NodePort service because the cloud provider may not support LoadBalancer-type services.

We do have a use case in federation for federating clusters on non-cloud environments.

If you agree, I can send a patch to support nameservers with an appended port number.

cc @kubernetes/sig-federation-misc

Custom DNS entries for kube-dns

I'm re-creating this issue here, as suggested by @bowei.

Kubernetes version (use kubectl version): v1.5.2

Environment:

  • Cloud provider or hardware configuration: bare metal
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 8 (jessie)
  • Kernel (e.g. uname -a): 3.16.7
  • Install tools: docker-multinode.

Current status: entries in /etc/hosts of the nodes are not used by kube-dns. There is no straightforward way to replicate this custom DNS configuration for the cluster.

What I would like: Some way to easily define custom DNS entries used by kube-dns on a cluster-wide level, without deploying an additional DNS server.

Already considered solutions:

Possible solutions:

  • Implement a special ConfigMap to declare custom entries on a cluster-wide level and make kube-dns look it up.
  • kube-dns imports the node's /etc/hosts entries, as it does for /etc/resolv.conf. This is not very elegant and doesn't scale, but it replicates a capability that exists in non-containerized system administration.
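As a rough illustration of the first option above, here is a sketch that parses hosts-file-style lines (as they might appear in a hypothetical ConfigMap key) into a name-to-IP map a DNS server could answer from; the key name and format are assumptions, not an existing kube-dns feature:

package main

import (
    "fmt"
    "net"
    "strings"
)

// parseCustomHosts turns hosts-file-style text ("IP name [name...]")
// into a map from DNS name to IP, ignoring comments and malformed lines.
func parseCustomHosts(data string) map[string]net.IP {
    entries := map[string]net.IP{}
    for _, line := range strings.Split(data, "\n") {
        fields := strings.Fields(strings.SplitN(line, "#", 2)[0])
        if len(fields) < 2 {
            continue
        }
        ip := net.ParseIP(fields[0])
        if ip == nil {
            continue
        }
        for _, name := range fields[1:] {
            entries[name] = ip
        }
    }
    return entries
}

func main() {
    data := "192.168.0.10 build.internal # comment\n192.168.0.11 git.internal"
    fmt.Println(parseCustomHosts(data))
}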

External queries fail with Cloudflare domain in DNS search list

My domain's DNS is hosted with Cloudflare. I am using a bare metal cluster that has the domain in the /etc/resolv.conf search list.

DNS queries for external domains (e.g. www.yahoo.com) fail under the above conditions on some containers (e.g. Alpine-based):

/ # ping www.yahoo.com
ping: bad address 'www.yahoo.com'

When I manually remove my domain from the /etc/resolv.conf search list in the container the query works as expected.

With Wireshark, I was able to determine that Cloudflare's NS returns RCODE = 0 with no RRs when queried with a nonexistent domain (e.g. www.yahoo.com.mydomain.com). Most other NSs I've tried return RCODE = 3 in this case. (This issue never came up until I moved to Cloudflare; my domain registry's nameservers return RCODE = 3 for nonexistent domains.)

Could the RCODE = 0 result code on the search list be preventing Kubernetes DNS from performing a FQDN lookup (e.g. just www.yahoo.com) in this case, resulting in the ultimate failure of the query?

I've raised the RCODE issue with Cloudflare, and had a quick look at the SkyDNS and miekg/dns project source code, but it wasn't immediately clear to me what the code path is here.
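To confirm the behaviour, here is a small sketch using the miekg/dns client (the same library SkyDNS builds on) that queries a nameserver for the search-suffixed name and reports the RCODE and answer count; the nameserver address and query name are placeholders to be replaced with your own:

package main

import (
    "fmt"
    "log"

    "github.com/miekg/dns"
)

func main() {
    // Placeholders: one of the domain's authoritative nameservers and a
    // name that should not exist under the domain.
    server := "ns.example-cloudflare.com:53"
    name := dns.Fqdn("www.yahoo.com.mydomain.com")

    m := new(dns.Msg)
    m.SetQuestion(name, dns.TypeA)

    r, _, err := new(dns.Client).Exchange(m, server)
    if err != nil {
        log.Fatal(err)
    }

    // NOERROR with zero answers (NODATA) is the case described above;
    // most other nameservers return NXDOMAIN for a nonexistent name.
    fmt.Println("rcode:", dns.RcodeToString[r.Rcode], "answers:", len(r.Answer))
}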

KubeDNS consistently dies

Hi there! I'm running a vanilla Kubernetes v1.5.1 cluster inside VirtualBox with a flannel overlay network. All of that worked perfectly, so I decided to get cluster DNS working today.

I'm having a couple of issues here. First, sometimes when KubeDNS starts it can't reach the apiserver, and I have to restart it a couple of times before it connects correctly. The second problem is that the service seems to stop serving DNS and stop responding to the health checks, so I have to restart it again. The dnsmasq container seems to be fine, but I get this error message from the KubeDNS container around the time things stop working:

Ignoring signal terminated (can only be terminated by SIGKILL)

KubeDNS does work for a couple of minutes at a time, but only for a couple of minutes.

It's gotten kind of frustrating, and I've tried to hunt down any relevant docs. I'm not pasting logs in this issue because there's so much to sift through; any ideas on where to start?

Wrong version reported for 1.14.2 (1.14.1-16-gff416ee)

All containers (dnsmasq, kubedns & sidecar) report the wrong version.

gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.2
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.2
gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.2

$ kubectl logs kube-dns-1578267200-z2ptk -c sidecar
ERROR: logging before flag.Parse: I0525 13:43:00.569730       1 main.go:48] Version v1.14.1-16-gff416ee

Unable to use port numbers for stubDomain servers

When trying to set up a stubDomain to resolve to my Consul cluster, I'm getting "invalid nameserver" messages.

configMap data:

data:
  stubDomains: {"service.consul": ["100.71.68.61:8600"]}

The IsDNS1123Subdomain check on line 104 in config/config.go looks unnecessary to me.
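A sketch of what a relaxed check could look like, accepting either a bare IP or ip:port for a stubDomain nameserver; this illustrates the proposal and is not the current config/config.go code:

package main

import (
    "fmt"
    "net"
    "strconv"
)

// validateNameserver accepts "IP" or "IP:port".
func validateNameserver(ns string) error {
    host, port, err := net.SplitHostPort(ns)
    if err != nil {
        // No port present; treat the whole string as an IP.
        host, port = ns, ""
    }
    if net.ParseIP(host) == nil {
        return fmt.Errorf("invalid nameserver %q: not an IP address", ns)
    }
    if port != "" {
        if p, err := strconv.Atoi(port); err != nil || p < 1 || p > 65535 {
            return fmt.Errorf("invalid nameserver %q: bad port", ns)
        }
    }
    return nil
}

func main() {
    fmt.Println(validateNameserver("100.71.68.61:8600")) // <nil>
    fmt.Println(validateNameserver("100.71.68.61"))      // <nil>
    fmt.Println(validateNameserver("service.consul:53")) // error: not an IP
}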

Federations config should work with or without trailing dots in the zone names

Federations config, whether passed through a flag or a ConfigMap, is validated to ensure that the zone names (the domain name suffixes) are valid RFC 1123 subdomain names. Most DNS zone configurations take a trailing dot, but the regex we use in kube-dns does not match trailing dots. This mismatch is cognitive overhead for our users. We should accept zone names in the Federations config with or without trailing dots.
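A minimal sketch of the kind of normalization that would make both forms acceptable: trim an optional trailing dot before the existing RFC 1123 validation runs. This illustrates the proposal, not the current code:

package main

import (
    "fmt"
    "strings"
)

// normalizeZone accepts a zone name with or without a trailing dot and
// returns the form the existing subdomain validation expects.
func normalizeZone(zone string) string {
    return strings.TrimSuffix(zone, ".")
}

func main() {
    fmt.Println(normalizeZone("myfederation.example.com."))
    fmt.Println(normalizeZone("myfederation.example.com"))
}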

kube-dns doesn't support using service name in stubDomains as docs and code seem to suggest

According to the announcement blog post, I should be able to use a service as an upstream for stubDomains (note the note):

stubDomains (optional)

    Format: a JSON map using a DNS suffix key (e.g.; “acme.local”) and a value consisting of a JSON array of DNS IPs.
    Note: The target nameserver may itself be a Kubernetes service. For instance, you can run your own copy of dnsmasq to export custom DNS names into the ClusterDNS namespace.

However, if I try to point it at our unbound service, the dnsmasq container crashes:

I0505 21:26:49.795586       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0505 21:26:49.795988       1 sync.go:167] Updated stubDomains to map[example.com:[unbound]]
I0505 21:26:49.796066       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053 --server /example.com/unbound]
I0505 21:26:49.797178       1 nanny.go:111]
W0505 21:26:49.797301       1 nanny.go:112] Got EOF from stdout
F0505 21:26:49.797341       1 nanny.go:182] dnsmasq exited: exit status 1
I0505 21:26:49.797283       1 nanny.go:115]
E0505 21:26:49.797501       1 nanny.go:116] Error reading from stderr: read |0: bad file descriptor

This test also seems to indicate that names are allowed in stubDomains, but from what I can tell, dnsmasq doesn't allow a name in a --server parameter (in the above example, it's sending --server /example.com/unbound)

We can currently use the clusterIP of a service as upstream DNS, but that seems pretty error prone. Is there something else we should be doing here, or am I misinterpreting the documentation?
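One reading of the docs is that whoever renders the stubDomain (the nanny, or something in front of it) would resolve the service name to an IP before handing dnsmasq a --server flag, since dnsmasq only accepts an address there. A hedged sketch of that idea; the service name is a placeholder and this is not what the nanny currently does:

package main

import (
    "fmt"
    "log"
    "net"
)

// stubDomainServer turns a stubDomain target into a dnsmasq --server value,
// resolving a service name to an IP when the target is not already an IP.
func stubDomainServer(domain, target string) (string, error) {
    if net.ParseIP(target) == nil {
        ips, err := net.LookupIP(target) // e.g. "unbound.default.svc.cluster.local"
        if err != nil || len(ips) == 0 {
            return "", fmt.Errorf("cannot resolve %q: %v", target, err)
        }
        target = ips[0].String()
    }
    return fmt.Sprintf("--server=/%s/%s", domain, target), nil
}

func main() {
    arg, err := stubDomainServer("example.com", "100.71.68.61")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(arg) // --server=/example.com/100.71.68.61
}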

DNS failures for internal AWS URLs

I'm seeing intermittent DNS failures, but only for the URLs that AWS assigns my machines (e.g. ip-10-16-25-61.us-west-2.compute.internal).

External URLs (e.g. google.com) resolve fine, as do URLs for services (e.g. kubernetes.default).

Setup:
Two-replica DNS behind a service IP, configured with kops. Each pod is running using this template, which contains kubedns, dnsmasq, and dns-sidecar. All are using version 1.14.1 of the kube-dns images.

The cluster is running Kubernetes version 1.6.0.

Problem description:

The system seems to end up in one of three states:

  1. Lookups succeed 100% of the time.

  2. The lookup resolves, returning the correct TTL (20). Typically, the first lookup after kubedns logs new entries will succeed (each time kubedns runs a work cycle, every 5 minutes). Lookups will continue to succeed until the TTL of the entry expires, and will start to fail after that.

  3. Lookups fail 100% of the time.

When I've restarted the DNS services, they seem to progress from state 1, through state 2, into state 3, occasionally jumping back to state 1 or 2.

The logs for the containers aren't displaying any errors or other interesting information.

short-form dns query *nslookup kubernetes.default* not working

In case someone encounters the same problem, I'm writing my findings here.

kube-dns behaviour:

  1. the kube-dns pod's /etc/resolv.conf is usually the same as the host's;
  2. kubedns forwards unknown domains to the nameservers in its resolv.conf;
    NOTE: a short-form query such as kubernetes.default is unknown to kubedns
  3. kubedns seems to use only the first nameserver.
    EDIT: per https://github.com/skynetservices/skydns/blob/f694f5637b31e2b9c9871fb396773d5d18b9309e/server/exchange.go#L29, it does not do NSRotate. Without NSRotate it always tries the first nameserver first and retries only on connection errors; for application errors it simply forwards the upstream error code.

So a short-form query works like this:

  1. the client sends the short-form query to kube-dns;
  2. kube-dns knows nothing about it and forwards it to the external nameserver from resolv.conf;
  3. the external nameserver returns an error to kube-dns;
  4. kube-dns forwards the failure to the client;
  5. the client appends a search domain from its resolv.conf and goes back to step 1 to retry.

Client behaviour:
For step 5, different clients seem to behave differently depending on the error from step 3.
busybox seems to append the search domain only on NXDOMAIN, and not on REFUSED;
alpine and tutum/dnsutils append the search domain on both NXDOMAIN and REFUSED.

My installation is a bit unusual: although the nodes have identical /etc/resolv.conf files, the first nameserver behaves differently on different nodes. Some are not recursive and return REFUSED, while the others return NXDOMAIN.

So I got this weird behaviour:
when kube-dns is on an NXDOMAIN node, the busybox nslookup test works;
when kube-dns is on a REFUSED node, the busybox nslookup test fails;
and alpine/tutum/dnsutils always works regardless of which node kube-dns is on.

So, when deploying kube-dns, you should ensure the first nameserver in your host's /etc/resolv.conf works as expected.
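To check which behaviour a node's first nameserver exhibits before deploying kube-dns, here is a small sketch (using the miekg/dns client; the nameserver addresses and test name are placeholders) that queries each candidate server and prints the RCODE, so you can see whether it returns NXDOMAIN or REFUSED for an unknown name:

package main

import (
    "fmt"

    "github.com/miekg/dns"
)

func main() {
    // Placeholders: the first resolv.conf nameserver from each node, and a
    // name that is only resolvable through the cluster search path.
    servers := []string{"192.0.2.10:53", "192.0.2.11:53"}
    name := dns.Fqdn("kubernetes.default")

    for _, server := range servers {
        m := new(dns.Msg)
        m.SetQuestion(name, dns.TypeA)

        r, _, err := new(dns.Client).Exchange(m, server)
        if err != nil {
            fmt.Println(server, "error:", err)
            continue
        }
        // NXDOMAIN lets busybox's nslookup retry with the next search
        // domain; REFUSED makes it give up, as described above.
        fmt.Println(server, "->", dns.RcodeToString[r.Rcode])
    }
}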

=============== BELOW is the original question ===============================
I enabled RBAC for my on-premise K8s cluster, but found that cross-namespace DNS queries behave differently from a non-RBAC cluster.
I didn't find any documentation for this behaviour, hence this issue.

Without RBAC:

  1. I can do a cross-namespace DNS query for a service in ns1 from ns2 using svc1.ns1;
  2. I can query a service in the same namespace using the namespace-qualified name svc1.ns1.

But with RBAC, I have to use the FQDN:

  1. I can't do a cross-namespace query using svc1.ns1 from ns2;
  2. I can't query a service in the same namespace using the namespace-qualified name svc1.ns1.

From a busybox pod in the default namespace, I got the following output:

/ # nslookup nginx-deployment
Server:    10.233.0.3
Address 1: 10.233.0.3 kubedns.kube-system.svc.cluster.local

Name:      nginx-deployment
Address 1: 10.233.33.138 nginx-deployment.default.svc.cluster.local
/ # nslookup nginx-deployment.default
Server:    10.233.0.3
Address 1: 10.233.0.3 kubedns.kube-system.svc.cluster.local

nslookup: can't resolve 'nginx-deployment.default'
/ # nslookup nginx-deployment.kube-system
Server:    10.233.0.3
Address 1: 10.233.0.3 kubedns.kube-system.svc.cluster.local

nslookup: can't resolve 'nginx-deployment.kube-system'
/ # nslookup nginx-deployment.kube-system.svc.cluster.local
Server:    10.233.0.3
Address 1: 10.233.0.3 kubedns.kube-system.svc.cluster.local

Name:      nginx-deployment.kube-system.svc.cluster.local
Address 1: 10.233.18.29 nginx-deployment.kube-system.svc.cluster.local

Here is my container info:

  Containers:
   kubedns:
    Image:      gcr.io/google_containers/kubedns-amd64:1.9
   dnsmasq:
    Image:      gcr.io/google_containers/kube-dnsmasq-amd64:1.3
   healthz:
    Image:      gcr.io/google_containers/exechealthz-amd64:1.1
