Comments (34)

jakerobb commented on June 26, 2024

I asked in Slack about this, and I was asked to provide:

  1. /etc/resolv.conf from the conduit-proxy container:
I have no name!@reference-code-service-dd6bfc9b4-65hqx:/$ cat /etc/resolv.conf
nameserver 10.96.0.10
search conduit-sandbox.svc.cluster.local svc.cluster.local cluster.local devop.vertafore.com vertafore.com sircon.com
options ndots:5
  2. the output of dig +showsearch cassandra01 (where cassandra01 is a host that my service can't reach, despite skipping the requisite port). Note the ndots:5 option above: names with fewer than five dots are tried against each search suffix before the bare name, which is why dig issues seven queries below:
bash-4.2# dig +showsearch cassandra01

; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> +showsearch cassandra01
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 22402
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cassandra01.conduit-sandbox.svc.cluster.local. IN A

;; AUTHORITY SECTION:
cluster.local.		30	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1526676639 7200 1800 86400 30

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri May 18 20:53:08 UTC 2018
;; MSG SIZE  rcvd: 128


; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> +showsearch cassandra01
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 34761
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cassandra01.svc.cluster.local.	IN	A

;; AUTHORITY SECTION:
cluster.local.		30	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1526676639 7200 1800 86400 30

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri May 18 20:53:08 UTC 2018
;; MSG SIZE  rcvd: 112


; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> +showsearch cassandra01
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 22301
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cassandra01.cluster.local.	IN	A

;; AUTHORITY SECTION:
cluster.local.		30	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1526676639 7200 1800 86400 30

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri May 18 20:53:08 UTC 2018
;; MSG SIZE  rcvd: 108


; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> +showsearch cassandra01
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10361
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cassandra01.devop.vertafore.com. IN	A

;; AUTHORITY SECTION:
devop.vertafore.com.	3578	IN	SOA	botd-dc02.devop.vertafore.com. hostmaster.devop.vertafore.com. 351821 900 600 86400 3600

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri May 18 20:53:08 UTC 2018
;; MSG SIZE  rcvd: 117


; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> +showsearch cassandra01
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 8545
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cassandra01.vertafore.com.	IN	A

;; AUTHORITY SECTION:
vertafore.com.		3552	IN	SOA	ent-de-dc02.vertafore.com. hostmaster.vertafore.com. 2017206649 900 600 86400 3600

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri May 18 20:53:08 UTC 2018
;; MSG SIZE  rcvd: 113


; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> +showsearch cassandra01
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 21719
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;cassandra01.sircon.com.		IN	A

;; AUTHORITY SECTION:
sircon.com.		3600	IN	SOA	oke-entnt02vw.vertafore.com. hostmaster.innovativeit.com. 2190 3600 600 432000 3600

;; Query time: 54 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri May 18 20:53:18 UTC 2018
;; MSG SIZE  rcvd: 135


; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> +showsearch cassandra01
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62828
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cassandra01.			IN	A

;; ANSWER SECTION:
cassandra01.		3600	IN	A	10.0.7.14

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri May 18 20:53:18 UTC 2018
;; MSG SIZE  rcvd: 56

Hope this helps!

olix0r commented on June 26, 2024

@hawkw and I tried to verify that the proxy works with alternate dnsPolicy settings. We've both observed the following behavior:

Using a spec like:

    spec:
      dnsPolicy: "Default"
      containers:
      - ...

We are able to resolve names like emoji.voto:

root@voter-68dd555fb6-spss2:/# dig +showsearch emoji.voto

; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch emoji.voto
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31472
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;emoji.voto.			IN	A

;; ANSWER SECTION:
emoji.voto.		1125	IN	A	35.193.85.33

;; Query time: 10 msec
;; SERVER: 192.168.65.1#53(192.168.65.1)
;; WHEN: Thu Apr 26 21:50:16 UTC 2018
;; MSG SIZE  rcvd: 44

but requests fail:

root@voter-68dd555fb6-spss2:/# curl -vs http://emoji.voto
* Rebuilt URL to: http://emoji.voto/
* Hostname was NOT found in DNS cache
*   Trying 35.193.85.33...
* Connected to emoji.voto (35.193.85.33) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: emoji.voto
> Accept: */*
> 

and then 10 seconds pass before the request fails with a 500.

In the proxy's log we see the following:

ERR! conduit_proxy turning operation timed out after 10s into 500
ERR! conduit_proxy::control "controller-client", controller error: Error attempting to establish underlying session layer: operation timed out after 3s
WARN trust_dns_proto::dns_handle error notifying wait, possible future leak: Err(ResolveError(Proto(Timeout), (None, stack backtrace:
   0:     0x55e78dd04e0c - backtrace::backtrace::trace::hef23eb1bdedb9de4
   1:     0x55e78dd040a2 - backtrace::capture::Backtrace::new::h570f24a892cd6a34
   2:     0x55e78dcce917 - <trust_dns_proto::error::ProtoError as core::convert::From<trust_dns_proto::error::ProtoErrorKind>>::from::h043748def247281b
   3:     0x55e78dcc2ec2 - _ZN110_$LT$trust_dns_proto..dns_handle..DnsFuture$LT$S$C$$u20$E$C$$u20$MF$GT$$u20$as$u20$futures..future..Future$GT$4poll17h37890051da49b999E.llvm.17885337389023634268
   4:     0x55e78dc92bd0 - <futures::future::chain::Chain<A, B, C>>::poll::h046c15f0cba805b8
   5:     0x55e78dcba667 - <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll::hbeb65d90d09deaf4
   6:     0x55e78dd58fbf - futures::task_impl::std::set::h0eda14c4de187820
   7:     0x55e78dd5644f - <scoped_tls::ScopedKey<T>>::set::h0efdea05668e906c
   8:     0x55e78dd4cb7f - tokio_core::reactor::Core::poll::hfaacf20236fbcc81
   9:     0x55e78db31643 - tokio_core::reactor::Core::run::hdb7e20c27c731447
  10:     0x55e78db71da8 - std::sys_common::backtrace::__rust_begin_short_backtrace::h808bc1a152c1d7fa
  11:     0x55e78daa5455 - _ZN3std9panicking3try7do_call17h968da37b516bd463E.llvm.6120476831782816583
  12:     0x55e78ddbb82e - __rust_maybe_catch_panic
                        at libpanic_unwind/lib.rs:102
  13:     0x55e78daa87fa - <F as alloc::boxed::FnBox<A>>::call_box::hb783e783e73f8fc2
  14:     0x55e78ddb376b - <alloc::boxed::Box<alloc::boxed::FnBox<A, Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h598f9713c9cb9093
                        at /checkout/src/liballoc/boxed.rs:798
                         - std::sys_common::thread::start_thread::h65eb43a1201d41e6
                        at libstd/sys_common/thread.rs:24
                         - std::sys::unix::thread::Thread::new::thread_start::h711c51a13a158afa
                        at libstd/sys/unix/thread.rs:90
  15:     0x7f5a7ebf9063 - start_thread
  16:     0x7f5a7e71862c - clone
  17:                0x0 - <unknown>)))

jakerobb commented on June 26, 2024

Maybe the rest of you already realize this, but I've found a workaround for this issue that works as long as the hosts you need to resolve have static IP addresses. In the pod template section of your deployment YAML, under the spec element, you can add a hostAliases element and define entries that will land in the container's /etc/hosts file, essentially circumventing DNS entirely. Here's mine:

spec:
  hostAliases:
  - ip: "10.0.7.14"
    hostnames:
    - "cassandra01"
  - ip: "10.0.7.15"
    hostnames:
    - "cassandra02"
  - ip: "10.0.7.27"
    hostnames:
    - "cassandra03"

Hope this helps someone!

bluejekyll commented on June 26, 2024

error notifying wait, possible future leak usually means the handle to the lookup went away, though that may be a red herring given the timeout message.

I'm not familiar with this dnsPolicy: Default vs. ClusterFirst; what does that do?

Edit: I just reviewed the docs,

  • "Default": The Pod inherits the name resolution configuration from the node that the pods run on. See related discussion for more details.
  • "ClusterFirst": Any DNS query that does not match the configured cluster domain suffix, such as "www.kubernetes.io", is forwarded to the upstream nameserver inherited from the node. Cluster administrators may have extra stub-domain and upstream DNS servers configured. See related discussion for details on how DNS queries are handled in those cases.
  • "ClusterFirstWithHostNet": For Pods running with hostNetwork, you should explicitly set the DNS policy to "ClusterFirstWithHostNet".
  • "None": A new option value introduced in Kubernetes v1.9 (Beta in v1.10). It allows a Pod to ignore DNS settings from the Kubernetes environment. All DNS settings are supposed to be provided using the dnsConfig field in the Pod Spec. See the DNS config subsection below.
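
For reference, a minimal sketch of the "None" policy paired with an explicit dnsConfig. These are the standard Pod spec fields (nameservers, searches, options); the values shown here are hypothetical and would need to match your cluster:

    spec:
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
        - 10.96.0.10               # hypothetical: cluster DNS or any resolver reachable from the pod
        searches:
        - default.svc.cluster.local
        - svc.cluster.local
        - cluster.local
        options:
        - name: ndots
          value: "5"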

Are these settings being passed directly into the trust-dns-resolver config? Meaning, is there some ordering required in the NameServers in the Resolver instance?

bluejekyll commented on June 26, 2024

If constant ordering is required, we'll have to look into adding that as an option to the NameServerPool. Right now the nameservers are reordered based on performance: https://github.com/bluejekyll/trust-dns/blob/master/resolver/src/name_server_pool.rs#L624

If this is undesirable, we should disable this sort_unstable with some config option. We'd also need to review the ordering and make sure the set of NameServers is consistent with the passed-in configuration. This should be easy now, since I previously switched this from a BinaryHeap to a plain Vec: https://github.com/bluejekyll/trust-dns/blob/master/resolver/src/name_server_pool.rs#L448-L449

briansmith commented on June 26, 2024

In the configuration above in #62 (comment), there's only one nameserver.

hawkw commented on June 26, 2024

@wmorgan I'm not aware of anything that's happened recently that would fix it, but I'd have to test to confirm...

briansmith commented on June 26, 2024

See also http://blog.kubernetes.io/2017/04/configuring-private-dns-zones-upstream-nameservers-kubernetes.html for some interesting details on this issue. We should consider allowing the controller Destination service to be configured with a ConfigMap like that.

briansmith commented on June 26, 2024

Based on the solution to #366, we should change course. Instead of documenting that Conduit could potentially break pods using an incompatible DNS configuration, we should instead, by default, avoid injecting the proxy into such pods. We should provide a documented way to get the proxy working with such pods. For example, we could document that one has to remove the incompatible DNS settings. And/or we could implement and document an explicit pod annotation that overrides our default decision to assume that such DNS configurations are incompatible, as sketched below.
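
For illustration only, such an opt-in might look like the following pod-template annotation; the annotation name is invented for this sketch and is not an actual Conduit API:

spec:
  template:
    metadata:
      annotations:
        # hypothetical annotation name, not an actual Conduit API
        conduit.io/allow-custom-dns: "true"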

briansmith commented on June 26, 2024

Based on the investigation in #392 and the current direction described in the conversation at the end of #360, I think it's best to avoid doing anything on this for 0.3. Basically, if we spend the effort, we should be able to support any DNS policy transparently and automatically. It's unfortunate that 0.3 won't do that, but this isn't the most significant DNS transparency issue in 0.3, and we might fix the overall issue in 0.4.

olix0r commented on June 26, 2024

We should hold off on this until #421 is done.

olix0r commented on June 26, 2024

I think we should be able to inject into pods with alternate DNS policies... but this should be hashed out as part of the larger transparency project.

olix0r commented on June 26, 2024

@briansmith Do we still need to prevent injection for pods with a custom dnsConfig once #155 is resolved?

My understanding is that, once the proxy is able to fall back to local DNS lookup, dnsConfig won't be a problem (as long as the proxy honors proper DNS client semantics wrt /etc/nsswitch.conf etc.). Is that correct?

briansmith commented on June 26, 2024

@briansmith Do we still need to prevent injection for pods with a custom dnsConfig once #155 is resolved?

My understanding is that, once the proxy is able to fall back to local DNS lookup, dnsConfig won't be a problem (as long as the proxy honors proper DNS client semantics wrt /etc/nsswitch.conf etc.). Is that correct?

That's correct. My goal is that the fix for #155 will at least mostly fix this. We might need some follow-up work to resolve it completely. For example, we might fix #155 assuming that the DNS search path starts with ".$current-namespace.svc.cluster.local. .svc.cluster.local. .cluster.local." and then follow that up in this issue by completely honoring the actual configured DNS search path, to handle cases where those items aren't the first items in the list or aren't in the list at all. We'd also need to add tests for this in this issue.
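
As a concrete illustration of that assumed search path, a pod in a hypothetical namespace foo would carry the equivalent of:

dnsConfig:
  searches:
  - foo.svc.cluster.local        # $current-namespace.svc.cluster.local
  - svc.cluster.local
  - cluster.local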

briansmith commented on June 26, 2024

I changed the title to "Support custom dnsPolicy and dnsConfig" to better capture the plan.

briansmith commented on June 26, 2024

It would be useful to see the contents of /etc/resolv.conf and /etc/hosts in the configuration that fails.

olix0r commented on June 26, 2024

Running in Docker for Mac:

:; kubectl -n bot exec -c vote-bot voter-68dd555fb6-spss2 -i -t bash
root@voter-68dd555fb6-spss2:/# cat /etc/resolv.conf 
nameserver 192.168.65.1
root@voter-68dd555fb6-spss2:/# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
fe00::0	ip6-mcastprefix
fe00::1	ip6-allnodes
fe00::2	ip6-allrouters
10.1.0.83	voter-68dd555fb6-spss2
root@voter-68dd555fb6-spss2:/# dig +showsearch emoji.voto

; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch emoji.voto
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37274
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;emoji.voto.			IN	A

;; ANSWER SECTION:
emoji.voto.		1125	IN	A	35.193.85.33

;; Query time: 3 msec
;; SERVER: 192.168.65.1#53(192.168.65.1)
;; WHEN: Thu Apr 26 22:59:29 UTC 2018
;; MSG SIZE  rcvd: 44

In a pod without a dnsPolicy, we see the following:

root@absentee-voter-bc6545bd5-97dkq:/# cat /etc/resolv.conf 
nameserver 10.96.0.10
search bot.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
root@absentee-voter-bc6545bd5-97dkq:/# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
fe00::0	ip6-mcastprefix
fe00::1	ip6-allnodes
fe00::2	ip6-allrouters
10.1.0.82	absentee-voter-bc6545bd5-97dkq
root@absentee-voter-bc6545bd5-97dkq:/# dig +showsearch emoji.voto

; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch emoji.voto
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 12832
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;emoji.voto.bot.svc.cluster.local. IN	A

;; AUTHORITY SECTION:
cluster.local.		60	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1524783600 28800 7200 604800 60

;; Query time: 0 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Thu Apr 26 23:01:56 UTC 2018
;; MSG SIZE  rcvd: 143


; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch emoji.voto
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 35302
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;emoji.voto.svc.cluster.local.	IN	A

;; AUTHORITY SECTION:
cluster.local.		60	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1524783600 28800 7200 604800 60

;; Query time: 3 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Thu Apr 26 23:01:56 UTC 2018
;; MSG SIZE  rcvd: 139


; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch emoji.voto
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 30547
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;emoji.voto.cluster.local.	IN	A

;; AUTHORITY SECTION:
cluster.local.		60	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1524783600 28800 7200 604800 60

;; Query time: 2 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Thu Apr 26 23:01:56 UTC 2018
;; MSG SIZE  rcvd: 135


; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> +showsearch emoji.voto
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17100
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;emoji.voto.			IN	A

;; ANSWER SECTION:
emoji.voto.		865	IN	A	35.193.85.33

;; Query time: 3 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Thu Apr 26 23:01:56 UTC 2018
;; MSG SIZE  rcvd: 55

wmorgan commented on June 26, 2024

For completeness, the errors that @jakerobb reported on Slack were:

ERR! conduit_proxy::control "controller-client", controller error: Error attempting to establish underlying session layer: operation timed out after 3s
WARN trust_dns_proto::dns_handle error notifying wait, possible future leak: Err(ResolveError(Proto(Timeout), (None, stack backtrace:
   0:     0x562b82671e0c - backtrace::backtrace::trace::hef23eb1bdedb9de4
   1:     0x562b826710a2 - backtrace::capture::Backtrace::new::h570f24a892cd6a34
   2:     0x562b8263b917 - <trust_dns_proto::error::ProtoError as core::convert::From<trust_dns_proto::error::ProtoErrorKind>>::from::h043748def247281b
   3:     0x562b8262fec2 - _ZN110_$LT$trust_dns_proto..dns_handle..DnsFuture$LT$S$C$$u20$E$C$$u20$MF$GT$$u20$as$u20$futures..future..Future$GT$4poll17h37890051da49b999E.llvm.17885337389023634268
   4:     0x562b825ffbd0 - <futures::future::chain::Chain<A, B, C>>::poll::h046c15f0cba805b8
   5:     0x562b82627667 - <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll::hbeb65d90d09deaf4
   6:     0x562b826c5fbf - futures::task_impl::std::set::h0eda14c4de187820
   7:     0x562b826c344f - <scoped_tls::ScopedKey<T>>::set::h0efdea05668e906c
   8:     0x562b826b9b7f - tokio_core::reactor::Core::poll::hfaacf20236fbcc81
   9:     0x562b8249e643 - tokio_core::reactor::Core::run::hdb7e20c27c731447
  10:     0x562b824deda8 - std::sys_common::backtrace::__rust_begin_short_backtrace::h808bc1a152c1d7fa
  11:     0x562b82412455 - _ZN3std9panicking3try7do_call17h968da37b516bd463E.llvm.6120476831782816583
  12:     0x562b8272882e - __rust_maybe_catch_panic
                        at libpanic_unwind/lib.rs:102
  13:     0x562b824157fa - <F as alloc::boxed::FnBox<A>>::call_box::hb783e783e73f8fc2
  14:     0x562b8272076b - <alloc::boxed::Box<alloc::boxed::FnBox<A, Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h598f9713c9cb9093
                        at /checkout/src/liballoc/boxed.rs:798
                         - std::sys_common::thread::start_thread::h65eb43a1201d41e6
                        at libstd/sys_common/thread.rs:24
                         - std::sys::unix::thread::Thread::new::thread_start::h711c51a13a158afa
                        at libstd/sys/unix/thread.rs:90
  15:     0x7f0749c81063 - start_thread
  16:     0x7f07497a062c - clone
  17:                0x0 - <unknown>)))
ERR! conduit_proxy::control "controller-client", controller error: Error attempting to establish underlying session layer: operation timed out after 3s
WARN trust_dns_proto::dns_handle error notifying wait, possible future leak: Err(ResolveError(Proto(Timeout), (None, stack backtrace:
   0:     0x562b82671e0c - backtrace::backtrace::trace::hef23eb1bdedb9de4
   1:     0x562b826710a2 - backtrace::capture::Backtrace::new::h570f24a892cd6a34
   2:     0x562b8263b917 - <trust_dns_proto::error::ProtoError as core::convert::From<trust_dns_proto::error::ProtoErrorKind>>::from::h043748def247281b
   3:     0x562b8262fec2 - _ZN110_$LT$trust_dns_proto..dns_handle..DnsFuture$LT$S$C$$u20$E$C$$u20$MF$GT$$u20$as$u20$futures..future..Future$GT$4poll17h37890051da49b999E.llvm.17885337389023634268
   4:     0x562b825ffbd0 - <futures::future::chain::Chain<A, B, C>>::poll::h046c15f0cba805b8
   5:     0x562b82627667 - <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll::hbeb65d90d09deaf4
   6:     0x562b826c5fbf - futures::task_impl::std::set::h0eda14c4de187820
   7:     0x562b826c344f - <scoped_tls::ScopedKey<T>>::set::h0efdea05668e906c
   8:     0x562b826b9b7f - tokio_core::reactor::Core::poll::hfaacf20236fbcc81
   9:     0x562b8249e643 - tokio_core::reactor::Core::run::hdb7e20c27c731447
  10:     0x562b824deda8 - std::sys_common::backtrace::__rust_begin_short_backtrace::h808bc1a152c1d7fa
  11:     0x562b82412455 - _ZN3std9panicking3try7do_call17h968da37b516bd463E.llvm.6120476831782816583
  12:     0x562b8272882e - __rust_maybe_catch_panic
                        at libpanic_unwind/lib.rs:102
  13:     0x562b824157fa - <F as alloc::boxed::FnBox<A>>::call_box::hb783e783e73f8fc2
  14:     0x562b8272076b - <alloc::boxed::Box<alloc::boxed::FnBox<A, Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h598f9713c9cb9093
                        at /checkout/src/liballoc/boxed.rs:798
                         - std::sys_common::thread::start_thread::h65eb43a1201d41e6
                        at libstd/sys_common/thread.rs:24
                         - std::sys::unix::thread::Thread::new::thread_start::h711c51a13a158afa
                        at libstd/sys/unix/thread.rs:90
  15:     0x7f0749c81063 - start_thread
  16:     0x7f07497a062c - clone
  17:                0x0 - <unknown>)))
ERR! conduit_proxy turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500

olix0r commented on June 26, 2024

We just updated our trust-dns dependency at the end of last week, so we'll retest this and work with upstream to address the issue if it still exists.

wmorgan commented on June 26, 2024

Great to know. Thanks @jakerobb

briansmith commented on June 26, 2024

cassandra01. 3600 IN A 10.0.7.14

The most likely reason it failed: Maybe Trust-DNS won't fall back to resolving the single-label unqualified name X as "X." after it tries everything else in the search path? I suggest we add this to Trust-DNS's unit tests and see what happens.

wmorgan commented on June 26, 2024

Maybe @bluejekyll knows?

bluejekyll commented on June 26, 2024

The issue @briansmith is mentioning around search order and ndots was fixed in both 0.8.2 and 0.9 of the Resolver.

There is another issue of NameServer starvation that was only fixed in 0.9: hickory-dns/hickory-dns#457

Which version are you on currently?

hawkw commented on June 26, 2024

@bluejekyll this error was observed using 0.8.2 (I checked Cargo.lock and it looks like we were pulling in hickory-dns/hickory-dns@ce6952c).

I'm planning on doing some testing and seeing if this also occurs with trust-dns-resolver 0.9+.

hawkw commented on June 26, 2024

Interesting: after updating trust-dns-resolver, I now see

ERR! admin={bg=resolver} conduit_proxy::control controller error: Error attempting to establish underlying session layer: operation timed out after 3s
WARN trust_dns_proto::xfer error notifying wait, possible future leak: Err(ResolveError { inner: ProtoError { inner:  request timed out })
ERR! admin={bg=resolver} conduit_proxy::control controller error: Error attempting to establish underlying session layer: operation timed out after 3s
WARN trust_dns_proto::xfer error notifying wait, possible future leak: Err(ResolveError { inner: ProtoError { inner: request timed out })

in the logs from a pod with

    spec:
      dnsPolicy: "Default"
      containers:
      - ...

Curling an external name fails with an HTTP 500, and it appears that, after upgrading trust-dns-resolver, the proxy can no longer resolve cluster-local names with dnsPolicy: Default either.

dnsPolicy: ClusterFirst still works fine with the latest trust-dns-resolver.

briansmith commented on June 26, 2024

We need to put #1032 into the next release and then retest this. I think #1032 will make our DNS stuff much more reasonable to think about, which is why it was prioritized.

hawkw commented on June 26, 2024

@briansmith the results above are from a build of Conduit with #1032.

I'm planning on doing some additional digging into this issue.

hawkw commented on June 26, 2024

Never mind: I just re-ran the tests with the master build of Conduit, and it looks like cluster-local names were always broken with custom dnsPolicy configurations, prior to the trust-dns-resolver update. I must have failed to document that when I was doing the earlier testing.

briansmith commented on June 26, 2024

cluster-local names were always broken

To clarify, they weren't working yet for custom configurations that don't use the ClusterFirst policy, so this issue isn't resolved yet. They've been working fine in the default DNS configuration.

hawkw commented on June 26, 2024

@briansmith Yes, that's correct. Updated my original comment to make that clearer.

hawkw commented on June 26, 2024

@bluejekyll
Thanks for following up & looking into the K8s docs!

Are these settings being passed directly into trust-dns-resolver config? meaning, is there some ordering required in the NameServers in the Resolver instance?

This is, in fact, what I was going to start looking into next.

hawkw commented on June 26, 2024

@bluejekyll I did the test again using a custom build of trust-dns-resolver with the sort_unstable commented out, and I'm still seeing this error. Will keep digging.

wmorgan commented on June 26, 2024

Is this issue still relevant?

grampelberg commented on June 26, 2024

We're not bypassing DNS any longer, and StatefulSets work properly now.
