
Comments (20)

maxi0604 commented on May 27, 2024

This should now be fixed in the new version 2024_03_26.4988e2b.

As the Arch Linux maintainer just happened to merge a change two hours ago, I guess you'll get an updated package for Arch rather soon.

I've flagged the package in the Arch repository, thanks for the quick fix everyone

from podman.

dgibson commented on May 27, 2024

@KirilMihaylov, which pasta version do you have installed? A DNS-related issue was fixed recently, and you might be seeing that.

maxi0604 commented on May 27, 2024

The update has been released and works, closing this

sbrivio-rh commented on May 27, 2024

Faking a --pcap option in pasta, for wget google.com I see a RST from the container just after the request and a TCP window update from pasta, see frame 19 below:

$ tshark -r /tmp/hack.pcap
    1   0.000000           :: → ff02::16     ICMPv6 110 Multicast Listener Report Message v2
    2   0.520126           :: → ff02::1:ff00:2 ICMPv6 86 Neighbor Solicitation for 2a01:4f8:222:904::2
    3   0.584131           :: → ff02::16     ICMPv6 110 Multicast Listener Report Message v2
    4   0.616134           :: → ff02::1:ffdb:35eb ICMPv6 86 Neighbor Solicitation for fe80::6417:aaff:fedb:35eb
    5   1.640101 fe80::6417:aaff:fedb:35eb → ff02::16     ICMPv6 110 Multicast Listener Report Message v2
    6   1.640122 fe80::6417:aaff:fedb:35eb → ff02::2      ICMPv6 70 Router Solicitation from 66:17:aa:db:35:eb
    7   2.600041 fe80::6417:aaff:fedb:35eb → ff02::16     ICMPv6 110 Multicast Listener Report Message v2
    8   4.353807 66:17:aa:db:35:eb → Broadcast    ARP 42 Who has 88.198.0.161? Tell 88.198.0.164
    9   4.353843 ASRockIn_8e:d7:b6 → 66:17:aa:db:35:eb ARP 42 88.198.0.161 is at a8:a1:59:8e:d7:b6
   10   4.353851 88.198.0.164 → 185.12.64.1  DNS 70 Standard query 0x495e A google.com
   11   4.353855 88.198.0.164 → 185.12.64.1  DNS 70 Standard query 0x30d9 AAAA google.com
   12   4.354172  185.12.64.1 → 88.198.0.164 DNS 98 Standard query response 0x30d9 AAAA google.com AAAA 2a00:1450:4001:829::200e
   13   4.354183  185.12.64.1 → 88.198.0.164 DNS 86 Standard query response 0x495e A google.com A 142.250.184.206
   14   4.354416 88.198.0.164 → 142.250.184.206 TCP 74 44824 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM TSval=262450325 TSecr=0 WS=4096
   15   4.359743 142.250.184.206 → 88.198.0.164 TCP 62 80 → 44824 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=61440 WS=256
   16   4.359749 88.198.0.164 → 142.250.184.206 TCP 54 44824 → 80 [ACK] Seq=1 Ack=1 Win=65536 Len=0
   17   4.359788 88.198.0.164 → 142.250.184.206 HTTP 127 GET / HTTP/1.1 
   18   4.359813 142.250.184.206 → 88.198.0.164 TCP 54 [TCP Window Update] 80 → 44824 [<None>] Seq=1 Win=65280 Len=0
   19   4.359817 88.198.0.164 → 142.250.184.206 TCP 54 44824 → 80 [RST, ACK] Seq=2622007341 Ack=1 Win=0 Len=0
   20   4.568208 88.198.0.164 → 142.250.184.206 TCP 127 [TCP Retransmission] 44824 → 80 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=73
   21   5.000035 88.198.0.164 → 142.250.184.206 TCP 127 [TCP Retransmission] 44824 → 80 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=73
   22   5.736172 fe80::6417:aaff:fedb:35eb → ff02::2      ICMPv6 70 Router Solicitation from 66:17:aa:db:35:eb
   23   5.864084 88.198.0.164 → 142.250.184.206 TCP 127 [TCP Retransmission] 44824 → 80 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=73
   24   6.980008 88.198.0.164 → 142.250.184.206 TCP 54 44824 → 80 [FIN, ACK] Seq=74 Ack=1 Win=65536 Len=0
   25   7.560166 88.198.0.164 → 142.250.184.206 TCP 127 [TCP Retransmission] 44824 → 80 [FIN, PSH, ACK] Seq=1 Ack=1 Win=65536 Len=73
   26  11.112077 88.198.0.164 → 142.250.184.206 TCP 127 [TCP Retransmission] 44824 → 80 [FIN, PSH, ACK] Seq=1 Ack=1 Win=65536 Len=73
   27  13.928009 fe80::6417:aaff:fedb:35eb → ff02::2      ICMPv6 70 Router Solicitation from 66:17:aa:db:35:eb

...if I enter the target namespace and capture traffic from there, I don't see that segment, though:

10:47:34.893690 IP (tos 0x0, ttl 64, id 46395, offset 0, flags [DF], proto TCP (6), length 60)
    10.89.0.9.39992 > 142.250.185.110.80: Flags [S], cksum 0x52f9 (incorrect -> 0xde27), seq 3706848507, win 64240, options [mss 1460,sackOK,TS val 1860972213 ecr 0,nop,wscale 12], length 0
10:47:34.898732 IP (tos 0x0, ttl 254, id 0, offset 0, flags [none], proto TCP (6), length 48)
    142.250.185.110.80 > 10.89.0.9.39992: Flags [S.], cksum 0x02cf (correct), seq 873828756, ack 3706848508, win 65535, options [mss 61440,nop,wscale 8], length 0
10:47:34.898756 IP (tos 0x0, ttl 64, id 46396, offset 0, flags [DF], proto TCP (6), length 40)
    10.89.0.9.39992 > 142.250.185.110.80: Flags [.], cksum 0x52e5 (incorrect -> 0x18d8), ack 1, win 16, length 0
10:47:34.898783 IP (tos 0x0, ttl 64, id 46397, offset 0, flags [DF], proto TCP (6), length 113)
    10.89.0.9.39992 > 142.250.185.110.80: Flags [P.], cksum 0x532e (incorrect -> 0x1f9b), seq 1:74, ack 1, win 16, length 73: HTTP, length: 73
	GET / HTTP/1.1
	Host: google.com
	User-Agent: Wget
	Connection: close

Is it from netfilter? It doesn't look like netavark is configuring anything that might lead to that:

# nft list ruleset
table ip nat {
	chain NETAVARK-F7FBBA6E0636F {
		ip daddr 10.89.0.0/24 counter packets 0 bytes 0 accept
		ip daddr != 224.0.0.0/4 counter packets 1 bytes 60 # xt_MASQUERADE
	}

	chain POSTROUTING {
		type nat hook postrouting priority srcnat; policy accept;
		counter packets 8 bytes 512 jump NETAVARK-HOSTPORT-MASQ
		ip saddr 10.89.0.0/24 counter packets 2 bytes 100 jump NETAVARK-F7FBBA6E0636F
	}

	chain NETAVARK-HOSTPORT-SETMARK {
		counter packets 0 bytes 0 # xt_MARK
	}

	chain NETAVARK-HOSTPORT-MASQ {
		# xt_comment meta mark & 0x00002000 == 0x00002000 counter packets 0 bytes 0 # xt_MASQUERADE
	}

	chain NETAVARK-HOSTPORT-DNAT {
	}

	chain PREROUTING {
		type nat hook prerouting priority dstnat; policy accept;
		# xt_addrtype counter packets 1 bytes 56 jump NETAVARK-HOSTPORT-DNAT
	}

	chain OUTPUT {
		type nat hook output priority -100; policy accept;
		# xt_addrtype counter packets 0 bytes 0 jump NETAVARK-HOSTPORT-DNAT
	}
}
table ip filter {
	chain NETAVARK_FORWARD {
		ip daddr 10.89.0.0/24 # xt_conntrack counter packets 1 bytes 48 accept
		ip saddr 10.89.0.0/24 counter packets 11 bytes 1117 accept
	}

	chain FORWARD {
		type filter hook forward priority filter; policy accept;
		# xt_comment counter packets 12 bytes 1165 jump NETAVARK_FORWARD
	}
}

Luap99 commented on May 27, 2024

@maxi0604 Is this ipv4 or ipv6 traffic that is not working? I only have access to ipv4 systems so I cannot test v6.
Does it work with --network pasta?

For the custom rootless network case, the setup is more complicated because it involves both pasta and netavark, so it is not easy to tell where things go wrong. You can enter our rootless netns with podman unshare --rootless-netns; there, if the container is running, you should see both the pasta interface (which should have the same name as your external interface) and the podman/netavark bridge interface (called podmanX).
So, to get a full packet dump, run something like podman unshare --rootless-netns tcpdump -nn -i any in another terminal and then run your reproducer again; then we should see where the packets are getting lost.

maxi0604 commented on May 27, 2024

@Luap99

Is this ipv4 or ipv6 traffic that is not working? I only have access to ipv4 systems so I cannot test v6.

I've explicitly tested both and both show the same hang. If the network was not created with --ipv6, then trying an ipv6-only connection does fail fast as expected.

Does it work with --network pasta?

Yes, that seems to work with v4 and v6.

For the custom rootless network case, the setup is more complicated because it involves both pasta and netavark, so it is not easy to tell where things go wrong. You can enter our rootless netns with podman unshare --rootless-netns; there, if the container is running, you should see both the pasta interface (which should have the same name as your external interface) and the podman/netavark bridge interface (called podmanX). So, to get a full packet dump, run something like podman unshare --rootless-netns tcpdump -nn -i any in another terminal and then run your reproducer again; then we should see where the packets are getting lost.

$ podman unshare --rootless-netns tcpdump -nn -i any
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
16:26:26.973408 veth0 B   ARP, Request who-has 10.89.2.2 tell 10.89.2.2, length 28
16:26:26.978505 podman3 Out IP 10.89.2.1 > 224.0.0.22: igmp v3 report, 1 group record(s)
16:26:26.978516 veth0 Out IP 10.89.2.1 > 224.0.0.22: igmp v3 report, 1 group record(s)
16:26:26.981891 veth0 Out IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 2 group record(s), length 48
16:26:26.981920 podman3 Out IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 3 group record(s), length 68
16:26:26.981941 veth0 Out IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 3 group record(s), length 68
16:26:26.981956 veth0 M   IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:26.981967 podman3 M   IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:26.991893 veth0 Out IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 2 group record(s), length 48
16:26:27.048404 podman3 Out IP 10.89.2.1 > 224.0.0.22: igmp v3 report, 1 group record(s)
16:26:27.048422 veth0 Out IP 10.89.2.1 > 224.0.0.22: igmp v3 report, 1 group record(s)
16:26:27.121090 veth0 B   ARP, Request who-has 10.89.2.1 tell 10.89.2.2, length 28
16:26:27.121098 podman3 B   ARP, Request who-has 10.89.2.1 tell 10.89.2.2, length 28
16:26:27.121123 podman3 Out ARP, Reply 10.89.2.1 is-at ea:97:e8:10:7f:5f, length 28
16:26:27.121128 veth0 Out ARP, Reply 10.89.2.1 is-at ea:97:e8:10:7f:5f, length 28
16:26:27.121138 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [S], seq 507767311, win 32120, options [mss 1460,sackOK,TS val 307039792 ecr 0,nop,wscale 7], length 0
16:26:27.121140 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [S], seq 507767311, win 32120, options [mss 1460,sackOK,TS val 307039792 ecr 0,nop,wscale 7], length 0
16:26:27.121190 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [S], seq 507767311, win 32120, options [mss 1460,sackOK,TS val 307039792 ecr 0,nop,wscale 7], length 0
16:26:27.129336 wlan0 In  IP 142.250.74.206.80 > 172.17.61.166.56748: Flags [S.], seq 2288725518, ack 507767312, win 65535, options [mss 61440,nop,wscale 8], length 0
16:26:27.129373 podman3 Out IP 142.250.74.206.80 > 10.89.2.2.56748: Flags [S.], seq 2288725518, ack 507767312, win 65535, options [mss 61440,nop,wscale 8], length 0
16:26:27.129378 veth0 Out IP 142.250.74.206.80 > 10.89.2.2.56748: Flags [S.], seq 2288725518, ack 507767312, win 65535, options [mss 61440,nop,wscale 8], length 0
16:26:27.129408 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [.], ack 1, win 251, length 0
16:26:27.129410 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [.], ack 1, win 251, length 0
16:26:27.129421 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [.], ack 1, win 251, length 0
16:26:27.129499 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:27.129502 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:27.129566 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:27.129707 wlan0 In  IP 142.250.74.206.80 > 172.17.61.166.56748: Flags [none], win 255, length 0
16:26:27.129738 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [R.], seq 3787199985, ack 1, win 0, length 0
16:26:27.261774 veth0 M   IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:27.261789 podman3 M   IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:27.341741 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:27.341750 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:27.341802 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:27.368433 veth0 Out IP6 :: > ff02::1:ff4f:949a: ICMP6, neighbor solicitation, who has fe80::5cce:1cff:fe4f:949a, length 32
16:26:27.528188 podman3 Out IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 3 group record(s), length 68
16:26:27.528206 veth0 Out IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 3 group record(s), length 68
16:26:27.638446 veth0 M   IP6 :: > ff02::1:ff3a:267b: ICMP6, neighbor solicitation, who has fe80::bc6e:c9ff:fe3a:267b, length 32
16:26:27.638456 podman3 M   IP6 :: > ff02::1:ff3a:267b: ICMP6, neighbor solicitation, who has fe80::bc6e:c9ff:fe3a:267b, length 32
16:26:27.718555 podman3 Out IP6 :: > ff02::1:ff10:7f5f: ICMP6, neighbor solicitation, who has fe80::e897:e8ff:fe10:7f5f, length 32
16:26:27.718577 veth0 Out IP6 :: > ff02::1:ff10:7f5f: ICMP6, neighbor solicitation, who has fe80::e897:e8ff:fe10:7f5f, length 32
16:26:27.768217 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:27.768228 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:27.768270 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:28.381582 veth0 Out IP6 fe80::5cce:1cff:fe4f:949a > ff02::16: HBH ICMP6, multicast listener report v2, 3 group record(s), length 68
16:26:28.391525 veth0 Out IP6 fe80::5cce:1cff:fe4f:949a > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:28.528441 veth0 Out IP6 fe80::5cce:1cff:fe4f:949a > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:28.621747 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:28.621755 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:28.621801 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:28.648248 veth0 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:28.648274 podman3 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:28.648309 veth0 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::2: ICMP6, router solicitation, length 16
16:26:28.648311 podman3 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::2: ICMP6, router solicitation, length 16
16:26:28.728483 veth0 Out IP6 fe80::5cce:1cff:fe4f:949a > ff02::16: HBH ICMP6, multicast listener report v2, 3 group record(s), length 68
16:26:28.728515 podman3 Out IP6 fe80::e897:e8ff:fe10:7f5f > ff02::16: HBH ICMP6, multicast listener report v2, 4 group record(s), length 88
16:26:28.728540 veth0 Out IP6 fe80::e897:e8ff:fe10:7f5f > ff02::16: HBH ICMP6, multicast listener report v2, 4 group record(s), length 88
16:26:28.738407 podman3 Out IP6 fe80::e897:e8ff:fe10:7f5f > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:28.738413 veth0 Out IP6 fe80::e897:e8ff:fe10:7f5f > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:29.581568 podman3 Out IP6 fe80::e897:e8ff:fe10:7f5f > ff02::16: HBH ICMP6, multicast listener report v2, 4 group record(s), length 88
16:26:29.581589 veth0 Out IP6 fe80::e897:e8ff:fe10:7f5f > ff02::16: HBH ICMP6, multicast listener report v2, 4 group record(s), length 88
16:26:29.635135 podman3 Out IP6 fe80::e897:e8ff:fe10:7f5f > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:29.635160 veth0 Out IP6 fe80::e897:e8ff:fe10:7f5f > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:29.661510 veth0 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:29.661522 podman3 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:26:30.328262 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:30.328269 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:30.328309 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:32.194847 wlan0 Out ARP, Request who-has 172.17.60.1 tell 172.17.61.166, length 28
16:26:32.194855 podman3 Out ARP, Request who-has 10.89.2.2 tell 10.89.2.1, length 28
16:26:32.194865 veth0 Out ARP, Request who-has 10.89.2.2 tell 10.89.2.1, length 28
16:26:32.194930 veth0 P   ARP, Reply 10.89.2.2 is-at be:6e:c9:3a:26:7b, length 28
16:26:32.194931 wlan0 In  ARP, Reply 172.17.60.1 is-at 40:1a:58:6c:2f:87, length 28
16:26:32.194939 podman3 In  ARP, Reply 10.89.2.2 is-at be:6e:c9:3a:26:7b, length 28
16:26:32.835226 veth0 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::2: ICMP6, router solicitation, length 16
16:26:32.835236 podman3 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::2: ICMP6, router solicitation, length 16
16:26:33.901550 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:33.901560 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:33.901616 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:40.728234 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:40.728244 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:40.728288 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1
16:26:41.158185 veth0 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::2: ICMP6, router solicitation, length 16
16:26:41.158194 podman3 M   IP6 fe80::bc6e:c9ff:fe3a:267b > ff02::2: ICMP6, router solicitation, length 16
16:26:42.336506 veth0 P   IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [F.], seq 78, ack 1, win 251, length 0
16:26:42.336513 podman3 In  IP 10.89.2.2.56748 > 142.250.74.206.80: Flags [F.], seq 78, ack 1, win 251, length 0
16:26:42.336543 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [F.], seq 78, ack 1, win 251, length 0

142.250.74.206:80 is google.com.

I think the IPv6 output is unrelated, the network was created without --ipv6.

sbrivio-rh commented on May 27, 2024

I don't think there's any packet getting lost, @maxi0604's output is consistent with mine, here is the RST segment:

16:26:27.129738 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [R.], seq 3787199985, ack 1, win 0, length 0

I strace'd pasta, and it close()s the "host" socket as it gets this, as expected.
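To pick the flag-less window update and the reset out of a long `tcpdump -nn -i any` log, a small filter can help. This is only a sketch: the regular expression assumes the IPv4 line shape seen in the capture above, and `suspicious_segments` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import re

# Assumed line shape (taken from the capture in this thread):
# "<time> <iface> <dir> IP <src> > <dst>: Flags [<flags>], ..."
LINE_RE = re.compile(
    r"^(?P<time>\S+)\s+(?P<iface>\S+)\s+(?P<dir>\S+)\s+IP\s+"
    r"(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]*)\]"
)


def suspicious_segments(lines):
    """Yield (time, iface, flags) for TCP segments with no flags or with RST."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # ARP, IPv6, or anything else this sketch does not parse
        flags = match.group("flags")
        if flags == "none" or "R" in flags:
            yield match.group("time"), match.group("iface"), flags


# Three consecutive lines copied from the capture above.
SAMPLE = [
    "16:26:27.129566 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [P.], seq 1:78, ack 1, win 251, length 77: HTTP: GET / HTTP/1.1",
    "16:26:27.129707 wlan0 In  IP 142.250.74.206.80 > 172.17.61.166.56748: Flags [none], win 255, length 0",
    "16:26:27.129738 wlan0 Out IP 172.17.61.166.56748 > 142.250.74.206.80: Flags [R.], seq 3787199985, ack 1, win 0, length 0",
]

for time, iface, flags in suspicious_segments(SAMPLE):
    print(time, iface, flags)
```

On these three lines it reports the flag-less segment at 16:26:27.129707 followed by the RST at 16:26:27.129738, both on wlan0.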

sbrivio-rh commented on May 27, 2024

[Distractedly thinking about this, sorry for the rain of comments] On second thought, we can't exclude that the kernel sees the window update frame (18 in my first capture, #22146 (comment)) as somewhat strange, and that this warrants a reset.

The acknowledgement sequence is increased by one compared to the SYN, ACK segment, but the ACK flag is not set (because we want to update the window) -- that should be legitimate but somewhat unusual.

sbrivio-rh commented on May 27, 2024

Tagging @dgibson in case that rings a bell.

sbrivio-rh commented on May 27, 2024

Confirmed, the kernel doesn't seem to like (anymore?) a segment that just updates the window, without any flag set, and with the acknowledgement sequence matching the previous one. If I force the ACK flag in pasta, here:

diff --git a/tcp.c b/tcp.c
index a1860d1..7785ab3 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1679,7 +1679,7 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
        } else {
                th->ack = !!(flags & (ACK | DUP_ACK)) ||
                          conn->seq_ack_to_tap != prev_ack_to_tap ||
-                         !prev_wnd_to_tap;
+                         !prev_wnd_to_tap || 1;
        }
 
        th->doff = (sizeof(*th) + optlen) / 4;

then we don't get a reset and wget completes.
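For readers who don't have tcp.c open, the condition touched by the diff can be modelled roughly as below. This is a sketch, not pasta's code: the `ACK` and `DUP_ACK` bit values are placeholders chosen for illustration, `ack_flag_set` is a hypothetical name, and the connection fields are passed in as plain integers.

```python
# Placeholder bit values for illustration only; the real definitions
# live in pasta's tcp.c and may differ.
ACK = 1 << 4
DUP_ACK = 1 << 5


def ack_flag_set(flags, seq_ack_to_tap, prev_ack_to_tap, prev_wnd_to_tap):
    """Model of the th->ack expression shown in the diff (before the '|| 1')."""
    return (
        bool(flags & (ACK | DUP_ACK))          # caller explicitly asked for an ACK
        or seq_ack_to_tap != prev_ack_to_tap   # acknowledged sequence moved
        or not prev_wnd_to_tap                 # previously advertised window was zero
    )


# Pure window update: no ACK requested, acknowledged sequence unchanged,
# previous window non-zero. The ACK flag stays clear.
print(ack_flag_set(0, 1, 1, 65536))

# Same situation, but with the previous window recorded as zero.
print(ack_flag_set(0, 1, 1, 0))
```

Assuming the caller requested neither `ACK` nor `DUP_ACK`, the first call is the case that produces a segment with no flags at all, and the second shows that a previous window of zero is enough to set the flag.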

sbrivio-rh commented on May 27, 2024

This smells like a kernel issue to me and we should look into that. Probably reasonable workaround meanwhile: if we just completed the three-way handshake, with a connection started from the tap side (container), reset our own value of the window we sent to the container, in order to force an ACK flag on the next segment (including a possible window update, as it happens here):

diff --git a/tcp.c b/tcp.c
index a1860d1..1135c71 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2629,6 +2629,7 @@ int tcp_tap_handler(struct ctx *c, uint8_t pif, sa_family_t af,
                        goto reset;
 
                conn_event(c, conn, ESTABLISHED);
+               conn->wnd_to_tap = 0;
 
                if (th->fin) {
                        conn->seq_from_tap++;

lightly tested, this seems to work as well.

dgibson commented on May 27, 2024

I don't think it is a kernel issue. @sbrivio-rh pointed out this kernel commit. It states that RFC 793 requires that packets without an ACK be dropped, and my reading of RFC 793 and its successors concurs. See for example here.

I think we should be setting ACK on all non-SYN, non-RST packets. What we do for RST packets is a bit more complicated.

Currently trying to figure out how to correct this without excessive churn. I've also filed an upstream pasta bug to track it.

sbrivio-rh commented on May 27, 2024

While we fix this in pasta and make updated packages available, I tested this nftables-based workaround:

nft 'add chain ip filter input { type filter hook input priority 0; }'
nft add rule filter input 'tcp flags & (syn | rst | ack) == 0 counter drop'

from the target network namespace (for pasta itself).

For some reason podman unshare --rootless-netns didn't bring me there, so I entered it with nsenter -U -n -t $(pidof aardvark-dns).

The idea is to drop any TCP segment that has none of the SYN, RST, and ACK flags set, before some kernel component (we haven't figured that out yet) resets the connection. @dgibson also points out that RFC 9293 says those segments should be discarded, but not that they should cause a reset. This part looks like a kernel issue to me.
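To make the match expression in the rule concrete, here is a rough Python equivalent of `tcp flags & (syn | rst | ack) == 0`. The bit values are the standard TCP header flag bits; `dropped_by_rule` is a hypothetical name used only for this illustration and says nothing about how nftables evaluates the rule internally.

```python
# Standard TCP header flag bits (RFC 9293).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def dropped_by_rule(tcp_flags):
    """True if none of SYN, RST, ACK is set, i.e. the rule above would match."""
    return (tcp_flags & (SYN | RST | ACK)) == 0


# The flag-less window update from the captures matches and is dropped.
print(dropped_by_rule(0))

# Ordinary traffic is left alone: SYN, SYN+ACK, data with PSH+ACK, RST+ACK.
print([dropped_by_rule(f) for f in (SYN, SYN | ACK, PSH | ACK, RST | ACK)])
```

Note that the mask only looks at three bits, so a segment carrying only FIN or only PSH would match as well.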

KirilMihaylov commented on May 27, 2024

I can confirm that on 5.0 it is broken with the default bridged network adapter when running on WSL. Unless a custom DNS server is added, e.g. Cloudflare's 1.1.1.1, DNS requests fail.

maxi0604 commented on May 27, 2024

I can confirm that on 5.0 it is broken with the default bridged network adapter when running on WSL. Unless a custom DNS server is added, e.g. Cloudflare's 1.1.1.1, DNS requests fail.

This seems different. In my case, DNS and ping work but the actual TCP transfer fails.

KirilMihaylov commented on May 27, 2024

I'm sorry then. I must have misunderstood the reported issue. My apologies!

dgibson commented on May 27, 2024

I have something I hope is a fix, essentially a polished version of Stefano's suggestion. Unfortunately I haven't been able to test it against the specific problem here, because I wasn't able to reproduce. I don't know quite what's different about my setup, but the wget from an alpine container is working fine for me with podman 5.0.0 and existing pasta binaries.

dgibson commented on May 27, 2024

Ok, tree with the draft fix is here. I believe @sbrivio-rh will be able to make a release, and we can test from there.

sbrivio-rh commented on May 27, 2024

Unfortunately I haven't been able to test it against the specific problem here, because I wasn't able to reproduce.

I'm able to reproduce the issue reliably, and your series fixes it for me. Testing and releasing now.

I don't know quite what's different about my setup, but the wget from an alpine container is working fine for me with podman 5.0.0 and existing pasta binaries.

I think it's pretty much a combination of two factors, which might be unlikely or impossible to reproduce on some setups: first off we get a slightly different window value from the socket (65280 instead of 65536 bytes in my case) between three-way handshake and just after it, and we reflect it to the container, hence the problematic packet.

Second, we write the HTTP request to the socket, but we don't see it being acknowledged right away (hence no increase of acknowledged sequence and no ACK flag in the problematic packet).
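The window figures in the captures can be related to each other with ordinary TCP window scaling (RFC 7323), where the advertised byte count is the 16-bit header value shifted left by the scale negotiated in the handshake. The sketch below only reproduces that arithmetic for values visible in this thread; `effective_window` is a hypothetical helper name.

```python
def effective_window(raw_window, shift):
    """Advertised window in bytes for a non-SYN segment with window scaling."""
    return raw_window << shift


# Flag-less segment towards the container: tcpdump shows "win 255", and the
# SYN-ACK announced "wscale 8" (tshark: WS=256).
print(effective_window(255, 8))   # 65280, matching "Win=65280" in frame 18

# Container side: tcpdump shows "win 16" with "wscale 12" (tshark: WS=4096).
print(effective_window(16, 12))   # 65536, matching "Win=65536" in frame 16
```

Window values carried in SYN and SYN-ACK segments themselves are not scaled, which is why the SYN-ACK shows 65535 in both captures.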

sbrivio-rh commented on May 27, 2024

This should now be fixed in the new version 2024_03_26.4988e2b.

As the Arch Linux maintainer just happened to merge a change two hours ago, I guess you'll get an updated package for Arch rather soon.
