Comments (37)

goekesmi commented on September 25, 2024

Hello,

I'm a user of Triton and the downstream user @danmcd mentioned above, with the "woodchipper" that I've been running tests through.

The test system is a Dell OptiPlex 980. It was, during this debugging, updated to BIOS version A18. It has been configured in BIOS to use PXE boot, and has no UEFI mode.

During the boot sequence, the following information is displayed, which may help in identifying the exact model of card.

Initializing Intel(R) Boot Agent GE v1.4.10
PXE 2.1 Build 092 (WIM 2.0)
Intel(R) Boot Agent GE v1.4.10
Copyright (C) 1997-2012, Intel Corporation
CLIENT MAC ADDR: 84 2B 2B A5 BF 3A GUID: 44454C4C 3900 1038 8051 B8C04F4D4831
CLIENT IP: 172.20.47.200 MASK: 255.255.255.0 DHCP IP: 172.20.47.3
PXE->EB: !PXE at 9929:0070, entry point at 9929:0106
UNDI code segment 9929:5750, data segment 92EB:63B0 (587-635kB)
UNDI device is PCI 00:19.0, type DIX+802.3 
587kB free base memory after PXE unload
iPXE initialising devices...

A packet trace of a failed boot using the default for Triton version of undionly is here https://manta.matrix.msu.edu/goekesmi/public/iPXE-debug/2024-0520-0001/AgentSmith-3a.2024-0502.undionly-d0c0252a89de00b943aa2017c39c204b.snoop

A packet trace of a successful boot using a variant that @danmcd provided that backs out 2d180ce is available at https://manta.matrix.msu.edu/goekesmi/public/iPXE-debug/2024-0520-0001/AgentSmith-3a.ipxe-256k-tcp-buffer-undionly.kpxe-610c7eaf10dd2e585671fae58afc1577-bootsequence.snoop

The capture was done with a mirror port from a switch so temporal packet reordering is possible in the trace, along with the occasional dropped packet.

The specific build versions and options @danmcd can speak to. The embedded md5 hashes in the file names refer to the undionly.pxe version that was used for that boot and packet trace.

Hope this helps.

from ipxe.

NiKiZe commented on September 25, 2024

Please use current master; you will have to build it yourself. You can also use the builds from boot.ipxe.org, but for any debugging to be done you will need to be able to modify and build new versions.

Could you please dump the HTTP headers you get from the server?

ctheune commented on September 25, 2024

Please use current master; you will have to build it yourself. You can also use the builds from boot.ipxe.org, but for any debugging to be done you will need to be able to modify and build new versions.

Will do.

Could you please dump the HTTP headers you get from the server?

As it's HTTPS, I can give you the headers from a curl call to the same URL. Is that what you want?

NiKiZe commented on September 25, 2024

You could start with what curl shows you, but we really want what iPXE gets.

ctheune commented on September 25, 2024

Alright, here's the curl story:

curl -v https://hydra.flyingcircus.io/channels/installer/dev/initrd -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying [2a02:238:f030:102::1068]:443...
* Connected to hydra.flyingcircus.io (2a02:238:f030:102::1068) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
} [326 bytes data]
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* (304) (IN), TLS handshake, Unknown (8):
{ [19 bytes data]
* (304) (IN), TLS handshake, Certificate (11):
{ [2856 bytes data]
* (304) (IN), TLS handshake, CERT verify (15):
{ [520 bytes data]
* (304) (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* (304) (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=hydra.flyingcircus.io
*  start date: Mar 10 03:33:57 2024 GMT
*  expire date: Jun  8 03:33:56 2024 GMT
*  subjectAltName: host "hydra.flyingcircus.io" matched cert's "hydra.flyingcircus.io"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://hydra.flyingcircus.io/channels/installer/dev/initrd
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: hydra.flyingcircus.io]
* [HTTP/2] [1] [:path: /channels/installer/dev/initrd]
* [HTTP/2] [1] [user-agent: curl/8.4.0]
* [HTTP/2] [1] [accept: */*]
> GET /channels/installer/dev/initrd HTTP/2
> Host: hydra.flyingcircus.io
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/2 200
< server: nginx
< date: Sun, 10 Mar 2024 08:47:30 GMT
< content-type: application/octet-stream
< content-length: 756393569
< last-modified: Sat, 17 Dec 2022 06:02:03 GMT
< etag: "639d5b5b-2d15a661"
< accept-ranges: bytes

This is a public URL, so if you like you can poke it directly for debugging.

How do I get to see the headers that iPXE sees?

mcb30 commented on September 25, 2024

I ran a quick test downloading that URL with both curl and iPXE just now:

curl: 54.3s
iPXE: 59.0s

so I am unable to reproduce your problem.

Since you have a packet capture: could you please provide the raw .pcapng file? Doesn't need to include the whole download: the first 10 seconds or so should be sufficient to observe the problem.

ctheune commented on September 25, 2024

Thanks a lot. I'll get a pcap file; it could be a couple of days, though, as travel is coming up.

ctheune commented on September 25, 2024

Alright, here's a pcap file (unfortunately it doesn't compress well due to the encryption). Something I noticed while going through it is a high number of duplicate ACKs. I'm not aware of an underlying issue in our network here, as I can use another host attached to the same network and switch and get the download within 10 s, which is close to 1 Gbit/s and almost identical to the slowest link on the path.

ipxe-initrd-download.pcap.gz

ctheune commented on September 25, 2024

Ugh. Seems like some corruption is happening here as well.

Screenshot 2024-03-11 at 13 37 34

The archive itself is intact. I downloaded it using curl on the neighbouring machine. I'm double checking this on other hardware now to see whether this is specific to that one machine.

Edit: actually, I'm going to try a manually compiled current version of ipxe (based on 226531e) first.

ctheune commented on September 25, 2024

Ok, so this is also happening on a current version. I'm now getting a proper version number reported:

Screenshot 2024-03-11 at 13 52 42

ctheune commented on September 25, 2024

After trying a couple of times I was able to boot one of the initrds we have available, and on that machine, using the same link, I got the initrd downloaded within 1 minute. So it's not an issue with the machine itself.

mcb30 commented on September 25, 2024

Alright, here's a pcap file (unfortunately it doesn't compress well due to the encryption). Something I noticed while going through it is a high number of duplicate ACKs. I'm not aware of an underlying issue in our network here, as I can use another host attached to the same network and switch and get the download within 10 s, which is close to 1 Gbit/s and almost identical to the slowest link on the path.

ipxe-initrd-download.pcap.gz

Thanks. The capture file is taken from an interface with some kind of TCP offload enabled, so is not showing the actual packets that went over the wire. For example: packet 115 is shown as being 15994 bytes long, which is longer than an Ethernet jumbo frame. We therefore cannot trust what the capture shows about duplicate ACKs, etc, since we are seeing a resynthesis of a TCP conversation rather than the actual TCP conversation.

Could you try disabling the assorted segmentation offload features on the capture interface via ethtool -K <device> <feature> off and then retry the capture. You can get a feature list for your NIC using ethtool -k <device>.

ctheune commented on September 25, 2024

Ok, so, this is quite fiddly to set up and I only managed to get an excerpt from the middle of the conversation. It might be that this doesn't help yet, but I think I managed to get a better dump now. I used a router in the middle and set its offloading settings to ethtool -K ethsrv gso off gro off tso off for the duration of the dump.

Looking at the dump in Wireshark now shows only packet sizes around 1514 bytes, i.e. the correct L2 overhead for a 1500-byte link MTU.

I still see messages about reassembled PDUs, though, as well as bursts of retransmissions and duplicate ACKs ...

Any ideas? Let me know if you do need the beginning of the conversation instead.

ipxe.pcap.gz

mcb30 commented on September 25, 2024

Ok, so, this is quite fiddly to set up and I only managed to get an excerpt from the middle of the conversation. It might be that this doesn't help yet, but I think I managed to get a better dump now. I used a router in the middle and set its offloading settings to ethtool -K ethsrv gso off gro off tso off for the duration of the dump.

Looking at the dump in Wireshark now shows only packet sizes around 1514 bytes, i.e. the correct L2 overhead for a 1500-byte link MTU.

Great, so we can rule out any problem relating to packet sizes.

I still see messages about reassembled PDUs, though, as well as bursts of retransmissions and duplicate ACKs ...

I see normal length packets and ACK RTT times (at the point of the wireshark capture) of <1ms from iPXE. TCP SACK is in use and is working as expected.

I think you're using undionly.kpxe, which means that we have no direct control over the NIC and no visibility into things like RX buffer exhaustion. Are you able to use ipxe.pxe and a NIC for which there exists a native iPXE driver?

ctheune commented on September 25, 2024

The cards are 61:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01) and seem to be natively supported.

I'm using undionly mostly due to (very long-term) historical reasons from when I tried to get things working reliably around 10+ years ago ... so this choice is likely cargo cult by now.

I can try using ipxe.pxe - I'm curious whether this might be a driver issue that would resolve itself by switching to the native driver ...

ctheune commented on September 25, 2024

Ok, so I chainloaded into ipxe.pxe and had the impression that the kernel loaded faster, but the initrd is still as slow: 1% in 10 seconds.

I canceled the download and here's the data from the interfaces:

net0: d8:5e:d3:1f:44:58 using i350 on 0000:61:00.0 (Ethernet) [closed]
  [Link:up, TX:15 TXE:0 RX:123 RXE:77]
  [RXE: 36 x "Operation not supported (https://ipxe.org/3c086003)"]
  [RXE: 1 x "Invalid argument (https://ipxe.org/1c056002)"]
  [RXE: 40 x "The socket is not connected (https://ipxe.org/380f6001)"]
net1: d8:5e:d3:1f:44:59 using i350 on 0000:61:00.1 (Ethernet) [open]
  [Link:up, TX:41278 TXE:1 RX:3156520 RXE:3063147]
  [TXE: 1 x "Network unreachable (https://ipxe.org/28086011)"]
  [RXE: 2958 x "Error 0x2a654006 (https://ipxe.org/2a654006)"]
  [RXE: 1639005 x "Operation not supported (https://ipxe.org/3c086003)"]
  [RXE: 1421072 x "Invalid argument (https://ipxe.org/1c056002)"]
  [RXE: 1 x "The socket is not connected (https://ipxe.org/380f6001)"]
net2: 10:70:fd:cb:c8:32 using ConnectX-5 on 0000:c1:00.0 (Ethernet) [closed]
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Unknown (https://ipxe.org/1a086101)]
net3: 10:70:fd:cb:c8:33 using ConnectX-5 on 0000:c1:00.1 (Ethernet) [closed]
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Unknown (https://ipxe.org/1a086101)]

The relevant interface is net1 (or potentially net0, which is the same) ... and ... right at this moment I'm noticing that we usually booted from net0, not net1. There was a slight firewall misconfiguration that caused the tftp server not to respond on net0 but on net1. Interestingly ... after adjusting the firewall I chained this to ipxe.pxe again, and the stats now show that I'm downloading from net0 and it's fast.

net0: d8:5e:d3:1f:44:58 using i350 on 0000:61:00.0 (Ethernet) [open]
  [Link:up, TX:80108 TXE:0 RX:261113 RXE:625]
  [RXE: 207 x "Operation not supported (https://ipxe.org/3c086003)"]
  [RXE: 263 x "The socket is not connected (https://ipxe.org/380f6001)"]
  [RXE: 146 x "Error 0x2a654006 (https://ipxe.org/2a654006)"]
  [RXE: 9 x "Invalid argument (https://ipxe.org/1c056002)"]
net1: d8:5e:d3:1f:44:59 using i350 on 0000:61:00.1 (Ethernet) [closed]
  [Link:up, TX:0 TXE:1 RX:0 RXE:0]
  [TXE: 1 x "Network unreachable (https://ipxe.org/28086011)"]
net2: 10:70:fd:cb:c8:32 using ConnectX-5 on 0000:c1:00.0 (Ethernet) [closed]
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Unknown (https://ipxe.org/1a086101)]
net3: 10:70:fd:cb:c8:33 using ConnectX-5 on 0000:c1:00.1 (Ethernet) [closed]
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Unknown (https://ipxe.org/1a086101)]

This shows much, much lower error rates ... I'm 95% sure that this isn't a problem on the actual network it's connected to. I can double-check that once I've booted.

Consider me puzzled.

ctheune commented on September 25, 2024

Ah, I chained this again into the undionly.kpxe using the net0 interface and it's fast now as well. So something is weird with the difference between net0/net1. Both are onboard ports:

61:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
61:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

mcb30 commented on September 25, 2024

Ah, I chained this again into the undionly.kpxe using the net0 interface and it's fast now as well. So something is weird with the difference between net0/net1. Both are onboard ports:

61:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
61:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

Interesting! In the absence of any information to the contrary, I'm going to assume that this is most likely a configuration issue on the network side. If you are able to test that it really does depend on whether the NIC is using port 0 or port 1 (e.g. by physically swapping cables and observing that the slow/fast behaviour can be reproduced the other way round), then we can investigate further.

ctheune commented on September 25, 2024

Yes. I'm a bit tight on hands-on resources at the moment, so the first thing I can check is whether this also happens in a regular Linux environment. I'm happy to experiment with swapping the cables in a few days.

ctheune commented on September 25, 2024

So, within Linux on the same machine, downloading over the two interfaces shows no differences. I'll try with swapped cables in a couple of days.

NiKiZe commented on September 25, 2024

Are the 2 interfaces connected to identically configured ports? Are there any LACP or other link-aggregation functions enabled on the ports? STP configuration?

ctheune commented on September 25, 2024

Both are connected to identical switches, with no LACP or other functions enabled. The faster network has a bit less traffic on the router (both are 1 switch away from the same router), but both are 1G interfaces that aren't fully utilized either way.

danmcd commented on September 25, 2024

Commenting here just so I can follow along. We've seen this when updating the Triton Data Center version of iPXE to March 2024 (ending with upstream commit 926816c) from October 2023 (ending with upstream commit 8b14652). I can detail our own commits and what-not if need be, but we are seeing problems with the most recent merge, and bisection has not helped us much in digging into the problem as of yet.

danmcd commented on September 25, 2024

I've added a packet trace of the very slow http download of our "unix" binary here: https://kebe.com/~danmcd/webrevs/2440-variants/httpboot-stock-failed-only.snoop . This was captured by one of our community members.

danmcd commented on September 25, 2024

I've been able to do some testing locally. I'm seeing very slow http downloads with a recent merge with our upstream. Given the 2MB window size in the snoop I linked above, I considered undoing this commit:

2d180ce

and the resultant undionly.kpxe appears to be noticeably faster on downloading our 3.5MB boot archives.

I believe this is a problem EXCLUSIVELY with undionly.kpxe. I have other methods of iPXE booting in deployment on my test cloud: EFI netboot chaining to snponly.kpxe, and off-disk ipxe.lkrn. Both of these have NO issues with the larger 2MB max buffer size.

I get the feeling undionly is special for some reason. The community member who has set up the "woodchipper" to confirm/deny things won't be back until Monday. I'll report back here with the woodchipper's results.

Those who have this problem ( @ctheune ) who can recompile undionly.kpxe with the max window size shrunk back down to 256k, please try it, and see if it helps.

I do think the window size is exposing an undionly.kpxe problem, not causing it, given my positive experience with other iPXE binary artifacts.

NiKiZe commented on September 25, 2024

Makes sense that this is an issue with the underlying UNDI stack. Maybe more info needs to be collected on which NICs and ROMs/BIOSes this happens with, since it doesn't happen on every device?

danmcd commented on September 25, 2024

I'm not 100% sure about every-device or not, because when I first heard of this bug I had not noticed the severe slowdown in my otherwise successful boot_archive download. There may be others out there who are experiencing a problem without having outright failure occur, so they dismiss it.

I will make sure I gather information on my slowed-down-but-not-failed one, as will anyone else from Triton-land with failures or slowdown who can gather that as well.

danmcd commented on September 25, 2024

So for my Supermicro, Xeon E5 v3 (Haswell), booting off of the Intel X540:

BIOS version: Supermicro X10DRU-i+ BIOS Date:01/29/2020 Rev:3.2a
Other boot-screen version: Version 2.17.1249
IPMI/BMC version: BMC Firmware Revision 3.88
Network boot information: Network:IBA XE Slot 0100 v2346

An interesting data point for me is that I have "DUAL" boot support selected but also have "LEGACY to EFI support" DISABLED.

NOTE that this machine does boot, but MUCH MORE SLOWLY with a 2MiB TCP max buffer, vs. the 256KiB max buffer.

mcb30 commented on September 25, 2024

Thanks for the packet trace. I have a working theory as to what may be happening.

Using undionly is known to be slower than using a native driver, and so packet drops due to receiver overruns are much more likely than with a native driver. This is exercising portions of the TCP RX queue management that don't normally get much use.

A 2MB TCP window is necessary in order to get close to expected throughput on a modern network (as documented in the commit message for commit 2d180ce). However, this 2MB window is now larger than iPXE's internal heap, which is limited to 512kB. There are good reasons for keeping the heap size small: not least of these is that in some boot scenarios (such as iSCSI boot under BIOS) any memory used by iPXE is lost to the operating system.
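The arithmetic behind that window size is the bandwidth-delay product. A quick sketch of the mismatch being described (the gigabit link speed and 16 ms round-trip time below are illustrative assumptions, not figures from this thread):

```python
# Bandwidth-delay product: bytes that must be in flight (and hence
# bufferable in the receive window) to keep a link fully utilised.
# The link speed and RTT are illustrative assumptions.

def window_needed(bandwidth_bps: int, rtt_s: float) -> float:
    """Receive window (bytes) needed to saturate the link."""
    return bandwidth_bps / 8 * rtt_s

GBIT = 1_000_000_000
HEAP = 512 * 1024          # iPXE's heap size, per the paragraph above

# A 2MB window covers gigabit Ethernet up to a ~16ms RTT:
assert round(window_needed(GBIT, 0.016)) == 2_000_000

# ...but that window alone is nearly 4x the entire 512kB heap,
# before counting buffer-alignment overhead:
assert window_needed(GBIT, 0.016) > 3.8 * HEAP
```

This is why the window cannot simply be advertised in full: the heap that must back it is a quarter of its size.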

With a 2MB TCP window, a 512kB heap, and a high rate of packet loss, there will inevitably be scenarios in which iPXE is forced to discard packets from the TCP receive queue, i.e. to "renege" in the terminology used by RFC 2018. This is permitted by the RFC, but is expected to cause the overall behaviour to fall back to relying upon a retransmission timer on the sender side, which will degrade performance back to roughly what it would have been without SACK. This is a significant degradation: as noted in commit e0fc8fe78, the improvement from adding SACK was a throughput increase in the region of 400%-700%.

It would be interesting to try undionly.kpxe in the known-bad setup with a single modification: change HEAP_SIZE (in core/malloc.c) from 512kB to 8MB, which would definitely be sufficient to hold a 2MB receive window (even after allowing for inefficiencies due to buffer alignment).

I am also noticing some oddities in the SACK values shown in the packet capture. For example, for two consecutive ACKs sent by iPXE:

  • #2223: ACK=5573629 SACK=5583765-5664317
  • #2227: ACK=5573629 SACK=5786325-5789221 5776189-5781981 5766589-5771845
  • .....
  • #2557: ACK=5573629 SACK=6301493-6304389

i.e. we seem to have discarded ("reneged upon") some packets from earlier on in the TCP receive queue, rather than dropping the later packets. This is not how iPXE is supposed to behave: under memory pressure, the TCP cache discarder (in tcp_discard()) should be discarding packets higher up in the sequence space first.
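The intended policy can be illustrated with a toy model (this is a sketch of the behaviour described above, not iPXE's actual tcp_discard() code): under memory pressure, segments should be freed from the top of the sequence space, leaving earlier data, which may already have been SACKed, intact.

```python
# Toy model of discarding from an out-of-order TCP receive queue under
# memory pressure. Illustrative only; not iPXE's tcp_discard() code.

def discard(queue, need):
    """Drop segments, highest sequence numbers first, until `need` bytes are freed.

    queue: list of (seq, length) segments queued beyond the ACK point.
    """
    queue = sorted(queue)
    freed = 0
    while queue and freed < need:
        _, length = queue.pop()   # discard the highest-sequence segment
        freed += length
    return queue

# Three queued segments; freeing 600 bytes drops the two highest,
# never the earliest one (dropping that would renege on an earlier SACK):
q = [(1000, 500), (2000, 500), (3000, 500)]
assert discard(q, 600) == [(1000, 500)]
```

The capture above shows the opposite: the SACK blocks move upward while earlier blocks vanish, suggesting the low end of the queue was discarded.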

I will check the behaviour of the TCP cache discarder. In the meantime, @danmcd @goekesmi could you please carry out the test with the 8MB HEAP_SIZE and report back?

danmcd commented on September 25, 2024

Rebuilding the Triton iPXE from 20240502/master but with this:

diff --git a/src/core/malloc.c b/src/core/malloc.c
index 8499ab45..03bb683b 100644
--- a/src/core/malloc.c
+++ b/src/core/malloc.c
@@ -103,8 +103,9 @@ size_t maxusedmem;
  * Heap size
  *
  * Currently fixed at 512kB.
+ * ... XXX KEBE SAYS TRY 8MiB instead
  */
-#define HEAP_SIZE ( 512 * 1024 )
+#define HEAP_SIZE ( 8 * 1024 * 1024 )
 
 /** The heap itself */
 static char heap[HEAP_SIZE] __attribute__ (( aligned ( __alignof__(void *) )));

@goekesmi ==> Same kebe.com location, but file is 8m-heap-undionly.kpxe. (MD5 == 79223488ec603400a2c638bd47b5f2dd)

danmcd commented on September 25, 2024

For me and my supermicro it was faster than stock, but after an initial smooth burst it trickled to a much slower transfer rate.

I should probably recapture TCP snoops on all three scenarios: Stock 20240502, reversion of max buffer, and 8MiB heap. Can't do that this moment, but hope to in the next 24-48 hours (sooner if I'm lucky).

danmcd commented on September 25, 2024

Okay, I took snoops of three variants:

1.) "stock" == The current iPXE downstream in TritonDataCenter. Last merged with upstream commit:

    commit 926816c58fca5641b17c17379b52203458081668
        [efi] Pad transmit buffer length to work around vendor driver bugs

2.) "8m" == Stock, but with the heap sized raised to 8MiB.

3.) "256k" == Stock, but with the TCP max buffer size reverted to 256KiB.

The winner in my environment is still "256k" by a long shot. Here are the highlights:

[root@moe (kebecloud) /zones/root/ipxe-captures]# snoop -t a -r -i stock.snoop dst port 80 | egrep "Syn |Fin "
  1 21:12:45.02002 192.168.4.85 -> 192.168.4.6  TCP D=80 S=23157 Syn Seq=739949321 Len=0 Win=65532 Options=<nop,nop,tstamp 77832160 0,nop,nop,sackOK,nop,wscale 9,mss 1460>
15536 21:13:17.49287 192.168.4.85 -> 192.168.4.6  TCP D=80 S=23157 Fin Ack=1631200839 Seq=739949990 Len=0 Win=4096 Options=<nop,nop,tstamp 77864864 490203224>
[root@moe (kebecloud) /zones/root/ipxe-captures]# snoop -t a -r -i 8m.snoop dst port 80 | egrep "Syn |Fin "
  1 21:30:33.26264 192.168.4.85 -> 192.168.4.6  TCP D=80 S=45701 Syn Seq=1895842926 Len=0 Win=65532 Options=<nop,nop,tstamp 78920632 0,nop,nop,sackOK,nop,wscale 9,mss 1460>
19247 21:30:56.21511 192.168.4.85 -> 192.168.4.6  TCP D=80 S=45701 Fin Ack=150344437 Seq=1895843595 Len=0 Win=4096 Options=<nop,nop,tstamp 78943760 491262073>
[root@moe (kebecloud) /zones/root/ipxe-captures]# snoop -t a -r -i 256k.snoop dst port 80 | egrep "Syn |Fin "
  1 21:37:19.65470 192.168.4.85 -> 192.168.4.6  TCP D=80 S=43986 Syn Seq=836941936 Len=0 Win=65532 Options=<nop,nop,tstamp 79335480 0,nop,nop,sackOK,nop,wscale 9,mss 1460>
3399 21:37:23.19284 192.168.4.85 -> 192.168.4.6  TCP D=80 S=43986 Fin Ack=2420103357 Seq=836942605 Len=0 Win=512 Options=<nop,nop,tstamp 79338672 491649103>
[root@moe (kebecloud) /zones/root/ipxe-captures]# ls
256k.snoop   8m.snoop     stock.snoop
[root@moe (kebecloud) /zones/root/ipxe-captures]# ls -lt ; digest -a md5 *
total 225187
-rw-r--r--   1 root     root     18829508 May 21 21:37 256k.snoop
-rw-r--r--   1 root     root     59642089 May 21 21:30 8m.snoop
-rw-r--r--   1 root     root     36502423 May 21 21:13 stock.snoop
(256k.snoop) = 56e7cdc3f1a2c5bcde98b5f46b366a93
(8m.snoop) = b484588e7a792cddc6f5cac18852e20b
(stock.snoop) = 3653b0c04e0cd3cc445ea3305a0ebdf1
[root@moe (kebecloud) /zones/root/ipxe-captures]# 

Note the times: 32 s for stock, 23 s for the 8 MiB heap, and 3.5 s for the 256 KiB max TCP buffer size.
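Those durations can be read straight off the Syn/Fin timestamps in the snoop output above; a quick check of the arithmetic (timestamps copied verbatim):

```python
from datetime import datetime

# Syn and Fin timestamps copied from the snoop output above.
FMT = "%H:%M:%S.%f"

def duration(syn, fin):
    """Seconds between two same-day HH:MM:SS.fffff timestamps."""
    return (datetime.strptime(fin, FMT) - datetime.strptime(syn, FMT)).total_seconds()

assert round(duration("21:12:45.02002", "21:13:17.49287")) == 32      # stock
assert round(duration("21:30:33.26264", "21:30:56.21511")) == 23      # 8m heap
assert round(duration("21:37:19.65470", "21:37:23.19284"), 1) == 3.5  # 256k
```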

These snoops are available in https://kebe.com/~danmcd/webrevs/2440-variants/ for download.

danmcd commented on September 25, 2024

Can/should we have HEAP_SIZE and TCP_MAX_WINDOW_SIZE be made configurable in src/config/local/general.h?

mcb30 commented on September 25, 2024

Can/should we have HEAP_SIZE and TCP_MAX_WINDOW_SIZE be made configurable in src/config/local/general.h?

No, that's definitely not a solution I'd accept. Those aren't meaningful user configuration choices, and making them configurable would just be papering over the problem and putting the burden onto all future users to guess what the "correct" values might happen to be for their use case.

goekesmi commented on September 25, 2024

To my great surprise, the test node using the 8m heap variant did boot. It was slow, with inconsistent transfer speeds, but it did complete the boot. Twice. That is all the testing I have done.

Packet capture of the boot at https://manta.matrix.msu.edu/goekesmi/public/iPXE-debug/2024-0522-0001/AgentSmith-3a.8m-heap-undionly.kpxe-79223488ec603400a2c638bd47b5f2dd-bootsequence.snoop

danmcd commented on September 25, 2024

This matches my experience (See my timings above).

ctheune commented on September 25, 2024

I'm sorry that I wasn't able to follow up on this. I'm currently swamped with other tasks. I still have this open and might be able to get more insight in a few months.
