Comments (11)
By the way, now that I have your attention, I was wondering if you would be willing to have a look at #84? Would be nice to have it merged before its two-year anniversary.
On it!
from ipxe.
I am attaching the console output when reproducing with an iPXE binary built from current Git master with LACP debugging (command line: make bin-x86_64-efi/ipxe.efi DEBUG=eth_slow). The output contains terminal colour codes, so it is best viewed with less -R or bat. With debugging output enabled, the packet flood is reduced to a trickle; presumably the 115200-baud serial console is slowing it down.
from ipxe.
As one would expect, disabling LACP support in iPXE (sed -i /NET_PROTO_LACP/s/define/undef/ config/general.h) prevents the bug from being triggered.
Disabling LACP might be a sensible default for build targets meant for chainloading, by the way. The initial UEFI/PCBIOS PXE firmware generally does not support LACP anyway, so for it to work the switch must support LACP bypass or an equivalent feature, which in turn means that iPXE does not really need LACP support either.
from ipxe.
@toreanderson It looks as though your switch (or some other component below iPXE's network stack) is echoing the LACP packets back to iPXE: you can see from the trace lines such as
SLOW net4 TX LACP actor (ffff,e0:07:1b:7f:52:10,0001,ff,0001) [aFGSCDlx]
SLOW net4 TX LACP partner (ffff,44:38:39:ff:30:bf,000f,ff,0001) [AFGScdLx]
SLOW net4 TX LACP collector 0000 (0 us)
followed immediately by
SLOW net4 RX LACP actor (ffff,e0:07:1b:7f:52:10,0001,ff,0001) [aFGSCDlx]
SLOW net4 RX LACP partner (ffff,44:38:39:ff:30:bf,000f,ff,0001) [AFGScdLx]
SLOW net4 RX LACP collector 0000 (0 us)
i.e. the packet transmitted by iPXE (with TX LACP actor (ffff,e0:07:1b:7f:52:10,0001,ff,0001) containing iPXE's MAC address) is then immediately received by iPXE.
We should be able to work around this loopback issue by detecting and ignoring looped-back packets. Could you please try the commit in #158?
If it works, then please let me know if I have your name and e-mail address correct in the commit message. Thanks!
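The loopback check described above can be illustrated with a minimal sketch. This is not the code from #158; the struct and function names (lacp_actor_id, lacp_is_loopback, MAC_LEN) are hypothetical and only the actor's system MAC is modelled. The idea is simply that a received LACP PDU whose actor system address equals our own MAC address must be our own transmission echoed back, so it can be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical, reduced view of the actor information in a LACP PDU.
 * Only the field needed for the loopback check is modelled here. */
struct lacp_actor_id {
	uint8_t system[MAC_LEN];	/* actor system ID (a MAC address) */
};

/* Return true if a received PDU names our own MAC address as the actor,
 * i.e. it is a packet we transmitted that has been echoed back to us. */
bool lacp_is_loopback(const struct lacp_actor_id *actor,
		      const uint8_t *own_mac)
{
	assert(actor != NULL);
	assert(own_mac != NULL);
	return memcmp(actor->system, own_mac, MAC_LEN) == 0;
}
```

With the addresses from the trace above, a PDU whose actor is e0:07:1b:7f:52:10 (iPXE's own MAC) would be classed as looped back, while one whose actor is 44:38:39:ff:30:bf (the switch) would be processed normally.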
from ipxe.
I can confirm that the commit in #158 does resolve the flooding problem.
I did notice, however, that the curious behaviour of iPXE's LACP responses being consistently 14 bytes larger than those originating from the switch remains as before. I am attaching a packet capture (ipxe-fixed.pcap.gz) of the behaviour with #158 applied so you can see for yourself.
I do not believe it is the switch that is responsible for the looped LACP PDUs, as we make extensive use of LACP between servers and these switches have not seen this problem anywhere else outside of iPXE. I was also able to reproduce it in the lab while using different switch hardware than when we saw it happen in production. While reproducing I paid close attention to the hardware counters of the switch port; the Tx counters were increasing wildly, but the Rx counters were ticking along at a few pps (consistent with normal LACP and STP background chatter).
I suspect, therefore, that the looping occurs in the NIC's firmware or in its iPXE driver.
One interesting thing worth noting is that when the looping has gone on for a while, there is a huge number of TX errors, and the TX and TXE counters are essentially in sync. After I let the flooding go on for a little while, this is how it looks:
iPXE> ifstat net4
net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (open)
[Link:up, TX:46857 TXE:47200 RX:47343 RXE:9]
[TXE: 47200 x "Error 0x2a086089 (http://ipxe.org/2a086089)"]
[RXE: 3 x "The socket is not connected (http://ipxe.org/380f6093)"]
[RXE: 6 x "Operation not supported (http://ipxe.org/3c086083)"]
The same thing with the fix in #158 applied:
iPXE> dhcp net4
Configuring (net4 e0:07:1b:7f:52:10)...... ok
iPXE> ifstat net4
net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (open)
[Link:up, TX:7 TXE:0 RX:14 RXE:5]
[RXE: 3 x "The socket is not connected (http://ipxe.org/380f6093)"]
[RXE: 2 x "Error 0x202a608a (http://ipxe.org/202a608a)"]
iPXE> ifstat net4
net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (open)
[Link:up, TX:19 TXE:0 RX:43 RXE:16]
[RXE: 3 x "The socket is not connected (http://ipxe.org/380f6093)"]
[RXE: 13 x "Error 0x202a608a (http://ipxe.org/202a608a)"]
iPXE> ifclose net4
iPXE> ifstat net4
net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (closed)
[Link:up, TX:30 TXE:0 RX:72 RXE:28]
[RXE: 3 x "The socket is not connected (http://ipxe.org/380f6093)"]
[RXE: 24 x "Error 0x202a608a (http://ipxe.org/202a608a)"]
[RXE: 1 x "Operation not supported (http://ipxe.org/3c086083)"]
While I did this latter round, I also did a tcpdump on the switch side of all incoming frames:
cumulus@lab2:mgmt:~$ tcpdump -r swp24.pcap | grep -c LACP
reading from file swp24.pcap, link-type EN10MB (Ethernet)
24
I find it highly suspect that the number of RX errors with code 0x202a608a (which, according to the web site, means «No buffer space available») matches exactly the number of LACP frames received by the switch. So while the #158 patch does cure the symptom, I think the underlying issue remains.
If I were to speculate, I'd say it looks something like this: iPXE deposits an outgoing LACP PDU in a buffer, tells the NIC to transmit it, which the NIC does but for some reason returns this 0x202a608a error, which in turn makes it so that the transmitted LACP PDU gets left behind in that buffer. Then iPXE goes to check the buffer for incoming frames, finds it, and thinks it came from the other side, and the flooding begins.
By the way, now that I have your attention, I was wondering if you would be willing to have a look at #84? Would be nice to have it merged before its two-year anniversary. 🙂
Edit: the Reported-by in your commit is correct.
from ipxe.
I just realised that the 24 errors on the fixed version were RX errors, not TX errors, and that they had a different error code than the one seen during the flooding. That means my speculation doesn't necessarily make any sense, so feel free to ignore that. 🙂
from ipxe.
I can confirm that the commit in #158 does resolve the flooding problem.
Thank you!
I did notice, however, that the curious behaviour of iPXE's LACP responses being consistently 14 bytes larger than those originating from the switch remains as before. I am attaching a packet capture (ipxe-fixed.pcap.gz) of the behaviour with #158 applied so you can see for yourself.
Will take a look.
I do not believe it is the switch that is responsible for the looped LACP PDUs, as we make extensive use of LACP between servers and these switches have not seen this problem anywhere else outside of iPXE. I was also able to reproduce it in the lab while using different switch hardware than when we saw it happen in production. While reproducing I paid close attention to the hardware counters of the switch port; the Tx counters were increasing wildly, but the Rx counters were ticking along at a few pps (consistent with normal LACP and STP background chatter).
I suspect, therefore, that the looping occurs in the NIC's firmware or in its iPXE driver.
Thanks. I'll update the commit message to note this.
I just realised that the 24 errors on the fixed version were RX errors, not TX errors, and that they had a different error code than the one seen during the flooding. That means my speculation doesn't necessarily make any sense, so feel free to ignore that.
The 0x202a608a error is the ELOOP reported by the code in #158 that detects and drops the looped-back LACP packets, so it is expected in the circumstances.
from ipxe.
I did notice, however, that the curious behaviour of iPXE's LACP responses being consistently 14 bytes larger than those originating from the switch remains as before. I am attaching a packet capture (ipxe-fixed.pcap.gz) of the behaviour with #158 applied so you can see for yourself.
Will take a look.
I don't see this same erroneous behaviour in my test setup (qemu VM running iPXE, using a host tap0 interface made a member of a bond0 port configured using mode=4 miimon=100 lacp_rate=1). The transmitted packets are the correct length.
iPXE reuses the received I/O buffer to construct the response, without adjusting the length. Given the odd and apparently LACP-specific behaviour arising from what seems to be the HPE/Mellanox UEFI NII driver, my best guess is that the UEFI NII driver is adding 14 bytes (suspiciously equal in length to an Ethernet header) on either the transmit or receive datapath.
I've added a patch to #158 that will reset the length of the I/O buffer before reusing it. If the UEFI NII driver is adding the padding on the receive datapath then this should cure the problem. If the UEFI NII driver is adding the padding on the transmit datapath then there's nothing iPXE can do about it.
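To make the length-reset idea concrete, here is a minimal sketch. It does not use iPXE's real I/O buffer API and is not the patch in #158; toy_iobuf and toy_iob_reset_len are hypothetical names, and the expected PDU length is passed in by the caller. It shows why resetting the length before reuse helps only if the extra 14 bytes were added on the receive path: the reused buffer is forced back to the expected size regardless of what length the driver reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 6-byte destination + 6-byte source + 2-byte EtherType */
#define ETH_HEADER_LEN 14

/* Hypothetical stand-in for an I/O buffer: fixed storage plus the
 * length the driver claims to have received. */
struct toy_iobuf {
	uint8_t storage[256];
	size_t len;
};

/* Reset the buffer length to exactly pdu_len before the buffer is
 * reused for the response.  Returns the number of excess bytes that
 * were discarded (0 if the reported length was not too long). */
size_t toy_iob_reset_len(struct toy_iobuf *iob, size_t pdu_len)
{
	assert(iob != NULL);
	assert(pdu_len <= sizeof(iob->storage));
	size_t excess = (iob->len > pdu_len) ? (iob->len - pdu_len) : 0;
	iob->len = pdu_len;
	return excess;
}
```

If a driver reported a received length 14 bytes longer than the expected PDU, this function would report an excess of ETH_HEADER_LEN and the response built in the same buffer would go out at the correct size. Padding added on the transmit path would be unaffected, as noted above.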
from ipxe.
Another piece of information in case you're interested: I just tried booting the server in legacy BIOS mode, chainloading into undionly.kpxe built from your LACP branch (version 1.20.1+ (g1bec)) with DEBUG=eth_slow. It does not output any SLOW netN RX loopback detected message.
If I change the boot mode back to UEFI, which chainloads into ipxe.efi instead (same version as undionly.kpxe, built at the same time), the RX loopback detected message is back. No change to switches, cables, or anything like that was made.
Also, regardless of the boot mode, there is no evidence of any loopback occurring once the server has completed its boot into Linux (using the bonding.ko LACP implementation).
So it seems clear to me that this LACP loopback issue is something the NIC firmware/driver or iPXE (impossible for me to tell which) conjures up all on its own; it is not caused by an improperly configured network topology.
I also see the LACP PDU size mismatch with undionly.kpxe, for what it is worth. It doesn't appear to cause any negative effects, though.
from ipxe.
To try to avoid any possible misunderstanding:
The card provides an EFI driver for the NIC, which is exposed as an NII interface.
iPXE uses that NII interface, as shown by: net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (open)
When running in PCBIOS mode, the UNDI driver probably comes from the same ROM on the NIC but is a completely different driver.
So with the information so far, this is most likely a bug in the EFI driver on the NIC rather than in iPXE's NII implementation.
You might want to tell HP/Mellanox about this.
from ipxe.
So it seems clear to me that this LACP loopback issue is something the NIC firmware/driver or iPXE (impossible for me to tell which) conjures up all on its own; it is not caused by an improperly configured network topology.
Agreed; that's why I changed the comments and commit log wording to indicate that the faulty component is the UEFI NII driver.
from ipxe.