Comments (11)

mcb30 commented on June 5, 2024

> By the way, now that I have your attention, I was wondering if you would be willing to have a look at #84? Would be nice to have it merged before its two-year anniversary.

On it!

from ipxe.

toreanderson commented on June 5, 2024

I am attaching the console output when reproducing with an iPXE binary built from current Git master with LACP debugging enabled (command line: make bin-x86_64-efi/ipxe.efi DEBUG=eth_slow). The output contains terminal colour codes, so it is best viewed with less -R or bat. With debugging output enabled, the packet flood is reduced to a trickle; presumably the 115200-baud serial console is slowing it down.

ipxe-lacp-debug.txt

toreanderson commented on June 5, 2024

As one would expect, disabling LACP support in iPXE (sed -i /NET_PROTO_LACP/s/define/undef/ config/general.h) prevents the bug from being triggered.

Disabling LACP might be a sensible default for build targets meant for chainloading, by the way. The initial UEFI/PCBIOS PXE firmware generally does not support LACP anyway, so for it to work at all the switch must support LACP bypass or an equivalent, which in turn means that iPXE does not really need LACP support either.

mcb30 commented on June 5, 2024

@toreanderson It looks as though your switch (or some other component below iPXE's network stack) is echoing the LACP packets back to iPXE: you can see from the trace lines such as

SLOW net4 TX LACP actor (ffff,e0:07:1b:7f:52:10,0001,ff,0001) [aFGSCDlx]
SLOW net4 TX LACP partner (ffff,44:38:39:ff:30:bf,000f,ff,0001) [AFGScdLx]
SLOW net4 TX LACP collector 0000 (0 us)

followed immediately by

SLOW net4 RX LACP actor (ffff,e0:07:1b:7f:52:10,0001,ff,0001) [aFGSCDlx]
SLOW net4 RX LACP partner (ffff,44:38:39:ff:30:bf,000f,ff,0001) [AFGScdLx]
SLOW net4 RX LACP collector 0000 (0 us)

i.e. the packet transmitted by iPXE (with TX LACP actor (ffff,e0:07:1b:7f:52:10,0001,ff,0001) containing iPXE's MAC address) is then immediately received by iPXE.

We should be able to work around this loopback issue by detecting and ignoring looped-back packets. Could you please try the commit in #158?
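For anyone wanting to check their own debug output for the same pattern, here is a minimal sketch in Python. It is not the code in #158; it only scans eth_slow trace lines in the format shown above and reports any received LACP actor record that carries the interface's own MAC address. The function name `looped_back_actors` and the regular expression are illustrative assumptions.

```python
import re

# Matches eth_slow debug lines in the format shown in this thread, e.g.
#   SLOW net4 RX LACP actor (ffff,e0:07:1b:7f:52:10,0001,ff,0001) [aFGSCDlx]
ACTOR_RE = re.compile(
    r"SLOW (?P<dev>\S+) (?P<dir>TX|RX) LACP actor "
    r"\([0-9a-f]+,(?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5}),"
)


def looped_back_actors(trace_lines, local_mac):
    """Return the RX actor lines whose system MAC equals our own MAC address."""
    hits = []
    for line in trace_lines:
        m = ACTOR_RE.search(line)
        if m and m.group("dir") == "RX" and m.group("mac") == local_mac.lower():
            hits.append(line.strip())
    return hits


trace = [
    "SLOW net4 TX LACP actor (ffff,e0:07:1b:7f:52:10,0001,ff,0001) [aFGSCDlx]",
    "SLOW net4 TX LACP partner (ffff,44:38:39:ff:30:bf,000f,ff,0001) [AFGScdLx]",
    "SLOW net4 RX LACP actor (ffff,e0:07:1b:7f:52:10,0001,ff,0001) [aFGSCDlx]",
    "SLOW net4 RX LACP partner (ffff,44:38:39:ff:30:bf,000f,ff,0001) [AFGScdLx]",
]
looped = looped_back_actors(trace, "e0:07:1b:7f:52:10")
print(len(looped))  # one received actor record carries our own MAC
```

A genuine LACP partner would put its own system MAC in the actor field, so any hit here suggests the frame came back from somewhere below the network stack.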

If it works, then please let me know if I have your name and e-mail address correct in the commit message. Thanks!

toreanderson commented on June 5, 2024

I can confirm that the commit in #158 does resolve the flooding problem.

I did notice, however, that the curious behaviour of iPXE's LACP responses being consistently 14 bytes larger than those originating from the switch remains as before. I am attaching a packet capture (ipxe-fixed.pcap.gz) of the behaviour with #158 applied so you can see for yourself.

I do not believe it is the switch that is responsible for the looped LACP PDUs, as we make extensive use of LACP between servers and these switches have not seen this problem anywhere else outside of iPXE. I was also able to reproduce it in the lab while using different switch hardware than when we saw it happen in production. While reproducing I paid close attention to the hardware counters of the switch port; the Tx counters were increasing wildly, but the Rx counters were ticking along at a few pps (consistent with normal LACP and STP background chatter).

I suspect, therefore, that the looping occurs in the NIC's firmware or in its iPXE driver.

One interesting thing worth noting is that when the looping has gone on for a while, there is a huge amount of TX Errors - and that the TX and TXE counters are essentially in sync. After I let the flooding go on for a little while, this is how it looks:

iPXE> ifstat net4
net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (open)
  [Link:up, TX:46857 TXE:47200 RX:47343 RXE:9]
  [TXE: 47200 x "Error 0x2a086089 (http://ipxe.org/2a086089)"]
  [RXE: 3 x "The socket is not connected (http://ipxe.org/380f6093)"]
  [RXE: 6 x "Operation not supported (http://ipxe.org/3c086083)"]

The same thing with the fix in #158 applied:

iPXE> dhcp net4                          
Configuring (net4 e0:07:1b:7f:52:10)...... ok
iPXE> ifstat net4
net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (open)
  [Link:up, TX:7 TXE:0 RX:14 RXE:5]
  [RXE: 3 x "The socket is not connected (http://ipxe.org/380f6093)"]
  [RXE: 2 x "Error 0x202a608a (http://ipxe.org/202a608a)"]
iPXE> ifstat net4
net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (open)
  [Link:up, TX:19 TXE:0 RX:43 RXE:16]
  [RXE: 3 x "The socket is not connected (http://ipxe.org/380f6093)"]
  [RXE: 13 x "Error 0x202a608a (http://ipxe.org/202a608a)"]
iPXE> ifclose net4
iPXE> ifstat net4
net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (closed)
  [Link:up, TX:30 TXE:0 RX:72 RXE:28]
  [RXE: 3 x "The socket is not connected (http://ipxe.org/380f6093)"]
  [RXE: 24 x "Error 0x202a608a (http://ipxe.org/202a608a)"]
  [RXE: 1 x "Operation not supported (http://ipxe.org/3c086083)"]

While I did this latter round, I also did a tcpdump on the switch side of all incoming frames:

cumulus@lab2:mgmt:~$ tcpdump -r swp24.pcap | grep -c LACP
reading from file swp24.pcap, link-type EN10MB (Ethernet)
24

I find it highly suspect that the number of RX errors with code 0x202a608a (which, according to the web site, means «No buffer space available») exactly matches the number of LACP frames received by the switch. So while the #158 patch does cure the symptom, I think the underlying issue remains.
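The comparison can be repeated mechanically. Below is a rough Python sketch that tallies the per-error counters from pasted ifstat output and compares one of them against the tcpdump count; the helper name `error_tally` and the regular expression are assumptions made for illustration, and the sample text is the final ifstat output above.

```python
import re

# Matches per-error lines such as:
#   [RXE: 24 x "Error 0x202a608a (http://ipxe.org/202a608a)"]
# The summary line ("[Link:up, TX:30 TXE:0 ...]") deliberately does not match.
ERR_RE = re.compile(r'\[(?P<kind>TXE|RXE): (?P<count>\d+) x "(?P<msg>[^"]*)"\]')


def error_tally(ifstat_output):
    """Map (kind, message) to the total count reported in the ifstat text."""
    tally = {}
    for m in ERR_RE.finditer(ifstat_output):
        key = (m.group("kind"), m.group("msg"))
        tally[key] = tally.get(key, 0) + int(m.group("count"))
    return tally


sample = '''net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (closed)
  [Link:up, TX:30 TXE:0 RX:72 RXE:28]
  [RXE: 3 x "The socket is not connected (http://ipxe.org/380f6093)"]
  [RXE: 24 x "Error 0x202a608a (http://ipxe.org/202a608a)"]
  [RXE: 1 x "Operation not supported (http://ipxe.org/3c086083)"]
'''

tally = error_tally(sample)
lacp_frames_seen_by_switch = 24  # result of the tcpdump | grep -c LACP run
count_202a608a = tally[("RXE", "Error 0x202a608a (http://ipxe.org/202a608a)")]
print(count_202a608a == lacp_frames_seen_by_switch)
```

The per-error counts also sum to the RXE total on the summary line (3 + 24 + 1 = 28), which is a quick sanity check that nothing was missed by the pattern.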

If I were to speculate, I'd say it looks something like this: iPXE deposits an outgoing LACP PDU in a buffer, tells the NIC to transmit it, which the NIC does but for some reason returns this 0x202a608a error, which in turn makes it so that the transmitted LACP PDU gets left behind in that buffer. Then iPXE goes to check the buffer for incoming frames, finds it, and thinks it came from the other side, and the flooding begins.

By the way, now that I have your attention, I was wondering if you would be willing to have a look at #84? Would be nice to have it merged before its two-year anniversary. 🙂

Edit: the Reported-by in your commit is correct.

toreanderson commented on June 5, 2024

I just realised that the 24 errors on the fixed version were RX errors, not TX errors, and that they had a different error code than the one seen during the flooding. That means my speculation doesn't necessarily make any sense, so feel free to ignore that. 🙂

mcb30 commented on June 5, 2024

> I can confirm that the commit in #158 does resolve the flooding problem.

Thank you!

> I did notice, however, that the curious behaviour of iPXE's LACP responses being consistently 14 bytes larger than those originating from the switch remains as before. I am attaching a packet capture (ipxe-fixed.pcap.gz) of the behaviour with #158 applied so you can see for yourself.

Will take a look.

> I do not believe it is the switch that is responsible for the looped LACP PDUs, as we make extensive use of LACP between servers and these switches have not seen this problem anywhere else outside of iPXE. I was also able to reproduce it in the lab while using different switch hardware than when we saw it happen in production. While reproducing I paid close attention to the hardware counters of the switch port; the Tx counters were increasing wildly, but the Rx counters were ticking along at a few pps (consistent with normal LACP and STP background chatter).
>
> I suspect, therefore, that the looping occurs in the NIC's firmware or in its iPXE driver.

Thanks. I'll update the commit message to note this.

> I just realised that the 24 errors on the fixed version were RX errors, not TX errors, and that they had a different error code than the one seen during the flooding. That means my speculation doesn't necessarily make any sense, so feel free to ignore that.

The 0x202a608a error is the ELOOP reported by the code in #158 that detects and drops the looped-back LACP packets, so it is expected in the circumstances.

mcb30 commented on June 5, 2024

> I did notice, however, that the curious behaviour of iPXE's LACP responses being consistently 14 bytes larger than those originating from the switch remains as before. I am attaching a packet capture (ipxe-fixed.pcap.gz) of the behaviour with #158 applied so you can see for yourself.
>
> Will take a look.

I don't see this same erroneous behaviour in my test setup (qemu VM running iPXE, using a host tap0 interface made a member of a bond0 port configured using mode=4 miimon=100 lacp_rate=1). The transmitted packets are the correct length.

iPXE reuses the received I/O buffer to construct the response, without adjusting the length. Given the odd and apparently LACP-specific behaviour arising from what seems to be the HPE/Mellanox UEFI NII driver, my best guess is that the UEFI NII driver is adding 14 bytes (suspiciously equal in length to an Ethernet header) on either the transmit or receive datapath.

I've added a patch to #158 that will reset the length of the I/O buffer before reusing it. If the UEFI NII driver is adding the padding on the receive datapath then this should cure the problem. If the UEFI NII driver is adding the padding on the transmit datapath then there's nothing iPXE can do about it.
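To illustrate the receive-path case only, here is a toy Python model of the length arithmetic. It is not iPXE code: `IOBuffer`, `set_length` and `build_response` are hypothetical stand-ins, and the premise that the driver appends exactly 14 stray bytes on receive is an assumption made for the sake of the sketch.

```python
ETH_HLEN = 14      # Ethernet header: destination MAC (6) + source MAC (6) + EtherType (2)
LACPDU_LEN = 110   # LACPDU length after the Ethernet header (IEEE 802.3ad), excluding FCS


class IOBuffer:
    """Toy stand-in for a network I/O buffer (hypothetical; not iPXE's struct io_buffer)."""

    def __init__(self, data):
        self.data = bytearray(data)

    def __len__(self):
        return len(self.data)

    def set_length(self, length):
        """Discard any bytes beyond `length`; a shorter buffer is left unchanged."""
        del self.data[length:]


def build_response(rx_buf, reset_length):
    """Reuse the received buffer for the reply, optionally resetting its length first."""
    if reset_length:
        rx_buf.set_length(LACPDU_LEN)
    # A real implementation would now overwrite the actor/partner fields in place.
    return rx_buf


# Assumption: the driver appends 14 stray bytes to each received LACPDU.
without_fix = build_response(IOBuffer(bytes(LACPDU_LEN + ETH_HLEN)), reset_length=False)
with_fix = build_response(IOBuffer(bytes(LACPDU_LEN + ETH_HLEN)), reset_length=True)
print(len(without_fix) - LACPDU_LEN, len(with_fix) - LACPDU_LEN)  # excess bytes in each reply
```

If the padding were instead added on the transmit path, the reply would grow after leaving this function, and no amount of trimming here would help.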

toreanderson commented on June 5, 2024

Another piece of information in case you're interested: I just tried booting the server in legacy BIOS mode, chainloading into undionly.kpxe built from your LACP branch (version 1.20.1+ (g1bec)) with DEBUG=eth_slow. It does not output any SLOW netN RX loopback detected message.

If I change boot mode back to UEFI, which chainloads into ipxe.efi instead (same version as undionly.kpxe, built at the same time), the RX loopback detected message is back. No change to switches, cables, or anything like that was made.

Also, regardless of the boot mode, there is no evidence of any loopback occurring once the server has completed its boot into Linux (using the bonding.ko LACP implementation).

So it seems clear to me that this LACP loopback issue is something the NIC firmware/driver or iPXE (impossible for me to tell which) conjures up all on its own; it is not caused by an improperly configured network topology.

I also see the LACP PDU size mismatch with undionly.kpxe, for what it is worth. It doesn't appear to cause any negative effects, though.

NiKiZe commented on June 5, 2024

To try to avoid any possible misunderstanding:
The card provides an EFI driver for the NIC, which is exposed as an NII interface.
iPXE uses that NII interface: net4: e0:07:1b:7f:52:10 using NII on NII-0000:04:00.0 (open)

When running in PCBIOS mode, the UNDI driver probably comes from the same ROM on the NIC but is a completely different driver.
So with the information so far, this is most likely a bug in the EFI driver on the NIC rather than in iPXE's NII implementation.

You might want to tell HP/Mellanox about this.

mcb30 commented on June 5, 2024

> So it seems clear to me that this LACP loopback issue is something the NIC firmware/driver or iPXE (impossible for me to tell which) conjures up all on its own, it is not caused by an improperly configured network topology.

Agreed; that's why I changed the comments and commit log wording to indicate that the faulty component is the UEFI NII driver.
