Comments (34)
To add a data point to the reports: We had the same issue with Mellanox Technologies MT27710 Family [ConnectX-4 Lx] and "fixed" it by adding an #undef NET_PROTO_EAPOL to our build config.
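For anyone wanting to try the same workaround, here is a sketch of what it could look like. It assumes "build config" means iPXE's local override header config/local/general.h; the comment above does not say which file was actually used.

```c
/* config/local/general.h (assumed location for local build overrides) */

/* Disable EAP-over-LAN support so that it is not built into the image */
#undef NET_PROTO_EAPOL
```

The chosen target needs to be rebuilt afterwards for the change to take effect.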
from ipxe.
This sounds like a possible duplicate of #1048 which should be fixed in current master. Can you verify which commit you have checked out?
from ipxe.
Could we get confirmation of whether this is fixed by the merge of #1174? Thanks.
from ipxe.
With ipxe.efi this does work for my test host, though we can't use that bootloader because of issues that occur when many different NICs are installed in a system.
Thank you for testing.
Your result indicates that the issue is fixed in iPXE, so I will close this issue now. If you want to continue using snponly.efi, you will need to contact your UEFI BIOS vendor to get a BIOS update that includes the equivalent fix in the BIOS-provided SNP driver.
You can also open a separate issue to cover whatever problem you are seeing that prevents you from using ipxe.efi when many different NICs are installed.
from ipxe.
The tested revert was specifically fbc3b4a
master...nshalman:ipxe:fbc3b4a104698658202c2a83217ca8722453bf49
from ipxe.
I may not have tested on the latest master. Thank you for the pointer.
from ipxe.
I may not have tested on the latest master. Thank you for the pointer.
Based on your git bisect log, your most recent commit tested was 115707c, which is older than the known fix for this issue.
from ipxe.
My test fails on the latest commit of master (98dd25a)
http://147.28.150.231:8000/ipxe.efi... ok
iPXE initialising devices...ok
iPXE 1.0.0+ -- Open Source Network Boot Firmware -- https://ipxe.org
Features: DNS HTTP HTTPS NFS TFTP VLAN EFI Menu
Welcome to iPXE Stress Test Embedded Script!
Configuring (net0 98:03:9b:89:d9:36)..................... ok
https://artifacts.platformequinix.com/images/ubuntu/22_04/fe3f18eead9ab1bf6a333294198cdb6cdf918290/image.tar.gz.................. Connection timed out (https://ipxe.org/4c116092)
flexboot_nodnic_ports_register_dev: port register_dev failed (Status = -336093320)
flexboot_nodnic_probe: flexboot_nodnic_ports_register_dev failed (Status = -336093320)
flexboot_nodnic_ports_register_dev: port register_dev failed (Status = -336093320)
flexboot_nodnic_probe: flexboot_nodnic_ports_register_dev failed (Status = -336093320)
flexboot_nodnic_ports_register_dev: port register_dev failed (Status = -336093320)
flexboot_nodnic_probe: flexboot_nodnic_ports_register_dev failed (Status = -336093320)
flexboot_nodnic_ports_register_dev: port register_dev failed (Status = -336093320)
flexboot_nodnic_probe: flexboot_nodnic_ports_register_dev failed (Status = -336093320)
flexboot_nodnic_ports_register_dev: port register_dev failed (Status = -336093320)
flexboot_nodnic_probe: flexboot_nodnic_ports_register_dev failed (Status = -336093320)
flexboot_nodnic_ports_register_dev: port register_dev failed (Status = -336093320)
flexboot_nodnic_probe: flexboot_nodnic_ports_register_dev failed (Status = -336093320)
from ipxe.
Additional testing confirms that Mellanox CX4 cards have trouble once booted into the latest commit of iPXE (98dd25a), but the problems go away if I apply my revert commit (fbc3b4a).
What additional debugging information would be of use for tracking down the issue?
from ipxe.
What is the card connected to, and what do you see on the wire?
from ipxe.
Can confirm that 8b14652 also breaks it for Mellanox ConnectX-6 Lx cards. This happens up to the latest commit.
The NICs are connected to the switch via 100GBASE-CR4 QSFP28 cables in a LAG.
tcpdump done on the switch:
Switch A:
bash-4.2# tcpdump -i vlan1101 ether host b8:3f:d2:99:f0:34 -vvv
tcpdump: listening on vlan1101, link-type EN10MB (Ethernet), capture size 262144 bytes
12:56:14.461352 b8:3f:d2:99:f0:34 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 431: (tos 0x0, ttl 64, id 4395, offset 0, flags [none], proto UDP (17), length 417)
0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b8:3f:d2:99:f0:34 (oui Unknown), length 389, xid 0xb0370948, secs 4, Flags [Broadcast] (0x8000)
Client-Ethernet-Address b8:3f:d2:99:f0:34 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Discover
MSZ Option 57, length 2: 1472
ARCH Option 93, length 2: 11
NDI Option 94, length 3: 1.3.10
Vendor-Class Option 60, length 32: "PXEClient:Arch:00011:UNDI:003010"
User-Class Option 77, length 4:
instance#1: ERROR: invalid option
Parameter-Request Option 55, length 24:
Subnet-Mask, Default-Gateway, Domain-Name-Server, LOG
Hostname, Domain-Name, RP, MTU
NTP, Vendor-Option, Vendor-Class, TFTP
BF, Option 119, Option 128, Option 129
Option 130, Option 131, Option 132, Option 133
Option 134, Option 135, Option 175, Option 203
T175 Option 175, length 36: 2969895189,3004178411,50402561,385941796,16847617,17891585,654377237,16852481,17957121
Client-ID Option 61, length 7: ether b8:3f:d2:99:f0:34
GUID Option 97, length 17: 0.80.53.57.56.54.57.83.71.72.51.50.56.70.50.90.83
END Option 255, length 0
12:56:14.461520 b8:3f:d2:99:f0:34 (oui Unknown) > 33:33:00:00:00:02 (oui Unknown), ethertype IPv6 (0x86dd), length 70: (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::ba3f:d2ff:fe99:f034 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16
source link-address option (1), length 8 (1): b8:3f:d2:99:f0:34
0x0000: b83f d299 f034
12:56:14.776110 00:1c:73:00:00:99 (oui Arista Networks) > b8:3f:d2:99:f0:34 (oui Unknown), ethertype IPv6 (0x86dd), length 118: (hlim 255, next-header ICMPv6 (58) payload length: 64) fe80::21c:73ff:fe00:99 > fe80::ba3f:d2ff:fe99:f034: [icmp6 sum ok] ICMP6, router advertisement, length 64
hop limit 64, Flags [managed], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 1000ms
source link-address option (1), length 8 (1): 00:1c:73:00:00:99
0x0000: 001c 7300 0099
mtu option (5), length 8 (1): 9100
0x0000: 0000 0000 238c
prefix info option (3), length 32 (4): 2a05:b540:2:22::/64, Flags [onlink], valid time 2592000s, pref. time 604800s
0x0000: 4080 0027 8d00 0009 3a80 0000 0000 2a05
0x0010: b540 0002 0022 0000 0000 0000 0000
12:56:14.942300 00:1c:73:00:00:99 (oui Arista Networks) > b8:3f:d2:99:f0:34 (oui Unknown), ethertype IPv6 (0x86dd), length 118: (hlim 255, next-header ICMPv6 (58) payload length: 64) fe80::21c:73ff:fe00:99 > fe80::ba3f:d2ff:fe99:f034: [icmp6 sum ok] ICMP6, router advertisement, length 64
hop limit 64, Flags [managed], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 1000ms
source link-address option (1), length 8 (1): 00:1c:73:00:00:99
0x0000: 001c 7300 0099
mtu option (5), length 8 (1): 9100
0x0000: 0000 0000 238c
prefix info option (3), length 32 (4): 2a05:b540:2:22::/64, Flags [onlink], valid time 2592000s, pref. time 604800s
0x0000: 4080 0027 8d00 0009 3a80 0000 0000 2a05
0x0010: b540 0002 0022 0000 0000 0000 0000
Switch B:
12:59:22.241906 b8:3f:d2:99:f0:34 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 431: (tos 0x0, ttl 64, id 6565, offset 0, flags [none], proto UDP (17), length 417)
0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b8:3f:d2:99:f0:34 (oui Unknown), length 389, xid 0x23431559, secs 4, Flags [Broadcast] (0x8000)
Client-Ethernet-Address b8:3f:d2:99:f0:34 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Discover
MSZ Option 57, length 2: 1472
ARCH Option 93, length 2: 11
NDI Option 94, length 3: 1.3.10
Vendor-Class Option 60, length 32: "PXEClient:Arch:00011:UNDI:003010"
User-Class Option 77, length 4:
instance#1: ERROR: invalid option
Parameter-Request Option 55, length 24:
Subnet-Mask, Default-Gateway, Domain-Name-Server, LOG
Hostname, Domain-Name, RP, MTU
NTP, Vendor-Option, Vendor-Class, TFTP
BF, Option 119, Option 128, Option 129
Option 130, Option 131, Option 132, Option 133
Option 134, Option 135, Option 175, Option 203
T175 Option 175, length 36: 2969895189,3004178411,50402561,385941796,16847617,17891585,654377237,16852481,17957121
Client-ID Option 61, length 7: ether b8:3f:d2:99:f0:34
GUID Option 97, length 17: 0.80.53.57.56.54.57.83.71.72.51.50.56.70.50.90.83
END Option 255, length 0
12:59:22.242081 b8:3f:d2:99:f0:34 (oui Unknown) > 33:33:00:00:00:02 (oui Unknown), ethertype IPv6 (0x86dd), length 70: (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::ba3f:d2ff:fe99:f034 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16
source link-address option (1), length 8 (1): b8:3f:d2:99:f0:34
0x0000: b83f d299 f034
12:59:22.252335 00:1c:73:00:00:99 (oui Arista Networks) > b8:3f:d2:99:f0:34 (oui Unknown), ethertype IPv6 (0x86dd), length 118: (hlim 255, next-header ICMPv6 (58) payload length: 64) fe80::21c:73ff:fe00:99 > fe80::ba3f:d2ff:fe99:f034: [icmp6 sum ok] ICMP6, router advertisement, length 64
hop limit 64, Flags [managed], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 1000ms
source link-address option (1), length 8 (1): 00:1c:73:00:00:99
0x0000: 001c 7300 0099
mtu option (5), length 8 (1): 9100
0x0000: 0000 0000 238c
prefix info option (3), length 32 (4): 2a05:b540:2:22::/64, Flags [onlink], valid time 2592000s, pref. time 604800s
0x0000: 4080 0027 8d00 0009 3a80 0000 0000 2a05
0x0010: b540 0002 0022 0000 0000 0000 0000
Hope that helps.
from ipxe.
I can also verify that reverting that commit fixes the failure to boot with Mellanox CX5 NICs.
In my tests, I'm booting using snponly.efi, FWIW
from ipxe.
I tested the latest version again, as I saw some more commits related to eapol went in earlier today, but this is still broken.
It appears that the code in eapol.c, where it says "Ignore non-EAPol devices", isn't ignoring these Mellanox cards: if I add another unconditional "return 0;" before the "Initialize structure" comment, then my hosts with Mellanox boot interfaces work.
from ipxe.
Hello,
I work for a relatively big hosting company, and we have also noticed that iPXE has been broken on Mellanox cards for a while.
For example, our new HP RL300 ARM servers have an onboard Mellanox card:
Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
This issue is not limited to this specific model; we also have 25GbE+ Mellanox cards that behave the same way.
We are still on commit cac3a584dc8acea1522669f1ed16e0979fb92252, which works for Mellanox cards.
However, anything after that commit breaks PXE boot.
from ipxe.
Ran into this issue with Mellanox CX5 and CX6; rebasing fbc3b4a onto main got them booting again.
from ipxe.
@Smithx10 can you try the workaround suggested by @Cornelicorn and report back whether it helped? Tweaking a define is a much less invasive workaround than backing the code out entirely. I haven't had a chance to test it myself.
To add a data point to the reports: We had the same issue with Mellanox Technologies MT27710 Family [ConnectX-4 Lx] and "fixed" it by adding an #undef NET_PROTO_EAPOL to our build config.
from ipxe.
From the iPXE IRC channel:
21:00 < Redacted> I'm trying to boot a Mellanox ConnectX5 card and
ran into Configuring (net2 a0:88:c2:6b:7f:44).................. No
configuration methods succeeded (https://ipxe.org/040ee119)
21:00 < Redacted> in both bios and uefi
21:00 < Redacted> Is there some gotcha with these Mellanox cards ?
21:06 < stappers> https://ipxe.org/040ee119
21:14 < Redacted> @stappers think I might be hitting
https://github.com/ipxe/ipxe/issues/1091 ?
21:24 < stappers> Keep thinking and act upon the better thoughts, at
least try to do.
21:55 < Redacted> Interesting, rolling back to
https://github.com/ipxe/ipxe/tree/8f1514a00450119b04b08642c55aa674bdf5a4ef
worked, Im applying this
https://github.com/ipxe/ipxe/commit/fbc3b4a104698658202c2a83217ca8722453bf49
and seeing what happens
21:58 * stappers is in UTC+1 and goes sleeping
22:36 < Redacted> Yea, just confirmed, mellanox worked after rebasing
that commit onto main
As a non-Mellanox hardware owner, I am with the Mellanox hardware owners: somebody else should provide a merge request.
from ipxe.
As a non-Mellanox hardware owner, I am with the Mellanox hardware owners: somebody else should provide a merge request.
I don't think my revert commit is a good solution. I believe @mcb30 is working on a better solution.
Of the short-term fixes I can currently think of, changing the default for NET_PROTO_EAPOL to be undefined is one option, assuming that that workaround works.
I am going to update my description of this bug to include the suggestion that folks attempt #1091 (comment) before patching the source.
As I have said before, I haven't had time to test that myself, but it seems very likely to me that it is a much simpler workaround.
from ipxe.
Adding a comment so I can watch. I have a downstream and was planning on updating with master soon. Would very much like this fixed before I accept the merge. (And apparently I have the guilty commit in two releases of our downstream.)
from ipxe.
Could we get confirmation if this is fixed by the merge of #1174, thanks
I'm going to ask our ops team to try it out on an affected box. We have, in the interim, removed EAPOL support from Triton's downstream of ipxe, since we don't use it anyway currently. See TritonDataCenter/ipxe#25 .
from ipxe.
Still failed for me when building with this latest change and eapol enabled again, booting from snponly.
If you are using snponly then there's a good chance that the underlying SNP driver provided by Mellanox has the same bug, since Mellanox uses a shared driver codebase for both iPXE and their UEFI SNP driver. There's nothing we can do about the bug being present in the underlying SNP driver.
Please try using ipxe.efi instead of snponly.efi so that the updated iPXE driver (including the fix) is used to drive the hardware instead.
from ipxe.
Your result indicates that the issue is fixed in iPXE, so I will close this issue now. If you want to continue using snponly.efi, you will need to contact your UEFI BIOS vendor to get a BIOS update that includes the equivalent fix in the BIOS-provided SNP driver.
Triton Data Center needs snponly.efi (and undionly.kpxe for BIOS) too, so our testing would likely fail as well. (To that end, our downstream will continue to exclude EAPOL for now.)
from ipxe.
The 18-byte packet will be zero-padded to 60 bytes on the wire anyway (64 bytes including the Ethernet FCS), since that is the minimum length Ethernet packet.
We could possibly work around the underlying SNP driver bug by pointlessly zero-padding the packet to 60 bytes ourselves. That would be sufficient to avoid the underlying bug in the SNP driver (assuming that it is using code identical to that fixed in commit c11734eee).
@ech68 could you please retest snponly.efi built from #1177 ?
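The padding idea described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code from #1177: pad_frame and MIN_FRAME_LEN are hypothetical names, and the caller is assumed to own a transmit buffer with spare capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimum Ethernet frame length on the wire, excluding the 4-byte FCS */
#define MIN_FRAME_LEN 60

/*
 * Hypothetical helper (not an iPXE function): zero-pad a frame of
 * "len" bytes, held in a buffer of "cap" bytes, up to the minimum
 * Ethernet frame length.
 *
 * Returns the (possibly increased) frame length, or 0 if the buffer
 * cannot hold the frame or the padded frame.
 */
static size_t pad_frame ( unsigned char *buf, size_t len, size_t cap ) {
	assert ( buf != NULL );

	/* Reject inconsistent arguments */
	if ( len > cap )
		return 0;

	/* Frames that are already long enough are left untouched */
	if ( len >= MIN_FRAME_LEN )
		return len;

	/* Not enough room to pad */
	if ( cap < MIN_FRAME_LEN )
		return 0;

	/* Fill the gap with zeros, as the hardware would do on the wire */
	memset ( buf + len, 0, MIN_FRAME_LEN - len );
	return MIN_FRAME_LEN;
}
```

With an 18-byte packet in a 64-byte buffer, pad_frame would return 60, with bytes 18 to 59 zeroed and the original 18 bytes left unchanged.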
from ipxe.
On Mon, Mar 18, 2024 at 11:40 AM Michael Brown wrote: @ech68 could you please retest snponly.efi built from #1177?
Same failure mode unfortunately.
Thanks for testing. Does ifstat report the driver as SNP or NII when you are using snponly.efi?
from ipxe.
Thanks for testing. Does ifstat report the driver as SNP or NII when you are using snponly.efi?
NII
Thanks. I've generalised the PR to cover both SNP and NII, and force-pushed PR #1177. Could you please retest with this commit?
from ipxe.
Thanks. I've generalised the PR to cover both SNP and NII, and force-pushed PR #1177. Could you please retest with this commit?
With that update, it works!
Fantastic, thank you! Could you let me know your name and email for the commit log testing credit?
from ipxe.
Fantastic, thank you! Could you let me know your name and email for the commit log testing credit?
Eric Hagberg, ***@***.***
I think there may be some kind of automated censorship system at work here. 🙃
from ipxe.
Fantastic, thank you! Could you let me know your name and email for the commit log testing credit?
Eric Hagberg, @.***
I think there may be some kind of automated censorship system at work here. 🙃
There is an email address (mcb30 AT ipxe . org) in https://github.com/ipxe/ipxe/pull/1177/commits. Eric, please mail that address directly to bypass the automated censorship.
from ipxe.