Comments (7)
I guess the solution here is
- Do feature detection in user space (whether mixed tailcall & subprog is supported or not)
- Define macro based on that,
HAS_MIXED_TAILCALL_SUBPROG
for example - Use
__always_inline
for__mcast_ep_delivery
whenHAS_MIXED_TAILCALL_SUBPROG
- Use either bounded loop or unrolling-based loop instead
of for_each_map_elem
whenHAS_MIXED_TAILCALL_SUBPROG
CC @ldelossa
from cilium.
Thanks, @yushoyamaguchi, for opening this issue. There are few things to check
The minimum kernel version required for multicast is 5.15 ( This is probably for x86 ). Looks like for arm64 it is 6.0+ kernel version. You can add these in the documentation.
Aside from the documentation, we may need to add a feature probe helper in the cilium/ebpf library and use it when initializing multicast. On failure to probe ( subprog -> tailcall support ), multicast cell init should fail, and an appropriate message can be logged.
from cilium.
The commit is this.
torvalds/linux@d4609a5
from cilium.
Adding documentation will definitely be one of the enhancement we can take for user, i think we should update https://docs.cilium.io/en/stable/operations/system_requirements/#required-kernel-versions-for-advanced-features once version dependencies are clear. Actually this brings me an another question that all these kernel versions are meant to be for x86? that case, probably we could add note for that as well.
cilium/ebpf library and use it when initializing multicast. On failure to probe ( subprog -> tailcall support ), multicast cell init should fail, and an appropriate message can be logged.
if i am not mistaken. this means once we update the ConfigMap to enable multicast, some ciliumnodes would fail to initialize the cilium-agent with multicast capability, cilium-agent pods are going to in crash loop but providing error information from multicast cell can let the user know that multicast feature cannot be enabled with this ciliumnode. is my understanding correct?
so the expectation here from cilium is that user needs to configure the all ciliumnodes can be enabled with multicast if user wants to use the multicast feature? i am not sure about this graceful behavior, is this common behavior for cilium features?
another behavior would be fallback behavior that cilium-agent can be started with warning message but the feature cannot be used on that ciliumnodes.
besides if the fallback behavior is the case, this also needs to be considered with cilium/cilium-cli#2620. in case of some ciliumnodes cannot mange the multicast, it should be shown for user that as well.
thanks,
from cilium.
@harsimran-pabla @fujitatomoya
Thank you for your comments.
I found the detail cause of this bug.
When using eBPF with version 5 and older versions of the AArch64 kernel, it is prohibited to use tail calls within sub-program. To avoid this, in Cilium's eBPF programs, most of functions are declared as inline functions.
However, the function __mcast_ep_delivery()
is not declared as an inline function.
Ref :
Line 338 in c3d943f
The reason is that the call to __mcast_ep_delivery()
is not a regular function call but is invoked as a callback for for_each_map_elem()
.
Since callback functions cannot be inlined, this approach is used.
Ref :
Line 433 in c3d943f
This is the only instance where for_each_map_elem()
is used, excluding test code.
As @fujitatomoya said, I want to discuss about how to handle this bug.
cc @YutaroHayakawa I'm sorry for adding in cc,. Thank you so much for telling me the cause.
from cilium.
Should we allow this and check kernel compatibility on the agent side by adding functionality in cilium/ebpf ?
Otherwise, should it be fixed as an eBPF program issue?
from cilium.
Thank you for a great suggestion.
Do you have a plan that some community members work on this implementation?
I also can work on it after implementing connectivity test for multicast, but it will take a long time and some support.
One more important thing :
Because I think fixing this as eBPF problem will take a long time or may be given up, I added descriptions of kernel version limitation to multicast Documentation and sent PR. Then it was merged.
#33567
However, if you can fix this as you mentioned quickly, I probably shouldn't have added these descriptions.
I am very sorry for increasing procedures.
When you fix this eBPF problem, please remove these descriptions from multicast Documentation.
(I will do the best to notice and send PR to delete this.)
Thank you.
from cilium.
Related Issues (20)
- [DATA RACE]: k8s.(*ServiceCache).DebugStatus() HOT 1
- custom BPF code loading failes
- Option to disable SourceIPVerification while chaining CNI HOT 8
- Hubble-Relay timesout running on Kubernetes 1.30
- Test [to-fqdns] fails yet manual verification works HOT 2
- [DATA RACE]: github.com/cilium/dns.(*Conn).ReadMsgHeader() HOT 1
- [DATA RACE]: github.com/cilium/cilium/pkg/bgpv1/agent.(*Controller).bgppSelection()
- [DATA RACE]: github.com/cilium/cilium/operator/auth/spire.(*Client).Upsert()
- [DATA RACE]: github.com/cilium/cilium/pkg/ciliumenvoyconfig.(*envoyServiceBackendSyncer).RegisterServiceUsageInCEC()
- Collect cilium_skip_lb{4,6} maps in sysdump
- Cilium identity not correct when trying connect from outside cluster with nodeport HOT 1
- Cannot forward proxied DNS lookup; bind: address already in use HOT 1
- LoadBalancerClass cannot be set on Cilium Gateway API Servicess
- CI: Integration Test: TestClusterMeshMultipleAddRemove HOT 1
- Document GKE cloud DNS deployment using LRP
- Run LRP e2e test for per-packet LB HOT 1
- Policy Correlation: realized policy misses for L4-only rules HOT 3
- CFP: HOT 1
- CFP: cilium
- Documentation issues
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cilium.