Giter Club home page Giter Club logo

Comments (7)

YutaroHayakawa avatar YutaroHayakawa commented on August 15, 2024 2

I guess the solution here is

  1. Do feature detection in user space (whether mixed tailcall & subprog is supported or not)
  2. Define macro based on that, HAS_MIXED_TAILCALL_SUBPROG for example
  3. Use __always_inline for __mcast_ep_delivery when HAS_MIXED_TAILCALL_SUBPROG
  4. Use either bounded loop or unrolling-based loop instead of for_each_map_elem when HAS_MIXED_TAILCALL_SUBPROG

CC @ldelossa

from cilium.

harsimran-pabla avatar harsimran-pabla commented on August 15, 2024 1

Thanks, @yushoyamaguchi, for opening this issue. There are few things to check

The minimum kernel version required for multicast is 5.15 ( This is probably for x86 ). Looks like for arm64 it is 6.0+ kernel version. You can add these in the documentation.

Aside from the documentation, we may need to add a feature probe helper in the cilium/ebpf library and use it when initializing multicast. On failure to probe ( subprog -> tailcall support ), multicast cell init should fail, and an appropriate message can be logged.

from cilium.

yushoyamaguchi avatar yushoyamaguchi commented on August 15, 2024

The commit is this.
torvalds/linux@d4609a5

from cilium.

fujitatomoya avatar fujitatomoya commented on August 15, 2024

Adding documentation will definitely be one of the enhancement we can take for user, i think we should update https://docs.cilium.io/en/stable/operations/system_requirements/#required-kernel-versions-for-advanced-features once version dependencies are clear. Actually this brings me an another question that all these kernel versions are meant to be for x86? that case, probably we could add note for that as well.

cilium/ebpf library and use it when initializing multicast. On failure to probe ( subprog -> tailcall support ), multicast cell init should fail, and an appropriate message can be logged.

@harsimran-pabla

if i am not mistaken. this means once we update the ConfigMap to enable multicast, some ciliumnodes would fail to initialize the cilium-agent with multicast capability, cilium-agent pods are going to in crash loop but providing error information from multicast cell can let the user know that multicast feature cannot be enabled with this ciliumnode. is my understanding correct?

so the expectation here from cilium is that user needs to configure the all ciliumnodes can be enabled with multicast if user wants to use the multicast feature? i am not sure about this graceful behavior, is this common behavior for cilium features?
another behavior would be fallback behavior that cilium-agent can be started with warning message but the feature cannot be used on that ciliumnodes.

besides if the fallback behavior is the case, this also needs to be considered with cilium/cilium-cli#2620. in case of some ciliumnodes cannot mange the multicast, it should be shown for user that as well.

thanks,

from cilium.

yushoyamaguchi avatar yushoyamaguchi commented on August 15, 2024

@harsimran-pabla @fujitatomoya
Thank you for your comments.

I found the detail cause of this bug.

When using eBPF with version 5 and older versions of the AArch64 kernel, it is prohibited to use tail calls within sub-program. To avoid this, in Cilium's eBPF programs, most of functions are declared as inline functions.

However, the function __mcast_ep_delivery() is not declared as an inline function.
Ref :

static long __mcast_ep_delivery(__maybe_unused void *sub_map,

The reason is that the call to __mcast_ep_delivery() is not a regular function call but is invoked as a callback for for_each_map_elem().
Since callback functions cannot be inlined, this approach is used.
Ref :

for_each_map_elem(sub_map, __mcast_ep_delivery, &cb_ctx, 0);

This is the only instance where for_each_map_elem() is used, excluding test code.

As @fujitatomoya said, I want to discuss about how to handle this bug.

cc @YutaroHayakawa I'm sorry for adding in cc,. Thank you so much for telling me the cause.

from cilium.

yushoyamaguchi avatar yushoyamaguchi commented on August 15, 2024

Should we allow this and check kernel compatibility on the agent side by adding functionality in cilium/ebpf ?
Otherwise, should it be fixed as an eBPF program issue?

from cilium.

yushoyamaguchi avatar yushoyamaguchi commented on August 15, 2024

Thank you for a great suggestion.

Do you have a plan that some community members work on this implementation?
I also can work on it after implementing connectivity test for multicast, but it will take a long time and some support.

One more important thing :
Because I think fixing this as eBPF problem will take a long time or may be given up, I added descriptions of kernel version limitation to multicast Documentation and sent PR. Then it was merged.
#33567
However, if you can fix this as you mentioned quickly, I probably shouldn't have added these descriptions.

I am very sorry for increasing procedures.
When you fix this eBPF problem, please remove these descriptions from multicast Documentation.
(I will do the best to notice and send PR to delete this.)

Thank you.

from cilium.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.