Comments (13)

edwarnicke commented on June 28, 2024

ghcr.io/edwarnicke/govpp/vpp:v22.06-release

Ah... OK... that's not going to work at this moment because there is a patch missing from it for using abstract sockets for memif (if you are interested, I can explain the ins and outs of why).

Could you try using:

v22.06-rc0-147-g1c5485ab8

edwarnicke commented on June 28, 2024

I am trying to run an external VPP as a daemon-set on k8s on EKS and I am running into some problems.

How are you setting up the external VPP as a daemon-set on K8s? We make a few presumptions about that external VPP... among them that it has an interface that is bound to the TunnelIP (usually the Node IP).
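
For illustration, here is a minimal govpp sketch of that presumption: attach the node's NIC to the external VPP as an AF_PACKET (host-interface) and bring it up, so the external VPP has an interface the TunnelIP can live on. This is not an official NSM example; it assumes the go.fd.io/govpp module and its generated binapi packages (import paths differ between govpp versions), and the socket path and eth0 are placeholders.

```go
// Minimal sketch (not an NSM component): attach the node's NIC to an external
// VPP as an AF_PACKET (host-interface) and bring it admin-up, so the external
// VPP has an interface that the TunnelIP / Node IP can be bound to.
//
// Assumes the go.fd.io/govpp module and its generated binapi packages; the
// socket path and "eth0" are placeholders for your environment.
package main

import (
	"context"
	"log"

	"go.fd.io/govpp"
	"go.fd.io/govpp/binapi/af_packet"
	interfaces "go.fd.io/govpp/binapi/interface"
	"go.fd.io/govpp/binapi/interface_types"
)

func main() {
	// Connect to the external VPP's *API* socket (not cli.sock).
	conn, err := govpp.Connect("/var/run/vpp/external/api.sock")
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer conn.Disconnect()

	ctx := context.Background()

	// Create an AF_PACKET interface on top of the host NIC.
	afRsp, err := af_packet.NewServiceClient(conn).AfPacketCreate(ctx, &af_packet.AfPacketCreate{
		HostIfName:      "eth0",
		UseRandomHwAddr: true,
	})
	if err != nil {
		log.Fatalf("af_packet create: %v", err)
	}

	// Bring the new interface admin-up. Assigning the tunnel IP (the node IP)
	// to it, e.g. via sw_interface_add_del_address, is left out of this sketch.
	_, err = interfaces.NewServiceClient(conn).SwInterfaceSetFlags(ctx, &interfaces.SwInterfaceSetFlags{
		SwIfIndex: afRsp.SwIfIndex,
		Flags:     interface_types.IF_STATUS_API_FLAG_ADMIN_UP,
	})
	if err != nil {
		log.Fatalf("set flags: %v", err)
	}
	log.Printf("host-interface eth0 attached as sw_if_index %d", afRsp.SwIfIndex)
}
```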

yuraxdrumz commented on June 28, 2024

I tried two different approaches:

  • Create AF_PACKET with a veth pair before running the forwarder
  • Bind another NIC to the node and run DPDK with vfio before running the forwarder

Both fail with the error I sent above.

EDIT:

I didn't find any examples with an external VPP, which is why I tried the approaches above.

yuraxdrumz commented on June 28, 2024

I just tried attaching another NIC to one of my k8s nodes, created an AF_PACKET interface, set its IP to the one shown in the AWS console, and added that IP as NSM_TUNNEL_IP, and I still get the same error.

Can you explain what I am missing here? Would appreciate the help.

edwarnicke commented on June 28, 2024

Which external VPP version are you running?

yuraxdrumz commented on June 28, 2024

ghcr.io/edwarnicke/govpp/vpp:v22.06-release

yuraxdrumz commented on June 28, 2024

Sometimes I feel so stupid.

Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock.
Forwarder went up and then I tried running everything on 1 node:

  1. VPP
  2. Forwarder
  3. NSE
  4. NSC

Everything works.
Now I will modify the forwarders to use the second NIC's IP (currently I can only pass NSM_TUNNEL_IP, which in my case is different for each forwarder), and then I will try to run the NSE on that node.

Running VPP as a daemon-set seems to be working.

yuraxdrumz commented on June 28, 2024

@edwarnicke

With the kernel2kernel example everything worked, but with a kernel NSC to a memif NSE it fails because the memif interface is not created.
This is the error I see in my NSE VPP: unknown message: memif_socket_filename_add_del_v2_34223bdf: cannot support any of the requested mechanism

I saw #519, which removed the need for the external VPP option, but I don't follow how this is supposed to work in the first place.
I understand memif needs a role=server and a role=client.

Now, in my use case, my forwarder is connected to the external VPP socket, and both the forwarder and the VPP pods run on hostNetwork.
My kernel NSC and VPP NSE run in separate network namespaces (without hostNetwork).

Is it possible to see an example of an external vpp yaml with a memif interface?

Thanks
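
For illustration, here is a minimal govpp sketch of the memif wiring being discussed: register a socket filename on the external VPP and create a server-role memif on it; the client side (the forwarder) would do the same with the slave role and the same socket ID. This is not an official NSM example; it assumes the go.fd.io/govpp module and its generated binapi packages, and the socket ID and the "@" abstract-socket name are assumptions.

```go
// Minimal sketch: register a memif socket filename on an external VPP and
// create a server-role memif on it. The client side (e.g. the forwarder)
// would do the same with MEMIF_ROLE_API_SLAVE and the same socket ID.
//
// Assumes the go.fd.io/govpp module and generated binapi packages; the
// socket ID, the "@" abstract-socket name and the paths are placeholders.
package main

import (
	"context"
	"log"

	"go.fd.io/govpp"
	"go.fd.io/govpp/binapi/memif"
)

func main() {
	conn, err := govpp.Connect("/var/run/vpp/external/api.sock")
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer conn.Disconnect()

	ctx := context.Background()
	c := memif.NewServiceClient(conn)

	// Register the socket the memif pair will rendezvous on. The v2 message is
	// the one that understands abstract sockets (the missing-patch issue above);
	// "@nsm/memif.sock" is just an assumed abstract-socket name here, a regular
	// filesystem path would also work.
	_, err = c.MemifSocketFilenameAddDelV2(ctx, &memif.MemifSocketFilenameAddDelV2{
		IsAdd:          true,
		SocketID:       1,
		SocketFilename: "@nsm/memif.sock",
	})
	if err != nil {
		log.Fatalf("memif socket filename: %v", err)
	}

	// Create the server ("master") end of the memif on that socket.
	rsp, err := c.MemifCreate(ctx, &memif.MemifCreate{
		Role:     memif.MEMIF_ROLE_API_MASTER,
		Mode:     memif.MEMIF_MODE_API_ETHERNET,
		ID:       0,
		SocketID: 1,
	})
	if err != nil {
		log.Fatalf("memif create: %v", err)
	}
	log.Printf("created memif sw_if_index %d", rsp.SwIfIndex)
}
```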

edwarnicke commented on June 28, 2024

Sometimes I feel so stupid.

Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock. Forwarder went up and then I tried running everything on 1 node:

Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)

yuraxdrumz commented on June 28, 2024

@edwarnicke

Thanks for the help, it worked.

I tried a couple of configurations:

  1. VPP daemon-set with a forwarder and an NSE that both use it - with the patch I see the NSE can create the memif server, but the forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.
  2. Running the forwarder with its own VPP instance and the NSE with the external VPP, which works fine.

After seeing some issues with the cleanup of the interfaces while using the external VPP, I just added the /dev/vfio mount to the NSE and used a local instance of VPP. It is easier to manage, and the healing / cleanup process is easier as well.

At the end of the day, I was wrong in thinking one VPP is easier to manage, so I changed back to local VPP instances.

Thanks

yuraxdrumz commented on June 28, 2024

Sometimes I feel so stupid.
Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock. Forwarder went up and then I tried running everything on 1 node:

Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)

I think we should verify that we are connecting to the api.sock by running some initial command, like a show int, and if it fails, catch it and add a warning in addition to the error message returned from govpp.

An example would be:

Original message

time="2022-08-22T14:03:15Z" level=info msg="Decoding sockclnt_create_reply failed: panic occurred during decoding message sockclnt_create_reply: runtime error: slice bounds out of range [10:0]" logger=govpp/socketclient

A warning added below the message by the govpp wrapper:

time="2022-08-22T14:03:15Z" level=warn msg="sending command to /var/run/vpp/external/cli.sock failed, make sure you are trying to connect to the api.sock" logger=govpp/...

edwarnicke commented on June 28, 2024

@edwarnicke

Thanks for the help, it worked.

I'm a little confused... it sounds above like v22.06-rc0-147-g1c5485ab8 worked, but then below you talk about it not working...

I tried a couple of configurations:

  1. VPP daemon-set with a forwarder and an NSE that both use it - with the patch I see the NSE can create the memif server, but the forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.

Was this with v22.06-rc0-147-g1c5485ab8?

  2. Running the forwarder with its own VPP instance and the NSE with the external VPP, which works fine.

After seeing some issues with the cleanup of the interfaces while using the external VPP,

Could you file a bug on those issues... we should be cleaning up those interfaces on restart of a forwarder using an external VPP... I'm not entirely sure we try to do that with an NSE using an external VPP.

I just added the /dev/vfio mount to the NSE and used a local instance of VPP. It is easier to manage, and the healing / cleanup process is easier as well.

Ah... so external VPP is intrinsically more complex (coordination needed)... it's super useful for cases where you want to increase performance.

I'd love to hear more about the details of what you are doing with vfio :)

I'd also be quite interested in what you are ultimately trying to achieve.

At the end of the day, I was wrong in thinking one VPP is easier to manage, so I changed back to local VPP instances.

Thanks

yuraxdrumz commented on June 28, 2024

@edwarnicke

I meant the VPP abstract socket feature works, but connecting an NSC to an NSE, with both the NSE and the forwarder using the same VPP, fails with the same error, memif_socket_filename_add_del_v2_34223bdf. The server-side (NSE) memif is created after running the VPP daemon-set with the image you provided, but the client-side memif (forwarder) returns the error above, although it uses the same VPP.

I will recreate the errors over the weekend and open up an issue with all the necessary information and logs regarding the single VPP use case.

Regarding why I am doing this:

We are trying to use NSM at my company for various use cases; one of them is allowing client VPNs to connect to the internet and to other branch offices securely via chained security functions (NSM composition).

Before jumping to all the chaining and the various other use cases, I wanted to cover internet access and have a reliable benchmark of 1 NSC to 1 NSE, whether on the same machine or on different machines, to know how much traffic I can push before adding any other components in between.

In the process, I created an NSE, which acts as the internet gateway.
To gain internet access from the NSE, I set it to hostNetwork, created a veth pair, added an AF_PACKET interface, and configured it to reach the internet via the default gateway.
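
A rough sketch of that wiring, assuming the github.com/vishvananda/netlink package (not something this thread uses) and placeholder interface names: create the veth pair on the host side; the peer end is what VPP then picks up as the AF_PACKET / host-interface, and the default route via the node's gateway is configured inside VPP.

```go
// Rough sketch: create the veth pair used to give the gateway NSE host-side
// connectivity. One end stays in the host/pod network namespace; the other
// end is what VPP attaches to as an AF_PACKET (host-interface), after which
// a default route via the node's gateway is configured inside VPP.
//
// Assumes github.com/vishvananda/netlink; interface names are placeholders.
package main

import (
	"log"

	"github.com/vishvananda/netlink"
)

func main() {
	veth := &netlink.Veth{
		LinkAttrs: netlink.LinkAttrs{Name: "gw-host"},
		PeerName:  "gw-vpp", // this end gets consumed by VPP as host-interface gw-vpp
	}
	if err := netlink.LinkAdd(veth); err != nil {
		log.Fatalf("veth add: %v", err)
	}
	for _, name := range []string{"gw-host", "gw-vpp"} {
		link, err := netlink.LinkByName(name)
		if err != nil {
			log.Fatalf("lookup %s: %v", name, err)
		}
		if err := netlink.LinkSetUp(link); err != nil {
			log.Fatalf("set %s up: %v", name, err)
		}
	}
	log.Println("veth pair gw-host <-> gw-vpp is up")
}
```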

These are my findings so far regarding the simple use case of 1 NSC -> 1 NSE that goes to the internet:

EKS 1.22 on AWS c5n.2xlarge nodes

  • With all the default NSM configurations, 1 core to the NSE, 1 core to each forwarder and 1 core to the NSC
    • Performance was in the region of XXX kbps.
    • I noticed the ping between 2 memifs took 80 ms, which is why I suspected the CPU, so I raised the NSE to 3 cores and configured the local VPP with 1 main core and 2 workers, raised the forwarders to 3 cores as well with the same VPP config, and raised the NSC to 3 cores.
    • After that, I ran speedtest-cli and got 1-1.4 Gbps download and 0.6-1 Gbps upload.
  • I decided to add a 2nd NIC with an Elastic IP to each machine, installed DPDK, bound the NIC to the vfio-pci driver, and mounted /dev/vfio to the NSE.
    • Afterwards, I ran speedtest-cli again, which got me 1.6-2.4 Gbps download and 1.2-1.8 Gbps upload.
  • Next up, I thought that if I could run all the data-plane components on VPP + DPDK, I would get maximum performance, assuming I have enough cores. The issue with running separate VPPs is that as soon as you bind one VPP + DPDK instance to a NIC, no other instance can reuse that NIC. Attaching a NIC per forwarder plus one per NSE seems too expensive, not only cost-wise but also CPU-wise: because of the polling-mode driver, I lose 1 core per rx queue on each VPP + DPDK instance. So I tried to run a single VPP as a daemon-set with one DPDK/vfio-bound NIC per node and to run the forwarder and all the other VPP-related workloads against it. I ran into the abstract-socket issue, which I didn't even realize until you helped me with it, but I also saw that a lot of interfaces were created and never cleaned up, so I discarded the idea and stuck with only mounting /dev/vfio to the NSE that acts as the internet gateway.

Hope this answers your questions.

Thanks
