Comments (13)
> ghcr.io/edwarnicke/govpp/vpp:v22.06-release
Ah... OK... that's not going to work at this moment because there is a patch missing from it for using abstract sockets for memif (if you are interested, I can explain the ins and outs of why).
Could you try using:
from deployments-k8s.
> I am trying to run an external VPP as a daemon-set on k8s on EKS and I am running into some problems.
How are you setting up the external VPP as a daemon-set on K8s? We make a few presumptions about that external VPP... among them that it has an interface that is bound to the TunnelIP (usually the Node IP).
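In case it helps, a minimal sketch of what that binding can look like in a VPP exec script; the interface name, address, and file path here are assumptions for illustration, not the project's actual defaults:

```
# Assumed to be loaded via startup.conf: unix { exec /etc/vpp/bootstrap.vpp }
create host-interface name eth1                    # AF_PACKET on the NIC carrying the Node IP
set interface ip address host-eth1 10.0.1.25/24    # the TunnelIP / Node IP (example value)
set interface state host-eth1 up
```

The forwarder's NSM_TUNNEL_IP would then be set to the same address that host-eth1 carries.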
I tried two different approaches:
- Create AF_PACKET with a veth pair before running the forwarder
- Bind another NIC to the node and run DPDK with vfio before running the forwarder
Both fail with the error I sent above.
EDIT:
I didn't find any examples with an external VPP, which is why I tried the approaches above
I just tried attaching another NIC to one of my k8s nodes, created an AF_PACKET interface, set its IP to the one from the AWS console, and added that IP as NSM_TUNNEL_IP, and I still get the same error.
Can you explain what I am missing here? Would appreciate the help.
Which external VPP version are you running?
ghcr.io/edwarnicke/govpp/vpp:v22.06-release
Sometimes I feel so stupid.
Instead of providing /var/run/vpp/external/api.sock, I passed /var/run/vpp/external/cli.sock.
Forwarder went up and then I tried running everything on 1 node:
- VPP
- Forwarder
- NSE
- NSC
Everything works.
Now I will modify the forwarders to use the 2nd NIC's IP (currently I can only pass NSM_TUNNEL_IP, which in my case is different for each forwarder), and then I will try to run the NSE on that node.
Running VPP as a daemon-set seems to be working.
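For anyone following along, a hedged sketch of what such a daemon-set can look like; the image is the one mentioned in this thread, but the socket path, mounts, and security settings are assumptions, not a tested manifest:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: external-vpp
spec:
  selector:
    matchLabels: {app: external-vpp}
  template:
    metadata:
      labels: {app: external-vpp}
    spec:
      hostNetwork: true              # VPP shares the node's network namespace
      containers:
      - name: vpp
        image: ghcr.io/edwarnicke/govpp/vpp:v22.06-release
        securityContext:
          privileged: true           # assumed; AF_PACKET / vfio need elevated access
        volumeMounts:
        - name: vpp-sockets
          mountPath: /var/run/vpp    # api.sock appears here inside the container
      volumes:
      - name: vpp-sockets
        hostPath:
          path: /var/run/vpp/external   # forwarder/NSE dial .../external/api.sock on the host
          type: DirectoryOrCreate
```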
With the kernel2kernel example everything worked, but kernel NSC to memif NSE fails because the memif interface is not created.
This is the error I see in my NSE VPP: unknown message: memif_socket_filename_add_del_v2_34223bdf: cannot support any of the requested mechanism
I saw #519, which removed the need for the external vpp option, but I don't follow how this is supposed to work in the first place.
I understand memif needs a role=server, and role=client.
Now, in my use case, my forwarder is connected to the external vpp socket and both the forwarder and the vpp pods run on hostNetwork.
My kernel nsc and vpp nse run in separate ns namespaces (without hostNetwork).
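For what it's worth, the server/client split can be sketched at the VPP CLI level; the socket id, filename, and interface ids below are assumptions for illustration (in NSM the forwarder and NSE drive this through the binary API rather than the CLI):

```
comment { server side (NSE VPP) }
create memif socket id 1 filename /run/vpp/memif.sock
create interface memif socket-id 1 id 0 master

comment { client side (forwarder VPP), pointing at the same socket file }
create memif socket id 1 filename /run/vpp/memif.sock
create interface memif socket-id 1 id 0 slave
```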
Is it possible to see an example of an external vpp yaml with a memif interface?
Thanks
> Sometimes I feel so stupid.
> Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock.
> Forwarder went up and then I tried running everything on 1 node:
Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)
Thanks for the help, it worked.
I tried a couple of configurations:
- VPP daemon-set with the forwarder and the NSE both using it: with the patch I see the NSE can create the memif server, but the forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.
- Running the forwarder with its own VPP instance and the NSE with the external VPP, which works fine.
After seeing some issues with the cleanup of the interfaces while using the external VPP, I just added the /dev/vfio mount to the NSE and used a local instance of VPP. It is easier to manage, and the healing / cleanup process is easier as well.
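A sketch of that /dev/vfio mount on the NSE pod; everything here except the /dev/vfio path itself (container name, privileged setting) is an assumption:

```yaml
# Fragment of the NSE pod spec (hypothetical names)
spec:
  containers:
  - name: nse
    securityContext:
      privileged: true        # vfio access typically needs this (or specific capabilities)
    volumeMounts:
    - name: vfio
      mountPath: /dev/vfio
  volumes:
  - name: vfio
    hostPath:
      path: /dev/vfio
```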
At the end of the day, I was wrong in thinking one VPP is easier to manage, so I changed back to local VPP instances.
Thanks
> Sometimes I feel so stupid.
> Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock.
> Forwarder went up and then I tried running everything on 1 node:

> Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)
I think we could verify that we are connecting to the api.sock by running some initial command, like a show int, and if it fails, catch it and add a warning in addition to the error message returned from govpp.
An example would be:
Original message:
time="2022-08-22T14:03:15Z" level=info msg="Decoding sockclnt_create_reply failed: panic occurred during decoding message sockclnt_create_reply: runtime error: slice bounds out of range [10:0]" logger=govpp/socketclient
Warning added below the message from the govpp wrapper:
time="2022-08-22T14:03:15Z" level=warn msg="sending command to /var/run/vpp/external/cli.sock failed, make sure you are trying to connect to the api.sock" logger=govpp/...
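The pre-flight check could be as simple as a path heuristic before dialing. A hedged sketch; the function name and messages are made up for illustration, govpp itself has no such helper:

```shell
#!/bin/sh
# Warn when the configured socket path looks like VPP's CLI socket
# instead of the binary-API socket. Heuristic only: both sockets live
# side by side, and dialing cli.sock is what produced the decode panic.
check_vpp_socket() {
  case "$1" in
    *cli.sock)
      echo "warn: $1 looks like VPP's CLI socket; govpp needs the binary-API socket (api.sock)"
      ;;
    *)
      echo "ok: $1"
      ;;
  esac
}

check_vpp_socket /var/run/vpp/external/cli.sock   # prints the warning
check_vpp_socket /var/run/vpp/external/api.sock   # prints "ok: ..."
```

A real fix would live in the govpp socket client (checking the first reply frame rather than the filename), but even a filename heuristic would have caught this misadventure.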
> Thanks for the help, it worked.
I'm a little confused... it sounds above like v22.06-rc0-147-g1c5485ab8 worked, but then below you talk about it not working...
> I tried a couple of configurations:
> - VPP daemon set with forwarder and nse that both use it - with the patch I see the nse can create the memif server, but forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.
Was this with v22.06-rc0-147-g1c5485ab8 ?
> - Running forwarder with its own vpp instance and the nse with the external VPP, which works fine.

> After seeing some issues with the cleanup of the interfaces while using the external VPP,
Could you file a bug on those issues... we should be cleaning up those interfaces on restart of a forwarder using an external VPP... I'm not entirely sure we try to do that with an NSE using an external VPP.
> I just added the /dev/vfio mount to the nse and used a local instance of vpp. It is easier to manage it and the healing / cleanup process is easier as well.
Ah... so external VPP is intrinsically more complex (coordination needed)... it's super useful for cases where you want to increase performance.
I'd love to hear more about the details of what you are doing with vfio :)
I'd also be quite interested in what you are ultimately trying to achieve.
> At the end of the day, I was wrong thinking 1 VPP is easier to manage, changed back to local vpp instances.
Thanks
I meant that the VPP abstract socket feature works, but connecting an NSC to an NSE, with both the NSE and the forwarder using the same VPP, fails with the same error memif_socket_filename_add_del_v2_34223bdf.
The server-side (NSE) memif is created after running the VPP daemon-set with the image you provided, but the client-side memif (forwarder) returns the error above, although it uses the same VPP.
I will recreate the errors over the weekend and open up an issue with all the necessary information and logs regarding the single VPP use case.
Regarding why I am doing this:
We are trying to use NSM at my company for various use cases; one of them is allowing client VPNs to connect to the internet and to other branches of their offices securely via chained security functions (NSM composition).
Before jumping into all the chaining and the other use cases, I wanted to cover internet access and have a reliable benchmark between 1 NSC and 1 NSE, whether on the same machine or on different machines, to know how much traffic I can push before adding any other components in between.
In the process, I created an NSE, which acts as the internet gateway.
To gain internet access from the NSE, I set it as hostNetwork, created a VETH pair and added an AF_PACKET interface and configured it to go to the internet via the default gateway.
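A hedged sketch of that plumbing; the interface names and gateway address are assumptions, with the host-side commands shown as comments and the NSE's VPP configuration below them:

```
comment { host side (NSE runs with hostNetwork), e.g.: }
comment {   ip link add vpp-out type veth peer name vpp-in }
comment {   ip link set vpp-out up && ip link set vpp-in up }
comment { NSE VPP side: }
create host-interface name vpp-in
set interface state host-vpp-in up
ip route add 0.0.0.0/0 via 172.31.0.1 host-vpp-in
```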
These are my findings so far regarding simple use case of 1 NSC -> 1 NSE that goes to internet:
EKS 1.22 on AWS c5n.2xlarge nodes
- With all the default NSM configurations, 1 core to the NSE, 1 core to each forwarder and 1 core to the NSC
- Performance was in the region of XXX kbps.
- I noticed the ping between 2 memifs took 80ms, which made me suspect the CPU, so I raised the NSE to 3 cores and configured its local VPP with 1 main core and 2 workers, raised the forwarders to 3 cores as well with the same VPP config, and raised the NSC to 3 cores.
- After that, I ran speedtest-cli and got 1-1.4Gbps download and 0.6-1Gbps upload.
- I decided to add a 2nd NIC with an elastic IP to each machine, installed DPDK, bound the NIC with the vfio-pci driver and mounted /dev/vfio to the NSE.
- Afterwards, I ran speedtest-cli again, which got me 1.6-2.4Gbps download and 1.2-1.8Gbps upload.
- Next up, I thought that if I add all data-plane components with VPP + DPDK, I will get maximum performance, assuming I have enough cores. The issue with running separate VPPs is that as soon as you bind one VPP + DPDK to a NIC, no other VPP can reuse that NIC. Attaching a NIC per forwarder plus one per NSE seems too expensive, not only cost-wise but also CPU-wise: due to the polling-mode driver I would lose 1 core per rx queue on each VPP + DPDK instance. So I tried to run a single VPP as a daemon-set with 1 DPDK vfio-bound NIC per node and run the forwarder and all other VPP-related workloads against it. I hit the abstract socket issue (which I didn't even realize until you helped me with it), and I also saw a lot of interfaces being created and never cleaned up, so I discarded the idea and stuck with only mounting /dev/vfio to the NSE that acts as the internet gateway.
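For completeness, the cores/DPDK tuning described above maps to a startup.conf roughly like this; the PCI address and core numbers are assumptions, not values from this setup:

```
cpu {
  main-core 1
  corelist-workers 2-3      # 1 main core + 2 workers, as described above
}
dpdk {
  dev 0000:00:06.0          # the vfio-pci-bound 2nd NIC (example address)
  uio-driver vfio-pci       # NIC bound beforehand, e.g. dpdk-devbind.py --bind=vfio-pci 0000:00:06.0
}
```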
Hope this answers your questions.
Thanks