Comments (13)
> ghcr.io/edwarnicke/govpp/vpp:v22.06-release
Ah... OK... that's not going to work at this moment because there is a patch missing from it for using abstract sockets for memif (if you are interested, I can explain the ins and outs of why).
Could you try using:
from deployments-k8s.
> I am trying to run an external VPP as a daemon-set on k8s on EKS and I am running into some problems.
How are you setting up the external VPP as a daemon-set on K8s? We make a few presumptions about that external VPP... among them that it has an interface that is bound to the TunnelIP (usually the Node IP).
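In case it helps, a minimal sketch of what that binding can look like in a VPP exec script; the interface name, address, and file path here are assumptions for illustration, not the project's actual defaults:

```
# Assumed to be loaded via startup.conf: unix { exec /etc/vpp/bootstrap.vpp }
create host-interface name eth1                    # AF_PACKET on the NIC carrying the Node IP
set interface ip address host-eth1 10.0.1.25/24    # the TunnelIP / Node IP (example value)
set interface state host-eth1 up
```

The forwarder's NSM_TUNNEL_IP would then be set to the same address that host-eth1 carries.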
I tried two different approaches:
- Create AF_PACKET with a veth pair before running the forwarder
- Bind another NIC to the node and run DPDK with vfio before running the forwarder
Both fail with the error I sent above.
EDIT:
I didn't find any examples with an external VPP, which is why I tried the approaches above
I just tried attaching another NIC to one of my k8s nodes, created an AF_PACKET interface, set its IP to the one from the AWS console, and added that IP as NSM_TUNNEL_IP, and I still get the same error.
Can you explain what I am missing here? Would appreciate the help.
Which external VPP version are you running?
ghcr.io/edwarnicke/govpp/vpp:v22.06-release
Sometimes I feel so stupid.
Instead of providing /var/run/vpp/external/api.sock, I passed /var/run/vpp/external/cli.sock.
Forwarder went up and then I tried running everything on 1 node:
- VPP
- Forwarder
- NSE
- NSC
Everything works.
Now I will modify the forwarders to use the 2nd NIC's IP (currently I can only pass NSM_TUNNEL_IP, which in my case is different for each forwarder), and then I will try to run the NSE on that node.
Running VPP as a daemon-set seems to be working.
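For anyone following along, a hedged sketch of what such a daemon-set can look like; the image is the one mentioned in this thread, but the socket path, mounts, and security settings are assumptions, not a tested manifest:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: external-vpp
spec:
  selector:
    matchLabels: {app: external-vpp}
  template:
    metadata:
      labels: {app: external-vpp}
    spec:
      hostNetwork: true              # VPP shares the node's network namespace
      containers:
      - name: vpp
        image: ghcr.io/edwarnicke/govpp/vpp:v22.06-release
        securityContext:
          privileged: true           # assumed; AF_PACKET / vfio need elevated access
        volumeMounts:
        - name: vpp-sockets
          mountPath: /var/run/vpp    # api.sock appears here inside the container
      volumes:
      - name: vpp-sockets
        hostPath:
          path: /var/run/vpp/external   # forwarder/NSE dial .../external/api.sock on the host
          type: DirectoryOrCreate
```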
With the kernel2kernel example everything worked, but kernel NSC to memif NSE fails because the memif interface is not created.
This is the error I see in my NSE VPP: unknown message: memif_socket_filename_add_del_v2_34223bdf: cannot support any of the requested mechanism
I saw #519, which removed the need for the external vpp option, but I don't follow how this is supposed to work in the first place.
I understand memif needs a role=server, and role=client.
Now, in my use case, my forwarder is connected to the external vpp socket and both the forwarder and the vpp pods run on hostNetwork.
My kernel nsc and vpp nse run in separate ns namespaces (without hostNetwork).
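For what it's worth, the server/client split can be sketched at the VPP CLI level; the socket id, filename, and interface ids below are assumptions for illustration (in NSM the forwarder and NSE drive this through the binary API rather than the CLI):

```
comment { server side (NSE VPP) }
create memif socket id 1 filename /run/vpp/memif.sock
create interface memif socket-id 1 id 0 master

comment { client side (forwarder VPP), pointing at the same socket file }
create memif socket id 1 filename /run/vpp/memif.sock
create interface memif socket-id 1 id 0 slave
```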
Is it possible to see an example of an external vpp yaml with a memif interface?
Thanks
> Sometimes I feel so stupid.
> Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock.
> Forwarder went up and then I tried running everything on 1 node:
Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)
Thanks for the help, it worked.
I tried a couple of configurations:
- VPP daemon-set with the forwarder and the NSE both using it: with the patch I see the NSE can create the memif server, but the forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.
- Running the forwarder with its own VPP instance and the NSE with the external VPP, which works fine.
After seeing some issues with the cleanup of the interfaces while using the external VPP, I just added the /dev/vfio mount to the NSE and used a local instance of VPP. It is easier to manage, and the healing / cleanup process is easier as well.
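A sketch of that /dev/vfio mount on the NSE pod; everything here except the /dev/vfio path itself (container name, privileged setting) is an assumption:

```yaml
# Fragment of the NSE pod spec (hypothetical names)
spec:
  containers:
  - name: nse
    securityContext:
      privileged: true        # vfio access typically needs this (or specific capabilities)
    volumeMounts:
    - name: vfio
      mountPath: /dev/vfio
  volumes:
  - name: vfio
    hostPath:
      path: /dev/vfio
```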
At the end of the day, I was wrong in thinking one VPP is easier to manage, so I changed back to local VPP instances.
Thanks
> Sometimes I feel so stupid.
> Instead of providing /var/run/vpp/external/api.sock I passed /var/run/vpp/external/cli.sock.
> Forwarder went up and then I tried running everything on 1 node:

> Not stupid. I don't believe in stupid users, I believe in doc bugs :) If you could let me know what you think might have precluded this misadventure in terms of doc improvements, I'd love that :)
I think we could verify that we are connecting to the api.sock by running some initial command, like a show int, and if it fails, catch it and add a warning in addition to the error message returned from govpp.
An example would be:
Original message:
time="2022-08-22T14:03:15Z" level=info msg="Decoding sockclnt_create_reply failed: panic occurred during decoding message sockclnt_create_reply: runtime error: slice bounds out of range [10:0]" logger=govpp/socketclient
Warning added below the message from the govpp wrapper:
time="2022-08-22T14:03:15Z" level=warn msg="sending command to /var/run/vpp/external/cli.sock failed, make sure you are trying to connect to the api.sock" logger=govpp/...
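The pre-flight check could be as simple as a path heuristic before dialing. A hedged sketch; the function name and messages are made up for illustration, govpp itself has no such helper:

```shell
#!/bin/sh
# Warn when the configured socket path looks like VPP's CLI socket
# instead of the binary-API socket. Heuristic only: both sockets live
# side by side, and dialing cli.sock is what produced the decode panic.
check_vpp_socket() {
  case "$1" in
    *cli.sock)
      echo "warn: $1 looks like VPP's CLI socket; govpp needs the binary-API socket (api.sock)"
      ;;
    *)
      echo "ok: $1"
      ;;
  esac
}

check_vpp_socket /var/run/vpp/external/cli.sock   # prints the warning
check_vpp_socket /var/run/vpp/external/api.sock   # prints "ok: ..."
```

A real fix would live in the govpp socket client (checking the first reply frame rather than the filename), but even a filename heuristic would have caught this misadventure.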
> Thanks for the help, it worked.
I'm a little confused... it sounds above like v22.06-rc0-147-g1c5485ab8 worked, but then below you talk about it not working...
> I tried a couple of configurations:
> - VPP daemon set with forwarder and nse that both use it - with the patch I see the nse can create the memif server, but forwarder still fails with memif_socket_filename_add_del_v2_34223bdf.
Was this with v22.06-rc0-147-g1c5485ab8 ?
> - Running forwarder with its own vpp instance and the nse with the external VPP, which works fine.

> After seeing some issues with the cleanup of the interfaces while using the external VPP,
Could you file a bug on those issues... we should be cleaning up those interfaces on restart of a forwarder using an external VPP... I'm not entirely sure we try to do that with an NSE using an external VPP.
> I just added the /dev/vfio mount to the nse and used a local instance of vpp. It is easier to manage it and the healing / cleanup process is easier as well.
Ah... so external VPP is intrinsically more complex (coordination needed)... it's super useful for cases where you want to increase performance.
I'd love to hear more about the details of what you are doing with vfio :)
I'd also be quite interested in what you are ultimately trying to achieve.
> At the end of the day, I was wrong thinking 1 VPP is easier to manage, changed back to local vpp instances.
Thanks
I meant that the VPP abstract socket feature works, but connecting an NSC to an NSE, with both the NSE and the forwarder using the same VPP, fails with the same error memif_socket_filename_add_del_v2_34223bdf.
The server-side (NSE) memif is created after running the VPP daemon-set with the image you provided, but the client-side memif (forwarder) returns the error above, although it uses the same VPP.
I will recreate the errors over the weekend and open up an issue with all the necessary information and logs regarding the single VPP use case.
Regarding why I am doing this:
We are trying to use NSM at my company for various use cases; one of them is allowing client VPNs to connect to the internet and to other branches of their offices securely via chained security functions (NSM composition).
Before jumping into all the chaining and the other use cases, I wanted to cover internet access and have a reliable benchmark between 1 NSC and 1 NSE, whether on the same machine or on different machines, to know how much traffic I can push before adding any other components in between.
In the process, I created an NSE, which acts as the internet gateway.
To gain internet access from the NSE, I set it as hostNetwork, created a VETH pair and added an AF_PACKET interface and configured it to go to the internet via the default gateway.
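A hedged sketch of that plumbing; the interface names and gateway address are assumptions, with the host-side commands shown as comments and the NSE's VPP configuration below them:

```
comment { host side (NSE runs with hostNetwork), e.g.: }
comment {   ip link add vpp-out type veth peer name vpp-in }
comment {   ip link set vpp-out up && ip link set vpp-in up }
comment { NSE VPP side: }
create host-interface name vpp-in
set interface state host-vpp-in up
ip route add 0.0.0.0/0 via 172.31.0.1 host-vpp-in
```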
These are my findings so far regarding simple use case of 1 NSC -> 1 NSE that goes to internet:
EKS 1.22 on AWS c5n.2xlarge nodes
- With all the default NSM configurations, 1 core to the NSE, 1 core to each forwarder and 1 core to the NSC
- Performance was in the region of XXX kbps.
- I noticed the ping between 2 memifs took 80ms, which made me suspect the CPU, so I raised the NSE to 3 cores and configured its local VPP with 1 main core and 2 workers, raised the forwarders to 3 cores as well with the same VPP config, and raised the NSC to 3 cores.
- After that, I ran speedtest-cli and got 1-1.4Gbps download and 0.6-1Gbps upload.
- I decided to add a 2nd NIC with an elastic IP to each machine, installed DPDK, bound the NIC with the vfio-pci driver and mounted /dev/vfio to the NSE.
- Afterwards, I ran speedtest-cli again, which got me 1.6-2.4Gbps download and 1.2-1.8Gbps upload.
- Next up, I thought that if I add all data-plane components with VPP + DPDK, I will get maximum performance, assuming I have enough cores. The issue with running separate VPPs is that as soon as you bind one VPP + DPDK to a NIC, no other VPP can reuse that NIC. Attaching a NIC per forwarder plus one per NSE seems too expensive, not only cost-wise but also CPU-wise: due to the polling-mode driver I would lose 1 core per rx queue on each VPP + DPDK instance. So I tried to run a single VPP as a daemon-set with 1 DPDK vfio-bound NIC per node and run the forwarder and all other VPP-related workloads against it. I hit the abstract socket issue (which I didn't even realize until you helped me with it), and I also saw a lot of interfaces being created and never cleaned up, so I discarded the idea and stuck with only mounting /dev/vfio to the NSE that acts as the internet gateway.
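For completeness, the cores/DPDK tuning described above maps to a startup.conf roughly like this; the PCI address and core numbers are assumptions, not values from this setup:

```
cpu {
  main-core 1
  corelist-workers 2-3      # 1 main core + 2 workers, as described above
}
dpdk {
  dev 0000:00:06.0          # the vfio-pci-bound 2nd NIC (example address)
  uio-driver vfio-pci       # NIC bound beforehand, e.g. dpdk-devbind.py --bind=vfio-pci 0000:00:06.0
}
```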
Hope this answers your questions.
Thanks