networkservicemesh / deployments-k8s
License: Apache License 2.0
Logs:
Build1:
logs1
Build2:
logs2
After investigating, I found that CI on kind went red with these changes: 7b34262...7b7cb58
The problem might be related to
Add Helm chart-based deployment examples.
For example, a basic deployment with nsmgr, vpp-forwarder and SPIRE.
Cover the following heal scenarios with integration tests:
Good afternoon,
With this new approach for NSM, I have noticed that you have uploaded several examples of connectivity between an NSC and an NSE. However, there is no example of how to interconnect different NSEs to provide a bigger, more complex service with NSM.
If possible, would you consider adding an example like that to this repository? In my humble opinion, it would be a very positive addition to the examples that are already present.
Thanks!
In the old generation of NSM there was an nsm-spire sidecar image for handling SPIRE configuration. Is there any plan to create it for the next generation as well? Or is there a recommended way to handle SPIRE workload registration based on config files, without entry create commands?
We need to add resource limits for each app in https://github.com/networkservicemesh/deployments-k8s/tree/main/apps
Previously we fixed issues with refresh/timeout:
networkservicemesh/sdk#778
networkservicemesh/sdk#650
networkservicemesh/sdk#520
and now we can reduce the token expiration for each application to 10 minutes (it is currently 24h).
Also, we plan to add refresh/timeout examples, but that can be done separately:
https://github.com/orgs/networkservicemesh/projects/1#card-55928687
https://github.com/orgs/networkservicemesh/projects/1#card-55928794
Hi!
When I set the NSM_NAME parameter of cmd-nsc-vpp to anything other than the default, I get this error:
Jun 17 16:19:42.714 [ERRO] [cmd:/bin/cmd-nsc-vpp] (19.1) proxyListener unable to listen on /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: listen unixpacket /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: bind: invalid argument
The interface seems to be in place for a second, but there are no neighbors and the pod keeps restarting.
vppctl show interface address
local0 (dn):
memif1/0 (up):
L3 172.16.1.96/32
I followed this guide: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/use-cases/Memif2Memif
If I don't set the NSM_NAME parameter, it works correctly.
Is it possible that the given name is not handled correctly somewhere, or could you point me to where to look for the issue?
Could you add some example Roles and ClusterRoles?
I need to define Roles/ClusterRoles for cmd-registry-memory, cmd-nsmgr and cmd-forwarder-vpp, but I have no information about exactly which resources and verbs each of them needs. An example YAML would be very welcome.
Please switch all examples to use the annotations instead of a bare NSC container patched in with an ENV variable.
part of #1174
Currently, all examples pass with a correct token chain. To check that an invalid token chain fails, we could add an example with an NSC that tries to request a service with an expired token.
This would show users how the OPA policies work and help us track regressions in OPA-related code.
Currently, we have examples for:
kernel2kernel
memif2memif
kernel2vxlan2kernel
https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/basic#includes
These variants are currently not covered by integration tests:
kernel2memif
memif2kernel
kernel2vxlan2memif
memif2vxlan2kernel
memif2memif
PostgreSQL should be correctly installed.
PostgreSQL is not installed on alpine, so TestWebhook failed.
Sometimes apk update can return the following errors:
ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.13/main: temporary error (try again later)
ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.13/community: temporary error (try again later)
Use the postgres container as a client instead of alpine.
Hello, is there a plan to provide a VPN example? Our team was referring to https://networkservicemesh.io/docs/examples/vpn/ and it no longer seems to be available. We also noticed that SFC is no longer listed as a feature on the NSM website. Apologies if this has already been addressed. It would be great to get some help with this.
Use cases:
Test scenarios:
3d
The application uses only the quantity of resources it needs
VPP apps reserve more than they need.
If we specify a container's limits but not its requests, the requests are set to match the limits.
Currently, we use a shared certificate for two SPIRE servers to make NSM work across two domains, and we generate the certificate and key with openssl. This can be improved by reworking the SPIRE deployments and SPIRE examples to use the SPIRE federation feature.
Migrate the secure-intranet example to MD tests for multi repo.
secure-intranet - https://github.com/networkservicemesh/examples/tree/master/examples/secure-intranet
Depends on networkservicemesh/sdk#871, networkservicemesh/sdk#878, networkservicemesh/sdk#879, networkservicemesh/sdk#880.
Depends on networkservicemesh/cmd-nse-firewall-vpp#1.
Heal never starts, or stops right after client pod deletion, or at the latest a few minutes after client deletion.
Sometimes heal works indefinitely.
[ERRO] [cmd:Nsmgr] [healServer:processHeal] Failed to heal connection cbdd4f32-354d-4983-9caf-92ae5fc42f53: no match endpoints or all endpoints fail: context deadline exceeded
I got this issue after running scalability tests (they had not been uploaded anywhere when this issue was created). These tests are basically Kernel2Kernel tests on steroids: they create many clients and endpoints, and each client makes many requests instead of 1, so they should be functionally identical to running the Kernel2Kernel test many times.
deployments-k8s git revision 18fdb9c (though I believe I also saw this exact issue on a revision from 2 weeks ago).
Warning, 50 MB of logs: nsmgr-2021-07-02T14.01.27+07.00.zip
These are the full nsmgr logs from several scalability tests and some time after the last test ended. I added a delay after each test to check whether heal works. After the first few tests there were only register requests from the forwarder, but some time after the last test ended I discovered that the logs contain a lot of heal requests.
Because the tests make a lot of requests (50 during the last test, 50-100 during previous tests), the logs are not pretty.
Use cases:
Test scenarios:
8d
Hi,
I'm trying to deploy the memif2vxlan2memif example but get strange errors in the local nsmgr of the NSC. On one machine the connection is fine; on the other the NSC gets no response, but the error appears in both attempts.
I've attached the logs from the NSC and the NSMGR in both cases (bad and working). Could you please take a look at them?
bad-memif2vxlan2memif-nsc.txt
bad-memif2vxlan2memif-nsmgr.txt
working-memif2vxlan2memif-nsc.txt
working-memif2vxlan2memif-nsmgr.txt
The difference between the machines is the k8s version:
Working machine:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"ec2760d6d916781de466541a6babb4309766c995", GitTreeState:"clean", BuildDate:"2021-02-27T17:18:03Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
NOT working machine:
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:10:43Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-28T05:33:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Thanks!
Currently, interdomain only has Kernel2Vxlan2Kernel; we should test the other permutations as well.
Why is:
Still being done with a single match rather than two matches as specified:
I thought we had fixed this already?
Currently this repository uses the spec.nodeSelector[kubernetes.io/hostname] label for defining node affinity.
This label is equivalent to spec.nodeName, unless the label was deliberately changed.
Using the spec.nodeName field for node affinity would be simpler.
Replace this:
spec:
  nodeSelector:
    kubernetes.io/hostname: ${NODES[0]}
with this:
spec:
  nodeName: ${NODES[0]}
Implement/document a test scenario for running and configuring at least two different types of forwarders simultaneously.
For instance, the VPP forwarder alongside the SR-IOV forwarder.
We could create a single cmd-${TODO: consider name}-nse that parses a config written in Caddyfile format. The main point is to make it possible to build NSEs from the SDK without building a new app. This could be used for testing (we would no longer add a new cmd repo for each new typical NSE) and by users who do not want to code an NSE and just want to play with NSM.
To create a copy of the current cmd-icmp-responder we could use this config:
my-endpoint {
    point2pointipam
    recvfd
    mechanisms {
        'kernel' {
            kernel
        }
    }
    dnscontext
    sendfd
}
To create a copy of the current cmd-icmp-responder-vpp we could use this config:
my-vpp-endpoint {
    point2pointipam
    mechanisms {
        'memif' {
            sendfd
            up
            connectioncontext
            tag
            memif
        }
    }
}
Let me know if this direction is of interest; if so, I can provide more technical details on how to achieve it :)
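As a rough illustration of the first step such an NSE would need, here is a minimal Go sketch that parses a config like the two above into a tree of directives. It is a hand-written tokenizer for illustration only, not Caddy's own Caddyfile parser (which a real implementation might reuse instead); Directive and Parse are hypothetical names, and mapping each name (point2pointipam, recvfd, kernel, ...) to the corresponding SDK chain element is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Directive is one config entry: a chain element name (or a mechanism
// selector such as 'kernel') with an optional nested block.
type Directive struct {
	Name     string
	Children []*Directive
}

// tokenize splits the config into names and braces and strips the quotes
// around names such as 'kernel'.
func tokenize(src string) []string {
	src = strings.NewReplacer("{", " { ", "}", " } ").Replace(src)
	toks := strings.Fields(src)
	for i, t := range toks {
		toks[i] = strings.Trim(t, `'"`)
	}
	return toks
}

// parseBlock reads directives until the matching '}' (or end of input at
// depth 0) and returns them with the index of the next unread token.
func parseBlock(toks []string, i, depth int) ([]*Directive, int, error) {
	var out []*Directive
	for i < len(toks) {
		switch tok := toks[i]; tok {
		case "}":
			if depth == 0 {
				return nil, i, fmt.Errorf("unexpected '}' at token %d", i)
			}
			return out, i + 1, nil
		case "{":
			if len(out) == 0 {
				return nil, i, fmt.Errorf("'{' without a name at token %d", i)
			}
			children, next, err := parseBlock(toks, i+1, depth+1)
			if err != nil {
				return nil, next, err
			}
			out[len(out)-1].Children = children
			i = next
		default:
			out = append(out, &Directive{Name: tok})
			i++
		}
	}
	if depth > 0 {
		return nil, i, errors.New("missing '}' at end of input")
	}
	return out, i, nil
}

// Parse turns a Caddyfile-style config into a tree of directives.
func Parse(src string) ([]*Directive, error) {
	dirs, _, err := parseBlock(tokenize(src), 0, 0)
	return dirs, err
}

// dump prints the tree, indenting nested blocks.
func dump(dirs []*Directive, indent string) {
	for _, d := range dirs {
		fmt.Println(indent + d.Name)
		dump(d.Children, indent+"  ")
	}
}

func main() {
	cfg := `my-endpoint {
    point2pointipam
    recvfd
    mechanisms {
        'kernel' {
            kernel
        }
    }
    dnscontext
    sendfd
}`
	dirs, err := Parse(cfg)
	if err != nil {
		panic(err)
	}
	dump(dirs, "")
}
```

Running it on the cmd-icmp-responder config prints the element names with each nested block indented one level deeper, and unbalanced braces are reported as errors instead of being silently accepted.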
Currently in deployments-k8s we manually put the NSC containers into the Pods. We should switch to doing that with an admission webhook controller once networkservicemesh/cmd-admission-webhook-k8s#1 is complete.
When we set the replicas value for an nse-kernel deployment to more than 1, we create several identical endpoints (endpoints with the same configuration).
When a client makes several requests and these requests go to different endpoints with the same configuration, those endpoints can give the client the same IP addresses, and the second request overwrites the connection created by the first.
While we could call this an invalid configuration in the case of completely distinct endpoints, from the user side there is nothing to be done when using replicas, so this must be a valid configuration.
The issue is present when the following preconditions are met:
When running scalability tests, I slightly modified the icmp responder to register itself for several services, set 2 replicas, and added many connections to the client config.
I immediately ran into instability in the tests: sometimes clients got all required connections, sometimes they were a few connections short. As explained above, this was caused by 1 client going to 2 servers and getting the same IP for 2 distinct connections, so the second connection overwrote the first.
We have the excluded_prefixes field in the request connection context, which we can use to pass the list of occupied IP addresses so that the endpoint doesn't try to overwrite them.
However, this solution assumes that we know occupied addresses beforehand.
But if we were to make requests from several threads, we wouldn't know which addresses could be taken by other threads.
I'm not sure if it would be possible to solve this without some kind of global mutex in the NSC, which would prevent parallel requests.
Maybe we could have some synchronization in the moment of creating a connection.
At the time of writing this I haven't researched how hard it would be to add such synchronization, and which exact components we would need to change or create.
Also, I think there may be some method of non-destructive interface assignment, so that we would get an error when trying to use an already occupied IP address instead of silently overwriting the previous connection.
We could add some synchronization for endpoints.
Imagine we had some kind of registry, that would hold the data of already used IP prefixes.
Endpoints could query this registry on startup, so that when we create several replicas of an endpoint, each instance gets its own IP prefix and the replicas no longer share the same config.
Solution 1 seems simple but not universal to me.
Solution 2 seems very promising, but I have yet to verify how hard (if at all possible) it would be to implement.
Solution 3 would probably require quite a lot of changes inside all of the endpoints, and we would also introduce a new concept, which would increase complexity of the system, and I'm really not sure it is justified.
Use cases:
Test scenarios:
3d
Use cases:
Test scenarios:
5d
Use cases:
Test scenarios:
1d
stderr F Jun 17 10:50:48.232 [ERRO] [cmd:[/bin/app]] [healServer:processHeal] Failed to heal connection alpine-cl-0: Error returned from sdk/pkg/networkservice/common/authorize/authorizeClient.Request: rpc error: code = PermissionDenied desc = no sufficient privileges
Actual:
Expected:
The NSM interface should not be deleted if the data plane and control plane are fine.
Add examples for iperf testing.
iperf is usually run as a client and a server.
The iperf client can be run like any other workload using the NSC, similar to Kernel2Kernel or Kernel2Vxlan2Kernel, but with an iperf client added as a container to the Pod spec for nsc-kernel and an iperf server added to the Pod spec for nse-kernel.
Please note: you can use gotestmd to build Go-based tests in integration-tests and run them in integration-k8s-kind. This way you can simply document how to use iperf, and the tests will be generated automatically from that documentation.
While integration-k8s-kind isn't tuned for performance, it's a very fast environment to get going in while developing those tests.
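A generated test would also need a pass/fail signal from the iperf run. Assuming iperf3, whose -J flag prints a JSON report that, for a TCP test, includes end.sum_received.bits_per_second, a minimal Go helper could extract the receiver-side throughput. How the output is collected from the client container (for example with kubectl exec) is not shown, receivedMbps is a hypothetical name, and the sample in main is a trimmed, made-up report.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// iperfResult picks out the only field this sketch needs from the JSON that
// iperf3 prints with -J for a TCP test.
type iperfResult struct {
	End struct {
		SumReceived struct {
			BitsPerSecond float64 `json:"bits_per_second"`
		} `json:"sum_received"`
	} `json:"end"`
}

// receivedMbps returns the receiver-side throughput in Mbit/s.
func receivedMbps(out []byte) (float64, error) {
	var r iperfResult
	if err := json.Unmarshal(out, &r); err != nil {
		return 0, err
	}
	if r.End.SumReceived.BitsPerSecond <= 0 {
		return 0, errors.New("no sum_received throughput in iperf3 output")
	}
	return r.End.SumReceived.BitsPerSecond / 1e6, nil
}

func main() {
	// Trimmed, made-up sample; real iperf3 -J output has many more fields.
	sample := []byte(`{"end":{"sum_received":{"bits_per_second":942000000}}}`)
	mbps, err := receivedMbps(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.0f Mbit/s\n", mbps) // 942 Mbit/s
}
```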
cmd-nse-supplier-k8s in apps/
Note: the network service for cmd-nse-supplier-k8s should be applied via kubectl.
networkservicemesh/cmd-nse-icmp-responder#133
networkservicemesh/sdk#825
We need to add examples of using DNS with NSM.
7h
Use cases:
Test scenarios:
6d
Why is the NS name appearing twice in the yaml
and
?
I would expect the NS name to be mapped into the metadata.name...
In the remote forwarder healing scenario, when we kill the remote forwarder and then, after it is restored, try to ping from the NSC to the NSE again, the ping fails.
part of #1174
When running scalability tests (not yet uploaded) I found an issue with heal working indefinitely.
When investigating it, I found that one of the reasons this was happening was that nsmgr never received Close for some of the connections.
Logs: logs.zip
There are many connections in these logs. You can use grep 79b30cf1-1629-440c-8507-7e535d60295d to extract the part of the logs that shows the issue.