networkservicemesh / deployments-k8s

License: Apache License 2.0

Shell 100.00%

networking nsm network examples cncf service-mesh containers networkservicemesh

deployments-k8s's Issues

TestKernel2Vxlan2Memif became unstable after the last merges

Logs:
Build1:
logs1
Build2:
logs2

Potential problem:

My investigation shows that CI in kind turned red on these changes: 7b34262...7b7cb58

The problem might be related to

Request for helm based installation examples

Add helmchart based deployment examples

For example, a basic deployment with nsmgr, vpp-forwarder, spire

Destructive Chaos Testing: Integration tests (multi death cases)

Overview

Cover the following heal scenarios with integration tests:

Local NSMgr(r) +
1. Local Endpoint(d)
2. Remote NSMgr(d)
3. Remote Forwarder(d)
4. Remote Endpoint(d)
5. Registry(r)
Remote NSMgr(d) +
1. Remote Endpoint(d)
2. Registry(d)

Blockers

extended sandbox testing - networkservicemesh/sdk#899

Add example or explanation on how to interconnect ns endpoints (nse)

Good afternoon:

With this new approach for NSM, I have noticed that you have uploaded several cases of connectivity between NSC and NSE. However, there is no example on how to interconnect different NSEs in order to perform a bigger/more complex service with NSM in this case.

If possible, do you consider adding an example of such characteristics to this repository? In my humble opinion, I think it could be a very positive addition to the examples that are already present.

Thanks!

nsm-spire image

In old-gen NSM there was an nsm-spire sidecar image for handling spire configuration. Is there any plan to create it for the next gen as well? Or is there a recommended way to handle spire workload registration based on config files, without entry create commands?

Add resource limits on all applications

We need to add resource limits for each app in https://github.com/networkservicemesh/deployments-k8s/tree/main/apps

Reduce tokens time from 24h to 10 minutes for each application

Motivation

Previously we've fixed issues with refresh/timeout:

networkservicemesh/sdk#778
networkservicemesh/sdk#650
networkservicemesh/sdk#520

and now we can reduce tokens expiration for each application to 10 minutes (it is 24h at this moment).

Also, we plan to add refresh/timeout examples, but it can be done separately:
https://github.com/orgs/networkservicemesh/projects/1#card-55928687
https://github.com/orgs/networkservicemesh/projects/1#card-55928794

cmd-nsc-vpp: issues with NOT default NSM_NAME

Hi!

When I set the NSM_NAME parameter of cmd-nsc-vpp to anything other than the default, I get this error:
Jun 17 16:19:42.714 [ERRO] [cmd:/bin/cmd-nsc-vpp] (19.1) proxyListener unable to listen on /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: listen unixpacket /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: bind: invalid argument

The interface seems to be in place for a second, but there are no neighbors and the pod keeps restarting.
vppctl show interface address
local0 (dn):
memif1/0 (up):
L3 172.16.1.96/32

I followed this guide: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/use-cases/Memif2Memif
If I don't set the NSM_NAME parameter, it works correctly.

Is it possible that the given name is not handled correctly somewhere, or could you help me on where to look for issues?

Request for information about Roles and ClusterRoles

Could you add some example Roles and ClusterRoles?
I need to define Roles/ClusterRoles for the cmd-registry-memory, cmd-nsmgr and cmd-forwarder-vpp and have no info about exactly what resources and verbs are needed for each of them. An example yaml would be very welcome.

Switch to using annotation for all kernel examples

Please switch all examples to use the annotations instead of just a bare NSC container with an ENV variable patched in.

Destructive Chaos Testing: local forwarder healing test

part of #1174

BasicSuite/TestMemif2Vxlan2Memif is not stable

Build

https://github.com/networkservicemesh/integration-k8s-kind/runs/2216228221

Logs

Containers logs.zip

Add OPA example

Description

Currently, all examples pass with a correct token chain. To check that an invalid token chain fails, we could add an example in which an NSC tries to request a service with an expired token.
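
A minimal sketch of the check such an example would exercise, assuming a token chain is a list of path segments that each carry an expiry timestamp. `Segment`, `chain_is_valid`, and the field names are hypothetical; they do not mirror the OPA policies actually shipped with NSM.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Segment:
    """One hop of a connection path; both fields are illustrative."""
    name: str
    expires: datetime


def chain_is_valid(chain: list[Segment], now: datetime) -> bool:
    """Accept the chain only if it is non-empty and no token has expired."""
    return bool(chain) and all(seg.expires > now for seg in chain)


now = datetime(2021, 1, 1, tzinfo=timezone.utc)
good = [Segment("nsc", now + timedelta(minutes=10)),
        Segment("nsmgr", now + timedelta(minutes=10))]
# Same chain, but the NSC presents a token that expired a minute ago.
bad = [Segment("nsc", now - timedelta(minutes=1)), good[1]]

print(chain_is_valid(good, now))  # True
print(chain_is_valid(bad, now))   # False
```

A negative example in the repository would correspond to the `bad` case: the request is expected to be rejected rather than to succeed.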

Motivation

This would show users how OPA policies work and would help us track regressions in OPA-related code.

Cover all variants of connection between nsc, nse via cmd-forwarder-vpp

Currently, we have examples for:

kernel2kernel
memif2memif
kernel2vxlan2kernel

https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/basic#includes

These variants are currently not covered by integration tests:

kernel2memif
memif2kernel
kernel2vxlan2memif
memif2vxlan2kernel
memif2memif

TestRunBasicSuite/TestMemif2Vxlan2Memif can fail on ci

Logs

Container logs

Alpine container for webhook can fail on update

Expected Behavior

PostgreSQL should be installed correctly

Current Behavior

PostgreSQL is not installed in the Alpine container, so TestWebhook fails

Failure Information (for bugs)

Sometimes apk update returns the following errors:

ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.13/main: temporary error (try again later)
ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.13/community: temporary error (try again later)

Proposal

Use postgres container as a client instead of alpine

Include cmd-exclude-prefixes-k8s

Add https://github.com/networkservicemesh/cmd-exclude-prefixes-k8s to the apps.
Deploy cmd-exclude-prefixes-k8s with all examples
Provide examples to show cmd-exclude-prefixes-k8s working correctly, and exclude prefixes being excluded from IP ranges

VPN example and SFC topic on NSM

Hello, is there a plan to provide a VPN example? Our team was referring to https://networkservicemesh.io/docs/examples/vpn/ and it seems to be no longer available. We also noticed that SFC is no longer listed as a feature on the NSM website. Apologies if this has been described already. It would be great to get some help on this.

Scalability Testing: Decompose Forwarder testing

Test plan

Use cases:

NSMgr sends request to Forwarder.
NSMgr sends refresh request to Forwarder.
NSMgr closes connection on Forwarder.
Forwarder closes expired connection with NSMgr.

Test scenarios:

NSMgr sends request to Forwarder:
- Setup
  1. Start fake NSMgr (handling both Client/Remote and Endpoint/Remote sides).
  2. Start Forwarder.
- Test
  1. Request C connections with Cm Client mechanism and Em Endpoint mechanism.
NSMgr sends refresh request to Forwarder:
- Setup
  1. Start fake NSMgr.
  2. Start Forwarder.
  3. Request C connections with Cm Client mechanism and Em Endpoint mechanism with T refresh time.
- Test
  1. Wait T time for refreshes.
NSMgr closes connection on Forwarder:
- Setup
  1. Start fake NSMgr.
  2. Start Forwarder.
  3. Request C connections with Cm Client mechanism and Em Endpoint mechanism.
- Test
  1. Close connections.
Forwarder closes expired connection with NSMgr:
- Setup
  1. Start fake NSMgr.
  2. Start Forwarder.
  3. Request C connections with Cm Client mechanism and Em Endpoint mechanism with T expiration time.
- Test
  1. Wait T time for closes.

Tasks

Create fake NSMgr CMD.
Estimation: 3h
Create MD integration test for (1) test scenario with C, Cm, Em variable parameters and measure time, CPU, memory usage for Forwarder during the test.
Estimation: 1d
Create MD integration test for (2) test scenario with C, Cm, Em, T variable parameters and measure time, CPU, memory usage for Forwarder during the test.
Estimation: 3h
Create MD integration test for (3) test scenario with C, Cm, Em variable parameters and measure time, CPU, memory usage for Forwarder during the test.
Estimation: 3h
Create MD integration test for (4) test scenario with C, Cm, Em, T variable parameters and measure time, CPU, memory usage for Forwarder during the test.
Estimation: 3h

Estimation

3d

Add resource requests for vpp-applications

Expected Behavior

The application uses only the quantity of resources it needs

Current Behavior

Vpp apps reserve more than they need

Context

If we specify a container's limits, but not its request, then request is set to match its limit.
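
The defaulting rule above can be sketched as a small function. This is an illustration of the rule only, assuming no admission-time mechanism (for example a LimitRange) supplies a different default request; `effective_requests` and the resource values are hypothetical and are not taken from the vpp apps in this repository.

```python
def effective_requests(resources: dict) -> dict:
    """Return the requests Kubernetes would apply for one container.

    For every resource that has a limit but no explicit request,
    the request defaults to the limit.
    """
    limits = resources.get("limits", {})
    requests = dict(resources.get("requests", {}))
    for name, value in limits.items():
        requests.setdefault(name, value)
    return requests


# Illustrative values only.
only_limits = {"limits": {"cpu": "1", "memory": "1Gi"}}
with_requests = {
    "limits": {"cpu": "1", "memory": "1Gi"},
    "requests": {"cpu": "200m", "memory": "256Mi"},
}

print(effective_requests(only_limits))    # {'cpu': '1', 'memory': '1Gi'}
print(effective_requests(with_requests))  # {'cpu': '200m', 'memory': '256Mi'}
```

The first case is the current behavior described in this issue: the container reserves its full limit. The second case is what adding explicit requests would achieve.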

Automerge job can get stuck and not merge green PRs from NSMBot

Problem

Actual:

Expected: All green PRs from NSMBot should be merged.

We need to investigate and fix this.

Use spire federation in interdomain examples

Description

Currently, we share one certificate between two spire servers to make NSM work across two domains, and we generate the certificate and key with openssl. This can be improved by reworking the spire deployments and spire examples to use the spire federation feature.

Implementation details

Remove the use of openssl from the spire examples: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/spire
Apply https://spiffe.io/docs/latest/architecture/federation/readme/
Test that all examples are still working on CI.

TestRunMemorySuite/TestKernel2Vxlan2Kernel can fail on ci

Build

https://github.com/networkservicemesh/integration-k8s-kind/runs/2338377317

Logs

Containers logs (6).zip

NSE Composition: MD integration tests

Overview

Migrate secure-intranet example to MD tests for multi repo.

References

secure-intranet - https://github.com/networkservicemesh/examples/tree/master/examples/secure-intranet

Blockers

Depends on ~~networkservicemesh/sdk#871, networkservicemesh/sdk#878, networkservicemesh/sdk#879,~~ networkservicemesh/sdk#880.
Depends on networkservicemesh/cmd-nse-firewall-vpp#1.

Sometimes heal continues working indefinitely

Expected Behavior

Heal never starts, or stops right after client pod deletion, or at the very least a few minutes after client deletion.

Current Behavior

Sometimes heal works indefinitely.

Steps to Reproduce

Deploy clients and endpoints.
Remove everything.
Wait a few minutes.
Check logs.
Repeat until you see messages like [ERRO] [cmd:Nsmgr] [healServer:processHeal] Failed to heal connection cbdd4f32-354d-4983-9caf-92ae5fc42f53: no match endpoints or all endpoints fail: context deadline exceeded

I got this issue after running scalability tests (they are not uploaded anywhere at the moment of creating this issue). These tests are basically Kernel2Kernel tests on steroids: they create many clients and endpoints, and each client makes many requests instead of one, so they should be functionally identical to running the Kernel2Kernel test many times.

Context

deployments-k8s git revision 18fdb9c (though, I believe I also saw this exact issue on a revision from 2 weeks ago).

Failure Logs

Warning, 50 MB of logs: nsmgr-2021-07-02T14.01.27+07.00.zip
These are full nsmgr logs covering several scalability tests and some time after the last test ended. I added a delay after each test to check whether heal works. After the first few tests there were only register requests from the forwarder, but some time after the last test ended I discovered that the logs contain a lot of heal requests.
The tests make a lot of requests (50 during the last test, 50-100 during previous tests), so the logs are not pretty.

Destructive Chaos Testing: Integration tests

Overview

Cover the following heal scenarios with integration tests:

Local NSMgr(r) +
1. none - #1789
2. Local Endpoint(d) - moved to #1928
3. Remote NSMgr(r/d) - moved to #1928
4. Remote Forwarder(d) - moved to #1928
5. Remote Endpoint(d) - moved to #1928
Local Forwarder(d) - #1789
Local Endpoint(r/d) - #1789
Remote NSMgr(r/d) +
1. none - #1789
2. Remote Endpoint(d) - moved to #1928
Remote Forwarder(d) - #1789
Remote Endpoint(r/d) - #1789
Registry(r) - #1789

Blockers

sandbox testing - networkservicemesh/sdk#898

Scalability Testing: Decompose NSMgr testing

Test plan

Use cases:

Endpoint registers on NSMgr.
Endpoint updates itself on NSMgr.
NSMgr unregisters expired Endpoint from itself.
Forwarder registers on NSMgr.
Forwarder updates itself on NSMgr.
NSMgr unregisters expired Forwarder from itself.
Client sends request to NSMgr.
Client sends refresh request to NSMgr.
Client closes connection on NSMgr.

Test scenarios:

Endpoint registers on NSMgr:
- Setup
  1. Start fake Registry (just returning OK to all Register/Unregister events).
  2. Start NSMgr.
- Test
  1. Start E Endpoints each registering itself on NSMgr.
Endpoint updates itself on NSMgr:
- Setup
  1. Start fake Registry (just returning OK to all Register/Unregister events).
  2. Start NSMgr.
  3. Start E Endpoints each registering itself on NSMgr with T update time.
- Test
  1. Wait T time for updates.
NSMgr unregisters expired Endpoint from itself:
- Setup
  1. Start fake Registry (just returning OK to all Register/Unregister events).
  2. Start NSMgr.
  3. Start E Endpoints each registering itself on NSMgr with T expiration time.
- Test
  1. Wait T time for unregisters.
Forwarder registers on NSMgr:
- (1)
Forwarder updates itself on NSMgr:
- (2)
NSMgr unregisters expired Forwarder from itself:
- (3)
Client sends request to NSMgr:
- Setup
  1. Start NSMgr.
  2. Start E Endpoints each registering itself on NSMgr.
- Test
  1. Start C Clients each requesting for E network services (Endpoints).
Client sends refresh request to NSMgr:
- Setup
  1. Start NSMgr.
  2. Start E Endpoints each registering itself on NSMgr.
  3. Start C Clients each requesting for E Endpoints with T refresh time.
- Test
  1. Wait T time for refreshes.
Client closes connection on NSMgr:
- Setup
  1. Start NSMgr.
  2. Start E Endpoints each registering itself on NSMgr.
  3. Start C Clients each requesting for E Endpoints.
- Test
  1. Cancel Clients.
NSMgr closes expired connection with Client:
- Setup
  1. Start NSMgr.
  2. Start E Endpoints each registering itself on NSMgr.
  3. Start C Clients each requesting for E Endpoints with T expiration time.
- Test
  1. Wait T time for closes.
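
The update and expiration scenarios above all depend on the same timing behavior: an entry that is refreshed before T elapses survives, and one that is not gets dropped. A minimal sketch of that behavior, assuming a toy in-memory registry; `ExpiringRegistry` and its methods are hypothetical and are not part of the NSM SDK.

```python
class ExpiringRegistry:
    """Toy registry: each entry lives until its expiration time passes."""

    def __init__(self) -> None:
        self._expires_at: dict[str, float] = {}

    def register(self, name: str, now: float, ttl: float) -> None:
        """Register or refresh an entry; a refresh simply moves the deadline."""
        self._expires_at[name] = now + ttl

    def unregister_expired(self, now: float) -> list[str]:
        """Drop entries whose deadline has passed and return their names."""
        expired = sorted(n for n, t in self._expires_at.items() if t <= now)
        for name in expired:
            del self._expires_at[name]
        return expired


T = 10.0  # expiration time in seconds, illustrative
registry = ExpiringRegistry()
registry.register("nse-a", now=0.0, ttl=T)
registry.register("nse-b", now=0.0, ttl=T)
registry.register("nse-a", now=8.0, ttl=T)  # nse-a refreshes before T elapses

print(registry.unregister_expired(now=T))  # ['nse-b']
```

Passing `now` explicitly keeps the sketch deterministic; a real test would wait T on the wall clock and observe the unregister events instead.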

Tasks

Create fake Registry CMD (returns OK for all).
Estimation: 2h
Create MD integration test for (1) test scenario with E variable parameter and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 1d
Create MD integration test for (2) test scenario with E, T variable parameters and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 3h
Create MD integration test for (3) test scenario with E, T variable parameters and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 3h
Create MD integration test for (4) test scenario with E variable parameter and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 1d
Create MD integration test for (5) test scenario with E, T variable parameters and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 3h
Create MD integration test for (6) test scenario with E, T variable parameters and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 3h
Create fake Forwarder CMD (simply requests NSMgr).
Estimation: 2h
Create fake Endpoint CMD (returns OK for all).
Estimation: 2h
Create MD integration test for (7) test scenario with E, C variable parameters and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 1d
Create MD integration test for (8) test scenario with E, C, T variable parameters and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 3h
Create MD integration test for (9) test scenario with E, C variable parameters and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 3h
Create MD integration test for (10) test scenario with E, C, T variable parameters and measure time, CPU, memory usage for NSMgr during the test.
Estimation: 3h

Estimation

8d

TestRunHealSuite/TestRemote_nsmgr_restart is not stable

Build

v1.19.11
https://github.com/networkservicemesh/integration-k8s-kind/runs/3012604125?check_suite_focus=true

Logs

Containers logs_Remote_nsmgr_restart.zip

memif2vxlan2memif: errors in nsmgr, no response for nsc

Hi,

I'm trying to deploy the memif2vxlan2memif example but I'm getting strange errors in the local nsmgr of the NSC. On one machine the connection is fine; on the other the NSC does not get a response, but the error appears in both cases.

I post the logs from the NSC and the NSMGR in both cases (bad, working). Could you please take a look at them?
bad-memif2vxlan2memif-nsc.txt
bad-memif2vxlan2memif-nsmgr.txt
working-memif2vxlan2memif-nsc.txt
working-memif2vxlan2memif-nsmgr.txt

The difference between the machines is the k8s version:
Working machine:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"ec2760d6d916781de466541a6babb4309766c995", GitTreeState:"clean", BuildDate:"2021-02-27T17:18:03Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

NOT working machine:

Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:10:43Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-28T05:33:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Thanks!

Please add mechanism permutations for interdomain

Currently interdomain only has Kernel2Vxlan2Kernel, we should test the other permutations as well.

Scale from Zero not using multiple matches

Why is:

deployments-k8s/examples/features/scale-from-zero/autoscale-netsvc.yaml

Lines 1 to 17 in 838d4f0

---
apiVersion: networkservicemesh.io/v1
kind: NetworkService
metadata:
  name: autoscale-icmp-responder
  namespace: nsm-system
spec:
  payload: ETHERNET
  name: autoscale-icmp-responder
  matches:
    - source_selector:
      routes:
        - destination_selector:
            app: nse-icmp-responder
            nodeName: "{{.nodeName}}"
        - destination_selector:
            app: icmp-responder-supplier

Still being done with a single match rather than two matches as specified:

networkservicemesh/sdk#892

I thought we had fixed this already?

Use nodeName instead of kubernetes.io/hostname label

Overview

Currently this repository uses spec.nodeSelector[kubernetes.io/hostname] label for defining node affinity.
This label is equivalent to spec.nodeName, unless the label was deliberately changed.
Using spec.nodeName field for node affinity would be simpler.

Example:

Replace this:

spec:
  nodeSelector:
    kubernetes.io/hostname: ${NODES[0]}

with this:

spec:
  nodeName: ${NODES[0]}
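
The replacement above can be sketched as a transformation on a pod spec held as a plain dictionary. This is only an illustration, assuming the spec has already been parsed from YAML; `use_node_name` is a hypothetical helper, not a tool in this repository.

```python
import copy

HOSTNAME_LABEL = "kubernetes.io/hostname"


def use_node_name(pod_spec: dict) -> dict:
    """Return a copy of pod_spec pinned via nodeName instead of the hostname label."""
    spec = copy.deepcopy(pod_spec)
    selector = spec.get("nodeSelector", {})
    if HOSTNAME_LABEL in selector:
        spec["nodeName"] = selector.pop(HOSTNAME_LABEL)
        if not selector:
            # Drop the now-empty selector; keep it if other labels remain.
            spec.pop("nodeSelector")
    return spec


before = {"nodeSelector": {HOSTNAME_LABEL: "${NODES[0]}"}, "containers": []}
after = use_node_name(before)
print(after)  # {'containers': [], 'nodeName': '${NODES[0]}'}
```

One design point to keep in mind: a pod with `spec.nodeName` set is bound to that node directly and bypasses the scheduler, whereas `nodeSelector` still goes through scheduling.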

Request for example of running multiple simultaneous forwarders

Implement/document a test scenario for running and configuring at least two different types of forwarders simultaneously.

For instance the vpp forwarder alongside the sriov forwarder

[Question] Do we need to add Caddyfile based NSE?

Motivation

We could create one cmd-${TODO: consider name}-nse that parses a config in Caddyfile format. The main point is to make it possible to build NSEs based on the SDKs without building a new app. This could be used for testing (we would no longer add a new cmd repo for each typical NSE) and by users who do not want to write NSE code and just want to play with NSM.

Example

To create a copy of the current cmd-icmp-responder we could use this config:

my-endpoint {
   point2pointipam
   recvfd
   mechanisms {
      'kernel' {
            kernel
      }
   }
   dnscontext
   sendfd
}

To create a copy of the current cmd-icmp-responder-vpp we could use this config:

my-vpp-endpoint {
   point2pointipam
   mechanisms {
      'memif' {
        sendfd
        up
        connectioncontext
        tag
        memif
      }
   }
}

Let me know if this direction is of interest; then I can provide more technical details on how to achieve this :)

Apply Web admissions controller to deployments-K8s

Currently in deployments-k8s we are manually putting the nsc containers into the Pods... we should switch to doing that with a web admissions controller once:

networkservicemesh/cmd-admission-webhook-k8s#1

Is complete.

IP collisions when using more than 1 replica of icmp-responder

Overview

When we set replicas value for an nse-kernel deployment to more than 1, we create several identical endpoints (=endpoints with the same configuration).

When a client makes several requests, and these requests go to different endpoints with the same configuration, these endpoints can give the client the same IP addresses, and the second request will overwrite the connection of the first request.
While we could call this an invalid configuration in the case of completely distinct endpoints, there is nothing the user can do about it when using replicas, so this must be a valid configuration.

The issue is present when the following preconditions are met:

There are at least 2 endpoints with the same IP prefix in config
A client makes at least 2 requests, which can go to those endpoints
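
The two preconditions can be reproduced with a toy allocator. This is a sketch under stated assumptions: `PointToPointIpam` is a hypothetical stand-in for an endpoint's IPAM, it does not reproduce the SDK's actual allocation order, and the prefix is illustrative.

```python
import ipaddress


class PointToPointIpam:
    """Toy allocator: hands out consecutive /32 address pairs from one prefix."""

    def __init__(self, prefix: str) -> None:
        self._hosts = iter(ipaddress.ip_network(prefix).hosts())

    def allocate(self) -> tuple[str, str]:
        """Return (endpoint_ip, client_ip) for a new connection."""
        return str(next(self._hosts)), str(next(self._hosts))


# Two replicas created from the same Deployment share the same prefix.
replica_a = PointToPointIpam("172.16.1.0/24")
replica_b = PointToPointIpam("172.16.1.0/24")

first = replica_a.allocate()   # request 1 lands on replica A
second = replica_b.allocate()  # request 2 lands on replica B

print(first, second)
print("collision:", first == second)  # collision: True
```

Each replica keeps its own allocation state, so neither knows what the other has already handed out. Both requests therefore receive the same address pair.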

Context

When making scalability tests I slightly modified icmp responder to register itself for several services and set 2 replicas, and added many connections to client config.
I immediately stumbled upon instability in tests: sometimes clients were getting all required connections, sometimes they were few connections short. This was caused, as explained above, by the fact that 1 client went to 2 servers and got the same IP for 2 distinct connections, and second connection overwrote the first.

Possible solutions

Solution 1

We have the excluded_prefixes field in request connection context, which we can use to pass the list of occupied IP addresses, so the endpoint doesn't try to overwrite them.

However, this solution assumes that we know occupied addresses beforehand.
But if we were to make requests from several threads, we wouldn't know which addresses could be taken by other threads.
I'm not sure if it would be possible to solve this without some kind of global mutex in the NSC, which would prevent parallel requests.
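
A minimal sketch of Solution 1, assuming the endpoint skips every address covered by the prefixes the client reports as occupied. `allocate_pair` is a hypothetical function and the prefix is illustrative; only the idea of passing excluded prefixes comes from the text above.

```python
import ipaddress


def allocate_pair(prefix: str, excluded: list[str]) -> tuple[str, str]:
    """Return the first two host addresses of prefix not covered by excluded."""
    blocked = [ipaddress.ip_network(p) for p in excluded]
    free = (
        addr
        for addr in ipaddress.ip_network(prefix).hosts()
        if not any(addr in net for net in blocked)
    )
    return str(next(free)), str(next(free))


first = allocate_pair("172.16.1.0/24", excluded=[])
# The client reports the addresses it already holds on the next request.
second = allocate_pair("172.16.1.0/24", excluded=[f"{ip}/32" for ip in first])

print(first)   # ('172.16.1.1', '172.16.1.2')
print(second)  # ('172.16.1.3', '172.16.1.4')
```

The sketch is strictly sequential. Two concurrent requests built from the same stale `excluded` list would still receive identical addresses, which is the limitation described above.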

Solution 2

Maybe we could have some synchronization in the moment of creating a connection.
At the time of writing this I haven't researched how hard it would be to add such synchronization, and which exact components we would need to change or create.

Also I would think that maybe there is some method of non-destructive interface assignment, so that we would get an error on an attempt to use already occupied IP address instead of silently overwriting previous connection.

Solution 3

We could add some synchronization for endpoints.
Imagine we had some kind of registry, that would hold the data of already used IP prefixes.
Endpoints could query this registry on startup, so when we create several replicas of an endpoint, each instance would get its own IP prefix, so they just wouldn't have the same config.
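
A minimal sketch of Solution 3, assuming a central component that leases disjoint sub-prefixes of one pool to endpoint replicas at startup. `PrefixRegistry`, the pool, and the endpoint names are all hypothetical; no such registry exists in NSM at the time of this issue.

```python
import ipaddress


class PrefixRegistry:
    """Hypothetical registry that leases disjoint sub-prefixes of one pool."""

    def __init__(self, pool: str, new_prefix: int) -> None:
        self._free = iter(ipaddress.ip_network(pool).subnets(new_prefix=new_prefix))
        self._leases: dict[str, str] = {}

    def lease(self, endpoint_name: str) -> str:
        """Return the prefix for endpoint_name, allocating one on first use."""
        if endpoint_name not in self._leases:
            self._leases[endpoint_name] = str(next(self._free))
        return self._leases[endpoint_name]


registry = PrefixRegistry("172.16.0.0/16", new_prefix=24)
print(registry.lease("nse-kernel-0"))  # 172.16.0.0/24
print(registry.lease("nse-kernel-1"))  # 172.16.1.0/24
print(registry.lease("nse-kernel-0"))  # 172.16.0.0/24 again: same name, same lease
```

Because each replica receives its own prefix, their configurations differ and the collision cannot occur. The cost is the new shared component that every endpoint would have to query.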

Intermediate conclusion

Solution 1 seems simple but not universal to me.
Solution 2 seems very promising, but I'm yet to verify how hard (if possible) it would be to implement.
Solution 3 would probably require quite a lot of changes inside all of the endpoints, and we would also introduce a new concept, which would increase complexity of the system, and I'm really not sure it is justified.

Scalability Testing: Decompose Endpoint testing

Test plan

Use cases:

NSMgr sends request to Endpoint.
NSMgr sends refresh request to Endpoint.
NSMgr closes connection on Endpoint.
Endpoint closes expired connection with NSMgr.

Test scenarios:

NSMgr sends request to Endpoint:
- Setup
  1. Start fake NSMgr.
  2. Start Endpoint.
- Test
  1. Request C connections.
NSMgr sends refresh request to Endpoint:
- Setup
  1. Start fake NSMgr.
  2. Start Endpoint.
  3. Request C connections.
- Test
  1. Wait T time for refreshes.
NSMgr closes connection on Endpoint:
- Setup
  1. Start fake NSMgr.
  2. Start Endpoint.
  3. Request C connections.
- Test
  1. Close connections.
Endpoint closes expired connection with NSMgr:
- Setup
  1. Start fake NSMgr.
  2. Start Endpoint.
  3. Request C connections.
- Test
  1. Wait T time for closes.

Tasks

Create fake NSMgr CMD.
Estimation: 3h
Create MD integration test for (1) test scenario with C variable parameter and measure time, CPU, memory usage for Endpoint during the test.
Estimation: 1d
Create MD integration test for (2) test scenario with C, T variable parameters and measure time, CPU, memory usage for Endpoint during the test.
Estimation: 3h
Create MD integration test for (3) test scenario with C variable parameter and measure time, CPU, memory usage for Endpoint during the test.
Estimation: 3h
Create MD integration test for (4) test scenario with C, T variable parameters and measure time, CPU, memory usage for Endpoint during the test.
Estimation: 3h

Estimation

3d

Scalability Testing: Decompose Registry testing

Test plan

Use cases:

NSMgr registers Endpoint on Registry.
NSMgr updates Endpoint on Registry.
NSMgr unregisters Endpoint from Registry.
NSMgr creates a find request to Registry.
NSMgr creates a watching find request to Registry.
Registry unregisters expired Endpoint from itself.

Test scenarios:

NSMgr registers Endpoint on Registry:
- Setup
  1. Start Registry.
- Test
  1. Start N fake NSMgrs each registering E Endpoints.
NSMgr updates Endpoint on Registry:
- Setup
  1. Start Registry.
  2. Start N fake NSMgrs each registering E Endpoints with T update time.
- Test
  1. Wait T time for updates.
NSMgr unregisters Endpoint from Registry:
- Setup
  1. Start Registry.
  2. Start N fake NSMgrs each registering E Endpoints.
- Test
  1. Unregister endpoints.
NSMgr creates a find request to Registry:
- Setup
  1. Start Registry.
  2. Start N fake NSMgrs each registering E Endpoints.
- Test
  1. Create F find requests from each NSMgr to Registry.
NSMgr creates a watching find request to Registry:
- Setup
  1. Start Registry.
  2. Start N fake NSMgrs each registering E Endpoints.
- Test
  1. Create W watching find requests from each NSMgr to Registry.
Registry unregisters expired Endpoint from itself:
- Setup
  1. Start Registry.
  2. Start N fake NSMgrs each registering E Endpoints with T expiration time.
- Test
  1. Wait T time for unregisters.

Tasks

Create fake NSMgr CMD.
Estimation: 1d
Create MD integration test for (1) test scenario with N, E variable parameters and measure time, CPU, memory usage for memory, k8s Registry during the test.
Estimation: 1d
Create MD integration test for (2) test scenario with N, E, T variable parameters and measure time, CPU, memory usage for memory, k8s Registry during the test.
Estimation: 3h
Create MD integration test for (3) test scenario with N, E variable parameters and measure time, CPU, memory usage for memory, k8s Registry during the test.
Estimation: 3h
Create MD integration test for (4) test scenario with N, E, F variable parameters and measure time, CPU, memory usage for memory, k8s Registry during the test.
Estimation: 1d
Create MD integration test for (5) test scenario with N, E, W variable parameters and measure time, CPU, memory usage for memory, k8s Registry during the test.
Estimation: 3h
Create MD integration test for (6) test scenario with N, E, T variable parameters and measure time, CPU, memory usage for memory, k8s Registry during the test.
Estimation: 3h

Estimation

5d

Scalability Testing: Decompose Client testing

Test plan

Use cases:

Client sends request to NSMgr.
Client sends refresh request to NSMgr.

Test scenarios:

Client sends request to NSMgr:
- Setup
  1. Start fake NSMgr (same as fake Endpoint from #1016).
- Test
  1. Start Client requesting C connections.
Client sends refresh request to NSMgr:
- Setup
  1. Start fake NSMgr.
  2. Start Client requesting C connections with T refresh time.
  3. Wait some time before T.
- Test
  1. Wait time until T for refreshes.

Tasks

Create MD integration test for (1) test scenario with C variable parameter and measure time, CPU, memory usage for Client during the test.
Estimation: 3h
Create MD integration test for (2) test scenario with C, T variable parameters and measure time, CPU, memory usage for Client during the test.
Estimation: 3h

Estimation

1d

Add memif2memif example

NSM interface deleting periodically

Logs

stderr F Jun 17 10:50:48.232 [ERRO] [cmd:[/bin/app]] [healServer:processHeal] Failed to heal connection alpine-cl-0: Error returned from sdk/pkg/networkservice/common/authorize/authorizeClient.Request: rpc error: code = PermissionDenied desc = no sufficient privileges

Steps to reproduce

Run webhook example: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/features/webhook
don't cleanup
wait for 20-30 min

Actual:

nsm-toggling.webm.zip

Expected:
NSM interface should not be deleted if data plane and control plane are fine

iperf examples

Add examples for iperf testing.

iperf is usually run as a client and a server.

The iperf client can be run like any other workload using the nsc client, similar to Kernel2Kernel or Kernel2Vxlan2Kernel, by adding an iperf client container to the Pod spec for nsc-kernel and an iperf server container to the Pod spec for nse-kernel.

Please note: you can use gotestmd to build go based tests in integration-tests and run in integration-k8s-kind. In this way you can simply document how to use iperf and the tests will be automatically generated from that documentation.

While integration-k8s-kind isn't tuned for performance, it's a very fast environment to get going in while developing those tests.

TestRunHealSuite/TestRemote_forwarder_death is not stable

Build:

v1.21.1
https://github.com/networkservicemesh/integration-k8s-kind/runs/2997070251?check_suite_focus=true

Logs

Containers logs_Remote_forwarder_death.zip

Add topology aware scale-from-zero running of NSEs in response to NSC demand example

Implementation details

Add new app cmd-nse-supplier-k8s in apps/
Add a new example to the feature suite. See the ipv6 example from the feature suite: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/features/ipv6/Kernel2Kernel
Make sure that scenario is working networkservicemesh/sdk#821 (comment)

Note: Network service for cmd-nse-supplier-k8s should be applied via kubectl

References

networkservicemesh/sdk#892

End to end DNS integration: add DNS examples

Description

We need to add examples of using DNS with NSM.

Estimation

Scalability Testing: System NSM testing

Test plan

Use cases:

Client requests Local Endpoint.
Client requests Remote Endpoint.

Test scenarios:

Client requests Local Endpoint:
- Setup
  1. Start NSM (Registry, NSMgr, Forwarder) on single node.
  2. Start E endpoints implementing N network services on the same node.
- Test
  1. Start C clients each requesting R network services on the same node.
Client requests Remote Endpoint:
- Setup
  1. Start NSM (Registry, NSMgr, Forwarder) on 2 nodes.
  2. Start E endpoints implementing N network services on the second node.
- Test
  1. Start C clients each requesting R network services on the first node.

Tasks

Investigate how to run MD integration test with variable parameters to build graph dependencies for time, CPU, memory usage of NSM components during the test over those parameters.
Estimation: 3d
Create MD integration test for (1) test scenario with E, N, C, R variable parameters and measure time, CPU, memory usage for all NSM components during the test.
Estimation: 2d
Create MD integration test for (2) test scenario with E, N, C, R variable parameters and measure time, CPU, memory usage for all NSM components during the test.
Estimation: 1d
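
One way to drive a test over variable parameters is a plain grid sweep that records a measurement per combination. The sketch below only illustrates that idea. `run_scenario` is a hypothetical placeholder, the grid values are arbitrary, and only wall time is measured; CPU and memory usage of the NSM components would have to come from the cluster.

```python
import itertools
import time


def run_scenario(e: int, n: int, c: int, r: int) -> None:
    """Placeholder for deploying E endpoints, N services, C clients, R requests."""
    time.sleep(0)  # a real test would apply manifests and wait for connections


def sweep(grid: dict[str, list[int]]) -> list[dict]:
    """Run run_scenario for every parameter combination and record wall time."""
    results = []
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        started = time.perf_counter()
        run_scenario(**params)
        results.append({**params, "seconds": time.perf_counter() - started})
    return results


grid = {"e": [1, 2], "n": [1], "c": [1, 10], "r": [1, 5]}
rows = sweep(grid)
print(len(rows))  # 8
```

Each row holds one parameter combination plus its timing, which is the shape needed to plot a measurement against any single parameter while the others are held fixed.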

Estimation

6d

Why is NS name appearing twice in yaml?

Why is the NS name appearing twice in the yaml

deployments-k8s/examples/features/scale-from-zero/autoscale-netsvc.yaml

Line 5 in 838d4f0

name: autoscale-icmp-responder

and

deployments-k8s/examples/features/scale-from-zero/autoscale-netsvc.yaml

Line 9 in 838d4f0

name: autoscale-icmp-responder

I would expect the NS name to be mapped into the metadata.name...

Destructive Chaos Testing: remote forwarder healing test

Issue

In the remote forwarder healing scenario, when we kill the remote forwarder and then, after it is restored, try to ping from the NSC to the NSE again, the ping does not work.

part of #1174

Add wireguard combination examples

Sometimes Close doesn't reach nsmgr

When running scalability tests (not yet uploaded) I found an issue with heal working indefinitely.
When investigating it, I found that one of the reasons this was happening was that nsmgr never received Close for some of the connections.
Logs: logs.zip
There is a bunch of connections in these logs. You can use grep 79b30cf1-1629-440c-8507-7e535d60295d to get the part of the logs that allows you to see the issue.
