
coil's People

Contributors

bells17, binoue, cellebyte, chez-shanpu, d-kuro, dependabot[bot], dulltz, kfyharukz, kmdkuk, llamerada-jp, masa213f, masap, mitsutaka, morimoto-cybozu, nyatsume, satoru-takeuchi, tapih, tenstad, terassyi, tkna, ueokande, umezawatakeshi, ymmt2005, yokaze, ysksuzuki, yz775, zeroalphat, zoetrope

coil's Issues

Allow modifications to Egress destinations

What

Currently, the destinations field of the Egress resource is not editable.
Users need to recreate the Egress resource if they want to modify it, for example, when they want to add another destination.

How

Allow users to edit Egress destinations.
This means Coil would have to reconfigure NAT rules in the Pod network namespaces,
which would be a significant change in coild.

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Cannot create AddressPool objects due to mutating webhook?

Describe the bug

When applying an example AddressPool, the request fails because the mutating webhook endpoint returns a 404:

kubectl apply -f addresspool.yaml

Error from server (InternalError): error when creating "0101_addresspool.yaml": Internal error occurred: failed calling webhook "maddresspool.kb.io": the server could not find the requested resource

Environments

  • Coil Version: 2.0.9
  • OS: K8s v1.20.7 on Debian 10 (Via kubespray v2.16.0)

To Reproduce
Steps to reproduce the behavior:

  1. Generate the manifest with kustomize build . > coil.yaml
  2. kubectl apply -f coil.yaml
  3. Create the addresspool.yaml as seen in some documentation:

apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: default
spec:
  blockSizeBits: 0
  subnets:
  - ipv4: 192.168.0.0/22

  4. See the error: Error from server (InternalError): error when creating "0101_addresspool.yaml": Internal error occurred: failed calling webhook "maddresspool.kb.io": the server could not find the requested resource

Expected behavior
A new AddressPool resource should be created so I can use the AddressBlock resource (I think) within my Egress resource, so the pod can run with an external address from the pool. (Otherwise my egress SNAT pod gets a CNI error about "network: failed to allocate address".)

Additional context
my coil.yaml

images:
- name: coil
  newTag: 2.0.9
  newName: ghcr.io/cybozu-go/coil

resources:
- config/default
# If you are using CKE (github.com/cybozu-go/cke) and want to use
# its webhook installation feature, comment the above line and
# uncomment the below line.
#- config/cke

# If you want to enable coil-router, uncomment the following line.
# Note that coil-router can work only for clusters where all the
# nodes are in a flat L2 network.
- config/pod/coil-router.yaml

# If your cluster has enabled PodSecurityPolicy, uncomment the
# following line.
#- config/default/pod_security_policy.yaml

patchesStrategicMerge:
# Uncomment the following if you want to run Coil with Calico network policy.
- config/pod/compat_calico.yaml

# Edit netconf.json to customize CNI configurations
configMapGenerator:
- name: coil-config
  namespace: system
  files:
  - cni_netconf=./netconf.json

# Adds namespace to all resources.
namespace: kube-system

# Labels to add to all resources and selectors.
commonLabels:
  app.kubernetes.io/name: coil

Multiple Blocks from a single BlockRequest

Describe the bug
When updating the status of a BlockRequest fails (in our case due to a flaky API server), this can lead to multiple Blocks being created from a single request. In our case this led to completely filling up all available Pools.

Environments

  • Version: v2.0.13

To Reproduce
Tricky, since it's on the error path.

Expected behavior
A single BlockRequest should always only yield a single Block.

Suggestion
In general, status shouldn't be used as a basis for reconciliation decisions unless it was computed in the current iteration. Controllers should always look at the actual state instead.
I suggest either deriving the Block name from the request in a deterministic way, or adding a label to the Block with the UID (or name) of the BlockRequest it was created from. This way the controller can simply query the API for the block it would create, and abort if it already exists.
The controller then no longer relies on a correct and up-to-date status of the BlockRequest and is more resilient to errors.
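
A minimal sketch of the deterministic-name idea, in the same style as the SHA1-based veth naming coil already uses elsewhere; the name format and function are illustrative, not coil's actual implementation:

package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// deterministicBlockName derives the AddressBlock name from the pool name and
// the BlockRequest UID. Retrying the same request always yields the same name,
// so the controller can simply Get() that block and abort if it already exists.
func deterministicBlockName(pool, requestUID string) string {
	h := sha1.New()
	h.Write([]byte(fmt.Sprintf("%s.%s", pool, requestUID)))
	return fmt.Sprintf("%s-%s", pool, hex.EncodeToString(h.Sum(nil))[:10])
}

func main() {
	// The same request maps to the same block name on every retry.
	fmt.Println(deterministicBlockName("default", "0b1c2d3e-example-uid"))
	fmt.Println(deterministicBlockName("default", "0b1c2d3e-example-uid"))
}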

v2: Metrics

What

Define, gather, and export important metrics of Coil v2 for Prometheus.

How

Describe how to address the issue.

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Failed to create AddressBlock

Describe the bug

A cluster IP was not assigned to a pod.
The pod remained in ContainerCreating status.

$ kubectl get pod -n logging ingester-0 -o wide
NAME         READY   STATUS              RESTARTS   AGE     IP       NODE          NOMINATED NODE   READINESS GATES
ingester-0   0/1     ContainerCreating   0          5h28m   <none>   10.69.1.132   <none>           <none>

$ kubectl describe pod -n logging ingester-0
(...)

Events:
  Type     Reason                  Age                   From     Message
  ----     ------                  ----                  ----     -------
  Warning  FailedCreatePodSandBox  2m33s (x78 over 92m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f4036648c160210acd505f6419a0c02b578454ac25762391be76b309a21c8716": failed to allocate address; aborting new block request: context deadline exceeded

The coil-controller logged the following messages.
It seems that the creation of an AddressBlock failed.

cybozu@gcp0-boot-0:~$ stern -n kube-system coil-controller-
(...)
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767715.798942,"logger":"blockrequest-reconciler","msg":"internal error","blockrequest":"req-default-10.69.1.132","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767715.7990007,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"coil.cybozu.com","reconcilerKind":"BlockRequest","controller":"blockrequest","name":"req-default-10.69.1.132","namespace":"","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767716.9866533,"logger":"pool-manager","msg":"failed to create AddressBlock","pool":"default","index":7,"node":"10.69.0.4","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767716.9867475,"logger":"blockrequest-reconciler","msg":"internal error","blockrequest":"req-default-10.69.0.4","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767716.9867713,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"coil.cybozu.com","reconcilerKind":"BlockRequest","controller":"blockrequest","name":"req-default-10.69.0.4","namespace":"","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767726.7400553,"logger":"pool-manager","msg":"failed to create AddressBlock","pool":"default","index":7,"node":"10.69.1.132","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767726.7401295,"logger":"blockrequest-reconciler","msg":"internal error","blockrequest":"req-default-10.69.1.132","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767726.7401562,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"coil.cybozu.com","reconcilerKind":"BlockRequest","controller":"blockrequest","name":"req-default-10.69.1.132","namespace":"","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767727.8927827,"logger":"pool-manager","msg":"failed to create AddressBlock","pool":"default","index":7,"node":"10.69.0.4","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767727.892883,"logger":"blockrequest-reconciler","msg":"internal error","blockrequest":"req-default-10.69.0.4","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}
coil-controller-cc56ff6f-sw5rq coil-controller {"level":"error","ts":1617767727.892939,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"coil.cybozu.com","reconcilerKind":"BlockRequest","controller":"blockrequest","name":"req-default-10.69.0.4","namespace":"","error":"addressblocks.coil.cybozu.com \"default-7\" already exists"}

Environments

  • Version: v2.0.5
  • OS:

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

Support Kubernetes 1.25 and update dependencies

What

Support Kubernetes 1.25.

How

Previous Pull Request:

Remove PSP. PSP has been deprecated since k8s v1.21 and is removed in k8s v1.25.

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

v2: Implementation

What

Implement Coil v2 features:

  • Use CRD to configure Coil instead of CLI tool coilctl.
  • Use CRD/ConfigMap to store status data.
  • Decouple the pool name and namespace name. Use annotations to specify the pool to be used.
  • Use gRPC instead of REST for local inter-process communication.
  • Use CRD for communication between the controller and the node pods.
  • Use leader-election of the controller for better availability.
  • Implement egress NAT using foo-over-udp.
  • Prometheus metrics including Go runtime and gRPC.

How

Completely rewrite the code under the v2/ directory.
All code from v1 is moved under the v1/ directory.

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Coil-egress accidentally deletes a peer

Describe the bug
Coil-egress accidentally deletes a peer, and because of this a client pod that still holds a FOU tunnel with the NAT pod can't communicate over the NAT.

Environments

  • Version: 3139.2.3
  • OS: Flatcar

To Reproduce
We don't know yet.

Expected behavior
Coil-egress doesn't delete a peer that is still in use, or it recovers the peer if this happens.

Additional context
Add logs to coil-egress so that we can investigate what is going on if this issue occurs.

Egress NAT Deployment does not rollout restart

Describe the bug

I tried to restart the Egress NAT Deployment with kubectl rollout restart,
but the NAT Deployment does not restart.

$ kubectl rollout restart deploy test-nat
$ kubectl get pod -w
NAME                           READY   STATUS    RESTARTS   AGE
test-nat-5f67c6947d-54rjk      1/1     Running   0          108s
test-nat-5f67c6947d-szxst      1/1     Running   0          108s
test-nat-7f9df9488b-pjvbg      0/1     Pending   0          0s   // The new pod is created.
test-nat-7f9df9488b-pjvbg      0/1     Pending   0          0s
test-nat-7f9df9488b-pjvbg      0/1     ContainerCreating   0          0s
test-nat-7f9df9488b-pjvbg      0/1     Terminating         0          0s // The new pod is terminated immediately.
test-nat-7f9df9488b-pjvbg      1/1     Terminating         0          1s
test-nat-7f9df9488b-pjvbg      0/1     Terminating         0          32s
test-nat-7f9df9488b-pjvbg      0/1     Terminating         0          32s
test-nat-7f9df9488b-pjvbg      0/1     Terminating         0          32s

Environments

  • Version: coil v2.0.14

To Reproduce

  1. Create an Egress resource.
  2. kubectl rollout restart deploy <NAT_Deployment>

Expected behavior
The old NAT pods should be terminated and the new pods should become Running.

Additional context

kubectl rollout restart deployment adds a kubectl.kubernetes.io/restartedAt annotation to the Deployment's pod template in order to create a new ReplicaSet.
But the egress controller overwrites the annotation.
https://github.com/cybozu-go/coil/blob/v2.0.14/v2/controllers/egress_controller.go#L118
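
A minimal sketch of one possible fix, assuming the controller keeps building the pod template in Go; the function is illustrative, not the actual egress controller code. The point is to merge the controller-managed annotations into the existing map instead of replacing it, so kubectl.kubernetes.io/restartedAt survives reconciliation:

package main

import "fmt"

// mergeAnnotations returns a copy of current with the controller-managed
// annotations applied on top, preserving keys the controller does not own,
// such as kubectl.kubernetes.io/restartedAt.
func mergeAnnotations(current, desired map[string]string) map[string]string {
	merged := make(map[string]string, len(current)+len(desired))
	for k, v := range current {
		merged[k] = v
	}
	for k, v := range desired {
		merged[k] = v
	}
	return merged
}

func main() {
	current := map[string]string{"kubectl.kubernetes.io/restartedAt": "2021-01-01T00:00:00Z"}
	desired := map[string]string{"coil.cybozu.com/egress": "nat"} // illustrative key
	fmt.Println(mergeAnnotations(current, desired))
}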

Coil egress has downtime due to the timing of updating coild and coil controller

When we update Coil, we have downtime in the coil-egress due to the timing of updating coild and coil controller.

What

When we update the coil controller, the Deployment resources tied to Egress resources restart because the image is updated.
At this point, the NAT Pods change to the Terminating state.
However, if coild hasn't come back up from its own restart yet, kubelet can't delete the Pods and has to wait for coild to start.

So, even though the container process of the Pod has already been deleted, Cilium still recognizes the Pod as able to handle the existing traffic in this case.
This causes downtime on coil egress.

How

We have to control the order of the update of coild and the coil controller: coild first, then the coil controller.

TODO

CNI install race condition when using multiple CNI plugins

Describe the bug
When I install an additional CNI plugin, e.g. linkerd-cni, to accomplish more complex network functionality,
I get race conditions between the two CNIs: either both work, or the other CNI is broken.

Environments

  • Version: 2.0.13
  • OS: Ubuntu 20.04.3

To Reproduce
Steps to reproduce the behavior:

  1. Install the coil CNI with a default AddressPool.
  2. Install linkerd-cni with linkerd proxies.
  3. Restart the coild pods in the kube-system namespace.
  4. See that linkerd-cni no longer works.
  5. Restarting them fixes the problem again.

Expected behavior
Both CNI plugins should coexist.
Coil should check whether the CNI config has been updated and only change the parts that are referenced in its initial ConfigMap.
It should also subscribe to file changes and re-run the check accordingly.
It should ensure that it is the first plugin in the list while only updating the parts from its initial ConfigMap.
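
A minimal sketch of such a merge, assuming a standard CNI conflist under /etc/cni/net.d; the coil entry's fields are placeholders and this is not coil's actual installer code. The idea is to (re)insert only the coil entry at the front and leave every other plugin, e.g. linkerd-cni, untouched:

package main

import (
	"encoding/json"
	"fmt"
)

// ensureCoilFirst inserts (or moves) the coil plugin entry to the front of a
// CNI conflist's "plugins" array while leaving all other plugin entries,
// such as linkerd-cni, untouched.
func ensureCoilFirst(conflist []byte, coilConf map[string]any) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(conflist, &doc); err != nil {
		return nil, err
	}
	plugins, _ := doc["plugins"].([]any)
	kept := make([]any, 0, len(plugins))
	for _, p := range plugins {
		if m, ok := p.(map[string]any); ok && m["type"] == "coil" {
			continue // drop the stale coil entry; it is re-added at the front
		}
		kept = append(kept, p)
	}
	doc["plugins"] = append([]any{coilConf}, kept...)
	return json.MarshalIndent(doc, "", "  ")
}

func main() {
	existing := []byte(`{"cniVersion":"0.4.0","name":"k8s-pod-network","plugins":[{"type":"linkerd-cni"}]}`)
	out, err := ensureCoilFirst(existing, map[string]any{"type": "coil", "socket": "/run/coild.sock"}) // placeholder fields
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}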

Additional context
For this to be possible, there needs to be a fundamental change in how the CNI configuration is applied on initialization,
so that Coil does not break other CNIs.

We currently need this to accomplish cross-region encrypted connectivity between clusters. We don't want to use the sidecar approach for that, so we looked into a CNI-based approach for a service mesh, and that is how we discovered this race condition bug.

AddressBlocks not auto-removed and not manually removable

Describe the bug
Removing AddressBlocks for a drained node results in an infinite wait.
We had a node PSU failure, which resulted in sudden downtime lasting longer than a few days. Meanwhile we wanted the NAT service to reschedule on another node. This failed because the AddressBlock was never freed by Coil (as the NAT consists of a /32 public IP, there were no spare addresses).

I expected that when draining a node, all AddressBlocks assigned to that node would be removed. (Please note: the node in question no longer worked due to the failed PSU, so we evicted all pods with --force and --grace-period=0, because otherwise draining would hang on the egress NAT deployment.)

Environments

  • Version: 20.04
  • OS: Ubuntu

To Reproduce
Steps to reproduce the behavior:

  1. Add nodes to cluster
  2. Create /32 address pool
  3. Assign the address pool to a namespace
  4. Create egress resource in namespace created in step 3.
  5. Suddenly disconnect the node on which the NAT egress pod was running from the cluster
  6. re-scheduling fails and the addressblock is locked in place.

Expected behavior
The AddressBlock gets cleared and the egress pod reschedules on another node.

Use node InternalIP for a host-side veth IP

What

Use the node InternalIP for the host-side veth IP so that Cilium recognizes a source IP from the node as a localhost address.

How

Retrieve the node's InternalIP address from the Node resource and set it on the host-side veth.
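
A minimal sketch of setting the node's InternalIP on the host-side veth using github.com/vishvananda/netlink; the link name and /32 prefix here are illustrative and error handling is simplified compared to whatever coild actually does:

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// setHostVethAddr assigns the node's InternalIP (as a /32) to the host-side
// veth so that traffic from the node toward the pod is sourced from an
// address Cilium already treats as belonging to the host.
func setHostVethAddr(vethName, nodeInternalIP string) error {
	link, err := netlink.LinkByName(vethName)
	if err != nil {
		return err
	}
	addr, err := netlink.ParseAddr(nodeInternalIP + "/32")
	if err != nil {
		return err
	}
	return netlink.AddrAdd(link, addr)
}

func main() {
	// Requires CAP_NET_ADMIN; names here are only examples.
	fmt.Println(setHostVethAddr("veth12345abcd", "10.69.0.4"))
}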

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Remove the unnecessary code block for v1 migration

This code is no longer needed.
This is for v1 migration.

Remove it.

// This is for migration from Coil v1.
// TODO: eventually this block should be removed.
if args.Netns != "" {
	err := ns.WithNetNSPath(args.Netns, func(_ ns.NetNS) error {
		return ip.DelLinkByName(args.Ifname)
	})
	if err != nil {
		logger.Sugar().Errorw("intentionally ignoring error for v1 migration", "error", err)
	}
}

Enhance CNI delete delay implementation

What

Currently, Coil waits 30 seconds before destroying the pod network in the CNI delete operation. Coil does this to keep connectivity for network components that need time to gracefully shut down active TCP connections. For example, Envoy waits for TCP connections to drain in its preStop hook before shutting down. CNI delete is called as soon as the pod has its deletion timestamp, and destroying the pod network would disrupt connections to Envoy and break the graceful-shutdown assumption.

However, this implementation forces all pods, including those that don't need such a delay, to wait in the CNI delete operation. For instance, the NAT pods derived from a coil Egress, which only receive connectionless UDP packets, have to wait the 30 seconds even though it's not necessary.

EDIT:
We later found out that k8s calls the StopSandbox API of the container runtime after killing container processes. So coil doesn't need to sleep in its delete implementation.
https://github.com/kubernetes/kubernetes/blob/02f9b2240814d2e952eaf7dca3a665a675004f21/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L979

related to #164

How

Options

  • Check whether all the container processes have gracefully shut down, and proceed with the delete operation instead of waiting for a fixed time.
  • Provide an annotation that allows pods to opt out of the delay in the CNI delete operation.
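
A minimal sketch of the second option; the annotation key is hypothetical and not something coil defines today:

package main

import (
	"fmt"
	"time"
)

// hypothetical annotation key; coil does not define this today.
const skipDelayAnnotation = "coil.cybozu.com/skip-del-delay"

// delDelay returns how long coild should wait before destroying the pod
// network: zero if the pod opted out, otherwise the default 30 seconds.
func delDelay(podAnnotations map[string]string) time.Duration {
	if podAnnotations[skipDelayAnnotation] == "true" {
		return 0
	}
	return 30 * time.Second
}

func main() {
	natPod := map[string]string{skipDelayAnnotation: "true"}
	fmt.Println(delDelay(natPod))              // 0s: UDP-only NAT pod skips the wait
	fmt.Println(delDelay(map[string]string{})) // 30s: default behavior unchanged
}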

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Enhance the graceful termination for Egress NAT

What

Rolling restart of Egress NAT pods causes a brief outage

  1. Set deletionTimestamp to a nat pod, and it becomes terminating
  2. kubelet sends SIGTERM to the container process <- brief outage from here until cilium removes the backend
  3. NAT pod gets removed <- cilium removes the backend and sends packets to a new backend

How

  1. Set deletionTimestamp to a nat pod, and it becomes terminating
  2. Sleep for a while during its preStop hook <- cilium notices that the backend becomes inactive and selects a new backend
  3. kubelet sends SIGTERM to the container process <- no outage since cilium has already selected a new backend
  4. NAT pod gets removed

Cilium selects a new backend if the client hits the same old tuple for SYN packets, but it doesn't consider UDP packets. So we need to send a PR for it.
cilium/cilium#20407
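
A minimal sketch of step 2 above, assuming the NAT pod template is built with the k8s.io/api types (corev1.LifecycleHandler is the field name in recent k8s.io/api releases); the container layout and sleep duration are illustrative, not coil's actual egress controller code:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// natContainer returns a NAT container spec with a preStop sleep, giving
// Cilium time to stop selecting this backend before SIGTERM arrives.
func natContainer(image string) corev1.Container {
	return corev1.Container{
		Name:  "egress",
		Image: image,
		Lifecycle: &corev1.Lifecycle{
			PreStop: &corev1.LifecycleHandler{
				Exec: &corev1.ExecAction{
					Command: []string{"sleep", "10"}, // illustrative duration
				},
			},
		},
	}
}

func main() {
	c := natContainer("ghcr.io/cybozu-go/coil:2.0.14")
	fmt.Println(c.Lifecycle.PreStop.Exec.Command)
}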

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

MTU auto configuration

What

It is handy to configure the MTU of veth links automatically.

c.f. https://itnext.io/benchmark-results-of-kubernetes-network-plugins-cni-over-10gbit-s-network-updated-august-2020-6e1b757b9e49

How

In coild,

  1. List all physical Device links with up status: https://pkg.go.dev/github.com/vishvananda/netlink#Device
  2. Choose the minimum MTU among the links
  3. Create veth links with the detected value
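
A minimal sketch of steps 1 and 2 using github.com/vishvananda/netlink, which the issue already links to; the definition of "physical" and the error handling are simplified:

package main

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

// minDeviceMTU returns the smallest MTU among physical (Device) links that
// are administratively up; veth links would then be created with this value.
func minDeviceMTU() (int, error) {
	links, err := netlink.LinkList()
	if err != nil {
		return 0, err
	}
	minMTU := 0
	for _, l := range links {
		if _, ok := l.(*netlink.Device); !ok {
			continue // skip virtual link types such as veth and bridge
		}
		attrs := l.Attrs()
		if attrs.Flags&net.FlagUp == 0 {
			continue
		}
		if minMTU == 0 || attrs.MTU < minMTU {
			minMTU = attrs.MTU
		}
	}
	if minMTU == 0 {
		return 0, fmt.Errorf("no physical link in up state found")
	}
	return minMTU, nil
}

func main() {
	mtu, err := minDeviceMTU()
	if err != nil {
		panic(err)
	}
	fmt.Println("detected MTU:", mtu)
}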

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Multi-architecture images

Describe the bug
There are currently only amd64 container images, which will not work on arm64 nodes.

Environments

  • Version: 2.0.13
  • OS: Ubuntu Server 20.04 ARM64

To Reproduce
Steps to reproduce the behavior:

  1. Install Ubuntu and Kubernetes on ARM64 node (eg. Amazon EC2 or Oracle Cloud)
  2. Try to deploy
  3. See error

Expected behavior
Expected it to work on ARM64

Additional context

$ sudo docker inspect ghcr.io/cybozu-go/coil:2.0.13
[
    {
        "Id": "sha256:60305a706525ca0c5a29d3ffae25755a6e8e0f9fa7996a5f1a2b3fb3a17ae282",
        "RepoTags": [
            "ghcr.io/cybozu-go/coil:2.0.13"
        ],
        "RepoDigests": [
            "ghcr.io/cybozu-go/coil@sha256:8133e128835f7c05f1ca1fd900eaa37fa17640f520f07330fd20a76a01b48dc4"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2021-10-26T02:38:28.511583682Z",
        "Container": "ded8846ab6f07808c9f3069eb0871f61f234841a68d9f1c36771567337853036",
        "ContainerConfig": {
            "Hostname": "ded8846ab6f0",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/coil:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/sh",
                "-c",
                "#(nop) ",
                "ENV PATH=/usr/local/coil:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Image": "sha256:413e977f9618f6e5e8c36f5b96d3515dee85a274f2479e2934ae25c5c826cbee",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "org.opencontainers.image.source": "https://github.com/cybozu-go/coil"
            }
        },
        "DockerVersion": "20.10.9+azure-1",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/coil:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/bash"
            ],
            "Image": "sha256:413e977f9618f6e5e8c36f5b96d3515dee85a274f2479e2934ae25c5c826cbee",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "org.opencontainers.image.source": "https://github.com/cybozu-go/coil"
            }
        },
        "Architecture": "amd64", # THIS RIGHT HERE
        "Os": "linux",
        "Size": 268482846,
        "VirtualSize": 268482846,
        "GraphDriver": {
            "Data": null,
            "Name": "btrfs"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:da55b45d310bb8096103c29ff01038a6d6af74e14e3b67d1cd488c3ab03f5f0d",
                "sha256:686277369035b640c16f04679389a897bf76539a025db4663e4993004de3ee36",
                "sha256:4717130c466b131a283c53643dcd50f79803c872793fa5b1a0b012262470194a",
                "sha256:d4a394709e4d956943dec2da5d0f16160e99a168c20089b3ca6de1ca0f15d48a",
                "sha256:bd20f117fab2f9489b962226d5bdf266277fc0cc3c2c816b2b7d887cc28dc805",
                "sha256:2d53babe35538bb43e8e7209e7e73a4f700480f4b2e734bd9bacc80cb205a0b8"
            ]
        },
        "Metadata": {
            "LastTagTime": "0001-01-01T00:00:00Z"
        }
    }
]

Unable to delete AddressPool

Describe the bug
Unable to delete AddressPool

Environments

  • Version: Coil v2.4.0

To Reproduce

Create an AddressPool and Delete it.

$ kubectl apply -f <manifest file of AddressPool>
$ kubectl delete addresspool <AddressPool Name>

Then coil-controller outputs the following error.

{
  "level": "error",
  "ts": "2023-12-12T02:04:11Z",
  "msg": "Reconciler error",
  "controller": "addresspool",
  "controllerGroup": "coil.cybozu.com",
  "controllerKind": "AddressPool",
  "AddressPool": {
    "name": "test"
  },
  "namespace": "",
  "name": "test",
  "reconcileID": "6f2ae24d-86de-4ec0-b52c-96ded1c22ae6",
  "error": "failed to remove finalizer from address pool: addresspools.coil.cybozu.com \"test\" is forbidden: User \"system:serviceaccount:kube-system:coil-controller\" cannot update resource \"addresspools\" in API group \"coil.cybozu.com\" at the cluster scope",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226"
}

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

[BUG] When using the following DualStack Pool I cannot create an interface with IPs for the pod.

This bug could be related to the following issue.
vishvananda/netlink#576

coil: ghcr.io/cybozu-go/coil:2.0.14
Kubernetes Version: 1.21.11
Container Runtime: cri-o 1.21.6
Linux OS: AlmaLinux 4.18.0-348.20.1.el8_5.x86_64

apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: lb
spec:
  blockSizeBits: 6
  subnets:
  - ipv4: 100.126.16.0/20
    ipv6: 2001:7c7:2100:42f:ffff:ffff:ffff:f000/116

Namespace with requirement for DualStack Pods

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    coil.cybozu.com/pool: lb
  name: gitlab-runner
  resourceVersion: "407513963"
  uid: a698674e-6c56-4728-bfdf-952bd2248433
spec:
  finalizers:
  - kubernetes
status:
  phase: Active

  Warning  FailedCreatePodSandBox  13s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_runner-zaevnzet-project-392-concurrent-0cdzrw_gitlab-runner_abfc57ce-1626-444f-a0f0-635ac3ce9e27_0(372c1a10840209f43cb46fce286f263d0a25da436c6c2157a86517d590f730a4): error adding pod gitlab-runner_runner-zaevnzet-project-392-concurrent-0cdzrw to CNI network "k8s-pod-network": failed to setup pod network; netlink: failed to add a hostIPv4 address: numerical result out of range
Mar 18 22:56:18 wuerfelchen-w-3 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar 18 22:56:18 wuerfelchen-w-3 kernel: IPv6: ADDRCONF(NETDEV_UP): veth88f11a25: link is not ready
Mar 18 22:56:18 wuerfelchen-w-3 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth88f11a25: link becomes ready
Mar 18 22:56:18 wuerfelchen-w-3 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 18 22:56:20 wuerfelchen-w-3 kernel: netlink: 'coild': attribute type 2 has an invalid length.

The code lines which could be the issue are here.

coil/v2/pkg/ipam/node.go

Lines 289 to 312 in 1db9b2d

func (n *nodeIPAM) NodeInternalIP(ctx context.Context) (net.IP, net.IP, error) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if err := n.getNode(ctx); err != nil {
		return nil, nil, err
	}
	var ipv4, ipv6 net.IP
	for _, a := range n.node.Status.Addresses {
		if a.Type != corev1.NodeInternalIP {
			continue
		}
		ip := net.ParseIP(a.Address)
		if ip.To4() != nil {
			ipv4 = ip.To4()
			continue
		}
		if ip.To16() != nil {
			ipv6 = ip.To16()
		}
	}
	return ipv4, ipv6, nil
}

{"level":"info","ts":1647643830.4085627,"logger":"node-ipam","msg":"requesting a new block","pool":"lb"}
{"level":"info","ts":1647643830.4251256,"logger":"node-ipam","msg":"waiting for request completion","pool":"lb"}
{"level":"info","ts":1647643830.4395173,"logger":"node-ipam","msg":"adding a new block","pool":"lb","name":"lb-0","block-pool":"lb","block-node":"wuerfelchen-w-3"}
{"level":"info","ts":1647643830.4395475,"logger":"node-ipam","msg":"allocated","pool":"lb","block":"lb-0","ipv4":"100.126.16.0","ipv6":"2001:7c7:2100:42f:ffff:ffff:ffff:f000"}
{"level":"info","ts":1647643830.4438093,"logger":"route-exporter","msg":"synchronizing routing table","table-id":119}
{"level":"info","ts":1647643831.7375743,"logger":"node-ipam","msg":"freeing an empty block","pool":"lb","block":"lb-0"}
{"level":"info","ts":1647643831.768794,"logger":"route-exporter","msg":"synchronizing routing table","table-id":119}
{"level":"error","ts":1647643831.7710552,"logger":"grpc","msg":"failed to setup pod network","grpc.start_time":"2022-03-18T22:50:30Z","grpc.request.deadline":"2022-03-18T22:51:30Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Add","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","peer.address":"@","grpc.request.pod.name":"debug-pod","error":"netlink: failed to add a hostIPv4 address: numerical result out of range"}
{"level":"error","ts":1647643831.7711568,"logger":"grpc","msg":"finished unary call with code Internal","grpc.start_time":"2022-03-18T22:50:30Z","grpc.request.deadline":"2022-03-18T22:51:30Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Add","grpc.request.pod.name":"debug-pod","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","peer.address":"@","error":"rpc error: code = Internal desc = failed to setup pod network","grpc.code":"Internal","grpc.time_ms":1368.211}
{"level":"info","ts":1647643831.7817273,"logger":"grpc","msg":"waiting before destroying pod network","grpc.start_time":"2022-03-18T22:50:31Z","grpc.request.deadline":"2022-03-18T22:51:31Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Del","peer.address":"@","grpc.request.pod.name":"debug-pod","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","duration":"30s"}
{"level":"error","ts":1647643861.784251,"logger":"grpc","msg":"intentionally ignoring error for v1 migration","grpc.start_time":"2022-03-18T22:50:31Z","grpc.request.deadline":"2022-03-18T22:51:31Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Del","peer.address":"@","grpc.request.pod.name":"debug-pod","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","error":"link not found"}
{"level":"info","ts":1647643861.7843103,"logger":"grpc","msg":"finished unary call with code OK","grpc.start_time":"2022-03-18T22:50:31Z","grpc.request.deadline":"2022-03-18T22:51:31Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Del","peer.address":"@","grpc.request.pod.name":"debug-pod","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","grpc.code":"OK","grpc.time_ms":30002.637}

failed to move veth to host netns: invalid argument

Describe the bug

A Pod failed to start due to the following CNI error:

$ kubectl describe pod ...
  Type     Reason                  Age                     From                 Message
  ----     ------                  ----                    ----                 -------
  Warning  FailedCreatePodSandBox  2m14s (x6051 over 21h)  kubelet, 10.69.0.21  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "...": failed to move veth to host netns: invalid argument

This indicates that coil failed to remove an existing veth in some cases.

Note that veth names generated by coil are not random.
They are generated the same way as Calico's, to work with the Calico NetworkPolicy implementation.

Environments

  • neco release-2019.09.04-6385

To Reproduce
Steps to reproduce the behavior:

  1. Construct neco k8s cluster.
  2. Choose one of the coil-node pods, and get the veth name on the node which contains the pod.
  3. Delete the coil-node pod.
  4. Enter the node, then confirm the veth exists by $ ip l.

Additional context

You can get the veth name from the pod name by the following code.

package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

func main() {
	// The veth name is "veth" followed by the first 11 hex characters of
	// SHA1("<namespace>.<pod name>"), matching Calico's naming scheme.
	h := sha1.New()
	h.Write([]byte(fmt.Sprintf("%s.%s", "topolvm-system", "csi-topolvm-node-tnjfp")))
	fmt.Printf("%s%s", "veth", hex.EncodeToString(h.Sum(nil))[:11])
}

v2: Migration from v1

What

Implement data migration from v1.

How

The migration should be able to run online.
That is, it must keep the running Pods alive.

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Send ICMP Host Unreachable for non-existing Pods

What

It would be nice to send back ICMP Host Unreachable when a client sends packets to a non-existing Pod.

How

Describe how to address the issue.
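
The issue leaves the approach open, so the following is only one possible sketch, not a decided design: install an unreachable route for a freed Pod address via github.com/vishvananda/netlink so the kernel answers with ICMP Host Unreachable (IPv4 shown; IPv6 would use a /128 mask):

package main

import (
	"net"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

// addUnreachableRoute installs an "unreachable" route for addr so that the
// kernel replies with ICMP Host Unreachable instead of silently dropping
// packets sent to a Pod IP that no longer exists.
func addUnreachableRoute(addr net.IP) error {
	dst := &net.IPNet{IP: addr, Mask: net.CIDRMask(32, 32)}
	return netlink.RouteAdd(&netlink.Route{
		Dst:  dst,
		Type: unix.RTN_UNREACHABLE,
	})
}

func main() {
	// Requires CAP_NET_ADMIN; shown here only to illustrate the call shape.
	_ = addUnreachableRoute(net.ParseIP("10.100.0.42"))
}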

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Install on an existing Flannel-based cluster

Hello, this is not an issue, but I could not find any other way to contact you therefore writing here.

The situation is as follows: I have a cluster already up and running which uses Flannel, so I don't know whether it is possible to install Coil over that setup.
Taking into account that Flannel is an L3 network plugin, I think that simply uninstalling the driver won't work because it would break the entire cluster network...

Am I right? If so, does anyone have experience with a migration like this?

Extra info: I found this article https://github.com/kubernetes/kops/blob/master/docs/networking.md#switching-between-networking-providers in the kops repo which says that, in a nutshell, changing between CNI providers is not really recommended.

Thanks guys

Fix the IP address allocation logic from AddressBlock

What

We want to avoid reusing an address shortly after it has been released.

Currently, when Coil allocates an IP address from an AddressBlock, it picks the address at the smallest available index.
As a result, a just-released address tends to be reused immediately; we should fix the allocator so that it does not.

How

The AddressBlock resource should record the index last allocated by Coil (lastAllocatedIndex).
When Coil allocates a new address for a pod, it should pick the first free address after lastAllocatedIndex.
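
A minimal sketch of the proposed allocation order; the bitmap representation and the lastAllocatedIndex parameter mirror the description above but are otherwise illustrative:

package main

import (
	"errors"
	"fmt"
)

// nextIndex scans the block circularly starting just after lastAllocatedIndex
// and returns the first free index, so recently released addresses are the
// last ones to be handed out again.
func nextIndex(used []bool, lastAllocatedIndex int) (int, error) {
	n := len(used)
	for i := 1; i <= n; i++ {
		idx := (lastAllocatedIndex + i) % n
		if !used[idx] {
			return idx, nil
		}
	}
	return 0, errors.New("block is full")
}

func main() {
	used := []bool{true, false, true, false} // indices 1 and 3 are free
	idx, _ := nextIndex(used, 1)
	fmt.Println(idx) // 3, not 1: the just-released lower index is skipped
}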

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Create PDB for Egress NAT pods

What

All Egress NAT pods can disappear at the same time when rebooting nodes because currently there's no PDB for them.

How

Create PDB for Egress NAT pods
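
A minimal sketch of such a PDB built with the k8s.io/api types; the label selector key and value are placeholders for whatever labels the egress controller actually puts on the NAT pods:

package main

import (
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// egressPDB builds a PodDisruptionBudget that keeps at least one NAT pod of
// the given Egress available during voluntary disruptions such as node reboots.
func egressPDB(namespace, egressName string) *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt(1)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      egressName,
			Namespace: namespace,
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				// placeholder label; use whatever labels the egress controller sets
				MatchLabels: map[string]string{"coil.cybozu.com/egress": egressName},
			},
		},
	}
}

func main() {
	pdb := egressPDB("internet", "nat")
	fmt.Println(pdb.Name, *pdb.Spec.MinAvailable)
}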

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Coild doesn't release AddressBlock which is no longer needed

Describe the bug
Coild doesn't release AddressBlocks which are no longer needed when it starts up.

To Reproduce
Steps to reproduce the behavior:

  1. Shut down a node where a Pod is running.
  2. Delete the Pod and reschedule it to another node.
  3. Start up the node.
  4. Run kubectl get addressblock and see that an AddressBlock which is no longer needed is not released

Expected behavior
Coild releases unused AddressBlocks.

Pods cannot communicate directly

Describe the bug
After creating a Kubernetes cluster with the default service IPs and installing Coil as the CNI (no other CNIs), pods are not able to communicate directly.

Environments

  • Kubernetes version:
    Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:25:17Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
  • Kernel version: Linux HostnameHere 5.4.0-96-generic #109-Ubuntu SMP Wed Jan 12 16:49:16 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • OS: Ubuntu 20.04.03 LTS (focal)

To Reproduce
Steps to reproduce the behavior:

  1. Configure a default address pool:
     spec:
       blockSizeBits: 5
       subnets:
       - ipv4: 10.100.0.0/16
  2. Configure BIRD (setup BGP sessions and import/export from Coil routing table as described in the example)
  3. Apply the default nginx-ingress controller with some minor changes (type: LoadBalancer, loadBalancerIP: public_bird_ip, externalTrafficPolicy: Cluster)
  4. Create a second address pool containing the exact same public_bird_ip
  5. Assign second address pool to namespace B
  6. Deploy Egress resource called "nat" in namespace B
  7. Create Deployment / Service / Ingress in default namespace and add annotations for namespace B: nat in pod template(s)
  8. Now look at the cluster IPs of two different pods (both in same namespace, but doesn't seem to matter)
  9. curl localhost in both pods to verify that nginx inside the pod responds to requests (confirmed, works)
  10. curl the other pod's cluster IP --> does not work --> NGINX results in 502 (bad gateway) errors and no traffic is shown in the access.log from nginx inside the pod.

Expected behavior
Being able to access other pods in same (or other) namespace.

Additional context
It doesn't matter whether both pods are scheduled on the same node; traceroute makes it seem like traffic cannot be delivered to the pod.
Example traceroute (simplified):

  1. IP address of the node the pod is running on
  2. IP address of the node the destination pod is running on

When curling the service's ClusterIP from the node itself, or even from another node, everything works as expected.

Is this a mis-configuration?

Coil v2 fails to build with kustomize v4 following setup.md

Describe the bug
I cannot seem to build coil v2 with kustomize following the setup guide in docs/setup.md.

Environments

  • Version: coil 2780752 with kustomize v4.1.3 and k8s 1.21.1
  • OS: Alpine Linux v3.13

To Reproduce

$ git clone https://github.com/cybozu-go/coil
<SNIP>
$ cd coil/v2
$ make certs
go run ./cmd/gencert -outdir=/root/coil/coil2/v2/config/default
$ # kustomization.yaml needed no change
$ # netconf.json changed just like in setup.md but with MTU 9000
$ kustomize build . > coil.yaml
Error: field specified in var '{CACERT ~G_v1_Secret {data[ca.crt]}}' not found in corresponding resource

Expected behavior
kustomize builds coil.yaml successfully.

Test matrix for Kubernetes versions

What

Coil should support the latest three Kubernetes versions.
As of Aug. 2019, it should support Kubernetes 1.13, 1.14, and 1.15.

Run end-to-end test on all these versions.

How

Describe how to address the issue.

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Use encap-sport auto in FOU tunnel setting for coil-egress

What

Currently, coil-egress uses the fixed encap-sport 5555, and it causes some issues.

  • Underlying network components, e.g., Cilium (eBPF), don't get a chance to pick a new backend during the graceful termination process. They always select the same backend until the backend finally gets removed.
  • Cilium sometimes wrongly tracks connections in the opposite direction for the packets sent from client pods that rely on the egress pod, breaking communication between client pods and egress pods. Reply packets from the egress pod rely on reverse SNAT, so the connection needs to be tracked correctly.

How

Investigate the tunnel collect metadata mode and use it to lift the limitation described above.

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

addressblocks are not freed when scheduled on master nodes

Describe the bug
I had a coredns pod that was being scheduled on a master node over and over, which failed due to CNI version incompatibility. Each restart resulted in a new addressblock reservation. Address blocks did not clear after each failed attempt, and the full /16 is now used up, resulting in pods stuck in the creating phase.

Environments

  • Version: 20.04
  • OS: Ubuntu

To Reproduce
(screenshot of the AddressBlock list omitted)

Expected behavior
Blocks should be cleared on pod finalization, even on master nodes.

Support Kubernetes 1.22

What

Support Kubernetes 1.22

How

  • Update kubernetes libraries
  • Add 1.22 to test matrix

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Update controller-runtime to 0.8.3 and add node-role.kubernetes.io/control-plane label

What

Coil already supports k8s 1.20, but it depends on controller-runtime 0.6.3.
The node-role.kubernetes.io/master label is deprecated in k8s 1.20 and node-role.kubernetes.io/control-plane is introduced.

How

Update controller-runtime to 0.8.3 and other dependencies.
Add node-role.kubernetes.io/control-plane label to tolerations.
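
A minimal sketch of the resulting tolerations as they would be built with the k8s.io/api types, keeping the old master key alongside the new control-plane key for older clusters:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// controlPlaneTolerations lets coild/coil-router pods schedule on nodes
// tainted with either the old "master" key or the new "control-plane" key.
func controlPlaneTolerations() []corev1.Toleration {
	return []corev1.Toleration{
		{Key: "node-role.kubernetes.io/master", Operator: corev1.TolerationOpExists},
		{Key: "node-role.kubernetes.io/control-plane", Operator: corev1.TolerationOpExists},
	}
}

func main() {
	for _, t := range controlPlaneTolerations() {
		fmt.Println(t.Key)
	}
}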

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

[Bug] Coild is not updating `podCIDR` and `podCIDRs` in the node objects.

Describe the bug

When I describe node objects, I expect to see the correct podCIDR, which would make debugging easier.

Environments

  • Version: v2.
  • OS: Ubuntu 20.04.3

To Reproduce
Steps to reproduce the behavior:

  1. Run kubectl get nodes
  2. Describe a node object
  3. Scroll down to podCIDR and podCIDRs
  4. See non-updated CIDR notations

Expected behavior
If coild assigns podCIDR it should also update the node object accordingly.

Additional context

kubectl describe node
PodCIDR:                      172.30.8.0/24
PodCIDRs:                     172.30.8.0/24

kubectl get addressblocks
default-4     <nodename>                   default   172.30.1.0/26 

Add client count metrics to coil-egress

What

The number of clients for a coil-egress Pod can be a useful metric for HPA.

How

Add a gauge like this:

coil_egress_clients{namespace="foo",egress="nat"} 30
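
A minimal sketch using prometheus/client_golang; the metric name follows the proposal above and is not an existing coil metric:

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// clientCount tracks how many client pods are currently attached to each
// coil-egress NAT, labeled by the Egress namespace and name.
var clientCount = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "coil_egress_clients", // proposed name from this issue
		Help: "Number of client pods using this egress NAT.",
	},
	[]string{"namespace", "egress"},
)

func main() {
	prometheus.MustRegister(clientCount)
	clientCount.WithLabelValues("foo", "nat").Set(30)
	fmt.Println("metric registered; expose it via promhttp.Handler() in coil-egress")
}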

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

AddressPool should have a finalizer

Describe the bug

Due to a bug in kube-apiserver, an AddressPool can be deleted even though
there are still AddressBlocks carved from it and those AddressBlocks have
ownerReferences with blockOwnerDeletion=true.

kubernetes/kubernetes#86509 (comment)

So, AddressPool should have a finalizer until all of its child AddressBlocks get deleted.
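
A minimal sketch of the finalizer flow using controller-runtime's controllerutil helpers; the finalizer string and the hasChildBlocks check are placeholders, not coil's actual names:

package main

import (
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// placeholder finalizer name for illustration only.
const poolFinalizer = "coil.cybozu.com/addresspool"

// reconcileFinalizer adds the finalizer while the pool is live and only
// removes it once no child AddressBlocks remain, so deletion is blocked
// until every block carved from the pool is gone.
func reconcileFinalizer(pool client.Object, hasChildBlocks bool) (needsUpdate bool) {
	if pool.GetDeletionTimestamp().IsZero() {
		// Pool is not being deleted: make sure the finalizer is present.
		if !controllerutil.ContainsFinalizer(pool, poolFinalizer) {
			controllerutil.AddFinalizer(pool, poolFinalizer)
			return true
		}
		return false
	}
	if hasChildBlocks {
		// Still referenced by AddressBlocks: keep the finalizer and requeue later.
		return false
	}
	if controllerutil.ContainsFinalizer(pool, poolFinalizer) {
		controllerutil.RemoveFinalizer(pool, poolFinalizer)
		return true
	}
	return false
}

func main() {}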

Environments

  • Version:
  • OS:

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

useless replace usage left in go.mod

It seems that the module github.com/cybozu-go/coil/v2 no longer depends on github.com/dgrijalva/jwt-go, either directly or indirectly.
So the replace directive left in go.mod serves no purpose. Should it be dropped?

$ go mod why -m github.com/golang-jwt/jwt/v4
# github.com/golang-jwt/jwt/v4
(main module does not need module github.com/golang-jwt/jwt/v4)

https://github.com/cybozu-go/coil/blob/main/v2/go.mod#L5

replace github.com/dgrijalva/jwt-go => github.com/golang-jwt/jwt/v4 v4.4.2

v2: Design and documents

What

Design the new representation of configurations and states of Coil v2.
They should be represented as CustomResourceDefinition (CRD) or ConfigMaps.

How

Write documents about them.

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

CNI issue in kind-created cluster

I set up the coil NAT gateway on a kind-created cluster according to the README commands.

$ cd v2
$ make certs
$ make image

$ cd e2e
$ make start
$ make install-coil

but I encountered some CNI issues.
There are some pending pods, and CoreDNS was stuck in ContainerCreating.

NAMESPACE            NAME                                         READY   STATUS              RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
kube-system          coil-controller-866b8fd666-5b5l4             0/1     Pending             0          14m   <none>       <none>               <none>           <none>
kube-system          coil-controller-866b8fd666-f5zmw             1/1     Running             0          14m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          coil-router-6w6sp                            1/1     Running             0          14m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          coil-router-fp7rb                            1/1     Running             0          14m   172.18.0.6   coil-worker2         <none>           <none>
kube-system          coil-router-xkvzl                            1/1     Running             0          14m   172.18.0.7   coil-worker          <none>           <none>
kube-system          coild-d7fh7                                  1/1     Running             0          14m   172.18.0.7   coil-worker          <none>           <none>
kube-system          coild-j66hh                                  1/1     Running             0          14m   172.18.0.6   coil-worker2         <none>           <none>
kube-system          coild-nmsnv                                  1/1     Running             0          14m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          coredns-bd6b6df9f-fmgd6                      0/1     ContainerCreating   0          17m   <none>       coil-control-plane   <none>           <none>
kube-system          coredns-bd6b6df9f-qqd47                      0/1     ContainerCreating   0          17m   <none>       coil-control-plane   <none>           <none>
kube-system          etcd-coil-control-plane                      1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          kube-apiserver-coil-control-plane            1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          kube-controller-manager-coil-control-plane   1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          kube-proxy-4rdh7                             1/1     Running             0          16m   172.18.0.6   coil-worker2         <none>           <none>
kube-system          kube-proxy-j4b48                             1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          kube-proxy-zwnnd                             1/1     Running             0          17m   172.18.0.7   coil-worker          <none>           <none>
kube-system          kube-scheduler-coil-control-plane            1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-6fd4f85bbc-vbv58      0/1     ContainerCreating   0          17m   <none>       coil-control-plane   <none>           <none>

When I describe the pod
kubectl describe pods coredns-bd6b6df9f-fmgd6 -n kube-system

it shows:

 failed (add): failed to allocate address; aborting new block request: context deadline exceeded
  Warning  FailedCreatePodSandBox  50s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "6516770d68bb935f8746c5a0ef17636d028ecd932ba0dbe2fd686a63e19ff935": plugin type="coil" failed (add): failed to allocate address; aborting new block request: context deadline exceeded
  Warning  FailedCreatePodSandBox  20s    kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3d0c988e6c39ff1183964d6adc34105998919be5ec61af6c1214526781fc6d52": plugin type="coil" failed (add): failed to allocate address; aborting new block request: context deadline exceeded

Could you share some advice on how to troubleshoot this issue?

Update the image using Go 1.15 and Ubuntu 20.04

What

Add support for Kubernetes 1.19.

How

Since coil is built using kubebuilder and controller-runtime, and they do not yet support client-go for k8s 1.19, we cannot upgrade the client libraries to k8s 1.19.
Add k8s 1.19 to the CI matrix and fix problems if any.

Upgrade the container base image from Ubuntu 18.04 to Ubuntu 20.04.
Use Go 1.15 to build binaries.

Checklist

  • Finish implementation of the issue
  • Test all functions
  • Have enough logs to trace activities
  • Notify developers of necessary actions

Egress traffic is disconnected for about 30 seconds when deleting an Egress Pod

Describe the bug
Egress traffic is disconnected for about 30 seconds when deleting an Egress Pod.

Environments

  • Version: coil v2.0.14

To Reproduce

  1. Create an Egress resource. (.spec.replicas is 2 or more.)
  2. Create a client pod which uses the Egress NAT.
  3. ping to the external network via the Egress NAT from the client pod.
  4. Delete the Egress Pods one by one.

When deleting an Egress Pod, sometimes ping will fail (packets lost) for about 30 seconds.

Expected behavior
There are multiple replicas, so I expect traffic to switch to another Egress Pod immediately.
