
Cluster Network Operator

The Cluster Network Operator installs and upgrades the networking components on an OpenShift Kubernetes cluster.

It follows the Controller pattern: it reconciles the state of the cluster against a desired configuration. The configuration is specified by a CustomResourceDefinition called Network.config.openshift.io/v1, which has a corresponding type.

Most users will be able to use the top-level OpenShift Config API, which has a Network type. The operator automatically translates the Network.config.openshift.io object into a Network.operator.openshift.io object.

To see the network operator:

$ oc get -o yaml network.operator cluster
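
The high-level Cluster configuration it is generated from can be viewed the same way:

$ oc get -o yaml network.config.openshift.io cluster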

When the controller has reconciled and all its dependent resources have converged, the cluster should have an installed network plugin and a working service network. In OpenShift, the Cluster Network Operator runs very early in the install process -- while the bootstrap API server is still running.
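
The operator reports its progress through a ClusterOperator object named network, so a quick way to check whether it has converged is:

$ oc get clusteroperator network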

Configuring

The network operator gets its configuration from two objects: the Cluster configuration and the Operator configuration. Most users only need to create the Cluster configuration; the operator will generate the Operator configuration automatically. If you need finer-grained configuration of your network, you will need to create both.

Any changes to the Cluster configuration are propagated down into the Operator configuration. In the event of conflicts, the Operator configuration will be updated to match the Cluster configuration.

For example, if you want to use OVN networking instead of the default SDN networking, do the following:

Generate the install-config with openshift-install, using a convenient directory for the cluster:

$ openshift-install --dir=MY_CLUSTER create install-config

Edit MY_CLUSTER/install-config.yaml and change networkType: to, for example, OVNKubernetes.
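
For reference, the networking stanza of install-config.yaml will then look roughly like the following (the CIDRs shown here are the installer defaults; yours may differ):

networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16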

Then continue with the installation.

If you want to change other default networking parameters, for example to use a different VXLAN port for OpenShiftSDN, you will need to create the manifest files:

$ openshift-install --dir=MY_CLUSTER create manifests

The file MY_CLUSTER/manifests/cluster-network-02-config.yml contains the cluster network configuration. It is the basis of the operator configuration and can't be changed; in particular, the networkType can't be changed. See above for how to set the networkType.

Copy the cluster-network-02-config.yml file to a new file, and edit that new file with the desired configuration:

$ cp MY_CLUSTER/manifests/cluster-network-02-config.yml MY_CLUSTER/manifests/cluster-network-03-config.yml

Edit the new file:

  • change the first line from apiVersion: config.openshift.io/v1 to apiVersion: operator.openshift.io/v1

When all configuration changes are complete, go on and create the cluster:

$ openshift-install --dir=MY_CLUSTER create cluster

The following sections detail how to configure the cluster-network-03-config.yml file for different needs.

Configuration objects

Cluster config

  • Type Name: Network.config.openshift.io
  • Instance Name: cluster
  • View Command: oc get Network.config.openshift.io cluster -oyaml
  • File: install-config.yaml

Operator config

  • Type Name: Network.operator.openshift.io
  • Instance Name: cluster
  • View Command: oc get network.operator cluster -oyaml
  • File: manifests/cluster-network-03-config.yml as described above

Example configurations

Cluster Config manifests/cluster-network-02-config.yml

The fields in this file can't be changed. The installer created it from the install-config.yaml file (above).

apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

Alternatively, ovn-kubernetes is configured by setting networkType: OVNKubernetes.

Corresponding Operator Config manifests/cluster-network-03-config.yml

This config file starts as a copy of manifests/cluster-network-02-config.yml. You can add to the file but you can't change lines in the file.

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  additionalNetworks: null
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  defaultNetwork:
    type: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

Configuring IP address pools

The ClusterNetwork and ServiceNetwork pools are configured in MY_CLUSTER/install-config.yaml, as described above. They cannot be changed in the manifests.

Users must supply at least two address pools - ClusterNetwork for pods, and ServiceNetwork for services. Some network plugins, such as OpenShiftSDN and OVNKubernetes, support multiple ClusterNetworks. All address blocks must be non-overlapping and a multiple of hostPrefix.

For future expansion, multiple serviceNetwork entries are allowed by the configuration but not actually supported by any network plugins. Supplying multiple addresses is invalid.

Each clusterNetwork entry has an additional parameter, hostPrefix, that specifies the address size to assign to each individual node. For example,

cidr: 10.128.0.0/14
hostPrefix: 23

means that each node is assigned a /23 block of 512 addresses (2^(32-23) = 512), and the /14 cidr provides 2^(23-14) = 512 such blocks, i.e. enough for 512 nodes. If the hostPrefix field is not used by the plugin, it can be left unset.

IP address pools are always read from the Cluster configuration and propagated "downwards" into the Operator configuration. Any changes to the Operator configuration are ignored.

Currently, changing the address pools once set is not supported. In the future, some network providers may support expanding the address pools.

Example:

spec:
  serviceNetwork:
  - "172.30.0.0/16"
  clusterNetwork:
    - cidr: "10.128.0.0/14"
      hostPrefix: 23
    - cidr: "192.168.0.0/18"
      hostPrefix: 23

Configuring the default network provider

The default network provider is configured in the MY_CLUSTER/install-config from above. It cannot be changed in the manifests. Different network providers have additional provider-specific settings.

The network type is always read from the Cluster configuration.

Currently, the understood values for networkType are:

  • OpenShiftSDN
  • OVNKubernetes

Other values are ignored. If you wish to use a third-party network provider not managed by the operator, set the network type to something meaningful to you. The operator will not install or upgrade a network provider, but all other Cluster Network Operator functionality remains.
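
For example, a Cluster configuration for an externally managed provider might look like the following (the value Calico is purely illustrative; the operator treats any unrecognized value the same way):

spec:
  networkType: Calico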

Configuring OpenShiftSDN

OpenShiftSDN supports the following configuration options, all of which are optional:

  • mode: one of "Subnet", "Multitenant", or "NetworkPolicy". Configures the isolation mode for OpenShift SDN. The default is "NetworkPolicy".
  • vxlanPort: The port to use for the VXLAN overlay. The default is 4789.
  • MTU: The MTU to use for the VXLAN overlay. The default is the MTU of the node that the cluster-network-operator is first run on, minus 50 bytes for overhead. If the nodes in your cluster don't all have the same MTU then you will need to set this explicitly.
  • useExternalOpenvswitch: boolean. If the nodes are already running openvswitch, and OpenShiftSDN should not install its own, set this to true. This is only needed for certain advanced installations with DPDK or OpenStack.
  • enableUnidling: boolean. Whether the service proxy should allow idling and unidling of services.

These configuration flags are only in the Operator configuration object.

Example from the manifests/cluster-network-03-config.yml file:

spec:
  defaultNetwork:
    type: OpenShiftSDN
    openshiftSDNConfig:
      mode: NetworkPolicy
      vxlanPort: 4789
      mtu: 1450
      enableUnidling: true
      useExternalOpenvswitch: false

Additionally, you can configure per-node verbosity for openshift-sdn. This is useful if you want to debug an issue, and can reproduce it on a single node. To do this, create a special ConfigMap with keys based on the Node's name:

kind: ConfigMap
apiVersion: v1
metadata:
  name: env-overrides
  namespace: openshift-sdn
data:
  # to set the node processes on a single node to verbose
  # replace this with the node's name (from oc get nodes)
  ip-10-0-135-96.us-east-2.compute.internal: |
    OPENSHIFT_SDN_LOG_LEVEL=5
  # to enable verbose logging in the sdn controller, use
  # the special node name of _master
  _master: |
    OPENSHIFT_SDN_LOG_LEVEL=5
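
Assuming the ConfigMap above is saved to a file (the name env-overrides.yaml below is arbitrary), it can be created with:

$ oc apply -f env-overrides.yaml

Note that the override typically only takes effect once the corresponding pod on that node is restarted, since the environment is read at container start.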

Configuring OVNKubernetes

OVNKubernetes supports the following configuration options, all of which are optional. Once set at cluster creation they cannot be changed, with the exception of gatewayConfig and ipsecConfig, which can be changed at runtime:

  • MTU: The MTU to use for the geneve overlay. The default is the MTU of the node that the cluster-network-operator is first run on, minus 100 bytes for geneve overhead. If the nodes in your cluster don't all have the same MTU then you may need to set this explicitly.
  • genevePort: The UDP port to use for the Geneve overlay. The default is 6081.
  • hybridOverlayConfig: configures a hybrid Linux/Windows cluster (see below).
  • ipsecConfig: enables and configures IPsec for pods on the pod network within the cluster.
  • policyAuditConfig: holds the configuration for network policy audit events.
  • gatewayConfig: holds the configuration for node gateway options.
    • routingViaHost: If set to true, pod egress traffic traverses the host networking stack before being sent out.
  • egressIPConfig: holds the configuration for EgressIP options.
    • reachabilityTotalTimeoutSeconds: the EgressIP node reachability total timeout in seconds; 0 disables the reachability check. The default is 1 second.

These configuration flags are only in the Operator configuration object.

Example from the manifests/cluster-network-03-config.yml file:

spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      mtu: 1400
      genevePort: 6081
      gatewayConfig:
        routingViaHost: false
      egressIPConfig:
        reachabilityTotalTimeoutSeconds: 5

Additionally, you can configure per-node verbosity for ovn-kubernetes. This is useful if you want to debug an issue, and can reproduce it on a single node. To do this, create a special ConfigMap with keys based on the Node's name:

kind: ConfigMap
apiVersion: v1
metadata:
  name: env-overrides
  namespace: openshift-ovn-kubernetes
  annotations:
data:
  # to set the node processes on a single node to verbose
  # replace this with the node's name (from oc get nodes)
  ip-10-0-135-96.us-east-2.compute.internal: |
    OVN_KUBE_LOG_LEVEL=5
    OVN_LOG_LEVEL=dbg
  # to adjust master log levels, use _master
  _master: |
    OVN_KUBE_LOG_LEVEL=5
    OVN_LOG_LEVEL=dbg

Configuring OVNKubernetes On a Hybrid Cluster

OVNKubernetes supports a hybrid cluster of both Linux and Windows nodes on x86_64 hosts. The OVN configuration is done as described above. In addition, the hybridOverlayConfig can be included as follows:

Add the following to the spec: section

Example from the manifests/cluster-network-03-config.yml file:

spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      hybridOverlayConfig:
        hybridClusterNetwork:
        - cidr: 10.132.0.0/14
          hostPrefix: 23

The hybridClusterNetwork cidr and hostPrefix are used when adding Windows nodes. This CIDR must not overlap the ClusterNetwork CIDR or serviceNetwork CIDR.

There can be at most one hybridClusterNetwork CIDR. A future version may support multiple CIDRs.

Configuring IPsec with OVNKubernetes at cluster creation

OVNKubernetes supports IPsec encryption of all pod traffic using the OVN IPsec functionality. Add the following to the spec: section of the operator config:

spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      ipsecConfig: {}

Configuring IPsec with OVNKubernetes at runtime

OVNKubernetes supports enabling and disabling IPsec encryption dynamically at runtime. The IPsec protocol adds an ESP header to tenant traffic, which stores the security data needed by each IPsec endpoint to encrypt and decrypt that traffic.

In order for IPsec to function properly the cluster MTU size must be decreased by 46 bytes to fit the additional ESP header added to each packet. This adjustment is not currently automatic and must be performed by the cluster administrator before enabling IPsec at runtime.

Example of enabling IPsec at runtime:

  1. Decrease the cluster MTU size by 46 bytes (for the ESP header):
    1. Add the following to the spec: section of the operator config:

spec:
  migration:
    mtu:
      machine:
        from: 1500
        to: 1500
      network:
        from: 1400
        to: 1354

    2. Wait until the Machine Config Operator has updated the machines; it will reboot each node one by one:

$ oc get mcp

    3. Finalize the MTU migration process by adding the following to the spec: section of the operator config:

spec:
  migration: null
  defaultNetwork:
    ovnKubernetesConfig:
      ...
      mtu: 1354

For more information, see the documentation on changing the cluster MTU at runtime.

  2. Enable IPsec: add the following to the spec: section of the operator config:
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      ipsecConfig: {}
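
Equivalently, assuming the default operator configuration object named cluster, the same change can be applied with a merge patch:

$ oc patch networks.operator.openshift.io cluster --type=merge -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{}}}}}'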

Example of disabling IPsec at runtime:

$ oc patch networks.operator.openshift.io cluster --type=json -p='[{"op":"remove", "path":"/spec/defaultNetwork/ovnKubernetesConfig/ipsecConfig"}]'

Configuring Network Policy audit logging with OVNKubernetes

OVNKubernetes supports audit logging of network policy traffic events. Add the following to the spec: section of the operator config:

spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      policyAuditConfig:
        maxFileSize: 1
        rateLimit: 5
        destination: libc
        syslogFacility: local0

To understand more about each field and to see the default values, check out the OpenShift API definition.

Configuring kube-proxy

Some plugins (like OpenShift SDN) have a built-in kube-proxy, some plugins require a standalone kube-proxy to be deployed, and some (like ovn-kubernetes) don't use kube-proxy at all.

The deployKubeProxy flag can be used to indicate whether CNO should deploy a standalone kube-proxy, but for supported network types, this will default to the correct value automatically.

The configuration here can be used for third-party plugins with a separate kube-proxy process as well.

For plugins that use kube-proxy (whether built-in or standalone), you can configure the proxy via kubeProxyConfig:

  • iptablesSyncPeriod: The interval between periodic iptables refreshes. Default: 30 seconds. Increasing this can reduce the number of iptables invocations.
  • bindAddress: The address to "bind" to - the address for which traffic will be redirected.
  • proxyArguments: additional command-line flags to pass to kube-proxy - see the documentation.

The top-level flag deployKubeProxy tells the network operator to explicitly deploy a kube-proxy process. Generally, you will not need to provide this; the operator will decide appropriately. For example, OpenShiftSDN includes an embedded service proxy, so this flag is automatically false in that case.

Example from the manifests/cluster-network-03-config.yml file:

spec:
  deployKubeProxy: false
  kubeProxyConfig:
    iptablesSyncPeriod: 30s
    bindAddress: 0.0.0.0
    proxyArguments:
      iptables-min-sync-period: ["30s"]

Configuring Additional Networks

Users can configure additional networks, based on Kubernetes Network Plumbing Working Group's Kubernetes Network Custom Resource Definition De-facto Standard Version 1.

  • name: name of the network attachment definition, required
  • namespace: namespace for the network attachment definition. The default is the default namespace
  • type: type of the network attachment definition, required

Currently, the understood values for type are:

  • Raw
  • SimpleMacvlan

Example from the manifests/cluster-network-03-config.yml file:

spec:
  additionalNetworks:
  - name: test-network-1
    namespace: namespace-test-1
    type: ...

The operator then generates the following network attachment definition:

$ oc -n namespace-test-1 get network-attachment-definitions.k8s.cni.cncf.io test-network-1 -o yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: test-network-1
  namespace: namespace-test-1
  # (snip)
spec:
  # (snip)

Attaching an additional network to a Pod

Users can attach an additional network to a pod through the Pod annotation k8s.v1.cni.cncf.io/networks, such as:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod-01
  namespace: namespace-test-1
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
            { "name": "test-network-1" }
    ]'
spec:
  containers:
# (snip)

For details, please refer to the spec: the Kubernetes Network Plumbing Working Group's Kubernetes Network Custom Resource Definition De-facto Standard Version 1.

Configuring Raw CNI

Users can configure a network attachment definition with raw CNI JSON. The following option is required:

  • rawCNIConfig: CNI JSON configuration for the network attachment

Example from the manifests/cluster-network-03-config.yml file:

spec:
  additionalNetworks:
  - name: test-network-1
    namespace: namespace-test-1
    rawCNIConfig: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "eth1", "mode": "bridge", "ipam": { "type": "dhcp" } }'
    type: Raw

This config will generate the following network attachment definition:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  # (snip)
  name: test-network-1
  namespace: namespace-test-1
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    # (snip)
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "eth1", "mode": "bridge", "ipam": { "type": "dhcp" } }'

Configuring SimpleMacvlan

SimpleMacvlan allows users to configure macvlan network attachments. macvlan creates a virtual copy of a master interface and assigns the copy a randomly generated MAC address. The pod can communicate with the network that is attached to the master interface. The distinct MAC address allows the pod to be identified by external network services like DHCP servers, firewalls, routers, etc. macvlan interfaces cannot communicate with the host via the macvlan interface: traffic that is sent by the pod onto the macvlan interface bypasses the master interface and is sent directly to the interface's underlying network. Before traffic is sent to the underlying network, it can be evaluated within the macvlan driver, allowing the pod to communicate with all other pods that created their macvlan interface from the same master interface.

Users can configure a macvlan network attachment definition with the following parameters, all of which are optional:

  • master: the host interface to create the macvlan interface from. If not specified, the default route interface is used
  • mode: the macvlan mode: bridge, private, vepa, or passthru. The default is bridge
  • mtu: the MTU to use for the macvlan interface. If unset, the host's kernel will select the value
  • ipamConfig: IPAM (IP Address Management) configuration: dhcp or static. The default is dhcp

Example:

spec:
  additionalNetworks:
  - name: test-network-2
    type: SimpleMacvlan
    simpleMacvlanConfig:
      master: eth0
      mode: bridge
      mtu: 1515
      ipamConfig:
        type: dhcp

Configuring Static IPAM

Users can configure static IPAM with the following parameters:

  • addresses:
    • address: the IP address in CIDR format, optional (if no address is given, it is assumed the address will be supplied via the pod annotation k8s.v1.cni.cncf.io/networks)
    • gateway: an IP inside the subnet to designate as the gateway, optional
  • routes: optional
    • destination: the IP route destination
    • gateway: the route's next-hop IP address. If unset, a default gateway is assumed (as determined by the CNI plugin)
  • dns: optional
    • nameservers: the DNS servers to use for name lookups
    • domain: the local domain used for short hostname lookups
    • search: priority-ordered search domains for short hostname lookups

Example:

spec:
  additionalNetworks:
  - name: test-network-3
    type: SimpleMacvlan
    simpleMacvlanConfig:
      ipamConfig:
        type: static
        staticIPAMConfig:
          addresses:
          - address: 198.51.100.11/24
            gateway: 198.51.100.10
          routes:
          - destination: 0.0.0.0/0
            gateway: 198.51.100.1
          dns:
            nameservers:
            - 198.51.100.1
            - 198.51.100.2
            domain: testDNS.example
            search:
            - testdomain1.example
            - testdomain2.example

Using

The operator is expected to run as a pod (via a Deployment) inside a Kubernetes cluster. It retrieves the configuration described above and reconciles toward the desired state. A suitable manifest for running the operator is located in manifests/.

Unsafe changes

Most network changes are unsafe to roll out to a production cluster. Therefore, the network operator will stop reconciling if it detects that an unsafe change has been requested.

Safe changes to apply:

It is safe to edit the following fields in the Operator configuration:

  • deployKubeProxy
  • all of kubeProxyConfig
  • OpenShiftSDN: enableUnidling, useExternalOpenvswitch

Force-applying an unsafe change

Administrators may wish to forcefully apply a disruptive change to a cluster that is not serving production traffic. To do this, first they should make the desired configuration change to the CRD. Then, delete the network operator's understanding of the state of the system:

oc -n openshift-network-operator delete configmap applied-cluster

Be warned: this is an unsafe operation! It may cause the entire cluster to lose connectivity or even be permanently broken. For example, changing the ServiceNetwork will cause existing services to be unreachable, as their ServiceIP won't be reassigned.


cluster-network-operator's Issues

[RFE] Provide Support for Third Party Network Providers

As of today, CNO only supports OpenShiftSDN, OVNKubernetes, Kuryr and Raw network types. OpenShiftSDN is the default if nothing is specified. I could not find a way for CNO to not deploy CNI plugin and other network components. This results in a conflict when third party network providers deploy their own set of network components to manage networking in an OpenShift cluster.

It would be nice if there were a way to disable deploying the default network components and let third-party providers manage the OpenShift cluster network.

add validation for cluster and operator config objects

I did oc edit NetworkConfig.networkoperator.openshift.io default and changed hostSubnetLength from 9 to nine.

CNO started logging:

E0122 09:49:12.552833   10497 reflector.go:205] github.com/openshift/cluster-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1.NetworkConfig: v1.NetworkConfigList.Items: []v1.NetworkConfig: v1.NetworkConfig.Spec: v1.NetworkConfigSpec.ClusterNetworks: []v1.ClusterNetwork: v1.ClusterNetwork.HostSubnetLength: readUint32: unexpected character: �, error found in #10 byte of ...|tLength":"nine"}],"d|..., bigger context ...|rks":[{"cidr":"10.128.0.0/14","hostSubnetLength":"nine"}],"defaultNetwork":{"openshiftSDNConfig":{"m|...

once a second.

(Probably a reconciler bug? Just filing this so I don't forget.)

Don't try to guess default MTU when running from outside the cluster

If you deploy a cluster with run-locally.sh, the operator shouldn't guess the cluster's MTU based on the local machine's MTU. In particular, in 4.0 you will currently end up with a cluster network MTU of 1450 when it really should be 8950 on AWS.

This doesn't need to be fixed in 4.0 (since we don't support running the operator locally for real installs, and the too-low MTU will still work well enough for dev installs) but it might become more of a problem with future install types. (Eg, running the operator on an MTU 1500 laptop while installing onto MTU 1450 VMware VMs.)

If we wanted to be super clever we could have run-locally.sh deploy a hostNetwork pod to figure out the cluster MTU before starting the CNO (or even, to have the CNO itself deploy a hostNetwork DaemonSet to check the MTU on every node, thus also preventing people from accidentally deploying a broken network when their cluster is heterogeneous) (or else, having the Node Feature Discovery operator label nodes with their MTUs, if we end up in a world where NFD runs before CNO).

But the simpler fix would be to just have run-locally.sh know the correct default MTU for each supported cluster type...

The network-attachment-definition CR is not created after the cluster network is updated

I tried to create an additional network following the document https://docs.openshift.com/container-platform/4.2/networking/multiple-networks/configuring-macvlan.html#configuring-macvlan on OCP 4.2. Based on what the document describes, after the cluster network CR is updated to add the additional network, the CNO should create the related network-attachment-definition CR, but in my test environment the CR was not created after I updated the cluster network, even after a long time.

ca-injector

Why does the ca injector not create a secret like the service ca injector annotation does?

Openshift-sdn listens on undeclared ports

As a general rule, we should declare all ports for processes that run in host-network, so we can easily catch conflicts. I ran a quick auditing script:

[root@test1-worker-0-rpzmc core]# ~core/sockaudit
pod sdn-pzh2f process openshift-sdn listening on NotDeclared port 31712
pod sdn-pzh2f process openshift-sdn listening on NotDeclared port 30119
pod sdn-pzh2f process openshift-sdn listening on NotDeclared port 10256
pod sdn-pzh2f process openshift-sdn listening on NotDeclared port 32562

We need to either declare these ports or stop listening.

ca bundle injection as jks

Have the ability to inject the ca trust bundle as a jks file for java consumption in addition to just pem files

multus takes a long time to deploy (!)

eg, from https://storage.googleapis.com/origin-ci-test/pr-logs/pull/21905/pull-ci-openshift-origin-master-e2e-aws/3736/artifacts/e2e-aws/pods/openshift-network-operator_network-operator-6568d54ddc-2j4pz_network-operator.log.gz:

2019/02/11 19:08:37 Reconciling update to DaemonSet openshift-multus/multus
time="2019-02-11T19:08:37Z" level=info msg="updated clusteroperator: &v1.ClusterOperator{ ... Status:v1.ClusterOperatorStatus{ ... Message:\"DaemonSet \\\"openshift-multus/multus\\\" is not available (awaiting 3 nodes) ...
2019/02/11 19:09:35 Reconciling update to DaemonSet openshift-multus/multus
time="2019-02-11T19:09:35Z" level=info msg="updated clusteroperator: &v1.ClusterOperator{ ... Status:v1.ClusterOperatorStatus{ ... Message:\"DaemonSet \\\"openshift-multus/multus\\\" is not available (awaiting 2 nodes) ...
2019/02/11 19:09:41 Reconciling update to DaemonSet openshift-multus/multus
time="2019-02-11T19:09:41Z" level=info msg="updated clusteroperator: &v1.ClusterOperator{ ... Status:v1.ClusterOperatorStatus{ ... Message:\"DaemonSet \\\"openshift-multus/multus\\\" is not available (awaiting 1 nodes) ...
2019/02/11 19:09:54 Reconciling update to DaemonSet openshift-multus/multus
time="2019-02-11T19:09:54Z" level=info msg="updated clusteroperator: &v1.ClusterOperator{ ... Status:v1.ClusterOperatorStatus{ Type:\"Available\", Status:\"True\"...

The multus pods appear to spend a lot of time in CrashLoopBackoff.

Different master for macvlan (additional) interface in networks.operator.openshift.io cluster

I want to configure additional interface for pods in the cluster using macvlan.

I am referring to:

Now, in the spec section (simple and raw), there is a field master: <master> to specify the host interface to use for macvlan.
Is there a way to specify a different master host interface for macvlan on each host?

For example, in the cluster on worker1 node the master interface is: eth1
on worker2 node the master interface is: ensf1.

How can I specify this using the config for networks.operator.openshift.io cluster?

TestRenderOpenshiftSDN failed

"renderOpenshiftSDN" in openshift_sdn_test.go is evaluating the object len of "../../manifests/network/openshift-sdn/dummy.yaml" instead of given "OpenshiftSDNConfig", this causes unit test failure as below:

--- FAIL: TestRenderOpenshiftSDN (0.01s)
testing_t_support.go:22:
/home/zshi/git/golang/src/github.com/openshift/openshift-network-operator/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:69 +0x1ed
github.com/openshift/openshift-network-operator/vendor/github.com/onsi/gomega/internal/assertion.(*Assertion).To(0xc420380b00, 0x1247ee0, 0xc4203aa978, 0x0, 0x0, 0x0, 0x1256620)
/home/zshi/git/golang/src/github.com/openshift/openshift-network-operator/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:35 +0xae
github.com/openshift/openshift-network-operator/pkg/operator.TestRenderOpenshiftSDN(0xc42040c1e0)
/home/zshi/git/golang/src/github.com/openshift/openshift-network-operator/pkg/operator/openshift_sdn_test.go:36 +0x335
testing.tRunner(0xc42040c1e0, 0x11cf248)
/usr/local/go/src/testing/testing.go:777 +0xd0
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:824 +0x2e0

	Expected
	    <[]*unstructured.Unstructured | len:2, cap:2>: [
	        {
	            Object: {
	                "apiVersion": "apps/v1",
	                "kind": "Deployment",
	                "metadata": {
	                    "name": "dummy",
	                    "namespace": "default",
	                },
	                "spec": {
	                    "selector": {
	                        "matchLabels": {"app": "dummy"},
	                    },
	                    "template": {
	                        "metadata": {
	                            "labels": {"app": "dummy"},
	                        },
	                        "spec": {
	                            "containers": [
	                                {
	                                    "command": [
	                                        "/bin/sh",
	                                        "-c",
	                                        "while true; do echo dummy-dep-1; sleep 10; done",
	                                    ],
	                                    "image": "busybox",
	                                    "name": "dummy",
	                                },
	                            ],
	                        },
	                    },
	                },
	            },
	        },
	        {
	            Object: {
	                "apiVersion": "apps/v1beta2",
	                "kind": "DaemonSet",
	                "metadata": {
	                    "labels": {"app": "dummy-ds"},
	                    "name": "dummy-ds",
	                    "namespace": "default",
	                },
	                "spec": {
	                    "selector": {
	                        "matchLabels": {"app": "dummy-ds"},
	                    },
	                    "template": {
	                        "metadata": {
	                            "labels": {"app": "dummy-ds"},
	                        },
	                        "spec": {
	                            "containers": [
	                                {
	                                    "command": [
	                                        "/bin/sh",
	                                        "-c",
	                                        "while true; do echo dummy-ds-1; sleep 10; done",
	                                    ],
	                                    "image": "busybox",
	                                    "name": "dummy",
	                                },
	                            ],
	                        },
	                    },
	                    "updateStrategy": {
	                        "rollingUpdate": {"maxUnavailable": 1},
	                        "type": "RollingUpdate",
	                    },
	                },
	            },
	        },
	    ]
	to have length 1

FAIL
exit status 1
FAIL github.com/openshift/openshift-network-operator/pkg/operator 0.017s

Configuration improvements

Some configuration improvements we should do before 1.0:

  • Get rid of the backwards bit for host subnet length
  • Make ServiceNetwork an array
  • Fix all the capitalization - if @danwinship hasn't found it yet :-)
  • Explicit Multus mode?

"Running manually" instructions are incomplete

The current "Running manually against a test cluster" instructions are incomplete; you need to set (at least) KUBERNETES_SERVICE_PORT and KUBERNETES_SERVICE_HOST in the environment as well in order for the sdn-controller daemonset to function correctly. (Possibly others?)

Followup #255

Taken care of remind comments in #255

  • updating README.md with this new configuration.
  • "cniVersion" could be 0.3.1

/bin/bash: line 16: [: too many arguments

Reconcile should only return an error on transient errors

If you return an error from Reconcile, the controller will keep retrying the request again (with a backoff):

2019/01/22 09:50:45 Reconciling NetworkConfig.networkoperator.openshift.io default
2019/01/22 09:50:45 Not applying unsafe change: invalid configuration: [cannot change ClusterNetworks]
2019/01/22 09:50:46 Reconciling NetworkConfig.networkoperator.openshift.io default
2019/01/22 09:50:46 Not applying unsafe change: invalid configuration: [cannot change ClusterNetworks]
2019/01/22 09:50:49 Reconciling NetworkConfig.networkoperator.openshift.io default
2019/01/22 09:50:49 Not applying unsafe change: invalid configuration: [cannot change ClusterNetworks]
2019/01/22 09:50:54 Reconciling NetworkConfig.networkoperator.openshift.io default
2019/01/22 09:50:54 Not applying unsafe change: invalid configuration: [cannot change ClusterNetworks]

If the configuration is invalid, then "not applying" is a successful reconciliation, so we should not be returning an error. (This probably applies to other cases as well.)

On OCP4.2 Multitenant mode, can't change kube-system's NETID.

I deployed an OCP 4.2 cluster with Multitenant mode, the default NETID of kube-system is 1.

# oc get netnamespaces kube-system
NAME          NETID   EGRESS IPS
kube-system   1

I try to make kube-system global with below commands :

oc adm pod-network make-projects-global kube-system
oc adm pod-network join-projects --to=default kube-system

but it seems neither command works; after a while, the NETID is reset to 1 again.

QUESTION, how can I change kube-system NETID to 0 in Multitenant mode ?

Should capitalize "OpenShift" correctly in config data

There's a lot of inconsistency between "OpenShift" and "Openshift" in internal type names, but for anything that might actually show up in a config object that an end user might have to edit, it seems to me like we should be consistently correct? (eg, regardless of what the name of NetworkTypeOpenshiftSDN is, the value should be "OpenShiftSDN" not "OpenshiftSDN")

OpenShift 4.2 with proxy enabled: can't deploy images from internal registry

Version

$ openshift-install version
openshift-install unreleased-master-1601-g4e204c5e509de1bd31113b0c0e73af1a35e52c0a
built from commit 4e204c5e509de1bd31113b0c0e73af1a35e52c0a
release image registry.svc.ci.openshift.org/origin/release:4.2

Platform:

None

What happened?

Can't deploy using the internal registry because .svc is not included in no_proxy by default.

What you expected to happen?

How to reproduce it (as minimally and precisely as possible)?

Install openshift 4.2 with proxy enabled and deploy default jenkins empheral template

Anything else we need to know?

No not at this time

kube-proxy cluster-cidr is omitted breaking external service access when multiple cluster CIDRs are provided

This line here: https://github.com/openshift/cluster-network-operator/blob/master/pkg/network/kube_proxy.go#L45 doesn't pass a cluster-cidr to kube-proxy when the number of ClusterNetworks isn't one. Omitting this field means services can't be accessed from outside the cluster.

The ClusterNetworks are immutable, so if the user does configure two and later discover that they want to access services from outside the cluster, they won't be able to (and it's not easy to debug why). It's sometimes better to set too small a cluster CIDR (leading to some unwanted NAT) than not set one at all - but probably not always.

When using Calico, for example, additional IP ranges can be given to pods beyond those configured here, so this ends up being a trap. I feel like a validation failure if len() > 1 would be more helpful? I'd also like to see the field be mutable (although I understand that perhaps OpenShift SDN can't support that yet?).

Adding additional "Delegate" CNIs in multus config

I am referring to multus config here.

In multus config, it is possible to specify multiple delegates (delegates ([]map,required): number of delegate details in the Multus).

So, using the multiple-delegates option in Multus, it is possible to create pods with multiple interfaces without the need for any additional annotations, as mentioned here.
This way all the pods in the cluster will get the additional interfaces (corresponding to the number of delegates configured in Multus).

However, in the cluster specs (using: oc edit networks.operator.openshift.io cluster), how can I add additional Multus delegates?

cnibincopy.sh Script in ConfigMap cni-binary-copy-script fails on CentOS

I'm trying to enable deployment of CentOS Worker nodes on OKD 4.5.

When the cluster-network-operator tries to deploy kube-multus on a node the initContainers copy various files to the host, depending on the host's distribution. To determine the distribution the script reads /etc/os-release and check the ID field. This fails on CentOS, which identifies itself as "centos".

If the ID field on a CentOS node is changed to "rhel", the binaries are copied and the kube-multus container is successfully deployed. Please consider changing line 37 of bindata/network/multus/multus.yaml from
rhel)
to
rhel|centos).

Thank you.

./hack/run-locally.sh isn't working

I0121 20:47:27.951103 1943746 log.go:184] Controller "ConnectivityCheckController" resync interval is set to 0s which might lead to client request throttling
I0121 20:47:27.954195 1943746 base_controller.go:66] Waiting for caches to sync for ConnectivityCheckController
I0121 20:47:38.170078 1943746 trace.go:205] Trace[211140115]: "Reflector ListAndWatch" name:k8s.io/[email protected]/tools/cache/reflector.go:167 (21-Jan-2022 20:47:27.956) (total time: 10213ms):
Trace[211140115]: ---"Objects listed" 10213ms (20:47:38.169)
Trace[211140115]: [10.213414132s] [10.213414132s] END
I0121 20:47:39.841632 1943746 trace.go:205] Trace[1136817626]: "Reflector ListAndWatch" name:k8s.io/[email protected]/tools/cache/reflector.go:167 (21-Jan-2022 20:47:27.957) (total time: 11884ms):
Trace[1136817626]: ---"Objects listed" 11884ms (20:47:39.841)
Trace[1136817626]: [11.884121269s] [11.884121269s] END
I0121 20:47:39.842773 1943746 trace.go:205] Trace[1297844340]: "Reflector ListAndWatch" name:k8s.io/[email protected]/tools/cache/reflector.go:167 (21-Jan-2022 20:47:27.955) (total time: 11887ms):
Trace[1297844340]: ---"Objects listed" 11887ms (20:47:39.842)
Trace[1297844340]: [11.887396601s] [11.887396601s] END
I0121 20:47:40.872985 1943746 trace.go:205] Trace[2118805739]: "Reflector ListAndWatch" name:k8s.io/[email protected]/tools/cache/reflector.go:167 (21-Jan-2022 20:47:27.956) (total time: 12916ms):
Trace[2118805739]: ---"Objects listed" 12916ms (20:47:40.872)
Trace[2118805739]: [12.916855373s] [12.916855373s] END

F0121 20:49:27.950351 1943746 operator.go:93] Failed to start controller-runtime manager: failed to wait for pki-controller caches to sync: timed out waiting for cache to be synced
goroutine 470 [running]:
k8s.io/klog/v2.stacks(0x1)
	k8s.io/klog/[email protected]/klog.go:1026 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x3bb4d40, 0x3, {0x0, 0x0}, 0xc00044cd90, 0x0, {0x2df7d5f, 0xb}, 0x0, 0x0)
	k8s.io/klog/[email protected]/klog.go:975 +0x63d
k8s.io/klog/v2.(*loggingT).printf(0x0, 0xaa17a0, {0x0, 0x0}, {0x0, 0x0}, {0x24245af, 0x2e}, {0xc0007f66c0, 0x1, ...})
	k8s.io/klog/[email protected]/klog.go:753 +0x1e5
k8s.io/klog/v2.Fatalf(...)
	k8s.io/klog/[email protected]/klog.go:1514
github.com/openshift/cluster-network-operator/pkg/operator.RunOperator.func2()
	github.com/openshift/cluster-network-operator/pkg/operator/operator.go:93 +0xd5
created by github.com/openshift/cluster-network-operator/pkg/operator.RunOperator
	github.com/openshift/cluster-network-operator/pkg/operator/operator.go:90 +0x585

cluster-network-operator segfaults (SIGSEGV) if there is no current context in the kubeconfig on a node

Hi, as per the title: if there is a valid kubeconfig on the node but with no current context in it, the cluster-network-operator will segfault and go into CrashLoopBackOff. I hit this by manually rotating certs and creating a new kubeconfig on the machines without logging in to the cluster.
See:

clusterName := kubeconfig.Contexts[kubeconfig.CurrentContext].Cluster

Baremetal IPI: can't download image via proxy

When using a proxy, the image-cache container tries to download the image through the proxy, but it uses an internal URL and therefore gets an error. Ironic then writes the contents of this error message to disk as a qcow image.

metal3-state.openshift-machine-api needs to be added to no_proxy

From rhcos-48.84.202104271417-0-openstack.x86_64.qcow2 (contains a proxy error message instead of a qcow image)

<p>The following error was encountered while trying to retrieve the URL: <a href="http://metal3-state.openshift-machine-api:6180/images/rhcos-48.84.202104271417-0-openstack.x86_64.qcow2/rhcos-48.84.202104271417-0-openstack.x86_64.qcow2">http://metal3-state.openshift-machine-api:6180/images/rhcos-48.84.202104271417-0-openstack.x86_64.qcow2/rhcos-48.84.202104271417-0-openstack.x86_64.qcow2</a></p>
<pre>Name Error: The domain name does not exist.</pre>

From squid logs
1621432495.708 29 fd00:1101::6ef0:c42d:33f4:c2f TCP_MISS/503 4587 GET http://metal3-state.openshift-machine-api:6180/images/rhcos-47.83.202103251640-0-openstack.x86_64.qcow2/rhcos-47.83.202103251640-0-openstack.x86_64.qcow2 - HIER_NONE/- text/html
and the metal3-machine-os-downloader container in the image-cache pod

    env:                                                      
    - name: RHCOS_IMAGE_URL                       
      value: http://metal3-state.openshift-machine-api:6180/images/rhcos-48.84.202104271417-0-openstack.x86_64.qcow2/rhcos-48.84.202104271417-0-openstack.x86_64.qcow2
    - name: HTTP_PROXY                        
      value: http://[fd00:1101::1]:3128
    - name: HTTPS_PROXY               
      value: http://[fd00:1101::1]:3128         
    - name: NO_PROXY  
      value: .cluster.local,.svc,127.0.0.1,9999,api-int.ostest.test.metalkube.org,fd00:1101::/64,fd01::/48,fd02::/112,fd2e:6f44:5dd8:c956::/120,localhost

bindata/network/openshift-sdn/controller.yaml contains invalid value for daemonset

bindata/network/openshift-sdn/controller.yaml contains invalid
replicas: 1 in the daemonset of sdn-controller.

A DaemonSet spec does not allow a replicas field:

$ kubectl explain daemonset.spec --api-version=apps/v1
  ...
FIELDS:
   minReadySeconds	<integer>
   ...
   revisionHistoryLimit	<integer>
   ..
   selector	<Object> -required-
   ..
   template	<Object> -required-
   ..
   updateStrategy	<Object>

Hence, kubectl create -f out.yaml always gets following error:

$ _output/linux/amd64/cluster-network-renderer --config sample-config.yaml --out out.yaml
$ kubectl create -f out.yaml
...
error validating "out.yaml": error validating data: ValidationError(DaemonSet.spec): unknown field "replicas" in io.k8s.api.apps.v1.DaemonSetSpec; if you choose to ignore these errors, turn validation off with --validate=false

Node never becomes Ready

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/657/pull-ci-openshift-installer-master-e2e-aws/1345?log#log
Saw this error in one of the CI runs; one of the masters failed to become Ready:

NAME                           STATUS     ROLES     AGE       VERSION
ip-10-0-1-38.ec2.internal      Ready      master    29m       v1.11.0+d4cacc0
ip-10-0-128-19.ec2.internal    Ready      worker    24m       v1.11.0+d4cacc0
ip-10-0-156-32.ec2.internal    Ready      worker    24m       v1.11.0+d4cacc0
ip-10-0-174-183.ec2.internal   Ready      worker    23m       v1.11.0+d4cacc0
ip-10-0-27-9.ec2.internal      NotReady   master    29m       v1.11.0+d4cacc0
ip-10-0-46-179.ec2.internal    Ready      master    29m       v1.11.0+d4cacc0

and seeing the ovs pod on that node:
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/657/pull-ci-openshift-installer-master-e2e-aws/1345/artifacts/e2e-aws/pods/openshift-sdn_ovs-fn8d2_openvswitch.log.gz

/etc/openvswitch/conf.db does not exist ... (warning).
Creating empty database /etc/openvswitch/conf.db ovsdb-tool: I/O error: /etc/openvswitch/conf.db: failed to lock lockfile (Resource temporarily unavailable)
[FAILED]

/cc @squeed

Letting openshift-ovs play nice with processes using openvswitch outside the cluster

I was wondering if it's possible to use openvswitch outside the cluster without openshift-ovs complaining with error: "warning: Another process is currently managing OVS, waiting 15s ". After x tries the pod restarts.

In my specific situation I'm deploying OpenStack nova/neutron with openvswitch as pods in OpenShift. Everything seems to be working and I'm using the ovsdb TCP connection to the node's openvswitch configuration, but openshift-sdn/ovs complains that another process is managing OVS (which is true, because of neutron openvswitch). The openshift-ovs pod then restarts, causing a loop.
Like I said, everything seems to be working, but the openshift-ovs pods keep restarting.

Any advice or pointers?

./hack/ovn-kind-cno.sh failing on kubectl cp : permission denied

Running ./hack/ovn-kind-cno.sh as both root and non-root seems to hit an issue with kubectl cp:

WARNING: patching CNO operator pod for OVN-K8S, deployment will no longer function if this pod is restarted                                        
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.                   
tar: ovnkube-node.yaml: Cannot open: File exists                                                                                                   
tar: Exiting with failure status due to previous errors                                                                                            
command terminated with exit code 2

Assign a priority class to pods

Priority classes docs:
https://docs.openshift.com/container-platform/3.11/admin_guide/scheduling/priority_preemption.html#admin-guide-priority-preemption-priority-class

Example: https://github.com/openshift/cluster-monitoring-operator/search?q=priority&unscoped_q=priority

Notes: The pre-configured system priority classes (system-node-critical and system-cluster-critical) can only be assigned to pods in kube-system or openshift-* namespaces. Most likely, core operators and their pods should be assigned system-cluster-critical. Please do not assign system-node-critical (the highest priority) unless you are really sure about it.

review the rbac role names

The role openshift-ovn-kubernetes-sbdb is defined as

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: openshift-ovn-kubernetes-sbdb
  namespace: openshift-ovn-kubernetes
rules:
- apiGroups: [""]
  resources:
  - endpoints
  verbs:
  - create
  - update
  - patch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - get
  - list
  - update

but pods need to use this serviceaccount to get access to leases without needing access to databases. We should review rbac names to ensure they sync with their function

All Masters Never Become Ready

Multiple folks on our team are seeing this today.

Using an image built from: openshift/installer@08018ca

We end up completing install, mostly:

level=debug msg="API not up yet: the server could not find the requested resource"
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 54.82.254.148:6443: connect: connection refused"                                         
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 52.1.146.45:6443: connect: connection refused"                                           
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 54.82.254.148:6443: connect: connection refused"                                         
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 52.3.185.104:6443: connect: connection refused"                                          
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 54.159.140.198:6443: connect: connection refused"                                        
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 34.197.191.119:6443: connect: connection refused"                                        
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 54.158.200.103:6443: connect: connection refused"                                        
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 52.1.146.45:6443: connect: connection refused"                                           
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 54.82.254.148:6443: connect: connection refused"                                         
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 52.3.185.104:6443: connect: connection refused"                                          
level=debug msg="API not up yet: Get https://dgoodwin1-api.new-installer.openshift.com:6443/version?timeout=32s: dial tcp 54.159.140.198:6443: connect: connection refused"                                        
level=info msg="API v1.11.0+d4cacc0 up"
level=debug msg="added kube-scheduler.1566c06fe7813814: ip-10-0-0-76_c653648b-e76c-11e8-8d31-0ec82e7bcc70 became leader"                                                                                           
level=debug msg="added kube-controller-manager.1566c070f2c6967b: ip-10-0-0-76_c6645265-e76c-11e8-860d-0ec82e7bcc70 became leader"                                                                                  
level=warning msg="RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 219"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=219&watch=true: dial tcp 52.1.146.45:6443: connect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=219&watch=true: dial tcp 54.159.140.198:6443: connect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=219&watch=true: dial tcp 52.1.146.45:6443: connect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=219&watch=true: dial tcp 52.3.185.104:6443: connect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=219&watch=true: dial tcp 34.197.191.119:6443: connect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=219&watch=true: dial tcp 54.158.200.103:6443: connect: connection refused"
level=warning msg="Failed to connect events watcher: Get https://dgoodwin1-api.new-installer.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=219&watch=true: dial tcp 54.82.254.148:6443: connect: connection refused"
level=error msg="waiting for bootstrap-complete: watch closed before UntilWithoutRetry timeout"
level=info msg="Install complete! Run 'export KUBECONFIG=/output/auth/kubeconfig' to manage your cluster."
level=info msg="After exporting your kubeconfig, run 'oc -h' for a list of OpenShift client commands."

Masters are permanently stuck notready:

$ k get nodes
NAME                          STATUS     ROLES     AGE       VERSION
ip-10-0-19-84.ec2.internal    NotReady   master    1h        v1.11.0+d4cacc0
ip-10-0-38-133.ec2.internal   NotReady   master    1h        v1.11.0+d4cacc0
ip-10-0-8-183.ec2.internal    NotReady   master    1h        v1.11.0+d4cacc0

Their status contains a condition like:

    - lastHeartbeatTime: 2018-11-13T18:59:05Z
      lastTransitionTime: 2018-11-13T17:52:16Z       
      message: 'runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady
        message:Network plugin returns error: cni config uninitialized'
      reason: KubeletNotReady            
      status: "False"
      type: Ready

With a similar error in the kubelet systemd logs:

Nov 13 19:00:12 ip-10-0-8-183 hyperkube[841]: E1113 19:00:12.951372     841 kubelet.go:2101] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni config uninitialized

cluster-network-operator pod logs show:

E1113 18:53:06.044313       1 reflector.go:205] github.com/openshift/cluster-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1.NetworkConfig: Get https://dgoodwin1-api.new-installer.openshift.com:6443/apis/networkoperator.openshift.io/v1/networkconfigs?limit=500&resourceVersion=0: dial tcp 10.0.12.190:6443: connect: connection refused                      
2018/11/13 18:53:14 Reconciling NetworkConfig /default
