sriov-network-operator's Issues

Suggestion request - remove the need for skopeo when the env vars are present (env.sh)

https://github.com/openshift/sriov-network-operator/blob/release-4.6/hack/env.sh#L1

Looking at env.sh, skopeo has been mandatory since release 4.6.
One can run the cluster without skopeo installed but with the variables exported,
and the script would still fail, even though skopeo isn't actually needed and the variables would just be reset.
This is the situation we expect on machines that won't have sudo available to install skopeo
(unless it's a must for other things).

Option 1:

CNI_IMAGE_DIGEST=$(skopeo inspect docker://quay.io/openshift/origin-sriov-cni | jq --raw-output '.Digest')
export SRIOV_CNI_IMAGE=${SRIOV_CNI_IMAGE:-quay.io/openshift/origin-sriov-cni@${CNI_IMAGE_DIGEST}}

can be changed to something like

if [ -z "${CNI_IMAGE_DIGEST}" ]; then
  CNI_IMAGE_DIGEST=$(skopeo inspect docker://quay.io/openshift/origin-sriov-cni | jq --raw-output '.Digest')
  export SRIOV_CNI_IMAGE=quay.io/openshift/origin-sriov-cni@${CNI_IMAGE_DIGEST}
fi

If this were done for all the variables, the need for skopeo would be removed,
saving the small cost of downloading/installing it when running in a volatile container,
and avoiding failure when installing it isn't possible.

Option 2: supply a flag that overrides all of the skopeo usage,
verifying only that all values are set.
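A minimal sketch of option 2 (the SKIP_VAR_SET flag name and the check_images helper are hypothetical, not part of env.sh):

```shell
# check_images VAR...: succeed only if every named variable is non-empty.
check_images() {
  for v in "$@"; do
    eval "val=\${$v}"
    if [ -z "${val}" ]; then
      echo "error: ${v} must be set when skipping skopeo" >&2
      return 1
    fi
  done
}

# env.sh could then gate the skopeo lookups on the flag, e.g.:
#   if [ -n "${SKIP_VAR_SET}" ]; then
#     check_images SRIOV_CNI_IMAGE || exit 1
#   else
#     ... existing skopeo-based digest resolution ...
#   fi
```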

WDYT?
Please see the referenced PR, which implements option 2.

Thanks

There is no indication that sriov-operator is ready to use

I am trying to automate sriov-operator deployment, and I realized there is no indication that the operator is ready to use after deployment, after webhook certificate rotation, or after creating a SriovNetworkNodePolicy.
For example, during webhook certificate rotation or after creating a SriovNetworkNodePolicy, the sriov-operator pods are up and ready, but there is a NoSchedule taint on the cluster nodes.

To verify sriov-operator is ready after deployment, we constantly check that all pods under the kube-system and sriov-network-operator namespaces have the condition Ready=true.

After certificate rotation for the validating and mutating webhooks (https://github.com/kubevirt/kubevirtci/blob/master/cluster-up/cluster/kind-k8s-sriov-1.17.0/config_sriov.sh#L214):

  • Create the certificate and patch caBundle for the webhooks
  • Wait for the NoSchedule taint to appear.
  • Wait for the NoSchedule taint to disappear.
  • Verify kube-system and sriov-network-operator pods are up and ready

After SriovNetworkNodePolicy creation (https://github.com/kubevirt/kubevirtci/blob/master/cluster-up/cluster/kind-k8s-sriov-1.17.0/config_sriov.sh#L247):

  • Create the SriovNetworkNodePolicy
  • Wait for the sriov-cni and sriov-device-plugin pods' Ready condition to be true
  • Wait for the NoSchedule taint to appear
  • Wait for the NoSchedule taint to disappear.
    I noticed that on an Intel card (X710), if VFs were already configured on the host, sriov-operator will reboot the node while configuring the VFs again,
    so it is necessary to catch that taint when the node is rebooted.
  • Verify kube-system and sriov-network-operator pods are up and ready

It would be great to have Ready conditions under the sriov-operator Status to indicate what I described,
or a single condition indicating that the sriov-operator is ready to use (i.e., ready for creating SriovNetworks).
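In the meantime, the readiness gate described above can be roughly approximated with stock kubectl (a sketch; the 300s timeout is arbitrary, and the namespaces follow the ones mentioned above):

```shell
# Wait for every pod in both namespaces to report Ready, then make sure
# no node still carries a NoSchedule taint from the config daemon.
kubectl wait pod --all -n kube-system --for=condition=Ready --timeout=300s
kubectl wait pod --all -n sriov-network-operator --for=condition=Ready --timeout=300s

if kubectl get nodes -o jsonpath='{.items[*].spec.taints[*].effect}' | grep -q NoSchedule; then
  echo "nodes are still tainted; operator not ready yet" >&2
  exit 1
fi
```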

Deploying the operator is failing

Due to the validation added in this commit:
5db74be
the operator fails when deploying the default nodePolicy, which violates the new validation.

~/go/src/github.com/openshift/sriov-network-operator
[root@r-cloudx3-07 sriov-network-operator]# kubectl get pods --all-namespaces -o wide
NAMESPACE                NAME                                      READY   STATUS   RESTARTS   AGE   IP              NODE                         NOMINATED NODE   READINESS GATES
sriov-network-operator   sriov-network-operator-58cc6c7d48-r9rbm   0/1     Error    0          2s    10.209.36.115   r-cloudx3-07.mtr.labs.mlnx   <none>           <none>
[root@r-cloudx3-07 sriov-network-operator]# kubectl logs sriov-network-operator-58cc6c7d48-r9rbm -n sriov-network-operator
{"level":"info","ts":1562749150.4414575,"logger":"cmd","msg":"Go Version: go1.10.8"}
{"level":"info","ts":1562749150.441567,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1562749150.4415843,"logger":"cmd","msg":"Version of operator-sdk: v0.7.0+git"}
{"level":"info","ts":1562749150.442389,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1562749150.506561,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1562749150.5066419,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1562749150.5368276,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1562749150.537275,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"sriovnetwork-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1562749150.5375216,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"sriovnetworknodepolicy-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1562749150.618428,"logger":"cmd.createDefaultPolicy","msg":"Create a default SriovNetworkNodePolicy"}
{"level":"error","ts":1562749150.62057,"logger":"cmd","msg":"","error":"SriovNetworkNodePolicy.sriovnetwork.openshift.io \"default\" is invalid: []: Invalid value: map[string]interface {}{\"apiVersion\":\"sriovnetwork.openshift.io/v1\", \"kind\":\"SriovNetworkNodePolicy\", \"metadata\":map[string]interface {}{\"creationTimestamp\":\"2019-07-10T08:59:10Z\", \"generation\":1, \"name\":\"default\", \"namespace\":\"sriov-network-operator\", \"uid\":\"53bdddb7-dc2c-420f-93a8-b6f1a2651e29\"}, \"spec\":map[string]interface {}{\"nicSelector\":map[string]interface {}{}, \"nodeSelector\":interface {}(nil), \"numVfs\":0, \"resourceName\":\"\"}}: validation failure list:\nspec.nodeSelector in body must be of type object: \"null\"\nspec.numVfs in body should be greater than or equal to 1","stacktrace":"github.com/openshift/sriov-network-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/sriov-network-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/openshift/sriov-network-operator/cmd/manager/main.go:124\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:198"}

DaemonSet container missing the pci.ids file

DaemonSet Container is missing the pci.ids file

The current code has a dependency:
https://github.com/openshift/sriov-network-operator/blob/master/pkg/utils/utils.go#L20

Which has a dependency:
https://github.com/jaypipes/ghw/blob/master/pci.go#L15

Which looks for the pci.ids file in the local filesystem and when not found download from an external location (https://pci-ids.ucw.cz/v2.2/pci.ids.gz)
https://github.com/jaypipes/pcidb/blob/master/discover.go#L19

When deploying the SR-IOV operator in air-gapped or disconnected environments, since the container does not have that file at /usr/share/misc/pci.ids, it tries to fetch pci.ids from the external location, which is not available, and the deployment fails.

pcidb has several environment variables to work around this:

  • PCIDB_DISABLE_NETWORK_FETCH
  • PCIDB_CHROOT

The container mounts the host filesystem into /host, but setting PCIDB_CHROOT=/host was not working for us. We could only make it work by adding another volume mapping the host's /usr/share/hwdata/ (which is part of the hwdata package) to the container's /usr/share/misc/.

The default behavior should avoid depending on an external third-party site and should instead use the packages already included in the underlying distribution.
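The extra volume we ended up with can be sketched as a JSON patch (the daemonset name and container index are assumptions; adjust them to the deployed manifests):

```shell
# Add a hostPath volume so the daemon container finds pci.ids locally
# instead of fetching it over the network.
kubectl -n sriov-network-operator patch daemonset sriov-network-config-daemon \
  --type=json -p='[
    {"op": "add", "path": "/spec/template/spec/volumes/-",
     "value": {"name": "hwdata", "hostPath": {"path": "/usr/share/hwdata"}}},
    {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts/-",
     "value": {"name": "hwdata", "mountPath": "/usr/share/misc", "readOnly": true}}
  ]'
```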

Issue in building kustomize and controller-gen with `go get: disabled by -mod=vendor`

Issue in building kustomize and controller-gen with go get: disabled by -mod=vendor

GOPATH=/root/golang
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/go/bin:/root/golang/bin:/usr/local/kubebuilder/bin
[root@openshift-jumpserver-0 ~]# go version
go version go1.13.15 linux/amd64

Start from a clean sheet:

rm -Rf golang
mkdir golang

Get source:

go get github.com/openshift/sriov-network-operator
cd $GOPATH/src/github.com/openshift/sriov-network-operator/

Run make deploy-setup:

[root@openshift-jumpserver-0 sriov-network-operator]# make deploy-setup
which: no controller-gen in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/go/bin:/root/golang/bin:/usr/local/kubebuilder/bin)
which: no kustomize in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/go/bin:/root/golang/bin:/usr/local/kubebuilder/bin)
go: creating new go.mod: module tmp
go get: disabled by -mod=vendor
make: *** [Makefile:127: controller-gen] Error 1

What fails here is the build of kustomize and controller-gen:

[root@openshift-jumpserver-0 sriov-network-operator]# make controller-gen
which: no controller-gen in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/go/bin:/root/golang/bin:/usr/local/kubebuilder/bin)
which: no kustomize in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/go/bin:/root/golang/bin:/usr/local/kubebuilder/bin)
go: creating new go.mod: module tmp
go get: disabled by -mod=vendor
make: *** [Makefile:127: controller-gen] Error 1
[root@openshift-jumpserver-0 sriov-network-operator]# make kustomize
which: no controller-gen in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/go/bin:/root/golang/bin:/usr/local/kubebuilder/bin)
which: no kustomize in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/go/bin:/root/golang/bin:/usr/local/kubebuilder/bin)
go: creating new go.mod: module tmp
go get: disabled by -mod=vendor
make: *** [Makefile:142: kustomize] Error 1

The problem is this setting in the Makefile:

[root@openshift-jumpserver-0 sriov-network-operator]# grep GOFLAGS Makefile 
export GOFLAGS=-mod=vendor
[root@openshift-jumpserver-0 sriov-network-operator]# grep GO111 Makefile 
export GO111MODULE=on

So, I had to modify the Makefile:

[root@openshift-jumpserver-0 sriov-network-operator]# diff Makefile.old Makefile
132c132
< 	go get sigs.k8s.io/controller-tools/cmd/[email protected] ;\
---
> 	GOFLAGS="" go get sigs.k8s.io/controller-tools/cmd/[email protected] ;\
147c147
< 	go get sigs.k8s.io/kustomize/kustomize/[email protected] ;\
---
> 	GOFLAGS="" go get sigs.k8s.io/kustomize/kustomize/[email protected] ;\

And then, make deploy-setup goes through.

Cannot clone repository with "go get ..."

Cannot clone repository with:

go get github.com/openshift/sriov-network-operator

The clone operation is interrupted with an error:

# go get github.com/openshift/sriov-network-operator
go: finding github.com/openshift/sriov-network-operator latest
go: downloading github.com/openshift/sriov-network-operator v0.0.0-20201102134141-c9886b63e60a
go: extracting github.com/openshift/sriov-network-operator v0.0.0-20201102134141-c9886b63e60a
go get: github.com/openshift/sriov-network-operator@v0.0.0-20201102134141-c9886b63e60a requires
	k8s.io/kubectl@v0.0.0: reading k8s.io/kubectl/go.mod at revision v0.0.0: unknown revision v0.0.0

System info:

# cat /etc/redhat-release 
CentOS Linux release 8.2.2004 (Core)
# uname -a
Linux node1 4.18.0-193.19.1.el8_2.x86_64 #1 SMP Mon Sep 14 14:37:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
# go version
go version go1.13.15 linux/amd64

Thanks,
Vitaliy

Network-attachment-definition not updated after network update

This issue appears when updating a SriovNetwork object with networkNamespace set. The NetworkAttachmentDefinition object should be updated accordingly, but the operator doesn't reconcile the desired state of the network into the net-att-def.

In this case, the VLAN ID was updated in the network (to 2000) but not in the net-att-def (still 1050).

apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetwork
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"sriovnetwork.openshift.io/v1","kind":"SriovNetwork","metadata":{"annotations":{},"name":"example-sriovnetwork","namespace":"sriov-network-operator"},"spec":{"ipam":"{\n  \"type\": \"host-local\",\n  \"subnet\": \"10.56.217.0/24\",\n  \"rangeStart\": \"10.56.217.10\",\n  \"rangeEnd\": \"10.56.217.11\",\n  \"routes\": [{\n    \"dst\": \"0.0.0.0/0\"\n  }],\n  \"gateway\": \"10.56.217.1\"\n}\n","networkNamespace":"default","resourceName":"intelnics","vlan":1050}}
    creationTimestamp: "2019-07-22T14:41:24Z"
    generation: 2
    name: example-sriovnetwork
    namespace: sriov-network-operator
    resourceVersion: "1822482"
    selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/sriov-network-operator/sriovnetworks/example-sriovnetwork
    uid: c755acc9-ac8e-11e9-a85c-0cc47a8eed4c
  spec:
    ipam: |
      {
        "type": "host-local",
        "subnet": "10.56.217.0/24",
        "rangeStart": "10.56.217.10",
        "rangeEnd": "10.56.217.11",
        "routes": [{
          "dst": "0.0.0.0/0"
        }],
        "gateway": "10.56.217.1"
      }
    networkNamespace: default
    resourceName: intelnics
    vlan: 2000
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
oc get network-attachment-definition -A -o yaml
apiVersion: v1
items:
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.com/intelnics
    creationTimestamp: "2019-07-22T14:41:24Z"
    generation: 1
    name: example-sriovnetwork
    namespace: default
    ownerReferences:
    - apiVersion: sriovnetwork.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: SriovNetwork
      name: example-sriovnetwork
      uid: c755acc9-ac8e-11e9-a85c-0cc47a8eed4c
    resourceVersion: "1571932"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/default/network-attachment-definitions/example-sriovnetwork
    uid: c765f3e8-ac8e-11e9-a85c-0cc47a8eed4c
  spec:
    config: |
      {"cniVersion":"0.3.1","name":"sriov-net","type":"sriov",  "vlan":1050,"ipam":{"type":"host-local","subnet":"10.56.217.0/24","rangeStart":"10.56.217.10","rangeEnd":"10.56.217.11","routes":[{"dst":"0.0.0.0/0"}],"gateway":"10.56.217.1"}}
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Decreasing the numVfs in the policy doesn't seem to work

I've deployed the sriov operator and created a policy with the following yaml:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: eno1-network-node-policy
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: false
  mtu: 1500
  nicSelector:
    pfNames:
    - eno1
    rootDevices:
    - 0000:01:00.0
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 63
  priority: 10
  resourceName: sriovnics

I tested a SriovNetwork and pods using the net-attach-def with static IPs, and everything works. Now I want to decrease numVfs, so I've edited the resource and set it to 4:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: eno1-network-node-policy
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: false
  mtu: 1500
  nicSelector:
    pfNames:
    - eno1
    rootDevices:
    - 0000:01:00.0
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  priority: 10
  resourceName: sriovnics

But on the actual hosts the number of VFs is still the same. I've tried to resize them manually with:

# echo 4 > /sys/class/net/eno1/device/sriov_numvfs
-bash: echo: write error: Device or resource busy

And in dmesg:

[177015.556541] ixgbe 0000:01:00.0: 63 VFs already enabled. Disable before enabling 4 VFs

So I'm assuming SR-IOV needs to be disabled and then re-enabled.
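A minimal sketch of the disable-then-enable sequence (the set_numvfs helper and the SYSFS override are illustrative, not operator code; run on the node as root, and note it briefly removes all VFs):

```shell
# set_numvfs IFACE COUNT: the kernel refuses to change sriov_numvfs while
# VFs are enabled, so reset it to 0 before writing the new count.
# SYSFS is overridable for testing; it defaults to the real /sys.
set_numvfs() {
  pf="${SYSFS:-/sys}/class/net/$1/device"
  echo 0 > "$pf/sriov_numvfs"
  echo "$2" > "$pf/sriov_numvfs"
}

# On the node (as root): set_numvfs eno1 4
```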

Thanks.

[RFE] Allow non supported hardware for testing purposes

The current operator version injects udev rules to prevent NetworkManager from managing the VFs, but only for a few cards.
It would be nice to have instructions on how to provide the same behaviour for unsupported NICs.
It could be as simple as applying a MachineConfig such as:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: null
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-etc-udev-rules-d-11-nm-unmanaged-rules
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 2.2.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,QUNUSU9OPT0iYWRkfGNoYW5nZSIsIEFUVFJTe2RldmljZX09PSIweDE1MTUiLCBFTlZ7Tk1fVU5NQU5BR0VEfT0iMSIK
          verification: {}
        filesystem: root
        group:
          name: root
        mode: 420
        path: /etc/udev/rules.d/11-nm-unmanaged.rules
        user:
          name: root
    systemd: {}
  osImageURL: ""

Where the udev rule is:

ACTION=="add|change", ATTRS{device}=="0x1515", ENV{NM_UNMANAGED}="1"

(where 0x1515 is the PCI device ID)
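The base64 payload in the MachineConfig above is just this rule encoded; for a different device ID it can be regenerated like so:

```shell
# Encode a udev rule for embedding in a MachineConfig data: URL.
# Replace 0x1515 with your NIC's PCI device ID.
rule='ACTION=="add|change", ATTRS{device}=="0x1515", ENV{NM_UNMANAGED}="1"'
printf '%s\n' "$rule" | base64 -w0
# The MachineConfig source field is then:
#   data:text/plain;charset=utf-8;base64,<output>
```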

DeviceType: vfio-pci is not recognized on OKD 4.6

My cluster version is "4.6.0-0.okd-2021-01-17-185703", and I tried to use SR-IOV for a KubeVirt VM.
According to the official OpenShift documents, only the vfio-pci deviceType can be used for KubeVirt VMs.
So I installed openshift-sriov-network-operator manually (git clone and make deploy-setup) and turned off the webhook to use an unsupported NIC, as below:
(screenshot omitted)

After I configured the host's /etc/default/grub file, executed mkconfig, and rebooted the host, I can see the VFIO modules in the lsmod output. So I created the SriovNetworkNodePolicy yaml file, but it seems the VFs are not created as I want.

[sriov-device-plugin pod logs]
(screenshot omitted)

Does anyone have an idea how to fix it?
I'm sharing my configuration and information below to help you understand:

  • OS: Fedora CoreOS 33.20210104.10.0
  • OKD version: 4.6.0-0.okd-2021-01-17-185703
  • Used Branch: master
# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt rd.driver.pre=vfio-pci"
GRUB_DISABLE_RECOVERY="true"


# cat np-sriov-vfio.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ens3f1-vfio-np
  namespace: openshift-sriov-network-operator
spec:
  resourceName: ens3f1vfio
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 1500
  numVfs: 4
  nicSelector:
    vendor: "8086"
    deviceID: "10fb"
    pfNames: ["ens3f1#0-3"]
    rootDevices: ["0000:08:00.1"]
  deviceType: vfio-pci
  isRdma: false

RFE: allow to split VFs of the same PF into multiple resource pools

Use case: a node has a single PF with multiple VFs. An admin would like to allow scheduling different workloads to the same node. Specifically, the admin would like the cluster users to schedule both workloads that rely on VFIO (e.g. kubevirt VMIs) as well as those that use the regular network driver (regular pods using netlink interfaces).

Problem: right now, the operator spec format doesn't allow specifying that, for example, the first 10 VFs should be configured for VFIO while the rest are left in netdevice mode for the device plugin.

Related info: the device plugin now allows listing indices of VFs when using the PF name selector: k8snetworkplumbingwg/sriov-network-device-plugin#157. But since there are issues with requiring particular PF names to be listed in the spec, plus not being able to select a particular node, the new feature may not rely on explicit names/indices.

Related discussion: there was a discussion among operator maintainers and KubeVirt engineers (via email) with ideas about how this feature could be implemented, which I will capture below for reference.

====

If we want to support it, I prefer to minimize modification of the API. Here is a thought; please see if it can fulfil your use case.

Currently, there is a priority field in the SriovNetworkNodePolicy CR. When multiple policies select the same set of SR-IOV NICs and specify different numVfs, the higher-priority policy overrules the lower ones. Maybe we can add a "sharedPF" field to override this priority behavior: when it's true, the operator sums up the total numVfs across all policies, then configures the PF accordingly.

In the following example, there are 2 policies defined for 2 resource names. We can let the operator configure 6+6=12 VFs for NIC "eno3", then load the vfio driver for 6 VFs and the default driver for the rest. And in the device plugin config, we can have two resourceNames with different "driver selectors".

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-1
  namespace: sriov-network-operator
spec:
  resourceName: intelnics
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  sharedPF: true
  numVfs: 6
  nicSelector:
    vendor: "8086"
    pfName: eno3
    rootDevices: ['0000:86:00.1']
  deviceType: netdevice
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-2
  namespace: sriov-network-operator
spec:
  resourceName: intelnics-dpdk
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  sharedPF: true
  numVfs: 6
  nicSelector:
    vendor: "8086"
    pfName: eno3
    rootDevices: ['0000:86:00.1']
  deviceType: vfio-pci

===

Just to share another thought on splitting VFs from the same PF into different resource names.
There is an open PR in the device plugin to enhance the PF selector to specify VF indices. For example, a PF filter like ["eth0#1-3"] indicates that 3 VFs need to be created from the eth0 PF and that VFs with index 1 to 3 will be grouped as a single resource name. I think the SR-IOV Operator might be able to leverage this feature: inspect the PF filter to get the VF index numbers before merging policies, then decide how many VFs should be provisioned on each PF.
This has the advantage of not changing any operator APIs, and it allows using VFs from the same PF as different resources even when they have the same driver, but it requires the (optional) PF selector to be configured in order to split VFs from the same PF.

k8snetworkplumbingwg/sriov-network-device-plugin#165

===

I think the original ask is to group VF devices with different types (netdevice or vfio-pci) into separate resource pools.
This is already supported in the DP by using the driver selector. I don't think we need to add another selector in the DP to support this particular use case.

The missing parts are:

  1. We need to find a way in the SR-IOV Operator to generate such a configuration for the DP.
  2. The operator cannot provision a single PF with VFs of different device types, because according to the SR-IOV node network policy design, only the highest-priority policy matching a PF device will be applied. If the device type is specified as vfio-pci, then all provisioned VFs will be bound to the vfio-pci driver.

The suggestion of using a VF index in the policy is to allow the Operator to configure VFs from the same PF differently. The Operator can have its own format for defining the VF index; for example, it may be appended to each element of the rootDevices list, which is used to identify the PF device and provision VFs on it. This could solve the second issue.

For issue 1, if the PF name selector is not configured in the SR-IOV node network policy (i.e. the final merged policy doesn't contain a PF name selector), then the generated DP configuration won't contain a PF name.

Add support for incorporating metadata from an OpenStack-based virtual deployment

When OpenShift is deployed on OpenStack, OpenStack provides additional metadata about networks that can be used by the sriov-network-operator. This metadata specifies which OpenStack networks (UUID) are connected to which interface of the VM. In addition, if Device Role Tagging is used, the metadata can also provide labels that allow the grouping of VF resources.

This RFE is contingent upon the approval of this PR.

An example of the metadata is below:

{
  "uuid": "bc90bd5c-96b8-42a2-a502-6b0b6dbb66af",
  "admin_pass": "...",
  "hostname": "worker-0.fdp.nfv",
  "name": "worker-0.fdp.nfv",
  "launch_index": 0,
  "availability_zone": "nova",
  "random_seed": "....",
  "project_id": "ecdb8ecc548146febe3e1e0145bb6dfb",
  "devices": [
    {
      "vf_trusted": false,
      "type": "nic",
      "mac": "fa:16:3e:5e:d2:74",
      "bus": "pci",
      "address": "0000:00:03.0",
      "tags": [
        "sdn"
      ]
    },
    {
      "vlan": 108,
      "vf_trusted": true,
      "type": "nic",
      "mac": "fa:16:3e:e9:bb:7c",
      "bus": "pci",
      "address": "0000:00:07.0",
      "tags": [
        "uplink"
      ]
    },
    {
      "vlan": 108,
      "vf_trusted": true,
      "type": "nic",
      "mac": "fa:16:3e:8b:7a:97",
      "bus": "pci",
      "address": "0000:00:08.0",
      "tags": [
        "uplink"
      ]
    },
    {
      "vlan": 118,
      "vf_trusted": true,
      "type": "nic",
      "mac": "fa:16:3e:37:a2:44",
      "bus": "pci",
      "address": "0000:00:09.0",
      "tags": [
        "downlink"
      ]
    },
    {
      "vlan": 118,
      "vf_trusted": true,
      "type": "nic",
      "mac": "fa:16:3e:33:c8:51",
      "bus": "pci",
      "address": "0000:00:0a.0",
      "tags": [
        "downlink"
      ]
    },
    {
      "vf_trusted": false,
      "type": "nic",
      "mac": "fa:16:3e:f2:b6:87",
      "bus": "pci",
      "address": "0000:00:04.0",
      "tags": [
        "dpdk1"
      ]
    },
    {
      "vf_trusted": false,
      "type": "nic",
      "mac": "fa:16:3e:b7:c7:a7",
      "bus": "pci",
      "address": "0000:00:05.0",
      "tags": [
        "dpdk2"
      ]
    }
  ]
}

This example shows 7 virtual NICs attached to this instance of an OpenShift worker:

  • OpenShift SDN interface
  • dpdk1 network
  • dpdk2 network
  • Two connections to the downlink network
  • Two connections to the uplink network

The "tags" field has been specified using the Device Role Tagging feature of the nova boot command.

An additional set of metadata specifies networking information:

{
  "links": [
    {
      "id": "tap436f0927-61",
      "vif_id": "436f0927-6123-4c7d-a8e4-d35b56fff0b5",
      "type": "vhostuser",
      "mtu": 1450,
      "ethernet_mac_address": "fa:16:3e:5e:d2:74"
    },
    {
      "id": "tap6552d990-e8",
      "vif_id": "6552d990-e8a4-4403-bde6-6245ebac4313",
      "type": "hw_veb",
      "mtu": 1500,
      "ethernet_mac_address": "fa:16:3e:e9:bb:7c"
    },
    {
      "id": "tap86ca8861-b8",
      "vif_id": "86ca8861-b8c3-4233-8189-920972659dc3",
      "type": "hw_veb",
      "mtu": 1500,
      "ethernet_mac_address": "fa:16:3e:8b:7a:97"
    },
    {
      "id": "tapab277d63-af",
      "vif_id": "ab277d63-af39-4ff7-89bf-dfe29b7cb984",
      "type": "hw_veb",
      "mtu": 1500,
      "ethernet_mac_address": "fa:16:3e:37:a2:44"
    },
    {
      "id": "tap600657be-16",
      "vif_id": "600657be-1638-482c-abd2-c3cc6403c0ed",
      "type": "hw_veb",
      "mtu": 1500,
      "ethernet_mac_address": "fa:16:3e:33:c8:51"
    },
    {
      "id": "tapf503c3b2-0c",
      "vif_id": "f503c3b2-0c96-4844-aa7d-88591983b2e6",
      "type": "vhostuser",
      "mtu": 1450,
      "ethernet_mac_address": "fa:16:3e:f2:b6:87"
    },
    {
      "id": "tapd3d2b570-6b",
      "vif_id": "d3d2b570-6b63-4d94-a8cf-3024247abf8f",
      "type": "vhostuser",
      "mtu": 1450,
      "ethernet_mac_address": "fa:16:3e:b7:c7:a7"
    }
  ],
  "networks": [
    {
      "id": "network0",
      "type": "ipv4_dhcp",
      "link": "tap436f0927-61",
      "network_id": "a97e59fb-7f3e-407b-9adc-3caa5e79a1b2"
    },
    {
      "id": "network1",
      "type": "ipv4_dhcp",
      "link": "tap6552d990-e8",
      "network_id": "22300314-3743-4e7a-8f87-f73a9d39cef4"
    },
    {
      "id": "network2",
      "type": "ipv4_dhcp",
      "link": "tap86ca8861-b8",
      "network_id": "22300314-3743-4e7a-8f87-f73a9d39cef4"
    },
    {
      "id": "network3",
      "type": "ipv4_dhcp",
      "link": "tapab277d63-af",
      "network_id": "ea24bd04-8674-4f69-b0ee-fa0b3bd20509"
    },
    {
      "id": "network4",
      "type": "ipv4_dhcp",
      "link": "tap600657be-16",
      "network_id": "ea24bd04-8674-4f69-b0ee-fa0b3bd20509"
    },
    {
      "id": "network5",
      "type": "ipv4_dhcp",
      "link": "tapf503c3b2-0c",
      "network_id": "863b5c10-3f21-4b0d-a3d1-2e20c1b6997d"
    },
    {
      "id": "network6",
      "type": "ipv4_dhcp",
      "link": "tapd3d2b570-6b",
      "network_id": "90f783e0-a793-47ab-89c9-f47d27021169"
    }
  ],
  "services": [
    {
      "type": "dns",
      "address": "192.168.122.1"
    }
  ]
}

By connecting the information provided by both of these metadata sources, one can determine the OpenStack network UUID and, if present, any Device Role Tagging information.

Two new fields, networkID and networkTag, can be added to the SriovNetworkNodeState and SriovNetworkNodePolicy CRDs. The additional fields in the SriovNetworkNodeState CR will allow the admin to write a SriovNetworkNodePolicy that selects interfaces grouped by networkID and networkTag. An example SriovNetworkNodePolicy is shown below:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: radio-downlink
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  nicSelector:
    networkID: ea24bd04-8674-4f69-b0ee-fa0b3bd20509
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: 'true'
  numVfs: 1
  priority: 99
  resourceName: intelnics_radio_downlink

All interfaces connected to the network ea24bd04-8674-4f69-b0ee-fa0b3bd20509 will be grouped as the resource intelnics_radio_downlink. Similarly, the networkTag field could group by Device Role Tagging.

The sriov-network-device-plugin configuration can remain the same if the sriov-plugin-configmap generation uses the networkID and networkTag fields to select devices and then populates the appropriate rootDevices array in the configMap.
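The device-to-network join described above can be sketched with jq over the two standard OpenStack metadata documents (the correlate_metadata helper name and the file paths are illustrative):

```shell
# correlate_metadata META NET: join each device (by MAC) to its link and
# network entry, printing one JSON object per NIC with its network UUID.
correlate_metadata() {
  jq -n --slurpfile meta "$1" --slurpfile net "$2" '
    $meta[0].devices[] as $d
    | ($net[0].links[]    | select(.ethernet_mac_address == $d.mac)) as $l
    | ($net[0].networks[] | select(.link == $l.id)) as $n
    | {mac: $d.mac, address: $d.address, tags: $d.tags, network_id: $n.network_id}'
}

# On a node, the documents typically live on the config drive, e.g.:
#   correlate_metadata /mnt/config/openstack/latest/meta_data.json \
#                      /mnt/config/openstack/latest/network_data.json
```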

"Link not found" occurs when trying to set VF Admin mac

Hi,
NIC: X700
driver: i40e
version: 2.10.19.82
firmware-version: 6.80 0x80003c64 1.2007.0
Operator version: Latest
K8s: Vanilla

I have noticed a "Link not found" error occurring regularly. When the error occurs and the logic flow restarts, it then works fine.
I was going to dig deeper into this, but first I am wondering whether you have encountered it when using the SyncNodeState() function, specifically the setVfsAdminMac() function?

setVfsAdminMac(): unable to get VF link for device &{Name:ens785f0 Mac:68:05:ca:2d:ea:f0 Driver:i40e PciAddress:0000:02:00.0 Vendor:8086 DeviceID:1572 Mtu:1500 NumVfs:0 LinkSpeed:-1 Mb/s LinkType:ETH TotalVfs:32 VFs:[]} "Link not found"

I think adding a retry here may help prevent returning an error and restarting the Apply() flow. What do you think?

DaemonSet should not assume it must be deployed to all worker nodes or into a single role

The current nodeSelector for the DaemonSet matches all worker nodes:

https://github.com/openshift/sriov-network-operator/blob/master/bindata/manifests/daemon/daemonset.yaml#L25-L27

  1. This approach limits the ability to use the operator in environments with multiple worker roles (e.g. worker-gpu, worker-rt, worker-group-a, etc.)
  2. This approach may deploy the DaemonSet onto nodes that do not require or support SR-IOV, for example an environment where some worker nodes have NICs with SR-IOV support and other nodes in the same role do not.

Consider using a custom node label to determine the nodes onto which to deploy the DaemonSet (e.g. sriov.openshift.io=true).

SriovNetwork clientset is missing

Hello
I am trying to vendor SriovNetwork into another project in order to create SriovNetwork objects in a cluster with the Go kubernetes client,
but it is not possible because there is no clientset for SriovNetwork.

support for sriov-network-operator in vanilla kubernetes

I'm trying to install sriov-network-operator with k8s v1.16.2 on an Ubuntu 16.04.6 server with the following commands.

export OPERATOR_EXEC=kubectl
make deploy-setup-k8s

The following error is thrown.

# make deploy-setup-k8s
/bin/sh: 1: Syntax error: "(" unexpected
Makefile:61: recipe for target 'deploy-setup' failed
make: *** [deploy-setup] Error 2

Then I changed the Makefile as below.

diff --git a/Makefile b/Makefile
index 41fdb0d..de1378b 100644
--- a/Makefile
+++ b/Makefile
@@ -58,7 +58,7 @@ gencode: operator-sdk
        @operator-sdk generate openapi

 deploy-setup:
-       @EXCLUSIONS=() hack/deploy-setup.sh $(NAMESPACE)
+       @EXCLUSIONS="()" hack/deploy-setup.sh $(NAMESPACE)

 deploy-setup-k8s: export NAMESPACE=sriov-network-operator

Then the following messages are shown.

$ make deploy-setup-k8s
hack/env.sh: line 1: skopeo: command not found
hack/env.sh: line 3: skopeo: command not found
hack/env.sh: line 5: skopeo: command not found
hack/env.sh: line 7: skopeo: command not found
hack/env.sh: line 9: skopeo: command not found
hack/env.sh: line 11: skopeo: command not found
~/cnis/sriov-network-operator/deploy ~/cnis/sriov-network-operator
namespace/sriov-network-operator created
...
...

Is sriov-network-operator supported on other Linux distributions and in vanilla kubernetes clusters?

[Enhancement] Support Generic SR-IOV configuration via system services

Adding this for further discussion and for this to be considered as an Enhancement for OpenShift 4.8

This Enhancement proposes to move generic_plugin SR-IOV configuration to boot time, to be performed by a system service,
similarly to how switchdev is configured today via the SR-IOV network operator.

With this change, a rebooted node would not need to be cordoned and drained, nor would the device plugin need to restart, since the machine would already be properly configured at boot. This makes the node available to the k8s cluster faster and keeps pods running on the node after a reboot in case they were created without a controller.

The proposal is to take a pattern similar to mco_plugin and create a sriov_config_service plugin that injects an sriov configuration service and its configuration into the node. This plugin could then be optionally enabled, via a configuration knob in the SR-IOV config daemon, to replace the existing generic_plugin.

mco_plugin can later be extended to support this functionality for OpenShift cluster, utilizing MCO to inject these services.

Suggestion: Update DiscoverSriovDevices to use `/sys/class/net` in order to support netns isolation

We found a case in which the user needs to distribute exclusive PF ownership across a few network namespaces.
For example, running two clusters, each with its own netns,
and each netns owning one PF exclusively (assigned by ip link set <PF> netns <NS>).
One use case is to run 2 prow jobs on the same node, each with its own PF and netns.

Since the current config-daemon DiscoverSriovDevices detects interfaces via /sys/devices/pci*,
all the PFs are visible because the daemon runs in the host netns.
As a result, the unconfigured PFs will be reset in resetSriovDevice, which is called by SyncNodeState.
This causes one cluster to corrupt the second cluster, even if the PF is not in the daemon's netns.

Please consider using /sys/class/net/*/device/uevent for discovery instead.
I tested it for the above scenario and it fixed the problem:
I could run two clusters, each with its own PF, side by side on the same node.

As discussed, it should be determined whether there are use cases where the daemon still needs to discover all the interfaces via /sys/devices/pci*; if so, a flag should be added to select the desired discovery method.

see U/S k8snetworkplumbingwg/sriov-network-operator#2

/cc @zshi-redhat

OpenShift SRIOV network operator deployment failed on K8s cluster with Ubuntu 18.04 OS

Deployment of the OpenShift SRIOV network operator failed on a K8s cluster with Ubuntu OS, with an error in the Makefile.

Steps to reproduce:

  1. Install K8s cluster
  2. Install GO on master node with
    snap install go --classic
  3. Download repo:
     mkdir /sriov 
     cd /sriov
     export GOPATH=/sriov
     go get github.com/openshift/sriov-network-operator
    
  4. Start operator deployment
export PATH=$GOPATH/bin:$PATH
snap install jq
snap install skopeo --edge --devmode
cd /sriov/src/github.com/openshift/sriov-network-operator/
make deploy-setup-k8s

Installation stdout:

go: creating new go.mod: module tmp
go: found sigs.k8s.io/controller-tools/cmd/controller-gen in sigs.k8s.io/controller-tools v0.3.0
go: downloading k8s.io/apimachinery v0.18.2
go: downloading github.com/spf13/cobra v0.0.5
go: downloading gopkg.in/yaml.v3 v3.0.0-20190905181640-827449938966
go: downloading k8s.io/apiextensions-apiserver v0.18.2
go: downloading golang.org/x/tools v0.0.0-20190920225731-5eefd052ad72
go: downloading k8s.io/api v0.18.2
go: downloading sigs.k8s.io/yaml v1.2.0
go: downloading github.com/fatih/color v1.7.0
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/mattn/go-colorable v0.1.2
go: downloading github.com/inconshreveable/mousetrap v1.0.0
go: downloading github.com/gogo/protobuf v1.3.1
go: downloading k8s.io/utils v0.0.0-20200324210504-a9aa75ae1b89
go: downloading github.com/google/gofuzz v1.1.0
go: downloading k8s.io/klog v1.0.0
go: downloading github.com/mattn/go-isatty v0.0.8
go: downloading github.com/gobuffalo/flect v0.2.0
go: downloading gopkg.in/inf.v0 v0.9.1
go: downloading sigs.k8s.io/structured-merge-diff/v3 v3.0.0
go: downloading golang.org/x/net v0.0.0-20191004110552-13f9640d40b9
go: downloading gopkg.in/yaml.v2 v2.2.8
go: downloading golang.org/x/sys v0.0.0-20191022100944-742c48ecaeb7
go: downloading github.com/json-iterator/go v1.1.8
go: downloading github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd
go: downloading github.com/modern-go/reflect2 v1.0.1
go: downloading golang.org/x/text v0.3.2
/sriov/bin/controller-gen "crd:crdVersions={v1},trivialVersions=true" webhook paths="./..." output:crd:artifacts:config=config/crd/bases
go: creating new go.mod: module tmp
go: downloading sigs.k8s.io/kustomize/kustomize/v3 v3.5.4
go: downloading sigs.k8s.io/kustomize v2.0.3+incompatible
go: downloading sigs.k8s.io/kustomize/api v0.3.2
go: downloading k8s.io/client-go v0.17.0
go: downloading github.com/pkg/errors v0.8.1
go: downloading sigs.k8s.io/yaml v1.1.0
go: downloading sigs.k8s.io/kustomize/cmd/config v0.0.5
go: downloading gopkg.in/yaml.v2 v2.2.4
go: downloading k8s.io/api v0.17.0
go: downloading k8s.io/apimachinery v0.17.0
go: downloading sigs.k8s.io/kustomize/kyaml v0.0.6
go: downloading github.com/posener/complete/v2 v2.0.1-alpha.12
go: downloading github.com/evanphx/json-patch v4.5.0+incompatible
go: downloading github.com/google/gofuzz v1.0.0
go: downloading github.com/olekukonko/tablewriter v0.0.4
go: downloading github.com/go-errors/errors v1.0.1
go: downloading k8s.io/kube-openapi v0.0.0-20191107075043-30be4d16710a
go: downloading github.com/gogo/protobuf v1.2.2-0.20190723190241-65acae22fc9d
go: downloading github.com/Azure/go-autorest/autorest v0.9.0
go: downloading github.com/davecgh/go-spew v1.1.1
go: downloading github.com/googleapis/gnostic v0.0.0-20170729233727-0c5108395e2d
go: downloading golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45
go: downloading sigs.k8s.io/kustomize/cmd/kubectl v0.0.3
go: downloading github.com/mattn/go-runewidth v0.0.7
go: downloading github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510
go: downloading github.com/Azure/go-autorest/autorest/adal v0.5.0
go: downloading github.com/xlab/treeprint v0.0.0-20181112141820-a009c3971eca
go: downloading github.com/hashicorp/go-multierror v1.0.0
go: downloading gopkg.in/yaml.v3 v3.0.0-20191026110619-0b21df46bc1d
go: downloading github.com/gophercloud/gophercloud v0.1.0
go: downloading k8s.io/kubectl v0.0.0-20191219154910-1528d4eea6dd
go: downloading github.com/go-openapi/spec v0.19.5
go: downloading k8s.io/cli-runtime v0.17.0
go: downloading google.golang.org/appengine v1.5.0
go: downloading github.com/hashicorp/errwrap v1.0.0
go: downloading github.com/golang/protobuf v1.3.2
go: downloading github.com/dgrijalva/jwt-go v3.2.0+incompatible
go: downloading github.com/go-openapi/jsonreference v0.19.3
go: downloading github.com/go-openapi/swag v0.19.5
go: downloading golang.org/x/crypto v0.0.0-20190923035154-9ee001bba392
go: downloading github.com/imdario/mergo v0.3.5
go: downloading github.com/Azure/go-autorest/autorest/date v0.1.0
go: downloading github.com/Azure/go-autorest/tracing v0.5.0
go: downloading github.com/mailru/easyjson v0.7.0
go: downloading cloud.google.com/go v0.38.0
go: downloading k8s.io/utils v0.0.0-20191114184206-e782cd3c129f
go: downloading github.com/PuerkitoBio/purell v1.1.1
go: downloading github.com/go-openapi/jsonpointer v0.19.3
go: downloading github.com/emicklei/go-restful v2.9.5+incompatible
go: downloading github.com/exponent-io/jsonpath v0.0.0-20151013193312-d6023ce2651d
go: downloading github.com/chai2010/gettext-go v0.0.0-20160711120539-c6fed771bfd5
go: downloading github.com/Azure/go-autorest/logger v0.1.0
go: downloading k8s.io/component-base v0.17.0
go: downloading github.com/mitchellh/go-wordwrap v1.0.0
go: downloading github.com/PuerkitoBio/urlesc v0.0.0-20170810143723-de5bf2ad4578
go: downloading github.com/jonboulle/clockwork v0.1.0
go: downloading github.com/ghodss/yaml v1.0.0
go: downloading golang.org/x/sys v0.0.0-20190922100055-0a153f010e69
go: downloading golang.org/x/time v0.0.0-20190308202827-9d24e82272b4
go: downloading github.com/docker/docker v0.7.3-0.20190327010347-be7ac8be2ae0
go: downloading github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7
go: downloading github.com/russross/blackfriday v1.5.2
go: downloading github.com/docker/spdystream v0.0.0-20160310174837-449fdfce4d96
go: downloading github.com/MakeNowJust/heredoc v0.0.0-20170808103936-bb23615498cd
go: downloading github.com/posener/script v1.0.4
go: downloading github.com/liggitt/tabwriter v0.0.0-20181228230101-89fcab3d43de
go: downloading github.com/peterbourgon/diskv v2.0.1+incompatible
go: downloading github.com/hashicorp/golang-lru v0.5.1
go: downloading github.com/google/go-cmp v0.3.0
go: downloading github.com/google/btree v1.0.0
go: downloading github.com/Azure/go-ansiterm v0.0.0-20170929234023-d6e3b3328b78
go: downloading github.com/sirupsen/logrus v1.4.2
go: downloading github.com/konsorten/go-windows-terminal-sequences v1.0.1
/sriov//bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/sriovibnetworks.sriovnetwork.openshift.io created
customresourcedefinition.apiextensions.k8s.io/sriovnetworknodepolicies.sriovnetwork.openshift.io created
customresourcedefinition.apiextensions.k8s.io/sriovnetworknodestates.sriovnetwork.openshift.io created
customresourcedefinition.apiextensions.k8s.io/sriovnetworks.sriovnetwork.openshift.io created
customresourcedefinition.apiextensions.k8s.io/sriovoperatorconfigs.sriovnetwork.openshift.io created
/bin/sh: 1: Syntax error: "(" unexpected
Makefile:169: recipe for target 'deploy-setup' failed
make: *** [deploy-setup] Error 2

Request: Option to use self signed webhooks when using "make deploy-setup-k8s"

We are using make deploy-setup-k8s
Until now we used sriov-operator 4.4 and created self-signed CaBundles,
and then patched the 3 webhooks to have the CaBundles:
validatingwebhookconfiguration sriov-operator-webhook-config
mutatingwebhookconfiguration sriov-operator-webhook-config
mutatingwebhookconfiguration network-resources-injector-config
All went fine.

When we tried to use sriov-operator version 4.8, we saw that the CaBundle is removed after a minute or two.
The reason is that it now has an owner which reconciles it:

func (r *SriovOperatorConfigReconciler) syncCAConfigMap(name types.NamespacedName) error {

Even when we tried to inject the CaBundle into the configmap in the code above,
the configmap was reconciled as well.

It would be great to have a method that allows us to use self-signed certs on k8s installations,
without the need to disable the webhook.

Thanks

See k8snetworkplumbingwg/sriov-network-operator#3

/cc @zshi-redhat

Device ID selector is not working correctly

Adding a deviceID selector with a VF device ID to the node policy will not filter out all the PFs. This is related to the hardcoded supported-device list and the filtering mechanism:

var SriovPfVfMap = map[string](string){
        "1583": "154c",
        "10fb": "10ed",
        "1015": "1016",
        "1017": "1018",
}
if s.NicSelector.DeviceID != "" {
        if (s.NumVfs == 0 && s.NicSelector.DeviceID != iface.DeviceID) || (s.NumVfs > 0 && s.NicSelector.DeviceID != SriovPfVfMap[iface.DeviceID]) {
                return false
        }
}

I tried the following node policy but got no changes:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-2
  namespace: sriov-network-operator
spec:
  resourceName: mlxnics
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 4
  mtu: 9000
  numVfs: 8
  nicSelector:
    vendor: "15b3"
    deviceID: "1018"
    rootDevices: ['0000:04:00.0', '0000:04:00.1']
  deviceType: netdevice
  isRdma: true
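To illustrate the mechanics of the quoted check, the sketch below reproduces it in isolation: a PF present in SriovPfVfMap matches its VF device ID when numVfs > 0, while a PF missing from the hardcoded map can never match any deviceID, because the lookup yields an empty string. The selected helper is illustrative, not operator code:

```go
package main

import "fmt"

// SriovPfVfMap reproduces the hardcoded PF -> VF device ID table quoted above.
var SriovPfVfMap = map[string]string{
	"1583": "154c",
	"10fb": "10ed",
	"1015": "1016",
	"1017": "1018",
}

// selected mirrors the quoted deviceID check: with numVfs > 0 the policy
// deviceID is compared against the VF ID looked up from the PF's device ID.
func selected(policyDeviceID string, numVfs int, ifaceDeviceID string) bool {
	if policyDeviceID != "" {
		if (numVfs == 0 && policyDeviceID != ifaceDeviceID) ||
			(numVfs > 0 && policyDeviceID != SriovPfVfMap[ifaceDeviceID]) {
			return false
		}
	}
	return true
}

func main() {
	// PF 1017 is in the map, so deviceID "1018" matches when numVfs > 0...
	fmt.Println(selected("1018", 8, "1017")) // prints true
	// ...but a PF absent from the map (e.g. "1013") can never match any
	// deviceID when numVfs > 0, because the lookup returns "".
	fmt.Println(selected("1016", 8, "1013")) // prints false
}
```

This shows why the behavior is tied to the hardcoded table rather than to the actual VF IDs reported by the device.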

Cannot config numVfs for mellanox NICs on RHCOS node

https://bugzilla.redhat.com/show_bug.cgi?id=1733897

Description of problem:
When a Sriov Network Node Policy is created with vendor 15b3, the VFs cannot be initialized.

Version-Release number of selected component (if applicable):

How reproducible:
always

Steps to Reproduce:

  1. setup the baremetal env
  2. installed the sriov operator
  3. Create the Sriov Network Node Policy on mellanox PF
  4. check the nodestatus
    oc get sriovnetworknodestates.sriovnetwork.openshift.io -o yaml
  5. check the sriov daemon logs

Actual results:
4. no 'Vfs' is created
5. oc logs sriov-network-config-daemon-pc8gl
daemon logs:
I0729 03:56:58.218766 15417 mellanox_plugin.go:59] mellanox-plugin OnNodeStateAdd()
I0729 03:56:58.218800 15417 mellanox_plugin.go:66] mellanox-Plugin OnNodeStateChange()
I0729 03:56:58.218813 15417 mellanox_plugin.go:267] mellanox-plugin isMlnxNicAndInNode(): device 0000:5e:00.0
I0729 03:56:58.218823 15417 mellanox_plugin.go:181] mellanox-plugin getMlnxNicFwData(): for device 0000:5e:00.0
I0729 03:56:58.218828 15417 mellanox_plugin.go:252] mellanox-plugin isSinglePortNic(): device 0000:5e:00.0
I0729 03:56:58.218831 15417 mellanox_plugin.go:157] mellanox-plugin mstconfigReadData(): try to read [LINK_TYPE] for device 0000:5e:00.0
I0729 03:56:58.218854 15417 mellanox_plugin.go:169] mellanox-plugin runCommand(): mstconfig [-d 0000:5e:00.0 q LINK_TYPE]
I0729 03:56:58.225057 15417 writer.go:107] setNodeStateStatus(): syncStatus: InProgress, lastSyncError:
E0729 03:56:58.235747 15417 mellanox_plugin.go:163] mellanox-plugin mstconfigReadData(): failed : exit status 3 : -E- Failed to open the device
I0729 03:56:58.235796 15417 mellanox_plugin.go:157] mellanox-plugin mstconfigReadData(): try to read [LINK_TYPE_P2] for device 0000:5e:00.0
I0729 03:56:58.235819 15417 mellanox_plugin.go:169] mellanox-plugin runCommand(): mstconfig [-d 0000:5e:00.0 q LINK_TYPE_P2]
E0729 03:56:58.244693 15417 mellanox_plugin.go:163] mellanox-plugin mstconfigReadData(): failed : exit status 3 : -E- Failed to open the device
E0729 03:56:58.244779 15417 daemon.go:147] nodeStateAddHandler(): plugin mellanox_plugin error: exit status 3
I0729 03:56:58.244822 15417 daemon.go:240] nodeStateChangeHandler(): Interface not changed
W0729 03:56:58.244845 15417 daemon.go:115] Got an error: exit status 3
E0729 03:56:58.244916 15417 start.go:105] failed to run daemon: exit status 3

Expected results:

VF for mellanox can be worked

No synchronization between nodes in case of reboot

In case the change of the NetworkNodePolicy triggers a reboot of the node (e.g. to add the needed kernel parameters, or to reset a Mellanox card), the reboot of the node is immediate.

This risks jeopardizing a zone if the same policy is applied at the same time to all the nodes.
The reboot should be gradual and involve a small group of nodes at a time, while keeping the zone functioning.

Operator doesn't work as expected when specifying "deviceID" selector in node policy

Hello,
Env
Centos 7.8
Vanilla k8 1.17
Admission controller turned off

I was deploying the operator and following the quick start guide.
When I created a node policy with a deviceID selector, it did not work as expected. When I set the deviceID to the target PF's deviceID, I could see the VFs being created; however, the sriov device plugin failed to find any devices for the resource pool.

When I remove the deviceID field, everything works as expected. Is this expected behavior?

Thank you,

Martin

Request: Allow developers to use unsupported SRIOV NICs without disabling the webhook

Since the following commit
c3132b2#diff-8fd5d83d7413d77ee2eab8e5c1a2ea1623d63e2aec69017a95811db9ad9edf52R70

Clusters that have unsupported SRIOV NICs are required to disable the webhook as documented in the Note here:
https://github.com/openshift/sriov-network-operator/blob/master/doc/quickstart.md

We would need please, if possible, a method that would allow us to run the sriov-network-operator with unsupported SRIOV NIC, but without disabling the webhook.
Since the webhook validates several parameters, we prefer to keep it.

Thanks

Network config daemon crashing with Mellanox ConnectX4LX

This seems related to #43 but the error message is slightly different.

I am trying to setup our kubevirt CI to use the sriov operator instead of setting up manually. The tests run using a kind cluster but I don't think this is relevant since the output of the mstconfig tool is the same on the bare metal host.

I am testing with a mellanox ConnectX4LX card:

mstconfig -d 0000:05:00.1 q 

Device #1:
----------

Device type:    ConnectX4LX     
Name:           N/A             
Description:    N/A             
Device:         0000:05:00.1 

which does not have the LINK_TYPE parameter, nor LINK_TYPE_P1:

[root@zeus08 kubevirt]# mstconfig -d 0000:05:00.1 q | grep LINK
         KEEP_ETH_LINK_UP_P1                 True(1)         
         KEEP_IB_LINK_UP_P1                  False(0)        
         KEEP_LINK_UP_ON_BOOT_P1             False(0)        
         KEEP_LINK_UP_ON_STANDBY_P1          False(0)        
         KEEP_ETH_LINK_UP_P2                 True(1)         
         KEEP_IB_LINK_UP_P2                  False(0)        
         KEEP_LINK_UP_ON_BOOT_P2             False(0)        
         KEEP_LINK_UP_ON_STANDBY_P2          False(0) 

This is causing the sriov-network-config-daemon to crash because isSinglePortNic returns an error.

config daemon logs:

I0731 16:50:14.687456    6475 writer.go:107] setNodeStateStatus(): syncStatus: InProgress, lastSyncError: 
E0731 16:50:14.744306    6475 mellanox_plugin.go:163] mellanox-plugin mstconfigReadData(): failed : exit status 3 : 
Device #1:
----------

Device type:    ConnectX4LX     
Name:           N/A             
Description:    N/A             
Device:         0000:05:00.0    

Configurations:                              Next Boot
-E- Unknown Parameter: LINK_TYPE
I0731 16:50:14.744344    6475 mellanox_plugin.go:157] mellanox-plugin mstconfigReadData(): try to read [LINK_TYPE_P2] for device 0000:05:00.0
I0731 16:50:14.744363    6475 mellanox_plugin.go:169] mellanox-plugin runCommand(): mstconfig [-d 0000:05:00.0 q LINK_TYPE_P2]
E0731 16:50:14.819727    6475 mellanox_plugin.go:163] mellanox-plugin mstconfigReadData(): failed : exit status 3 : 
Device #1:
----------

Device type:    ConnectX4LX     
Name:           N/A             
Description:    N/A             
Device:         0000:05:00.0    

Configurations:                              Next Boot
-E- The Device doesn't support LINK_TYPE_P2 parameter
E0731 16:50:14.819765    6475 daemon.go:147] nodeStateAddHandler(): plugin mellanox_plugin error: exit status 3
I0731 16:50:14.819810    6475 daemon.go:240] nodeStateChangeHandler(): Interface not changed

Issue with unsupported NICs such as the X557 which share the same VF device ID as supported cards

In #133, I added support for unsupported NIC models. It turns out that a few unsupported models such as the X557 share the same VF device ID, 154c.

The following configmap for the Intel X557 is never annotated and webhook validation fails:

apiVersion: v1
data:
X557: 8086 1589 154c
kind: ConfigMap
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","data":{"X557":"8086 1589 154c"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"unsupported-nic-ids","namespace":"sriov-network-operator"}}
creationTimestamp: "2020-12-17T08:00:20Z"
name: unsupported-nic-ids
namespace: sriov-network-operator
resourceVersion: "1949"
selfLink: /api/v1/namespaces/sriov-network-operator/configmaps/unsupported-nic-ids
uid: 79857698-2fce-4d05-bcad-2493bd66d80e

var NicIdMap = []string{

We write only the VF code into the configmap for the udev rule, and only annotate the configmap if the udev rule on disk changes. Given that VF device code 154c already exists, the two strings are identical and the operator does not annotate the configmap. That logic was added to avoid constantly rewriting the udev rule and the configmap.

On the other hand, the supported vendor model function checks the vendor ID and the PF ID:

if sriovnetworkv1.IsSupportedModel(iface.Vendor, iface.DeviceID) {

	// check the vendor/device ID to make sure only devices in supported list are allowed.
	if sriovnetworkv1.IsSupportedModel(iface.Vendor, iface.DeviceID) {
		return true
	}

The implicit assumption was that there is a unique VF device ID per vendor ID / PF device ID combination, and that's not the case.

As a solution, the easiest I see at the moment is to add a comment to the udev file listing all supported devices, so that the file content uniquely reflects the supported device set.

Another thing to ask ourselves: is it possible to have vendor = a, pf = b, vf = c and vendor = a, pf = b, vf = d? In that case, we might have an issue with the logic of IsSupportedModel at that location and might rather check for an IsSupportedVF or something like that.

  • Andreas

Operator is not passing the spec to the vendor plugins when the number of VFs is 0

When creating a policy:

[root@r-cloudx2-01 sriov-network-operator]# cat policy4-1.yaml 
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy4-1
  namespace: sriov-network-operator
spec:
  nodeSelector:
    kubernetes.io/hostname: r-cloudx2-04
  resourceName: mlxnics41
  priority: 11
  mtu: 9000
  numVfs: 21
  nicSelector:
    vendor: "15b3"
    pfNames: ['enp3s0f1']
    rootDevices: ['0000:03:00.1']
  deviceType: netdevice
  isRdma: true

The plugins run and do their work, also when updating numVfs to any positive value; but when changing it to 0, the plugins don't get the state.Spec.

I can see that the generic plugin reacts to the change to 0 and updates the number of VFs, but the vendor plugins are not getting this change.

I0518 16:23:56.028619   16047 intel_plugin.go:43] intel-plugin OnNodeStateChange()
I0518 16:23:56.028698   16047 mellanox_plugin.go:71] mellanox-Plugin OnNodeStateChange()
I0518 16:23:56.028747   16047 mellanox_plugin.go:80] mellanox-Plugin OnNodeStateChange() state &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:r-cloudx2-04 GenerateName: Namespace:sriov-network-operator SelfLink:/apis/sriovnetwork.openshift.io/v1/namespaces/sriov-network-operator/sriovnetworknodestates/r-cloudx2-04 UID:64b22c46-399a-438f-b25d-c31fee727987 ResourceVersion:1777319 Generation:48 CreationTimestamp:2020-05-18 07:48:12 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[{APIVersion:sriovnetwork.openshift.io/v1 Kind:SriovNetworkNodePolicy Name:default UID:9566b241-fd4a-4260-b147-a1cb42227606 Controller:0xc000ac94ba BlockOwnerDeletion:0xc000ac94b9}] Finalizers:[] ClusterName: ManagedFields:[{Manager:sriov-network-config-daemon Operation:Update APIVersion:sriovnetwork.openshift.io/v1 Time:2020-05-18 16:19:55 +0000 UTC FieldsType:FieldsV1 FieldsV1:&FieldsV1{Raw:*[123 34 102 58 115 116 97 116 117 115 34 58 123 34 46 34 58 123 125 44 34 102 58 105 110 116 101 114 102 97 99 101 115 34 58 123 125 44 34 102 58 115 121 110 99 83 116 97 116 117 115 34 58 123 125 125 125],}} {Manager:sriov-network-operator Operation:Update APIVersion:sriovnetwork.openshift.io/v1 Time:2020-05-18 16:23:55 +0000 UTC FieldsType:FieldsV1 FieldsV1:&FieldsV1{Raw:*[123 34 102 58 109 101 116 97 100 97 116 97 34 58 123 34 102 58 111 119 110 101 114 82 101 102 101 114 101 110 99 101 115 34 58 123 34 46 34 58 123 125 44 34 107 58 123 92 34 117 105 100 92 34 58 92 34 57 53 54 54 98 50 52 49 45 102 100 52 97 45 52 50 54 48 45 98 49 52 55 45 97 49 99 98 52 50 50 50 55 54 48 54 92 34 125 34 58 123 34 46 34 58 123 125 44 34 102 58 97 112 105 86 101 114 115 105 111 110 34 58 123 125 44 34 102 58 98 108 111 99 107 79 119 110 101 114 68 101 108 101 116 105 111 110 34 58 123 125 44 34 102 58 99 111 110 116 114 111 108 108 101 114 34 58 123 125 44 34 102 58 107 105 110 100 34 58 123 125 44 34 102 58 110 97 109 101 34 58 123 125 44 34 102 58 
117 105 100 34 58 123 125 125 125 125 44 34 102 58 115 112 101 99 34 58 123 34 46 34 58 123 125 44 34 102 58 100 112 67 111 110 102 105 103 86 101 114 115 105 111 110 34 58 123 125 125 125],}}]} Spec:{DpConfigVersion:1777315 Interfaces:[]} Status:{Interfaces:[{Name:eno1 Mac: Driver:igb PciAddress:0000:01:00.0 Vendor:8086 DeviceID:1521 Mtu:1500 NumVfs:0 LinkSpeed: TotalVfs:7 VFs:[]} {Name:eno2 Mac: Driver:igb PciAddress:0000:01:00.1 Vendor:8086 DeviceID:1521 Mtu:1500 NumVfs:0 LinkSpeed: TotalVfs:7 VFs:[]} {Name:enp3s0f0 Mac: Driver:mlx5_core PciAddress:0000:03:00.0 Vendor:15b3 DeviceID:1017 Mtu:1500 NumVfs:0 LinkSpeed: TotalVfs:21 VFs:[]} {Name:enp3s0f1 Mac: Driver:mlx5_core PciAddress:0000:03:00.1 Vendor:15b3 DeviceID:1017 Mtu:1500 NumVfs:21 LinkSpeed: TotalVfs:21 VFs:[{Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:02.7 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1450 VfID:0} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:03.0 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:1} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:04.1 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:10} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:04.2 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:11} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:04.3 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:12} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:04.4 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:13} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:04.5 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:14} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:04.6 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:15} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:04.7 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1450 VfID:16} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:05.0 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:17} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:05.1 Vendor:15b3 
DeviceID:1018 Vlan:0 Mtu:1500 VfID:18} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:05.2 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:19} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:03.1 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:2} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:05.3 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:20} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:03.2 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1450 VfID:3} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:03.3 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:4} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:03.4 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:5} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:03.5 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1450 VfID:6} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:03.6 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:7} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:03.7 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:8} {Name: Mac: Assigned: Driver:mlx5_core PciAddress:0000:03:04.0 Vendor:15b3 DeviceID:1018 Vlan:0 Mtu:1500 VfID:9}]} {Name:enp129s0 Mac: Driver:mlx4_core PciAddress:0000:81:00.0 Vendor:15b3 DeviceID:1007 Mtu:1500 NumVfs:0 LinkSpeed: TotalVfs:8 VFs:[]}] SyncStatus:Succeeded LastSyncError:}}
I0518 16:23:56.029255   16047 mellanox_plugin.go:178] mellanox-plugin needDrain false needReboot false
I0518 16:23:56.029285   16047 mellanox_plugin.go:179] mellanox-plugin attributes to change: map[]
I0518 16:23:56.029356   16047 generic_plugin.go:80] generic-plugin OnNodeStateChange()
I0518 16:23:56.029422   16047 daemon.go:329] nodeStateChangeHandler(): reqDrain true, reqReboot false
I0518 16:23:56.029446   16047 daemon.go:332] nodeStateChangeHandler(): drain node
I0518 16:23:56.029455   16047 daemon.go:507] drainNode(): Update prepared

Operator keep draining nodes

When deploying the operator and then changing the NumOfVfs, the operator keeps removing pods, which then get re-deployed again (DaemonSet pods). I noticed that VFs are being created and removed all the time.

Raising a meaningful error message if iommu is not enabled in bios

When configuring a device to use the vfio driver, if the IOMMU is not enabled in the BIOS of the machine, the resulting SriovNetworkNodeState will report a failure along the lines of "write error: No such device" when trying to bind the device to the vfio driver.

My suggestion here is to check whether the IOMMU is enabled (by checking the content of /sys/class/iommu) and to provide a more descriptive error message, so that the admin can understand how to fix the error.

[doc] quickstart.md needs to be updated

The quickstart guide should reflect the latest commit to the Makefile:

The section here is no longer required and will only output an error that the rule is missing (which it now is)

Install the Operator-SDK. The following commands will put operator-sdk to your $GOPATH/bin, please make sure that path is included in your $PATH.

cd $GOPATH/src/github.com/openshift/sriov-network-operator
make operator-sdk

This is the error:

[root@openshift-jumpserver-0 sriov-network-operator]# make operator-sdk
which: no controller-gen in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/go/bin:/root/golang/bin)
which: no kustomize in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/go/bin:/root/golang/bin)
make: *** No rule to make target 'operator-sdk'.  Stop.

Race condition in udev makes NM ignore rules in /host/etc/udev/rules.d/10-nm-unmanaged.rules

Add the "move" event to the list of ACTIONs; e.g., for Mellanox, change /etc/udev/rules.d/10-nm-unmanaged.rules from:

ACTION=="add|change", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1"

To:

ACTION=="add|change|move", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1"
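The one-line change above can be applied in place with a small script; a hedged sketch (the helper name is hypothetical, and the file path is a parameter so it can be tested against a copy):

```shell
#!/usr/bin/env bash
# Hedged sketch: patch an existing 10-nm-unmanaged.rules in place so the
# ACTION match also covers "move" events. On a node the real file lives
# at /host/etc/udev/rules.d/10-nm-unmanaged.rules.
add_move_action() {
  local rules_file="$1"
  sed -i 's/ACTION=="add|change"/ACTION=="add|change|move"/' "$rules_file"
}
```

After patching the file on the host, the rules still need to be re-read, e.g. with `udevadm control --reload-rules`, before the next device event is evaluated.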

SRIOV-CNI failed to load netconf

I tried to use SR-IOV on the OKD 4.5 GA version.
Installation and execution of the SR-IOV plugin succeeded, and I wrote the YAML for the SR-IOV network as below.
(I use a NIC with the SR-IOV feature available, but one not officially supported on OpenShift.)

[screenshot]

[screenshot]

After that, I attached the SR-IOV network to a sample Pod, but an error occurred as below:
[screenshot]

I do not understand why it tries to get VF information from another NIC, which is not the one I configured.

  • Configured NIC : 0000:04:00.0
  • Try to get from : 0000:05:00.1

Also, one more thing: I could not see "intel.com/intel-nics" when I execute the command below:
$ kubectl get no -o json | jq -r '[.items[] | {name:.metadata.name, allocable:.status.allocatable}]'

[screenshot]

Does anyone have experience with this?

Building plugins fails after adding new dependency packages

I added a new dependency, github.com/Mellanox/sriovnet, to support switchdev mode for Mellanox cards, but each time I try to build the plugins I get this error:

[root@r-cloudx3-07 sriov-network-operator]# make plugins
Using version from git...
Building github.com/openshift/sriov-network-operator/pkg/plugins (f1cd9be5-dirty)
Using version from git...
Building github.com/openshift/sriov-network-operator/pkg/plugins (f1cd9be5-dirty)
../../mellanox/sriovnet/sriovnet_switchdev.go:11:2: cannot find package "github.com/Mellanox/sriovnet/pkg/utils/filesystem" in any of:
        /usr/local/go/src/github.com/Mellanox/sriovnet/pkg/utils/filesystem (from $GOROOT)
        /root/go/src/github.com/Mellanox/sriovnet/pkg/utils/filesystem (from $GOPATH)
make: *** [_plugin-mellanox] Error 1

I updated the dependencies with dep ensure --add github.com/Mellanox/sriovnet
The same thing happens when building the config-daemon image.
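One thing worth checking: dep prunes sub-packages it considers unused, so the `pkg/utils/filesystem` sub-package from the error above may simply be missing from `vendor/`. A hedged sketch of the check (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
# Hedged sketch: verify that an imported sub-package really landed under
# vendor/ before running make; dep ensure sometimes prunes sub-packages.
check_vendored() {
  local pkg="$1"
  if [ -d "vendor/$pkg" ]; then
    echo "vendored: $pkg"
  else
    echo "missing:  $pkg (try 'dep ensure -v' again)"
  fi
}

check_vendored github.com/Mellanox/sriovnet/pkg/utils/filesystem
```

If the directory is missing, re-running `dep ensure -v` (or adjusting the Gopkg.toml constraint) should restore it.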

Operator deployment cannot create NodeState discovery configuration

Deploying the operator on a vanilla K8s cluster fails: the sriovnetworknodestates configurations for the worker nodes in the cluster are created but never populated.

The deployment is stuck in this state:

kubectl -n sriov-network-operator get all
NAME                                          READY   STATUS    RESTARTS   AGE
pod/sriov-network-operator-69465dd48f-zj4fv   1/1     Running   0          12m

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sriov-network-operator   1/1     1            1           12m

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/sriov-network-operator-69465dd48f   1         1         1       12m

Empty configuration objects were created:

kubectl  -n sriov-network-operator get sriovnetworknodestates.sriovnetwork.openshift.io 
NAME    AGE
node2   12m
node3   12m
node4   12m
node5   12m

Config for node2:

kubectl  -n sriov-network-operator get sriovnetworknodestates.sriovnetwork.openshift.io node2 -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2020-11-08T13:06:22Z"
  generation: 2
  managedFields:
  - apiVersion: sriovnetwork.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"5f271b74-adfa-4958-bf1f-49d1c8981b08"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        .: {}
        f:dpConfigVersion: {}
      f:status: {}
    manager: sriov-network-operator
    operation: Update
    time: "2020-11-08T13:16:22Z"
  name: node2
  namespace: sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: 5f271b74-adfa-4958-bf1f-49d1c8981b08
  resourceVersion: "1672926"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/sriov-network-operator/sriovnetworknodestates/node2
  uid: 88b19ee6-1b7d-4e32-88f9-901e04ee1f8f
spec:
  dpConfigVersion: "1670976"

The log file from the operator pod is attached:
sriov-operator.log

Thanks

Failed to apply default image tag "quay.io/openshift/origin-sriov-network-operator@": couldn't parse image reference "quay.io/openshift/origin-sriov-network-operator@": invalid reference format

If skopeo is not installed on the system, the build succeeds but the image name is incorrect and the image can't be pulled:

$ make deploy-setup
hack/env.sh: line 1: skopeo: command not found
hack/env.sh: line 3: skopeo: command not found
hack/env.sh: line 5: skopeo: command not found
hack/env.sh: line 7: skopeo: command not found
hack/env.sh: line 9: skopeo: command not found
hack/env.sh: line 11: skopeo: command not found
hack/env.sh: line 13: skopeo: command not found
~/go/src/github.com/openshift/sriov-network-operator/deploy ~/go/src/github.com/openshift/sriov-network-operator
namespace/openshift-sriov-network-operator created
...

$ oc get events
LAST SEEN TYPE REASON OBJECT MESSAGE
Normal Scheduled pod/sriov-network-operator-77cbd797b6-9hkhw Successfully assigned openshift-sriov-network-operator/sriov-network-operator-77cbd797b6-9hkhw to master-0.clus1.t5g.lab.eng.bos.redhat.com
4m12s Warning InspectFailed pod/sriov-network-operator-77cbd797b6-9hkhw Failed to apply default image tag "quay.io/openshift/origin-sriov-network-operator@": couldn't parse image reference "quay.io/openshift/origin-sriov-network-operator@": invalid reference format
7m31s Warning Failed pod/sriov-network-operator-77cbd797b6-9hkhw Error: InvalidImageName
9m25s Normal SuccessfulCreate replicaset/sriov-network-operator-77cbd797b6 Created pod: sriov-network-operator-77cbd797b6-9hkhw
9m25s Normal ScalingReplicaSet deployment/sriov-network-operator Scaled up replica set sriov-network-operator-77cbd797b6 to 1

hack/env.sh
...
export SRIOV_NETWORK_OPERATOR_IMAGE=${SRIOV_NETWORK_OPERATOR_IMAGE:-quay.io/openshift/origin-sriov-network-operator@${OPERATOR_IMAGE_DIGEST}}

I guess it is the same for the rest of the images in the env.sh file.
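A minimal sketch of a guard env.sh could use so skopeo is only invoked when the digest was not already supplied via the environment (the `resolve_digest` helper is hypothetical, not part of the repo):

```shell
#!/usr/bin/env bash
# Hedged sketch: fall back to skopeo only when the digest variable is
# unset, and fail loudly instead of emitting an empty digest that later
# produces "image@" (the invalid reference seen above).
resolve_digest() {
  local var="$1" image="$2"
  if [ -n "${!var}" ]; then
    return 0                       # digest already supplied by the caller
  fi
  if ! command -v skopeo >/dev/null 2>&1; then
    echo "error: skopeo not found and $var is not set" >&2
    return 1
  fi
  printf -v "$var" '%s' "$(skopeo inspect "docker://$image" | jq --raw-output '.Digest')"
}
```

With this shape, machines without skopeo can still build by exporting the digest variables up front, and machines with neither get an actionable error instead of an unpullable image name.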

Operator is not deployed on Kubernetes due to a failure in deploying webhooks
