Giter Club home page Giter Club logo

Comments (21)

bparees avatar bparees commented on July 4, 2024 1

I think approach (a) is reasonable in this case, but keep in mind that it means that the kf notebook controller now loses the ability to update the image to a new value (such as it might want to do during an upgrade). Unless your plan is that the resource would always be defined with the imagetrigger annotation, in which case upgrades would be handled by updating the imagestream itself and letting the trigger flow through to rollout the workload resource. But now you've tied your logic to openshift behavior (the imagetrigger annotation and imagestream resource, which aren't things that exist on vanilla k8s)

So you need to fully think through who should be controlling the value of the image field, and where this operator/controller will be used.

One way to approach it would be for the kf controller to manage the image field value if, and only if, there is no image.openshift.io/triggers annotation on the resource. The presence of that annotation would be an indication that the user has chosen an alternate way to manage the image field value, and the kf notebook controller should defer to that alternate source of image values. Unfortunately this means adding "openshift specific/aware" logic to the controller, which i understand might be undesirable, but the good news is the logic would be completely portable to vanilla k8s (since it's just looking at annotation keys)

And then if there is not such an annotation, the kf notebook controller can directly set the image field, allowing it to update the image field to new values during upgrades of the kf notebook controller.

This would allow a user to update the workload resource to be imagetriggered based on an imagestream of their choice, when they want that, but if they didn't want it (or weren't on openshift), they'd get whatever standard image the kf notebook controller specifies/defaults.

Again, ultimately this really comes down to a philosophy of how the image value should be managed, and by whom. Is it a default provided by the notebook, but intended to be modified/controlled by the user? Or is it something that's intended to be tightly specified/controlled by the notebook? That will guide you as to how best to implement the behavior and what controls to offer users.

(e.g. another option would be to introduce an entirely new annotation that tells the controller whether or not to manage the image field. Then users could disable controller management of the image field for both that case of "i want the imagetrigger to manage it" but also "i just want to manually set it to something else for some reason")

from openshift-apiserver.

bparees avatar bparees commented on July 4, 2024

yes, it should work. Can you provide the exact steps(oc commands, in order) you are taking along w/ your yaml files for the imagestream and statefulset?

Due to some limitations in the admission logic, i think it's critical that your imagestream exist and be successfully imported before you create the statefulset. At least depending what mechanism you're using (which will be clearer if we can see your statefulset yaml and imagestream).

i'm guessing you've already been here, but here are the docs on these features:
https://docs.openshift.com/container-platform/4.11/openshift_images/using-imagestreams-with-kube-resources.html

https://docs.openshift.com/container-platform/4.11/openshift_images/triggering-updates-on-imagestream-changes.html

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

Hi Ben, thank you for grabbing this topic, I really appreciate it.

Imagestream is applied by a kustomize-style yaml files deployment. Statefulset is created by Kubeflow Notebooks Controller as a result of a Notebook CR being present.

The imagestream exists and tag has been successfully imported at created: '2022-11-21T13:14:31Z'. We do not use an the internal openshift docker registry.

The statefulset references imagestream name and tag in its container image field. Its creation timestamp is way past the imagestream one creationTimestamp: '2022-12-07T16:54:40Z'

We are getting the following events on the pod, which is why I got to the bugzilla link, access.redhat link and so on in the first message of this issue here.

 [Warning] Failed to pull image "s2i-generic-data-science-notebook:v0.0.5": rpc error: code = Unknown desc = reading manifest v0.0.5 in docker.io/library/s2i-generic-data-science-notebook: errors: denied: requested access to the resource is denied unauthorized: authentication required
jupyter-nb-sthoms-0
Namespace odhsven
Dec 8, 2022, 5:21 AM
Generated from kubelet xxxx-ostlxxx.ocp4dev.xxxx.mydomain.ch
138 times in the last 14 hours
Failed to pull image "s2i-generic-data-science-notebook:v0.0.5": rpc error: code = Unknown desc = reading manifest v0.0.5 in docker.io/library/s2i-generic-data-science-notebook: errors: denied: requested access to the resource is denied unauthorized: authentication required

I am referencing the yamls below. The problem occurs on OCP 4.10.3 and also on the latest DEV versions 4.13.x. Imagestream and StatefulSet are in the same namespace. Imagestream, when referenced by DeploymentConfigs, Deployments, and Pods in the image-field, resolves to the src location of the imagestream-tag. In those types of objects, the resolve and image pull through is working fine, just not with StatefulSet.

There are two containers in the pod, disregard the oauth-one, please, we are trying to instead use the central openshift imagestream for that some day, too :-)

kind: ImageStream
apiVersion: image.openshift.io/v1
metadata:
  annotations:
    kfctl.kubeflow.io/kfdef-instance: opendatahub.odhsven
    opendatahub.io/notebook-image-desc: >-
      Jupyter notebook image with a set of data science libraries that advanced
      AI/ML notebooks will use as a base image to provide a standard for
      libraries avialable in all notebooks
    opendatahub.io/notebook-image-name: Standard Data Science
    opendatahub.io/notebook-image-url: 'https://github.com/thoth-station/s2i-generic-data-science-notebook'
    openshift.io/image.dockerRepositoryCheck: '2022-11-21T13:14:31Z'
  resourceVersion: '804058030'
  name: s2i-generic-data-science-notebook
  uid: 9be25400-03ae-4672-9e4d-06e6d31ebd07
  creationTimestamp: '2022-11-21T13:14:30Z'
  generation: 2
  namespace: odhsven
  labels:
    app.kubernetes.io/part-of: jupyterhub
    component.opendatahub.io/name: jupyterhub
    opendatahub.io/component: 'true'
    opendatahub.io/notebook-image: 'true'
spec:
  lookupPolicy:
    local: true
  tags:
    - name: v0.0.5
      annotations:
        opendatahub.io/notebook-python-dependencies: >-
          [{"name":"Boto3","version":"1.17.11"},{"name":"Kafka-Python","version":"2.0.2"},{"name":"Matplotlib","version":"3.4.2"},{"name":"Numpy","version":"1.21.0"},{"name":"Pandas","version":"1.2.5"},{"name":"Scipy","version":"1.7.0"}]
        opendatahub.io/notebook-software: '[{"name":"Python","version":"v3.8.6"}]'
        openshift.io/imported-from: quay.io/thoth-station/s2i-generic-data-science-notebook
      from:
        kind: DockerImage
        name: 'quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5'
      generation: 2
      importPolicy: {}
      referencePolicy:
        type: Source
status:
  dockerImageRepository: ''
  tags:
    - tag: v0.0.5
      items:
        - created: '2022-11-21T13:14:31Z'
          dockerImageReference: >-
            quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
          image: >-
            sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
          generation: 2
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: jupyter-nb-sthoms
  namespace: odhsven
  uid: f3692652-6718-40fc-909f-355eb22b5002
  resourceVersion: '829380751'
  generation: 1
  creationTimestamp: '2022-12-07T16:54:40Z'
  ownerReferences:
    - apiVersion: kubeflow.org/v1beta1
      kind: Notebook
      name: jupyter-nb-sthoms
      uid: 15d89396-c670-4c36-b0c9-5872fd040811
      controller: true
      blockOwnerDeletion: true
spec:
  replicas: 1
  selector:
    matchLabels:
      statefulset: jupyter-nb-sthoms
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: jupyter-nb-sthoms
        notebook-name: jupyter-nb-sthoms
        opendatahub.io/odh-managed: 'true'
        opendatahub.io/user: sthoms
        statefulset: jupyter-nb-sthoms
    spec:
      restartPolicy: Always
      serviceAccountName: jupyter-nb-sthoms
      schedulerName: default-scheduler
      enableServiceLinks: false
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: nvidia.com/gpu.present
                    operator: NotIn
                    values:
                      - 'true'
      terminationGracePeriodSeconds: 30
      securityContext: {}
      containers:
        - resources:
            limits:
              cpu: '2'
              memory: 8Gi
            requests:
              cpu: '1'
              memory: 8Gi
          readinessProbe:
            httpGet:
              path: /notebook/odhsven/jupyter-nb-sthoms/api
              port: notebook-port
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: jupyter-nb-sthoms
          livenessProbe:
            httpGet:
              path: /notebook/odhsven/jupyter-nb-sthoms/api
              port: notebook-port
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: NOTEBOOK_ARGS
              value: |-
                --ServerApp.port=8888
                                  --ServerApp.token=''
                                  --ServerApp.password=''
                                  --ServerApp.base_url=/notebook/odhsven/jupyter-nb-sthoms
                                  --ServerApp.quit_button=False
                                  --ServerApp.tornado_settings={"user":"sthoms","hub_host":"https://odh-dashboard-odhsven.apps.ocp4dev.xxxx.mydomain.ch","hub_prefix":"/notebookController/sthoms"}
            - name: JUPYTER_IMAGE
              value: 'quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5'
            - name: JUPYTER_NOTEBOOK_PORT
              value: '8888'
            - name: NB_PREFIX
              value: /notebook/odhsven/jupyter-nb-sthoms
          ports:
            - name: notebook-port
              containerPort: 8888
              protocol: TCP
          imagePullPolicy: Always
          volumeMounts:
            - name: jupyterhub-nb-sthoms-pvc
              mountPath: /opt/app-root/src
          terminationMessagePolicy: File
          image: 's2i-generic-data-science-notebook:v0.0.5'
          workingDir: /opt/app-root/src
        - resources:
            limits:
              cpu: 100m
              memory: 64Mi
            requests:
              cpu: 100m
              memory: 64Mi
          readinessProbe:
            httpGet:
              path: /oauth/healthz
              port: oauth-proxy
              scheme: HTTPS
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: oauth-proxy
          livenessProbe:
            httpGet:
              path: /oauth/healthz
              port: oauth-proxy
              scheme: HTTPS
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          ports:
            - name: oauth-proxy
              containerPort: 8443
              protocol: TCP
          imagePullPolicy: Always
          volumeMounts:
            - name: oauth-config
              mountPath: /etc/oauth/config
            - name: tls-certificates
              mountPath: /etc/tls/private
          terminationMessagePolicy: File
          image: 'registry.redhat.io/openshift4/ose-oauth-proxy:v4.10'
          args:
            - '--provider=openshift'
            - '--https-address=:8443'
            - '--http-address='
            - '--openshift-service-account=jupyter-nb-sthoms'
            - '--cookie-secret-file=/etc/oauth/config/cookie_secret'
            - '--cookie-expire=24h0m0s'
            - '--tls-cert=/etc/tls/private/tls.crt'
            - '--tls-key=/etc/tls/private/tls.key'
            - '--upstream=http://localhost:8888'
            - '--upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
            - >-
              --skip-auth-regex=^(?:/notebook/$(NAMESPACE)/jupyter-nb-sthoms)?/api$
            - '--email-domain=*'
            - '--skip-provider-button'
            - >-
              --openshift-sar={"verb":"get","resource":"notebooks","resourceAPIGroup":"kubeflow.org","resourceName":"jupyter-nb-sthoms","namespace":"$(NAMESPACE)"}
            - >-
              --logout-url=https://odh-dashboard-odhsven.apps.ocp4dev.xxxx.mydomain.ch/notebookController/sthoms/home
      serviceAccount: jupyter-nb-sthoms
      volumes:
        - name: jupyterhub-nb-sthoms-pvc
          persistentVolumeClaim:
            claimName: jupyterhub-nb-sthoms-pvc
        - name: oauth-config
          secret:
            secretName: jupyter-nb-sthoms-oauth-config
            defaultMode: 420
        - name: tls-certificates
          secret:
            secretName: jupyter-nb-sthoms-tls
            defaultMode: 420
      dnsPolicy: ClusterFirst
  serviceName: ''
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0
  revisionHistoryLimit: 10
status:
  observedGeneration: 1
  replicas: 1
  currentReplicas: 1
  updatedReplicas: 1
  currentRevision: jupyter-nb-sthoms-8578f7d994
  updateRevision: jupyter-nb-sthoms-8578f7d994
  collisionCount: 0
  availableReplicas: 0

It does not really matter, too, if it is that particular StatefulSet or one created ad-hoc by me, the behavior and bug is always the same. This is problematic because KF Notebook Controller instantiation of pods is based on StatefuSet, not Deployment or DeploymentConfig or Pod directly.

from openshift-apiserver.

bparees avatar bparees commented on July 4, 2024

I attempted to create this behavior and could not. Here's my version of the yamls (just stripped the status fields and the fields that are populated at creation time, and stripped the NS):

kind: ImageStream
apiVersion: image.openshift.io/v1
metadata:
  annotations:
    kfctl.kubeflow.io/kfdef-instance: opendatahub.odhsven
    opendatahub.io/notebook-image-desc: >-
      Jupyter notebook image with a set of data science libraries that advanced
      AI/ML notebooks will use as a base image to provide a standard for
      libraries avialable in all notebooks
    opendatahub.io/notebook-image-name: Standard Data Science
    opendatahub.io/notebook-image-url: 'https://github.com/thoth-station/s2i-generic-data-science-notebook'
  name: s2i-generic-data-science-notebook
  labels:
    app.kubernetes.io/part-of: jupyterhub
    component.opendatahub.io/name: jupyterhub
    opendatahub.io/component: 'true'
    opendatahub.io/notebook-image: 'true'
spec:
  lookupPolicy:
    local: true
  tags:
    - name: v0.0.5
      annotations:
        opendatahub.io/notebook-python-dependencies: >-
          [{"name":"Boto3","version":"1.17.11"},{"name":"Kafka-Python","version":"2.0.2"},{"name":"Matplotlib","version":"3.4.2"},{"name":"Numpy","version":"1.21.0"},{"name":"Pandas","version":"1.2.5"},{"name":"Scipy","version":"1.7.0"}]
        opendatahub.io/notebook-software: '[{"name":"Python","version":"v3.8.6"}]'
        openshift.io/imported-from: quay.io/thoth-station/s2i-generic-data-science-notebook
      from:
        kind: DockerImage
        name: 'quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5'
      generation: 2
      importPolicy: {}
      referencePolicy:
        type: Source
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: jupyter-nb-sthoms
spec:
  replicas: 1
  selector:
    matchLabels:
      statefulset: jupyter-nb-sthoms
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: jupyter-nb-sthoms
        notebook-name: jupyter-nb-sthoms
        opendatahub.io/odh-managed: 'true'
        opendatahub.io/user: sthoms
        statefulset: jupyter-nb-sthoms
    spec:
      restartPolicy: Always
      serviceAccountName: jupyter-nb-sthoms
      schedulerName: default-scheduler
      enableServiceLinks: false
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: nvidia.com/gpu.present
                    operator: NotIn
                    values:
                      - 'true'
      terminationGracePeriodSeconds: 30
      securityContext: {}
      containers:
        - resources:
            limits:
              cpu: '2'
              memory: 8Gi
            requests:
              cpu: '1'
              memory: 8Gi
          readinessProbe:
            httpGet:
              path: /notebook/odhsven/jupyter-nb-sthoms/api
              port: notebook-port
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: jupyter-nb-sthoms
          livenessProbe:
            httpGet:
              path: /notebook/odhsven/jupyter-nb-sthoms/api
              port: notebook-port
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: NOTEBOOK_ARGS
              value: |-
                --ServerApp.port=8888
                                  --ServerApp.token=''
                                  --ServerApp.password=''
                                  --ServerApp.base_url=/notebook/odhsven/jupyter-nb-sthoms
                                  --ServerApp.quit_button=False
                                  --ServerApp.tornado_settings={"user":"sthoms","hub_host":"https://odh-dashboard-odhsven.apps.ocp4dev.xxxx.mydomain.ch","hub_prefix":"/notebookController/sthoms"}
            - name: JUPYTER_IMAGE
              value: 'quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5'
            - name: JUPYTER_NOTEBOOK_PORT
              value: '8888'
            - name: NB_PREFIX
              value: /notebook/odhsven/jupyter-nb-sthoms
          ports:
            - name: notebook-port
              containerPort: 8888
              protocol: TCP
          imagePullPolicy: Always
          volumeMounts:
            - name: jupyterhub-nb-sthoms-pvc
              mountPath: /opt/app-root/src
          terminationMessagePolicy: File
          image: 's2i-generic-data-science-notebook:v0.0.5'
          workingDir: /opt/app-root/src
        - resources:
            limits:
              cpu: 100m
              memory: 64Mi
            requests:
              cpu: 100m
              memory: 64Mi
          readinessProbe:
            httpGet:
              path: /oauth/healthz
              port: oauth-proxy
              scheme: HTTPS
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: oauth-proxy
          livenessProbe:
            httpGet:
              path: /oauth/healthz
              port: oauth-proxy
              scheme: HTTPS
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          ports:
            - name: oauth-proxy
              containerPort: 8443
              protocol: TCP
          imagePullPolicy: Always
          volumeMounts:
            - name: oauth-config
              mountPath: /etc/oauth/config
            - name: tls-certificates
              mountPath: /etc/tls/private
          terminationMessagePolicy: File
          image: 'registry.redhat.io/openshift4/ose-oauth-proxy:v4.10'
          args:
            - '--provider=openshift'
            - '--https-address=:8443'
            - '--http-address='
            - '--openshift-service-account=jupyter-nb-sthoms'
            - '--cookie-secret-file=/etc/oauth/config/cookie_secret'
            - '--cookie-expire=24h0m0s'
            - '--tls-cert=/etc/tls/private/tls.crt'
            - '--tls-key=/etc/tls/private/tls.key'
            - '--upstream=http://localhost:8888'
            - '--upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
            - >-
              --skip-auth-regex=^(?:/notebook/$(NAMESPACE)/jupyter-nb-sthoms)?/api$
            - '--email-domain=*'
            - '--skip-provider-button'
            - >-
              --openshift-sar={"verb":"get","resource":"notebooks","resourceAPIGroup":"kubeflow.org","resourceName":"jupyter-nb-sthoms","namespace":"$(NAMESPACE)"}
            - >-
              --logout-url=https://odh-dashboard-odhsven.apps.ocp4dev.xxxx.mydomain.ch/notebookController/sthoms/home
      serviceAccount: jupyter-nb-sthoms
      volumes:
        - name: jupyterhub-nb-sthoms-pvc
          persistentVolumeClaim:
            claimName: jupyterhub-nb-sthoms-pvc
        - name: oauth-config
          secret:
            secretName: jupyter-nb-sthoms-oauth-config
            defaultMode: 420
        - name: tls-certificates
          secret:
            secretName: jupyter-nb-sthoms-tls
            defaultMode: 420
      dnsPolicy: ClusterFirst
  serviceName: ''
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0
  revisionHistoryLimit: 10
status:
  observedGeneration: 1
  replicas: 1
  currentReplicas: 1
  updatedReplicas: 1
  currentRevision: jupyter-nb-sthoms-8578f7d994
  updateRevision: jupyter-nb-sthoms-8578f7d994
  collisionCount: 0
  availableReplicas: 0

I ran:

# create imagestream
oc create -f is.yaml

then i ran

# check import was successful
oc get is -o yaml

to confirm the import, which i did by checking the status section of the output:

  status:
    dockerImageRepository: image-registry.openshift-image-registry.svc:5000/p1/s2i-generic-data-science-notebook
    tags:
    - items:
      - created: "2022-12-08T15:31:49Z"
        dockerImageReference: quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
        generation: 2
        image: sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
      tag: v0.0.5
# create stateful set
oc create -f ss.yaml

and then i confirmed that the image field is substituted as expected (this happens during creation admission):

oc get statefulset -o yaml | grep image:
          image: quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
          image: registry.redhat.io/openshift4/ose-oauth-proxy:v4.10

No pods ended up being created and i didn't really look into why, but fundamentally the statefulset's pod template is as expected: the image field points to the resolved imagestreamtag value and not 's2i-generic-data-science-notebook:v0.0.5' as was in the original statefulset yaml. (your pod's "failure to pull the image" error are directly caused by the fact that the image field in the statefulset was not substituted correctly when the statefulset was created)

The OCP version i used was one of our 4.13 CI builds: 4.13.0-0.ci-2022-12-07-190258 but this behavior hasn't been modified in several releases as far as i know (since the bug you pointed to in 4.6 i believe)

i'm not really sure what to tell you since i used pretty much your exact yaml files to attempt the recreate and it seems to be working as expected.

What OCP version are you on?

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

I am on OCP 4.10.3 on-prem without openshift registry and on 4.13 DEV on AWS with openshift registry.

So hold on, I am probably doing circular reasoning, should work more along the lines of Occam. You analysis helps for now. @bparees Do not go any further for now, thank you.

The differentiating feature I have is the KF Notebook and ODH Notebook Controller, that watches for Notebook CRs and then creates StatefulSets from them. I think I until now only manually modified the existing, KF or ODH Notebook controller-created StatefulSet / Notebook CR. Will try with a brand new one without any annotations from notebook controller.

What bugs me is that the substitution you mention works for everything except a StatefulSet, which is really pretty weird. The substitution in my case is image: quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58, in DeploymentConfigs, Pods, Deployments, just not in StatefulSets .... But, that makes sense, as I think I only edited StatefulSet objects created by the notebook controller with the image field pointing to the imagestream. I think the KF and ODH Notebook Controller is keeping the image field in the StatefulSet from being updated.

from openshift-apiserver.

bparees avatar bparees commented on July 4, 2024

The differentiating feature I have is the KF Notebook and ODH Notebook Controller, that watches for Notebook CRs and then creates StatefulSets from them. I think I until now only manually modified the existing, KF or ODH Notebook controller-created StatefulSet. Will try with a brand new one without any annotations from notebook controller.

ah hah. Yes, if you have some controller that is reconciling the statefulset, it is very likely that the substituted value is being stomped back by the controller, after the admission logic has mutated it.

What bugs me is that the substitution you mention works for everything except a StatefulSet, which is really pretty weird.

assuming those other resources don't have a controller reconciling them, that would explain the difference. The substitution is probably working, but then immediately getting stomped/reverted by the notebook controller.

Try creating a statefulset directly and i think you'll see it work correctly.

I think the KF and ODH Notebook Controller is keeping the image field in the StatefulSet from being updated.

more accurately, it is probably reverting the change after it is made, but yes, fundamentally it's a case of multiple writers competing over the resource.

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

Will look more into that tomorrow, but we are on to something, yes.

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

hah, indeed. Turns out with a fresh StatefulSet, without any notebook controller annotations, the image name and digest is resolved correctly from the imagestream tag and name.

So yes, the Notebook controller Kubeflow Notebook CR owns StatefulSets created from a Notebook def instance and thus interferes with or overrides imagestream resolve on admission of the StatefulSet image resolve to the cluster.

  ownerReferences:
    - apiVersion: kubeflow.org/v1beta1
      kind: Notebook
      controller: true
      blockOwnerDeletion: true

@kpouget @DaoDaoNoCode seems the notebook controller overrides the value of the container.image field as resolved before by the Openshift image policy admission plugin/controller from the imagestream tag and name. As far as notebook controller goes, reconciling logic and StatefulSet logic is here, it seems:

https://github.com/kubeflow/kubeflow/blob/master/components/notebook-controller/controllers/notebook_controller.go#L75

Apparently, there is a kubeflow reconcilehelper package and the reconciliation logic is here (update-logic)

https://github.com/kubeflow/kubeflow/blob/master/components/notebook-controller/controllers/notebook_controller.go#L162

The whole StatefulSet template spec, of which the image-field is a part of, is continually updated, thus overwriting any changes from the image policy admission plug-in:

https://github.com/kubeflow/kubeflow/blob/0327c5dcd943806f75b646012002646cff627d8b/components/common/reconcilehelper/util.go#L131

Anyway, the image policy admission plugin itself is working as expected on the Kubernetes / Openshift side for StatefulSets not owned by KF Notebook controller. Here is the result with a fresh StatefulSet, working as expected:

Statefulset before apply, non-notebook-controller-related, referencing imagestream name and tag in same namespace:

piVersion: apps/v1
kind: StatefulSet
metadata:
  name: testwithoutnotebookannotation
  namespace: odhsven
spec:
  serviceName: testwithoutnotebookannotation
  replicas: 1
  selector:
    matchLabels:
      app: testwithoutnotebookannotation
  template:
    metadata:
      labels:
        app: testwithoutnotebookannotation
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: jupyter
          image: 's2i-generic-data-science-notebook:v0.0.5'

Result working as expected, excerpt from the StatefulSet container spec after applied in cluster:

  labels:
        app: testwithoutnotebookannotation
    spec:
      containers:
        - name: jupyter
          image: >-
            quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58

With this fresh StatefulSet and the pod created from it, all is working fine afterwards.
Image-pull is working as expected, since nothing else was defined, image pull policy ifNotPresent (on node) kicks in, the image had been present on my worker node before, all good:

Container image "quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58" already present on machine

Looks like I need to escalate this error with the ODH Notebook Controller team. Leaving ticket open for now, but thank you for all the help. I also created a ticket at kubeflow explaining the context kubeflow/kubeflow#6829

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

@bparees We have had some issues on a hyperscaler-based Openshift 4.11 and 4.13.0-ec.0 cluster with the image policy admission plugin / metadata annotation doing nothing. Where in Openshift's config is it defined which plugins are active and which are not?
For Openshift 3.10, I found some hints, but cannot find documentation about that cluster-specific stuff for 4.10 or later anywhere.

from openshift-apiserver.

bparees avatar bparees commented on July 4, 2024

it's not configurable in 4.x.

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

weird, thank you. Could PodSecurityPolicy be an issue potentially interfering with image-resolve? Context: fresh StatefulSet on a hyperscaler cluster, annotation in metadata

annotations:
    [image.openshift.io/triggers](http://image.openshift.io/triggers): >-
      [{"from":{"kind":"ImageStreamTag","name":"s2i-generic-data-science-notebook:v0.0.5","namespace":"odhsven"},"fieldPath":"spec.template.spec.containers[?(@.name==\"nginx\")].image"}]

on 4.10.37 in-house, it worked. On the hyperscalers with 4.11 and 4.13, it did not work. The image-field did not get updated. We carefully made sure to reference the correct container name, imagestream name and tag, all that.

I wonder if something keeps changing in the backend. For example, on 4.10.3 in-house, I had to either have

image: " "

or no image-field specified initially in the containers section of the StatefulSet for lookup to work. Now, on 4.10.37 in-house, I can use whatever value I want for the image-field, and it gets updated accordingly.

from openshift-apiserver.

bparees avatar bparees commented on July 4, 2024

does the same annotation work on a deployment or any other resource type?

I don't think PSP could be interfering. The way this is supposed to work, iirc is:

  1. an imagechangetrigger controller watches all the workload(deployment, statefulset, daemonset, etc) resources that have the imagetrigger annotation. Basically anything that contains a PodTemplate
  2. it also watches all the imagestreams that are ref'd by the annotations on those workload resources
  3. when the imagestream changes (new image tag SHA), it updates the workload resource (using the field path indicated by the annotation)
  4. assuming the annotation is specified correctly, once it updates, the workload resource controller (deployment controller, statefulset controller, etc) will see that the pod template has changed (image field value change) and rollout new pods for the workload resource

I don't think the initial value of the image field should matter, though you should be aware that an initial rollout of pods may occur w/ the initial value of the image field, before the imagechangetrigger observes the resources and updates them (which will then cause a subsequent rollout)

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

Yes, the annotation works on Pods, for example, on my in-house OCP 4.10.3 and 4.10.37 clusters. Just not on the hyperscalers 4.11 and 4.13.

That one-space-string or no-field-necessary issue for image-value might have been something specific to 4.10.3.

Now, on 4.10.37, I can have any string I want as the image-value, the actual value from the imagestream gets inserted / replaced. Once I apply the definition, the image-field gets updated accordingly.

Also, the imagestream is initialized and applied, no changes to the images referenced by the imagestream. So, in that way, that substitution of the image-field does not even necessitate an image-change when applying to a new Pod, Deployment, StatefulSet.

from openshift-apiserver.

dmage avatar dmage commented on July 4, 2024

You may have hit a HyperShift bug - https://issues.redhat.com/browse/OCPBUGS-4025

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

hyper-cool you found that, Oleg, děkuji / merci vielmal

from openshift-apiserver.

bparees avatar bparees commented on July 4, 2024

@shalberd these are hypershift guest clusters where you're seeing the issue? When you said hyperscalers i just thought you meant a traditional ocp cluster running on AWS/GCP/Azure.

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

It is on aws *aws-2.ci.openshift.org .... For more @andrewballantyne from Open Data Hub (ODH) team can answer. I am not a member of Open Data Hub, but pushing the use of imagestream abstractions, non-dependence on openshift-internal docker registry, ImageContentSourcePolicy support from an enterprise perspective, very on-prem OCP oriented, but making sure to test extensively with the ODH team.

Why use hypershift with AWS and Azure anyways? I mean, does that not add an unneccessary layer of abstraction, with reconciliations interfering there, too. Just interested ...

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

@bparees Regarding collisions between the image policy admission plugin / imagechangetrigger writing to the image-field and the notebook controller overwriting it on reconcilation, what is the conceptually best way to avoid overwriting the image-field by KF notebook controller? Is it

a) excluding the image-field from reconciliation, for example, or is it instead to
b) let the controller handle getting the image-value from the imagestream itself, similar to this approach for the oauth-image in another controller?

To me, the approach that the Elastic operator took looks like duplicating logic, but I am very much new to the topic of operators and object reconciliations. Also, since imagestream is specific to openshift and not always is the image-field of a notebook CR content based an a metadata annotation, this being a speciality case on openshift, I'd go towards approach a.

from openshift-apiserver.

bparees avatar bparees commented on July 4, 2024

btw, serverside apply may offer another approach to this problem, by allowing controllers to assert ownership/priority of specific fields within a resource:

https://kubernetes.io/docs/reference/using-api/server-side-apply/

I suspect that would first require updates to the imagechangetrigger logic to respect those semantics (in addition to the kf notebook controller supporting them), but since serverside apply is k8s' long term strategic answer to these sorts of conflicts, i'd be remiss in not at least mentioning it.

from openshift-apiserver.

shalberd avatar shalberd commented on July 4, 2024

Thanks a lot, that helps me get started, I really appreciate your input. Here's to a good new year.

from openshift-apiserver.

openshift-bot avatar openshift-bot commented on July 4, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

from openshift-apiserver.

Related Issues (9)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.