seldon-operator's Introduction

seldon-operator's People

Contributors

axsaucedo, gsunner, philippslang, phsiao, ryandawsonuk, tmckayus, ukclivecox

seldon-operator's Issues

Operator projects using the APIs removed in k8s 1.22 require changes.

Problem Description

Kubernetes has been deprecating APIs that are removed and no longer available in 1.22. Operator projects using these API versions will not work on Kubernetes 1.22 or on any vendor distribution based on it, such as OpenShift 4.9+. The following APIs are the ones most likely to affect your project:

  • apiextensions.k8s.io/v1beta1 (used for CRDs; the v1 replacement has been available since v1.16)
  • rbac.authorization.k8s.io/v1beta1 (used for RBAC/rules; the v1 replacement has been available since v1.8)
  • admissionregistration.k8s.io/v1beta1 (used for Webhooks; the v1 replacement has been available since v1.16)
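
A quick way to see whether shipped manifests still reference the API versions listed above is to scan them for `apiVersion` lines. The following is a minimal sketch, not part of the original issue: the function name `find_removed_api_versions` and the sample manifest are hypothetical, and the scan is a plain text match rather than a full YAML parse.

```python
import re

# API versions removed in Kubernetes 1.22, mapped to their v1 replacements.
REMOVED_IN_1_22 = {
    "apiextensions.k8s.io/v1beta1": "apiextensions.k8s.io/v1",
    "rbac.authorization.k8s.io/v1beta1": "rbac.authorization.k8s.io/v1",
    "admissionregistration.k8s.io/v1beta1": "admissionregistration.k8s.io/v1",
}

# Matches "apiVersion: <value>", optionally as a list item and optionally quoted.
_API_VERSION = re.compile(r"""^\s*(?:-\s*)?apiVersion:\s*["']?([^"'\s#]+)""")


def find_removed_api_versions(manifest_text):
    """Return (line_number, old_version, replacement) for each removed apiVersion."""
    findings = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        match = _API_VERSION.match(line)
        if match and match.group(1) in REMOVED_IN_1_22:
            old = match.group(1)
            findings.append((number, old, REMOVED_IN_1_22[old]))
    return findings


if __name__ == "__main__":
    # Hypothetical manifest used only for illustration.
    sample = (
        "apiVersion: apiextensions.k8s.io/v1beta1\n"
        "kind: CustomResourceDefinition\n"
        "---\n"
        "apiVersion: apps/v1\n"
        "kind: Deployment\n"
    )
    for number, old, new in find_removed_api_versions(sample):
        print(f"line {number}: {old} -> {new}")
```

Like the findings in the issue, this only inspects manifests; it says nothing about API usage inside the operator's code.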

It therefore looks like this project distributes solutions in the repository and has no version compatible with k8s 1.22/OCP 4.9. (More info). The following are some findings from checking the published distributions:

NOTE: The above findings are only about the manifests shipped inside of the distribution. It is not checking the codebase.

How to solve

It would be very nice to see new distributions of this project that no longer use these APIs, so that they work on Kubernetes 1.22 and newer, published in the community-operators collection. OpenShift 4.9, for example, will no longer ship operators that still use v1beta1 extension APIs.

Because there are many ways to build Operators, it is hard to give direct guidance on updating your operator to support Kubernetes 1.22. OperatorSDK versions greater than 1.0.0 and Kubebuilder versions greater than 3.0.0 scaffold your project with the latest versions of these APIs (this covers only what the tools generate). See the guides for upgrading your projects with OperatorSDK Golang, Ansible, Helm or the Kubebuilder one. For APIs other than the ones mentioned above, you will have to check your code for usage of removed API versions and upgrade to newer APIs. The details depend on your codebase.

If this project only needs to migrate the API for CRDs and it was built with an OperatorSDK version lower than 1.0.0, you may be able to solve it with an OperatorSDK version >= v0.18.x and < 1.0.0:

$ operator-sdk generate crds --crd-version=v1
INFO[0000] Running CRD generator.
INFO[0000] CRD generation complete.

Alternatively, you can try to upgrade your manifests with controller-gen (version >= v0.4.1):

If this project does not use Webhooks:

$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role paths="./..."

If this project is using Webhooks:

  1. Add the markers sideEffects and admissionReviewVersions to your webhook (Example with sideEffects=None and admissionReviewVersions={v1,v1beta1}: memcached-operator/api/v1alpha1/memcached_webhook.go):

  2. Run the command:

$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role webhook paths="./..."

For further information and tips see the comment.

Seldon Operator and Istio with strict mTLS does not work

Hello.

I ran into issues when deploying models in the scenario described in the title. Is this supported? The workaround was to add a policy to the namespace that allows permissive mTLS. If it is not supported at this time, could it be added as an enhancement?
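
The issue does not say which policy object was used for the workaround. As a hedged sketch, assuming Istio 1.5 or later where namespace-level mTLS is configured with a `PeerAuthentication` resource (older releases used an `authentication.istio.io/v1alpha1` `Policy` instead), a permissive policy could be generated like this. The helper name and the namespace `seldon-models` are hypothetical.

```python
import json


def permissive_peer_authentication(namespace):
    """Build a namespace-wide PeerAuthentication that accepts both plaintext and mTLS."""
    return {
        # Assumption: Istio >= 1.5; the issue does not name the API it used.
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": "PERMISSIVE"}},
    }


if __name__ == "__main__":
    # "seldon-models" is a hypothetical namespace name.
    print(json.dumps(permissive_peer_authentication("seldon-models"), indent=2))
```

The JSON output can be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML.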

Thanks

Operator projects using the APIs removed in k8s 1.22 require changes.

Problem Description

Kubernetes has been deprecating APIs that are removed and no longer available in 1.22. Operator projects using these API versions will not work on Kubernetes 1.22 or on any vendor distribution based on it, such as OpenShift 4.9+. The following APIs are the ones most likely to affect your project:

  • apiextensions.k8s.io/v1beta1 (used for CRDs; the v1 replacement has been available since v1.16)
  • rbac.authorization.k8s.io/v1beta1 (used for RBAC/rules; the v1 replacement has been available since v1.8)
  • admissionregistration.k8s.io/v1beta1 (used for Webhooks; the v1 replacement has been available since v1.16)

It therefore looks like this project distributes solutions via Red Hat Connect under the package name seldon-operator-certified and has no version compatible with k8s 1.22/OCP 4.9. The following are some findings from checking the published distributions:

NOTE: The above findings are only about the manifests shipped inside of the distribution. It is not checking the codebase.

How to solve

It would be very nice to see new distributions of this project that no longer use these APIs, so that they work on Kubernetes 1.22 and newer, published in the Red Hat Connect collection. OpenShift 4.9, for example, will no longer ship operators that still use v1beta1 extension APIs.

Because there are many ways to build Operators, it is hard to give direct guidance on updating your operator to support Kubernetes 1.22. OperatorSDK versions greater than 1.0.0 and Kubebuilder versions greater than 3.0.0 scaffold your project with the latest versions of these APIs (this covers only what the tools generate). See the guides for upgrading your projects with OperatorSDK Golang, Ansible, Helm or the Kubebuilder one. For APIs other than the ones mentioned above, you will have to check your code for usage of removed API versions and upgrade to newer APIs. The details depend on your codebase.

If this project only needs to migrate the API for CRDs and it was built with an OperatorSDK version lower than 1.0.0, you may be able to solve it with an OperatorSDK version >= v0.18.x and < 1.0.0:

$ operator-sdk generate crds --crd-version=v1
INFO[0000] Running CRD generator.
INFO[0000] CRD generation complete.

Alternatively, you can try to upgrade your manifests with controller-gen (version >= v0.4.1):

If this project does not use Webhooks:

$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role paths="./..."

If this project is using Webhooks:

  1. Add the markers sideEffects and admissionReviewVersions to your webhook (Example with sideEffects=None and admissionReviewVersions={v1,v1beta1}: memcached-operator/api/v1alpha1/memcached_webhook.go):

  2. Run the command:

$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role webhook paths="./..."

For further info and tips see the blog.

Thank you for your attention.

TLS Errors in logs

2019/07/01 12:53:54 http: TLS handshake error from 10.5.139.146:57506: tls: first record does not look like a TLS handshake
2019/07/01 12:53:56 http: TLS handshake error from 10.5.167.79:46926: tls: first record does not look like a TLS handshake

Seldon controller does not create Ambassador REST config if a non PU container exists

If I add a container to my pod, the Seldon controller does not create the Ambassador REST config. I'm talking about the Mapping object whose name ends with -main and that holds the REST and GRPC mappings. My custom container listens on a TCP port (not HTTP or gRPC), and we use it for streaming logs.

This object:

This has to be a fresh new SeldonDeployment. If you just update an existing one by adding the second container, you will run into a weird issue where there are two deployments for that Mapping object and the traffic is split between them.

Here are the logs of the controller (truncated to remove the huge lines -- I can provide them if needed):

{"level":"info","ts":1569889753.1269317,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889753.1271641,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889753.1274755,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889753.1278055,"logger":"seldon-controller","msg":"Creating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.0835779,"logger":"seldon-controller","msg":"Creating Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb"}
{"level":"info","ts":1569889760.184504,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889760.214855,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889760.2150455,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889760.2153084,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889760.216094,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.239691,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889760.304992,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889760.3273509,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889760.3275404,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889760.327765,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889760.3282974,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.3377779,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889760.3378348,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889760.4085755,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889760.4087677,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889760.409045,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889760.4097595,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.4594553,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889760.459514,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889760.7984533,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889760.7986178,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889760.798862,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889760.7995014,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.874102,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889760.8741508,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889812.5688767,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889812.5691185,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889812.5694637,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889812.570288,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889817.0923462,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889817.092424,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
(base) ➜  ~ kubectl get all
NAME                       AGE
deploy/sort-main-00ed72d   5m

NAME                              AGE
rs/sort-main-00ed72d-6c9b4698d9   5m

NAME                                    READY     STATUS    RESTARTS   AGE
po/sort-main-00ed72d-6c9b4698d9-pqw5m   3/3       Running   0          5m

NAME                                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
svc/seldon-447e09a2c0b7151d4e8d26ba14f046eb   ClusterIP   100.77.53.191   <none>        9000/TCP   5m
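
The controller log above repeats the same reconcile sequence several times. A small tally by message can make such an excerpt easier to read. This is only an illustrative sketch: the function name `count_messages` is hypothetical, and it assumes each entry is a single-line JSON object with a `msg` field, as in the excerpt.

```python
import json
from collections import Counter


def count_messages(log_text):
    """Count controller log entries by their "msg" field, skipping non-JSON lines."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # e.g. shell prompts or kubectl table output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise malformed entry
        counts[entry.get("msg", "<no msg>")] += 1
    return counts


if __name__ == "__main__":
    # Two entries copied from the excerpt above.
    sample = "\n".join([
        '{"level":"info","ts":1569889753.1274755,"logger":"seldon-controller","msg":"Creating default Ambassador config"}',
        '{"level":"info","ts":1569889760.184504,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}',
    ])
    for message, count in count_messages(sample).most_common():
        print(f"{count:3d}  {message}")
```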

Here's the SeldonDeployment (truncated volumemounts and other irrelevant stuff)

---
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
    app.kubernetes.io/instance: dspe-seldon-ndmad2
    app.kubernetes.io/managed-by: argocd
    app.kubernetes.io/part-of: dspe-seldon
  name: sort-server
  namespace: dspe-seldon
spec:
  name: sort
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: classifier:2.0
          imagePullPolicy: Always
          name: sort-classifier
          resources:
            limits:
              cpu: 4
              memory: 8Gi
        - image: logger:0.14
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sleep
                - "15"
          name: scribe
          ports:
          - containerPort: 1463
            name: probe-port
            protocol: TCP
          - containerPort: 1473
            protocol: TCP
    graph:
      children: []
      endpoint:
        type: REST
      name: sort-classifier
      type: MODEL
    name: main
    replicas: 1
    svcOrchSpec:
      env:
      - name: JAVA_OPTS
        value: -server -Xms512m -Xmx512m -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions
          -XX:G1NewSizePercent=20 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:GCLogFileSize=10485760
          -XX:NumberOfGCLogFiles=1 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
          -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
          -XX:+UseGCLogFileRotation -Xloggc:/tmp/gc.log -XX:+UseTLAB -XX:+DisableExplicitGC
      - name: SELDON_LOG_LEVEL
        value: DEBUG
      resources:
        limits:
          cpu: 4
          memory: 4Gi
        requests:
          cpu: 500m
          memory: 512Mi

Let me know if I can help debug in any way!

Missing release version

This repository is built as part of the seldon-core build, but it lacks a version tag / release. It should be versioned in step with seldon-core.

Volumes causing deployment updates

Kubernetes adds defaultMode to volume definitions when it is not specified. This causes the controller to think that the created deployment differs from the one specified.
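
One way to avoid this kind of spurious diff is to apply the same defaulting to the desired spec before comparing it with the live object. The sketch below illustrates the idea only; it is not the controller's actual code (the controller is written in Go), and the helper names are hypothetical. It relies on Kubernetes defaulting `defaultMode` to 0644 (decimal 420) for secret, configMap, downwardAPI and projected volumes.

```python
import copy

# Kubernetes fills in defaultMode 0644 (decimal 420) for these volume sources
# when the field is omitted.
DEFAULT_MODE = 0o644
_SOURCES_WITH_DEFAULT_MODE = ("secret", "configMap", "downwardAPI", "projected")


def with_default_mode(volumes):
    """Return a copy of a pod's volume list with defaultMode filled in where omitted."""
    normalized = copy.deepcopy(volumes)
    for volume in normalized:
        for source in _SOURCES_WITH_DEFAULT_MODE:
            if source in volume:
                volume[source].setdefault("defaultMode", DEFAULT_MODE)
    return normalized


def volumes_equal(desired, live):
    """Compare desired and live volumes after applying the same defaulting to both."""
    return with_default_mode(desired) == with_default_mode(live)


if __name__ == "__main__":
    # Hypothetical volume used only for illustration.
    desired = [{"name": "cfg", "configMap": {"name": "model-config"}}]
    live = [{"name": "cfg", "configMap": {"name": "model-config", "defaultMode": 420}}]
    print(desired == live)               # naive comparison sees a difference
    print(volumes_equal(desired, live))  # normalized comparison does not
```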

prepackaged servers: Tensorflow serving: ability to override only part of containers

For the TensorFlow Serving implementation, a SeldonDeployment cannot override only part of componentSpecs.spec.containers for the tfserving container. Anyone who overrides it has to override the whole container spec.

It would be a nice improvement to be able to redefine only part of it, as is already done for the tensorflow proxy container.
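
The requested behaviour amounts to overlaying the user's partial container spec on the operator's defaults. A minimal sketch of such a merge follows; it is not the operator's implementation, and the function name, the default image and the resource values are all hypothetical.

```python
def merge_container(default, override):
    """Overlay user-specified container fields on the defaults, recursing into mappings."""
    merged = dict(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_container(merged[key], value)
        else:
            # Scalars and lists from the override replace the default wholesale.
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical defaults for the tfserving container.
    default = {
        "name": "tfserving",
        "image": "example/tfserving:1",
        "resources": {"requests": {"cpu": "1"}},
    }
    # The user only wants to set a memory limit.
    override = {"name": "tfserving", "resources": {"limits": {"memory": "8Gi"}}}
    print(merge_container(default, override))
```

Lists such as `ports` or `env` are replaced rather than merged here; merging them by name would be a further design decision.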

Would you like a PR for this?

Timeout annotations are not respected when calling REST endpoint within Istio env

I have a Kubeflow 0.7.1 cluster set up using https://www.kubeflow.org/docs/started/k8s/kfctl-existing-arrikto/ and applied the seldon.io/rest-read-timeout, seldon.io/rest-connection-timeout and seldon.io/grpc-read-timeout annotations to set the timeout to 30 sec.

It works perfectly fine when I call 'predict' from outside the cluster. However, when I call it from within the cluster (e.g. from a Jupyter notebook) it fails. The successful external call is shown first, then the failing in-cluster call (HTTP status and time highlighted):

[2020-02-03T23:54:14.447Z] "POST /seldon/aneverov/server-78190d6619e14653926768f60a016848/api/v0.1/predictions HTTP/1.1" 200 - "-" 168 381 30028 30026 "10.233.74.1" "python-requests/2.22.0" "84b462d5-f2d0-9481-9eb5-26e822375958" "10.50.8.102" "127.0.0.1:8000" inbound|8000|http|seldon-b3bd70ca9777516558eba158a9f106f0.aneverov.svc.cluster.local - 10.233.69.224:8000 10.233.74.1:0 -

vs

[2020-02-03T23:49:52.035Z] "POST /seldon/aneverov/server-78190d6619e14653926768f60a016848/api/v0.1/predictions HTTP/1.1" 504 UT "-" 168 24 15001 - "-" "python-requests/2.22.0" "18db807f-cf01-9d3a-9c55-912c58382796" "10.50.8.102" "10.50.8.102:80" PassthroughCluster - 10.50.8.102:80 10.233.73.217:38208 -

The difference appears to be that the in-cluster request takes a different route (e.g. PassthroughCluster).

There are some mentions of the "magic" 15 sec timeout (istio/istio#16915 (comment), istio/istio#1888), but I haven't found a working solution yet.
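
To compare the two access log entries above field by field, they can be parsed with a small regular expression. This is a sketch under an assumption: the field order (status, response flags, a quoted field, bytes received, bytes sent, duration in ms, upstream service time) is inferred from Istio's default Envoy access log format and matches the two entries shown, but it may differ in other configurations. The function name is hypothetical.

```python
import re

# Assumed layout, inferred from the two entries in this issue:
# [start] "METHOD PATH PROTOCOL" status flags "..." bytes_in bytes_out duration_ms upstream_ms ...
_ACCESS_LOG = re.compile(
    r'^\[(?P<start>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<flags>\S+) "[^"]*" '
    r'(?P<bytes_in>\d+) (?P<bytes_out>\d+) (?P<duration_ms>\d+) (?P<upstream_ms>\S+)'
)


def parse_access_log(line):
    """Return the leading fields of an Envoy access log entry, or None if it does not match."""
    match = _ACCESS_LOG.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("status", "bytes_in", "bytes_out", "duration_ms"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    # Leading part of the failing in-cluster entry from this issue.
    failing = (
        '[2020-02-03T23:49:52.035Z] "POST /seldon/aneverov/'
        'server-78190d6619e14653926768f60a016848/api/v0.1/predictions HTTP/1.1" '
        '504 UT "-" 168 24 15001 - "-" "python-requests/2.22.0"'
    )
    entry = parse_access_log(failing)
    print(entry["status"], entry["flags"], entry["duration_ms"])
```

In Envoy, the `UT` response flag denotes an upstream request timeout, which is consistent with the request being cut off at roughly 15 seconds instead of the configured 30.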
