DEPRECATED: Seldon Core Go Controller
The Seldon Operator has now moved to the main seldon-core repo: https://github.com/SeldonIO/seldon-core/tree/master/operator
Seldon Core Operator for Kubernetes
License: Apache License 2.0
Kubernetes has deprecated several APIs, which are removed and no longer available in 1.22. Operator projects using these API versions will not work on Kubernetes 1.22 or on any vendor distribution based on that version, such as OpenShift 4.9+. The following APIs are the ones most likely to affect your project:
It looks like this project distributes solutions in the repository and does not contain any version compatible with k8s 1.22/OCP 4.9. (More info). Here are some findings from checking the published distributions:
NOTE: The above findings cover only the manifests shipped inside the distribution. The codebase was not checked.
It would be very nice to see new distributions of this project that no longer use these APIs, so that they work on Kubernetes 1.22 and newer and can be published in the community-operators collection. OpenShift 4.9, for example, will no longer ship operators that still use v1beta1 extension APIs.
Due to the number of options available to build Operators, it is hard to provide direct guidance on updating your operator to support Kubernetes 1.22. Recent versions of the OperatorSDK greater than 1.0.0 and Kubebuilder greater than 3.0.0 scaffold your project with the latest versions of these APIs (all that is generated by tools only). See the guides to upgrade your projects with OperatorSDK Golang, Ansible, Helm or the Kubebuilder one. For APIs other than the ones mentioned above, you will have to check your code for usage of removed API versions and upgrade to newer APIs. The details of this depend on your codebase.
If this project only needs to migrate the API for CRDs and it was built with an OperatorSDK version lower than 1.0.0, you may be able to solve it with an OperatorSDK version >= v0.18.x and < 1.0.0:
$ operator-sdk generate crds --crd-version=v1
INFO[0000] Running CRD generator.
INFO[0000] CRD generation complete.
Alternatively, you can try to upgrade your manifests with controller-gen (version >= v0.4.1):
$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role paths="./..."
Add the markers sideEffects and admissionReviewVersions to your webhook (Example with sideEffects=None and admissionReviewVersions={v1,v1beta1}: memcached-operator/api/v1alpha1/memcached_webhook.go):
Run the command:
$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role webhook paths="./..."
For further information and tips see the comment.
Hello.
I came across issues when deploying models in the scenario described in the title. Is this supported? The work-around was to add a policy to the namespace to allow permissive mTLS. Can this be an enhancement if not supported at this time?
Thanks
Kubernetes has deprecated several APIs, which are removed and no longer available in 1.22. Operator projects using these API versions will not work on Kubernetes 1.22 or on any vendor distribution based on that version, such as OpenShift 4.9+. The following APIs are the ones most likely to affect your project:
It looks like this project distributes solutions via Red Hat Connect under the package name seldon-operator-certified and does not contain any version compatible with k8s 1.22/OCP 4.9. Here are some findings from checking the published distributions:
NOTE: The above findings cover only the manifests shipped inside the distribution. The codebase was not checked.
It would be very nice to see new distributions of this project that no longer use these APIs, so that they work on Kubernetes 1.22 and newer and can be published in the Red Hat Connect collection. OpenShift 4.9, for example, will no longer ship operators that still use v1beta1 extension APIs.
Due to the number of options available to build Operators, it is hard to provide direct guidance on updating your operator to support Kubernetes 1.22. Recent versions of the OperatorSDK greater than 1.0.0 and Kubebuilder greater than 3.0.0 scaffold your project with the latest versions of these APIs (all that is generated by tools only). See the guides to upgrade your projects with OperatorSDK Golang, Ansible, Helm or the Kubebuilder one. For APIs other than the ones mentioned above, you will have to check your code for usage of removed API versions and upgrade to newer APIs. The details of this depend on your codebase.
If this project only needs to migrate the API for CRDs and it was built with an OperatorSDK version lower than 1.0.0, you may be able to solve it with an OperatorSDK version >= v0.18.x and < 1.0.0:
$ operator-sdk generate crds --crd-version=v1
INFO[0000] Running CRD generator.
INFO[0000] CRD generation complete.
Alternatively, you can try to upgrade your manifests with controller-gen (version >= v0.4.1):
$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role paths="./..."
Add the markers sideEffects and admissionReviewVersions to your webhook (Example with sideEffects=None and admissionReviewVersions={v1,v1beta1}: memcached-operator/api/v1alpha1/memcached_webhook.go):
Run the command:
$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role webhook paths="./..."
For further info and tips see the blog.
Thank you for your attention.
The tfserving-proxy image was recently updated to support the JSON API endpoint that the standard TFX server API supports. The objective of this task is to make sure that this image works correctly with the TFProxy. Once this issue is closed, we should be able to close SeldonIO/seldon-core#693.
2019/07/01 12:53:54 http: TLS handshake error from 10.5.139.146:57506: tls: first record does not look like a TLS handshake
2019/07/01 12:53:56 http: TLS handshake error from 10.5.167.79:46926: tls: first record does not look like a TLS handshake
If I add an extra container to my pod, the Seldon controller does not create the Ambassador REST config. I'm talking about the Mapping object whose name ends with -main and that holds the REST and gRPC mappings. My custom container listens on a TCP port (not HTTP or gRPC), and we use it for streaming logs.
This object:
This has to be a brand-new SeldonDeployment. If you update an existing one and add the second container, you run into a weird issue where there are two deployments for that Mapping object and the traffic is split between them.
Here are the logs of the controller (truncated to remove the huge lines -- I can provide them if needed):
{"level":"info","ts":1569889753.1269317,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889753.1271641,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889753.1274755,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889753.1278055,"logger":"seldon-controller","msg":"Creating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.0835779,"logger":"seldon-controller","msg":"Creating Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb"}
{"level":"info","ts":1569889760.184504,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889760.214855,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889760.2150455,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889760.2153084,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889760.216094,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.239691,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889760.304992,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889760.3273509,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889760.3275404,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889760.327765,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889760.3282974,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.3377779,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889760.3378348,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889760.4085755,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889760.4087677,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889760.409045,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889760.4097595,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.4594553,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889760.459514,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889760.7984533,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889760.7986178,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889760.798862,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889760.7995014,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889760.874102,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889760.8741508,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
{"level":"info","ts":1569889812.5688767,"logger":"seldon-controller","msg":"pSvcName","val":"sort-server-sort-main"}
{"level":"info","ts":1569889812.5691185,"logger":"seldon-controller","msg":"Not creating container service for scribe"}
{"level":"info","ts":1569889812.5694637,"logger":"seldon-controller","msg":"Creating default Ambassador config"}
{"level":"info","ts":1569889812.570288,"logger":"seldon-controller","msg":"Updating Deployment","namespace":"dspe-seldon","name":"sort-main-00ed72d"}
{"level":"info","ts":1569889817.0923462,"logger":"seldon-controller","msg":"Found identical Service","namespace":"dspe-seldon","name":"seldon-447e09a2c0b7151d4e8d26ba14f046eb","status":{"loadBalancer":{}}}
{"level":"info","ts":1569889817.092424,"logger":"seldon-controller","msg":"Skipping Ambassador Svc"}
(base) ➜ ~ kubectl get all
NAME AGE
deploy/sort-main-00ed72d 5m
NAME AGE
rs/sort-main-00ed72d-6c9b4698d9 5m
NAME READY STATUS RESTARTS AGE
po/sort-main-00ed72d-6c9b4698d9-pqw5m 3/3 Running 0 5m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/seldon-447e09a2c0b7151d4e8d26ba14f046eb ClusterIP 100.77.53.191 <none> 9000/TCP 5m
Here's the SeldonDeployment (truncated volumemounts and other irrelevant stuff)
---
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
    app.kubernetes.io/instance: dspe-seldon-ndmad2
    app.kubernetes.io/managed-by: argocd
    app.kubernetes.io/part-of: dspe-seldon
  name: sort-server
  namespace: dspe-seldon
spec:
  name: sort
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: classifier:2.0
          imagePullPolicy: Always
          name: sort-classifier
          resources:
            limits:
              cpu: 4
              memory: 8Gi
        - image: logger:0.14
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sleep
                - "15"
          name: scribe
          ports:
          - containerPort: 1463
            name: probe-port
            protocol: TCP
          - containerPort: 1473
            protocol: TCP
    graph:
      children: []
      endpoint:
        type: REST
      name: sort-classifier
      type: MODEL
    name: main
    replicas: 1
    svcOrchSpec:
      env:
      - name: JAVA_OPTS
        value: -server -Xms512m -Xmx512m -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions
          -XX:G1NewSizePercent=20 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:GCLogFileSize=10485760
          -XX:NumberOfGCLogFiles=1 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
          -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
          -XX:+UseGCLogFileRotation -Xloggc:/tmp/gc.log -XX:+UseTLAB -XX:+DisableExplicitGC
      - name: SELDON_LOG_LEVEL
        value: DEBUG
      resources:
        limits:
          cpu: 4
          memory: 4Gi
        requests:
          cpu: 500m
          memory: 512Mi
Let me know if I can help debug in any way!
This repository is built as part of the seldon-core build, but it lacks a version tag / release. It should be versioned as seldon-core is.
The Go operator needs client functions that provide Kubernetes-like APIs for the SeldonDeployment CRDs.
The Kubernetes code-generator project can be used to create these APIs from the CRD definition file types.go in the current code base.
We have a SeldonDeployment. After we deleted the hpaSpec from the YAML file, the Seldon operator did not remove the HPA object. We are using seldon-operator v0.4.1. Is this a bug?
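The behaviour being asked for can be sketched as a reconcile step that treats a missing hpaSpec as a request to delete the HPA the operator previously created. This is only an illustration, not the operator's actual code: the HPASpec and cluster types and the reconcileHPA function are hypothetical, and an in-memory map stands in for the Kubernetes API.

```go
package main

import "fmt"

// HPASpec is a hypothetical stand-in for the hpaSpec section of a predictor.
type HPASpec struct {
	MinReplicas int32
	MaxReplicas int32
}

// cluster is a hypothetical in-memory stand-in for the HPA objects owned by
// the operator, keyed by deployment name.
type cluster struct {
	hpas map[string]HPASpec
}

// reconcileHPA makes the owned HPA match the desired spec. A nil spec means
// hpaSpec was removed from the SeldonDeployment, so an existing HPA is deleted.
func (c *cluster) reconcileHPA(name string, desired *HPASpec) string {
	_, exists := c.hpas[name]
	switch {
	case desired == nil && exists:
		delete(c.hpas, name)
		return "deleted"
	case desired == nil:
		return "none"
	case !exists:
		c.hpas[name] = *desired
		return "created"
	default:
		c.hpas[name] = *desired
		return "updated"
	}
}

func main() {
	c := &cluster{hpas: map[string]HPASpec{}}
	// First reconcile: hpaSpec is present, so the HPA is created.
	fmt.Println(c.reconcileHPA("sort-main", &HPASpec{MinReplicas: 1, MaxReplicas: 3}))
	// Second reconcile: hpaSpec was removed, so the HPA is cleaned up.
	fmt.Println(c.reconcileHPA("sort-main", nil))
}
```

Run as written, this prints "created" and then "deleted". The point of the sketch is that the delete branch must exist: a reconcile that only creates or updates will leave the HPA behind.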
Kubernetes adds defaultMode to volume definitions if it is not specified. This causes the controller to think that the Deployment it created differs from the one specified.
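One possible way around this is to apply the same default to both sides before comparing them. The sketch below assumes the 0644 default that Kubernetes uses for volume types such as secret and configMap volumes. The Volume type and both functions are hypothetical, trimmed-down stand-ins, not the controller's real types.

```go
package main

import "fmt"

// Volume is a hypothetical stand-in for a pod volume that carries only the
// field relevant to this comparison.
type Volume struct {
	Name        string
	DefaultMode *int32
}

// kubeDefaultMode is the mode Kubernetes fills in for secret and configMap
// volumes when defaultMode is omitted (0644 octal, 420 decimal).
const kubeDefaultMode int32 = 0644

// withDefaults returns a copy of vols where every unset DefaultMode is
// replaced by the Kubernetes default.
func withDefaults(vols []Volume) []Volume {
	out := make([]Volume, len(vols))
	for i, v := range vols {
		if v.DefaultMode == nil {
			m := kubeDefaultMode
			v.DefaultMode = &m
		}
		out[i] = v
	}
	return out
}

// volumesEqual compares desired and live volumes after defaulting both, so a
// server-side default alone is not reported as a difference.
func volumesEqual(desired, live []Volume) bool {
	d, l := withDefaults(desired), withDefaults(live)
	if len(d) != len(l) {
		return false
	}
	for i := range d {
		if d[i].Name != l[i].Name || *d[i].DefaultMode != *l[i].DefaultMode {
			return false
		}
	}
	return true
}

func main() {
	mode := int32(0644)
	desired := []Volume{{Name: "model-store"}}                 // as written by the user
	live := []Volume{{Name: "model-store", DefaultMode: &mode}} // as returned by the API server
	fmt.Println(volumesEqual(desired, live))
}
```

The volume name model-store is made up for the example. An explicitly different mode on either side still counts as a real difference.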
Based on running the Seldon operator in OpenShift in operator-framework/community-operators#447, it looks like the operator should have a kubebuilder marker for permissions on seldondeployments/finalizers.
The CSV file for the community operator was updated and contains '*' for verbs, but get, update, and patch appear to be all that is needed.
For the TensorFlow Serving implementation, it is not possible in a SeldonDeployment to override only part of componentSpecs.spec.containers for the tfserving container. Anyone who overrides it has to override the whole container spec.
It would be a nice improvement to be able to redefine only part of it, as is already done for the TensorFlow proxy container.
Would you like a PR for this?
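A partial override of this kind could be sketched as a field-by-field merge, where the user-supplied container replaces only the fields it actually sets. The Container type, the mergeContainer function, and the image names below are hypothetical and heavily simplified compared with a real container spec.

```go
package main

import "fmt"

// Container is a hypothetical, trimmed-down container spec.
type Container struct {
	Name  string
	Image string
	Args  []string
}

// mergeContainer returns base with every non-empty field of override applied.
// Fields left empty in override keep the operator's default from base.
func mergeContainer(base, override Container) Container {
	merged := base
	// Copy the default args so the result never aliases base.
	merged.Args = append([]string(nil), base.Args...)
	if override.Image != "" {
		merged.Image = override.Image
	}
	if len(override.Args) > 0 {
		merged.Args = append([]string(nil), override.Args...)
	}
	return merged
}

func main() {
	// Operator default for the tfserving container (values are made up).
	base := Container{
		Name:  "tfserving",
		Image: "example/tfserving:default",
		Args:  []string{"--port=2000"},
	}
	// The user only wants a different image.
	override := Container{Name: "tfserving", Image: "example/tfserving:custom"}

	merged := mergeContainer(base, override)
	fmt.Println(merged.Image, merged.Args)
}
```

Here the image comes from the override while the args stay at the default. One limitation of the empty-means-unset approach is that a user cannot explicitly clear a field.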
Updates to Makefile and folder structure needed to work with Kustomize v2
When a SeldonDeployment is created with Istio sidecar injection enabled and mTLS in strict mode, the initContainer tfserving-model-initializer is not able to pull the model from S3.
There is a known issue on the Istio side where an initContainer is not able to make outside calls:
istio/istio#11130
I have a Kubeflow 0.7.1 cluster set up using https://www.kubeflow.org/docs/started/k8s/kfctl-existing-arrikto/ and applied the seldon.io/rest-read-timeout, seldon.io/rest-connection-timeout, and seldon.io/grpc-read-timeout annotations to set the timeout to 30 sec.
It works perfectly fine when I call 'predict' from outside the cluster. However, when I call it from within the cluster (e.g. from a Jupyter notebook), it fails (HTTP status and time highlighted):
[2020-02-03T23:54:14.447Z] "POST /seldon/aneverov/server-78190d6619e14653926768f60a016848/api/v0.1/predictions HTTP/1.1" 200 - "-" 168 381 30028 30026 "10.233.74.1" "python-requests/2.22.0" "84b462d5-f2d0-9481-9eb5-26e822375958" "10.50.8.102" "127.0.0.1:8000" inbound|8000|http|seldon-b3bd70ca9777516558eba158a9f106f0.aneverov.svc.cluster.local - 10.233.69.224:8000 10.233.74.1:0 -
vs
[2020-02-03T23:49:52.035Z] "POST /seldon/aneverov/server-78190d6619e14653926768f60a016848/api/v0.1/predictions HTTP/1.1" 504 UT "-" 168 24 15001 - "-" "python-requests/2.22.0" "18db807f-cf01-9d3a-9c55-912c58382796" "10.50.8.102" "10.50.8.102:80" PassthroughCluster - 10.50.8.102:80 10.233.73.217:38208 -
The difference seems to be that the failing request takes a different route (e.g. PassthroughCluster).
There are some mentions of the "magic" 15 sec timeout (istio/istio#16915 (comment), istio/istio#1888), but I haven't found a working solution yet.
Don't allow 'seldon-container-engine' or 'tfserving'
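Assuming the request is about rejecting these two names when they are used for user-supplied containers, a validation step could look roughly like the sketch below. The validateContainerNames function and the reservedNames set are hypothetical and only illustrate the check. Where the operator would call it is not specified here.

```go
package main

import "fmt"

// reservedNames holds the container names from the issue that user-supplied
// containers would not be allowed to use.
var reservedNames = map[string]bool{
	"seldon-container-engine": true,
	"tfserving":               true,
}

// validateContainerNames returns an error for the first reserved name found.
func validateContainerNames(names []string) error {
	for _, n := range names {
		if reservedNames[n] {
			return fmt.Errorf("container name %q is reserved", n)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateContainerNames([]string{"sort-classifier", "scribe"}))
	fmt.Println(validateContainerNames([]string{"sort-classifier", "tfserving"}))
}
```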