example-seldon's Introduction

Kubeflow is the cloud-native platform for machine learning operations: pipelines, training, and deployment.


Documentation

Please refer to the official docs at kubeflow.org.

Working Groups

The Kubeflow community is organized into working groups (WGs), each with associated repositories, that focus on specific pieces of the ML platform.

Get Involved

Please refer to the Community page.

example-seldon's People

Contributors

jinchihe, nicholas-fwang, ryandawsonuk, ukclivecox, windkit, zijianjoy

example-seldon's Issues

predict fails and seldondeployment missing .status

@cliveseldon
Calling predict on a deployment that reported success fails with a connection error. Debugging this reveals that .status is missing from the seldondeployment. Any suggestions for how to debug this further?

!kubectl get seldondeployments mnist-classifier -o jsonpath='{.status}'

returns nothing

!kubectl get seldondeployments mnist-classifier -o json
returns
{
  "apiVersion": "machinelearning.seldon.io/v1alpha2",
  "kind": "SeldonDeployment",
  "metadata": {
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"machinelearning.seldon.io/v1alpha2\",\"kind\":\"SeldonDeployment\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"seldon\"},\"name\":\"mnist-classifier\",\"namespace\":\"kubeflow\"},\"spec\":{\"annotations\":{\"deployment_version\":\"v1\",\"project_name\":\"MNIST Example\",\"seldon.io/engine-separate-pod\":\"false\",\"seldon.io/rest-connection-timeout\":\"100\"},\"name\":\"mnist-classifier\",\"predictors\":[{\"annotations\":{\"predictor_version\":\"v1\"},\"componentSpecs\":[{\"spec\":{\"containers\":[{\"image\":\"seldonio/deepmnistclassifier_runtime:0.2\",\"imagePullPolicy\":\"Always\",\"name\":\"tf-model\",\"volumeMounts\":[{\"mountPath\":\"/data\",\"name\":\"persistent-storage\"}]}],\"terminationGracePeriodSeconds\":1,\"volumes\":[{\"name\":\"persistent-storage\",\"volumeSource\":{\"persistentVolumeClaim\":{\"claimName\":\"nfs-1\"}}}]}}],\"graph\":{\"children\":[],\"endpoint\":{\"type\":\"REST\"},\"name\":\"tf-model\",\"type\":\"MODEL\"},\"name\":\"mnist-classifier\",\"replicas\":1}]}}\n"
    },
    "creationTimestamp": "2019-04-18T21:26:32Z",
    "generation": 1,
    "labels": {
      "app": "seldon"
    },
    "name": "mnist-classifier",
    "namespace": "kubeflow",
    "resourceVersion": "128631",
    "selfLink": "/apis/machinelearning.seldon.io/v1alpha2/namespaces/kubeflow/seldondeployments/mnist-classifier",
    "uid": "a3450e71-6220-11e9-a023-da0ed60f5a55"
  },
  "spec": {
    "annotations": {
      "deployment_version": "v1",
      "project_name": "MNIST Example",
      "seldon.io/engine-separate-pod": "false",
      "seldon.io/rest-connection-timeout": "100"
    },
    "name": "mnist-classifier",
    "predictors": [
      {
        "annotations": {
          "predictor_version": "v1"
        },
        "componentSpecs": [
          {
            "spec": {
              "containers": [
                {
                  "image": "seldonio/deepmnistclassifier_runtime:0.2",
                  "imagePullPolicy": "Always",
                  "name": "tf-model",
                  "volumeMounts": [
                    {
                      "mountPath": "/data",
                      "name": "persistent-storage"
                    }
                  ]
                }
              ],
              "terminationGracePeriodSeconds": 1,
              "volumes": [
                {
                  "name": "persistent-storage",
                  "volumeSource": {
                    "persistentVolumeClaim": {
                      "claimName": "nfs-1"
                    }
                  }
                }
              ]
            }
          }
        ],
        "graph": {
          "children": [],
          "endpoint": {
            "type": "REST"
          },
          "name": "tf-model",
          "type": "MODEL"
        },
        "name": "mnist-classifier",
        "replicas": 1
      }
    ]
  }
}
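
For anyone hitting the same thing, here is a minimal sketch of how the check could be scripted from the notebook instead of reading the JSON by eye. It is only an illustration: the helper names `has_status` and `fetch_deployment` are hypothetical, and the `-n kubeflow` default is an assumption taken from the namespace shown in the output.

```python
import json
import subprocess


def has_status(deployment: dict) -> bool:
    """Return True if the resource carries a non-empty .status block."""
    return bool(deployment.get("status"))


def fetch_deployment(name: str, namespace: str = "kubeflow") -> dict:
    """Fetch a SeldonDeployment as a dict via kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "seldondeployments", name,
         "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Trimmed stand-in for the resource in this report:
# metadata and spec are present, status is absent.
sample = json.loads(
    '{"kind": "SeldonDeployment",'
    ' "metadata": {"name": "mnist-classifier"},'
    ' "spec": {"name": "mnist-classifier"}}'
)
print(has_status(sample))  # prints False
```

On a live cluster, `has_status(fetch_deployment("mnist-classifier"))` would run the same check against the real resource.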

Training error on all frameworks

I am running into an error when training the model with each of the different frameworks. This is a new installation and my first time through, so I suspect a missing dependency or something similar, but I cannot figure out how to debug this and find out what the problem is. The error does not occur until about 4 hours into a run, so I can reproduce it reliably, but it takes a long time to do so.

Here is the error:

Name: kubeflow-tf-train-bp9ln
Namespace: kubeflow
ServiceAccount: default
Status: Failed
Message: child 'kubeflow-tf-train-bp9ln-480988007' failed
Created: Tue Apr 02 21:03:12 +0000 (1 week ago)
Started: Tue Apr 02 21:03:12 +0000 (1 week ago)
Finished: Tue Apr 02 21:03:21 +0000 (1 week ago)
Duration: 9 seconds
Parameters:
tfjob-version-hack: 1
version: 0.1
github-user: kubeflow
github-revision: master
docker-user: seldonio
build-push-image: false

STEP PODNAME DURATION MESSAGE
✖ kubeflow-tf-train-bp9ln child 'kubeflow-tf-train-bp9ln-480988007' failed
├---○ build-push when 'false == true' evaluated false
└---✖ train kubeflow-tf-train-bp9ln-480988007 8s Error from server (AlreadyExists): error when creating "/tmp/manifest.yaml": tfjobs.kubeflow.org "mnist-train-1" already exists
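
A rough sketch of pulling the conflicting resource out of a message like the one above, so it can be inspected before re-running the workflow. The helper name `find_conflict` is hypothetical, and the pattern only assumes the `<resource> "<name>" already exists` wording seen in this log, not every message the API server can produce.

```python
import re

# Matches the tail of the AlreadyExists message, e.g.
#   tfjobs.kubeflow.org "mnist-train-1" already exists
CONFLICT = re.compile(r'([\w.\-]+) "([^"]+)" already exists')


def find_conflict(message: str):
    """Return (resource, name) for an 'already exists' error, else None."""
    match = CONFLICT.search(message)
    return (match.group(1), match.group(2)) if match else None


message = (
    'Error from server (AlreadyExists): error when creating '
    '"/tmp/manifest.yaml": tfjobs.kubeflow.org "mnist-train-1" already exists'
)
print(find_conflict(message))  # prints ('tfjobs.kubeflow.org', 'mnist-train-1')
```

Whether the existing `mnist-train-1` job is left over from an earlier run is an assumption that would need to be confirmed on the cluster.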

RESOURCE_ERROR: No valid versions with the prefix "1.11" found

I tried to deploy Seldon on GCP after changing the env.sh file and running create_demo.sh. I got the error below while deploying on GCP:
ERROR: (gcloud.deployment-manager.deployments.create) Error in Operation [operation-1608701216204-5b71af08aa508-44ff7cbc-8fd80a7f]: errors:

Please help!

Not using GCloud to setup Seldon

Hello,
Can seldon-core be set up without using GCloud?
That is, can I have the NFS set up on the VM itself?
Please share your inputs.
Thanks in advance.

deploy.sh fails with error "must provide URIs beginning with 'github.com'"

During installation, the following command fails:
$ curl https://raw.githubusercontent.com/kubeflow/kubeflow/v${KUBEFLOW_VERSION}/scripts/deploy.sh | bash

It produces this error:
"ERROR Registries using protocol 'github' must provide URIs beginning with 'github.com' (optionally prefaced with 'http', 'https', 'www', and so on"
This is the full output:

$ export KUBEFLOW_VERSION=0.2.2
$ export KUBEFLOW_KS_DIR=/home/arllanos/ks_kubeflow_seldon
$ export KUBEFLOW_DEPLOY=false
$ curl https://raw.githubusercontent.com/kubeflow/kubeflow/v${KUBEFLOW_VERSION}/scripts/deploy.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1947  100  1947    0     0   2309      0 --:--:-- --:--:-- --:--:--  2306
++ pwd
+ KUBEFLOW_REPO=/home/arllanos/kubeflow_repo
+ KUBEFLOW_VERSION=0.2.2
+ KUBEFLOW_DEPLOY=false
+ [[ ! -d /home/arllanos/kubeflow_repo ]]
+ source /home/arllanos/kubeflow_repo/scripts/util.sh
+ check_install ks
+ which ks
+ check_install kubectl
+ which kubectl
+ DEPLOYMENT_NAME=kubeflow
+ KUBEFLOW_KS_DIR=/home/arllanos/ks_kubeflow_seldon
++ dirname /home/arllanos/ks_kubeflow_seldon
+ cd /home/arllanos
++ basename /home/arllanos/ks_kubeflow_seldon
+ ks init ks_kubeflow_seldon
INFO  Using context 'gke_silicon-cell-209113_us-central1-a_kubeflow-seldon-ml' from the kubeconfig file specified at the environment variable $KUBECONFIG
INFO  Creating environment "default" with namespace "default", pointing to cluster at address "https://35.224.223.191"
INFO  Generating ksonnet-lib data at path '/home/arllanos/ks_kubeflow_seldon/lib/v1.7.0'
INFO  ksonnet app successfully created! Next, try creating a component with `ks generate`.
+ cd /home/arllanos/ks_kubeflow_seldon
+ ks registry add kubeflow /home/arllanos/kubeflow_repo/kubeflow
ERROR Registries using protocol 'github' must provide URIs beginning with 'github.com' (optionally prefaced with 'http', 'https', 'www', and so on

By the way, setup.md has a broken link (404 Not Found) in the following text:
"Install kubeflow - for details see here"

Prediction Analytics Dashboard not showing metrics

I deployed a Keras MNIST model using seldon-core and I'm trying to monitor the model using seldon-core-analytics with Grafana dashboards. The cluster_monitoring dashboard looks fine, but the Prediction Analytics dashboard doesn't find my deployed model and all of its panels are empty. I have installed Kubeflow, seldon-core and seldon-core-analytics in the same namespace, and my model is deployed in that same namespace.

After checking the Prometheus service logs in the same namespace, I found the following message recurring:

level=warn ts=2019-07-17T16:24:27.691294684Z caller=scrape.go:836 component="scrape manager" scrape_pool=kubernetes-pods target=http://192.168.49.198:16686/metrics msg="append failed" err="\"INVALID\" is not a valid start token"
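
In case it helps narrow this down, here is a small sketch that splits a Prometheus log line of this key=value shape into fields, so the failing scrape target and error can be picked out when the message recurs many times. The helper name `parse_logfmt` is hypothetical, and the parser only handles bare and double-quoted values as they appear in this line.

```python
import re

# key=value pairs where the value is either a double-quoted string
# (with backslash escapes) or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line: str) -> dict:
    """Split a key=value log line into a dict, unquoting quoted values."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if raw.startswith('"'):
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


line = (
    'level=warn ts=2019-07-17T16:24:27.691294684Z caller=scrape.go:836 '
    'component="scrape manager" scrape_pool=kubernetes-pods '
    'target=http://192.168.49.198:16686/metrics msg="append failed" '
    r'err="\"INVALID\" is not a valid start token"'
)
fields = parse_logfmt(line)
print(fields["target"])  # prints http://192.168.49.198:16686/metrics
print(fields["err"])     # prints "INVALID" is not a valid start token
```

Grouping the parsed lines by `target` would show whether the failure is limited to one pod endpoint or affects every scrape in the `kubernetes-pods` pool.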

Can you please advise on this?

Thanks.

Wrapping model for MNIST Scikit-learn doesn't work

When trying to serve the MNIST Scikit-learn model (kubeflow-seldon example) with the latest code there, which uses s2i to build images, I get the following error:

santiago@santiago-Inspiron-5559:~$ kubectl logs seldon-sk-deploy-wc9qg-927719684 main
Connecting to github.com (192.30.253.112:443)
wget: can't execute 'ssl_helper': No such file or directory
wget: error getting response: Connection reset by peer
tar: can't open 'source-to-image-v1.1.9a-40ad911d-linux-amd64.tar.gz': No such file or directory
Cannot connect to the Docker daemon at tcp://127.0.0.1:2375. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://127.0.0.1:2375. Is the docker daemon running?
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
./wrap.sh: line 14: ./s2i: not found
REPOSITORY TAG IMAGE ID CREATED SIZE
Pushing image to santiagomol40/skmnistclassifier_runtime:0.1
Login Succeeded
The push refers to a repository [docker.io/santiagomol40/skmnistclassifier_runtime]
An image does not exist locally with the tag: santiagomol40/skmnistclassifier_runtime

Looks like the container in charge of running s2i doesn't have SSL support.
