kserve / modelmesh-serving
Controller for ModelMesh
License: Apache License 2.0
The ModelMesh Runtime Adapter has been refactored to leverage a Go library named pullman. Currently it supports HTTP and S3, based on the storage providers listed here. Since many KServe users use Google Cloud Storage, let's aim to add pullman support for GCS.
For reference:
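Purely as an illustration of what the user-facing configuration might eventually look like, a storage-config secret entry for GCS could mirror the existing S3 entries; the "gcs" type and the credential field names below are hypothetical and would be defined as part of the pullman work:
apiVersion: v1
kind: Secret
metadata:
  name: storage-config
stringData:
  # the "gcs" type and credential field names are hypothetical
  myGcsModels: |
    {
      "type": "gcs",
      "private_key": "<service-account-private-key>",
      "client_email": "<service-account-email>"
    }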
To help onboard new developers, we should have a document that goes over some of the development processes.
Should include:
Can also add things like running controller-gen to generate code and manifests.
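For instance, in a kubebuilder-style project these are usually wrapped in make targets along these lines (a sketch; the exact targets and controller-gen flags in this repo may differ):
% make generate    # runs controller-gen object:... to regenerate the deepcopy code
% make manifests   # runs controller-gen crd rbac:... to regenerate the CRD and RBAC manifests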
Trying to update the model type (using tensorflow instead of sklearn) and the location following the example, I get this error:
% grpcurl \
-plaintext \
-proto fvt/proto/kfs_inference_v2.proto \
-d '{ "model_name": "example-mnist-predictor", "inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "contents": { "fp32_contents": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0] }}]}' \
localhost:8033 \
inference.GRPCInferenceService.ModelInfer
ERROR:
Code: InvalidArgument
Message: inference.GRPCInferenceService/ModelInfer: INVALID_ARGUMENT: unexpected inference input 'predict' for model 'example-mnist-predictor__ksp-c3597b719f'
The default gRPC max message size of 4MiB is often too small for many use cases. This should be increased in the MLServer serving runtime config.
This was recently made configurable via SeldonIO/MLServer#317. It should probably be 16MiB to match the ModelMesh default.
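A sketch of how that could look in the mlserver ServingRuntime container spec, assuming MLServer exposes the new setting through the MLSERVER_GRPC_MAX_MESSAGE_LENGTH environment variable (the exact name should be confirmed against SeldonIO/MLServer#317):
containers:
  - name: mlserver
    env:
      - name: MLSERVER_GRPC_MAX_MESSAGE_LENGTH
        value: "16777216"   # 16MiB, matching the ModelMesh default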
The InferenceService CRD is the primary interface for KServe, so we want to ensure that model deployments on ModelMesh work as expected when users use InferenceService CRD.
I think for this, we just need some basic tests: deploy an InferenceService with the
"serving.kserve.io/deploymentMode": "ModelMesh"
annotation, ensure that it becomes ready, and verify that you can perform inference. Would like to test the following formats:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    sklearn:
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
And also using the new format:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc2
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
And once the new storage spec is in (kserve/kserve#1899):
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    sklearn:
      storage:
        key: localMinIO
        path: sklearn/mnist-svm.joblib
        # schemaPath: null
        parameters:
          bucket: modelmesh-example-models
The run-fvt.yml GitHub Actions workflow will probably need to be updated to install the InferenceService CRD onto the minikube cluster before ModelMesh-Serving is installed.
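Something along these lines could be added as a workflow step before ModelMesh-Serving is installed (a sketch; the exact CRD manifest location in the kserve repo needs to be verified):
- name: Install InferenceService CRD
  run: |
    kubectl apply -f https://raw.githubusercontent.com/kserve/kserve/master/config/crd/serving.kserve.io_inferenceservices.yaml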
Is your feature request related to a problem? If so, please describe.
Describe your proposed solution
I was wondering why ModelMesh doesn't support the Triton Python backend. If there is no specific reason, can we add that to the roadmap?
Describe alternatives you have considered
Additional context
Describe the bug
To Reproduce
Steps to reproduce the behavior:
➜ modelmesh-serving git:(remove-trainedmodel) ✗ go version
go version go1.17.1 darwin/amd64
When using the original v1.32.0 of golangci-lint:
➜ modelmesh-serving git:(remove-trainedmodel) ✗ pre-commit run --all-files
golangci-lint............................................................Failed
- hook id: golangci-lint
- exit code: 2
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb01dfacedebac1e pc=0x7fff20425c9e]
runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x7fff20425c9e
stack: frame={sp:0x7ffeefbff218, fp:0x7ffeefbff268} stack=[0x7ffeefb802b8,0x7ffeefbff320)
0x00007ffeefbff118: 0x01007ffeefbff138 0x0000000000000004
0x00007ffeefbff128: 0x000000000000001f 0x00007fff20425c9e
0x00007ffeefbff138: 0x0b01dfacedebac1e 0x0000000000000001
0x00007ffeefbff148: 0x0000000004038831 <runtime.throw+0x0000000000000071> 0x00007ffeefbff1e8
0x00007ffeefbff158: 0x00000000049f4939 0x00007ffeefbff1a0
0x00007ffeefbff168: 0x0000000004038ae8 <runtime.fatalthrow.func1+0x0000000000000048> 0x00000000051fbd40
0x00007ffeefbff178: 0x0000000000000001 0x0000000000000001
0x00007ffeefbff188: 0x00007ffeefbff1e8 0x0000000004038831 <runtime.throw+0x0000000000000071>
0x00007ffeefbff198: 0x00000000051fbd40 0x00007ffeefbff1d8
0x00007ffeefbff1a8: 0x0000000004038a70 <runtime.fatalthrow+0x0000000000000050> 0x00007ffeefbff1b8
0x00007ffeefbff1b8: 0x0000000004038aa0 <runtime.fatalthrow.func1+0x0000000000000000> 0x00000000051fbd40
0x00007ffeefbff1c8: 0x0000000004038831 <runtime.throw+0x0000000000000071> 0x00007ffeefbff1e8
......
After investigating, this appears to be because the golangci-lint version does not match the Go version, so I upgraded to v1.42.1:
➜ modelmesh-serving git:(remove-trainedmodel) ✗ pre-commit run --all-files
[INFO] Initializing environment for https://github.com/golangci/golangci-lint.
[INFO] Installing environment for https://github.com/golangci/golangci-lint.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
golangci-lint............................................................Failed
- hook id: golangci-lint
- exit code: 1
apis/serving/v1alpha1/predictor_types.go:42:22: fieldalignment: struct with 24 pointer bytes could be 16 (govet)
type S3StorageSource struct {
^
apis/serving/v1alpha1/predictor_types.go:49:12: fieldalignment: struct with 56 pointer bytes could be 48 (govet)
type Model struct {
^
apis/serving/v1alpha1/predictor_types.go:144:18: fieldalignment: struct with 72 pointer bytes could be 64 (govet)
type FailureInfo struct {
^
apis/serving/v1alpha1/predictor_types.go:163:22: fieldalignment: struct with 88 pointer bytes could be 80 (govet)
type PredictorStatus struct {
^
apis/serving/v1alpha1/servingruntime_types.go:23:16: fieldalignment: struct with 24 pointer bytes could be 16 (govet)
type ModelType struct {
^
apis/serving/v1alpha1/servingruntime_types.go:30:16: fieldalignment: struct with 160 pointer bytes could be 144 (govet)
type Container struct {
^
apis/serving/v1alpha1/servingruntime_types.go:79:25: fieldalignment: struct with 136 pointer bytes could be 120 (govet)
type ServingRuntimeSpec struct {
^
apis/serving/v1alpha1/servingruntime_types.go:163:21: fieldalignment: struct of size 424 could be 416 (govet)
type ServingRuntime struct {
^
controllers/config.go:50:13: fieldalignment: struct of size 632 could be 608 (govet)
type Config struct {
^
controllers/config.go:78:23: fieldalignment: struct of size 32 could be 24 (govet)
type PrometheusConfig struct {
^
controllers/config.go:99:22: fieldalignment: struct with 152 pointer bytes could be 128 (govet)
type RESTProxyConfig struct {
^
controllers/config.go:121:21: fieldalignment: struct with 80 pointer bytes could be 72 (govet)
type ConfigProvider struct {
^
controllers/config.go:234:27: fieldalignment: struct with 72 pointer bytes could be 64 (govet)
type ResourceRequirements struct {
^
controllers/service_controller.go:68:24: fieldalignment: struct with 128 pointer bytes could be 120 (govet)
type ServiceReconciler struct {
^
controllers/servingruntime_controller.go:59:31: fieldalignment: struct with 136 pointer bytes could be 120 (govet)
type ServingRuntimeReconciler struct {
^
controllers/modelmesh/cluster_config.go:45:20: fieldalignment: struct with 32 pointer bytes could be 24 (govet)
type ClusterConfig struct {
^
controllers/modelmesh/modelmesh.go:38:17: fieldalignment: struct of size 408 could be 376 (govet)
type Deployment struct {
^
controllers/modelmesh/endpoint_test.go:22:13: fieldalignment: struct with 48 pointer bytes could be 40 (govet)
tests := []struct {
^
controllers/modelmesh/model_type_labels_test.go:73:18: fieldalignment: struct with 112 pointer bytes could be 104 (govet)
tableTests := []struct {
^
controllers/modelmesh/runtime_test.go:107:37: fieldalignment: struct with 24 pointer bytes could be 16 (govet)
var addStorageConfigVolumeTests = []struct {
^
controllers/modelmesh/util_test.go:30:15: fieldalignment: struct with 80 pointer bytes could be 72 (govet)
var tests = []struct {
^
fvt/fvtclient.go:70:16: fieldalignment: struct with 96 pointer bytes could be 80 (govet)
type FVTClient struct {
^
pkg/mmesh/etcdrangewatcher.go:73:15: fieldalignment: struct with 32 pointer bytes could be 24 (govet)
type KeyEvent struct {
^
pkg/mmesh/grpc_resolver.go:36:22: fieldalignment: struct with 56 pointer bytes could be 48 (govet)
type serviceResolver struct {
^
pkg/mmesh/grpc_resolver.go:85:19: fieldalignment: struct with 64 pointer bytes could be 48 (govet)
type KubeResolver struct {
^
pkg/mmesh/modelmesh_service.go:29:16: fieldalignment: struct of size 104 could be 88 (govet)
type MMService struct {
^
pkg/predictor_source/cached_predictor_source.go:45:27: fieldalignment: struct with 16 pointer bytes could be 8 (govet)
type PredictorStreamEvent struct {
^
pkg/predictor_source/cached_predictor_source.go:76:28: fieldalignment: struct of size 120 could be 112 (govet)
type cachedPredictorSource struct {
^
pkg/predictor_source/watchrefresh_predictor_source_test.go:34:18: fieldalignment: struct with 72 pointer bytes could be 40 (govet)
type testWatcher struct {
^
controllers/modelmesh/etcd.go:45:18: unusedwrite: unused write to field ReadOnly (govet)
volumeMount.ReadOnly = true
^
controllers/modelmesh/etcd.go:46:18: unusedwrite: unused write to field MountPath (govet)
volumeMount.MountPath = etcdMountPath
^
pkg/mmesh/etcdrangewatcher.go:140:17: unusedwrite: unused write to field found (govet)
current.found = true
^
Expected behavior
No error
MLServer supports Spark MLlib. We should verify this and add it as a supported framework to the MLServer Serving Runtime. This should be accompanied by functional tests for the framework as well as documentation.
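A sketch of what that might look like in the MLServer ServingRuntime, assuming MLServer identifies the framework as mllib (the exact format name, and whether the spec uses supportedModelFormats or the older supportedModelTypes, should be confirmed):
supportedModelFormats:
  - name: sklearn
    autoSelect: true
  - name: mllib        # hypothetical entry for Spark MLlib
    autoSelect: true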
Currently, FVTs are able to run on a nightly basis using IBM Toolchains with support for that included in #7. However, we want to use this set of tests as a gate for incoming PRs, so they will need to be run on a per-PR basis.
Toolchains on IBM Cloud appears to have limitations regarding exposing logs and status to external users, so alternative avenues may need to be explored. I see that there is the ability to download log assets from toolchain pipeline runs (in zip format), so I am wondering if there is an API endpoint that can be used to download these logs so that we can rehost them somewhere externally accessible (perhaps on the prow server if prow can be used to invoke the pipeline run?).
Some documentation for Kubeflow infra is noted here: https://github.com/kubeflow/testing#test-infrastructure. They essentially use prow to submit argo workflows for building and running tests.
Another thing to potentially explore is using GitHub Actions self-hosted runners.
FYI @chinhuang007
This is a parent issue for this overall effort. The goal here is to figure out how we can merge and evolve the two different specs moving forward.
Sub issues:
Describe the goal or feature or two, usually in the form of a user story.
As a user, I want ModelMesh to automatically and efficiently orchestrate models onto the available runtime servers, so that I don't need to care about where a model will be placed.
The current ModelMesh Serving supports an InferenceService with a predictor only. The KServe transformer concept, which provides pre- and post-processing around predict, should be supported in the ModelMesh case as well.
The user would then be able to apply the same transformer to both single-model serving in KServe and multi-model serving in ModelMesh.
ModelMesh-Serving needs to be able to reconcile InferenceServices with the serving.kserve.io/deploymentMode: ModelMesh annotation. If this annotation does not exist, the resource will be ignored (the KServe controller will handle reconciliation).
Related: kserve/kserve#54
For the first iteration, this will be similar to how a TrainedModel is reconciled in ModelMesh-Serving where TrainedModel fields are mapped internally to PredictorSpec fields as shown here.
The goal is for an InferenceService YAML like the following to be applied and the corresponding model deployed using ModelMesh:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-mnist-svm
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secret-key: localMinio
spec:
  predictor:
    sklearn:
      storageUri: s3://sklearn/mnist-svm.joblib
Note that this would be the structure given the InferenceService CRD's current state. There is ongoing and upcoming work to introduce a modelType field to the InferenceService, as well as work on storage/credential handling.
Many examples do not work
Hi,
Thanks for providing modelmesh-serving! I'm trying to use the examples provided in ./config/example-predictors/, but I found that only example-sklearn-mnist-svm can be accessed. Other examples, such as example-onnx-mnist, example-tensorflow-mnist, and example-keras-mnist, can't be used. Specifically, clients (both gRPC and REST) return a "predict() method not implemented" error.
To Reproduce
Steps to reproduce the behavior:
MODEL_NAME=example-tensorflow-mnist
curl -X POST -k http://localhost:8008/v2/models/${MODEL_NAME}/infer -d '{"inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0]}]}'
{"code":2,"message":"inference.GRPCInferenceService/ModelInfer: UNKNOWN: Unexpected <class 'NotImplementedError'>: predict() method not implemented"}
Environment:
While trying to run through the quick start guide, the install script runs into an error
Steps to reproduce the behavior:
kubectl create namespace modelmesh-serving
./scripts/install.sh --namespace modelmesh-serving --quickstart
Installing ModelMesh Serving built-in runtimes
Error: unknown flag: --load-restrictor
error: no objects passed to apply
Expected behavior
I expect the installer in the quick start guide to run without errors with the provided instructions.
Output from installer run attached.
Environment:
Client Version: 4.5.0-202005291417-9933eb9
Server Version: 4.9.0-rc.5
Kubernetes Version: v1.22.0-rc.0+8719299
Additional context
Running on OpenShift 4.9
TorchServe will soon support the KServe v2 inference protocol and can support additional types of PyTorch models that Triton does not currently support.
Create a serving runtime for TorchServe with a corresponding ModelMesh adapter: kserve/modelmesh-runtime-adapter#4
The v2 support is still WIP (see kserve/kserve#1870 and pytorch/serve#1190), but I think this torchserve image: jagadeeshj/torchserve-kfsv2:1.0 can be used in the meantime until it's official.
Need to support TensorRT 8 models, which run in Triton Inference Server 21.11-py3 (or above).
I tried updating the Triton Inference Server image to nvcr.io/nvidia/tritonserver:21.09-py3 and got the error below.
There seems to have been an interface change, so it could not connect to the server.
Describe your proposed solution
Describe alternatives you have considered
Additional context
Is it the intended behavior that we need to roll back the kubectl config to its previous state ourselves?
Initial state:
% kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://kubernetes.docker.internal:6443
  name: docker-desktop
contexts:
- context:
    cluster: docker-desktop
    user: docker-desktop
  name: docker-desktop
current-context: docker-desktop
kind: Config
preferences: {}
users:
- name: docker-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
After installing ModelMesh Serving, namespace: modelmesh-serving is added to the current-context:
% kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://kubernetes.docker.internal:6443
  name: docker-desktop
contexts:
- context:
    cluster: docker-desktop
    namespace: modelmesh-serving    # <<<<<<<<<<<<< newly added
    user: docker-desktop
  name: docker-desktop
current-context: docker-desktop
kind: Config
preferences: {}
users:
- name: docker-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
However, after deleting your ModelMesh Serving installation:
% kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://kubernetes.docker.internal:6443
  name: docker-desktop
contexts:
- context:
    cluster: docker-desktop
    namespace: modelmesh-serving    # <<<<<<<<< It is still here
    user: docker-desktop
  name: docker-desktop
current-context: docker-desktop
kind: Config
preferences: {}
users:
- name: docker-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
I have the example-tensorflow-mnist predictor deployed and the triton-2.x pods running to serve the TensorFlow model.
I'm trying to get the model metadata using gRPC but I'm facing issues:
grpcurl -plaintext -proto fvt/proto/kfs_inference_v2.proto -d '{ "name": "'"${MODEL_NAME}"'"}' localhost:8033 inference.GRPCInferenceService.ModelMetadata
ERROR: Code: InvalidArgument Message: must include mm-model-id header
Even when adding the mm-model-id header, I get this error:
ERROR: Code: Unimplemented Message: inference.GRPCInferenceService/ModelMetadata: UNIMPLEMENTED: Method not found or not permitted: inference.GRPCInferenceService/ModelMetadata
Thanks in advance
For dynamic loading/unloading of models, Triton defines a "Model Repository" API which is described as an extension to the KServe v2 dataplane API.
This includes both REST and gRPC variants of the following API endpoints:
POST v2/repository/index
POST v2/repository/models/${MODEL_NAME}/load
POST v2/repository/models/${MODEL_NAME}/unload
MLServer followed this and has implemented the same API, but unfortunately its gRPC service definition uses different service and package names:
inference.model_repository.ModelRepositoryService (MLServer)
inference.GRPCInferenceService (Triton, where these methods are part of the data-plane service)
ModelMesh uses these in the built-in modelmesh support for Triton/MLServer to manage models in each runtime instance, but currently the logic is mostly specific to each because of the differing service names and different filesystem layout requirements. Note that only the load/unload methods are used; index isn't required.
It seems that this is at least a de facto standard KServe API for model management, so it would make sense to support it as an option for other/custom model server implementations via our built-in adapter, as an alternative to implementing the native model-mesh gRPC model runtime SPI.
First though we should decide on the official/standard package and service name to use for the gRPC service, and copy its specification into the KServe repo somewhere.
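For illustration, here is roughly how the same load call is addressed to each runtime today (a sketch; the method names come from the respective proto definitions and the ports are typical defaults, so both should be verified):
# Triton: repository methods are part of the data-plane service
grpcurl -plaintext -d '{"model_name": "example-model"}' localhost:8001 inference.GRPCInferenceService/RepositoryModelLoad
# MLServer: a separate model_repository service
grpcurl -plaintext -d '{"model_name": "example-model"}' localhost:8081 inference.model_repository.ModelRepositoryService/RepositoryModelLoad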
Currently only gRPC inference requests are supported with ModelMesh. However, there is a need for REST support in order to smooth the user experience of interacting with ModelMesh models.
Currently, the idea is to have a transcoder container/service that will transcode the REST v2 protocol JSON into the gRPC format ModelMesh expects (and vice versa). The transcoding mappings will have to be explored, and performance will need to be kept in mind.
Describe the bug
Failed to build the develop image on Mac. Error message:
Building dev image kserve/modelmesh-controller-develop:feb21e0272b82ed0
[+] Building 4.8s (8/8) FINISHED
=> [internal] load build definition from Dockerfile.develop 0.0s
=> => transferring dockerfile: 3.64kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for registry.access.redhat.com/ubi8/ubi-minimal:8.4 0.7s
=> [1/4] FROM registry.access.redhat.com/ubi8/ubi-minimal:8.4@sha256:54ef2173bba7384dc7609e8affbae1c36f8a3ec137cacc0866116d65dd4b9afe 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 116.07kB 0.0s
=> CACHED [2/4] WORKDIR /workspace 0.0s
=> CACHED [3/4] COPY .pre-commit-config.yaml go.mod go.sum ./ 0.0s
=> ERROR [4/4] RUN microdnf install diffutils gcc-c++ make wget tar vim git python38 nodejs && pip3 install pre-commit && set -eux; wge 3.9s
------
.....
#8 3.831
#8 3.833 error: Not enough free space in /var/cache/yum/metadata: needed 159.1 MB, available 0 bytes
#8 3.847 /bin/sh: wget: command not found
#8 3.848 /bin/sh: tar: command not found
#8 3.849 /bin/sh: go: command not found
To Reproduce
run make build.develop
Expected behavior
A successful build message on Ubuntu looks like:
Successfully built e8efba498870
Successfully tagged kserve/modelmesh-controller-develop:feb21e0272b82ed0
Environment (please complete the following information):
Internally, the documentation has been expanded and overhauled. We should go through and sync the docs with this repository.
Related: kserve/website#17
modelmesh-serving-mlserver containers fail startup and are in CrashLoopBackOff state. Error message is:
Error: failed to start container "mlserver": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to write "500000": write /sys/fs/cgroup/cpu/kubepods/burstable/podaf518315-5ad0-451b-b437-410b8b93b050/mlserver/cpu.cfs_quota_us: invalid argument: unknown
To Reproduce
Steps to reproduce the behavior:
Followed quickstart guide here: https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md
Running on a mac 11.6
minikube start --memory 8192 --cpus 4 -p kserve-mm
Pods start crash loop with the error message
Error: failed to start container "mlserver": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to write "500000": write /sys/fs/cgroup/cpu/kubepods/burstable/podaf518315-5ad0-451b-b437-410b8b93b050/mlserver/cpu.cfs_quota_us: invalid argument: unknown
Expected behavior
Expect mlserver runtime pod to start and the model to be deployed, based on quickstart guide.
Environment (please complete the following information):
Thanks in advance for any insight on the issue
Describe the bug
{"level":"error","ts":1640666407.3959372,"logger":"controller.predictor","msg":"Reconciler error","reconciler group":"serving.kserve.io","reconciler kind":"Predictor","name":"example-sklearn-isvc","namespace":"isvc_modelmesh-serving","error":"failed to fetch CR from kubebuilder cache for predictor example-sklearn-isvc: No valid InferenceService predictor framework found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Screenshots
Environment (please complete the following information):
Additional context
We should determine how releases should be handled and document the process.
Some notes:
Currently KFServing has a RELEASE_PROCESS.md document (though it is a bit dated) that outlines what needs to be done during a release. Essentially, several image tags are updated in the config yamls to be pinned to the specific release (instead of latest). Then a single yaml file is generated containing all the resources as seen here. There are also workflows for publishing images to Dockerhub for the tagged version when a tag in GitHub is created.
For KFServing, all the generated install yaml files for each version can be seen in the install directory: https://github.com/kubeflow/kfserving/tree/master/install and also as assets under the actual Release on GitHub (e.g. https://github.com/kubeflow/kfserving/releases). I think it's probably fine to forgo the install directory paradigm, and just rely on publishing needed assets under each GitHub release. This will keep things cleaner. For this, we might add what is published internally (install script and tar.gz of the config folder).
We can also adjust the install script to accept a release version as an argument, so that the script pulls the config files from the GitHub release and installs as normal. However, the install script already has an --install-config-path argument that can be leveraged.
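For example, something along these lines could work once release assets are published (the asset name and URL here are hypothetical):
% wget https://github.com/kserve/modelmesh-serving/releases/download/v0.8.0/config.tar.gz
% tar -xzf config.tar.gz
% ./scripts/install.sh --namespace modelmesh-serving --install-config-path ./config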
Should the modelmesh and modelmesh-runtime-adapter release cadence and versioning be tied to modelmesh-serving?
Tentative work items:
Would be good to analyze the performance and perhaps compare with the other current model serving methods available.
Some ideas:
Other ideas and tooling for benchmarking are welcome.
The Triton Inference Server supports TensorRT models, and our Triton Serving Runtime indicates this.
We should include some documentation, examples, and tests to demonstrate this. It probably needs to be investigated whether TensorRT will work on Triton without a GPU.
Does KServe ModelMesh support pipelining models in consecutive steps in the form of an inference graph, something similar to technologies like Seldon inference graphs or Ray deployment graphs?
The current ModelMesh Serving is namespace scoped, meaning all of its components must exist within a single namespace and only one instance of ModelMesh Serving can be installed per namespace. The idea is to make ModelMesh Serving cluster scoped so that one set of controller components can serve multiple namespaces.
The limitation of having a controller and a set of Serving Runtimes in each namespace means sharing is not possible between namespaces, and it causes unnecessarily high resource consumption when ModelMesh Serving is installed in multiple namespaces.
At a high level, we would like to change Serving Runtimes to be cluster scoped and make the controller manage resources across namespaces. Here is the doc with more details, including a couple of possible solutions. The doc will be updated whenever decisions are made and implementations are identified.
Describe the bug
The BSD sed on OSX fails with this error: sed: 1: "minio-storage-secret.yaml": invalid command code m
These files use sed -i:
./scripts/delete.sh
./scripts/install.sh
./scripts/deploy/iks
To Reproduce
modelmesh_release="v0.8.0-rc0"
wget https://raw.githubusercontent.com/kserve/modelmesh-serving/${modelmesh_release}/config/dependencies/minio-storage-secret.yaml
sed -i "s/controller_namespace/controller_namespace/g" minio-storage-secret.yaml
While running through the quick start guide, I get an error when trying to run the gRPC example under the "perform an inference test" section.
Running the command as written in the guide gives an error:
MODEL_NAME=example-mnist-predictor
grpcurl \
-plaintext \
-proto fvt/proto/kfs_inference_v2.proto \
-d '{ "model_name": "'"${MODEL_NAME}"'", "inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "contents": { "fp32_contents": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0] }}]}' \
localhost:8033 \
inference.GRPCInferenceService.ModelInfer
To Reproduce
Steps to reproduce the behavior:
Failed to process proto source files.: could not parse given files: open fvt/proto/kfs_inference_v2.proto: no such file or directory
Expected behavior
I expect the inference to be returned as documented.
Client Version: 4.5.0-202005291417-9933eb9
Server Version: 4.9.0-rc.5
Kubernetes Version: v1.22.0-rc.0+8719299
When I run the REST example, I do get the expected results for the inference.
Is it possible to leverage Triton client (https://github.com/triton-inference-server/client) features like on-wire compression of the request/response over HTTP using the current /infer endpoint? (https://github.com/triton-inference-server/server/blob/main/docs/inference_protocols.md#compression)
If not, will it be implemented in future ModelMesh releases?
Is your feature request related to a problem? If so, please describe.
Hi~ I'm new here and interested in this project.
I found that the system depends on an etcd cluster outside of Kubernetes. I think this brings extra cost to maintain the etcd.
So I was wondering, is there any plan to remove the dependency on an extra etcd cluster?
Describe your proposed solution
Maybe we could use another CRD that describes a deployed model (e.g. PredictorInstance or Model?). The ModelMesh sidecar could then watch/update the CRD and send requests to the ServingRuntime. (The mechanism may be similar to Pods in Kubernetes.)
Describe alternatives you have considered
Additional context
Currently the CRDs for ServingRuntimes are generated in two places: the kserve/kserve repo and the kserve/modelmesh-serving repo. Although it'd be preferred to have a single source, for now let's ensure that the specs are synchronized. This way a user won't run into issues when using ServingRuntimes in the different contexts.
List of changes:
- Container struct.
- ModelMesh-Serving multiModel field to SR (#89)
- autoPlace field to ModelType/Framework (kserve/kserve#1948)
Types for these resources are currently defined in two places: the kserve repository and the modelmesh-serving repository. Let's investigate importing these types from KServe and using those within modelmesh-serving as a step towards unification.
Since ModelMesh supports deploying Keras models as noted here, we should ensure that our minio images have a sample Keras model that users can try.
Both the FVT and quickstart images should probably be updated:
kserve/modelmesh-minio-dev-examples
kserve/modelmesh-minio-examples
GitHub Action workflows should be more selective on when they run. For example, if just the README or docs are updated, we shouldn't have to run unit tests, although maybe linting is still needed for those. Path filters can be utilized for these scenarios.
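For example, a unit-test workflow could skip documentation-only changes with something like:
on:
  pull_request:
    paths-ignore:
      - '**.md'
      - 'docs/**'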
Describe the bug
The install.sh script fails with this message on OSX 12.x but not on OSX 11.x. It seems that the sleep "10s" syntax no longer works after the latest OSX update.
Pods found with selector '--field-selector metadata.name=etcd' are not ready yet. Waiting 10 secs...
usage: sleep seconds
To Reproduce
Follow the Quick Start guide and run the ./scripts/install.sh --namespace modelmesh-serving --quickstart command.
Expected behavior
Installing ModelMesh Serving built-in runtimes
Environment (please complete the following information):
OSX 12.1
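A likely fix is to drop the unit suffix, since the BSD sleep on macOS only accepts a plain number of seconds (a sketch of the probable change in install.sh):
sleep 10   # instead of: sleep "10s"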
Describe the bug
./scripts/install.sh: line 280: ctrl_ns: unbound variable
To Reproduce
kubectl create ns modelmesh-serving
kubectl create ns tedchang1
kubectl create ns tedchang2
./scripts/install.sh --namespace modelmesh-serving --quickstart -u "tedchang1 tedchang2"
Expected behavior
namespace/tedchang1 labeled
servingruntime.serving.kserve.io/mlserver-0.x created
servingruntime.serving.kserve.io/triton-2.x created
secret/storage-config created
namespace/tedchang2 labeled
servingruntime.serving.kserve.io/mlserver-0.x created
servingruntime.serving.kserve.io/triton-2.x created
secret/storage-config created
Successfully installed ModelMesh Serving!
/kind bug
What steps did you take and what happened:
I followed the InferenceService Deployment with ModelMesh: example-sklearn-isvc example from here.
Here is what I already have, and it has been tested successfully using the port-forward method from the example instructions.
However, I am trying to reach it through the Ingress gateway with a Host header (for a serverless-style setup), using curl.
I have tried different curl commands, but none of them work.
% echo ${SERVICE_HOSTNAME}
modelmesh-serving.modelmesh-serving:8008
Here is what I get by using kubectl get inferenceservice
example-sklearn-isvc grpc://modelmesh-serving.modelmesh-serving:8033 True 2d9h
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/example-sklearn-isvc:predict -d @./isvc-input.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/example-sklearn-isvc/infer -d @./isvc-input.json
They all return 404 Not Found, as shown below:
Trying 192.168.49.2:30413...
TCP_NODELAY set
Connected to 192.168.49.2 (192.168.49.2) port 30413 (#0)
POST /v1/models/example-sklearn-isvc/infer HTTP/1.1
Host: modelmesh-serving.modelmesh-serving
User-Agent: curl/7.68.0
Content-Length: 431
Content-Type: application/x-www-form-urlencoded
upload completely sent off: 431 out of 431 bytes
Mark bundle as not supporting multiuse
HTTP/1.1 404 Not Found
date: Thu, 02 Jun 2022 20:53:01 GMT
server: istio-envoy
connection: close
content-length: 0
Closing connection 0
Environment:
kubectl version: v1.24.1
Could you help figure out where the problem is? Thanks!
With the TrainedModel CRD being deprecated in favor of using the InferenceService CRD for both single model and multi model deployments, we can go ahead and remove support for TrainedModel reconciliation. Our current support is still using the old serving.kubeflow.org/v1alpha1 APIGroupVersion anyway.
Describe the bug
To Reproduce
Steps to reproduce the behavior:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: mnist
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    sklearn:
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
2021-12-29T12:56:32.918Z DEBUG MLServer Adapter.MLServer Adapter Server Listing objects from s3 {"bucket": "modelmesh-example-models", "prefix": "/sklearn/mnist-svm.joblib"}
2021-12-29T12:56:32.960Z DEBUG MLServer Adapter.MLServer Adapter Server Ignore downloading s3 object matching part of the path {"bucket": "modelmesh-example-models", "prefix": "/sklearn/mnist-svm.joblib", "s3_path": "sklearn/mnist-svm.joblib"}
2021-12-29T12:56:32.960Z ERROR MLServer Adapter.MLServer Adapter Server.Load Model Failed to pull model from storage {"model_id": "mnist__isvc-04bd33724e", "error": "rpc error: code = Unknown desc = Failed to pull model from storage due to error: no objects found for path '/sklearn/mnist-svm.joblib'"}
github.com/kserve/modelmesh-runtime-adapter/internal/proto/mmesh._ModelRuntime_LoadModel_Handler
/opt/app/internal/proto/mmesh/model-runtime_grpc.pb.go:175
google.golang.org/grpc.(*Server).processUnaryRPC
/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1217
google.golang.org/grpc.(*Server).handleStream
/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1540
google.golang.org/grpc.(*Server).serveStreams.func1.2
/root/go/pkg/mod/google.golang.org/[email protected]/server.go:878
Expected behavior
The model exists on MinIO and should load successfully. The issue seems to be the modelPath, which was set to sklearn/mnist-svm.joblib in v0.7.0, while on the master branch it is set to /sklearn/mnist-svm.joblib, which causes the above error.
Screenshots
Environment (please complete the following information):
Additional context
Users may want to find the input tensor name and/or shape of a particular model. The v2 predict protocol outlines a ModelMetadata API. We should have ModelMesh support this as I believe Triton and MLServer already expose this endpoint.
Related issue and discussion: #104
I was trying to run the example-tensorflow-mnist predictor on GPU.
To achieve this, I edited the ServingRuntime object, adding nvidia.com/gpu: 1 to the spec.containers.resources.limits and spec.containers.resources.requests sections of the tritonserver container, along with the needed tolerations.
Once that was done, I noticed that the example-tensorflow-mnist predictor always failed to load. This is the error I'm getting from the Predictor:
UNAVAILABLE: Failed to load Model due to Triton runtime error: rpc error: code = Unavailable desc = error reading from server: EOF
Before setting the nvidia.com/gpu: 1, I was able to run an inference on example-tensorflow-mnist.
My goal is to load with success the model and then run an inference using the GPU.
Thanks in advance.
We are currently using MLServer 0.3.2; however, the latest is 0.5.2. This should be updated.
With KServe now using ServingRuntimes, we need to make sure that the ModelMesh-Serving runtime selection logic only selects runtimes that are ModelMesh compatible. While KServe currently leverages the existence of the GrpcMultiModelManagementEndpoint field to determine this, let's try to make this more explicit with a dedicated boolean field in the SR spec.
Something like modelMeshCompatible, isMMS, or supportsModelMesh? Open to suggestions.
We will also have to make sure the SR controller only creates deployments for accessible runtimes with this field set to true.
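Purely for illustration, using one of the proposed (not yet decided) field names:
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-runtime
spec:
  modelMeshCompatible: true   # hypothetical field name; could equally be isMMS or supportsModelMesh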
Currently, the way a user defines the storage location of a model varies between the MM Predictor and the KFS InferenceService.
With the InferenceService, a user specifies a storageUri (ex: s3://kfserving-examples/tf-models/mnist).
With the Predictor, a user can specify a storage object, but the path would be its own key/value. Example:
path: tf-models/mnist
storage:
  s3:
    secretKey: my_storage
    bucket: kfserving-examples
In some cases, the storage object can be omitted entirely, e.g. for custom runtimes managing their own storage.
Would be good to examine the advantages and disadvantages of these approaches and see how we may consolidate.
KServe supports cluster-scoped ServingRuntimes called ClusterServingRuntimes. These act as the built-in or default serving runtimes accessible to any user/namespace in the cluster. Currently ModelMesh-Serving only considers the namespace-scoped ServingRuntimes. Let's think about how ModelMesh-Serving can handle these cluster-level resources.
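For reference, a ClusterServingRuntime looks like a ServingRuntime but is cluster scoped, so it has no namespace in its metadata; a minimal sketch with most details elided:
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: example-cluster-runtime   # cluster scoped: no namespace
spec:
  supportedModelFormats:
    - name: sklearn
  containers:
    - name: mlserver
      image: seldonio/mlserver:0.5.2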
The plan is to cut the KServe 0.7 release mid next week. For this release, ModelMesh will be loosely integrated with KServe.
Action Items:
- Update hack/quick_install.sh to include ModelMesh-Serving as part of the installation.
- Tag a v0.7.0 release to follow suit with KServe.