kubernetes-sigs / kube-scheduler-simulator
The simulator for the Kubernetes scheduler
License: Apache License 2.0
Proof of Concept
https://github.com/sanposhiho/kube-scheduler-simulator-cli
If we build the scheduler from a submodule, we can change the version of the scheduler, or even change its implementation by editing the code in the submodule.
This would be very helpful: currently users cannot change the version of the scheduler used by the simulator, and with this change we could debug the scheduler by modifying its implementation. (I personally use the above repo to debug the scheduler.)
related: #8
/kind feature
/assign
There are 2 problems with the import function of ExportService.
PriorityClass conflicts
When the scheduler is restarted, 2 PriorityClasses, system-cluster-critical and system-node-critical, are created automatically.
The import function of ExportService calls that restart logic first. This means that the 2 PriorityClasses are recreated before the import logic runs.
The resources file (export.yml) produced by the export function also includes these 2 PriorityClasses.
Therefore, the names of these PriorityClasses conflict when the resources file (export.yml) is imported.
E0209 22:41:50.712897 74340 export.go:57] failed to import all resources: import resources all:
github.com/kubernetes-sigs/kube-scheduler-simulator/export.(*Service).Import
/Users/username/kube-scheduler-simulator/export/export.go:225
- apply resources:
github.com/kubernetes-sigs/kube-scheduler-simulator/export.(*Service).apply
/Users/username/kube-scheduler-simulator/export/export.go:196
- apply PriorityClass:
github.com/kubernetes-sigs/kube-scheduler-simulator/export.(*Service).applyPcs.func1
/Users/username/kube-scheduler-simulator/export/export.go:369
- apply priorityClass:
github.com/kubernetes-sigs/kube-scheduler-simulator/priorityclass.(*Service).Apply
/Users/username/kube-scheduler-simulator/priorityclass/priorityclass.go:47
- Operation cannot be fulfilled on priorityclasses.scheduling.k8s.io "system-cluster-critical": the object has been modified; please apply your changes to the latest version and try again
{"time":"2022-02-09T22:41:50.713011+09:00","id":"","remote_ip":"127.0.0.1","host":"localhost:1212","method":"POST","uri":"/api/v1/import","user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:96.0) Gecko/20100101 Firefox/96.0","status":500,"error":"code=500, message=Internal Server Error","latency":147452615,"latency_human":"147.452615ms","bytes_in":6097,"bytes_out":36}
priorityclass
The import function tries to create a PriorityClass whose name has the 'system-' prefix when the imported resources file includes a PriorityClass such as system-cluster-critical or system-node-critical.
But that fails with the following permission error.
E0209 22:58:57.957848 74651 priorityclass.go:36] failed to apply priorityClass: apply priorityClass:
github.com/kubernetes-sigs/kube-scheduler-simulator/priorityclass.(*Service).Apply
/Users/username/kube-scheduler-simulator/priorityclass/priorityclass.go:47
- PriorityClass.scheduling.k8s.io "system-priority-class1" is invalid: metadata.name: Forbidden: priority class names with 'system-' prefix are reserved for system use only. error: system-priority-class1 is not a known system priority class
Would you give me some good ideas, please?
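One possible approach, sketched here only as an idea: skip every PriorityClass whose name starts with system- before applying the imported resources. The two built-in classes are recreated on restart anyway, and other system- names are rejected by the API server. The priorityClass type and the filterImportable helper below are hypothetical stand-ins, not the simulator's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// priorityClass is a simplified, hypothetical stand-in for the exported
// PriorityClass objects (the real type lives in k8s.io/api/scheduling/v1).
type priorityClass struct {
	Name  string
	Value int32
}

// systemPrefix is the name prefix reserved for system use.
const systemPrefix = "system-"

// filterImportable (hypothetical helper) splits the exported PriorityClasses
// into those that can be applied on import and those that should be skipped
// because their names carry the reserved prefix.
func filterImportable(pcs []priorityClass) (importable, skipped []priorityClass) {
	for _, pc := range pcs {
		if strings.HasPrefix(pc.Name, systemPrefix) {
			skipped = append(skipped, pc)
			continue
		}
		importable = append(importable, pc)
	}
	return importable, skipped
}

func main() {
	exported := []priorityClass{
		{Name: "system-cluster-critical", Value: 2000000000},
		{Name: "system-node-critical", Value: 2000001000},
		{Name: "high-priority", Value: 1000},
	}
	importable, skipped := filterImportable(exported)
	fmt.Println("import:", importable)
	fmt.Println("skip:", skipped)
}
```

The same filter could arguably be applied on the export side instead, so that export.yml never contains the reserved classes in the first place.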
/kind bug
/assign
Hi, I am new to this project (from scheduling-plugin sig). I learned about the kube-scheduler-simulator. In my opinion, this simulator only simulates the pod scheduling process in a new simulation cluster, which means it can't simulate a real cluster (e.g. a production environment) for now. Is it possible to simulate pods in an already existing environment?
I am interested in contributing to this project; maybe I need to learn more details about it.
I tried running make docker_build_and_up but it's not running for me, can anyone help? (Freshly installed docker and docker-compose on Fedora; docker-compose is actually docker compose version 2, using an alias to avoid writing the space.)
[khalilswdp@fedora kube-scheduler-simulator]$ make docker_build_and_up
docker build -t simulator-server .
Sending build context to Docker daemon 1.142GB
Step 1/15 : FROM golang:1.16.6 AS build-env
---> 028d102f774a
Step 2/15 : ENV GOOS=linux
---> Using cache
---> f2764c87cb50
Step 3/15 : ENV GOARCH=amd64
---> Using cache
---> 50f9fe591483
Step 4/15 : ENV CGO_ENABLED=0
---> Using cache
---> ced709072cc7
Step 5/15 : ENV GO111MODULE=on
---> Using cache
---> 112b70d3df9b
Step 6/15 : WORKDIR /go/src/simulator-server
---> Using cache
---> 350704a196c6
Step 7/15 : COPY go.mod go.sum ./
---> Using cache
---> 5ec45d29fc4d
Step 8/15 : RUN go mod download
---> Using cache
---> 65f93d650a85
Step 9/15 : COPY . .
---> Using cache
---> 23a247e03954
Step 10/15 : RUN go build -v -o ./bin/simulator simulator.go
---> Using cache
---> cd9ff6f85720
Step 11/15 : FROM alpine:3.14.0
---> d4ff818577bc
Step 12/15 : COPY --from=build-env /go/src/simulator-server/bin/simulator /simulator
---> Using cache
---> 5fbd42920407
Step 13/15 : RUN chmod a+x /simulator
---> Using cache
---> a893ed73f01d
Step 14/15 : EXPOSE 1212
---> Using cache
---> e6592d8d48f8
Step 15/15 : CMD ["/simulator"]
---> Using cache
---> 2f604362e856
Successfully built 2f604362e856
Successfully tagged simulator-server:latest
docker build -t simulator-frontend ./web/
Sending build context to Docker daemon 1.055MB
Step 1/22 : FROM node:16-alpine AS deps
---> 710c8aa630d5
Step 2/22 : RUN apk update && apk upgrade && apk add --no-cache make gcc g++ py-pip
---> Using cache
---> 8e241d3a401e
Step 3/22 : WORKDIR /app
---> Using cache
---> 315dcffcc578
Step 4/22 : COPY package.json yarn.lock ./
---> Using cache
---> 02eace2453ad
Step 5/22 : RUN yarn install --frozen-lockfile
---> Using cache
---> 1d0637341af6
Step 6/22 : FROM node:16-alpine AS builder
---> 710c8aa630d5
Step 7/22 : WORKDIR /app
---> Using cache
---> 4995f74aa96d
Step 8/22 : COPY . .
---> Using cache
---> 3bcc20c5a484
Step 9/22 : COPY --from=deps /app/node_modules ./node_modules
---> Using cache
---> 4a06368e526d
Step 10/22 : RUN yarn build && yarn install --production --ignore-scripts --prefer-offline
---> Using cache
---> c3f6e5fa57f0
Step 11/22 : FROM node:16-alpine AS runner
---> 710c8aa630d5
Step 12/22 : WORKDIR /app
---> Using cache
---> 4995f74aa96d
Step 13/22 : ENV NODE_ENV production
---> Using cache
---> 6dc1430cde13
Step 14/22 : RUN addgroup -g 1001 -S nodejs
---> Using cache
---> e2ad6db4eb4a
Step 15/22 : RUN adduser -S nuxtjs -u 1001
---> Using cache
---> 27d05d75ccd5
Step 16/22 : COPY --from=builder ./app/package.json ./
---> Using cache
---> 12cf0a92a526
Step 17/22 : COPY --from=builder ./app/node_modules ./node_modules/
---> Using cache
---> e94bc71438c5
Step 18/22 : COPY --from=builder ./app/.nuxt ./.nuxt/
---> Using cache
---> 130d1f166ecd
Step 19/22 : COPY --from=builder ./app/static ./static/
---> Using cache
---> 0e8cc00e98c8
Step 20/22 : USER nuxtjs
---> Using cache
---> 9c8abd259fbe
Step 21/22 : EXPOSE 3000
---> Using cache
---> fe42c70a027b
Step 22/22 : CMD ["yarn", "start"]
---> Using cache
---> 2d93946f59f3
Successfully built 2d93946f59f3
Successfully tagged simulator-frontend:latest
docker-compose up -d
make: docker-compose: No such file or directory
make: *** [Makefile:48: docker_up] Error 127
but running docker-compose up -d directly works just fine!
/kind documentation
/assign
We want to create a guide on how to release the scheduler simulator.
For reference, the release guide of scheduler-plugins: https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/doc/release-guide.md
The current Web UI was mostly created by me and I don't have any knowledge about accessibility. 😓
So its accessibility is poor and should be improved.
This is a very vague issue. If you have any ideas for improvement, please comment.
/kind feature
An error occurred when starting up the Kubernetes scheduler simulator using Docker.
make docker_build_and_up
executor failed running [/bin/sh -c yarn build && yarn install --production --ignore-scripts --prefer-offline]: exit code: 1
make: *** [docker_build_front] Error 1
/kind bug
/cc @adtac
/cc @Huang-Wei
/cc @alculquicondor
/assign
Hello team.
I'd like to release v0.1.0 after all bug-fix PRs are merged. So, PTAL when you are free. 🙇‍♂️
Also, I want to check whether the image built by cloudbuild works.
So, we also have to wait for these PRs.
I saw the release guide for scheduler-plugins, and will follow the guide to release.
https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/doc/release-guide.md
And then, I'll create a release guide for the simulator. #19
We cannot run GitHub Actions on PRs from first-time contributors.
With Prow jobs, it seems that we can trust a PR and run them with /ok-to-test.
https://github.com/kubernetes/test-infra/blob/master/prow/jobs.md
/kind feature
Users can change the scheduler configuration through the Web UI.
But it would be nice if we could also pass an initial scheduler configuration when starting the simulator.
/kind feature
The scheduler fails to restart if the config specifies .profiles.pluginConfig.
This is a critical bug because the scheduler simulator initially shows default values that include it, so users have to delete .profiles.pluginConfig from the Web UI manually before the scheduler can restart successfully.
Restarting the scheduler with .profiles.pluginConfig specified produces:
E0830 22:49:48.480481 32087 schedulerconfig.go:36] failed to restart scheduler: start scheduler:
github.com/kubernetes-sigs/kube-scheduler-simulator/scheduler.(*Service).RestartScheduler
/Users/kenseinakada/workspace/kube-scheduler-simulator/scheduler/scheduler.go:42
- create scheduler:
github.com/kubernetes-sigs/kube-scheduler-simulator/scheduler.(*Service).StartScheduler
/Users/kenseinakada/workspace/kube-scheduler-simulator/scheduler/scheduler.go:90
- couldn't create scheduler: initializing profiles: creating profile for scheduler name default-scheduler: initializing plugin "NodeAffinityForSimulator": create original plugin: args are not of type NodeAffinityArgs, got *runtime.Unknown
/kind bug
/assign
It would be nice if we could export the resources at a given point in time, and later import them to restore those resources.
Currently, it is a little hard for users to import this simulator as a library and use some of its features.
This is a problem of both the simulator's internal structure and a lack of documentation.
/kind feature
/assign
Add reset button
need to change both frontend and backend.
/kind feature
If we can't delete these system PriorityClasses, should we disable the red delete button?
(If not, please close this.)
E0209 23:40:17.582829 74651 priorityclass.go:81] failed to delete priorityClass: delete priorityClass:
github.com/kubernetes-sigs/kube-scheduler-simulator/priorityclass.(*Service).Delete
/Users/username/kube-scheduler-simulator/priorityclass/priorityclass.go:56
- priorityclasses.scheduling.k8s.io "system-node-critical" is forbidden: this is a system priority class and cannot be deleted
{"time":"2022-02-09T23:40:17.582926+09:00","id":"","remote_ip":"127.0.0.1","host":"localhost:1212","method":"DELETE","uri":"/api/v1/priorityclasses/system-node-critical","user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:96.0) Gecko/20100101 Firefox/96.0","status":500,"error":"code=500, message=Internal Server Error","latency":5640497,"latency_human":"5.640497ms","bytes_in":0,"bytes_out":36}
/kind feature
Let's support Namespace.
I think it requires a big Web UI design change, because all resources are now created on the same namespace. We need to change Web UI to make it easier to understand which namespace each resource resides in.
Let's run lint checks (for frontend/backend) on GitHub Actions.
/assign
Hello team.
We, kube-scheduler-simulator team, are now facing a problem of shortage of contributors. And we want more people to join our development.
If you have any questions about participating in development, please post them here.
Also, please let us know if there is any documentation that lacks information so that we can improve it.
The following information is for your participation in the development.
For backend API
For web frontend
This front-end knowledge is optional because the main logic is on the backend.
We have a brief doc to explain how this simulator works.
And a small contribution guide.
open pod's detail → switch to edit page → switch to non-edit page → scheduling result tables disappear from the page.
/kind bug
On #64 , I made/found some small mistakes.
This link is dead on the markdown view of GitHub.
And the class name was also changed to ResourcesForImport
kube-scheduler-simulator/docs/api.md
Line 505 in 86527a2
kube-scheduler-simulator/docs/api.md
Line 524 in 86527a2
This doc was reviewed and I should have rewritten it like this.
These docs are examples of requests and responses of APIs.
This issue is caused by Nuxt's default host configuration (default: localhost).
We have to set the host in nuxt.config.ts or via environment variables (change it to 0.0.0.0).
https://nuxtjs.org/docs/configuration-glossary/configuration-server
/kind bug
/assign
Update all resources once every 5 seconds.
Users will be able to see all resources without reloading the page even if resources are asynchronously updated.
/kind feature
It would be nice if we could see the time used in scheduling for each plugin.
simulatorPlugin and send used time to resultStore.
plugin_execution_duration_seconds to see each plugin's time.
/assign
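As a rough sketch of the idea of measuring the time around each plugin call and sending it to a store: durationStore and timed below are hypothetical names that only stand in for resultStore and the wrapping done by the simulator plugin. The real types are not shown in this issue.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// durationStore is a hypothetical, simplified stand-in for resultStore.
// It accumulates the time spent per plugin name.
type durationStore struct {
	mu        sync.Mutex
	durations map[string]time.Duration
}

func newDurationStore() *durationStore {
	return &durationStore{durations: map[string]time.Duration{}}
}

// add accumulates the elapsed time for the given plugin.
func (s *durationStore) add(plugin string, d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.durations[plugin] += d
}

// get returns the total time recorded for the given plugin.
func (s *durationStore) get(plugin string) time.Duration {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.durations[plugin]
}

// timed runs fn, which stands in for a wrapped plugin call such as Filter
// or Score, and records how long it took.
func timed(store *durationStore, plugin string, fn func()) {
	start := time.Now()
	defer func() { store.add(plugin, time.Since(start)) }()
	fn()
}

func main() {
	store := newDurationStore()
	timed(store, "NodeAffinity", func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println("NodeAffinity took", store.get("NodeAffinity"))
}
```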
/kind feature
This simulator restarts the scheduler when a user requests a change to the scheduler configuration.
Currently, when the restart fails with custom settings, the scheduler stays dead until the user applies correct settings.
Proposal: restore the original settings and reboot the scheduler when a restart with user-custom settings fails.
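A minimal sketch of that fallback, assuming the service remembers the last configuration that started successfully. The config, startFunc, and service types are hypothetical simplifications, not the simulator's real scheduler service.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a simplified stand-in for KubeSchedulerConfiguration.
type config struct{ Name string }

// startFunc is a hypothetical hook that (re)starts the scheduler with cfg.
type startFunc func(cfg config) error

// service remembers the last configuration that started successfully.
type service struct {
	start   startFunc
	current config
}

// restartWithFallback tries to start the scheduler with newCfg. If that
// fails, it starts the scheduler again with the previous configuration and
// still reports the original error to the caller.
func (s *service) restartWithFallback(newCfg config) error {
	if err := s.start(newCfg); err != nil {
		if rbErr := s.start(s.current); rbErr != nil {
			return fmt.Errorf("restart failed: %w; rollback also failed: %v", err, rbErr)
		}
		return fmt.Errorf("restart failed, restored previous configuration: %w", err)
	}
	s.current = newCfg
	return nil
}

// errBadConfig simulates a configuration the scheduler cannot start with.
var errBadConfig = errors.New("invalid plugin args")

func main() {
	s := &service{
		start: func(cfg config) error {
			if cfg.Name == "broken" {
				return errBadConfig
			}
			return nil
		},
		current: config{Name: "default"},
	}
	fmt.Println(s.restartWithFallback(config{Name: "broken"}))
	fmt.Println("running with:", s.current.Name)
}
```

Returning the original error even after a successful rollback lets the API handler tell the user that their settings were rejected.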
While trying to run this application via docker-compose I noticed that the frontend is not available via http://localhost:3000.
I did some investigation and noticed that nuxt telemetry requires user input when starting the frontend application.
> podman exec -it simulator-frontend
/app $ netstat -an -t
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
/app $ yarn start
yarn run v1.22.15
$ nuxt start
ℹ NuxtJS collects completely anonymous data about usage.
This will help us improve Nuxt developer experience over time.
Read more on https://git.io/nuxt-telemetry
? Are you interested in participating? (Y/n)
After some digging I found the following input on that:
https://github.com/nuxt/telemetry#opting-out
After adding the described env variable and rebuilding the frontend image, the application starts without problems and is directly available:
> podman exec -it simulator-frontend
/app $ netstat -an -t
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN
I will create a PR for that change
We configure the simulator through environment variables.
But, we don't have any documentation for that.
/kind documentation
Delete in the node service deletes the Pods on that Node.
Currently for this:
podService.List
https://github.com/kubernetes-sigs/kube-scheduler-simulator/blob/master/node/node.go#L71
But we can delete all Pods on one Node by calling the DeleteCollection method once in the node service, with a FieldSelector in metav1.ListOptions.
https://kubernetes.io/docs/concepts/overview/working-with-objects/field-selectors/
like: spec.nodeName={deletingnodename}
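To illustrate the shape of that change, here is a sketch with a hypothetical podCollectionDeleter interface and an in-memory fake. It is not the real pod service: the actual client-go DeleteCollection also takes a context, metav1.DeleteOptions, and metav1.ListOptions, which are left out here to keep the example self-contained.

```go
package main

import "fmt"

// pod is a simplified stand-in for a Pod with its assigned node.
type pod struct{ Name, NodeName string }

// podCollectionDeleter is a hypothetical abstraction over the pod service.
// In client-go, the field selector would be passed as
// metav1.ListOptions{FieldSelector: ...}.
type podCollectionDeleter interface {
	DeleteCollection(fieldSelector string) error
}

// nodeNameSelector builds the field selector that matches Pods bound to
// the given node.
func nodeNameSelector(nodeName string) string {
	return "spec.nodeName=" + nodeName
}

// deletePodsOnNode removes every Pod on the node with a single call
// instead of listing the Pods and deleting them one by one.
func deletePodsOnNode(pods podCollectionDeleter, nodeName string) error {
	if err := pods.DeleteCollection(nodeNameSelector(nodeName)); err != nil {
		return fmt.Errorf("delete pods on node %s: %w", nodeName, err)
	}
	return nil
}

// fakePods is an in-memory implementation used only for this demonstration.
type fakePods struct{ pods []pod }

func (f *fakePods) DeleteCollection(fieldSelector string) error {
	kept := f.pods[:0]
	for _, p := range f.pods {
		if nodeNameSelector(p.NodeName) != fieldSelector {
			kept = append(kept, p)
		}
	}
	f.pods = kept
	return nil
}

func main() {
	store := &fakePods{pods: []pod{
		{Name: "a", NodeName: "node-1"},
		{Name: "b", NodeName: "node-2"},
		{Name: "c", NodeName: "node-1"},
	}}
	if err := deletePodsOnNode(store, "node-1"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("remaining pods:", store.pods)
}
```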
/good-first-issue
/kind cleanup
/kind feature
/cc @adtac
/cc @Huang-Wei
/cc @alculquicondor
/assign
Hi team.
Add images for backend/frontend to k8s.gcr.io.
Currently, when users want to use the simulator, they have to clone this repo, build it, and run it (on Docker or locally).
This is a little inconvenient because building the frontend/backend takes a lot of time (maybe because of some huge dependencies).
So it would be great if we could provide images for the backend/frontend so that users only have to run these images. wdyt?
I cannot find anything about these in documentations.
Let's run unit tests (for backend) on GitHub Actions.
/assign
Currently, unit tests for API service layer and transport layer (handler) are not enough. This is because most packages of these layers are just a wrapper for client-go for now, with less complex logic.
It is useful to have an e2e test to make sure that simulator API is working as expected. The goal of this issue is to be able to run e2e tests on github actions prow job.
/kind feature
Related #26
I implemented receiver methods to delete all resources of each object.
But I'm not sure where these should be called.
I don't think the scheduler is responsible for deleting all resources of each object.
So, I suggest 2 approaches.
Do you have a good solution?
We will extend the simulator plugin so that it can be used not only for the simulator but also for other purposes. So, let's rename it to something more appropriate.
Hello, I am very interested in this simulator.
However, when I run make docker_build_and_up, both the server and front containers work well, but I am unable to connect to the front (localhost:3000).
When I check the log, nothing shows up.
So temporarily, I build the front manually and run it through the yarn dev command.
I've tested it on both Ubuntu 18.04 and a MacBook, and both show the same symptoms.
thanks.
SSIA.
We have to enable the Priority admission plugin to handle PriorityClass properly.
https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#priority
/assign
In the near future, we will (maybe) merge PR #111.
With that PR, we will be able to import a variety of resources from external sources.
Therefore, we should change the logic so that we can import from different namespaces.
kube-scheduler-simulator/pod/pod.go
Line 27 in ba06325
We can check it after we merged #111. (Maybe)
# pod_namespace_test.yml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: test
  labels:
    name: mypod
spec:
  containers:
  - name: mypod
    image: nginx
Make a pod.
kubectl create namespace test
vim pod_namespace_test.yml
kubectl apply -f pod_namespace_test.yml
kubectl get pods --namespace=test
Start simulator
export EXTERNAL_IMPORT_ENABLED=1
make start
when I start server: make start
output is all like this:
W1206 15:06:41.585828 44196 warnings.go:70] flowcontrol.apiserver.k8s.io/v1beta1 FlowSchema is deprecated in v1.23+, unavailable in v1.26+
W1206 15:06:41.587208 44196 warnings.go:70] flowcontrol.apiserver.k8s.io/v1beta1 PriorityLevelConfiguration is deprecated in v1.23+, unavailable in v1.26+
Then I create a node through the web client, but my node is not displayed.
How can I solve this problem?
ssia... 🤦‍♂️
/kind bug
/assign
Now, we have two modes on the resources detail page: the edit page and the non-edit page.
But these two pages show almost the same thing. I'd like to merge them into one page.
/kind cleanup
It is very difficult to recreate a production cluster using the Web UI.
To make it easier, I want to use the kubectl apply command.
Proof of Concept
https://github.com/sanposhiho/kube-scheduler-simulator-cli
With this, we can also simulate the scheduler without web UI.
see what I'd like to add on README
How and where to have the scenario written needs to be considered. In kube-scheduler-simulator-cli, we can write the scenario in sched.go, but I don't think it is the best place. need to be discussed.
/kind feature
/assign
To use this simulator as a library, we need to fetch scheduling results from pod annotations.
In the annotations, the results are stored as JSON, so the functions should unmarshal the JSON in the pod annotation and return the result as a useful type.
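A small sketch of what such a function could look like. The annotation key and the node-to-plugin-to-message layout used below are assumptions made for illustration, so check them against what the simulator actually writes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filterResultKey is an assumed annotation key, used here for illustration.
const filterResultKey = "scheduler-simulator/filter-result"

// filterResult maps node name -> plugin name -> result message.
// This layout is an assumption about how the JSON is structured.
type filterResult map[string]map[string]string

// filterResultFromAnnotations (hypothetical helper) reads the JSON stored
// in the pod annotations and returns it as a typed value.
func filterResultFromAnnotations(annotations map[string]string) (filterResult, error) {
	raw, ok := annotations[filterResultKey]
	if !ok {
		return nil, fmt.Errorf("annotation %q not found", filterResultKey)
	}
	var res filterResult
	if err := json.Unmarshal([]byte(raw), &res); err != nil {
		return nil, fmt.Errorf("unmarshal %q: %w", filterResultKey, err)
	}
	return res, nil
}

func main() {
	annotations := map[string]string{
		filterResultKey: `{"node-1":{"NodeUnschedulable":"passed"}}`,
	}
	res, err := filterResultFromAnnotations(annotations)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res["node-1"]["NodeUnschedulable"])
}
```

A caller using client-go would pass pod.Annotations (a map[string]string) straight into this helper.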
/kind feature
https://github.com/kubernetes-sigs/scheduler-plugins
This idea should be discussed if it is really needed or not.
If we want to support it, we also need to support CRDs (like podgroup). Supporting each CRD can be a big task. Therefore, we need to think of a better way to do that, like handling CRDs together, etc. (I have no good vision for it now 😓)
But #35 will make it easy to use a custom scheduler. So, even if they are not supported here, users will be able to use those plugins with the #35 feature. In that case, since CRDs are not supported in the Web UI, users need to create CRDs from another client. #55 will enable other clients (like kubectl, client-go, etc.) to communicate with the api-server.
To sum up my opinion: if there are many users who want to use the plugins in sigs/scheduler-plugins, it is better to support them and support CRDs in the Web UI as well. But if not, we don't need to support them.
There is also a possibility that scheduler-plugins will not be supported but a feature for handling CRDs will be.
Currently, when the simulator API is requested to create a resource, it requests the change to kube-apiserver via client-go.
So, our backend API can be considered as a kind of BFF (backend for frontend), acting as an intermediary between kube-apiserver and the frontend.
However, for the API to create/read/update/delete resources, the simulator API is just passing the request to the kube-apiserver without doing any additional processing. In other words, simple requests such as creating, deleting, and editing resources can be thrown directly to kube-apiserver by the frontend without going through the backend.
This change will make it easier to extend in the future. (like supporting new resource type)
/kind feature
Now we don't support PriorityClass.
If you don't know about PriorityClass and preemption, see this doc: https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/
We cannot simulate preemption without PriorityClass. So I'd like to add the feature to create/edit/delete PriorityClass.
The implementation may take some time because this feature requires changes to both the frontend and the backend. Looking at the implementation of other resources will help.
Please let me know if you have any questions.
/kind feature
This issue is due to webpack/webpack#14532
#15 5.142 Error: error:0308010C:digital envelope routines::unsupported
#15 5.142 at new Hash (node:internal/crypto/hash:67:19)
#15 5.142 at Object.createHash (node:crypto:130:10)
#15 5.142 at module.exports (/app/node_modules/webpack/lib/util/createHash.js:135:53)
#15 5.142 at NormalModule._initBuildHash (/app/node_modules/webpack/lib/NormalModule.js:417:16)
#15 5.142 at handleParseError (/app/node_modules/webpack/lib/NormalModule.js:471:10)
#15 5.142 at /app/node_modules/webpack/lib/NormalModule.js:503:5
#15 5.142 at /app/node_modules/webpack/lib/NormalModule.js:358:12
#15 5.142 at /app/node_modules/loader-runner/lib/LoaderRunner.js:373:3
#15 5.142 at iterateNormalLoaders (/app/node_modules/loader-runner/lib/LoaderRunner.js:214:10)
#15 5.142 at Array.<anonymous> (/app/node_modules/loader-runner/lib/LoaderRunner.js:205:4)
#15 5.142 at Storage.finished (/app/node_modules/enhanced-resolve/lib/CachedInputFileSystem.js:55:16)
#15 5.142 at /app/node_modules/enhanced-resolve/lib/CachedInputFileSystem.js:91:9
#15 5.142 at /app/node_modules/graceful-fs/graceful-fs.js:123:16
#15 5.142 at FSReqCallback.readFileAfterClose [as oncomplete] (node:internal/fs/read_file_context:68:3) {
#15 5.142 opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
#15 5.142 library: 'digital envelope routines',
#15 5.142 reason: 'unsupported',
#15 5.142 code: 'ERR_OSSL_EVP_UNSUPPORTED'
#15 5.142 }
I have confirmed that the solution described in the issue will solve the problem.
Currently, the simulator only supports default plugins.
But some users may want to try their custom plugins on the simulator.
/assign
We only support v1beta2 now. The latest version is v1beta3, so users may be confused.
So, let's add the version of KubeSchedulerConfiguration in the WebUI.
We plan to support v1beta3 in this issue.
#46
/kind feature
/good-first-issue
In #41, we will import resources from an existing cluster.
The current Web UI makes it hard to view many resources and should be improved.
It would be nice if there was a different mode that would make it easier to see as many resources as possible at once.
It will probably be simpler than the current UI and will be more like lists of resources. It would also be useful to be able to do things like search for specific resources in the list.
/kind feature