dgkanatsios / azuregameserversscalingkubernetes
Scaling Dedicated Game Servers on Azure Kubernetes Service
License: MIT License
Make sure you open/forward ports for all relevant containers on the pod
Currently, when we scale down a collection we randomly choose the DGSs that will become MarkedForDeletion and be removed from the collection. We could improve this algorithm by prioritising DGSs that
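One possible prioritisation criterion (an assumption on our part, since the issue text is cut off) is the number of active players: prefer deleting servers that would disrupt the fewest sessions. A minimal Go sketch, using a simplified stand-in for the DedicatedGameServer type:

```go
package main

import (
	"fmt"
	"sort"
)

// DGS is a simplified stand-in for the DedicatedGameServer resource;
// the field names are hypothetical, not the project's actual types.
type DGS struct {
	Name          string
	ActivePlayers int
}

// pickForDeletion returns the n servers that should become
// MarkedForDeletion, preferring those with the fewest active players
// instead of choosing randomly.
func pickForDeletion(servers []DGS, n int) []DGS {
	sorted := append([]DGS(nil), servers...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].ActivePlayers < sorted[j].ActivePlayers
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	servers := []DGS{{"dgs-a", 7}, {"dgs-b", 0}, {"dgs-c", 3}}
	for _, s := range pickForDeletion(servers, 2) {
		fmt.Println(s.Name) // dgs-b, then dgs-c
	}
}
```

Other criteria (server age, node packing) could be combined into the same sort comparator.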
When a new DedicatedGameServerCollection is created, its state is reported as Healthy/Running. This should not be the case.
Add related documentation for
Sometimes there is a race condition in the DGSCol controller.
Repro:
You will see 3 Running DGSs and 3 MarkedForDeletion ones.
The problem lies in the fact that non-parented DGSs (the MarkedForDeletion ones) trigger DGSCol controller updates (because of #31). During one of these updates, there is a chance that another scale-down operation will occur while the original one is still in progress.
We should write a simple bash script that updates the APIServer and Controller image references in the YAML deployment files.
Sometimes the DGS status is not updated, when we delete Pods/DGS/etc.
Right now a port is picked at random and its existence is checked against Azure Table Storage. It would be faster and more efficient if the entire Port table contents were loaded into memory first.
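A minimal in-memory sketch of that idea (type and method names are our own, not the project's PortRegistry implementation): hydrate the table once at startup, then allocation becomes a map lookup with no storage round-trip per candidate port.

```go
package main

import (
	"errors"
	"fmt"
)

// PortRegistry keeps the whole port table in memory so that finding a
// free port never requires a query to Azure Table Storage.
type PortRegistry struct {
	min, max int32
	used     map[int32]bool // would be hydrated from storage at startup
}

func NewPortRegistry(min, max int32) *PortRegistry {
	return &PortRegistry{min: min, max: max, used: make(map[int32]bool)}
}

// Acquire returns the first free port in the range.
func (r *PortRegistry) Acquire() (int32, error) {
	for p := r.min; p <= r.max; p++ {
		if !r.used[p] {
			r.used[p] = true
			return p, nil
		}
	}
	return 0, errors.New("port range exhausted")
}

// Release frees a previously acquired port.
func (r *PortRegistry) Release(p int32) { delete(r.used, p) }

func main() {
	reg := NewPortRegistry(20000, 20002)
	a, _ := reg.Acquire()
	b, _ := reg.Acquire()
	fmt.Println(a, b) // 20000 20001
}
```

A real version would also need a mutex (controllers are concurrent) and periodic write-back to storage for durability.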
It would be interesting to investigate the creation of a custom Pod scheduler. This scheduler would distribute the Pods/DedicatedGameServers to the Nodes by applying the rule 'is Node A full of running Pods? If not, keep scheduling there; if yes, schedule on Node B', and so on.
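The core of that rule is a first-fit bin-packing pass over the node list. A sketch, with hypothetical Node fields standing in for what a real scheduler would read from the Kubernetes API:

```go
package main

import "fmt"

// Node is a simplified view of a cluster node: how many game server
// pods it currently runs and how many it can hold.
type Node struct {
	Name     string
	Running  int
	Capacity int
}

// pickNode implements the first-fit rule: fill Node A completely
// before moving on to Node B. Returns false if every node is full.
func pickNode(nodes []Node) (string, bool) {
	for _, n := range nodes {
		if n.Running < n.Capacity {
			return n.Name, true
		}
	}
	return "", false
}

func main() {
	nodes := []Node{{"node-a", 4, 4}, {"node-b", 1, 4}}
	name, ok := pickNode(nodes)
	fmt.Println(name, ok) // node-b true
}
```

Packing pods tightly like this (instead of the default spreading behavior) would make it easier to drain and scale down whole nodes.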
Describe how our PortRegistry works
Right now the default namespace is used both for admin components (operator/apiserver) and for game server pods. We should create and manage two namespaces: one for the actual game server pods and one for the admin components.
I checked the Azure File share and could not find the '/data/oa_ded.x86_64' file. However, I do see a bunch of .cfg files. (see attached)
We noticed a panic when 5 DGSs were loaded with no exported ports. We should write a test for it.
Provide an HTML UI so the user can see the running Dedicated Game Servers as well as their IPs/Ports
Probably introduced by one of the latest commits
We should introduce Admission Webhooks on the project. At first, they should do two tasks
i) validate that incoming DedicatedGameServerCollection and DedicatedGameServer have request limits at their Pod template
ii) introduce pod-affinity for the Pods so they are better grouped together
Provide a sample application to be used for tests
Add client-go leaderelection for controllers
In one of our tests, we noticed that the availableReplicas number was wrong.
Repro:
The number shown was 6; it should have been 3.
How? What? When?
Currently, the env variables are created hardcoded. While this works for OpenArena, we need to make them parameterized.
We should implement a mechanism to update container images. The first step would be to detect the change in the DGSCollection controller, then mark the existing DGSs for deletion, so that the new DGSs that are created pull the updated image.
Right now, we're using Table Storage as the backend storage for the project. Is it the best option? Can we provide alternatives? Maybe create a storage interface?
When you delete a DedicatedGameServerCollection, some errors are logged into the console
Example:
INFO[0072] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"openarena-collection-example-ifzdl", UID:"0a1a0d34-91b9-11e8-81bc-aaf3f78eddce", APIVersion:"v1", ResourceVersion:"236963", FieldPath:""}): type: 'Warning' reason: 'Error in getting the DedicatedGameServer' dedicatedgameservers.azuregaming.com "openarena-collection-example-ifzdl" not found
INFO[0072] dedicatedgameservers.azuregaming.com "openarena-collection-example-ayqki" not found
E0727 19:21:53.103336 3217 PodController.go:160] error syncing 'default/openarena-collection-example-ayqki': dedicatedgameservers.azuregaming.com "openarena-collection-example-ayqki" not found
We should check if they can be removed.
We should modify our mutation controller so that it creates a 'requests' set for each Pod template, provided the template doesn't already have one.
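The mutation itself is a small defaulting step. A sketch with simplified stand-in types (the field names and the default values of 100m CPU / 64Mi memory are assumptions for illustration, not the project's actual choices):

```go
package main

import "fmt"

// Container is a simplified stand-in for a Pod template container;
// Requests maps resource name to quantity, like Kubernetes requests.
type Container struct {
	Name     string
	Requests map[string]string
}

// defaultRequests fills in a requests set only when the container
// doesn't already declare one, mirroring what the mutation controller
// should do for each Pod template.
func defaultRequests(c *Container) {
	if c.Requests == nil {
		c.Requests = map[string]string{"cpu": "100m", "memory": "64Mi"}
	}
}

func main() {
	c := Container{Name: "openarena"}
	defaultRequests(&c)
	fmt.Println(c.Requests["cpu"], c.Requests["memory"]) // 100m 64Mi
}
```

The important property is idempotence: user-supplied requests are never overwritten, so re-running the mutation is safe.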
i.e. write more tests!
Modify Makefile to allow local building and publishing to Docker Hub.
During testing with the TestNodeJS App (https://github.com/dgkanatsios/azuregameserversscalingkubernetes/blob/master/docs/installation.md#testing-with-nodejs-demo-app-an-echo-http-server), the pod goes into an error state each time the DedicatedGameServer's MarkedForDeletion state is set to true (markedfordeletion|true).
Pod log:
$ kubectl logs simplenodejsudp-emijv-lxgdr
UDP Server listening on 0.0.0.0:22222
Set status Assigned OK
Set status Healthy OK
Message received from 104.172.182.40:53085 -
UDP message sent to 104.172.182.40:53085
Message received from 104.172.182.40:53085 - hello
UDP message sent to 104.172.182.40:53085
Message received from 104.172.182.40:53085 - players|8
Set Active Players to running OK
UDP message sent to 104.172.182.40:53085
Message received from 104.172.182.40:53085 - markedfordeletion|true
Set Server Status OK
/app/index.js:87
serverResponse = `${serverResponse}, set Server Status to ${status} OK\n`;
^
ReferenceError: status is not defined
at Request.<anonymous> (/app/index.js:87:69)
at Request._callback (/app/node_modules/lodash/lodash.js:10052:25)
at Request.requestRetryReply [as reply] (/app/node_modules/requestretry/index.js:105:19)
at Request.<anonymous> (/app/node_modules/requestretry/index.js:138:10)
at Request.self.callback (/app/node_modules/request/request.js:185:22)
at Request.emit (events.js:182:13)
at Request.<anonymous> (/app/node_modules/request/request.js:1161:10)
at Request.emit (events.js:182:13)
at IncomingMessage.<anonymous> (/app/node_modules/request/request.js:1083:12)
at Object.onceWrapper (events.js:273:13)
A DedicatedGameServer can signal to our API Server that it has Failed. We do nothing here other than marking the entire DedicatedGameServerCollection as Failed. We should investigate if we should do something else in this case:
We could have the user select what to do via an extra flag on the DedicatedGameServerCollection.
Right now the project creates a hostPort (and thus opens traffic to the Internet) for all ports in all containers. We should come up with a way for each DedicatedGameServer to select which containers' ports will have hostPorts created.
Probably we need to declare this in the DedicatedGameServer YAML.
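Once the DGS YAML carries such a declaration, the controller only needs a filtering step when building the Pod spec. A sketch with hypothetical types (the opt-in-by-container-name scheme is one possible design, not the project's decided one):

```go
package main

import "fmt"

// ContainerPort pairs a container name with one of its ports.
type ContainerPort struct {
	Container string
	Port      int32
}

// hostPortsFor keeps only the ports whose container was opted in by
// the DedicatedGameServer spec; everything else stays cluster-internal
// and gets no hostPort.
func hostPortsFor(all []ContainerPort, exposed map[string]bool) []ContainerPort {
	var out []ContainerPort
	for _, p := range all {
		if exposed[p.Container] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	ports := []ContainerPort{{"game", 27960}, {"sidecar", 8080}}
	got := hostPortsFor(ports, map[string]bool{"game": true})
	fmt.Println(len(got), got[0].Port) // 1 27960
}
```

This would keep sidecar or metrics ports off the public Internet by default, which is the safer direction.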
Provide some testing methods for the Controllers
As we try to generalize the project, it makes no sense to have 'OpenArena' in the Docker Hub images.
Azure provides an API (the Instance Metadata Service Scheduled Events endpoint) to see if a VM will be rebooted for maintenance. You can try it this way:
kubectl run busybox --image=busybox --rm --restart=Never -it -- /bin/sh
wget -O- --header=Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2017-08-01 # first call takes some time, it will probably return 200 and empty body
# go to the portal, manually click “reboot” on a VM
# after a while …
wget -O- --header=Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2017-08-01
Connecting to 169.254.169.254 (169.254.169.254:80)
{"DocumentIncarnation":1,"Events":[{"EventId":"8A506C35-679A-4F78-9D76-C16DBE65EE6F","EventStatus":"Scheduled","EventType":"Reboot","ResourceType":"VirtualMachine","Resources":["aks-nodepool1-34166363-1"],"NotBefore":"Mon, 26 Nov 2018 10:52:34 GMT"}]}
Proposed design:
Investigate autoscaling. The user should specify minimum and maximum replicas when creating a DGSCollection. Investigate adding an autoscaler for both pods and nodes.
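Whatever autoscaling policy we end up with, the desired replica count must always be clamped to the user's min/max bounds. A trivial sketch of that invariant:

```go
package main

import "fmt"

// clampReplicas bounds the autoscaler's desired replica count by the
// minimum and maximum the user declared on the DGSCollection.
func clampReplicas(desired, min, max int32) int32 {
	if desired < min {
		return min
	}
	if desired > max {
		return max
	}
	return desired
}

func main() {
	fmt.Println(clampReplicas(10, 2, 5)) // 5
	fmt.Println(clampReplicas(1, 2, 5))  // 2
	fmt.Println(clampReplicas(3, 2, 5))  // 3
}
```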
Investigate whether creating the pods on hostNetwork is the best idea.
In current implementation, the Pod that's created by a DedicatedGameServer instance has the same name as this instance. This might create a lot of problems in the future, so we should change it.
We need to provide a unified status for the entire DedicatedGameServerCollection object. Probably we should handle this in the DGSCollectionController?
An open question is whether this status (i) is needed and (ii) has to be depicted in Table Storage as well.
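One natural aggregation rule is worst-state-wins over the member DGS states. A sketch (the state names and their precedence order are assumptions for illustration):

```go
package main

import "fmt"

// collectionStatus derives a single status for the collection from
// its members' states using a worst-state-wins rule:
// Failed > Creating > Running.
func collectionStatus(states []string) string {
	rank := map[string]int{"Running": 0, "Creating": 1, "Failed": 2}
	worst := "Running"
	for _, s := range states {
		if rank[s] > rank[worst] {
			worst = s
		}
	}
	return worst
}

func main() {
	fmt.Println(collectionStatus([]string{"Running", "Creating", "Running"})) // Creating
}
```

The DGSCollectionController could recompute this on every member update and write it to the collection's status subresource (and, if question (ii) is answered yes, to Table Storage).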
We should create liveness and readiness probes for APIServer, to do proper health checks within Kubernetes
Is this because I am using OpenSSL instead of a valid CA? Is there a way to get past this without actually procuring a cert from a CA?
Right now, the OpenArena image details are mostly hardcoded. Think about making it more flexible.