asobti / kube-monkey
An implementation of Netflix's Chaos Monkey for Kubernetes clusters
License: Apache License 2.0
I think at the moment the design is for schedule_immediate_kill: true to supersede everything else and start killing pods, even with dry_run = true. I think it makes more sense to honor dry-run as the first order of preference, so users can actually experiment without waiting hours. I would have sent a PR proposing a fix, but I'm not very familiar with Go :( My sample config, which I used for verification, is below.
dryRun: true
runHour: 8
startHour: 10
endHour: 16
blacklistedNamespaces: kube-system
whitelistedNamespaces:
timeZone: America/New_York
debug:
  enabled: true # set to true to enable debugging and have pods killed immediately
  schedule_immediate_kill: true
Please advise.
This is the config.toml sample from the docs:
[kubemonkey]
dry_run = true # Terminations are only logged
run_hour = 8 # Run scheduling at 8am on weekdays
start_hour = 10 # Don't schedule any pod deaths before 10am
end_hour = 16 # Don't schedule any pod deaths after 4pm
blacklisted_namespaces = ["kube-system"] # Critical apps live here
Per this sample, what goes on between 8am and 10am? What is the purpose of specifying run_hour in addition to start_hour?
Also, is there no provision to run this on weekends?
Hey there,
Wanted to give this project a quick spin, but I'm not sure what's the way to go.
I can't find a Docker image hosted on Docker Hub; is there one? Or should I build the Docker image myself? If so, can you provide some guidelines?
Thanks
Nice work. How about allowing MTBF in seconds rather than days? Since it is currently an integer number of days, you can't go below daily restarts.
You should also consider whitelisting namespaces.
Why are you setting mtbf on a label rather than in the config map?
When killing pods, it would be nice to know their replication scale to find out their kill frequency.
Hi,
Can you please let me know whether kube-monkey will work on OpenShift Origin, whose base system is just Kubernetes?
Normal Scheduled 79s default-scheduler Successfully assigned kube-monkey-kubemonkey-5b58fcc9d8-59nbf to ip-192-168-14-36.us-west-2.compute.internal
Normal SuccessfulMountVolume 79s kubelet, ip-192-168-14-36.us-west-2.compute.internal MountVolume.SetUp succeeded for volume "default-token-v8h9h"
Warning FailedMount 15s (x8 over 79s) kubelet, ip-192-168-14-36.us-west-2.compute.internal MountVolume.SetUp failed for volume "config-volume" : configmaps "kube-monkey-kubemonkey" not
I tried building and deploying the project to k8s, and I get this error in the pod logs:
Starting kube-monkey...
panic: open /usr/local/go/lib/time/zoneinfo.zip: no such file or directory
goroutine 1 [running]:
github.com/asobti/kube-monkey/config.Timezone(0x8)
/Users/pswens200/dev/gowork/src/github.com/asobti/kube-monkey/config/config.go:77 +0xc5
github.com/asobti/kube-monkey/kubemonkey.Run(0xc420537f68, 0x1)
/Users/pswens200/dev/gowork/src/github.com/asobti/kube-monkey/kubemonkey/kubemonkey.go:46 +0x5b
main.main()
/Users/pswens200/dev/gowork/src/github.com/asobti/kube-monkey/main.go:23 +0xa0
Any ideas?
I have configured kube-monkey in the kube-system namespace. The kube-monkey pod, deployment, replica-set and config are all seen in good health on the kubernetes dashboard.
$ sudo nano /etc/kube-monkey/config.toml
[kubemonkey]
dry_run = false
run_hour = 9
start_hour = 10
end_hour = 18
graceperiod_sec = 120
blacklisted_namespaces = ["kube-system","kube-chaos"]
time_zone = "Asia/Kolkata"
[debug]
enabled = true
schedule_delay=300
force_should_kill = true
schedule_immediate_kill = true
My deployment has the necessary labels to mark the pod for termination.
kube-chaos-tomcat-deployment.txt
Yet kube-monkey is unable to find any pods for termination.
Logs show :
I0703 18:16:03.623685 1 kubemonkey.go:19] Debug mode detected!
I0703 18:16:03.623705 1 kubemonkey.go:20] Status Update: Generating next schedule in 30 sec
I0703 18:16:33.623885 1 schedule.go:64] Status Update: Generating schedule for terminations
I0703 18:16:33.722343 1 schedule.go:57] Status Update: 0 terminations scheduled today
I0703 18:16:33.722398 1 kubemonkey.go:63] Status Update: Waiting to run scheduled terminations.
I0703 18:16:33.722410 1 kubemonkey.go:77] Status Update: All terminations done.
Could someone please assist me in debugging what my mistake could be?
Thanks.
Hi, I have spent some days trying to use kube-monkey to kill pods in a namespace, but none of my attempts have been successful. Every time I try a configuration, I receive the following message:
********** Today's schedule **********
No terminations scheduled
********** End of schedule **********
I don't understand what is happening. These are my configuration files:
deployment.yml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-monkey
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: ignite-10
        kube-monkey/mtbf: '2'
        kube-monkey/kill-mode: "fixed"
        kube-monkey/kill-value: "1"
    spec:
      containers:
        - name: kube-monkey
          command:
            - "/kube-monkey"
          args: ["-v=5", "-log_dir=/var/log/kube-monkey"]
          image: ayushsobti/kube-monkey:v0.2.3
          volumeMounts:
            - name: config-volume
              mountPath: "/etc/kube-monkey"
      volumes:
        - name: config-volume
          configMap:
            name: kube-monkey-config-map
And configmap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-monkey-config-map
  namespace: kube-system
data:
  config.toml: |
    [kubemonkey]
    run_hour = 11
    start_hour = 12
    end_hour = 13
    blacklisted_namespaces = ["kube-system"]
    whitelisted_namespaces = [] # I have tried using "ignite" here, which is the namespace whose pods I want to kill
    time_zone = "Europe/Madrid"
Those are the pods that I have in the ignite namespace:
This is the kube-system namespace:
This is the log of the kube-monkey:
Please, could anyone help me?
viper should allow using environment variables
https://github.com/spf13/viper#working-with-environment-variables
I am trying to set the internal timezone of the scheduler to my own, like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-monkey-config-map
  namespace: kube-system
data:
  config.toml: |
    [kubemonkey]
    run_hour = 8
    start_hour = 10
    end_hour = 16
    blacklisted_namespaces = ["kube-system"]
    timezone = "Europe/Amsterdam"
As I saw in the code, the option is there, but it is not working.
The log always shows:
I1207 15:47:29.649440 1 kubemonkey.go:26] Generating next schedule at 2017-12-07 08:00:00 -0800 PST
So the time is printed correctly, but the scheduler is not using my timezone.
Changing the default in the code and recompiling works, which is what I did here, but it would be nice to have this documented or, if it's a bug, fixed.
Thanks, awesome work!
Scenario:
Kube monkey generates the following schedule for the day:
At 10 AM the first attack against Deployment A fails with the error below, and the Docker container for kube-monkey dies.
Kubernetes restarted the container but (as expected) the attack on Deployment B did not run.
Error log:
{"log":"panic: runtime error: invalid memory address or nil pointer dereference\n","stream":"stderr","time":"2018-08-31T12:06:00.021385567Z"}
{"log":"[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xe4c072]\n","stream":"stderr","time":"2018-08-31T12:06:00.021395074Z"}
{"log":"\n","stream":"stderr","time":"2018-08-31T12:06:00.021398393Z"}
It would be useful for kube-monkey to have the ability to send events ("killed pod foo for app bar") to channels like Slack, HipChat etc.
In the event that an action taken by kube-monkey causes an issue, the ability to quickly correlate that could be valuable.
Can I set kube-monkey/mtbf = "0.01", or must mtbf be an integer value?
It would be great if kube-monkey could expose a metrics endpoint with values for things it has done.
Inspiration can be drawn from linki/chaoskube#23 :)
What is the purpose of the kube-monkey/identifier annotation?
Docs say:
A unique identifier for the k8 app (eg. the k8 app's name). This is used to identify the pods that belong to a k8 app as Pods inherit labels from their k8 app.
How is this annotation used? What is the guidance around what this annotation's value should be? Should it match any other label in the manifest?
What permissions does kube-monkey require to run? I am trying to run it in a cluster with RBAC enabled and have assigned a service account with the below cluster role to the container:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-monkey
rules:
  - apiGroups: ["", "extensions", "apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "delete"]
kube-monkey runs, but fails to delete any pods as it is unable to see the running pods for any deployment:
I1204 12:21:18.610160 1 config.go:71] Successfully validated configs
I1204 12:21:18.610604 1 main.go:52] Starting kube-monkey with v logging level 5 and local log directory /var/log/kube-monkey
I1204 12:21:18.635426 1 kubemonkey.go:26] Generating next schedule at 2017-12-05 08:00:00 +0000 UTC
I1205 08:00:00.000198 1 schedule.go:43] Generating schedule for terminations
I1205 08:00:00.010612 1 schedule.go:28] ********** Today's schedule **********
I1205 08:00:00.010625 1 schedule.go:32] Deployment Termination time
I1205 08:00:00.010628 1 schedule.go:33] ---------- ----------------
I1205 08:00:00.010634 1 schedule.go:35] my-api 2017-12-05 11:26:00 +0000 UTC
I1205 08:00:00.010649 1 schedule.go:39] ********** End of schedule **********
I1205 08:00:00.010664 1 kubemonkey.go:64] Waiting for terminations to run
E1205 11:26:00.014078 1 kubemonkey.go:70] Failed to execute termination for deployment my-api Error: Deployment my-api has no running pods at the moment
I1205 11:26:00.014440 1 kubemonkey.go:77] All terminations done
I1205 11:26:00.014478 1 kubemonkey.go:26] Generating next schedule at 2017-12-06 08:00:00 +0000 UTC
This is in a 1.7.11 cluster; could it be due to incompatibilities with the client lib?
In chaos.go line 109, we check for KillValue and, in case of an error, we delete a single random pod. This happens even if KillType has an invalid value, since that check comes later.
Is there a possibility to add a whitelist to the kube-monkey configuration? It could be exclusive with blacklisting: you either use one or the other. I find whitelisting easier to deal with if you manage a lot of namespaces.
Thank you
Is it time to submit this Helm chart as an official Helm chart?
Hey there!
I've tried following the instructions on the README explicitly, but unfortunately I keep hitting an interesting blocker that I think may be tied to some circular dependencies in glog (I'm a super novice at Go, admittedly).
In my error log I get the following:
/kube-monkey flag redefined: log_dir
panic: /kube-monkey flag redefined: log_dir
goroutine 1 [running]:
flag.(*FlagSet).Var(0xc42004c060, 0x1d8fe00, 0xc42000efa0, 0x14c1722, 0x7, 0x14e3d51, 0x2f)
/usr/local/Cellar/go/1.9.3/libexec/src/flag/flag.go:793 +0x5e1
flag.(*FlagSet).StringVar(0xc42004c060, 0xc42000efa0, 0x14c1722, 0x7, 0x0, 0x0, 0x14e3d51, 0x2f)
/usr/local/Cellar/go/1.9.3/libexec/src/flag/flag.go:696 +0x8b
flag.(*FlagSet).String(0xc42004c060, 0x14c1722, 0x7, 0x0, 0x0, 0x14e3d51, 0x2f, 0xc4200f1f00)
/usr/local/Cellar/go/1.9.3/libexec/src/flag/flag.go:709 +0x8b
flag.String(0x14c1722, 0x7, 0x0, 0x0, 0x14e3d51, 0x2f, 0x0)
/usr/local/Cellar/go/1.9.3/libexec/src/flag/flag.go:716 +0x69
github.com/asobti/kube-monkey/vendor/k8s.io/client-go/1.5/vendor/github.com/golang/glog.init()
/Users/joe/go/src/github.com/asobti/kube-monkey/vendor/k8s.io/client-go/1.5/vendor/github.com/golang/glog/glog_file.go:41 +0x14a
github.com/asobti/kube-monkey/vendor/k8s.io/client-go/1.5/kubernetes.init()
<autogenerated>:1 +0x48
github.com/asobti/kube-monkey/deployments.init()
<autogenerated>:1 +0x53
github.com/asobti/kube-monkey/chaos.init()
<autogenerated>:1 +0x5d
github.com/asobti/kube-monkey/kubemonkey.init()
<autogenerated>:1 +0x53
main.init()
<autogenerated>:1 +0x5d
This is before even trying to run it in k8s, and just running it locally, but we saw the exact same issues when running this in k8s.
I've tried some attempts at resolving conflicts using glide install --strip-vendor, but to no avail.
Just wondering if you have any words of advice, or something stupid I may be doing as a Go noob (:
Additionally, do you think it's possible to push your image as a release to hub.docker.com?
Problem
Output of kubectl logs
Starting kube-monkey...
--
open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
panic: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
goroutine 1 [running]:
main.main()
/home/user/go/src/github.com/asobti/kube-monkey/main.go:24 +0x108
Environment
Trying to run kube-monkey in a kube cluster that is behind a firewall, hence no auth is configured.
This error could be because I'm running Kubernetes v1.5.2.
Is there any reason not to upgrade the kubernetes/client-go version?
[user@box] $ curl http://localhost:8080/api/
{
"kind": "APIVersions",
"versions": [
"v1"
],
"serverAddressByClientCIDRs": [
{
"clientCIDR": "0.0.0.0/0",
"serverAddress": "xxx.xxx.xxx.xxx:x443"
}
]
}
[user@box] $ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"c55cf2b7d8bfeb947f77453415d775d7f71c89c2", GitTreeState:"clean", BuildDate:"2017-02-06T23:54:31Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"c55cf2b7d8bfeb947f77453415d775d7f71c89c2", GitTreeState:"clean", BuildDate:"2017-02-06T23:54:31Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Debugging
Tracing the error goes to
https://github.com/asobti/kube-monkey/blob/master/kubernetes/kubernetes.go#L11
which calls Google's Kubernetes API verification at
https://github.com/asobti/kube-monkey/blob/master/vendor/k8s.io/client-go/1.5/rest/config.go#L262
In the kubernetes docs, "/var/run/secrets/kubernetes.io/serviceaccount/token" should always be automatically generated. It's possible that this problem occurs if the service account admission controller is not enabled. Additionally "spec.serviceAccountName field has been automatically set" is not true for any deployed pods.
Our current configuration does not generate anything in the service account. i.e.
[user@box ~]$ kubectl describe sa
Name: default
Namespace: kube-system
Labels: <none>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Reason for Issue
I'm primarily opening this issue
Failed to fetch eligible deployments for namespace due to error: deployments.apps is forbidden: User "system:serviceaccount:kube-system:default" cannot list deployments.apps at the cluster scope
Just curious.....
Hi,
We have low traffic on Sundays, so I would like to schedule the monkey to run only on Sundays.
Is there any way to do that with the current configuration options?
I want to fork this and add destroying entire Services to kube-monkey. Are you open to the idea?
This would be useful for testing an application's resiliency to a network partition of a particular dependency, for example:
Proposal
Ability to define a kill mode or kill value that is based on the pod disruption budget for that deployment.
The idea behind this is that if you define in a PDB that it's OK to have x% maximum unavailability, then it stands to reason that it's also safe for kube-monkey to kill a number of pods based on this percentage.
Example:
kill-mode: pdb-based
kill-value: equals
or
kill-mode: pdb-based
kill-value: +25%
I'm not sure how best to deal with scenarios where this kill mode is set but there is no PDB. I guess it could either do nothing, or use some default value. At this point I just wanted to gauge interest in this new feature. I may be able to contribute part of the code for this.
This might be an issue with my local environment since these failures do not show up when Travis CI runs tests (https://github.com/asobti/kube-monkey/commits/master).
Opening this issue to investigate. Might be a Go version thing.
$ go test -v ./victims
# github.com/asobti/kube-monkey/victims
victims/victims.go:284: Verbose.Infof format % is missing verb at end of string
victims/victims.go:288: Verbose.Infof format % is missing verb at end of string
FAIL github.com/asobti/kube-monkey/victims [build failed]
$ go version
go version go1.11 darwin/amd64
I am trying to use the debug mode, which can immediately kill the pod, but I'm not sure what I did wrong; the immediate kill never happened.
Here is the config.yaml I have:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-monkey-config-map
  namespace: dmp-system
data:
  config.toml: |
    [kubemonkey]
    DryRun = false
    run_hour = 18
    start_hour = 19
    end_hour = 20
    blacklisted_namespaces = ["kube-system"]
    DebugEnabled = true
    DebugScheduleImmediateKill = true
The short name for Kubernetes is k8s, not k8. Please update the documentation throughout.
https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/#what-does-kubernetes-mean-k8s
I'm currently attempting at getting kube-monkey up and running on a kubernetes cluster, and I keep getting an error stating:
panic: Unable to verify client connectivity to Kubernetes server
Any ideas or suggestions on what's going wrong and how to fix it?
Thanks.
I0221 03:49:36.201590 7 kubemonkey.go:24] Status Update: Generating next schedule at 2018-02-22 00:00:00 +0530 IST
I0221 18:30:00.000194 7 schedule.go:54] Status Update: Generating schedule for terminations
W0221 18:30:00.015033 7 factory.go:44] Failed to fetch eligible deployments for namespace cam-km due to error: User "system:serviceaccount:cam-km:default" cannot list deployments.extensions in the namespace "cam-km". (get deployments.extensions)
I0221 18:30:00.015728 7 schedule.go:47] Status Update: 0 terminations scheduled today
I0221 18:30:00.015775 7 kubemonkey.go:63] Status Update: Waiting to run scheduled terminations.
I0221 18:30:00.015789 7 kubemonkey.go:77] Status Update: All terminations done.
I0221 18:30:00.015892 7 kubemonkey.go:24] Status Update: Generating next schedule at 2018-02-23 00:00:00 +0530 IST
********** Today's schedule **********
No terminations scheduled
********** End of schedule **********
the values.yaml
replicaCount: 1
namespace: default
rbac:
  enabled: true
image:
  repository: ayushsobti/kube-monkey
  tag: v0.2.3
  pullPolicy: IfNotPresent
config:
  dryRun: false
  runHour: 8
  startHour: 10
  endHour: 16
  blacklistedNamespaces: kube-system
  whitelistedNamespaces:
  timeZone: Australia/Melbourne
  debug:
    enabled: true # set to true to enable debugging and have pods killed immediately
    schedule_immediate_kill: true
args:
  logLevel: 5
  logDir: /var/log/kube-monkey
the microservice's values.yaml:
kubemonkey:
  enabled: enabled # to disable set this to "disabled", to enable set it to "enabled"
  mtbf: 1
  kill-mode: fixed-percent
  kill-value: 80
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xe28cd2]
goroutine 21 [running]:
github.com/asobti/kube-monkey/chaos.(*Chaos).terminate(0xc42015da10, 0x114ba00, 0xc4200b23c0, 0x0, 0x0)
/Users/asobti/Coding/go/src/github.com/asobti/kube-monkey/chaos/chaos.go:127 +0x1a2
github.com/asobti/kube-monkey/chaos.(*Chaos).Execute(0xc42015da10, 0xc420084660)
/Users/asobti/Coding/go/src/github.com/asobti/kube-monkey/chaos/chaos.go:66 +0x202
github.com/asobti/kube-monkey/chaos.(*Chaos).Schedule(0xc42015da10, 0xc420084660)
/Users/asobti/Coding/go/src/github.com/asobti/kube-monkey/chaos/chaos.go:42 +0x51
created by github.com/asobti/kube-monkey/kubemonkey.ScheduleTerminations
/Users/asobti/Coding/go/src/github.com/asobti/kube-monkey/kubemonkey/kubemonkey.go:56 +0xb1
Looking at the code, it seems this only supports labelling Deployments as victims:
Any plans to support services/replica sets? Is there a workaround if we don't use deployments?
EDIT: I think I can just add the label selector to pods and have it work; is that correct?
kube-monkey/victims/victims.go
Line 113 in b42263d
Really very good job, sir.
I've done some testing in my Kubernetes cluster and found that kube-monkey only kills pods under a Deployment. I created some pods under an rc and tagged them with the same kube-monkey labels I used in my Deployment, but kube-monkey didn't seem to have any impact on the pods under the rc.
Sir, do you have a plan to support other k8s controllers (rc, daemonset, etc.)? That would make kube-monkey much more useful.
Many thanks.
Facing an issue while running kube-monkey on our kube cluster.
Error Log:
I0620 21:44:56.642419 1 main.go:52] Starting kube-monkey with v logging level 5 and local log directory /var/log/kube-monkey
E0620 21:44:56.643397 1 kubernetes.go:39] failed to obtain config from InClusterConfig: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
F0620 21:44:56.644008 1 main.go:55] Failed to generate NewInClusterClient: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
goroutine 1 [running]:
github.com/asobti/kube-monkey/vendor/github.com/golang/glog.stacks(0xc42024f500, 0xc420264000, 0xa5, 0xb6)
/go/src/github.com/asobti/kube-monkey/vendor/github.com/golang/glog/glog.go:766 +0xcf
github.com/asobti/kube-monkey/vendor/github.com/golang/glog.(*loggingT).output(0x1785600, 0xc400000003, 0xc4200ce580, 0x1716d4a, 0x7, 0x37, 0x0)
/go/src/github.com/asobti/kube-monkey/vendor/github.com/golang/glog/glog.go:717 +0x30f
github.com/asobti/kube-monkey/vendor/github.com/golang/glog.(*loggingT).printDepth(0x1785600, 0x7f8b00000003, 0x1, 0xc42014ff48, 0x1, 0x1)
/go/src/github.com/asobti/kube-monkey/vendor/github.com/golang/glog/glog.go:646 +0x129
github.com/asobti/kube-monkey/vendor/github.com/golang/glog.(*loggingT).print(0x1785600, 0x3, 0xc42014ff48, 0x1, 0x1)
/go/src/github.com/asobti/kube-monkey/vendor/github.com/golang/glog/glog.go:637 +0x5a
github.com/asobti/kube-monkey/vendor/github.com/golang/glog.Fatal(0xc42014ff48, 0x1, 0x1)
/go/src/github.com/asobti/kube-monkey/vendor/github.com/golang/glog/glog.go:1125 +0x53
main.main()
/go/src/github.com/asobti/kube-monkey/main.go:55 +0x1ea
Not sure whether this is a configuration-related problem or something else.
configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-monkey-config-map
  namespace: kube-system
data:
  config.toml: |
    [kubemonkey]
    host="http://localhost:8080"
    run_hour = 8
    start_hour = 10
    end_hour = 16
    blacklisted_namespaces = ["kube-system"]
    whitelisted_namespaces = [""]
    time_zone = "America/New_York"
deployment.yaml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-monkey
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-monkey
    spec:
      containers:
        - name: kube-monkey
          command:
            - "/kube-monkey"
          args: ["-v=5", "-log_dir=/var/log/kube-monkey"]
          #image: andrewsrobertamary/kube-monkey:latest
          image: kube-monkey:ubuntu
          volumeMounts:
            - name: config-volume
              mountPath: "/etc/kube-monkey"
      volumes:
        - name: config-volume
          configMap:
            name: kube-monkey-config-map
We are using insecure bind address in api server.
KUBE_API_ADDRESS="--insecure-bind-address=0.0.0.0"
I even tried adding ServiceAccount to KUBE_ADMISSION_CONTROL, but that worsened the problem, so I rolled back the change.
In our cluster, /var/run/secrets/kubernetes.io/serviceaccount/token is never created.
We are using heapster with --source=kubernetes:http://localhost:8080?inClusterConfig=false
When the config file on the pod changes, KM should use the new configs. Referencing https://github.com/asobti/kube-monkey/blob/master/config/config.go#L46
[kubemonkey]
dry_run = true
run_hour = 8
start_hour = 10
end_hour = 16
blacklisted_namespaces = ["default","kube-system"]
[debug]
enabled = true
schedule_delay = 10
force_should_kill = true
schedule_immediate_kill = true
Have kube-monkey running and a test deployment/pod. ex:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: counter
  namespace: test-system
spec:
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim-counter
        kube-monkey/mtbf: '1'
    spec:
      containers:
        - name: count
          image: busybox
          args: [/bin/sh, -c, 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
kubectl edit cm/cm-config-map
or edit the config file locally and reload with kubectl create configmap cm-config-map --from-file=config.toml=cm-config-map.toml -o yaml --dry-run | kubectl replace -f -
[kubemonkey]
dry_run = false
run_hour = 8
start_hour = 10
end_hour = 16
blacklisted_namespaces = ["default","kube-system"]
[debug]
enabled = true
schedule_delay = 10
force_should_kill = true
schedule_immediate_kill = true
kubectl get pods --namespace=kube-system | grep -o '^kube-monkey[^[:space:]]*' | xargs -L 1 -I f watch kubectl exec -ti "f" cat /etc/kube-monkey/config.toml
kubectl get pods --namespace=test-system | grep -o '^counter[^[:space:]]*' | xargs -L 1 -I f watch kubectl logs "f" --namespace="test-system" --tail=20
KM should detect the config change, switch off dry-run and kill the counter pod.
kubectl get pods --namespace=test-system | grep -o '^counter[^[:space:]]*' | xargs -L 1 -I f watch kubectl logs "f" --namespace="test-system" --tail=20
If retrieving KillType returns an error, a single pod is terminated (see https://github.com/asobti/kube-monkey/blob/master/chaos/chaos.go#L103)
However, if KillType returns a value, but we do not recognize that value, we log an error and do nothing (see https://github.com/asobti/kube-monkey/blob/master/chaos/chaos.go#L128).
This is inconsistent. Error condition and invalid value should both trigger the same behavior. IMO, this behavior should be of logging an error and doing nothing.
A similar inconsistency exists with KillValue too. If KillValue returns an error, it defaults to killing a single pod (see https://github.com/asobti/kube-monkey/blob/master/chaos/chaos.go#L112), but if the value is incorrect (eg. greater than 100 or < 0), then we log an error and don't kill anything (see https://github.com/asobti/kube-monkey/blob/master/victims/victims.go#L255)
Hi,
I've tried to clone and compile kube-monkey but get this error:
> go get github.com/asobti/kube-monkey
[...]
> cd $GOPATH/src/github.com/asobti/kube-monkey
> glide install --strip-vendor
[...]
> make clean
[...]
> make build
rm -f kube-monkey
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o kube-monkey
# github.com/asobti/kube-monkey/vendor/golang.org/x/crypto/ssh/terminal
vendor/golang.org/x/crypto/ssh/terminal/util.go:30:12: undefined: unix.IoctlGetTermios
vendor/golang.org/x/crypto/ssh/terminal/util.go:38:18: undefined: unix.IoctlGetTermios
vendor/golang.org/x/crypto/ssh/terminal/util.go:54:12: undefined: unix.IoctlSetTermios
vendor/golang.org/x/crypto/ssh/terminal/util.go:64:18: undefined: unix.IoctlGetTermios
vendor/golang.org/x/crypto/ssh/terminal/util.go:75:9: undefined: unix.IoctlSetTermios
vendor/golang.org/x/crypto/ssh/terminal/util.go:80:13: undefined: unix.IoctlGetWinsize
vendor/golang.org/x/crypto/ssh/terminal/util.go:98:18: undefined: unix.IoctlGetTermios
vendor/golang.org/x/crypto/ssh/terminal/util.go:107:12: undefined: unix.IoctlSetTermios
vendor/golang.org/x/crypto/ssh/terminal/util.go:112:3: undefined: unix.IoctlSetTermios
make: *** [Makefile:9: build] Error 2
As a fix I've just changed the commit used for golang.org/x/sys (golang/sys@9a2e24c) to the latest one to date (golang/sys@fff93fa).
Not sure if it's something on my side, or just a wrong dependency.
Expected behavior:
I expect, just as when deleting one pod, to get the error message:
kubemonkey.go:68] Failed to execute termination for deployment fail-deploy. Error: Deployment fail-deploy has no running pods at the moment
Currently it deletes the counter pods and does not complain. Sample output:
********** Today's schedule **********
Deployment Termination time
---------- ----------------
counter 12/12/27127 128:1527:00 -0500 UTC
********** End of schedule **********
I1227 22:02:49.181134 1 chaos.go:120] Terminating ALL pods for deployment counter
I1227 22:02:49.189360 1 kubemonkey.go:70] Termination successfully executed for deployment counter
I1227 22:02:49.189371 1 kubemonkey.go:73] Status Update: 0 scheduled terminations left.
I1227 22:02:49.189375 1 kubemonkey.go:76] Status Update: All terminations done.
I1227 22:02:49.189416 1 kubemonkey.go:18] Debug mode detected!
I1227 22:02:49.189420 1 kubemonkey.go:19] Status Update: Generating next schedule in 60 sec
---
apiVersion: v1
kind: Namespace
metadata:
  name: test-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: counter
  namespace: test-system
spec:
  replicas: 2
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim-counter
        kube-monkey/mtbf: "1"
        kube-monkey/kill-all: "kill-all"
    spec:
      containers:
        - args:
            - /bin/sh
            - -c
            - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
          image: DOESNOTEXIST:latest
          imagePullPolicy: Always
          name: test-counter
      restartPolicy: Always
kubectl get pods --namespace=test-system
NAME READY STATUS RESTARTS AGE
counter-9bcfdf745-fnr2r 0/1 InvalidImageName 0 22s
counter-9bcfdf745-vwmpz 0/1 InvalidImageName 0 22s
I believe there is a bug in:
Line 125 in e1052df
According to the README, killValue should be between 0 and 100. Therefore, the value passed to DeleteRandomPods will be between 0 and 10,000 (100*100/(0+1)).
Example:
https://play.golang.org/p/c0z6RijiR1D
I think one way to calculate the correct value is:
nPods := 3
maxPods := math.Round(float64(nPods*killValue) / 100)
toKill := r.Intn(int(maxPods+1))
Happy to submit a PR with a fix and some tests.
0.2.3 is the only tag available on Docker Hub:
The 0.1.0 tag is not available on Docker Hub. As a result, https://github.com/asobti/kube-monkey/blob/master/examples/deployment.yaml#L19 fails to start kube-monkey with the following error:
kube-monkey-d46b987b9-lrg67 0/1 ImagePullBackOff 0 1m
Also, the image's repo name needs to be specified. The updated deployment.yaml that works is:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-monkey
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-monkey
    spec:
      containers:
        - name: kube-monkey
          command:
            - "/kube-monkey"
          args: ["-v=5", "-log_dir=/var/log/kube-monkey"]
          image: ayushsobti/kube-monkey:v0.2.3
          volumeMounts:
            - name: config-volume
              mountPath: "/etc/kube-monkey"
      volumes:
        - name: config-volume
          configMap:
            name: kube-monkey-config-map
I got an error: Failed to fetch eligible deployments for namespace testns-ns due to error: the server could not find the requested resource.
The configmap, deployment, pod and victims are all in namespace testns-ns.
Details of logs are shown below
I0604 22:07:37.873927 1 config.go:74] Successfully validated configs
I0604 22:07:37.873958 1 main.go:52] Starting kube-monkey with v logging level 5 and local log directory /var/log/kube-monkey
I0604 22:07:38.157628 1 kubemonkey.go:24] Status Update: Generating next schedule at 2018-06-04 16:00:00 -0700 PDT
LM-SJC-11012475:kube-monkey yxu2$ kubectl logs kube-monkey-6bbccb7699-t5wnw --namespace=testns-ns
I0604 22:07:37.873927 1 config.go:74] Successfully validated configs
I0604 22:07:37.873958 1 main.go:52] Starting kube-monkey with v logging level 5 and local log directory /var/log/kube-monkey
I0604 22:07:38.157628 1 kubemonkey.go:24] Status Update: Generating next schedule at 2018-06-04 16:00:00 -0700 PDT
I0604 23:00:00.000173 1 schedule.go:64] Status Update: Generating schedule for terminations
W0604 23:00:00.010772 1 factory.go:45] Failed to fetch eligible deployments for namespace testns-ns due to error: the server could not find the requested resource
********** Today's schedule **********
No terminations scheduled
********** End of schedule **********
I0604 23:00:00.011352 1 schedule.go:57] Status Update: 0 terminations scheduled today
I0604 23:00:00.011388 1 kubemonkey.go:63] Status Update: Waiting to run scheduled terminations.
I0604 23:00:00.011394 1 kubemonkey.go:77] Status Update: All terminations done.
I0604 23:00:00.011479 1 kubemonkey.go:24] Status Update: Generating next schedule at 2018-06-05 16:00:00 -0700 PDT
Details of the configmap are shown below:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  config.toml: |
    [kubemonkey]
    host="https://your-apiserver-url.com:apiport"
    run_hour = 16
    start_hour = 17
    end_hour = 18
    blacklisted_namespaces = ["kube-system"]
    whitelisted_namespaces = ["testns-ns"]
    time_zone = "America/Los_Angeles"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"config.toml":"[kubemonkey]\nrun_hour = 12\nstart_hour = 12\nend_hour = 13\nblacklisted_namespaces = [\"kube-system\"]\nwhitelisted_namespaces = [\"testns-ns\"]\ntime_zone = \"America/Los_Angeles\"\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"kube-monkey-config-map","namespace":"testns-ns"}}
  creationTimestamp: 2018-06-01T18:09:18Z
  name: kube-monkey-config-map
  namespace: testns-ns
  resourceVersion: "2018008066"
  selfLink: /api/v1/namespaces/testns-ns/configmaps/kube-monkey-config-map
  uid: e6f8283a-65c6-11e8-9b2c-74dbd180d2d0
Ability to send an HTTP POST when attacks are scheduled and when an attack runs.
This is especially useful when running kube-monkey in a production environment. I'm not sure how other folks are monitoring the attacks, but I guess it's based on the logs.
This feature could work based on a ConfigMap, where users would define where to POST these messages, and possibly the format as well (we could have keywords such as $attack_time, $pod_name, etc.). When a new attack schedule is generated or when a new attack runs, a POST would be sent to the configured endpoint.
Example:
kind: ConfigMap
data:
  alerts_configs.json: |-
    {
      "report_schedules": false,
      "report_attacks": true,
      "endpoint": "http://somewhere:80/alerts"
    }
  attacks_message.json: |-
    {
      "pod": $pod_name,
      "foo": "$bar"
    }
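As a rough sketch of how such an alerter could work: the `AlertConfig` struct, the `renderMessage` helper, and the `$`-keyword substitution below are assumptions modeled on the example config above, not an existing kube-monkey API.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

// AlertConfig mirrors the hypothetical alerts_configs.json above.
type AlertConfig struct {
	ReportSchedules bool
	ReportAttacks   bool
	Endpoint        string
}

// renderMessage substitutes $-keywords (e.g. $pod_name) in the
// user-supplied message template with concrete values.
func renderMessage(template string, values map[string]string) string {
	out := template
	for key, val := range values {
		out = strings.ReplaceAll(out, "$"+key, val)
	}
	return out
}

// reportAttack POSTs the rendered message to the configured endpoint,
// skipping the call entirely when attack reporting is disabled.
func reportAttack(cfg AlertConfig, template string, values map[string]string) error {
	if !cfg.ReportAttacks {
		return nil
	}
	body := renderMessage(template, values)
	resp, err := http.Post(cfg.Endpoint, "application/json", bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("alert endpoint returned %s", resp.Status)
	}
	return nil
}

func main() {
	msg := renderMessage(`{"pod": "$pod_name"}`, map[string]string{"pod_name": "kube-monkey-abc"})
	fmt.Println(msg) // prints {"pod": "kube-monkey-abc"}
}
```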
Currently, the termination time is logged to microsecond granularity, along with the monotonic clock reading.
I1109 13:40:56.471721 11 schedule.go:28] ********** Today's schedule **********
I1109 13:40:56.471749 11 schedule.go:32] Deployment Termination time
I1109 13:40:56.471752 11 schedule.go:33] ---------- ----------------
I1109 13:40:56.471757 11 schedule.go:35] hello-world 2017-11-09 13:41:28.4717057 -0800 PST m=+62.093720077
See https://golang.org/pkg/time/#Time.String
Suggested fix, per the Go docs: "The returned string is meant for debugging; for a stable serialized representation, use t.MarshalText, t.MarshalBinary, or t.Format with an explicit format string."
Adding helm support would help adoption of this great project. Any work started here yet?
If a TerminationGracePeriod is defined in the pod spec and it is larger than the configured grace period, the TerminationGracePeriod value should be used.
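The requested behavior amounts to taking the larger of the two values; a sketch of that selection (`effectiveGracePeriod` is a hypothetical helper, not existing kube-monkey code):

```go
package main

import "fmt"

// effectiveGracePeriod returns the grace period (in seconds) to pass to the
// delete call: the pod spec's TerminationGracePeriodSeconds (a *int64, since
// it is optional in the spec) wins when it is set and larger than the
// configured value; otherwise the configured value is used.
func effectiveGracePeriod(podSpecSeconds *int64, configuredSeconds int64) int64 {
	if podSpecSeconds != nil && *podSpecSeconds > configuredSeconds {
		return *podSpecSeconds
	}
	return configuredSeconds
}

func main() {
	podValue := int64(60)
	fmt.Println(effectiveGracePeriod(&podValue, 30)) // pod spec wins: prints 60
	fmt.Println(effectiveGracePeriod(nil, 30))       // falls back to config: prints 30
}
```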
It would be nice if this tool could perform quick chaos over a short and immediate period of time. Our team wants to test the app's resiliency within 20 minutes to an hour, not wait a whole week before drawing conclusions.