hjacobs / kube-janitor
Clean up (delete) Kubernetes resources after a configured TTL (time to live)
License: GNU General Public License v3.0
It might be convenient to specify the TTL in weeks. Support this by introducing w as a time unit, e.g. 2w would define a time to live of 2 weeks.
This should be easily possible, as "w" is unambiguous and weeks have a fixed length of 7 days (not accounting for timezones/leap seconds).
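A minimal sketch of how a TTL parser could handle the new unit, assuming a parse_ttl-style helper that returns seconds (names and shape are illustrative, not necessarily kube-janitor's actual code):

import re

# Hedged sketch: extend TTL parsing with a "w" (weeks) unit.
FACTORS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 7 * 86400}
TTL_PATTERN = re.compile(r"^(\d+)([smhdw])$")

def parse_ttl(ttl: str) -> int:
    match = TTL_PATTERN.match(ttl)
    if not match:
        raise ValueError(f"Invalid TTL value: {ttl}")
    value, unit = int(match.group(1)), match.group(2)
    return value * FACTORS[unit]

# parse_ttl("2w") == 1209600  (14 days in seconds)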
I'm trying to run kube-janitor on a cluster with self-signed certificates.
I'm getting this error: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /api/v1/namespaces (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1056)')))
Is there a way to ignore invalid certs?
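One possible workaround, rather than ignoring invalid certs: mount the cluster's CA bundle into the pod and point the requests library at it via the standard REQUESTS_CA_BUNDLE environment variable. This is a requests mechanism, not a kube-janitor flag; paths and volume names below are illustrative.

# Hedged sketch of a container fragment for the kube-janitor Deployment:
env:
- name: REQUESTS_CA_BUNDLE
  value: /etc/ssl/custom/ca.crt   # path where the self-signed CA is mounted
volumeMounts:
- name: cluster-ca
  mountPath: /etc/ssl/custom
  readOnly: true
# plus a matching pod-level volume (e.g. a ConfigMap or Secret containing ca.crt)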
It would be helpful to support a wildcard pattern when specifying namespaces to include/exclude.
Example:
--include-namespaces=devspace-*
--exclude-namespaces=kube-*
Use Case:
It is useful when namespaces are used as a separation between deployment instances, particularly for personal developer or team namespaces.
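A minimal sketch of how such wildcard matching could work, assuming Python's fnmatch is applied to the configured patterns (illustrative names, not kube-janitor's actual implementation):

from fnmatch import fnmatch

def namespace_matches(namespace: str, include_patterns, exclude_patterns) -> bool:
    """Return True if the namespace should be processed."""
    included = "all" in include_patterns or any(fnmatch(namespace, p) for p in include_patterns)
    excluded = any(fnmatch(namespace, p) for p in exclude_patterns)
    return included and not excluded

# namespace_matches("devspace-alice", ["devspace-*"], ["kube-*"]) -> True
# namespace_matches("kube-system", ["all"], ["kube-*"]) -> False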
The 20.2.0 tag is placed on commit f395fbc instead of 86e1dea, which follows it. When one does git checkout 20.2.0, they get 20.2.0 bits with the 20.1.0 version number.
Please see the screenshot.
It looks like this is an issue that persists throughout versions.
I can see three possible remedies:
Others have been confused by this, as can be seen in this comment.
I created a Deployment and annotated it:
kubectl run temp-nginx --image=nginx
kubectl annotate deploy temp-nginx janitor/ttl=1m
The Pod had valid ownerReferences:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2019-03-27T14:02:19Z"
  generateName: temp-nginx-68498674c5-
  labels:
    pod-template-hash: 68498674c5
    run: temp-nginx
  name: temp-nginx-68498674c5-2mxfc
  namespace: dev
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: temp-nginx-68498674c5
    uid: ef5dbd98-5098-11e9-999b-02405219215c
As expected, the Deployment was deleted after 1m; here are the logs of kube-janitor:
2019-03-27 14:03:37,720 DEBUG: Deployment temp-nginx with 1m TTL is 1m18s old
2019-03-27 14:03:37,720 INFO: Deployment temp-nginx with 1m TTL is 1m18s old and will be deleted (annotation janitor/ttl is set)
2019-03-27 14:03:37,731 DEBUG: https://10.96.0.1:443 "POST /api/v1/namespaces/dev/events HTTP/1.1" 201 836
2019-03-27 14:03:37,732 INFO: Deleting Deployment dev/temp-nginx..
2019-03-27 14:03:37,738 DEBUG: https://10.96.0.1:443 "DELETE /apis/extensions/v1beta1/namespaces/dev/deployments/temp-nginx HTTP/1.1" 200 1712
2019-03-27 14:03:37,739 DEBUG: ReplicaSet temp-nginx-68498674c5 with 1m TTL is 1m18s old
2019-03-27 14:03:37,739 INFO: ReplicaSet temp-nginx-68498674c5 with 1m TTL is 1m18s old and will be deleted (annotation janitor/ttl is set)
2019-03-27 14:03:37,745 DEBUG: https://10.96.0.1:443 "POST /api/v1/namespaces/dev/events HTTP/1.1" 201 858
2019-03-27 14:03:37,745 INFO: Deleting ReplicaSet dev/temp-nginx-68498674c5..
2019-03-27 14:03:37,752 DEBUG: https://10.96.0.1:443 "DELETE /apis/extensions/v1beta1/namespaces/dev/replicasets/temp-nginx-68498674c5 HTTP/1.1" 200 1546
2019-03-27 14:03:37,754 INFO: Clean up run completed: resources-processed=472, deployments-with-ttl=1, deployments-deleted=1, replicasets-with-ttl=1, replicasets-deleted=1
But the Pod wasn't deleted and was orphaned; its ownerReferences were removed:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2019-03-27T14:02:19Z"
  generateName: temp-nginx-68498674c5-
  labels:
    pod-template-hash: 68498674c5
    run: temp-nginx
  name: temp-nginx-68498674c5-2mxfc
  namespace: dev
  resourceVersion: "66358563"
  selfLink: /api/v1/namespaces/dev/pods/temp-nginx-68498674c5-2mxfc
  uid: ef60a8b0-5098-11e9-999b-02405219215c
I expected the deletion of resources to be cascading. Otherwise there is little sense in just deleting Deployments and leaving all the Pods behind.
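For reference, the Kubernetes API cascades a delete according to the propagationPolicy in the request's DeleteOptions (and, as far as I recall, the legacy extensions/v1beta1 endpoints historically orphaned dependents by default). A hedged sketch of an explicit foreground delete against the raw API; this is standard Kubernetes behaviour, not necessarily what kube-janitor sends:

import requests

def delete_deployment_foreground(session: requests.Session, api_server: str, namespace: str, name: str):
    """Delete a Deployment and its dependents (ReplicaSets, Pods) in the foreground."""
    body = {
        "kind": "DeleteOptions",
        "apiVersion": "v1",
        "propagationPolicy": "Foreground",  # or "Background"; the default differs per API group
    }
    url = f"{api_server}/apis/apps/v1/namespaces/{namespace}/deployments/{name}"
    resp = session.delete(url, json=body)
    resp.raise_for_status()
    return resp.json()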
kubectl version:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
kube-janitor version and flags:
Janitor v0.6 started with debug=True, delete_notification=None, dry_run=False, exclude_namespaces=kube-system, exclude_resources=events,controllerrevisions, include_namespaces=all, include_resources=all, interval=30, once=False, rules_file=/config/rules.yaml
rules.yaml:
# example rules configuration to set TTL for arbitrary objects
# see https://github.com/hjacobs/kube-janitor for details
rules:
  - id: require-application-label
    # remove deployments and statefulsets without a label "application"
    resources:
      # resources are prefixed with "XXX" to make sure they are not active by accident
      # modify the rule as needed and remove the "XXX" prefix to activate
      - XXXdeployments
      - XXXstatefulsets
    # see http://jmespath.org/specification.html
    jmespath: "!(spec.template.metadata.labels.application)"
    ttl: 4d
  - id: temporary-pr-namespaces
    # delete all namespaces with a name starting with "pr-*"
    resources:
      # resources are prefixed with "XXX" to make sure they are not active by accident
      # modify the rule as needed and remove the "XXX" prefix to activate
      - XXXnamespaces
    # this uses JMESPath's built-in "starts_with" function
    # see http://jmespath.org/specification.html#starts-with
    jmespath: "starts_with(metadata.name, 'pr-')"
    ttl: 4h
Kubernetes Janitor should allow deleting PersistentVolumeClaims which are no longer mounted or referenced, e.g. because the StatefulSet was deleted.
PVCs are not automatically deleted and it's easy to forget them. From https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/:
Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet. This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources.
Idea: add additional context properties which can be used in the rule jmespath.
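A hedged example of what such a rule could look like, using the _context.pvc_is_not_referenced property mentioned later in this issue list (assuming such a context property is exposed to the rule's JMESPath):

rules:
  - id: orphaned-pvcs
    resources:
      - persistentvolumeclaims
    # delete PVCs that no workload references anymore, after a grace period
    jmespath: "_context.pvc_is_not_referenced"
    ttl: 7d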
Kube-janitor is a very good tool to remove dangling resources. It would be good if it could also integrate a resource deletion notification feature. For example, if kube-janitor deletes a namespace, the user should get a notification about it.
Start with notification channels like Email and Slack. More channels can be added later.
Hi, thanks for this awesome project.
I am thinking about some notification mechanism for when a resource is going to be deleted soon.
Is this possible? Something like webhooks or callbacks to a specific endpoint. Does that make sense to you?
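A minimal sketch of what such a callback could look like, assuming a generic HTTP webhook; the payload shape and endpoint are illustrative, not an existing kube-janitor feature:

import requests

def notify_pending_deletion(webhook_url: str, kind: str, namespace: str, name: str, delete_at: str):
    """POST a small JSON payload to a webhook before a resource is deleted."""
    payload = {
        "event": "pending-deletion",
        "kind": kind,
        "namespace": namespace,
        "name": name,
        "delete_at": delete_at,  # e.g. "2019-03-23T22:24:30Z"
    }
    resp = requests.post(webhook_url, json=payload, timeout=5)
    resp.raise_for_status()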
I understand that, if you set the option include_resources, kube-janitor will only act on the resources given.
However, I've seen while debugging that even though the application is configured to work (in my case) only with namespaces, it still asks the API for all the resources and then skips them.
Wouldn't it be better to not call the API for all resources and only query the required ones? This would avoid many unneeded calls and prevent congesting the API server for no reason (see the sketch after the log below).
Log (cropped):
2019-08-09 11:01:52,197 INFO: Janitor v0.7 started with debug=True, delete_notification=None, dry_run=False, exclude_namespaces=kube-system, exclude_resources=events,controllerrevisions, include_namespaces=all, include_resources=namespaces, interval=30, once=True, rules_file=None
2019-08-09 11:01:52,200 DEBUG: Starting new HTTPS connection (1): <API_SERVER_IP>:443
2019-08-09 11:01:52,213 DEBUG: https://<API_SERVER_IP>:443 "GET /api/v1/namespaces HTTP/1.1" 200 None
2019-08-09 11:01:52,219 DEBUG: Namespace julio will expire on 2019-08-09T22:55
2019-08-09 11:01:52,219 DEBUG: Skipping Namespace kube-system
2019-08-09 11:01:52,222 DEBUG: https://<API_SERVER_IP>:443 "GET /api/v1/ HTTP/1.1" 200 None
2019-08-09 11:01:52,229 DEBUG: https://<API_SERVER_IP>:443 "GET /api/v1/configmaps HTTP/1.1" 200 None
2019-08-09 11:01:52,230 DEBUG: Skipping ConfigMap *** ... # Multiple lines
2019-08-09 11:01:52,235 DEBUG: https://<API_SERVER_IP>:443 "GET /api/v1/endpoints HTTP/1.1" 200 None
2019-08-09 11:01:52,237 DEBUG: Skipping Endpoints *** ... # Multiple lines
2019-08-09 11:01:52,241 DEBUG: https://<API_SERVER_IP>:443 "GET /api/v1/limitranges HTTP/1.1" 200 711
2019-08-09 11:01:52,241 DEBUG: Skipping LimitRange *** ... # Multiple lines
2019-08-09 11:01:52,256 DEBUG: https://<API_SERVER_IP>:443 "GET /api/v1/pods HTTP/1.1" 200 None
2019-08-09 11:01:52,279 DEBUG: Skipping Pod *** ... # Multiple lines
2019-08-09 11:01:52,299 DEBUG: https://<API_SERVER_IP>:443 "GET /api/v1/secrets HTTP/1.1" 200 None
2019-08-09 11:01:52,302 DEBUG: Skipping Secret *** ... # Multiple lines
2019-08-09 11:01:52,308 DEBUG: https://<API_SERVER_IP>:443 "GET /api/v1/serviceaccounts HTTP/1.1" 200 None
2019-08-09 11:01:52,310 DEBUG: Skipping ServiceAccount *** ... # Multiple lines
2019-08-09 11:01:52,316 DEBUG: https://<API_SERVER_IP>:443 "GET /api/v1/services HTTP/1.1" 200 None
2019-08-09 11:01:52,318 DEBUG: Skipping Service *** ... # Multiple lines
etc etc
The application is run with these options: --once --include-resources=namespaces --debug
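A hedged sketch of the suggestion referenced above: filter the discovered resource types against --include-resources before issuing any list calls (the dictionary shape follows the Kubernetes APIResourceList format; this is illustrative, not kube-janitor's internals):

def filter_resource_types(resource_types, include_resources: frozenset):
    """Only keep resource types that were explicitly included."""
    if "all" in include_resources:
        return list(resource_types)
    # each discovered entry carries a plural "name" such as "namespaces" or "deployments"
    return [t for t in resource_types if t["name"] in include_resources]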
I am trying to delete CRDs like EtcdCluster via Janitor. Specified like so:
rulesFile:
  rules:
    - id: namespace-cleanup
      resources:
        - deployments
        - services
        - etcdcluster
      jmespath: "!(spec.template.metadata.labels.janitorIgnore)"
      ttl: 12h
It does not delete the etcdcluster resource. This resource is introduced by the Etcd Operator from CoreOS. Do I have to specify CRDs in some special way?
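A hedged guess at a working variant: the rules file references the plural resource name as served by the API (assumed to be etcdclusters here), and the JMESPath needs to match fields that actually exist on the EtcdCluster object, which has no spec.template:

rulesFile:
  rules:
    - id: namespace-cleanup
      resources:
        - deployments
        - services
        - etcdclusters                              # plural resource name, as exposed by the API
      jmespath: "!(metadata.labels.janitorIgnore)"  # label on the object itself
      ttl: 12h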
Hi @hjacobs,
can we find unused deployments/resources and delete them, like Janitor Monkey does? Please comment.
It looks like this is not iterating over all namespaces.
I see this:
[kube-janitor-7cc797f987-5pgjz] 2020-05-04 00:26:44,482 INFO: Clean up run completed: resources-processed=3012
and looking over the previous logs, it only looked at resources in the namespace it's deployed in (kube-system).
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kube-janitor
    version: 20.4.0
  name: kube-janitor
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-janitor
  template:
    metadata:
      labels:
        app: kube-janitor
        version: 20.4.0
    spec:
      containers:
      - args:
        - --dry-run
        - --debug
        - --interval=60
        image: hjacobs/kube-janitor:20.4.0
        name: janitor
        resources:
          limits:
            cpu: 500m
            memory: 110Mi
          requests:
            cpu: 5m
            memory: 100Mi
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
      serviceAccountName: kube-janitor
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: kube-janitor
  name: kube-janitor
rules:
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - get
  - watch
  - list
  - delete
Not sure why
Sometimes people use the same file to deploy to different environments (live, staging, PR), as deployments to different stacks may differ only in parameter values (and parameters are taken from config maps or some other template engine).
Unfortunately, the presence of janitor/ttl forces people to create several YAML files whose only difference is the janitor/ttl annotation.
I believe the ability to set a special value (janitor/ttl: undead, for example) that blocks the janitor in specific environments would allow people to create unified deployments.
Are there any plans to add additional rule matching (e.g. whitelisting a specific instance of a resource) alongside the janitor/ttl annotation, or is the design based on the premise that if the resource is included and there is an annotation, it will be removed? Or is it the case that rule matching is only used when an annotation isn't set?
Running kube-janitor as a CronJob is probably better (it does not need to run very frequently).
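A hedged sketch of what such a CronJob could look like, using the --once flag that appears elsewhere in this issue list (schedule, image tag and other settings are illustrative):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kube-janitor
spec:
  schedule: "*/15 * * * *"          # one clean-up pass every 15 minutes
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: kube-janitor
          restartPolicy: Never
          containers:
          - name: janitor
            image: hjacobs/kube-janitor:20.4.0
            args:
            - --once                # run a single clean-up pass, then exit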
I've been having issues trying to get --include-resources to work as expected.
I tried two variants of this spec but didn't get either to work:
spec:
  containers:
  - args:
    - --debug
    - --interval=20
    - --rules-file=/config/rules.yaml
    - --include-resources=deployment

spec:
  containers:
  - args:
    - --debug
    - --interval=20
    - --rules-file=/config/rules.yaml
    - --include-resources=deployment
    - --include-namespaces=default
I ran kubectl run temp-nginx --image=nginx and annotated it with kubectl annotate deploy temp-nginx janitor/ttl=5s, but the deployment doesn't disappear after a loop run. Any ideas on what to do?
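A hedged guess, since it can't be verified from this report alone: the include list may need the plural form (deployments) to match the API resource name, e.g.:

- args:
  - --debug
  - --interval=20
  - --rules-file=/config/rules.yaml
  - --include-resources=deployments   # plural, matching the API resource name
  - --include-namespaces=default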
Running pipenv run coverage run --source=kube_janitor -m py.test, I got "No module named 'py'".
Also, it doesn't seem to really run the unit tests. Do you see the same?
An exception occurs in v0.6 when --delete-notification is set to a value > 0 and a deployment with a TTL annotation is processed. Note that everything works (events emitted, resources deleted) and the annotations are set as expected, but the exception is a bit annoying:
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:20:28,730 INFO: Deployment nginx will be deleted at 2019-03-23T22:24:30Z (annotation janitor/ttl is set)
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:20:28,756 INFO: ReplicaSet nginx-dbddb74b8 will be deleted at 2019-03-23T22:24:30Z (annotation janitor/ttl is set)
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:20:28,773 ERROR: Failed to clean up: Operation cannot be fulfilled on replicasets.extensions "nginx-dbddb74b8": the object has been modified; please apply your changes to the latest version and try again
kube-janitor-69f4484d9b-djc9m janitor Traceback (most recent call last):
kube-janitor-69f4484d9b-djc9m janitor File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 239, in raise_for_status
kube-janitor-69f4484d9b-djc9m janitor resp.raise_for_status()
kube-janitor-69f4484d9b-djc9m janitor File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
kube-janitor-69f4484d9b-djc9m janitor raise HTTPError(http_error_msg, response=self)
kube-janitor-69f4484d9b-djc9m janitor requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://10.3.0.1:443/apis/extensions/v1beta1/namespaces/default/replicasets/nginx-dbddb74b8
kube-janitor-69f4484d9b-djc9m janitor
kube-janitor-69f4484d9b-djc9m janitor During handling of the above exception, another exception occurred:
kube-janitor-69f4484d9b-djc9m janitor
kube-janitor-69f4484d9b-djc9m janitor Traceback (most recent call last):
kube-janitor-69f4484d9b-djc9m janitor File "/kube_janitor/main.py", line 51, in run_loop
kube-janitor-69f4484d9b-djc9m janitor dry_run=dry_run)
kube-janitor-69f4484d9b-djc9m janitor File "/kube_janitor/janitor.py", line 231, in clean_up
kube-janitor-69f4484d9b-djc9m janitor counter.update(handle_resource_on_ttl(resource, rules, delete_notification, dry_run))
kube-janitor-69f4484d9b-djc9m janitor File "/kube_janitor/janitor.py", line 156, in handle_resource_on_ttl
kube-janitor-69f4484d9b-djc9m janitor send_delete_notification(resource, reason, expiry_time, dry_run=dry_run)
kube-janitor-69f4484d9b-djc9m janitor File "/kube_janitor/janitor.py", line 75, in send_delete_notification
kube-janitor-69f4484d9b-djc9m janitor add_notification_flag(resource, dry_run=dry_run)
kube-janitor-69f4484d9b-djc9m janitor File "/kube_janitor/janitor.py", line 52, in add_notification_flag
kube-janitor-69f4484d9b-djc9m janitor resource.update()
kube-janitor-69f4484d9b-djc9m janitor File "/usr/local/lib/python3.7/site-packages/pykube/objects.py", line 142, in update
kube-janitor-69f4484d9b-djc9m janitor self.api.raise_for_status(r)
kube-janitor-69f4484d9b-djc9m janitor File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 246, in raise_for_status
kube-janitor-69f4484d9b-djc9m janitor raise HTTPError(resp.status_code, payload["message"])
kube-janitor-69f4484d9b-djc9m janitor pykube.exceptions.HTTPError: Operation cannot be fulfilled on replicasets.extensions "nginx-dbddb74b8": the object has been modified; please apply your changes to the latest version and try again
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:21:30,727 INFO: Clean up run completed: resources-processed=2941, deployments-with-ttl=1, replicasets-with-ttl=1
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:22:32,614 INFO: Clean up run completed: resources-processed=2941, deployments-with-ttl=1, replicasets-with-ttl=1
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:23:34,703 INFO: Clean up run completed: resources-processed=2941, deployments-with-ttl=1, replicasets-with-ttl=1
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:24:36,777 INFO: Deployment nginx with 10m TTL is 10m6s old and will be deleted (annotation janitor/ttl is set)
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:24:36,789 INFO: Deleting Deployment default/nginx..
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:24:36,808 INFO: ReplicaSet nginx-dbddb74b8 with 10m TTL is 10m6s old and will be deleted (annotation janitor/ttl is set)
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:24:36,817 INFO: Deleting ReplicaSet default/nginx-dbddb74b8..
kube-janitor-69f4484d9b-djc9m janitor 2019-03-23 22:24:36,834 INFO: Clean up run completed: resources-processed=2941, deployments-with-ttl=1, deployments-deleted=1, replicasets-with-ttl=1, replicasets-deleted=1
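A hedged sketch of one way to avoid the 409 Conflict: re-read the object and retry the annotation update once, since the ReplicaSet is modified between read and update. pykube objects expose reload() and update(); the surrounding function name is illustrative, not kube-janitor's actual fix:

from pykube.exceptions import HTTPError

def set_notification_flag_with_retry(resource, flag_annotation: str, value: str):
    """Annotate the resource; retry once on 409 Conflict with a fresh copy."""
    for attempt in range(2):
        resource.annotations[flag_annotation] = value
        try:
            resource.update()
            return
        except HTTPError as e:
            # assumes the HTTP status is available on the exception (as in the traceback above)
            if getattr(e, "code", None) == 409 and attempt == 0:
                resource.reload()  # fetch the latest resourceVersion, then retry
            else:
                raise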
I just discovered that the log message should be improved to not show "None" when deleting namespaces:
INFO: Deleting Namespace None/d-26f3uxxcro63tdvozphzn18cg6..
(from zalando-incubator/kubernetes-on-aws#2009)
This is only a cosmetic issue and does not affect functionality.
Kube-janitor seems perfect for auto-cleaning of temp. feature deploys. But... we use Helm. So it would remove the resources and leave the release data (secret objects) dangling.
Possible ways to clean up Helm releases could be:
1. Detect resources labeled heritage: Helm and release: <release-name>, and run helm uninstall <release-name>.
Some pros/cons: kube-janitor would have to run shell commands and require the helm binary, which violates the Linux mantra 'do one thing well'.
2. Detect resources labeled heritage: Helm and release: <release-name>, and also delete the Helm release data secrets, which carry the label name: <release-name> and a name like sh.helm.release.v1.<release-name>.v1.
Some pros/cons: this keeps everything inside kube-janitor, but when using Helm charts not authored yourself, you're dependent on the ability of the chart to annotate all resources. Note that this restriction doesn't seem to apply when using kube-janitor with a rules file; if I'm not mistaken, that doesn't require having annotations on each resource.
Any implementation would obviously be 'opt in' and might require some additional config options or logic, e.g. an annotation specifying the label to extract the Helm release from.
I'd like to hear your thoughts. Personally I think the 2nd approach would be a fit for kube-janitor, while the 1st approach has the risk of embedding what would become a parallel, completely new implementation.
In the coming days (given time, after all) I'll try to run some tests to see how Helm copes with possible dangling resources while the release data has been removed.
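If the 2nd approach is taken, a hedged sketch of how release data could already be targeted with a rules file today (Helm 3 stores release data in Secrets named sh.helm.release.v1.<release-name>.vN; the name prefix used for temporary releases below is illustrative):

rules:
  - id: temp-helm-release-data
    resources:
      - secrets
    # match Helm 3 release secrets belonging to temporary feature releases
    jmespath: "starts_with(metadata.name, 'sh.helm.release.v1.feature-')"
    ttl: 7d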
We could limit the memory footprint by using paged results.
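A minimal sketch of paged list calls using the Kubernetes API's standard limit/continue parameters (illustrative code, not kube-janitor's current implementation):

def list_paged(session, url: str, page_size: int = 500):
    """Yield items from a Kubernetes list endpoint one page at a time."""
    params = {"limit": page_size}
    while True:
        resp = session.get(url, params=params)
        resp.raise_for_status()
        data = resp.json()
        yield from data.get("items", [])
        cont = data.get("metadata", {}).get("continue")
        if not cont:
            return
        params = {"limit": page_size, "continue": cont}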
Hey,
Apologies, this is more of a question than an issue... I tried to get the answer from the code but Python isn't my strong suit :)
I'm thinking about kube-janitor at scale. We have 500+ namespaces, and I believe (please correct me if I'm wrong) the approach janitor takes is to iterate over them all, pulling all the resources and then inspecting the annotations, every minute.
That feels like an expensive operation, and I'm wondering if you've considered alternatives, e.g. only processing resources that carry a label such as janitor=true as well as the relevant annotation.
Cheers
Karl
Some CronJobs use Persistent Volumes which should not be deleted between CronJob runs, e.g.:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: "foobar"
spec:
  schedule: "0 23 * * *"
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            application: "foobar"
        spec:
          restartPolicy: Never
          containers:
          - name: cont
            image: "my-docker-image"
            volumeMounts:
            - mountPath: "/data"
              name: "foobar-data"
          volumes:
          - name: "foobar-data"
            persistentVolumeClaim:
              claimName: "foobar-data"
_context.pvc_is_not_referenced should be false for the PVC foobar-data in this case.
Hi there,
Would you be into the idea of putting kube-janitor up on the Helm charts repo? I'm happy to put it up for you as well, if you're OK with it.
EDIT: There is already another package using the name kube-janitor, so this package would have to be published under another name (https://github.com/helm/charts/blob/master/incubator/kube-janitor/Chart.yaml).
It would be good if developers and the Travis agent didn't have to install dependencies such as pipenv and could just use Docker.
Docker can help with dependency isolation, just as virtualenv would. Not having to install a bunch of deps makes the development environment more portable.
Some CRDs apparently are not appearing in logs, investigate.
kube-janitor/kube_janitor/janitor.py, line 243 in 29c5152:
expiry_timestamp is -1 when ttl is set to "forever", thus the condition now > expiry_timestamp evaluates to true and the pod is deleted.
kube-janitor/kube_janitor/helper.py, line 27 in 29c5152:
What if parse_ttl() returned (current time + some large number of days, maybe 365d) when ttl is set to "forever"? That way it would never be deleted.
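A hedged sketch of the reporter's suggestion (not necessarily the fix that was adopted): map "forever" to a very large TTL so the now > expiry_timestamp check cannot trigger in practice:

FOREVER_TTL_SECONDS = 365 * 24 * 60 * 60  # illustrative "large number of days" (~1 year)

def parse_ttl(ttl: str) -> int:
    """Return the TTL in seconds; 'forever' maps to a huge value instead of -1."""
    if ttl == "forever":
        return FOREVER_TTL_SECONDS
    # ... the existing unit parsing (s/m/h/d) would go here ...
    raise ValueError(f"Unsupported TTL value: {ttl}")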
The current TTL annotation (janitor/ttl) always denotes a maximum time to live counted from the time of creation of the resource. Sometimes it's desired to mark resources for deletion at arbitrary times in the future. Use case example:
Proposal: support a new annotation janitor/expires which accepts an absolute timestamp in the format YYYY-MM-DDTHH:MM:SSZ (same format as Kubernetes creationTimestamp). Resources should be deleted if their janitor/expires timestamp lies in the past.
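A hedged example of what the proposed annotation could look like on a resource (the timestamp is illustrative):

metadata:
  annotations:
    # proposed: delete this resource once the absolute timestamp has passed
    janitor/expires: "2020-07-01T12:00:00Z"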
I have a custom resource whose count across all namespaces is ~770. I ran janitor in debug mode and it prints:
But it does not delete that object. I exported the object to my docker-for-desktop cluster; it prints the same output, but this time it is deleted.
It is not about permissions, because I modified the ClusterRole and this time the output is:
Here is the last line from the main cluster:
Clean up run completed: resources-processed=3762, X-with-ttl=17, X-deleted=17, rule-require-completed-or-failed-on-not-default-matches=16
When running kube-janitor on our cluster, we got 503s when attempting to access the API group metrics/v1alpha1.
2019-08-12 05:06:00,534 DEBUG: Collecting resources for metrics/v1alpha1..
2019-08-12 05:06:00,734 DEBUG: https://172.20.0.1:443 "GET /apis/metrics/v1alpha1/ HTTP/1.1" 503 20
2019-08-12 05:06:00,734 ERROR: Failed to clean up: 503 Server Error: Service Unavailable for url: https://172.20.0.1:443/apis/metrics/v1alpha1/
Traceback (most recent call last):
File "/kube_janitor/main.py", line 51, in run_loop
dry_run=dry_run)
File "/kube_janitor/janitor.py", line 215, in clean_up
for _type in resource_types:
File "/kube_janitor/resources.py", line 39, in get_namespaced_resource_types
for api_version, resource in discover_namespaced_api_resources(api):
File "/kube_janitor/resources.py", line 32, in discover_namespaced_api_resources
r2.raise_for_status()
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://172.20.0.1:443/apis/metrics/v1alpha1/
This prevented kube-janitor from running at all on our cluster.
Is it by design supposed to fail when one API is inaccessible?
In addition (and I think this has been raised in another issue), this would also be avoided in our specific use case if we didn't iterate through all APIs but only the required ones (that, of course, doesn't address the root cause).
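A hedged sketch of the more tolerant behaviour: log and skip an API group that errors during discovery instead of aborting the whole clean-up run (function and variable names are illustrative, not kube-janitor's actual code):

import logging
import requests

def discover_group_resources(session: requests.Session, base_url: str, group_version: str):
    """Return the resource list for one API group, or [] if it is unavailable."""
    try:
        resp = session.get(f"{base_url}/apis/{group_version}/")
        resp.raise_for_status()
        return resp.json().get("resources", [])
    except requests.exceptions.HTTPError as exc:
        logging.warning("Skipping API group %s: %s", group_version, exc)
        return []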
Hello,
I've installed kube-janitor as instructed: git clone, kubectl apply -f deploy/common/, then kubectl apply -f deploy/deployment/. However, I'm getting the following error in the pod logs. I'm not sure where 10.222.0.1 is coming from, as there aren't any nodes in the cluster (including masters) with that IP.
Any ideas?
Best,
Greg
(This says 0.6, but I've also tried master.)
2019-08-06 05:39:49,669 INFO: Janitor v0.6 started with debug=True, delete_notification=None, dry_run=True, exclude_namespaces=kube-system, exclude_resources=events,controllerrevisions, include_namespaces=all, include_resources=all, interval=60, once=False, rules_file=/config/rules.yaml
2019-08-06 05:39:49,669 INFO: **DRY-RUN**: no deletions will be performed!
2019-08-06 05:39:49,688 INFO: Loaded 2 rules from file /config/rules.yaml
2019-08-06 05:39:49,697 DEBUG: Starting new HTTPS connection (1): 10.222.0.1:443
2019-08-06 05:39:49,714 DEBUG: https://10.222.0.1:443 "GET /api/v1/namespaces HTTP/1.1" 403 294
2019-08-06 05:39:49,715 ERROR: Failed to clean up: 403 Client Error: Forbidden for url: https://10.222.0.1:443/api/v1/namespaces
Traceback (most recent call last):
File "/kube_janitor/main.py", line 51, in run_loop
dry_run=dry_run)
File "/kube_janitor/janitor.py", line 201, in clean_up
for namespace in Namespace.objects(api):
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 148, in __iter__
return iter(self.query_cache["objects"])
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 138, in query_cache
cache["response"] = self.execute().json()
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 123, in execute
r.raise_for_status()
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://10.222.0.1:443/api/v1/namespaces
We have a custom resource which does not get deleted by kube-janitor.
The CRD (without schema) is defined as:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: eventstreams.zalando.org
spec:
  conversion:
    strategy: None
  group: zalando.org
  names:
    kind: EventStream
    listKind: EventStreamList
    plural: eventstreams
    shortNames:
    - fes
    singular: eventstream
  preserveUnknownFields: true
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    subresources:
      status: {}
An instance of the CRD uses a kube-janitor annotation:
# kubectl get fes my-fes -o yaml
apiVersion: zalando.org/v1alpha1
kind: EventStream
metadata:
  annotations:
    deployment-time: "2020-02-20T12:13:49Z"
    janitor/ttl: 1d
But even after the time has passed it is not deleted.
From an initial investigation, it appears that kube-janitor simply doesn't list the resources under this CRD when it is listing all resources. If it were failing to delete them, there would have been error messages in the logs.
This annotation is injected at deployment time, so the janitor should prefer using it instead of creationTime. Currently it will remove a deployment 2 days after it was created even if the users redeploy every hour, which is definitely not what anyone would expect.
If I run --include-namespaces=kube-system without explicitly changing --exclude-namespaces, kube-janitor does not overwrite the exclusion. Is this by design, or should this be changed?
This is a proposal for an additional feature, so I wanted to get feedback on the idea before starting on an implementation.
Our use of kube-janitor at Ecosia is as follows: we create a QA/review environment as a k8s namespace for every PR, and after a TTL has been reached (e.g. 7 days), we delete the namespace. We would rather automatically delete the namespace based on the PR status change (e.g. closed or merged); we've tried a number of techniques for this (CI on the merge commit, GitHub Actions, etc.) and none are particularly consistent or clean, yet.
It occurred to me the other day that for our use case, kube-janitor could have a different kind of annotation, e.g. janitor/github-pr or janitor/github-branch, that would use the GitHub API to check if the PR is open, or the branch exists, and remove the annotated resource when that condition is no longer met. In summary, if the annotation janitor/github-pr: "ecosia/example-repo/101" existed, kube-janitor would use the GitHub API to check if PR number 101 on the repo "ecosia/example-repo" was still in "open" status.
Please let me know if this is a feature you'd be willing to include and if so, I can try, sometime in the near future, to take a crack at an implementation.
Cheers!
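A minimal sketch of the proposed check against the GitHub API (illustrative only; not an existing kube-janitor feature):

import requests

def github_pr_is_open(pr_ref: str, token: str) -> bool:
    """pr_ref has the form 'owner/repo/number', e.g. 'ecosia/example-repo/101'."""
    owner, repo, number = pr_ref.split("/", 2)
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
        headers={"Authorization": f"token {token}"},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["state"] == "open"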
I'm having a problem with kube-janitor where the debug log shows success on deleting a resource:
2020-01-02 20:01:38,929 DEBUG: Rule backtest-configmaps with JMESPath "starts_with(metadata.name, 'backtest-params-')" evaluated for ConfigMap backtest/backtest-params-536746: True
2020-01-02 20:01:38,929 DEBUG: Rule backtest-configmaps applies 48h TTL to ConfigMap backtest/backtest-params-536746
2020-01-02 20:01:38,929 DEBUG: ConfigMap backtest-params-536746 with 48h TTL is 2d19h34m29s old
2020-01-02 20:01:38,929 INFO: ConfigMap backtest-params-536746 with 48h TTL is 2d19h34m29s old and will be deleted (rule backtest-configmaps matches)
2020-01-02 20:01:38,934 DEBUG: https://172.20.0.1:443 "POST /api/v1/namespaces/backtest/events HTTP/1.1" 201 867
2020-01-02 20:01:38,934 INFO: Deleting ConfigMap backtest/backtest-params-536746..
2020-01-02 20:01:38,941 DEBUG: https://172.20.0.1:443 "DELETE /api/v1/namespaces/backtest/configmaps/backtest-params-536746 HTTP/1.1" 200 None
but the resource doesn't get deleted. Any idea what might be happening?
Rules file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-janitor
  namespace: backtest
data:
  rules.yaml: |-
    rules:
      - id: backtest-jobs
        resources:
          - jobs
        jmespath: "starts_with(metadata.name, 'backtest-')"
        ttl: 48h
      - id: backtest-configmaps
        resources:
          - configmaps
        jmespath: "starts_with(metadata.name, 'backtest-params-')"
        ttl: 48h
kube-janitor will leave behind an old ReplicaSet when you edit the deployment for kube-janitor.
Steps to reproduce:
1. kubectl apply -f deploy/
2. kubectl edit deploy kube-janitor
3. kubectl get replicasets
For test or prototyping environments/clusters, it can be desirable to automatically calculate a TTL for resources based on certain rules, e.g.:
Some people run Deployments with volumes. The PVC check should consider this. It's pretty trivial to add a check for Deployments here: https://github.com/hjacobs/kube-janitor/blob/master/kube_janitor/resource_context.py#L19
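A hedged sketch of what that check could look like: collect the PVC names referenced by a Deployment's pod template and treat them as in use (illustrative, not the actual resource_context.py code):

def pvc_names_from_deployment(deployment: dict) -> set:
    """Return the names of PVCs referenced by a Deployment's pod template volumes."""
    volumes = (
        deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("volumes")
        or []
    )
    return {
        v["persistentVolumeClaim"]["claimName"]
        for v in volumes
        if "persistentVolumeClaim" in v
    }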
It looks like kube-janitor expects ClusterRole-level permissions.
However, for our least-privilege approach we cannot grant ClusterRole-level permissions.
@hjacobs, could namespace-limited access via a Role be supported instead?
2020-03-11T15:53:13.726691299Z requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://10.100.0.1:443/api/v1/namespaces
2020-03-11T15:53:23.731598165Z 2020-03-11 15:53:23,731 DEBUG: Starting new HTTPS connection (1): 10.100.0.1
2020-03-11T15:53:23.73769914Z 2020-03-11 15:53:23,737 DEBUG: https://10.100.0.1:443 "GET /api/v1/namespaces HTTP/1.1" 403 297
2020-03-11T15:53:23.738242474Z 2020-03-11 15:53:23,737 ERROR: Failed to clean up: 403 Client Error: Forbidden for url: https://10.100.0.1:443/api/v1/namespaces
2020-03-11T15:53:23.738259476Z Traceback (most recent call last):
2020-03-11T15:53:23.738264047Z File "/kube_janitor/main.py", line 66, in run_loop
2020-03-11T15:53:23.738267899Z clean_up(
2020-03-11T15:53:23.738271363Z File "/kube_janitor/janitor.py", line 279, in clean_up
2020-03-11T15:53:23.738274853Z for namespace in Namespace.objects(api):
2020-03-11T15:53:23.738278123Z File "/usr/local/lib/python3.8/site-packages/pykube/query.py", line 196, in __iter__
2020-03-11T15:53:23.738282166Z return iter(self.query_cache["objects"])
2020-03-11T15:53:23.738285887Z File "/usr/local/lib/python3.8/site-packages/pykube/query.py", line 186, in query_cache
2020-03-11T15:53:23.738297474Z cache["response"] = self.execute().json()
2020-03-11T15:53:23.738301192Z File "/usr/local/lib/python3.8/site-packages/pykube/query.py", line 161, in execute
2020-03-11T15:53:23.738304959Z r.raise_for_status()
2020-03-11T15:53:23.738308315Z File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 940, in raise_for_status
2020-03-11T15:53:23.738312089Z raise HTTPError(http_error_msg, response=self)
I have a small k3s cluster running on a couple of Raspberry Pis, and given the size of the cluster, some automatic clean-up of stuff I forget would be very helpful.
Any chance you can build a multi-arch image?
PS.
You could try out https://pypi.org/project/dockerma/ for a simple way to build multi-arch images.
Disclaimer: I'm the author of dockerma, and as far as I know I'm the only one who has used it so far.