hjacobs / kube-resource-report
Report Kubernetes cluster and pod resource requests vs usage and generate static HTML
License: GNU General Public License v3.0
Sorting by status code in descending order would make it easy to find old, broken Ingress resources.
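The sort itself could be as simple as the sketch below. It assumes each ingress row is a dict with hypothetical `host` and `status` keys, which may not match the report's actual data structure. Rows without a known status are placed last.

```python
from typing import Optional, TypedDict


class IngressRow(TypedDict):
    # Hypothetical shape of one ingress row in the report
    host: str
    status: Optional[int]  # HTTP status from the last check, None if unknown


def sort_by_status_desc(rows: list) -> list:
    """Return rows sorted by HTTP status code, highest first.

    Broken ingresses (5xx, 4xx) come first; rows without a status go last.
    """
    return sorted(
        rows,
        key=lambda row: -1 if row["status"] is None else row["status"],
        reverse=True,
    )


rows = [
    IngressRow(host="ok.example.org", status=200),
    IngressRow(host="unknown.example.org", status=None),
    IngressRow(host="broken.example.org", status=502),
    IngressRow(host="gone.example.org", status=404),
]
for row in sort_by_status_desc(rows):
    print(row["status"], row["host"])
```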
Switch from a flat output directory to a more organized structure, e.g. with "clusters" and "teams" folders.
Looking at the CPU and memory usage of some of our applications, our first question was how the usage is calculated. Is it a min/max/avg per day/week/month?
We have an application that consists only of a CronJob. It shows up with 0 CPU and memory usage, even though it should use quite a lot of memory while it is running.
Users might want to exclude certain clusters from the report (e.g. because the report generator does not care about them).
Proposed CLI option (tentative): --exclude-clusters=REGEX_PATTERNS
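A sketch of how the proposed option could be applied, assuming its value is a comma-separated list of regular expressions matched against the full cluster ID. The helper names are hypothetical.

```python
import re


def parse_patterns(option_value: str) -> list:
    """Compile a comma-separated list of regex patterns, e.g. 'test-.*,dev-.*'."""
    return [re.compile(p.strip()) for p in option_value.split(",") if p.strip()]


def filter_clusters(cluster_ids: list, exclude: list) -> list:
    """Drop every cluster whose ID fully matches one of the exclude patterns."""
    return [
        cid for cid in cluster_ids
        if not any(pattern.fullmatch(cid) for pattern in exclude)
    ]


patterns = parse_patterns("test-.*, sandbox")
print(filter_clusters(["prod-eu", "test-eu", "sandbox", "sandbox-2"], patterns))
# ['prod-eu', 'sandbox-2']
```

Using `fullmatch` means a pattern like `sandbox` does not accidentally exclude `sandbox-2`; users who want prefix matching can write `sandbox.*` explicitly.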
Provide Docker image and instructions on how to run kube-resource-report on Kubernetes incl. automatic report generation and web server.
Heapster is deprecated; see also hjacobs/kube-ops-view#168.
Separate the resources of the Kubernetes kube-system namespace to see the "infrastructure" costs and resource consumption, in order to understand whether more or fewer nodes would be beneficial, and by how much.
There are several assumptions in the code that the kubernetes.io/role label is set to worker. kubeadm, for example, uses the word node to denote the node role:
k get nodes -o json | jq '.items[].metadata.labels."kubernetes.io/role"'
"master"
"master"
"node"
"node"
"node"
"node"
"master"
"node"
"node"
"node"
With 0.9, I only get metrics for pods in the namespace where kube-resource-report is deployed; I did not have this issue with 0.8.
The official Kubernetes client is too big and only a tiny fraction of it is used. kelproject/pykube is abandoned, so I forked it to https://github.com/hjacobs/pykube.
We see errors like this in our access logs:
[09/Jul/2019:06:34:24 +0000] "GET / HTTP/1.1" 502 12 "-" "kube-resource-report/0.13" 1017 someapp-pr-85-1.$domain - -'
If kube-resource-report were aware that a stack had been downscaled, it would not need to query it. The same applies in general to old Ingress resources that were never cleaned up.
There should be a simple way to add custom links for resources (clusters, applications, teams) to link to external systems, e.g. monitoring dashboards or similar.
Each link definition should have:
A template such as https://mon.example.org/clusters/{id} would generate a link to https://mon.example.org/clusters/123 for the cluster with ID 123.
TBD
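A minimal sketch of rendering such templates with Python's `str.format`. The definition fields (`resource`, `label`, `href`) are hypothetical, since the actual list of fields is still to be decided, and `{id}` is assumed to be the only placeholder.

```python
from dataclasses import dataclass


@dataclass
class LinkDefinition:
    # Hypothetical fields; the final definition format is TBD
    resource: str  # e.g. "cluster", "application", "team"
    label: str     # link text shown in the report
    href: str      # URL template with an {id} placeholder


def render_links(definitions: list, resource: str, resource_id: str) -> list:
    """Return (label, url) pairs for all links defined for the given resource type."""
    return [
        (d.label, d.href.format(id=resource_id))
        for d in definitions
        if d.resource == resource
    ]


links = [
    LinkDefinition("cluster", "Monitoring", "https://mon.example.org/clusters/{id}"),
    LinkDefinition("team", "Team page", "https://teams.example.org/{id}"),
]
print(render_links(links, "cluster", "123"))
# [('Monitoring', 'https://mon.example.org/clusters/123')]
```

Note that `str.format` treats any literal `{` or `}` in a URL as a placeholder, so those would need to be doubled (`{{`, `}}`) in the template.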
Each application-*.html page is around 800 KB, mainly because of whitespace generated by the template loop.
The script does not currently remove HTML files which are no longer generated (e.g. if a cluster/team/application was shut down and no longer exists).
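One possible approach, sketched under the assumption that the script can collect the names of all files it wrote during the current run: afterwards, delete every `*.html` file in the output directory that is not in that set. The function name is hypothetical.

```python
import tempfile
from pathlib import Path


def remove_stale_html(output_dir: Path, generated: set) -> list:
    """Delete *.html files in output_dir that were not written in this run.

    `generated` holds the file names (not paths) produced by the current run.
    Returns the sorted names of the deleted files.
    """
    removed = []
    for path in output_dir.glob("*.html"):
        if path.name not in generated:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for name in ["index.html", "cluster-a.html", "cluster-old.html", "style.css"]:
        (out / name).write_text("<html></html>")
    removed = remove_stale_html(out, {"index.html", "cluster-a.html"})
    remaining = sorted(p.name for p in out.iterdir())
print(removed)    # ['cluster-old.html']
print(remaining)  # ['cluster-a.html', 'index.html', 'style.css']
```

Non-HTML assets are left alone. With a nested output structure, `rglob("*.html")` and relative paths instead of bare names would be needed.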
There are some "hidden" costs which are not accounted for right now. The script should have an option to specify these fixed per-cluster costs, such as etcd nodes, load balancers, etc.
I'm trying to run the Docker command on an EKS cluster and get the following error (kubectl proxy is already running):
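A sketch of folding such fixed costs into the cluster total, assuming they are supplied as a mapping of cost item to monthly USD amount (e.g. parsed from a hypothetical CLI option). The amounts below are illustrative only.

```python
def cluster_monthly_cost(node_costs: list, fixed_costs: dict) -> float:
    """Total monthly cost = sum of node costs + fixed per-cluster costs (USD)."""
    return sum(node_costs) + sum(fixed_costs.values())


fixed = {"etcd-nodes": 3 * 70.0, "load-balancers": 2 * 18.0}  # illustrative numbers only
print(cluster_monthly_cost([100.0, 100.0, 50.0], fixed))  # 496.0
```

Whether these fixed costs should also be distributed to teams and applications (e.g. proportionally to their requests) is a separate design decision.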
eks $ docker run -it --user=$(id -u) --net=host -v $(pwd)/output:/output hjacobs/kube-resource-report:0.2.1 /output
INFO:kube_resource_report.report:Querying cluster localhost:8001 (http://localhost:8001/)..
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:8001
ERROR:kube_resource_report.report:HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8257213c18>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 171, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 79, in create_connection
raise err
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 69, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.7/http/client.py", line 1229, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.7/http/client.py", line 1275, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.7/http/client.py", line 1224, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.7/http/client.py", line 1016, in _send_output
self.send(msg)
File "/usr/local/lib/python3.7/http/client.py", line 956, in send
self.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 196, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 180, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f8257213c18>: Failed to establish a new connection: [Errno 111] Connection refused
Any ideas?
The AWS pricing information is currently hardcoded and only eu-central-1 is supported. Using the AWS Pricing API would automatically support all AWS EC2 instance types and regions: http://boto3.readthedocs.io/en/latest/reference/services/pricing.html
Currently, we can only pass one node label value, but it should be possible to pass a list of values (like for system-namespaces).
@click.option(
"--node-label",
help="Value for the kubernetes.io/role label (e.g. 'worker' if nodes are labeled kubernetes.io/role=worker)",
default="worker",
)
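click can accept a repeatable option with `multiple=True`, in which case the values arrive as a tuple (e.g. `--node-label=worker --node-label=node`). Whichever parsing is chosen, node selection then becomes a membership test. A minimal sketch, assuming nodes are plain dicts as returned by the Kubernetes API; the function name `worker_nodes` is hypothetical.

```python
NODE_LABEL_ROLE = "kubernetes.io/role"


def worker_nodes(nodes: list, role_values: tuple) -> list:
    """Return names of nodes whose kubernetes.io/role label is one of role_values."""
    result = []
    for node in nodes:
        labels = node["metadata"].get("labels", {})
        if labels.get(NODE_LABEL_ROLE) in role_values:
            result.append(node["metadata"]["name"])
    return result


nodes = [
    {"metadata": {"name": "m1", "labels": {"kubernetes.io/role": "master"}}},
    {"metadata": {"name": "n1", "labels": {"kubernetes.io/role": "node"}}},
    {"metadata": {"name": "w1", "labels": {"kubernetes.io/role": "worker"}}},
    {"metadata": {"name": "x1", "labels": {}}},
]
print(worker_nodes(nodes, ("worker", "node")))  # ['n1', 'w1']
```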
The Docker image is ~141 MiB. These are the largest Python packages (du output in KiB):
du -cs usr/local/lib/python3.7/site-packages/* | sort -n | tail
916 usr/local/lib/python3.7/site-packages/jinja2
924 usr/local/lib/python3.7/site-packages/chardet
924 usr/local/lib/python3.7/site-packages/pkg_resources
1012 usr/local/lib/python3.7/site-packages/oauthlib
1380 usr/local/lib/python3.7/site-packages/setuptools
1892 usr/local/lib/python3.7/site-packages/virtualenv_support
7548 usr/local/lib/python3.7/site-packages/pip
19164 usr/local/lib/python3.7/site-packages/kubernetes
21352 usr/local/lib/python3.7/site-packages/pipenv
64272 total
Getting rid of pipenv would not be a problem, but the Kubernetes client alone is also ~19 MiB.
I get the following error when using kubectl proxy:
INFO:kube_resource_report.report:Querying cluster localhost:8001 (http://localhost:8001/)..
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:8001
ERROR:kube_resource_report.report:HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f9ef4bc50>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 171, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 79, in create_connection
raise err
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 69, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
The Kubernetes API allows listing all persistent volumes; knowing the cloud provider, we can estimate their costs (e.g. of EBS volumes).
I am trying to generate reports for multiple clusters from a single cluster. Can anyone help me with this?
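A rough sketch of such an estimate, assuming a flat price per GiB-month. The price below is a placeholder, not an actual AWS price; a real implementation would look it up per region and volume type. Only the binary suffixes Mi/Gi/Ti of Kubernetes quantities are handled here.

```python
# Placeholder price in USD per GiB-month; look up the real value for your region/volume type
PRICE_PER_GIB_MONTH = 0.10

# Binary suffixes as used in PV capacities such as "100Gi"
_SUFFIX_TO_GIB = {"Mi": 1 / 1024, "Gi": 1, "Ti": 1024}


def capacity_to_gib(capacity: str) -> float:
    """Convert a quantity like '100Gi' or '1Ti' to GiB (binary suffixes only)."""
    for suffix, factor in _SUFFIX_TO_GIB.items():
        if capacity.endswith(suffix):
            return float(capacity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {capacity!r}")


def pv_monthly_cost(pvs: list) -> float:
    """Sum the estimated monthly cost of PersistentVolume objects (plain dicts)."""
    return sum(
        capacity_to_gib(pv["spec"]["capacity"]["storage"]) * PRICE_PER_GIB_MONTH
        for pv in pvs
    )


pvs = [
    {"spec": {"capacity": {"storage": "100Gi"}}},
    {"spec": {"capacity": {"storage": "1Ti"}}},
]
print(round(pv_monthly_cost(pvs), 2))  # 112.4
```

Decimal suffixes (e.g. `G`) raise a `ValueError` here instead of being silently mispriced.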
It was noted that pending pods show up in the application view of the resource report.
Not sure how to do it, but resource limits are currently not considered at all and might help in analysing resource usage.
Related to #5: make the pricing information (cost per node) pluggable to also work for other cloud providers (e.g. GKE) and on-premise.
Reported via private email:
I deployed kube-resource-report inside my Kubernetes cluster. I'm getting the error "Failed to query cluster x-x-0-1:443: 'host'". There's no other indication of what's wrong. The IP is correct for my API server. I've set up RBAC as needed, I can hit the pod via port-forwarding, but it's not able to talk to my Kubernetes server. I don't suppose you have any ideas about what might be causing this? Am I misusing your app (as in it's not meant to be deployed this way)?
Currently, we have this:
# TODO: this should be configurable
NODE_LABEL_SPOT = "aws.amazon.com/spot"
NODE_LABEL_ROLE = "kubernetes.io/role"
# the following labels are used by both AWS and GKE
NODE_LABEL_REGION = "failure-domain.beta.kubernetes.io/region"
NODE_LABEL_INSTANCE_TYPE = "beta.kubernetes.io/instance-type"
But some people use different labels.
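One way to make these configurable, sketched with environment variables whose `KRR_NODE_LABEL_*` names are hypothetical (a CLI option would work the same way). The defaults are the currently hardcoded values.

```python
import os
from typing import Mapping, Optional

# Defaults are the currently hardcoded labels; the KRR_* variable names are hypothetical.
DEFAULT_NODE_LABELS = {
    "spot": "aws.amazon.com/spot",
    "role": "kubernetes.io/role",
    "region": "failure-domain.beta.kubernetes.io/region",
    "instance_type": "beta.kubernetes.io/instance-type",
}


def node_labels_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the node label keys, each overridden by KRR_NODE_LABEL_<NAME> if set."""
    env = os.environ if env is None else env
    return {
        name: env.get(f"KRR_NODE_LABEL_{name.upper()}", default)
        for name, default in DEFAULT_NODE_LABELS.items()
    }


labels = node_labels_from_env({"KRR_NODE_LABEL_REGION": "topology.kubernetes.io/region"})
print(labels["region"])  # topology.kubernetes.io/region
print(labels["role"])    # kubernetes.io/role
```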
You can potentially save X USD every month by optimizing resource requests and reducing slack.
How? It would be good to have a link there that suggests, if not specific actions, then at least the general options available.
We currently rely on snapshot metrics from the Metrics API (Heapster), which only has data for the last minute. Having some historic data would be desirable to compensate for volatile usage patterns (e.g. one pod could spike in the minute we observe it, or vice versa, i.e. it might usually use more resources).
In the team view (applications list), it could be useful to add another column showing what percentage of the monthly cost each application consumes.
Assume a certain structure of pod labels to identify resource owners. The application or app label should point to a valid application ID. An optional application registry (REST service) can provide additional information about the owning team.
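A minimal sketch of keeping a short history per pod instead of a single snapshot, assuming the script polls the Metrics API periodically. The class name and window size are hypothetical. Reporting both the mean and the peak over the window smooths out one-minute spikes while still showing them.

```python
from collections import defaultdict, deque


class UsageHistory:
    """Keep the last `window` CPU usage samples (in cores) per pod."""

    def __init__(self, window: int = 60):
        # Each pod gets a bounded deque; old samples are dropped automatically
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def add(self, pod: str, cpu_cores: float) -> None:
        self.samples[pod].append(cpu_cores)

    def mean(self, pod: str) -> float:
        values = self.samples[pod]
        return sum(values) / len(values) if values else 0.0

    def peak(self, pod: str) -> float:
        return max(self.samples[pod], default=0.0)


history = UsageHistory(window=3)
for value in [0.1, 0.2, 0.9, 0.1]:  # the oldest sample (0.1) drops out of the window
    history.add("default/web-1", value)
print(round(history.mean("default/web-1"), 2), history.peak("default/web-1"))  # 0.4 0.9
```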
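The new column is a simple ratio of each application's cost to the total. A sketch, assuming the team view already has a monthly cost per application (the application names are illustrative):

```python
def cost_shares(app_costs: dict) -> dict:
    """Return each application's share of the total monthly cost, in percent."""
    total = sum(app_costs.values())
    if total == 0:
        return {app: 0.0 for app in app_costs}
    return {app: 100 * cost / total for app, cost in app_costs.items()}


shares = cost_shares({"checkout": 300.0, "search": 150.0, "cron-job": 50.0})
print({app: round(pct, 1) for app, pct in shares.items()})
# {'checkout': 60.0, 'search': 30.0, 'cron-job': 10.0}
```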
or app
label should point to a valid application ID. An optional application registry (REST service) can provide additional information about the owning team.
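A sketch of the owner lookup, assuming `application` takes precedence over `app` when both are set. A plain dict stands in for the optional registry REST service; its contents are hypothetical.

```python
from typing import Optional


def application_id(pod_labels: dict) -> Optional[str]:
    """Return the application ID from the 'application' or 'app' label, if any."""
    return pod_labels.get("application") or pod_labels.get("app")


# Stand-in for the optional application registry (REST service): app ID -> team
REGISTRY = {"shop-checkout": "team-payments"}


def owning_team(pod_labels: dict) -> Optional[str]:
    """Resolve the owning team via the registry, or None if unknown."""
    app = application_id(pod_labels)
    return REGISTRY.get(app) if app else None


print(application_id({"app": "shop-checkout"}))                   # shop-checkout
print(owning_team({"application": "shop-checkout", "app": "x"}))  # team-payments
print(owning_team({"team": "foo"}))                                # None
```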
Provide an example of how to deploy the reporting script into a Kubernetes cluster, e.g. via CronJob + nginx for serving the static HTML.
The report should support a pricing model per unit, e.g. like AWS Fargate:
Price per vCPU is $0.00001406 per second ($0.0506 per hour) and per GB memory is $0.00000353 per second ($0.0127 per hour).
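With the per-hour prices quoted above, a pod's monthly cost under this model is (vCPU × $0.0506 + GB × $0.0127) per hour times the hours in a month. A minimal sketch, assuming an average month of 730 hours (8760 / 12) and that the pod's resource requests are what gets billed:

```python
HOURS_PER_MONTH = 24 * 365 / 12  # 730.0, an average month (assumption)
PRICE_PER_VCPU_HOUR = 0.0506     # USD, from the Fargate example above
PRICE_PER_GB_HOUR = 0.0127       # USD, from the Fargate example above


def monthly_cost(cpu_request: float, memory_request_gb: float) -> float:
    """Monthly cost of one pod priced per unit of requested vCPU and memory."""
    hourly = cpu_request * PRICE_PER_VCPU_HOUR + memory_request_gb * PRICE_PER_GB_HOUR
    return hourly * HOURS_PER_MONTH


print(round(monthly_cost(0.5, 1.0), 2))  # 27.74
```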
Add a top-level JSON file with overall statistics/KPIs that a monitoring system can scrape to track the KPIs over time.
Why not create a Helm chart? It would be very useful for the many people deploying it.
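A minimal sketch, assuming a hypothetical file name `kpis.json` and illustrative field names; the input values would come from the report's aggregated per-cluster data.

```python
import json
import tempfile
from pathlib import Path


def write_kpis(output_dir: Path, clusters: list) -> Path:
    """Write overall KPIs across all clusters to output_dir/kpis.json."""
    kpis = {
        "clusters": len(clusters),
        "monthly_cost_usd": round(sum(c["cost"] for c in clusters), 2),
        "slack_cost_usd": round(sum(c["slack_cost"] for c in clusters), 2),
    }
    path = output_dir / "kpis.json"
    path.write_text(json.dumps(kpis, indent=2, sort_keys=True))
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_kpis(Path(tmp), [
        {"cost": 1000.0, "slack_cost": 250.0},
        {"cost": 500.0, "slack_cost": 100.0},
    ])
    loaded = json.loads(path.read_text())
print(loaded)  # {'clusters': 2, 'monthly_cost_usd': 1500.0, 'slack_cost_usd': 350.0}
```

A flat structure with stable keys keeps the file easy to scrape; per-cluster or per-team breakdowns could be added as nested objects later.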
- apiVersion: v1
  kind: Node
  metadata:
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/instance-type: s-1vcpu-1gb
      beta.kubernetes.io/os: linux
      failure-domain.beta.kubernetes.io/region: fra1
      kubernetes.io/hostname: flamboyant-grothendieck-ie1
      region: fra1
The "system" (infrastructure) namespaces are currently hardcoded to kube-system and visibility, but cluster operators might have a different Kubernetes setup and deploy infrastructure components to other namespaces. Add a CLI option to make the namespaces configurable.
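A sketch of such an option, assuming it accepts a comma-separated list whose default is the currently hardcoded kube-system,visibility. The helper names are hypothetical. The parsed set can then be used to split resource requests into infrastructure and application buckets.

```python
DEFAULT_SYSTEM_NAMESPACES = "kube-system,visibility"


def parse_namespaces(value: str = DEFAULT_SYSTEM_NAMESPACES) -> frozenset:
    """Parse a comma-separated namespace list, ignoring blanks and whitespace."""
    return frozenset(ns.strip() for ns in value.split(",") if ns.strip())


def split_by_system(pod_cpu: dict, system_namespaces: frozenset) -> dict:
    """Sum CPU requests into 'system' vs 'application' buckets by namespace.

    `pod_cpu` maps (namespace, pod name) to requested CPU cores.
    """
    totals = {"system": 0.0, "application": 0.0}
    for (namespace, _pod), cpu in pod_cpu.items():
        bucket = "system" if namespace in system_namespaces else "application"
        totals[bucket] += cpu
    return totals


system = parse_namespaces("kube-system, monitoring ,")
requests = {
    ("kube-system", "coredns-1"): 0.1,
    ("monitoring", "prometheus-0"): 1.0,
    ("shop", "checkout-1"): 0.5,
}
print(sorted(system))                     # ['kube-system', 'monitoring']
print(split_by_system(requests, system))  # {'system': 1.1, 'application': 0.5}
```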
The first run might take some time, and content pages are generated before the assets (CSS, etc.) are copied, so the layout might look broken.
application-metrics.json only provides a summary (e.g. number of pods); we should also expose all details (e.g. pod details) per application via JSON.
The OAuth token is not passed correctly; this seems to be related to #28.
The generated HTML is not responsive, e.g. no menu is shown, which makes it impossible to navigate anywhere.
Prices for nodes running on EC2 Spot are dynamic and need to be taken into consideration for the cluster/application cost calculations.
Application developers and service owners might see the requests that check the Ingress status and wonder what causes them. Setting an appropriate User-Agent header helps identify kube-resource-report as the source of those requests.
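A minimal sketch using only the standard library, assuming a kube-resource-report/<version> format like the one visible in the access log above; `VERSION` is a placeholder. With requests, the same header would be set via the session's `headers` dict.

```python
import urllib.request

VERSION = "0.13"  # placeholder; the real value would come from the package version
USER_AGENT = f"kube-resource-report/{VERSION}"


def build_ingress_check(url: str) -> urllib.request.Request:
    """Build a GET request for an Ingress status check with an identifying User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT}, method="GET")


req = build_ingress_check("https://someapp.example.org/")
# urllib normalizes header names with str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))  # kube-resource-report/0.13
# urllib.request.urlopen(req, timeout=5) would then send the request
```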
I have a $KUBECONFIG that contains several paths, and I can't find a way to make the tool read one of them that is not ~/.kube/config.
It would be nice if kube-resource-report read all available configurations from $KUBECONFIG, as kubectl does.
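kubectl treats $KUBECONFIG as a list of paths separated by the OS path-list separator (`:` on Linux/macOS, `;` on Windows). A sketch of the first step, discovering which files to read, is shown below. The function name is hypothetical, and how the contexts in those files are then merged and mapped to clusters is left open.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def kubeconfig_paths(env: Optional[Mapping[str, str]] = None) -> list:
    """Return kubeconfig files in $KUBECONFIG order, falling back to ~/.kube/config."""
    env = os.environ if env is None else env
    value = env.get("KUBECONFIG", "")
    # Empty entries (e.g. from "a::b") are skipped
    paths = [Path(p).expanduser() for p in value.split(os.pathsep) if p]
    # Drop duplicates while keeping the first occurrence
    unique = list(dict.fromkeys(paths))
    return unique or [Path("~/.kube/config").expanduser()]


print(kubeconfig_paths({"KUBECONFIG": os.pathsep.join(["/etc/kube/prod.yaml", "/etc/kube/staging.yaml"])}))
```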