hjacobs / kube-resource-report


Report Kubernetes cluster and pod resource requests vs usage and generate static HTML

License: GNU General Public License v3.0

Python 42.46% HTML 54.19% CSS 1.90% Shell 0.22% Dockerfile 0.20% Makefile 0.38% JavaScript 0.65%

kubernetes kubernetes-resources kubernetes-cluster

kube-resource-report's Introduction

Moved to https://codeberg.org/hjacobs/kube-resource-report

kube-resource-report's Issues

sort ingress by status code

Sorting by status code in descending order would make it easy to find old, broken ingress resources.
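A minimal sketch of such a sort, assuming each ingress row is a dict with the hypothetical keys `host` and `status` (the real report data may be shaped differently):

```python
def sort_by_status_desc(ingresses):
    """Return ingress rows sorted by HTTP status code, highest first.

    Rows without a status (e.g. never probed) are placed last.
    The 'status' key is an assumption made for this sketch.
    """
    def key(row):
        status = row.get("status")
        # (has_status, status) so that missing statuses sort below real ones
        return (status is not None, status or 0)

    return sorted(ingresses, key=key, reverse=True)
```

With this ordering, 5xx responses from broken ingresses end up at the top of the table.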

Charts legend

It would be great to have a legend, either on mouse hover or somewhere on the page.
It is not clear what the colors represent.

Clean output directory structure

Switch from flat output directory to a more organized structure, e.g. having "clusters" and "teams" folders.

CPU/Memory usage calculation and time frame

First question looking at the cpu and memory usage of some of our applications was how the usage is calculated. Is this a min/max/avg per day/week/month?

We have an application that consists of just a cronjob and shows up as having 0 CPU and memory usage, even though it should use quite a lot of memory when it's running.

Allow filtering out clusters (include/exclude)

Users might want to exclude certain clusters from the report (e.g. because the report generator does not care about them).

Proposed CLI option: --exclude-clusters=REGEX_PATTERNS (?)
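A rough sketch of how the proposed filtering could behave, assuming cluster IDs are plain strings and patterns must match the whole ID. The function name and the extra include parameter are hypothetical, not existing options:

```python
import re


def filter_clusters(cluster_ids, include_patterns=None, exclude_patterns=None):
    """Keep cluster IDs that match any include pattern (if given)
    and do not match any exclude pattern."""
    include = [re.compile(p) for p in (include_patterns or [])]
    exclude = [re.compile(p) for p in (exclude_patterns or [])]
    result = []
    for cluster_id in cluster_ids:
        if include and not any(p.fullmatch(cluster_id) for p in include):
            continue
        if any(p.fullmatch(cluster_id) for p in exclude):
            continue
        result.append(cluster_id)
    return result
```

For example, `filter_clusters(["prod-1", "test-1"], exclude_patterns=["test-.*"])` would keep only `prod-1`.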

Automate report generation and serving files

Provide Docker image and instructions on how to run kube-resource-report on Kubernetes incl. automatic report generation and web server.

Add support for passing environment variables in helm chart

Replace Heapster with Metrics Server

Heapster is deprecated, see also hjacobs/kube-ops-view#168

Separate the Kubernetes kube-system namespace resources

Separate the Kubernetes kube-system namespace resources to see the "infrastructure" costs and resource consumption. This helps to understand whether more or fewer nodes would be beneficial, and by how much.

Usage of "worker" as hardcoded worker name breaks some functionality

There's a bunch of assumptions in the code that the kubernetes.io/role label will be set to worker

kubeadm for example uses the word node to denote the node role:

k get nodes -o json | jq '.items[].metadata.labels."kubernetes.io/role"'
"master"
"master"
"node"
"node"
"node"
"node"
"master"
"node"
"node"
"node"

Only metrics of current namespace

With 0.9 I only get metrics for pods in the namespace where kube-resource-report is deployed; 0.8 does not have this issue.

Switch to pykube-ng as Kubernetes client

The official Kubernetes client is too big and only a tiny fraction of it is used, kelproject/pykube is abandoned, so I forked it to https://github.com/hjacobs/pykube

Problem

We get errors like this in our access logs:

[09/Jul/2019:06:34:24 +0000] "GET / HTTP/1.1" 502 12 "-" "kube-resource-report/0.13" 1017 someapp-pr-85-1.$domain - -'

Possible solution

If kube-resource-report were aware that a stack had been downscaled, it would not need to query it. The same is true in general for old ingresses that were never cleaned up.

Allow custom links to other systems (monitoring tools etc)

There should be a simple way to add custom links for resources (clusters, applications, teams) to link to external systems, e.g. monitoring dashboards or similar.

Each link definition should have:

href/URL with templating (using Python string formatting), e.g. https://mon.example.org/clusters/{id} would generate a link to https://mon.example.org/clusters/123 for the cluster with ID 123
title (tooltip)
icon (Fontawesome icon name)
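The link definition described above could look roughly like this. The class and method names are hypothetical, and URL-quoting the values is an extra assumption beyond what the issue asks for:

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class LinkDefinition:
    href: str   # URL template, e.g. "https://mon.example.org/clusters/{id}"
    title: str  # tooltip text
    icon: str   # Fontawesome icon name

    def render(self, **fields):
        """Fill the URL template via Python string formatting."""
        # Quote values so that IDs with special characters stay valid in a URL
        safe = {k: quote(str(v), safe="") for k, v in fields.items()}
        return {
            "href": self.href.format(**safe),
            "title": self.title,
            "icon": self.icon,
        }
```

Rendering the template `https://mon.example.org/clusters/{id}` with `id=123` yields the link from the example above.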

Generated application HTML pages are huge because of whitespace

Each application-*.html page is around 800K, mainly because of whitespace generated by the template loop 😞

Clean up (remove) stale HTML files

The script does not currently remove HTML files which are no longer generated (e.g. if a cluster/team/application was shut down and no longer exists).
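One possible approach, sketched under the assumption that the generator can collect the paths it wrote during the current run (the function name is hypothetical, and only the flat top-level output directory is considered):

```python
from pathlib import Path


def remove_stale_html(output_dir, written_files):
    """Delete *.html files in output_dir that were not written in this run.

    Returns the names of the removed files.
    """
    keep = {Path(p).resolve() for p in written_files}
    removed = []
    for path in sorted(Path(output_dir).glob("*.html")):
        if path.resolve() not in keep:
            path.unlink()
            removed.append(path.name)
    return removed
```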

Allow specifying additional fixed cluster costs (e.g. etcd nodes, ELBs, ..)

There are some "hidden" costs which are not accounted for right now. The script should have an option to specify these fixed per-cluster costs, such as etcd nodes, load balancers, etc.

Cannot generate report with Amazon EKS

Trying to run the Docker command on an EKS cluster and getting the following error. kubectl proxy is already running:

eks $ docker run -it --user=$(id -u) --net=host -v $(pwd)/output:/output hjacobs/kube-resource-report:0.2.1 /output
INFO:kube_resource_report.report:Querying cluster localhost:8001 (http://localhost:8001/)..
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:8001
ERROR:kube_resource_report.report:HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8257213c18>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 171, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 79, in create_connection
    raise err
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 69, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 196, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 180, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f8257213c18>: Failed to establish a new connection: [Errno 111] Connection refused

Any ideas?

The AWS pricing information is currently hardcoded and only eu-central-1 is supported. Using the AWS pricing API will automatically support all AWS EC2 instances and regions: http://boto3.readthedocs.io/en/latest/reference/services/pricing.html

Pass more node labels than one

Currently, we can only pass one node label, but it should be possible to pass a list of labels (like for system-namespaces).

@click.option(
    "--node-label",
    help="Value for the kubernetes.io/role label (e.g. 'worker' if nodes are labeled kubernetes.io/role=worker)",
    default="worker",
)
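A minimal sketch of accepting several role values, assuming they are passed as one comma-separated string (similar in spirit to a list option). The helper names are hypothetical and not part of the current code:

```python
def parse_node_labels(value):
    """Split a comma-separated option value into a set of role names."""
    return frozenset(part.strip() for part in value.split(",") if part.strip())


def is_worker_node(node_labels, accepted_roles, role_label="kubernetes.io/role"):
    """Check whether the node's role label has one of the accepted values."""
    return node_labels.get(role_label) in accepted_roles
```

Passing `worker,node` would then cover both naming conventions mentioned in the issues.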

Reduce Docker image size

The Docker image is ~141 MiB; here are the largest Python packages:

du -cs usr/local/lib/python3.7/site-packages/* | sort -n | tail
916	usr/local/lib/python3.7/site-packages/jinja2
924	usr/local/lib/python3.7/site-packages/chardet
924	usr/local/lib/python3.7/site-packages/pkg_resources
1012	usr/local/lib/python3.7/site-packages/oauthlib
1380	usr/local/lib/python3.7/site-packages/setuptools
1892	usr/local/lib/python3.7/site-packages/virtualenv_support
7548	usr/local/lib/python3.7/site-packages/pip
19164	usr/local/lib/python3.7/site-packages/kubernetes
21352	usr/local/lib/python3.7/site-packages/pipenv
64272	total

It would not be a problem to get rid of pipenv, but the Kubernetes client alone is also 19 MiB 😞

issue using kubectl proxy

Getting the following error when using kubectl proxy:

INFO:kube_resource_report.report:Querying cluster localhost:8001 (http://localhost:8001/)..
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:8001
ERROR:kube_resource_report.report:HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f9ef4bc50>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 171, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 79, in create_connection
    raise err
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 69, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Consider costs of persistent volumes (PV, e.g. EBS)

The Kubernetes API allows listing all persistent volumes, and knowing the cloud provider we can estimate their costs, e.g. for EBS volumes.

Widen output columns

Cluster CPU and Mem values are not readable at current width (Cluster tab):

Some pods display poorly on the pods page:

How to configure multiple clusters in a single report

I am trying to create a single report that covers multiple clusters. Can anyone help me with this?

Pending pods show up in application view

It was noted that pending pods show up in the application view of the resource report.

Consider resource limits

Not sure how to do it, but resource limits are currently not considered at all and might help in analysing resource usage.

Pluggable pricing information

Related to #5: make the pricing information (cost per node) pluggable to also work for other cloud providers (e.g. GKE) and on-premise.
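One way to make pricing pluggable is a small provider interface. This is only a sketch: the `PricingProvider` and `StaticPricing` names are hypothetical, and the price used in the usage note is a placeholder, not real pricing data:

```python
from typing import Optional, Protocol


class PricingProvider(Protocol):
    def node_cost_per_hour(self, region: str, instance_type: str) -> Optional[float]:
        """Return the hourly node cost, or None if unknown."""
        ...


class StaticPricing:
    """Pricing from a fixed table, e.g. for on-premise clusters."""

    def __init__(self, prices):
        # prices maps (region, instance_type) -> cost per hour
        self._prices = dict(prices)

    def node_cost_per_hour(self, region, instance_type):
        return self._prices.get((region, instance_type))
```

A provider for another cloud would implement the same method, for instance `StaticPricing({("on-prem", "standard"): 0.10})` for a flat placeholder rate.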

Failed to query cluster x-x-0-1:443: 'host'

Reported via private email:

I deployed kube-resource-report inside my Kubernetes cluster. I'm getting the error "Failed to query cluster x-x-0-1:443: 'host'". There's no other indication of what's wrong. The IP is correct for my API server. I've set up RBAC as needed, I can hit the pod via port-forwarding, but it's not able to talk to my Kubernetes server. I don't suppose you have any ideas about what might be causing this? Am I misusing your app (as in it's not meant to be deployed this way)?

Names of labels should be configurable

Now, we have this:

# TODO: this should be configurable
NODE_LABEL_SPOT = "aws.amazon.com/spot"
NODE_LABEL_ROLE = "kubernetes.io/role"
# the following labels are used by both AWS and GKE
NODE_LABEL_REGION = "failure-domain.beta.kubernetes.io/region"
NODE_LABEL_INSTANCE_TYPE = "beta.kubernetes.io/instance-type"

but some people use different labels.
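A possible way to make these configurable, sketched with environment variable overrides. Reusing the constant names as environment variable names is an assumption made here, not existing behaviour:

```python
import os

# Defaults taken from the constants quoted above
DEFAULT_NODE_LABELS = {
    "NODE_LABEL_SPOT": "aws.amazon.com/spot",
    "NODE_LABEL_ROLE": "kubernetes.io/role",
    "NODE_LABEL_REGION": "failure-domain.beta.kubernetes.io/region",
    "NODE_LABEL_INSTANCE_TYPE": "beta.kubernetes.io/instance-type",
}


def load_node_labels(environ=None):
    """Return label names, letting environment variables override defaults."""
    environ = os.environ if environ is None else environ
    return {
        name: environ.get(name, default)
        for name, default in DEFAULT_NODE_LABELS.items()
    }
```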

Team page: allow to see applications split by cluster

Suggest actions to take

You can potentially save X USD every month by optimizing resource requests and reducing slack.

How? It would be good to have a link there suggesting, if not specific, then at least general options available.

Keep historic metrics for higher accuracy

We currently rely on snapshot metrics from the Metrics API (Heapster), which only has data for the last minute. Having some historic data would be desirable to compensate for volatile usage patterns (e.g. one pod could spike in the minute we observe it, or vice versa, i.e. it might usually use more resources).
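As a rough illustration, a bounded history of samples could smooth out such spikes. The class below is a hypothetical sketch; how and where samples would be persisted between runs is left open:

```python
from collections import deque


class UsageHistory:
    """Keep the most recent usage samples for one pod or application."""

    def __init__(self, max_samples=60):
        # Old samples are dropped automatically once max_samples is reached
        self._samples = deque(maxlen=max_samples)

    def add(self, value):
        self._samples.append(value)

    def average(self):
        return sum(self._samples) / len(self._samples) if self._samples else 0.0

    def peak(self):
        return max(self._samples, default=0.0)
```

Reporting the average (or peak) over the window instead of a single snapshot makes one-minute outliers less influential.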

Add percentage of application cost in team view

In the team view (applications list), I think it would be useful to add another column showing what percentage of the monthly cost each application consumes.
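The calculation itself is simple. A sketch, assuming the per-application monthly costs are available as a dict (the function name is hypothetical):

```python
def cost_shares(app_costs):
    """Map application name -> percentage of the team's total monthly cost."""
    total = sum(app_costs.values())
    if total == 0:
        # Avoid division by zero for teams without any cost
        return {name: 0.0 for name in app_costs}
    return {name: 100.0 * cost / total for name, cost in app_costs.items()}
```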

Collect resource owners (applications/teams)

Assume a certain structure of pod labels to identify resource owners. The application or app label should point to a valid application ID. An optional application registry (REST service) can provide additional information about the owning team.
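A minimal sketch of that lookup. The function names are hypothetical, and a plain dict stands in for the optional application registry (which would be a REST service in practice):

```python
def get_application_id(pod_labels):
    """Return the application ID from the 'application' or 'app' label."""
    for key in ("application", "app"):
        value = pod_labels.get(key)
        if value:
            return value
    return None


def get_team(application_id, registry):
    """Look up the owning team; registry maps application ID -> team ID."""
    if application_id is None:
        return None
    return registry.get(application_id)
```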

In-cluster deployment example

Provide an example of how to deploy the reporting script into a Kubernetes cluster, e.g. via CronJob + nginx for serving the static HTML.

Price per CPU core / GB memory

The report should support a pricing model per unit, e.g. like AWS Fargate:

Price per vCPU is $0.00001406 per second ($0.0506 per hour) and per GB memory is $0.00000353 per second ($0.0127 per hour).

https://aws.amazon.com/fargate/pricing/
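A sketch of such a per-unit model, using the hourly prices quoted above as defaults. The function name and signature are hypothetical:

```python
# Hourly prices quoted above (AWS Fargate at the time the issue was written)
VCPU_PRICE_PER_HOUR = 0.0506
MEMORY_GB_PRICE_PER_HOUR = 0.0127


def hourly_cost(vcpus, memory_gb,
                vcpu_price=VCPU_PRICE_PER_HOUR,
                memory_price=MEMORY_GB_PRICE_PER_HOUR):
    """Cost per hour for the requested vCPUs and GB of memory."""
    return vcpus * vcpu_price + memory_gb * memory_price
```

Under this model a pod requesting 1 vCPU and 2 GB of memory would cost roughly $0.076 per hour.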

Provide JSON with statistics/KPIs

Add some top-level JSON file with overall statistics/KPIs to be scraped by a monitoring system in order to track KPIs over time.
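Such a file could be produced roughly as follows. The KPI names and the `cost` and `slack_cost` input keys are assumptions made for this sketch:

```python
import json


def build_kpis(clusters):
    """Aggregate top-level KPIs from per-cluster dicts.

    Each dict is assumed to carry the hypothetical keys 'cost' and 'slack_cost'.
    """
    clusters = list(clusters)
    return {
        "clusters": len(clusters),
        "total_cost": sum(c["cost"] for c in clusters),
        "total_slack_cost": sum(c["slack_cost"] for c in clusters),
    }


def write_kpis(path, kpis):
    """Write the KPIs as JSON so a monitoring system can scrape the file."""
    with open(path, "w", encoding="utf-8") as fd:
        json.dump(kpis, fd, indent=2, sort_keys=True)
```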

Helm chart

Why don't you want to create a Helm chart? It would make deployment much easier for many people.

Support DigitalOcean

- apiVersion: v1
  kind: Node
  metadata:
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/instance-type: s-1vcpu-1gb
      beta.kubernetes.io/os: linux
      failure-domain.beta.kubernetes.io/region: fra1
      kubernetes.io/hostname: flamboyant-grothendieck-ie1
      region: fra1

Make system namespaces configurable

The "system" (infrastructure) namespaces are currently hardcoded to kube-system and visibility, but cluster operators might have a different Kubernetes setup and deploy infrastructure components to other namespaces. Add a CLI option to make the namespaces configurable.
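A sketch of how such an option value could be handled, assuming a comma-separated list and keeping the current hardcoded namespaces as the default (the helper names are hypothetical):

```python
# Current hardcoded values, used when no option is given
DEFAULT_SYSTEM_NAMESPACES = frozenset({"kube-system", "visibility"})


def parse_system_namespaces(value=None):
    """Parse a comma-separated namespace list; fall back to the defaults."""
    if not value:
        return DEFAULT_SYSTEM_NAMESPACES
    return frozenset(ns.strip() for ns in value.split(",") if ns.strip())


def is_system_namespace(namespace, system_namespaces):
    return namespace in system_namespaces
```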