caas-carbon-footprint's People

Contributors

eumel8, puffitos, y-eight

caas-carbon-footprint's Issues

Grafana dashboards aren't populated properly when the scrape config interval is too high

After raising the scrape interval of the Kepler ServiceMonitor to a higher value, the default dashboards no longer display any data.

This can be addressed by widening the rate window in the Grafana queries: rate() needs at least two samples inside its window, so instead of computing rates over 1m, a window of 3-5m should be fine at first.

The affected dashboards are:

  • Pod/Process Power Consumption (W) in Namespace
  • Pod/Process CO2 FOS Emission (C02g/h) in Namespace
  • Total Power Consumption (W) in Namespace
  • Total Power Consumption (PKG+DRAM+OTHER+GPU) by Namespace (kWh per day)

The last dashboard is no longer available, because the metric kepler_container_joules_total is no longer exposed and must be calculated separately.

The same must be done for the caas-project-monitoring kepler dashboards.
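
A minimal sketch of how the rate window could be derived from the scrape interval. The factor of four scrape intervals is a common rule of thumb and an assumption here, not something the dashboards prescribe; the function names are hypothetical, and the actual dashboard queries may look different.

```python
def rate_window_seconds(scrape_interval_s: int, samples: int = 4) -> int:
    """Return a rate() window that spans `samples` scrape intervals.

    rate() needs at least two samples inside the window; four intervals
    (an assumed safety margin) tolerates a missed scrape.
    """
    if scrape_interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    return scrape_interval_s * samples


def rate_query(metric: str, scrape_interval_s: int) -> str:
    """Build a PromQL rate expression with a window sized for the interval."""
    window = rate_window_seconds(scrape_interval_s)
    return f"sum(rate({metric}[{window}s]))"


if __name__ == "__main__":
    # With a 60s scrape interval the window becomes 240s (4m).
    print(rate_query("kepler_container_package_joules_total", 60))
```

With a 60s scrape interval this yields a 4m window, which falls inside the 3-5m range suggested above.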

Entsoe crash with ZeroDivisionError

For the past day, Entsoe has been returning HTTP 500 because the Flask app crashes:

10.42.70.250 - - [05/Feb/2024:08:31:45 +0000] "GET /metrics HTTP/1.1" 500 20 "-" "Prometheus/2.46.0"
[2024-02-05 08:35:54,011] ERROR in app: Exception on /metrics [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/appuser/app.py", line 171, in metrics
    result_eco = (int(result_b01) + int(result_b09) + int(result_b10) + int(result_b11) + int(result_b12) + int(result_b16) + int(result_b17) + int(result_b18) + int(result_b19)) / int(result_sum)
ZeroDivisionError: division by zero
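
The traceback shows the division failing when result_sum is 0. One possible guard, sketched under assumptions: the helper name eco_share and the choice to return None for "no data" are hypothetical and not taken from app.py; the caller would then decide whether to skip the metric or export a fallback value.

```python
from typing import Iterable, Optional


def eco_share(parts: Iterable[object], total: object) -> Optional[float]:
    """Return sum(parts) / total, or None when total is zero.

    Mirrors the int() conversions from the traceback, but avoids the
    ZeroDivisionError when the upstream sum is 0.
    """
    total_int = int(total)
    if total_int == 0:
        return None  # assumed convention: no data, let the caller decide
    return sum(int(p) for p in parts) / total_int


if __name__ == "__main__":
    print(eco_share(["10", "20", "30"], "120"))  # share of the total
    print(eco_share(["0", "0"], "0"))            # no crash on an empty result
```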

Summary power consumption of multiple clusters

As a requirement, we need to know how much power our platform consumes overall, that is, across multiple clusters in multiple environments. Without multi-cluster monitoring in place, we can collect the information from each cluster:

  1. The current total energy consumption of the container workload, in joules (1 joule = 1 watt-second = 1 VAs). This can be a very large number:
kubectl curl -n cattle-monitoring-system  "http://prometheus-rancher-monitoring-prometheus-0:9090/api/v1/query?query=sum(kepler_container_package_joules_total)" | jq -r '.data.result[]|.value[-1]'
128948308.37400006

Ask the same and convert to a more readable unit, say megajoules:

kubectl curl -n cattle-monitoring-system  "http://prometheus-rancher-monitoring-prometheus-0:9090/api/v1/query?query=sum(kepler_container_package_joules_total)%2F1000%2F1000" | jq -r '.data.result[]|.value[-1]'
128.96780355300004
  2. The daily power consumption, expressed in the common unit kWh:
kubectl curl -n cattle-monitoring-system  "http://prometheus-rancher-monitoring-prometheus-0:9090/api/v1/query?query=sum(increase(kepler_container_package_joules_total%5B24h%3A1m%5D))%20*%200.00000027777777777" | jq -r '.data.result[]|.value[-1]'
10.91925820424631

This query is copied from the Kepler Grafana dashboard, which uses the conversion "watt_per_second_to_kWh", a factor of 0.00000027777777777 (1 W*s = 1 J and 1 J = 1/3600000 kWh).

The same query for one hour:

kubectl curl -n cattle-monitoring-system  "http://prometheus-rancher-monitoring-prometheus-0:9090/api/v1/query?query=sum(increase(kepler_container_package_joules_total%5B1h%3A1m%5D))%20*%200.00000027777777777" | jq -r '.data.result[]|.value[-1]'
0.4499151308907774
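
The unit conversions used in the queries above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical, and the sample value is the joule reading shown earlier.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s
JOULES_PER_MEGAJOULE = 1_000_000


def joules_to_kwh(joules: float) -> float:
    """Convert joules (watt-seconds) to kilowatt-hours."""
    return joules / JOULES_PER_KWH


def joules_to_megajoules(joules: float) -> float:
    """Convert joules to megajoules."""
    return joules / JOULES_PER_MEGAJOULE


if __name__ == "__main__":
    reading = 128948308.37400006  # joule value from the first query
    print(joules_to_megajoules(reading))
    print(joules_to_kwh(reading))
    # The dashboard factor is simply 1 / 3_600_000.
    print(1 / JOULES_PER_KWH)
```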

Which is the better visualization for a status page or status dashboard? The joule value is available in real time (up to the second), but the unit is not very common.

Cc: @y-eight

Hint: the data was collected with kubectl and its curl plugin, querying the Prometheus API on the Prometheus pod.
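
The percent-encoded query strings in the commands above can be generated rather than typed by hand. A small sketch using Python's standard library; the helper name query_url is hypothetical, and the set of characters left unescaped is chosen only to reproduce the URLs shown in this issue.

```python
from urllib.parse import quote

# Base URL as used in the kubectl curl commands above.
PROMETHEUS_URL = "http://prometheus-rancher-monitoring-prometheus-0:9090"


def query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL with the PromQL expression percent-encoded."""
    # Keep parentheses and '*' readable; brackets, colons, slashes and
    # spaces are escaped (%5B, %3A, %2F, %20, ...).
    encoded = quote(promql, safe="()*")
    return f"{base_url}/api/v1/query?query={encoded}"


if __name__ == "__main__":
    daily_kwh = (
        "sum(increase(kepler_container_package_joules_total[24h:1m]))"
        " * 0.00000027777777777"
    )
    print(query_url(daily_kwh))
```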

update kepler

Values.yml shows 0.6.1 as the Kepler version. 0.7.x works on my machine, 0.6.x does not. 0.7.2 is the current version.
