caas-team / caas-carbon-footprint

Supports sustainable computing by providing customers with metrics for the carbon footprint of their workloads

Dockerfile 4.57% Smarty 9.04% Python 86.39%

carbon-emissions container entso-e entsoe entsoe-api kepler kubernetes sustainability

caas-carbon-footprint's Issues

Grafana dashboards aren't populated properly when the scrape config interval is too high

After raising the scrape interval of the Kepler ServiceMonitor, the default dashboards no longer display any data.

This can be addressed by widening the range used in the Grafana queries: instead of computing rates over 1m, a window of 3-5m should be fine at first.
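The reason a wider window helps is that PromQL's rate() needs at least two samples inside the range to produce a value. A minimal sketch of deriving a window from the scrape interval follows; the factor of 4 is a commonly used rule of thumb rather than something stated in this issue, and the helper names are hypothetical.

```python
def min_rate_window_seconds(scrape_interval_s: int, factor: int = 4) -> int:
    """Return a rate() window that spans several scrapes.

    The factor of 4 is an assumed rule of thumb, not a value from the issue.
    """
    return scrape_interval_s * factor


def format_promql_duration(seconds: int) -> str:
    """Format a duration in seconds as a PromQL range, e.g. 240 -> '4m'."""
    if seconds % 60 == 0:
        return f"{seconds // 60}m"
    return f"{seconds}s"


def rate_query(metric: str, scrape_interval_s: int) -> str:
    """Build a rate() expression whose window fits the scrape interval."""
    window = format_promql_duration(min_rate_window_seconds(scrape_interval_s))
    return f"rate({metric}[{window}])"
```

With a 60 second scrape interval this yields a 4m window, which falls inside the 3-5m range suggested above.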

The affected dashboards are:

Pod/Process Power Consumption (W) in Namespace
Pod/Process CO2 FOS Emission (C02g/h) in Namespace
Total Power Consumption (W) in Namespace
Total Power Consumption (PKG+DRAM+OTHER+GPU) by Namespace (kWh per day)

The last dashboard is no longer available, because the metric kepler_container_joules_total is no longer exposed and must be calculated separately.

The same must be done for the caas-project-monitoring kepler dashboards.

Entsoe crash with ZeroDivisionError

For about a day, Entsoe has been returning HTTP 500 because the Flask app raises an exception on /metrics:

10.42.70.250 - - [05/Feb/2024:08:31:45 +0000] "GET /metrics HTTP/1.1" 500 20 "-" "Prometheus/2.46.0"
[2024-02-05 08:35:54,011] ERROR in app: Exception on /metrics [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/appuser/app.py", line 171, in metrics
    result_eco = (int(result_b01) + int(result_b09) + int(result_b10) + int(result_b11) + int(result_b12) + int(result_b16) + int(result_b17) + int(result_b18) + int(result_b19)) / int(result_sum)
ZeroDivisionError: division by zero
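The traceback shows the division failing when int(result_sum) is zero. One possible guard, sketched here with a hypothetical helper name and without knowledge of the rest of app.py, is to check the denominator before dividing and let the caller decide what to report when no total is available.

```python
from typing import Iterable, Optional, Union

Number = Union[int, float, str]


def eco_share(eco_values: Iterable[Number], total: Number) -> Optional[float]:
    """Return sum(eco_values) / total, or None when total is zero.

    Values are passed through int() to mirror the expression in the
    traceback. Returning None instead of raising is an assumption about
    what the caller wants.
    """
    eco = sum(int(v) for v in eco_values)
    denominator = int(total)
    if denominator == 0:
        # No total reported: there is no meaningful ratio to compute.
        return None
    return eco / denominator
```

The metrics handler could then skip the ratio, or serve the last known value, whenever the result is None instead of answering with a 500.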

Summary power consumption of multiple clusters

As a requirement, we need to know the power consumption of our platform as a whole, that is, across multiple clusters in multiple environments. As long as no multi-cluster monitoring is in place, we can collect the information from each cluster:

The cumulative energy consumption of the container workload in joules (1 joule = 1 watt-second = 1 VAs). This can be a very large number:

kubectl curl -n cattle-monitoring-system  "http://prometheus-rancher-monitoring-prometheus-0:9090/api/v1/query?query=sum(kepler_container_package_joules_total)" | jq -r '.data.result[]|.value[-1]'
128948308.37400006

The same query, converted to a more readable unit, for example megajoules:

kubectl curl -n cattle-monitoring-system  "http://prometheus-rancher-monitoring-prometheus-0:9090/api/v1/query?query=sum(kepler_container_package_joules_total)%2F1000%2F1000" | jq -r '.data.result[]|.value[-1]'
128.96780355300004

The daily energy consumption, expressed in the common unit kWh:

kubectl curl -n cattle-monitoring-system  "http://prometheus-rancher-monitoring-prometheus-0:9090/api/v1/query?query=sum(increase(kepler_container_package_joules_total%5B24h%3A1m%5D))%20*%200.00000027777777777" | jq -r '.data.result[]|.value[-1]'
10.91925820424631

This query is copied from the Kepler Grafana dashboard and uses the conversion "watt_per_second_to_kWh", a factor of 0.0000002777777777 (1 W*s = 1 J and 1 J = (1/3600000) kWh).
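The unit conversions used in these queries can be checked with a few lines of Python. This is only an illustration of the arithmetic; the function names are hypothetical and not part of the repository.

```python
JOULES_PER_KWH = 3_600_000        # 1 kWh = 1000 W * 3600 s
JOULES_PER_MEGAJOULE = 1_000_000


def joules_to_kwh(joules: float) -> float:
    """Convert joules (watt-seconds) to kilowatt-hours."""
    return joules / JOULES_PER_KWH


def joules_to_megajoules(joules: float) -> float:
    """Convert joules to megajoules, as the %2F1000%2F1000 query does."""
    return joules / JOULES_PER_MEGAJOULE
```

Multiplying by 1 / JOULES_PER_KWH is the same operation as multiplying by the dashboard factor, up to rounding of the constant.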

The same query for one hour:

kubectl curl -n cattle-monitoring-system  "http://prometheus-rancher-monitoring-prometheus-0:9090/api/v1/query?query=sum(increase(kepler_container_package_joules_total%5B1h%3A1m%5D))%20*%200.00000027777777777" | jq -r '.data.result[]|.value[-1]'
0.4499151308907774

Which is the better visualization for a status page or status dashboard? Joules are available in real time (per second), but the unit is not very common.

Cc: @y-eight

Hint: the data was collected with kubectl and its curl plugin, querying the Prometheus API on the Prometheus pod.
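Until multi-cluster monitoring exists, the per-cluster results could be combined by a small script. The sketch below assumes each cluster's Prometheus answers the instant-query endpoint /api/v1/query with the standard JSON vector response; how the responses are fetched (for example through the kubectl curl plugin) is left out, and all function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Daily kWh query, taken from the commands above in decoded form.
DAILY_KWH_QUERY = (
    "sum(increase(kepler_container_package_joules_total[24h:1m]))"
    " * 0.00000027777777777"
)


def build_query_url(prometheus_base_url: str, promql: str) -> str:
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    base = prometheus_base_url.rstrip("/")
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def value_from_response(body: str) -> float:
    """Extract the single sample value from an instant-query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    results = payload["data"]["result"]
    if not results:
        # Assumption: an empty result is treated as zero consumption.
        return 0.0
    # Each sample is [timestamp, "value"]; the value is the last element.
    return float(results[0]["value"][-1])


def total_across_clusters(responses: dict[str, str]) -> float:
    """Sum the values of one query over several clusters' responses."""
    return sum(value_from_response(body) for body in responses.values())
```

Here responses maps a cluster name to the raw JSON body returned by that cluster, so the same function works for the joule, megajoule, and kWh queries.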

update kepler

Values.yml shows 0.6.1 as the Kepler version. 0.7.x works on my machine, 0.6.x does not. 0.7.2 is the current version.
