
hwpc-sensor's Introduction

Hardware Performance Counters (HwPC) Sensor

The Hardware Performance Counters (HwPC) Sensor is a tool that monitors Intel CPU performance counters and CPU power consumption.

The sensor uses RAPL (Running Average Power Limit) technology to monitor CPU power consumption. This technology is only available on Intel Sandy Bridge or newer architectures.

The sensor uses the perf API of the Linux kernel. It is therefore only available on Linux and requires root access.

The sensor cannot be used in a virtual machine: it must access the real CPU registers (via the Linux kernel API) to read performance counter values.

About

HwPC sensor is an open-source project developed by the Spirals research group (University of Lille 1 and Inria).

The documentation is available here

hwpc-sensor's People

Contributors

altor, dependabot[bot], gfieni, ldesauw, pierrerustorange, rouvoy

hwpc-sensor's Issues

Issue: "event 'TSC' is invalid or unsupported by this machine"

I have an issue on a server where the sensor fails with the following message: Could not get encoding for event 'TSC' : code -4.

Similar issues have already been raised, but this is not the same problem as in #1 or #25: the sensor is built here with the patched version of libpfm4, and I have been using the same sensor container image successfully on other servers.

I suspect it has something to do with the kernel version and/or the CPU generation, but I couldn't find anything obvious in the libpfm4 source.

This is how the sensor is started:

/usr/bin/hwpc-sensor \
     -n "sensor_$NODE_NAME" \
     -f "$REPORT_FREQ" \
     -r socket -U 127.0.0.1 -P  12000 \
     -s "rapl" -o -e "RAPL_ENERGY_PKG" \
     -s "msr"     -e "TSC" -e "APERF" -e "MPERF" \
     -c "core"    -e "CPU_CLK_THREAD_UNHALTED:REF_P" \
                  -e "CPU_CLK_THREAD_UNHALTED:THREAD_P" \
                  -e "LLC_MISSES"\
                  -e "INSTRUCTIONS_RETIRED"

And here is the full output of the sensor :

I: 22-07-26 10:15:23 build: version unknown (rev: unknown)
I: 22-07-26 10:15:23 uname: Linux 3.10.0-693.2.2.rt56.623.el7.x86_64 #1 SMP PREEMPT RT Thu Sep 14 16:53:49 CEST 2017 x86_64
I: 22-07-26 10:15:23 pmu: found ix86arch 'Intel X86 architectural PMU' having 7 events, 7 counters (4 general, 3 fixed)
I: 22-07-26 10:15:23 pmu: found perf 'perf_events generic PMU' having 133 events, 0 counters (0 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found rapl 'Intel RAPL' having 2 events, 3 counters (0 general, 3 fixed)
I: 22-07-26 10:15:23 pmu: found perf_raw 'perf_events raw PMU' having 1 events, 0 counters (0 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx 'Intel Skylake X' having 85 events, 11 counters (8 general, 3 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha0 'Intel SkylakeX CHA0 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha1 'Intel SkylakeX CHA1 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha2 'Intel SkylakeX CHA2 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha3 'Intel SkylakeX CHA3 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha4 'Intel SkylakeX CHA4 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha5 'Intel SkylakeX CHA5 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha6 'Intel SkylakeX CHA6 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha7 'Intel SkylakeX CHA7 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha8 'Intel SkylakeX CHA8 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha9 'Intel SkylakeX CHA9 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha10 'Intel SkylakeX CHA10 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha11 'Intel SkylakeX CHA11 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha12 'Intel SkylakeX CHA12 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha13 'Intel SkylakeX CHA13 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha14 'Intel SkylakeX CHA14 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha15 'Intel SkylakeX CHA15 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha16 'Intel SkylakeX CHA16 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha17 'Intel SkylakeX CHA17 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha18 'Intel SkylakeX CHA18 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha19 'Intel SkylakeX CHA19 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha20 'Intel SkylakeX CHA20 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha21 'Intel SkylakeX CHA21 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha22 'Intel SkylakeX CHA22 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha23 'Intel SkylakeX CHA23 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha24 'Intel SkylakeX CHA24 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha25 'Intel SkylakeX CHA25 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha26 'Intel SkylakeX CHA26 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_cha27 'Intel SkylakeX CHA27 uncore' having 99 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_iio0 'Intel SkylakeX IIO0 uncore' having 16 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_iio1 'Intel SkylakeX IIO1 uncore' having 16 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_iio2 'Intel SkylakeX IIO2 uncore' having 16 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_iio3 'Intel SkylakeX IIO3 uncore' having 16 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_iio4 'Intel SkylakeX IIO4 uncore' having 16 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_imc0 'Intel SkylakeX IMC0 uncore' having 46 events, 5 counters (4 general, 1 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_imc1 'Intel SkylakeX IMC1 uncore' having 46 events, 5 counters (4 general, 1 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_imc2 'Intel SkylakeX IMC2 uncore' having 46 events, 5 counters (4 general, 1 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_imc3 'Intel SkylakeX IMC3 uncore' having 46 events, 5 counters (4 general, 1 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_imc4 'Intel SkylakeX IMC4 uncore' having 46 events, 5 counters (4 general, 1 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_imc5 'Intel SkylakeX IMC5 uncore' having 46 events, 5 counters (4 general, 1 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_m2m0 'Intel SkylakeX M2M0 uncore' having 121 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_m2m1 'Intel SkylakeX M2M1 uncore' having 121 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_m3upi0 'Intel SkylakeX M3UPI0 uncore' having 111 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_m3upi1 'Intel SkylakeX M3UPI1 uncore' having 111 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_m3upi2 'Intel SkylakeX M3UPI2 uncore' having 111 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_pcu 'Intel SkylakeX PCU uncore' having 29 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_ubo 'Intel SkylakeX U-Box uncore' having 5 events, 3 counters (2 general, 1 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_upi0 'Intel SkylakeX UPI0 uncore' having 34 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_upi1 'Intel SkylakeX UPI1 uncore' having 34 events, 4 counters (4 general, 0 fixed)
I: 22-07-26 10:15:23 pmu: found skx_unc_upi2 'Intel SkylakeX UPI2 uncore' having 34 events, 4 counters (4 general, 0 fixed)
W: 22-07-26 10:15:23 Could not get encoding for event 'TSC' : code -4 
E: 22-07-26 10:15:23 config: event 'TSC' is invalid or unsupported by this machine
E: 22-07-26 10:15:23 config: failed to parse the provided command-line arguments
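
One detail visible in the log above is that no msr PMU is listed among the "pmu: found" lines, whereas logs in other reports on this page list an msr or intel_msr PMU. Whether this explains the error is not established here. The following is a minimal sketch, in Python rather than the sensor's C, of how a startup log could be scanned for the detected PMUs; `detected_pmus` is a hypothetical helper and not part of the sensor.

```python
import re

# Matches startup lines such as:
#   I: 22-07-26 10:15:23 pmu: found rapl 'Intel RAPL' having 2 events, ...
PMU_LINE = re.compile(r"pmu: found (\S+) '")


def detected_pmus(log_text):
    """Return the set of PMU names reported in a sensor startup log."""
    return {match.group(1) for match in PMU_LINE.finditer(log_text)}


# Two lines copied from the log above, used as sample input.
sample = (
    "I: 22-07-26 10:15:23 pmu: found rapl 'Intel RAPL' having 2 events, 3 counters (0 general, 3 fixed)\n"
    "I: 22-07-26 10:15:23 pmu: found skx 'Intel Skylake X' having 85 events, 11 counters (8 general, 3 fixed)\n"
)
pmus = detected_pmus(sample)
has_msr = bool(pmus & {"msr", "intel_msr"})
print(sorted(pmus), "msr PMU present:", has_msr)
```

Feeding the full log to `detected_pmus` gives a quick way to compare the PMUs available on a failing server with those on a working one.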

The CPU on this server is a Xeon Gold 6142. Here is what lscpu returns:

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
Stepping:              4
CPU MHz:               2600.000
BogoMIPS:              5200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              22528K
NUMA node0 CPU(s):     0-15,32-47
NUMA node1 CPU(s):     16-31,48-63
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts

Any idea?

When running the sensor inside a container, it does not produce reports on containers

When running the sensor inside a container, it only generates reports for the "all" and "" targets; no report is generated for any of the containers.
When running the sensor with the same command outside a container (installed using the .deb package https://github.com/powerapi-ng/hwpc-sensor/releases/tag/v1.1.0 ) it works fine.

This has been tested with

  • docker version 20.10.7, build 20.10.7-0ubuntu1~20.04.2
  • kernel 5.11.0-38-generic #42~20.04.1-Ubuntu SMP
  • sensor v1.1.0 for the deb package
  • sensor latest docker image from the docker hub, with an unreported version : build: version undefined (rev: undefined) (Sep 28 2021 - 14:40:24)

Receiving 'libczmq' missing error although czmq installed

I am trying to compile this on Fedora 37 and get the following error. I am using Fedora 37 because it ships kernel 6.1, and its perf tool works with the newest Intel CPUs. However, even with czmq installed, I still get this error.

[yuan@fedora hwpc-sensor-1.1.2]$ cmake .
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Checking for module 'libczmq'
--   Package 'libczmq', required by 'virtual:world', not found
CMake Error at /usr/share/cmake/Modules/FindPkgConfig.cmake:607 (message):
  A required package was not found
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPkgConfig.cmake:829 (_pkg_check_modules_internal)
  CMakeLists.txt:14 (pkg_check_modules)

Running the sensor without root

The sensor container must currently run as a privileged container, and the user inside the container is root. This is generally an issue on production systems, as ops often require (and rightly so!) that containers use a non-root user and are not privileged.

The sensor can, however, run perfectly fine in a non-root container, provided it has been granted the correct capabilities.

For example, I use this in my Dockerfile:

RUN    groupadd sensorgrp && \
    useradd -u 1001 -g sensorgrp --home-dir /home/powerapi powerapi  && \
    mkdir /home/powerapi

RUN setcap "cap_sys_admin=ep" /usr/bin/hwpc-sensor && \
    setcap -v "cap_sys_admin=ep" /usr/bin/hwpc-sensor

And at the end of the Dockerfile:

USER powerapi
ENTRYPOINT ["/usr/bin/run.sh"]

Then, when running the container, I add the capability:

docker run --name sensortest -it --rm --cap-add SYS_ADMIN noroot-sensor:latest

Would you consider changing the sensor image so that it runs without root by default?
It also requires disabling the root check in sensor.c (the block marked /* check if run as root */).
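
As an illustration of what a capability-based check could look like in place of a plain "is root" test, here is a minimal Python sketch (not the sensor's C code). It parses the CapEff bitmask found in /proc/<pid>/status and tests the CAP_SYS_ADMIN bit, which is capability number 21 on Linux. The function names are hypothetical.

```python
CAP_SYS_ADMIN = 21  # capability number defined in linux/capability.h


def effective_caps(status_text):
    """Parse the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(status_text, cap):
    """Return True when the given capability bit is set in CapEff."""
    return bool((effective_caps(status_text) >> cap) & 1)


# Sample input shaped like /proc/self/status for a process that only
# holds CAP_SYS_ADMIN (bit 21, i.e. 0x200000).
sample_status = "Name:\thwpc-sensor\nCapEff:\t0000000000200000\n"
print("CAP_SYS_ADMIN:", has_capability(sample_status, CAP_SYS_ADMIN))
```

On a real system the input would be the content of /proc/self/status, read from inside the container started with --cap-add SYS_ADMIN.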

CGroup v2 issue

Hi,

My distribution uses cgroup v2 and I tried to follow the instructions from this issue:

  • set up Docker so that it uses cgroupfs
  • disable the unified hierarchy by adding the following to my grub config file:
    GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"

Additionally, I tried to follow this to manually manage cgroups with cgroupfs. However, I get errors when editing the cgroup.subtree_control file to add the perf_event controller.

For reference, the only controllers I have active are: cpuset cpu io memory hugetlb pids rdma misc (cat /sys/fs/cgroup/cgroup.controllers)

Any recommendation for making this work on Ubuntu 22.04 with a 5.15.0-58-generic kernel? I am simply trying to use SmartWatts and the sensor to measure container-level energy consumption on my machine, with no Kubernetes cluster or anything on top.
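
To tell which cgroup layout is actually active after changing the boot parameters, the mount table can be inspected. The sketch below is a hypothetical Python helper (not part of the sensor) that classifies the layout from the text of /proc/mounts and looks for a cgroup v1 perf_event hierarchy, which is where another report on this page says the sensor looks by default (/sys/fs/cgroup/perf_event). As far as I know, perf_event is handled implicitly on the unified hierarchy and is not listed in cgroup.controllers, which would be consistent with the controller list quoted above; treat that as an assumption to verify against the kernel documentation.

```python
def mount_entries(mounts_text):
    """Yield (mount_point, fs_type, options) from the text of /proc/mounts."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            yield fields[1], fields[2], fields[3].split(",")


def cgroup_layout(mounts_text):
    """Return "v1", "v2", "hybrid" or "none" depending on the mounted cgroup types."""
    types = {fs_type for _, fs_type, _ in mount_entries(mounts_text)}
    has_v1, has_v2 = "cgroup" in types, "cgroup2" in types
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    return "v1" if has_v1 else "none"


def perf_event_v1_mount(mounts_text):
    """Return the mount point of the v1 perf_event hierarchy, or None."""
    for mount_point, fs_type, options in mount_entries(mounts_text):
        if fs_type == "cgroup" and "perf_event" in options:
            return mount_point
    return None


# Sample mount tables (illustrative, not taken from the reporter's machine).
unified = "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0\n"
legacy = (
    "tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec 0 0\n"
    "cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0\n"
)
print(cgroup_layout(unified), perf_event_v1_mount(unified))
print(cgroup_layout(legacy), perf_event_v1_mount(legacy))
```

If the first form is reported after rebooting with systemd.unified_cgroup_hierarchy=0, the kernel command line change has probably not been applied (for example because the grub configuration was not regenerated).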

Cannot build from local source files using Dockerfile

The docker file in the repository pulls the sources it compiles directly from two github repositories:

This means that it cannot be used to build the sensor during development from locally modified source files.
That would be less of a problem if the sensor could easily be built using the provided makefile, but no matter which dependencies I install, I have never managed to build it outside Docker on my Ubuntu 20.04 box.

Cannot resolve name for containers running in kubernetes

When running the sensor (from the official powerapi/hwpc-sensor:latest image) in a Kubernetes cluster, the sensor fails to resolve the names of the containers and does not monitor any container for which it cannot find a name.

Log:

I: 20-06-19 13:40:37 build: version unknown (rev: 7a63055ff2fdfbfdf776d42188a87de19882dd88) (Jul 18 2019 - 11:36:29)                                                                         
 I: 20-06-19 13:40:37 uname: Linux 5.3.0-59-generic #53~18.04.1-Ubuntu SMP Thu Jun 4 14:58:26 UTC 2020 x86_64                                                                                 
 I: 20-06-19 13:40:37 pmu: found ix86arch 'Intel X86 architectural PMU' having 7 events, 7 counters (4 general, 3 fixed)                                                                      
 I: 20-06-19 13:40:37 pmu: found perf 'perf_events generic PMU' having 191 events, 0 counters (0 general, 0 fixed)                                                                            
 I: 20-06-19 13:40:37 pmu: found rapl 'Intel RAPL' having 4 events, 3 counters (0 general, 3 fixed)                                                                                           
 I: 20-06-19 13:40:37 pmu: found perf_raw 'perf_events raw PMU' having 1 events, 0 counters (0 general, 0 fixed)                                                                              
 I: 20-06-19 13:40:37 pmu: found skl 'Intel Skylake' having 83 events, 11 counters (8 general, 3 fixed)                                                                                       
 I: 20-06-19 13:40:37 pmu: found msr 'Intel MSR' having 6 events, 5 counters (0 general, 5 fixed)                                                                                             
 I: 20-06-19 13:40:37 sensor: configuration is valid, starting monitoring...                                                                                                                  
 I: 20-06-19 13:40:37 perf<all>: monitoring actor started                                                                                                                                     
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod3fe59be9-5416-4c4f-bb68-664e8fe666cd/807b14d95ee42d4281bc32a3bf5461 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod729286ef-0a0b-4564-b7b5-a0c447169564/6976fcfb5cebac0470aa120715cb79 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod75d8e723-8850-41af-8486-d88a94c6d301/ec761985f956f7b0364168f7650358 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod3fe59be9-5416-4c4f-bb68-664e8fe666cd/6b9c0ffb5214ea9f49c995f0fa8e05 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod75d8e723-8850-41af-8486-d88a94c6d301/488eba3ce355088674feebb79623d9 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/podb3685602-8406-41a0-95ce-8774c7c59103/76a4f691e461064fae31da537c81b7 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/podcb5c21b3-b556-4021-bac3-19ffd875751c/914b77cca24bd2682333b9dbc602ed 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/burstable/pode14b0994-f4da-4535-bebc-807f8969d1c2/28ce3bd812a83d6e6e21d924d8afc63 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod0e537378-aec8-4b28-8626-df2215de47de/1667851be5b15fb68da5bda264a343 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/poda2fa6ea4-fddc-4654-bb2a-256606c09bd1/3591eba18ecee2f97b4839d4b975481774db0b2da 
 

This issue happens on a k3s-based cluster, with either Docker or containerd as the container engine.
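
The cgroup paths in the log follow the shape kubepods/[QoS class]/pod<UID>/<container id>. A minimal Python sketch of how such a path can be split into its parts is shown below; it is a hypothetical helper for illustration, not the sensor's resolver, and the container id used in the example is a made-up placeholder because the ids in the log are truncated.

```python
from pathlib import PurePosixPath


def split_kubepods_cgroup(path):
    """Return (qos_class, pod_uid, container_id) for a cgroupfs-style
    kubepods path, or None when the path does not have that shape."""
    parts = PurePosixPath(path).parts
    if "kubepods" not in parts:
        return None
    rest = parts[parts.index("kubepods") + 1:]
    # Guaranteed pods sit directly under kubepods, without a QoS directory.
    qos = "guaranteed"
    if rest and rest[0] in ("besteffort", "burstable"):
        qos, rest = rest[0], rest[1:]
    if len(rest) != 2 or not rest[0].startswith("pod"):
        return None
    return qos, rest[0][len("pod"):], rest[1]


# Pod UID taken from the log above; "0123abcd" is a placeholder container id.
example = ("/sys/fs/cgroup/perf_event/kubepods/besteffort/"
           "pod3fe59be9-5416-4c4f-bb68-664e8fe666cd/0123abcd")
print(split_kubepods_cgroup(example))
```

The pod UID and container id obtained this way are the keys one would need to look up a human-readable name, whatever source of names is used.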

No data collected in 'core'-group

Hello,

I am trying to set up the HWPC Sensor with SmartWatts. As a first measure, I have tried to set both up according to the PowerAPI documentation. However the output of the HWPC Sensor doesn't correspond to what I expect. Data from the 'rapl' and 'msr' groups seem to be collected and passed on to MongoDB just fine, but there is no data from the 'core' group.

This is my config file for the Sensor:

{
  "name": "sensor",
  "verbose": true,
  "frequency": 500,
  "output": {
    "type": "mongodb",
    "uri": "mongodb://127.0.0.1",
    "database": "mongo_destination",
    "collection": "report_0"
  },
  "system": {
    "rapl": {
      "events": ["RAPL_ENERGY_PKG"],
      "monitoring_type": "MONITOR_ALL_CPU_PER_SOCKET"
    },
    "msr": {
      "events": ["TSC", "APERF", "MPERF"]
    }
  },
  "container": {
    "core": {
      "events": [
        "CPU_CLK_UNHALTED:REF_P",
	"CPU_CLK_UNHALTED:THREAD_P",
        "LLC_MISSES",
	"INSTRUCTIONS_RETIRED"
      ]
    }
  }
}

My output corresponds to the example given here: https://powerapi.org/reference/reports/reports/#hwpc-report
But I am only getting the second timestamp with the 'rapl' and 'msr' groups.

I am running on Debian 12 with an Intel i5-10210U.

Is there a way I can get this data? SmartWatts doesn't generate reports, and I suspect it's because of these missing data points.

Thank you.

'RAPL_ENERGY_PKG' is invalid or unsupported by this machine

Hello everyone, running on

  • i5 13600k
  • Ubuntu 22.04.3 LTS
  • kernel 6.1.0-060100-generic

sudo docker run --rm \
--net=host \
--privileged \
--pid=host \
-v /sys:/sys \
-v /var/lib/docker/containers:/var/lib/docker/containers:ro \
-v /tmp/powerapi-sensor-reporting:/reporting \
-v $(pwd):/srv \
powerapi/hwpc-sensor \
-n "$(hostname -f)" \
-r "mongodb" -U "mongodb://127.0.0.1" -D "test" -C "prep" \
-s "rapl" -o -e "RAPL_ENERGY_PKG" \
-s "msr" -e "TSC" -e "APERF" -e "MPERF" \
-c "core" -e "CPU_CLK_UNHALTED:REF_P" -e "CPU_CLK_UNHALTED:THREAD_P" -e "LLC_MISSES" -e "INSTRUCTIONS_RETIRED"

I get this error

I: 23-11-19 18:26:24 build: version unknown (rev: unknown)
I: 23-11-19 18:26:24 uname: Linux 6.1.0-060100-generic #202303090726 SMP PREEMPT_DYNAMIC Thu Mar  9 12:33:28 UTC 2023 x86_64
I: 23-11-19 18:26:24 pmu: found ix86arch 'Intel X86 architectural PMU' having 7 events, 9 counters (6 general, 3 fixed)
I: 23-11-19 18:26:24 pmu: found perf 'perf_events generic PMU' having 198 events, 0 counters (0 general, 0 fixed)
I: 23-11-19 18:26:24 pmu: found perf_raw 'perf_events raw PMU' having 1 events, 0 counters (0 general, 0 fixed)
I: 23-11-19 18:26:24 pmu: found intel_msr 'Intel MSR' having 6 events, 6 counters (0 general, 6 fixed)
E: 23-11-19 18:26:24 config: event 'RAPL_ENERGY_PKG' is invalid or unsupported by this machine
E: 23-11-19 18:26:24 config: failed to parse the provided command-line arguments

What can I do to fix this error? I need the RAPL energy readings for my studies.

Sensor fails to initialize in recent update

Hi,

I am using the powerapi/hwpc-sensor image. Up to version 1.2.0 it worked well, but since the update to the latest version (1.3.0), the sensor no longer initializes correctly on my machine.

Here are the logs from the container:
docker logs hwpc_sensor

I: 24-05-06 07:56:17 build: version v1.3.0 (rev: 4b8e54c88e7738aea9293a6dc75ea96fb3f8160b)
I: 24-05-06 07:56:17 uname: Linux 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr  4 14:39:20 UTC 2 x86_64
I: 24-05-06 07:56:17 pmu: found ix86arch 'Intel X86 architectural PMU' having 7 events, 7 counters (4 general, 3 fixed)
I: 24-05-06 07:56:17 pmu: found perf 'perf_events generic PMU' having 82 events, 0 counters (0 general, 0 fixed)
I: 24-05-06 07:56:17 pmu: found rapl 'Intel RAPL' having 4 events, 3 counters (0 general, 3 fixed)
I: 24-05-06 07:56:17 pmu: found perf_raw 'perf_events raw PMU' having 1 events, 0 counters (0 general, 0 fixed)
I: 24-05-06 07:56:17 pmu: found skl 'Intel Skylake' having 84 events, 11 counters (8 general, 3 fixed)
I: 24-05-06 07:56:17 pmu: found intel_msr 'Intel MSR' having 6 events, 6 counters (0 general, 6 fixed)
I: 24-05-06 07:56:17 sensor: configuration is valid, starting monitoring...
I: 24-05-06 07:56:17 perf<all>: monitoring actor started
E: 24-05-06 07:56:17 perf</pids/system.slice/thermald.service>: failed opening perf event for group=core cpu=7 event=CPU_CLK_THREAD_UNHALTED:REF_P errno=2
E: 24-05-06 07:56:17 perf</pids/system.slice/thermald.service>: failed to setup perf for group=core pkg=0 cpu=7
E: 24-05-06 07:56:17 perf</pids/system.slice/thermald.service>: cannot initialize perf monitoring
E: 24-05-06 07:56:17 perf</unified/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/gnome-terminal-server.service>: failed opening perf event for group=core cpu=7 event=CPU_CLK_THREAD_UNHALTED:REF_P errno=2
E: 24-05-06 07:56:17 perf</unified/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/gnome-terminal-server.service>: failed to setup perf for group=core pkg=0 cpu=7
E: 24-05-06 07:56:17 perf</unified/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/gnome-terminal-server.service>: cannot initialize perf monitoring
E: 24-05-06 07:56:17 perf</systemd/user.slice/user-1000.slice/user@1000.service/app.slice/evolution-addressbook-factory.service>: failed opening perf event for group=core cpu=7 event=CPU_CLK_THREAD_UNHALTED:REF_P errno=2
E: 24-05-06 07:56:17 perf</systemd/user.slice/user-1000.slice/user@1000.service/app.slice/evolution-addressbook-factory.service>: failed to setup perf for group=core pkg=0 cpu=7
E: 24-05-06 07:56:17 perf</systemd/user.slice/user-1000.slice/user@1000.service/app.slice/evolution-addressbook-factory.service>: cannot initialize perf monitoring
File exists (src/epoll.cpp:100)

In the end, the container is removed, with status Exited (134).

I thought there might be a permission problem, but it is odd that everything works normally with version 1.2.0.
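
For readers decoding the numbers in this report: errno=2 is ENOENT ("No such file or directory") rather than a permission error, and an exit status above 128 conventionally means the process was terminated by signal (status - 128), so 134 corresponds to signal 6, SIGABRT. A small Python sketch using only the standard library shows the decoding; the helper names are hypothetical.

```python
import errno
import os
import signal


def describe_errno(code):
    """Return the symbolic name and message for an errno value."""
    return f"{errno.errorcode.get(code, 'unknown')}: {os.strerror(code)}"


def describe_exit_status(status):
    """Map a container exit status to a signal name when it is 128 + N."""
    if status > 128:
        return signal.Signals(status - 128).name
    return f"exit({status})"


print(describe_errno(2))
print(describe_exit_status(134))
```

An abort is consistent with the final log line, which looks like a failed assertion, but that reading is an interpretation and not something the log states explicitly.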

Sensor fails to detect containers

Hi,

I have a system where the sensor fails to detect any running container, whether I use Docker or Kubernetes directly.

When looking at the source code, I see that the sensor:

  • looks for containers in the subdirectories of the perf_event cgroup (ll /sys/fs/cgroup/perf_event/); this can be changed with the -p flag
  • tries to detect a type based on the path in this directory, see
    target_detect_type(const char *cgroup_path)
  • then, for Docker and Kubernetes, validates the type based on an expected path:
    • perf_event/docker/... for Docker
    • perf_event/kubepods/... for Kubernetes
  • if the validation fails, ignores the container

On my system:

  • Kubernetes pods appear in /sys/fs/cgroup/perf_event/kubepods.slice/; they are detected but fail type validation
  • Docker containers appear in /sys/fs/cgroup/perf_event/system.slice/; they are detected but fail type validation too

As a consequence, no container is monitored.

Maybe we could bypass type detection and file validation altogether, as I proposed in the old PR #2? That should solve this issue.

Finally, I have no idea why the /sys/fs/cgroup/perf_event layout is different on this system. It runs CentOS 7.2; I usually use Debian and Ubuntu systems and have never had this issue before.

Please let me know if I can add something useful or if you have any idea to get me started on this issue.
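
To illustrate a validation step that accepts both layouts described above, here is a minimal Python sketch (the sensor itself is written in C, and this is not its implementation). It assumes that, with the systemd cgroup driver, Docker containers appear as system.slice/docker-<id>.scope; the report only states that they live under system.slice, so that naming is an assumption. `detect_target_type` here is a hypothetical stand-in for the function named above.

```python
import re

# cgroupfs driver: .../docker/<64 hex characters>
DOCKER_CGROUPFS = re.compile(r"(^|/)docker/[0-9a-f]{64}$")
# systemd driver (assumed naming): .../system.slice/docker-<64 hex characters>.scope
DOCKER_SYSTEMD = re.compile(r"(^|/)system\.slice/docker-[0-9a-f]{64}\.scope$")
# Kubernetes: .../kubepods/... (cgroupfs) or .../kubepods.slice/... (systemd)
KUBERNETES = re.compile(r"(^|/)kubepods(\.slice)?/")


def detect_target_type(cgroup_path):
    """Classify a perf_event cgroup path as docker, kubernetes or unknown."""
    if DOCKER_CGROUPFS.search(cgroup_path) or DOCKER_SYSTEMD.search(cgroup_path):
        return "docker"
    if KUBERNETES.search(cgroup_path):
        return "kubernetes"
    return "unknown"


container_id = "a" * 64  # placeholder id for the example
base = "/sys/fs/cgroup/perf_event"
for path in (
    f"{base}/docker/{container_id}",
    f"{base}/system.slice/docker-{container_id}.scope",
    f"{base}/kubepods.slice/kubepods-besteffort.slice/example",
    f"{base}/system.slice/sshd.service",
):
    print(detect_target_type(path), path)
```

Matching on the path shape rather than on a fixed parent directory keeps unrelated system.slice services out while still accepting containers created through systemd.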

No name resolution for socket output

When using the socket output, the configuration requires an IP address, which is not convenient in container environments.
For example, when running the sensor and a formula with Docker Compose, we don't know the address at which the formula (i.e. the socket server the sensor must connect to) runs, but we can use the name of the service, which Docker resolves to the corresponding IP address. However, as the sensor does not resolve host names, this does not work.

This can easily be tested by running the sensor with a name for the -U parameter:

docker run --privileged --rm --name sensorhwpc --network="host" --pid host \
         -v /sys:/sys  \
         -v /var/lib/docker/containers:/var/lib/docker/containers:ro     \
         powerapi/hwpc-sensor \
           -n sensor \
           -f 2000 \
           -r socket -U localhost -P  12000 \
           -s "rapl" -o -e "RAPL_ENERGY_PKG" \
           -s "msr"     -e "TSC" -e "APERF" -e "MPERF" \
           -c "core"    -e "CPU_CLK_THREAD_UNHALTED:REF_P" \
                        -e "CPU_CLK_THREAD_UNHALTED:THREAD_P" \
                        -e "LLC_MISSES"\
                    -e "INSTRUCTIONS_RETIRED"

Due to another issue (see PR #13), this command currently just blocks. Once that error is fixed, it fails even though a socket server is running at localhost.

A solution would be to use getaddrinfo to resolve the address name when the supplied argument is not a valid IP address.
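
The proposed behaviour can be sketched as follows. This is Python for illustration only (the sensor is written in C, where the same idea would use inet_pton followed by getaddrinfo); `resolve_endpoint` is a hypothetical name, and restricting the lookup to IPv4 is a simplification made for the example.

```python
import ipaddress
import socket


def resolve_endpoint(host, port):
    """Return (ip, port). The host is used as-is when it is already a
    literal IP address and resolved through getaddrinfo otherwise."""
    try:
        return str(ipaddress.ip_address(host)), port
    except ValueError:
        pass  # not a literal address, fall through to name resolution
    infos = socket.getaddrinfo(host, port,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) pair.
    return infos[0][4][0], port


print(resolve_endpoint("127.0.0.1", 12000))
```

With a Docker Compose service name as the host, the lookup would go through Docker's embedded DNS; socket.gaierror is raised when the name cannot be resolved, which the caller should report instead of blocking.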

Container name resolution fails in docker

Now that the Dockerfile uses the powerapi user instead of root, relying on CAP_SYS_ADMIN (and thanks for that!), the name resolution fails.

Indeed, in target_docker.c the name is extracted from the /var/lib/docker/containers/%.*s/config.v2.json config file, which cannot be read by the powerapi user.

While it does not really bother me for my use cases, I suspect it might be troublesome for users who rely on that name to identify the power consumption of each container.

Trailing `\n` in socket name breaks csv output

When detecting the socket id, a trailing \n character is left in the package_id.

This breaks the CSV output by splitting lines. For example, I get this in a rapl output CSV file:

k8s@k8s-nuc:~/Projects/powerapi/hwpc-sensor$ tail -f /tmp/powerapi-sensor-reporting/rapl 
timestamp,sensor,target,socket,cpu,RAPL_ENERGY_PKG,time_enabled,time_running
1592814808213,pr-sensor,all,0
,7,19138871296,155813121,155813121
1592814809242,pr-sensor,all,0
,7,86548152320,1184425793,1184425793
1592814810280,pr-sensor,all,0
,7,67986522112,2222247455,2222247455

This can also be seen by adding some logs:

I: 20-06-22 09:13:11 hwinfo: found cpu  'cpu7' id : '7' for pkg '0
' 
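
The effect, and the usual fix of stripping the newline that sysfs files end with, can be reproduced in a few lines. This is a Python illustration with hypothetical helper names, not the sensor's C code; the values come from the CSV excerpt above.

```python
def parse_package_id(raw):
    """Strip the trailing newline that sysfs files end with."""
    return raw.rstrip("\n")


def csv_row(fields):
    """Join fields into one CSV line (no quoting, for illustration only)."""
    return ",".join(str(field) for field in fields)


raw_id = "0\n"  # content as read from a sysfs topology file
broken = csv_row([1592814808213, "pr-sensor", "all", raw_id, 7])
fixed = csv_row([1592814808213, "pr-sensor", "all", parse_package_id(raw_id), 7])
print(repr(broken))
print(repr(fixed))
```

The first row contains an embedded line break right after the socket column, exactly as in the excerpt, while the second stays on one line.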

Sensor does not work when built from source despite working with .deb installation

Thanks for the work on this project. I am trying to add some functionality to the sensor for a project of mine. However, I am unable to successfully run the sensor when compiling from source, even before making any changes. This is puzzling, as the sensor works perfectly when I install the .deb package on Ubuntu 20.04 LTS and run it that way. I am not using Docker.

The sensor outputs the following:
E: 22-06-27 10:33:14 config: event 'TSC' is invalid or unsupported by this machine
E: 22-06-27 10:33:14 config: failed to parse the provided config file

I have identified the issue as being the pfm_get_os_event_encoding() function from libpfm in events.c. It consistently returns a PFM_ERR_NOTFOUND for both msr and rapl events in the config file that I know exist in my architecture (Haswell). All events appear as expected using "perf list."

I have attempted building directly from the source of the latest release, to no avail. Do you have any idea why this may be happening, given that the sensor runs successfully and finds these events when installed from the .deb package?

Potential memory leak

When stopping the sensor, I generally get a warning such as this :

=================================================================
==1055067==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 3174400 byte(s) in 775 object(s) allocated from:
    #0 0x7f4e13fc5720 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe9720)
    #1 0x7f4e13e49d6e in bson_realloc (/usr/lib/x86_64-linux-gnu/libbson-1.0.so.0+0x1ed6e)

SUMMARY: AddressSanitizer: 3174400 byte(s) leaked in 775 allocation(s).

The sensor is running with the socket output.

Access to RAPL counters on some CPU / kernel combinations

On some systems, the sensor fails to access RAPL counters and we get this error at startup:

E: 21-12-07 11:14:26 config: event 'RAPL_ENERGY_PKG' is invalid or unsupported by this machine

However, on the same systems, we can see rapl data in the powercap sysfs.

powerapi-ng/powerapi#125 is probably an example of such error.

Actually the sensor use the perf subsystem to access rapl, which is implemented in a different part of the kernel source tree than powercap. Thus I suspect that this can happens when the kernel contains, for the cpu of the machine, the implementation of powercap but not of rapl access in perf.

I suggest we implement a fallback access to RAPL using powercap sysfs, when we cannot use perf.
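As a rough illustration of what such a fallback would read, here is a minimal sketch, assuming the standard powercap layout where each package zone appears as /sys/class/powercap/intel-rapl:N with "name" and "energy_uj" files. It is not the sensor's code, and the function names are hypothetical. On recent kernels, reading energy_uj typically requires root.

```python
from pathlib import Path

# Standard powercap sysfs root; package zones are named 'intel-rapl:0', 'intel-rapl:1', ...
POWERCAP_ROOT = Path("/sys/class/powercap")


def read_powercap_energy(root=POWERCAP_ROOT):
    """Return {zone_dir: (zone_name, energy_uj)} for each top-level RAPL zone.

    Hypothetical helper: zones that cannot be read are silently skipped.
    """
    readings = {}
    for zone in sorted(Path(root).glob("intel-rapl:*")):
        # Top-level zones have one colon ('intel-rapl:0'); subzones have two ('intel-rapl:0:0').
        if zone.name.count(":") != 1:
            continue
        try:
            name = (zone / "name").read_text().strip()
            energy = int((zone / "energy_uj").read_text())
        except (OSError, ValueError):
            continue
        readings[zone.name] = (name, energy)
    return readings


def energy_delta_uj(previous, current, max_range):
    """Energy (in microjoules) between two readings, tolerating one counter wraparound.

    max_range is the value read from the zone's 'max_energy_range_uj' file.
    """
    if current >= previous:
        return current - previous
    # The counter wrapped: add what was left before the wrap to the new value.
    return (max_range - previous) + current
```

The counter is cumulative, so a power estimate would come from two readings taken a known interval apart, with energy_delta_uj divided by that interval.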

perf: perf_event_open failed with error: Permission denied

Hi,

I am following the instructions for running the HWPC sensor with docker (link).

When I execute the docker command from the "Running the Sensor via CLI parameters" section, I get the following error:

I: 23-09-21 20:23:06 build: version unknown (rev: unknown)
I: 23-09-21 20:23:06 uname: Linux 6.4.15-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Sep  7 00:25:01 UTC 2023 x86_64
E: 23-09-21 20:23:06 perf: perf_event_open failed with error: Permission denied
E: 23-09-21 20:23:06 perf: error while testing the perf_event support

Do I need to install some perf related package or give the container more permissions?
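One setting worth checking on the host is kernel.perf_event_paranoid, which restricts perf_event_open for processes that lack CAP_PERFMON or CAP_SYS_ADMIN. The sketch below only reads and interprets that value. It is a diagnostic aid under that assumption, not a confirmed explanation for this report; other mechanisms such as container security policies can also produce "Permission denied". The function names are hypothetical.

```python
from pathlib import Path

# Standard procfs location of the perf_event_paranoid sysctl.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_perf_paranoid(path=PARANOID_PATH):
    """Return the current perf_event_paranoid level as an int, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_paranoid(level):
    """Summarise a level for processes without CAP_PERFMON / CAP_SYS_ADMIN.

    Each level keeps the restrictions of the lower ones (as described in the kernel sysctl docs).
    """
    if level is None:
        return "perf_event_paranoid could not be read"
    if level <= -1:
        return "almost all events are allowed for all users"
    if level == 0:
        return "raw tracepoint access is disallowed for unprivileged users"
    if level == 1:
        return "CPU event access is also disallowed for unprivileged users"
    return "kernel profiling is also disallowed for unprivileged users"
```

Running describe_paranoid(read_perf_paranoid()) on the host gives a quick view of how strict the current setting is.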

Sensor does not reconnect to the formula after disconnection

Test scenario:

  • run the sensor
  • start the formula:
    • it receives the reports and correctly generates power reports
  • stop the formula, then restart it
    • the formula no longer receives any reports from the sensor

The following commands were used when testing:

Sensor:

docker run --privileged --rm --name sensor --network="host" powerapi/hwpc-sensor \
    -v /sys:/sys \
    -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
    -n sensor \
    -f 2000 \
    -r socket -U 127.0.0.1 -P 12000 \
    -s "rapl" -o -e "RAPL_ENERGY_PKG" \
    -s "msr" -e "TSC" -e "APERF" -e "MPERF" \
    -c "core" -e "CPU_CLK_THREAD_UNHALTED:REF_P" \
              -e "CPU_CLK_THREAD_UNHALTED:THREAD_P" \
              -e "LLC_MISSES" \
              -e "INSTRUCTIONS_RETIRED"

BTW, the sensor in the latest docker image does not report its own version:
I: 21-10-26 15:28:23 build: version undefined (rev: undefined) (Sep 28 2021 - 14:40:24)

Formula:

python -m smartwatts --debug --config-file config_file.json

version: today's pull on master.

configuration file:

{
  "verbose": true,
  "stream": true,
  "input": {
    "puller": {
      "model": "HWPCReport",
      "type": "socket",
      "uri": "127.0.0.1",
      "port": 12000
    }
  },
  "output": {
    "pusher_power": {
      "type": "csv",
      "uri": ".",
      "model" : "PowerReport"
    }
  },
  "cpu-frequency-base": 2300,
  "cpu-frequency-min": 800,
  "cpu-frequency-max": 5100,
  "cpu-error-threshold": 2.0,
  "disable-dram-formula": true,
  "sensor-report-sampling-interval": 1000
}
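The behaviour expected in the scenario above, where the sending side recovers once the receiver comes back, can be sketched as a TCP client that drops its socket on any error and opens a fresh connection on the next attempt. This is a generic illustration in Python, not the sensor's actual implementation (which is written in C); the class name and its parameters are hypothetical.

```python
import socket
import time


class ReconnectingSender:
    """Hypothetical TCP client that reopens its connection when a send fails."""

    def __init__(self, host, port, retries=3, delay=0.5):
        self.host = host
        self.port = port
        self.retries = retries  # extra attempts after the first one
        self.delay = delay      # seconds to wait between attempts
        self._sock = None

    def close(self):
        """Close the current socket, if any, and forget it."""
        if self._sock is not None:
            try:
                self._sock.close()
            finally:
                self._sock = None

    def send(self, payload):
        """Try to deliver payload; return True on success, False once attempts run out."""
        for attempt in range(self.retries + 1):
            try:
                if self._sock is None:
                    self._sock = socket.create_connection((self.host, self.port), timeout=2.0)
                self._sock.sendall(payload)
                return True
            except OSError:
                # Drop the dead socket so the next attempt starts from a fresh connect().
                self.close()
                if attempt < self.retries:
                    time.sleep(self.delay)
        return False
```

Note that with TCP, the first send after the peer has gone away may still appear to succeed, because the failure is only reported on a later call. A robust sender therefore has to tolerate losing at least one report around a restart.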

Problem with the sensor's socket output

Hi,
When I run the sensor with this configuration file, I don't get any error, but it doesn't work.

My socket configuration:

{
  "name": "sensor",
  "verbose": true,
  "frequency": 500,
  "output": {
    "type": "socket",
    "uri": "localhost",
    "port": 8080
  },
  "system": {
    "rapl": {
      "events": ["RAPL_ENERGY_PKG"],
      "monitoring_type": "MONITOR_ONE_CPU_PER_SOCKET"
    },
    "msr": {
      "events": ["TSC", "APERF", "MPERF"]
    }
  },
  "container": {
    "core": {
      "events": [
        "CPU_CLK_THREAD_UNHALTED:REF_P",
        "CPU_CLK_THREAD_UNHALTED:THREAD_P",
        "LLC_MISSES",
        "INSTRUCTIONS_RETIRED"
      ]
    }
  }
}

My run command:

docker run --rm --net=host --privileged --pid=host -v /sys:/sys -v /var/lib/docker/containers:/var/lib/docker/containers:ro -v /tmp/powerapi-sensor-reporting:/reporting -v $(pwd):/srv -v $(pwd)/config_file.json:/config_file.json powerapi/hwpc-sensor --config-file config_file.json

Log, without the sensor's confirmation message:

I: 21-12-10 14:07:29 build: version v1.1.0 (rev: 7d735067c2dfbb82ba0cb4afea3ed4dc1331919d) (Dec  7 2021 - 17:23:33)
I: 21-12-10 14:07:29 uname: Linux 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64
I: 21-12-10 14:07:29 pmu: found ix86arch 'Intel X86 architectural PMU' having 7 events, 7 counters (4 general, 3 fixed)
I: 21-12-10 14:07:29 pmu: found perf 'perf_events generic PMU' having 194 events, 0 counters (0 general, 0 fixed)
I: 21-12-10 14:07:29 pmu: found hsw 'Intel Haswell' having 74 events, 11 counters (8 general, 3 fixed)
I: 21-12-10 14:07:29 pmu: found rapl 'Intel RAPL' having 3 events, 3 counters (0 general, 3 fixed)
I: 21-12-10 14:07:29 pmu: found perf_raw 'perf_events raw PMU' having 1 events, 0 counters (0 general, 0 fixed)
I: 21-12-10 14:07:29 pmu: found intel_msr 'Intel MSR' having 6 events, 6 counters (0 general, 6 fixed)

When I run the sensor with the MongoDB configuration:

{
  "name": "sensor",
  "verbose": true,
  "frequency": 500,
  "output": {
    "type": "mongodb",
    "uri": "mongodb://127.0.0.1",
    "database": "db_sensor",
    "collection": "report_0"
  },
  "system": {
    "rapl": {
      "events": ["RAPL_ENERGY_PKG"],
      "monitoring_type": "MONITOR_ONE_CPU_PER_SOCKET"
    },
    "msr": {
      "events": ["TSC", "APERF", "MPERF"]
    }
  },
  "container": {
    "core": {
      "events": [
        "CPU_CLK_THREAD_UNHALTED:REF_P",
        "CPU_CLK_THREAD_UNHALTED:THREAD_P",
        "LLC_MISSES",
        "INSTRUCTIONS_RETIRED"
      ]
    }
  }
}

I get these additional log lines:

I: 21-12-10 14:01:20 sensor: configuration is valid, starting monitoring...
I: 21-12-10 14:01:20 perf<all>: monitoring actor started
I: 21-12-10 14:01:20 perf<mongo>: monitoring actor started

Regards,
Achraf
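A quick check for the socket configuration in this report is whether anything is accepting connections at the configured address. The sketch below assumes the socket output acts as a TCP client that connects to a listener at "uri":"port", as the sensor and formula settings in the reconnection issue above suggest. It only tests reachability and says nothing about the report format. The function names are hypothetical.

```python
import json
import socket


def load_config(path):
    """Load a sensor configuration file written as JSON."""
    with open(path) as handle:
        return json.load(handle)


def socket_output_reachable(config, timeout=2.0):
    """Return True if a TCP connection to the configured socket output succeeds.

    Raises ValueError when the configuration does not use the socket output.
    """
    output = config.get("output", {})
    if output.get("type") != "socket":
        raise ValueError("config does not use the socket output")
    try:
        # Open and immediately close a connection to the configured listener.
        with socket.create_connection((output["uri"], int(output["port"])), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, socket_output_reachable(load_config("config_file.json")) returning False would mean nothing was listening on localhost:8080 at that moment.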

The sensor fails to read performance events inside a Kubernetes container

When running the sensor inside a Kubernetes container, it fails to read some performance events.

When running the following command:

/usr/bin/hwpc-sensor "test" -n test -r csv -U "/reporting/" \
    -s "rapl" -o -e "RAPL_ENERGY_PKG" \
    -s "msr" -e "TSC" -e "APERF" -e "MPERF" \
    -c "core" -e "CPU_CLK_THREAD_UNHALTED:REF_P" -e "CPU_CLK_THREAD_UNHALTED:THREAD_P" -e "LLC_MISSES" -e "INSTRUCTIONS_RETIRED"

I get this error:

I: 20-06-16 15:00:37 build: version unknown (rev: 7a63055ff2fdfbfdf776d42188a87de19882dd88) (Jun  8 2020 - 13:05:23)
I: 20-06-16 15:00:37 uname: Linux 5.3.0-59-generic #53~18.04.1-Ubuntu SMP Thu Jun 4 14:58:26 UTC 2020 x86_64
E: 20-06-16 15:00:37 config: event 'TSC' is invalid or unsupported by this machine

I have this issue with TSC, MPERF and APERF, but other events (RAPL, etc.) work fine.

Error while linking hwpc-sensor with mongoc driver

I am trying to run PowerAPI, which depends on the MongoDB C driver (mongoc). At first I was getting an error in the mongodb_initialize function, which was due to an older version of the mongoc driver.

I have now compiled and installed the latest version of the mongoc driver, but the build fails with undefined references when linking against libmongoc:

[ 5%] Linking C executable hwpc-sensor
//usr/local/lib/libmongoc-1.0.so: undefined reference to `bson_isspace'
//usr/local/lib/libmongoc-1.0.so: undefined reference to `bson_steal'
//usr/local/lib/libmongoc-1.0.so: undefined reference to `bson_as_relaxed_extended_json'
//usr/local/lib/libmongoc-1.0.so: undefined reference to `bson_reserve_buffer'
//usr/local/lib/libmongoc-1.0.so: undefined reference to `bson_reader_reset'
//usr/local/lib/libmongoc-1.0.so: undefined reference to `bson_validate_with_error'
//usr/local/lib/libmongoc-1.0.so: undefined reference to `bson_strcasecmp'
collect2: error: ld returned 1 exit status
CMakeFiles/hwpc-sensor.dir/build.make:484: recipe for target 'hwpc-sensor' failed
make[2]: *** [hwpc-sensor] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/hwpc-sensor.dir/all' failed
make[1]: *** [CMakeFiles/hwpc-sensor.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Could you tell me whether this is a version conflict or some other issue?
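Undefined bson_* symbols inside libmongoc usually mean the linker picked up a libbson that is older than the libmongoc it is paired with; the two libraries are released together and are expected to share a version. As a minimal sketch for checking this, assuming pkg-config is installed and the libraries ship their usual libmongoc-1.0 and libbson-1.0 modules, the helper below compares what pkg-config reports. The function names are hypothetical, and pkg-config only sees what is on its search path, which may differ from what the linker resolves.

```python
import subprocess


def pkg_version(module):
    """Return the version pkg-config reports for a module, or None if unavailable."""
    try:
        result = subprocess.run(
            ["pkg-config", "--modversion", module],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # pkg-config is missing, or it does not know this module.
        return None
    return result.stdout.strip()


def same_release(version_a, version_b):
    """True when both version strings are known and identical."""
    return version_a is not None and version_a == version_b


def check_mongo_c_driver():
    """Compare the libmongoc and libbson versions visible to pkg-config."""
    mongoc = pkg_version("libmongoc-1.0")
    bson = pkg_version("libbson-1.0")
    return mongoc, bson, same_release(mongoc, bson)
```

If check_mongo_c_driver() reports two different versions, or finds only one of the two modules, an older libbson left over from a distribution package would be a plausible suspect.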
