

experiment-impact-tracker

The experiment-impact-tracker is meant to be a simple drop-in method to track energy usage, carbon emissions, and compute utilization of your system. Currently, on Linux systems with Intel chips (that support the RAPL or PowerGadget interfaces) and NVIDIA GPUs, we record: power draw from CPU and GPU, hardware information, Python package versions, estimated carbon emissions information, etc. In California we even support real-time carbon emission information by querying caiso.com!

Once all this information is logged, you can generate an online appendix that presents it, like the one here:

https://breakend.github.io/RL-Energy-Leaderboard/reinforcement_learning_energy_leaderboard/pongnoframeskip-v4_experiments/ppo2_stable_baselines,_default_settings/0.html

Installation

To install:

pip install experiment-impact-tracker

Usage

Please go to the docs page for detailed info on the design, usage, and contributing: https://breakend.github.io/experiment-impact-tracker/

If you think the docs aren't helpful or need more expansion, let us know with a Github Issue!

Below we will walk through an example together.

Add Tracking

We include a simple example in the project, which can be found in examples/my_experiment.py.

As shown in my_experiment.py, you just need to add a few lines of code!

from experiment_impact_tracker.compute_tracker import ImpactTracker
tracker = ImpactTracker(<your log directory here>)
tracker.launch_impact_monitor()

This will launch a separate python process that will gather compute/energy/carbon information in the background.

NOTE: Because of the way Python multiprocessing works, the monitoring process will not interrupt the main one even if it errors out. To address this, you can add the following call to periodically read the latest info from the log file and check for any errors that might have occurred in the tracking process. If you have a better idea of how to handle exceptions in the tracking process, please open an issue or submit a pull request!

info = tracker.get_latest_info_and_check_for_errors()
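For long-running jobs, one convenient pattern is to poll the tracker every N iterations so that a crashed monitoring process surfaces early rather than after days of training. The sketch below is our own illustration: `train_with_monitoring` and `_StubTracker` are placeholder names, and the stub exists only so the snippet runs standalone; in real code you would pass the `ImpactTracker` instance from above.

```python
def train_with_monitoring(tracker, num_steps, check_every=100):
    """Run a training loop, periodically surfacing monitor errors.

    `tracker` is expected to expose get_latest_info_and_check_for_errors(),
    which raises if the background monitoring process has died.
    """
    latest_info = None
    for step in range(num_steps):
        # ... your actual training step goes here ...
        if step % check_every == 0:
            latest_info = tracker.get_latest_info_and_check_for_errors()
    return latest_info


class _StubTracker:
    """Placeholder so this sketch runs without the package installed;
    use an ImpactTracker instance in practice."""

    def __init__(self):
        self.calls = 0

    def get_latest_info_and_check_for_errors(self):
        self.calls += 1
        return {"monitor_alive": True}


stub = _StubTracker()
info = train_with_monitoring(stub, num_steps=500, check_every=100)
```

Checking every 100 steps keeps the file-read overhead negligible while still catching a dead monitor within one check interval.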

Alternatively, you can use context management!

experiment1 = tempfile.mkdtemp()
experiment2 = tempfile.mkdtemp()

with ImpactTracker(experiment1):
    do_something()

with ImpactTracker(experiment2):
    do_something_else()

To kick off our simple experiment, run python my_experiment.py. You will see training start, and at the end the script will output something like: Please find your experiment logs in: /var/folders/n_/9qzct77j68j6n9lh0lw3vjqcn96zxl/T/tmpcp7sfese

Now let's go over to that temp dir, where we can see the logs:

$ log_path=/var/folders/n_/9qzct77j68j6n9lh0lw3vjqcn96zxl/T/tmpcp7sfese
$ cd $log_path
$ tree 
.
└── impacttracker
    ├── data.json
    ├── impact_tracker_log.log
    └── info.pkl

You can then access the information via the DataInterface:

from experiment_impact_tracker.data_interface import DataInterface

data_interface1 = DataInterface([experiment1_logdir])
data_interface2 = DataInterface([experiment2_logdir])

data_interface_both = DataInterface([experiment1_logdir, experiment2_logdir])

assert data_interface1.kg_carbon + data_interface2.kg_carbon == data_interface_both.kg_carbon
assert data_interface1.total_power + data_interface2.total_power == data_interface_both.total_power

Creating a carbon impact statement

We can also use a script to automatically generate a carbon impact statement for your paper! Just call the command below; it will find all the log files generated by the tool and calculate emissions information. Specify your ISO3 country code as well to get a dollar amount based on the per-country cost of carbon.

generate-carbon-impact-statement my_directories that_contain all_my_experiments "USA"
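At its core, the emissions estimate is energy multiplied by the regional carbon intensity of the electricity grid. A back-of-the-envelope sketch of that arithmetic (our own simplification for illustration, not the tool's exact code path; the function name is ours):

```python
def rough_kg_co2(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Energy in kWh times carbon intensity in g CO2eq/kWh, converted to kg."""
    return energy_kwh * intensity_g_per_kwh / 1000.0

# e.g. 100 kWh drawn in a region with an intensity of ~250.73 g CO2eq/kWh
# (the US-CA average the tool reports) is roughly 25 kg of CO2eq:
kg = rough_kg_co2(100.0, 250.73)
```

The per-country dollar amount is then derived from figures like this one.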

Custom PUE

By default, we use a PUE (power usage effectiveness) of 1.58 in our calculations. If you know the PUE of your data center, you can set a different value:

OVERRIDE_PUE=1.1 generate-carbon-impact-statement my_directories that_contain all_my_experiments "USA"
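The PUE is a straight multiplier on measured energy, so overriding it simply rescales the reported totals. A minimal sketch of the arithmetic, with a function name of our own choosing (this is not the library's API), assuming PUE is defined as total facility energy over IT-equipment energy:

```python
def apply_pue(measured_kwh: float, pue: float = 1.58) -> float:
    """Scale energy measured at the hardware by PUE to estimate total
    facility energy (cooling, power delivery, etc. included)."""
    if pue < 1.0:
        # PUE = total facility energy / IT equipment energy, so it is >= 1.
        raise ValueError("PUE must be >= 1.0")
    return measured_kwh * pue

total_default = apply_pue(10.0)             # 10 kWh at the default PUE of 1.58
total_efficient = apply_pue(10.0, pue=1.1)  # the same draw with OVERRIDE_PUE=1.1
```

Lowering the PUE from 1.58 to 1.1 therefore shrinks the estimated energy, and the carbon figures derived from it, by the same factor.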

Generating an HTML appendix

After logging all your experiments into a dir, we can automatically search for the impact tracker's logs and generate an HTML appendix.

First, create a json file with the structure of the website you'd like to see (this lets you create hierarchies of experiments as web pages).

For an example of all the capabilities of the tool you can see the json structure here: https://github.com/Breakend/RL-Energy-Leaderboard/blob/master/leaderboard_generation_format.json

Basically, you can group several runs together and specify variables to summarize. You can probably just copy-paste the example above and remove what you don't need, but here are descriptions of what is being specified:

"Comparing Translation Methods" : {
  # filter: the regex we use to look through the directory
  # you specify and find experiments matching it in their directory structure,
  "filter" : "(translation)", 
 
  # Use this to talk about your experiment
  "description" : "An experiment on translation.", 
  
  # executive_summary_variables: this will aggregate the sums and averages across these metrics.
  # you can see available metrics to summarize here: 
  # https://github.com/Breakend/experiment-impact-tracker/blob/master/experiment_impact_tracker/data_info_and_router.py
  "executive_summary_variables" : ["total_power", "exp_len_hours", "cpu_hours", "gpu_hours", "estimated_carbon_impact_kg"],   
  
  # The child experiments to group together
  "child_experiments" : 
        {
            "Transformer Network" : {
                                "filter" : "(transformer)",
                                "description" : "A subset of experiments for transformer experiments"
                            },
            "Conv Network" : {
                                "filter" : "(conv)",
                                "description" : "A subset of experiments for conv experiments"
                            }
                   
        }
}

Then you just run this script, pointing to your data, the json file and an output directory.

create-compute-appendix ./data/ --site_spec leaderboard_generation_format.json --output_dir ./site/

To see this in action, take a look at our RL Energy Leaderboard.

The specs are here: https://github.com/Breakend/RL-Energy-Leaderboard

And the output looks like this: https://breakend.github.io/RL-Energy-Leaderboard/reinforcement_learning_energy_leaderboard/

Looking up cloud provider emission info

Based on energy grid locations, we can estimate emissions from cloud providers using our tools. A script to do that is here:

lookup-cloud-region-info aws

Or you can look up emissions information for your own address!

% get-region-emissions-info address --address "Stanford, California"

({'geometry': <shapely.geometry.multipolygon.MultiPolygon object at 0x1194c3b38>,
  'id': 'US-CA',
  'properties': {'zoneName': 'US-CA'},
  'type': 'Feature'},
 {'_source': 'https://github.com/tmrowco/electricitymap-contrib/blob/master/config/co2eq_parameters.json '
             '(ElectricityMap Average, 2019)',
  'carbonIntensity': 250.73337617853463,
  'fossilFuelRatio': 0.48888711737336304,
  'renewableRatio': 0.428373256377554})
  

Asserting certain hardware

It may be the case that you're trying to run two sets of experiments and compare emissions/energy/etc. In this case, you generally want to ensure that there's parity between the two sets of experiments. If you're running on a cluster, you might not want to accidentally use a different GPU/CPU pair. To get around this, we provide an assertion check that you can add to your code; it will kill a job if it's running on the wrong hardware combination. For example:

from experiment_impact_tracker.gpu.nvidia import assert_gpus_by_attributes
from experiment_impact_tracker.cpu.common import assert_cpus_by_attributes

assert_gpus_by_attributes({ "name" : "GeForce GTX TITAN X"})
assert_cpus_by_attributes({ "brand": "Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz" })

Building docs

sphinx-build -b html docsrc docs

Compatible Systems

Right now, we're only compatible with Linux and Mac OS X systems running NVIDIA GPUs and Intel processors (which support RAPL or PowerGadget).

If you'd like support for your use-case or encounter missing/broken functionality on your system specs, please open an issue or better yet submit a pull request! It's almost impossible to cover every combination on our own!

Mac OS X Support

Currently, we support only CPU- and memory-related metrics on Mac OS X for Intel-based CPUs. These require the Intel Power Gadget driver and tool. The easiest way to install them is:

$ brew cask install intel-power-gadget
$ which "/Applications/Intel Power Gadget/PowerLog"

or, with newer versions of Homebrew:

$ brew install intel-power-gadget
$ which "/Applications/Intel Power Gadget/PowerLog"

You can also see here: https://software.intel.com/content/www/us/en/develop/articles/intel-power-gadget.html

This will install a tool called PowerLog that we rely on to get power measurements on Mac OS X systems.

Tested Successfully On

GPUs:

  • NVIDIA Titan X
  • NVIDIA Titan V

CPUs:

  • Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
  • Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
  • 2.7 GHz Quad-Core Intel Core i7

OS:

  • Ubuntu 16.04.5 LTS
  • Mac OS X 10.15.6

Testing

To test, run:

pytest 

Citation

If you use this work, please cite our paper:

@misc{henderson2020systematic,
    title={Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning},
    author={Peter Henderson and Jieru Hu and Joshua Romoff and Emma Brunskill and Dan Jurafsky and Joelle Pineau},
    year={2020},
    eprint={2002.05651},
    archivePrefix={arXiv},
    primaryClass={cs.CY}
}

Also, we rely on a number of upstream packages and prior work to make this possible. For carbon accounting, we used open source code from https://www.electricitymap.org/ as an initial base. psutil provides many of the compute metrics we use. nvidia-smi and Intel RAPL provide energy metrics.


Contributors

breakend, jieru-hu, leondz, nikhil153, rosstex


experiment-impact-tracker's Issues

Carbon statement with impact 0

Hello, very good project, thanks for sharing. I ran my_experiment.py on a Linux virtual machine (Ubuntu) and then ran generate-carbon-impact-statement with "USA". I get the following output: "This work contributed 0.000 kg of....", that is, with 0s in all indicators. Is this normal? Thanks in advance.

Unexpected top level domain for RAPL package. Not yet supported.

A "hello world" example throws an exception:

leon@blade:/tmp$ cat eit-test.py
#!/usr/bin/env python3
import time
from experiment_impact_tracker.compute_tracker import ImpactTracker
tracker = ImpactTracker('logdir')
tracker.launch_impact_monitor()
time.sleep(20)
leon@blade:/tmp$ python3 eit-test.py
loading region bounding boxes for computing carbon emissions region, this may take a moment...
 454/454... rate=667.16 Hz, eta=0:00:00, total=0:00:00, wall=11:24 CETT
Done!
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Gathering system info for reproducibility...
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Done initial setup and information gathering...
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Starting process to monitor power
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint timestamp took 5.817413330078125e-05 seconds
experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR - Encountered exception within power monitor thread!
experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR -   File "/home/leon/.local/lib/python3.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 105, in launch_power_monitor
    _sample_and_log_power(log_dir, initial_info, logger=logger)
  File "/home/leon/.local/lib/python3.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 69, in _sample_and_log_power
    results = header["routing"]["function"](process_ids, logger=logger, region=initial_info['region']['id'], log_dir=log_dir)
  File "/home/leon/.local/lib/python3.7/site-packages/experiment_impact_tracker/cpu/intel.py", line 134, in get_rapl_power
    "Unexpected top level domain for RAPL package. Not yet supported.")

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/leon/.local/lib/python3.7/site-packages/experiment_impact_tracker/utils.py", line 68, in process_func
    raise e
  File "/home/leon/.local/lib/python3.7/site-packages/experiment_impact_tracker/utils.py", line 62, in process_func
    ret = func(q, *args, **kwargs)
  File "/home/leon/.local/lib/python3.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 105, in launch_power_monitor
    _sample_and_log_power(log_dir, initial_info, logger=logger)
  File "/home/leon/.local/lib/python3.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 69, in _sample_and_log_power
    results = header["routing"]["function"](process_ids, logger=logger, region=initial_info['region']['id'], log_dir=log_dir)
  File "/home/leon/.local/lib/python3.7/site-packages/experiment_impact_tracker/cpu/intel.py", line 134, in get_rapl_power
    "Unexpected top level domain for RAPL package. Not yet supported.")
NotImplementedError: Unexpected top level domain for RAPL package. Not yet supported.

Running on an i7-6700HQ with Ubuntu 18.04.04 LTS under Python 3.7.5 and experiment-impact-tracker 0.1.8 in Copenhagen.

Empty log files from provided example my_experiment.py

I am just getting started using this repo, so my apologies for the naive question.
When I run python examples/my_experiment.py, it prints:

loading region bounding boxes for computing carbon emissions region, this may take a moment...
 454/454... rate=553.06 Hz, eta=0:00:00, total=0:00:00, wall=13:23 PSTT
Done!
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Gathering system info for reproducibility...
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Done initial setup and information gathering...
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Starting process to monitor power
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint timestamp took 0.00012373924255371094 seconds
Pass: 9
Pass: 19
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint rapl_power_draw_absolute took 2.107639789581299 seconds
Pass: 29
Pass: 39
Pass: 49
Pass: 59
Pass: 69
Pass: 79
Pass: 89
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint nvidia_draw_absolute took 6.554447650909424 seconds
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint cpu_count_adjusted_average_load took 0.00011205673217773438 seconds
Pass: 99
Please find your experiment logs in: /tmp/tmpw3feackg

OK, so what's in /tmp/tmpw3feackg?

cd /tmp/tmpw3feackg/impacttracker
ls -lah
total 64
-rw-rw-r-- 1 forrest forrest     0 Jun 15 13:23 data.json
-rw-rw-r-- 1 forrest forrest     0 Jun 15 13:23 impact_tracker_log.log
-rw-rw-r-- 1 forrest forrest 63309 Jun 15 13:23 info.pkl

Hmm, isn't it odd that 2 of the 3 files are empty?

But, let's see what's in the info.pkl file.

# python code
import pickle
with open('/tmp/tmpw3feackg/impacttracker/info.pkl', 'rb') as f:
    x = pickle.load(f)

Here's a pseudocode summary of what's in the pickle file:

{
  'python_package_info': <list of packages in my conda environment>,
  'cpu_info': {includes the model of CPU, the CPU frequency, and the clock frequency},
  L3, L2, and L1 cache size,
  gpu info: {gpu name, total memory, driver version, cuda version} for each GPU,
  'experiment_impact_tracker_version': '0.1.8',

  ... and now, the interesting stuff ...

  'region': {
    'type': 'Feature',
    'geometry': <shapely.geometry.multipolygon.MultiPolygon at 0x7fddd6d8e580>,
    'properties': {'zoneName': 'US-CA'}, 'id': 'US-CA'
  },
  'region_carbon_intensity_estimate': {
    '_source': 'https://github.com/tmrowco/electricitymap-contrib/blob/master/config/co2eq_parameters.json (ElectricityMap Average, 2019)',
    'carbonIntensity': 250.73337617853463,
    'fossilFuelRatio': 0.4888871173733636,
    'renewableRatio': 0.4283732563775541
  },
  'experiment_end': datetime.datetime(2020, 6, 15, 13, 23, 27, 343920)
}

Questions

  1. When things are working correctly, what information should appear in the data.json and impact_tracker_log.log files?
  2. In the pickle file, it strikes me as odd that more elementary stats such as the total energy (in kilowatt-hours) and the total runtime aren't reported. Are those typically reported in the data.json and impact_tracker_log.log files that are empty in my case?

Problems in Mac OS X

I have run the example code (sudo python3 my_experiment.py) on an iMac19,1 with a 3 GHz Intel Core i5 CPU running macOS Big Sur, and I get the following error:

loading region bounding boxes for computing carbon emissions region, this may take a moment...
454/454... rate=829.41 Hz, eta=0:00:00, total=0:00:00, wall=18:11 -044
Done!
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Gathering system info for reproducibility...
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Done initial setup and information gathering...
experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR - Encountered exception when launching power monitor thread.
--- Logging error ---
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/compute_tracker.py", line 202, in launch_impact_monitor
self.p, self.queue = launch_power_monitor(self.logdir, self.initial_info, self.logger)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/utils.py", line 83, in wrapper
p.start()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'processify.<locals>.process_func'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 1079, in emit
msg = self.format(record)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 923, in format
return fmt.format(record)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 659, in format
record.message = record.getMessage()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 363, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/Users/theuser/PycharmProjects/test_green/experiment-impact-tracker/examples/my_experiment.py", line 70, in <module>
my_experiment()
File "/Users/theuser/PycharmProjects/test_green/experiment-impact-tracker/examples/my_experiment.py", line 55, in my_experiment
tracker.launch_impact_monitor()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/compute_tracker.py", line 210, in launch_impact_monitor
self.logger.error(ex_type, ex_value,
Message: <class 'AttributeError'>
Arguments: (AttributeError("Can't pickle local object 'processify..process_func'"), ' File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/compute_tracker.py", line 202, in launch_impact_monitor\n self.p, self.queue = launch_power_monitor(self.logdir, self.initial_info, self.logger)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/utils.py", line 83, in wrapper\n p.start()\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start\n self._popen = self._Popen(self)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen\n return _default_context.get_context().Process._Popen(process_obj)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen\n return Popen(process_obj)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in init\n super().init(process_obj)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in init\n self._launch(process_obj)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch\n reduction.dump(process_obj, fp)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump\n ForkingPickler(file, protocol).dump(obj)\n')
--- Logging error ---
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/compute_tracker.py", line 202, in launch_impact_monitor
self.p, self.queue = launch_power_monitor(self.logdir, self.initial_info, self.logger)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/utils.py", line 83, in wrapper
p.start()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'processify.<locals>.process_func'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 1079, in emit
msg = self.format(record)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 923, in format
return fmt.format(record)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 659, in format
record.message = record.getMessage()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 363, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/Users/theuser/PycharmProjects/test_green/experiment-impact-tracker/examples/my_experiment.py", line 70, in <module>
my_experiment()
File "/Users/theuser/PycharmProjects/test_green/experiment-impact-tracker/examples/my_experiment.py", line 55, in my_experiment
tracker.launch_impact_monitor()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/compute_tracker.py", line 210, in launch_impact_monitor
self.logger.error(ex_type, ex_value,
Message: <class 'AttributeError'>
Arguments: (AttributeError("Can't pickle local object 'processify..process_func'"), ' File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/compute_tracker.py", line 202, in launch_impact_monitor\n self.p, self.queue = launch_power_monitor(self.logdir, self.initial_info, self.logger)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/utils.py", line 83, in wrapper\n p.start()\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start\n self._popen = self._Popen(self)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen\n return _default_context.get_context().Process._Popen(process_obj)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen\n return Popen(process_obj)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in init\n super().init(process_obj)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in init\n self._launch(process_obj)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch\n reduction.dump(process_obj, fp)\n File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump\n ForkingPickler(file, protocol).dump(obj)\n')
Traceback (most recent call last):
File "/Users/theuser/PycharmProjects/test_green/experiment-impact-tracker/examples/my_experiment.py", line 70, in <module>
my_experiment()
File "/Users/theuser/PycharmProjects/test_green/experiment-impact-tracker/examples/my_experiment.py", line 55, in my_experiment
tracker.launch_impact_monitor()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/compute_tracker.py", line 202, in launch_impact_monitor
self.p, self.queue = launch_power_monitor(self.logdir, self.initial_info, self.logger)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/experiment_impact_tracker/utils.py", line 83, in wrapper
p.start()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'processify.<locals>.process_func'

Any idea what may be causing this error? I have googled this bug but found almost nothing about it.

Problem with memory counting method?

Hi all, thanks for this great tool.

I observed strange results using experiment-impact-tracker, with rapl_power_draw_absolute < rapl_estimated_attributable_power_draw.

I suspect a problem in the attributable memory counting method.

Can someone confirm or correct this statement?

>>> from experiment_impact_tracker.compute_tracker import ImpactTracker
>>> import os
# add a breakpoint in code with : import pdb; pdb.set_trace()
>>> import experiment_impact_tracker.cpu.intel
>>> experiment_impact_tracker.cpu.intel.get_intel_power([42954])
{'rapl_power_draw_absolute': 51.95493564262022, 'rapl_estimated_attributable_power_draw': 57.03725903291969, 'cpu_time_seconds': {42954: OrderedDict([('user', 2599.63), ('system', 444.91), ('children_user', 0.17), ('children_system', 4.66), ('iowait', 0.0)])}, 'average_relative_cpu_utilization': 0.9966920738858607, 'absolute_cpu_utilization': 2.055019728531582, 'relative_mem_usage': 1.470720815077694, 'absolute_mem_usage': 13372609536.0, 'absolute_mem_percent_usage': 0.1341320368485436, 'mem_info_per_process': {42954: OrderedDict([('rss', 6764167168), ('vms', 46901260288), ('shared', 1675501568), ('text', 2330624), ('lib', 0), ('data', 42889551872), ('dirty', 0), ('uss', 6659334144), ('pss', 6713275392), ('swap', 0)])}}
>>>

Shouldn't relative_mem_usage be <= 1? Tracing further:

(Pdb) p system_wide_mem_percent
1.4931055744647492
(Pdb) p total_physical_memory
svmem(total=99697356800, available=90187993088, percent=9.5, used=7801208832, free=372105216, active=6481403904, inactive=90540580864, buffers=123908096, cached=91400134656, shared=1138413568, slab=670351360)
(Pdb) p mem_info_per_process
{42954: OrderedDict([('rss', 7177113600), ('vms', 46932361216), ('shared', 1675501568), ('text', 2330624), ('lib', 0), ('data', 42920652800), ('dirty', 0), ('uss', 7072268288), ('pss', 7126215680), ('swap', 0)])}

I patched locally and the results look much closer to what I expected (attributable power draw slightly under absolute power draw).

Tested on :

  • Dell R740, dual Xeon Silver 4110, 3x Nvidia Tesla T4, 96GB, CentOS 7.6
  • Dell T640, dual Xeon Silver 4215, 4x Nvidia Quadro RTX6000, CentOS 7.6
  • both cases: git clone of the master branch

Problem With Github Actions/Azure

We would ideally like to run CI through GH Actions. Actions run on a VM through Azure:

GitHub hosts Linux and Windows runners on Standard_DS2_v2 virtual machines in Microsoft Azure with the GitHub Actions runner application installed. The GitHub-hosted runner application is a fork of the Azure Pipelines Agent. Inbound ICMP packets are blocked for all Azure virtual machines, so ping or traceroute commands might not work. For more information about the Standard_DS2_v2 machine resources, see "Dv2 and DSv2-series" in the Microsoft Azure documentation.

GitHub hosts macOS runners in GitHub's own macOS Cloud.

However, these VMs don't seem to expose the RAPL interface. Is there a workaround so that the VMs expose RAPL, or some way for us to enable it? We may want to open an issue with GH Actions: https://github.com/actions/virtual-environments/issues

ParserError parsing nvidia-smi output

I get the following error when starting the tracker:

Traceback (most recent call last):                                                                                                                                                                          
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap                                                                                                                             
    self.run()                                                                                                                                                                                              
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run                                                                                                                                     
    self._target(*self._args, **self._kwargs)                                                                                                                                                               
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/experiment_impact_tracker/utils.py", line 68, in process_func                                        
    raise e                                                                                                                                                                                                 
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/experiment_impact_tracker/utils.py", line 62, in process_func                                        
    ret = func(q, *args, **kwargs)                                                                                                                                                                          
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 105, in launch_power_monitor                     
    _sample_and_log_power(log_dir, initial_info, logger=logger)                                                                                                                                             
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 69, in _sample_and_log_power
    results = header["routing"]["function"](process_ids, logger=logger, region=initial_info['region']['id'], log_dir=log_dir)
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/experiment_impact_tracker/gpu/nvidia.py", line 123, in get_nvidia_gpu_power
    df = pd.read_csv(StringIO(out_str_final), engine='python', delim_whitespace=True)
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 460, in _read
    data = parser.read(nrows)
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 1198, in read
    ret = self._engine.read(nrows)
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 2585, in read
    alldata = self._rows_to_cols(content)
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 3237, in _rows_to_cols
    self._alert_malformed(msg, row_num + 1)
  File "/tsi/doctorants/ocifka/projects/phd/experiments/lakhnes_cover/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 2998, in _alert_malformed
    raise ParserError(msg)
pandas.errors.ParserError: Expected 8 fields in line 4, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

This seems to be a problem with the output of the command nvidia-smi pmon -c 5, which gives the following output on my machine:

# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0       9122     G     0     3     0     0   Xorg           
    0      11344     G     0     0     0     0   chromium --type
    0      22948     C     0     0     0     0   python3        
    0       9122     G     0     3     0     0   Xorg           
    0      11344     G     0     0     0     0   chromium --type
    0      22948     C     0     0     0     0   python3        
    0       9122     G     0     3     0     0   Xorg           
    0      11344     G     0     0     0     0   chromium --type
    0      22948     C     0     0     0     0   python3        
    0       9122     G     0     3     0     0   Xorg           
    0      11344     G     0     0     0     0   chromium --type
    0      22948     C     0     0     0     0   python3        
    0       9122     G     0     3     0     0   Xorg           
    0      11344     G     0     0     0     0   chromium --type
    0      22948     C     0     0     0     0   python3

I'm guessing the problem is the command chromium --type containing a space.
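One hypothetical workaround (not the library's current parsing code): since only the trailing command column can contain spaces, split each data row into at most eight fields instead of relying on whitespace-delimited CSV parsing.

```python
def parse_pmon(out_str):
    """Parse `nvidia-smi pmon` text output; the trailing `command`
    column may contain spaces, so split into at most 8 fields."""
    cols = ["gpu", "pid", "type", "sm", "mem", "enc", "dec", "command"]
    rows = []
    for line in out_str.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split(None, len(cols) - 1)  # last field keeps its spaces
        rows.append(dict(zip(cols, (f.strip() for f in fields))))
    return rows

sample = """\
# gpu        pid  type    sm   mem   enc   dec   command
    0      11344     G     0     0     0     0   chromium --type
    0      22948     C     0     0     0     0   python3
"""
rows = parse_pmon(sample)
# rows[0]["command"] == "chromium --type"
```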

On some platforms CPU freq results in error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../python3.7/site-packages/psutil/__init__.py", line 1877, in cpu_freq
    ret = _psplatform.cpu_freq()
  File ".../python3.7/site-packages/psutil/_pslinux.py", line 703, in cpu_freq
    "can't find current frequency file")
NotImplementedError: can't find current frequency file

To fix, I think we should add a compatibility check here: https://github.com/Breakend/experiment-impact-tracker/blob/master/experiment_impact_tracker/data_info_and_router.py#L168

and a try-catch here so that it just returns an empty list if it gets that exception:

def get_cpu_freq(*args, **kwargs):
    """Returns the frequency info of each available CPU."""
    return [x._asdict() for x in psutil.cpu_freq(percpu=True)]
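The suggested try-catch could look like this (a sketch; the ImportError guard is an extra precaution beyond the proposal above):

```python
def get_cpu_freq(*args, **kwargs):
    """Returns the frequency info of each available CPU, or an empty
    list on platforms where psutil cannot find the frequency files."""
    try:
        import psutil
        return [x._asdict() for x in psutil.cpu_freq(percpu=True)]
    except (ImportError, NotImplementedError):
        # Some VMs/containers expose no scaling_cur_freq files.
        return []
```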

2 misc problems in create-compute-appendix

Hi all, it seems there are two minor problems in the current version of create-compute-appendix:

  • a pdb.set_trace() invocation not commented out
  • a faulty condition when parsing GPU info

Here is a proposed diff :

diff --git a/scripts/create-compute-appendix b/scripts/create-compute-appendix
--- a/scripts/create-compute-appendix
+++ b/scripts/create-compute-appendix
@@ -211,7 +211,7 @@ def _aggregated_data_for_filterset(output_dir,
 
                 # {k: [v] for k, v in info["gpu_info"].items()})
                 if "gpu_info" in info:
-                    import pdb;pdb.set_trace()
+                    #import pdb;pdb.set_trace()
                     gpu_data_frame = pd.DataFrame.from_dict(info["gpu_info"])
                     gpu_infos_all[experiment_set_names[exp_set]].append(
                         gpu_data_frame)
@@ -283,7 +283,8 @@ def _create_leaf_page(output_directory, all_infos, exp_set_name, description, ex
             data_zip_paths_all[exp_set_name][i], html_output_dir)
 
         template_args = {}
-        if gpu_infos_all and gpu_infos_all[exp_set_name] and gpu_infos_all[exp_set_name][i]:
+        #if gpu_infos_all and gpu_infos_all[exp_set_name] and gpu_infos_all[exp_set_name][i]:
+        if gpu_infos_all and not pd.DataFrame(gpu_infos_all[exp_set_name][i]).empty:
             template_args["gpu_info"] = gpu_infos_all[exp_set_name][i].T
 
         template_args["exp_set_names_titles"] = [(_format_setname(experiment_set_names[exp_set]), experiment_set_names[exp_set])

Add colab example

Somewhat blocked by #18. In the latest trial, nvidia-smi pmon formatting is different in Colab, and judging by other issues it can vary across systems. If we can figure out how to get the same data out of an API, this issue should go away and we can create a Colab example.

Getting error "Problem with output in nvidia-smi pmon -c 10"

Hi, we're getting this error in the log file:

experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR - Encountered exception within power monitor thread!
experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR -   File "/usr/local/lib/python3.7/dist-packages/experiment_impact_tracker/compute_tracker.py", line 105, in launch_power_monitor
    _sample_and_log_power(log_dir, initial_info, logger=logger)
  File "/usr/local/lib/python3.7/dist-packages/experiment_impact_tracker/compute_tracker.py", line 69, in _sample_and_log_power
    results = header["routing"]["function"](process_ids, logger=logger, region=initial_info['region']['id'], log_dir=log_dir)
  File "/usr/local/lib/python3.7/dist-packages/experiment_impact_tracker/gpu/nvidia.py", line 117, in get_nvidia_gpu_power
    raise ValueError('Problem with output in nvidia-smi pmon -c 10')

Is it an issue with our Nvidia GPU? We are using a Tesla T4.

ModuleNotFoundError: No module named 'experiment_impact_tracker.data_interface'

DataInterface seems to be missing from the repository. I get the below error while running generate-carbon-impact-statement:

Traceback (most recent call last):
  File "/experiment-impact-tracker/scripts/generate-carbon-impact-statement", line 25, in <module>
    from experiment_impact_tracker.data_interface import DataInterface
ModuleNotFoundError: No module named 'experiment_impact_tracker.data_interface'

Update README for Mac OS X support

Hey,

Just want to point out that the install command for the Intel Power Gadget tool could be updated.

For more recent macOS versions, brew cask install [...] is just brew install [...]

Thanks!

Tutorial on how to run

Hello,

Is there any chance of a walkthrough tutorial on how to run experiment-impact-tracker? I was using Jupyter notebooks and faced some issues after running generate-carbon-impact-statement. I'm fairly new to programming (but I would like to be aware of my algorithms' impact from the start) and maybe I missed something really basic.
So far, my steps were:

  1. Install experiment-impact-tracker using pip
  2. In my Jupyter notebook, call from experiment_impact_tracker.compute_tracker import ImpactTracker, then tracker = ImpactTracker(<your log directory here>) and tracker.launch_impact_monitor() (some warnings are thrown)
  3. Import the generate-carbon-impact-statement script
  4. Run it from the terminal, providing the directory where my ImpactLogs folder is and using "GBR" as the ISO3 code

Amongst an extensive list of errors, it finishes with this:

  File "/homes/pps30/venvs/venv_dl4am_a1/lib/python3.6/site-packages/experiment_impact_tracker/utils.py", line 101, in gather_additional_info
    exp_len = datetime.timestamp(info["experiment_end"]) - \
KeyError: 'experiment_end'

Maybe this is too vague to get any help, but in any case:

Thank you very much for your tool!

generate-carbon-impact-statement script

Hi there,

Cool library!
Got this issue when running:
generate-carbon-impact-statement tracker "USA"

The script returns this error:
loading region bounding boxes for computing carbon emissions region, this may take a moment...
454/454... rate=556.09 Hz, eta=0:00:00, total=0:00:00, wall=15:15 UTC
Done!
Traceback (most recent call last):
  File "/home/retachet/.conda/envs/homer/bin/generate-carbon-impact-statement", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/retachet/ExperimentImpactTracker/experiment-impact-tracker/scripts/generate-carbon-impact-statement", line 235, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/home/retachet/ExperimentImpactTracker/experiment-impact-tracker/scripts/generate-carbon-impact-statement", line 177, in main
    extracted_info = gather_additional_info(info, log_dir)
  File "/home/retachet/ExperimentImpactTracker/experiment-impact-tracker/experiment_impact_tracker/utils.py", line 102, in gather_additional_info
    cpu_seconds = _get_cpu_hours_from_per_process_data(json_array)
  File "/home/retachet/ExperimentImpactTracker/experiment-impact-tracker/experiment_impact_tracker/utils.py", line 94, in _get_cpu_hours_from_per_process_data
    cpu_point = datapoint["cpu_time_seconds"]
KeyError: 'cpu_time_seconds'

Nan values and pandas

Hi again, could you please take a look at the aggregation functions (e.g. mean) that you use with pandas DataFrame objects? Occasionally, a "TypeError: Could not convert ... to numeric" occurs. Apparently this error is caused by the presence of some NaN values in the column of the DataFrame being summarized. Maybe a solution would be to invoke the mean function as dataframe.mean(skipna=True). Thank you!

AttributeError on start

When using experiment-impact-tracker 0.1.8 under Mac OS X 10.16.7 and Python 3.8.5 in my own experiment like this

from experiment_impact_tracker.compute_tracker import ImpactTracker
import tempfile
...
tracker = ImpactTracker(tempfile.mkdtemp())
tracker.launch_impact_monitor()
...

I get the following error: AttributeError: Can't pickle local object 'processify.<locals>.process_func'

Full log: tracker.log

Hard exit when using ray

When using ray, there is a hard exit where we always see a stack trace printed because sys.exit is called on worker nodes. Is there a way to exit more gracefully in these situations?

(pid=42763) experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR - Encountered exception within power monitor thread!
(pid=42763) ERROR:Encountered exception within power monitor thread!
(pid=42763) INFO:Done - Logging final info.
(pid=42763) /u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/site-packages/experiment_impact_tracker/data_utils
.py:30: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead
(pid=42763)   return json_normalize(json_array, max_level=max_level), json_array
(pid=42763) experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR -   File
"/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/site-packages/experiment_impact_tracker/compute_tracker.py",
line 161, in launch_power_monitor
(pid=42763)     _sample_and_log_power(log_dir, initial_info, logger=logger)
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3
.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 112, in _sample_and_log_power
(pid=42763)     log_dir=log_dir,
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3
.7/site-packages/experiment_impact_tracker/gpu/nvidia.py", line 127, in get_nvidia_gpu_power
(pid=42763)     out_str = sp.communicate()
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/subprocess.py", line 964, in communicate
(pid=42763)     stdout, stderr = self._communicate(input, endtime, timeout)
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/subprocess.py", line 1715, in _communicate
(pid=42763)     ready = selector.select(timeout)
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/selectors.py", line 415, in select
(pid=42763)     fd_event_list = self._selector.poll(timeout)
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/site-packages/ray/worker.py", line 392, in

sigterm_handler
(pid=42763)     sys.exit(1)
(pid=42763)
(pid=42763) ERROR:  File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3
.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 161, in launch_power_monitor
(pid=42763)     _sample_and_log_power(log_dir, initial_info, logger=logger)
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3
.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 112, in _sample_and_log_power
(pid=42763)     log_dir=log_dir,
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3
.7/site-packages/experiment_impact_tracker/gpu/nvidia.py", line 127, in get_nvidia_gpu_power
(pid=42763)     out_str = sp.communicate()
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/subprocess.py", line 964, in communicate
(pid=42763)     stdout, stderr = self._communicate(input, endtime, timeout)
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/subprocess.py", line 1715, in _communicate
(pid=42763)     ready = selector.select(timeout)
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/selectors.py", line 415, in select
(pid=42763)     fd_event_list = self._selector.poll(timeout)
(pid=42763)   File "/u/nlp/anaconda/main/anaconda3/envs/<anon>/lib/python3.7/site-packages/ray/worker.py", line 392, in

sigterm_handler
(pid=42763)     sys.exit(1)

(pid=29659) experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR - Encountered exception within power monitor thread!
(pid=29659) ERROR:Encountered exception within power monitor thread!
INFO:time to complete: 0:01:39.574842
(pid=29659) experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR -   File
"/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3.7/site-packages/experiment_impact_tracker/compute_tracker.py",
line

161, in launch_power_monitor
(pid=29659)     _sample_and_log_power(log_dir, initial_info, logger=logger)
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 93, in _sample_and_log_power
(pid=29659)     required_headers = _get_compatible_data_headers(get_current_region_info_cached()[0])
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 182, in _get_compatible_data_headers
(pid=29659)     if not compatability_fn(region=region):
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/experiment_impact_tracker/cpu/common.py", line 32, in is_cpu_freq_compatible
(pid=29659)     test = [x._asdict() for x in psutil.cpu_freq(percpu=True)]
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/ray/thirdparty_files/psutil/__init__.py", line 1859, in cpu_freq
(pid=29659)     ret = _psplatform.cpu_freq()
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/ray/thirdparty_files/psutil/_pslinux.py", line 742, in cpu_freq
(pid=29659)     curr = cat(pjoin(path, "scaling_cur_freq"), fallback=None)
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/ray/thirdparty_files/psutil/_pslinux.py", line 293, in cat
(pid=29659)     return f.read().strip()
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3.7/site-packages/ray/worker.py", line 392, in
sigterm_handler
(pid=29659)     sys.exit(1)
(pid=29659)
(pid=29659) ERROR:  File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 161, in launch_power_monitor
(pid=29659)     _sample_and_log_power(log_dir, initial_info, logger=logger)
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 93, in _sample_and_log_power
(pid=29659)     required_headers = _get_compatible_data_headers(get_current_region_info_cached()[0])
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/experiment_impact_tracker/compute_tracker.py", line 182, in _get_compatible_data_headers
(pid=29659)     if not compatability_fn(region=region):
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/experiment_impact_tracker/cpu/common.py", line 32, in is_cpu_freq_compatible
(pid=29659)     test = [x._asdict() for x in psutil.cpu_freq(percpu=True)]
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/ray/thirdparty_files/psutil/__init__.py", line 1859, in cpu_freq
(pid=29659)     ret = _psplatform.cpu_freq()
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/ray/thirdparty_files/psutil/_pslinux.py", line 742, in cpu_freq
(pid=29659)     curr = cat(pjoin(path, "scaling_cur_freq"), fallback=None)
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3
.7/site-packages/ray/thirdparty_files/psutil/_pslinux.py", line 293, in cat
(pid=29659)     return f.read().strip()
(pid=29659)   File "/u/nlp/anaconda/main/anaconda3/envs/anon/lib/python3.7/site-packages/ray/worker.py", line 392, in
sigterm_handler
(pid=29659)     sys.exit(1)

Code:

remote_class = ray.remote(num_cpus=1, num_gpus=num_gpus)(
    TestClass
).remote()
output = remote_class.run.remote(
    model_path=model_path,
    dataset_path=data_path,
    train_batch_size=train_batch_size,
    run_stats=run_stats,
)

class TestClass(object):
    def run(cls, model_path: str, dataset_path, train_batch_size, run_stats):
        """
        Computes energy metrics for one training epoch
        """
        # First copy model_path to temp directory
        logging_path = os.path.join(
            ENERGY_LOGGING_DIR, run_stats["hyperopt_results"]["experiment_id"]
        )
        tempdir = os.path.join(logging_path, "temp_model")
        shutil.copytree(model_path, tempdir)
        model = AnonModel.load(tempdir)
        with ImpactTracker(logging_path):
            (
                _,
                _,
                _,
            ) = model.train(
                dataset=dataset_path,
                training_set_metadata=os.path.join(
                    tempdir, "training_set_metadata.json"
                ),
            )
        data_interface = DataInterface([logging_path])
        carbon_output = {
            "kg_carbon": data_interface.kg_carbon,
            "total_power": data_interface.total_power,
            "PUE": data_interface.PUE,
            "duration_of_train_step": data_interface.exp_len_hours,
        }
        shutil.rmtree(tempdir)
        return carbon_output

PermissionError

Hi all,

I am trying to run some experiments but am getting the following error:
PermissionError: [Errno 13] Permission denied: '/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj'

I am using a remote server where I cannot change the permissions of /sys/class/powercap/, and energy_uj is readable only by root.

Any ideas?

Allow to set experiment-impact-tracker's log-level

I'm using absl.logging to log. It looks like experiment-impact-tracker does, too. When I set my log-level to logging.set_verbosity(logging.INFO), my logging gets flooded by experiment-impact-tracker with logs like INFO:Datapoint timestamp took 3.1948089599609375e-05 seconds.

Is there a way to easily lower experiment-impact-tracker's log-level or suppress these very fine-grained log messages?
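One possible workaround, assuming the tracker logs through the standard logging module under the logger name shown in its messages (an assumption based on the quoted output):

```python
import logging

# Logger name taken from the messages above; assumed to be the full name.
logging.getLogger(
    "experiment_impact_tracker.compute_tracker.ImpactTracker"
).setLevel(logging.WARNING)
```

Setting the level to WARNING would suppress the per-datapoint INFO messages while keeping errors visible.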

Missing average carbon intensity data for US-DC

Hello Breakend,
Could you please add data for 'US-DC'?
I get this error:

/usr/local/lib/python3.7/dist-packages/experiment_impact_tracker/emissions/get_region_metrics.py in get_zone_information_by_coords(coords)
      6 def get_zone_information_by_coords(coords):
      7     region = get_region_by_coords(coords)
----> 8     return region, ZONE_INFO[region["id"]]
      9 
     10 def get_region_by_coords(coords):

KeyError: 'US-DC'

NotImplementedError: Unexpected top level domain for RAPL package. Not yet supported.

Hello Everyone,

I have an issue running the experiment example. I saw that this issue was solved before, but I didn't understand how I can fix it.

loading region bounding boxes for computing carbon emissions region, this may take a moment...
454/454... rate=658.04 Hz, eta=0:00:00, total=0:00:00, wall=16:46 CET
Done!
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Gathering system info for reproducibility...
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Done initial setup and information gathering...
/tmp/tmpy37szyx0
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Starting process to monitor power
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint timestamp took 3.695487976074219e-05 seconds
experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR - Encountered exception within power monitor thread!
experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR - File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/compute_tracker.py", line 105, in launch_power_monitor
    _sample_and_log_power(log_dir, initial_info, logger=logger)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/compute_tracker.py", line 69, in _sample_and_log_power
    results = header["routing"]["function"](process_ids, logger=logger, region=initial_info['region']['id'], log_dir=log_dir)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/cpu/intel.py", line 133, in get_rapl_power
    raise NotImplementedError(

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/utils.py", line 68, in process_func
    raise e
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/utils.py", line 62, in process_func
    ret = func(q, *args, **kwargs)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/compute_tracker.py", line 105, in launch_power_monitor
    _sample_and_log_power(log_dir, initial_info, logger=logger)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/compute_tracker.py", line 69, in _sample_and_log_power
    results = header["routing"]["function"](process_ids, logger=logger, region=initial_info['region']['id'], log_dir=log_dir)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/cpu/intel.py", line 133, in get_rapl_power
    raise NotImplementedError(
NotImplementedError: Unexpected top level domain for RAPL package. Not yet supported.
Pass: 9
Traceback (most recent call last):
  File "Documents/impacttracker.py", line 71, in <module>
    my_experiment()
  File "Documents/impacttracker.py", line 64, in my_experiment
    tracker.get_latest_info_and_check_for_errors()
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/compute_tracker.py", line 224, in get_latest_info_and_check_for_errors
    raise ex_type(message)
NotImplementedError: Unexpected top level domain for RAPL package. Not yet supported. (in subprocess)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/utils.py", line 62, in process_func
    ret = func(q, *args, **kwargs)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/compute_tracker.py", line 105, in launch_power_monitor
    _sample_and_log_power(log_dir, initial_info, logger=logger)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/compute_tracker.py", line 69, in _sample_and_log_power
    results = header["routing"]["function"](process_ids, logger=logger, region=initial_info['region']['id'], log_dir=log_dir)
  File "/home/hasseneby/.local/lib/python3.8/site-packages/experiment_impact_tracker/cpu/intel.py", line 133, in get_rapl_power
    raise NotImplementedError(

My region is France and I don't use a GPU for the test. Running on the newest Ubuntu version, 20.04.
Thanks for your reply :)

Add FileLock for reads/writes

There's sometimes a race condition when trying to read the latest data from the impact tracker: the JSON returned is garbage because it's still being written. We currently have a retry, but this should be replaced with a file lock.
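A rough sketch of what that could look like on Unix with the standard-library fcntl module (the cross-platform filelock package would be an alternative); these helper names are illustrative, not the tracker's API:

```python
import fcntl
import json

def write_json_locked(path, obj):
    """Write a JSON file under an exclusive lock so readers never see a partial write."""
    with open(path, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # exclusive lock while writing
        try:
            json.dump(obj, f)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

def read_json_locked(path):
    """Read a JSON file under a shared lock; readers don't block each other."""
    with open(path, "r") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return json.load(f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```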

Update PyPI package

Hey,

It seems like the PyPI package is no longer up to date with the code and the documentation.

Notably, the ExperimentTracker class has no __enter__ method, resulting in a crash when trying to use the implementation:

with ExperimentTracker(..):

"Unexpected top level domain for RAPL package. Not yet supported."

Hello,

I just installed the package, then used it in my transformers trainer and got the following error:

2020-10-03 22:04:26,602 - experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR - Encountered exception within power monitor thread!
2020-10-03 22:04:26,604 - experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR -   File "/home/proto/anaconda3/envs/torch/lib/python3.6/site-packages/experiment_impact_tracker/compute_tracker.py", line 105, in launch_power_monitor
    _sample_and_log_power(log_dir, initial_info, logger=logger)
  File "/home/proto/anaconda3/envs/torch/lib/python3.6/site-packages/experiment_impact_tracker/compute_tracker.py", line 69, in _sample_and_log_power
    results = header["routing"]["function"](process_ids, logger=logger, region=initial_info['region']['id'], log_dir=log_dir)
  File "/home/proto/anaconda3/envs/torch/lib/python3.6/site-packages/experiment_impact_tracker/cpu/intel.py", line 134, in get_rapl_power
    "Unexpected top level domain for RAPL package. Not yet supported.")

https://github.com/huggingface/transformers/blob/9bdce3a4f91c6d53873582b0210e61c92bba8fd3/src/transformers/trainer.py#L729
I have added the code in between those lines.

        tracker = ImpactTracker(self.args.logging_dir)
        tracker.launch_impact_monitor()

Here is my pip freeze

What could it be?
Thanks!

Getting an error in get_zone_information_by_coords

Hi, I'm trying to add the tracker to my project but I keep getting the same error when launching the tracker:

/usr/local/lib/python3.7/dist-packages/experiment_impact_tracker/emissions/get_region_metrics.py in get_zone_information_by_coords(coords)
      6 def get_zone_information_by_coords(coords):
      7     region = get_region_by_coords(coords)
----> 8     return region, ZONE_INFO[region["id"]]
      9 
     10 def get_region_by_coords(coords):

KeyError: 'US-NV'

Any idea why?

how to construct/destroy ImpactTracker instances?

I'd like to be able to explicitly stop tracking. experiment_impact_tracker.compute_tracker.ImpactTracker doesn't seem to have a destructor, and doesn't implement __enter__ & __exit__, so it can't be used in a with statement. What's the preferred route for this?

Support for abruptly stopped experiments

I interrupted an experiment and now when I run generate-carbon-impact-statement, it crashes with:

Traceback (most recent call last):
  File "venv/bin/generate-carbon-impact-statement", line 233, in <module>
    sys.exit(main(sys.argv[1:]))
  File "venv/bin/generate-carbon-impact-statement", line 177, in main
    extracted_info = gather_additional_info(info, log_dir)
  File "venv/lib/python3.6/site-packages/experiment_impact_tracker/utils.py", line 101, in gather_additional_info
    exp_len = datetime.timestamp(info["experiment_end"]) - \
KeyError: 'experiment_end'

I think this situation can be quite common, so the tracker should be able to recover from it (e.g. taking the last logged timestamp if experiment_end is not available) instead of crashing.
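The recovery suggested above can be sketched in a few lines: when "experiment_end" never got written because the run was interrupted, fall back to the most recent logged datapoint instead of raising KeyError. The dict layout and function name below are illustrative, not the tracker's actual internals.

```python
# Sketch of the proposed fallback for interrupted runs. `info` mimics the
# tracker's info record; `logged_timestamps` stands in for the timestamps
# found in the datapoint log.
from datetime import datetime

def experiment_length_seconds(info, logged_timestamps):
    start = datetime.timestamp(info["experiment_start"])
    if "experiment_end" in info:
        end = datetime.timestamp(info["experiment_end"])
    else:
        # Interrupted run: use the last logged datapoint instead of crashing.
        end = max(logged_timestamps)
    return end - start

start = datetime(2021, 4, 9, 12, 0, 0)
info = {"experiment_start": start}  # no "experiment_end" key
stamps = [start.timestamp() + s for s in (1, 2, 3)]
print(experiment_length_seconds(info, stamps))  # -> 3.0
```

This slightly underestimates the true run length (by at most one sampling interval), which seems preferable to a crash in generate-carbon-impact-statement.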

ImpactTracker Warnings

Hi!

I was wondering if it is normal to get the following warnings when using ImpactTracker:

(screenshot of the warnings omitted)

My code is:

# Init tracker with log path and start it in a separate process
carbon_dir = os.path.join(conf['exp_dir'], 'carbon_logs')
os.makedirs(carbon_dir, exist_ok=True)  
tracker = ImpactTracker(carbon_dir)
tracker.launch_impact_monitor() 

Migrate to pynvml

Look into migrating to pynvml now that it seems possible to get GPU info per process.

Zero-values when generating carbon impact statement

Hi everyone.

I am writing my thesis on deriving an approximate cost model for NLP deployment, and this tracker would help me a lot.

When executing the example, everything runs smoothly, but when generating the carbon impact statement, I only get zero values. I have tried making the example work harder, and running my deployment code as well, with only zero values as a result. I can see in the JSON file that a lot of the information needed to calculate a non-zero value is missing.

I have tested the example on a Linux machine and a macOS (Big Sur) machine with the same result. (Python 3.7, Pandas 1.3.5)

I have tested the parts of the code that gather information about the system. I could continue to go through the code, but time is short, and I am not using Intel Power Gadget as a fallback.

Terminal printout during execution:
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Starting process to monitor power
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint timestamp took 0.0006999969482421875 seconds
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint cpu_count_adjusted_average_load took 7.891654968261719e-05 seconds
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint disk_write_speed took 0.5056366920471191 seconds
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint timestamp took 0.0004968643188476562 seconds
experiment_impact_tracker.compute_tracker.ImpactTracker - WARNING - Datapoint cpu_count_adjusted_average_load took 2.288818359375e-05 seconds

Generate carbon impact statement:
$ generate-carbon-impact-statement my_directories that_contain all_my_experiments "USA"

loading region bounding boxes for computing carbon emissions region, this may take a moment...
454/454... rate=707.43 Hz, eta=0:00:00, total=0:00:00
Done!
/Users/xxxxx/opt/anaconda3/envs/nlp_energy_project/bin/generate-carbon-impact-statement:37: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.
pd.set_option('display.max_colwidth', -1)
This work contributed 0.000 kg of $\text{CO}_{2eq}$ to the atmosphere and used 0.000 kWh of electricity, having a USA-specific social cost of carbon of $0.00 ($0.00, $0.00). Carbon accounting information can be found here: ....
.
.
.

Has anybody had the same issue and found a solution?

Thanks all

Error in "get_region_by_coords" on a remote computing cluster

Hi,

I am able to run the code smoothly on my local machine. The same code + env in a Singularity container fails on a remote computing cluster with the following error:

loading region bounding boxes for computing carbon emissions region, this may take a moment...
 454/454... rate=566.68 Hz, eta=0:00:00, total=0:00:00, wall=11:38 EST
Done!
INFO:Gathering system info for reproducibility...
ERROR:Status code Unknown from http://ipinfo.io/json: ERROR - HTTPConnectionPool(host='ipinfo.io', port=80): Max retries exceeded with url: /json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ad6ba6184c0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "eval_with_tracker.py", line 565, in <module>
    tracker = ImpactTracker(log_dir)
  File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 246, in __init__
    self.initial_info = gather_initial_info(logdir)
  File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 225, in gather_initial_info
    data[key] = info_["routing"]["function"]()
  File "../../experiment-impact-tracker/experiment_impact_tracker/data_info_and_router.py", line 63, in <lambda>
    "routing": {"function": lambda: get_current_region_info_cached()[0]},
  File "../../experiment-impact-tracker/experiment_impact_tracker/emissions/get_region_metrics.py", line 65, in get_current_region_info_cached
    return get_current_region_info(ttl_hash=get_ttl_hash(seconds=60 * 60))
  File "../../experiment-impact-tracker/experiment_impact_tracker/emissions/get_region_metrics.py", line 43, in get_current_region_info
    return get_zone_information_by_coords(get_current_location())
  File "../../experiment-impact-tracker/experiment_impact_tracker/emissions/get_region_metrics.py", line 10, in get_zone_information_by_coords
    region = get_region_by_coords(coords)
  File "../../experiment-impact-tracker/experiment_impact_tracker/emissions/get_region_metrics.py", line 17, in get_region_by_coords
    point = Point(lon, lat)
  File "/usr/local/lib/python3.8/dist-packages/shapely/geometry/point.py", line 48, in __init__
    self._set_coords(*args)
  File "/usr/local/lib/python3.8/dist-packages/shapely/geometry/point.py", line 137, in _set_coords
    self._geom, self._ndim = geos_point_from_py(tuple(args))
  File "/usr/local/lib/python3.8/dist-packages/shapely/geometry/point.py", line 214, in geos_point_from_py
    dx = c_double(coords[0])
TypeError: must be real number, not NoneType

I am able to ping the ipinfo.io from the same node on the cluster.

ping ipinfo.io
PING ipinfo.io (216.239.34.21) 56(84) bytes of data.
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=1 ttl=111 time=0.655 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=2 ttl=111 time=0.809 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=3 ttl=111 time=0.836 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=4 ttl=111 time=0.733 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=5 ttl=111 time=0.797 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=6 ttl=111 time=0.741 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=7 ttl=111 time=0.762 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=8 ttl=111 time=0.744 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=9 ttl=111 time=0.749 ms
^C
--- ipinfo.io ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8008ms
rtt min/avg/max/mdev = 0.655/0.758/0.836/0.055 ms

Any suggestions? Thanks!

Getting a minimal example working

I'm trying to get a "hello, world"-type report out. So far I have:

test.py

#!/usr/bin/env python3

from experiment_impact_tracker.compute_tracker import ImpactTracker
import time

tracker = ImpactTracker('logs')
tracker.launch_impact_monitor()

time.sleep(2)

This populates the logs dir with:

$ ll logs/impacttracker/
total 92
drwxr-xr-x 2 leon leon  4096 Apr  9 19:54 ./
drwxr-xr-x 3 leon leon  4096 Apr  9 18:31 ../
-rw-r--r-- 1 leon leon     0 Apr  9 19:54 data.json
-rw-r--r-- 1 leon leon  2673 Apr  9 18:47 impact_tracker_log.log
-rw-r--r-- 1 leon leon 77938 Apr  9 19:54 info.pkl

When I try to get a report, the following is thrown:

$ generate-carbon-impact-statement logs/ "DNK"
loading region bounding boxes for computing carbon emissions region, this may take a moment...
 454/454... rate=702.29 Hz, eta=0:00:00, total=0:00:00, wall=20:08 CET
Done!
/usr/local/bin/generate-carbon-impact-statement:37: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.
  pd.set_option('display.max_colwidth', -1)
/usr/local/lib/python3.6/dist-packages/experiment_impact_tracker/data_utils.py:26: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead
  return json_normalize(json_array, max_level=max_level), json_array
Traceback (most recent call last):
  File "/home/leon/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'timestamp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/generate-carbon-impact-statement", line 233, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/usr/local/bin/generate-carbon-impact-statement", line 177, in main
    extracted_info = gather_additional_info(info, log_dir)
  File "/usr/local/lib/python3.6/dist-packages/experiment_impact_tracker/utils.py", line 107, in gather_additional_info
    time_differences = df["timestamp"].diff()
  File "/home/leon/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/leon/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'timestamp'

Looking further, I think df should be populated from data.json in the DATAPATH by load_data_into_frame in data_utils.py, but I can't figure out where in the code this file gets written. It's 0-length in my case, so it doesn't have the timestamp key. Not sure if this is a bug or user error - how can I set up a minimal end-to-end example?

benchmark non-python programs

Hi, thanks for this great project! I'm wondering whether it's possible to track the usage of non-Python programs, e.g. C++ code, and if so, how?
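One plausible route, since the tracker monitors process ids from a Python process, is to wrap the external program in a small Python driver. The sketch below shows only the wrapping pattern; the tracker itself is omitted, and run_external is an illustrative name, not a library API.

```python
# Sketch: launch an external (e.g. compiled C++) program from a Python
# driver so that a tracker running in this driver could observe the child
# process alongside the driver itself.
import subprocess
import sys

def run_external(cmd):
    proc = subprocess.Popen(cmd)
    # A tracker launched in this driver could be pointed at proc.pid here,
    # if it supports monitoring arbitrary pids.
    return proc.wait()

# The Python interpreter stands in for a compiled binary in this demo.
exit_code = run_external([sys.executable, "-c", "print('external work done')"])
print(exit_code)  # -> 0
```

Whether the tracker attributes the child's CPU/GPU usage to the driver automatically, or needs the child pid passed explicitly, would depend on its process-discovery logic.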

i index variable used outside of its loop

in the file

https://github.com/Breakend/experiment-impact-tracker/blob/master/experiment_impact_tracker/cpu/intel.py

the function :

def get_rapl_power(pid_list, logger=None, **kwargs):

at line 370, shouldn't it be:

for i, p in enumerate(process_list):

instead of

for p in process_list:

As currently written, the i variable keeps the last value from the previous loop and, as a consequence, in the except block the process added to the zombies list will be the last process in process_list instead of the process which raised the exception.
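A minimal illustration of the reported behavior, with simplified stand-in names rather than the actual code in intel.py:

```python
# Without enumerate, `i` keeps the value it had at the end of an earlier
# loop, so the except handler blames the wrong process.
process_list = ["p0", "p1", "p2"]

# An earlier loop in the same function leaves i at the last index.
for i, _ in enumerate(process_list):
    pass  # i == 2 after this loop

zombies = []
for p in process_list:  # bug: should be `for i, p in enumerate(process_list):`
    try:
        if p == "p1":
            raise ProcessLookupError(p)
    except ProcessLookupError:
        zombies.append(process_list[i])  # stale i: appends "p2", not "p1"

print(zombies)  # -> ['p2']
```

With the suggested enumerate fix, i would track the current iteration and the except block would record the process that actually raised.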

ModuleNotFoundError: No module named 'experiment_impact_tracker.data_interface'

Hi all,

I am testing the library on a simple PyTorch tutorial. I was able to generate an "impacttracker" folder and data.json seems to contain the expected information. However, I am not able to use the DataInterface class:

>>> from experiment_impact_tracker.data_interface import DataInterface
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'experiment_impact_tracker.data_interface'

I use Anaconda environments. I have tried downgrading python to versions 3.7 and 3.8.
I have noticed my version of experiment_impact_tracker is 0.1.8 even though the version.py file specifies 0.1.9, but running "pip install experiment-impact-tracker --upgrade" doesn't fix it.

Here is my system info:
Operating System: Debian GNU/Linux 11 (bullseye)
Kernel: Linux 5.10.0-9-amd64
Architecture: x86-64

Thank you in advance for your help.

child experiment tracker

In multi-experiment code it would be useful to launch one tracker and store child experiments within it. There seems to be some support for this, but it's unclear how to use it - is this feature implemented?

Monitor thread errors out with IndexError

I am trying to test out a few different software packages and it seems that for some of them the monitor thread errors out during Intel RAPL calls. I have tested this on two different CPUs: 1) Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz and 2) Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz.

experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR - Encountered exception within power monitor thread!
ERROR:Encountered exception within power monitor thread!                                                                                                                                      
experiment_impact_tracker.compute_tracker.ImpactTracker - ERROR -   File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 161, in launch_power_monitor
    _sample_and_log_power(log_dir, initial_info, logger=logger)                                                                                                                               
  File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 112, in _sample_and_log_power
    log_dir=log_dir,                                                                                                                                                                          
  File "../../experiment-impact-tracker/experiment_impact_tracker/cpu/intel.py", line 88, in get_intel_power
    return get_rapl_power(pid_list, logger, **kwargs)                                                                                                                                         
  File "../../experiment-impact-tracker/experiment_impact_tracker/cpu/intel.py", line 435, in get_rapl_power
    st2, st22, system_wide_pt2, pt2 = infos2[i]                                                                                                                                               
                                                                                               
ERROR:  File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 161, in launch_power_monitor                                                                
    _sample_and_log_power(log_dir, initial_info, logger=logger)
  File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 112, in _sample_and_log_power                                                                     
    log_dir=log_dir,                                                                           
  File "../../experiment-impact-tracker/experiment_impact_tracker/cpu/intel.py", line 88, in get_intel_power
    return get_rapl_power(pid_list, logger, **kwargs)                                                                                                                                         
  File "../../experiment-impact-tracker/experiment_impact_tracker/cpu/intel.py", line 435, in get_rapl_power                                                             
    st2, st22, system_wide_pt2, pt2 = infos2[i]   
    
Process Process-1:                      
Traceback (most recent call last):          
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap  
    self.run()                                                                                 
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run                        
    self._target(*self._args, **self._kwargs)                                                                                                                                                 
  File "../../experiment-impact-tracker/experiment_impact_tracker/utils.py", line 68, in process_func
    raise e                                                                                                                                                                                   
  File "../../experiment-impact-tracker/experiment_impact_tracker/utils.py", line 62, in process_func                                                                                         
    ret = func(q, *args, **kwargs)                                                                                                                                                            
  File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 161, in launch_power_monitor
    _sample_and_log_power(log_dir, initial_info, logger=logger)                                                                                                                               
  File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 112, in _sample_and_log_power
    log_dir=log_dir,                                                                                                                                                                          
  File "../../experiment-impact-tracker/experiment_impact_tracker/cpu/intel.py", line 88, in get_intel_power
    return get_rapl_power(pid_list, logger, **kwargs)                                                                                                                                         
  File "../../experiment-impact-tracker/experiment_impact_tracker/cpu/intel.py", line 435, in get_rapl_power
    st2, st22, system_wide_pt2, pt2 = infos2[i]                                                                                                                                               
IndexError: list index out of range  

Any suggestions? Thanks!
