smarr / rebench
Execute and document benchmarks reproducibly.
License: MIT License
I think the support for getting multiple results for the same benchmark from the same run is currently missing or broken.
Configurations that use template parameters in their extra arguments are not recognized as identical to configurations in which the 'template parameters' have been filled in with hardcoded values.
The problem is probably that we add the extra arguments by format-variable expansion:
ReBench/rebench/model/run_id.py
Line 196 in e5ebe70
Context-oriented programming is only used sparingly.
While it is kind of nice, it is not really necessary and makes things more complex than they need to be.
To avoid systematic bias that might be caused by operating system caches or hardware/memory properties, the execution order of benchmark runs should be randomized.
This has the additional benefit of producing data points for more benchmarks early on, making it possible to see trends in the results sooner for long-running benchmark sets.
There needs to be an option to suppress the randomization, of course.
Also, we should have an estimate of what that means in terms of memory usage when many microbenchmarks are executed (we need to keep the data in memory to calculate confidence intervals, etc.).
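A minimal sketch of such a scheduler, assuming a hypothetical list of run identifiers and a no_shuffling option (the names are illustrative, not ReBench's actual API):

import random

def plan_execution_order(runs, no_shuffling=False, seed=None):
    # Shuffling the run order avoids systematic bias from OS caches and
    # memory layout; a fixed seed reproduces a specific order for debugging.
    order = list(runs)
    if not no_shuffling:
        random.Random(seed).shuffle(order)
    return order

# Interleaves invocations of different benchmarks instead of completing
# one benchmark before starting the next:
runs = ["Bench%d/invocation-%d" % (b, i) for b in (1, 2) for i in (1, 2, 3)]
print(plan_execution_order(runs, seed=42))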
If we have a complex configuration, it can be that benchmarks with identical configurations are listed multiple times to produce separate data sets.
Try to avoid executing them more than necessary.
How do we determine that runs are identical? Either based on the configuration, which is tricky because we can use string expansion à la %(cores)s, or based on the resulting command line, which might also be tricky if there are subtle whitespace differences.
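A sketch of deduplication keyed on the resulting command line, with whitespace normalized to sidestep the subtle-differences problem (the helper names are hypothetical):

import shlex

def normalized_cmdline(cmdline):
    # Tokenize like a shell and re-join, so runs that differ only in
    # whitespace map to the same key.
    return " ".join(shlex.split(cmdline))

def unique_cmdlines(cmdlines):
    seen = {}
    for cmd in cmdlines:
        seen.setdefault(normalized_cmdline(cmd), cmd)
    return list(seen.values())

print(unique_cmdlines(["./vm  --cores 4 Bench1", "./vm --cores 4  Bench1"]))
# -> one run instead of two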
Typical scenarios:
One might also want to rerun, or run specific experiments with specific parameters, but this is outside the scope of this feature request.
So, we want something like:
rebench test.conf TestExperiment vm:TestRunner1
rebench -r test.conf TestExperiment vm:TestRunner1 # -r for rerun (or perhaps -c, for clear)
rebench -r test.conf TestExperiment s:TestSuite1
rebench -r test.conf TestExperiment vm:TestRunner1 s:TestSuite1
rebench -r test.conf TestExperiment s:TestSuite1:Bench1
Add a performance reader for JMH.
Example output:
# Run progress: 0.00% complete, ETA 01:00:00
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: benchmarks.DynamicProxy.directAdd
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Fork: 1 of 10
# Warmup Iteration 1: 847885.303 ops/ms
# Warmup Iteration 2: 869209.997 ops/ms
# Warmup Iteration 3: 787127.216 ops/ms
# Warmup Iteration 4: 849333.002 ops/ms
# Warmup Iteration 5: 862511.213 ops/ms
# Warmup Iteration 6: 786574.891 ops/ms
# Warmup Iteration 7: 867692.766 ops/ms
# Warmup Iteration 8: 791901.852 ops/ms
# Warmup Iteration 9: 868440.246 ops/ms
# Warmup Iteration 10: 873144.727 ops/ms
# Warmup Iteration 11: 858841.746 ops/ms
# Warmup Iteration 12: 864258.483 ops/ms
# Warmup Iteration 13: 867792.566 ops/ms
# Warmup Iteration 14: 873802.641 ops/ms
# Warmup Iteration 15: 789308.386 ops/ms
# Warmup Iteration 16: 872348.119 ops/ms
# Warmup Iteration 17: 876049.520 ops/ms
# Warmup Iteration 18: 855590.678 ops/ms
# Warmup Iteration 19: 790754.207 ops/ms
# Warmup Iteration 20: 844763.982 ops/ms
Iteration 1: 851585.492 ops/ms
Iteration 2: 855210.272 ops/ms
Iteration 3: 863139.120 ops/ms
Iteration 4: 854572.548 ops/ms
Iteration 5: 848365.018 ops/ms
Iteration 6: 868452.069 ops/ms
Iteration 7: 874102.630 ops/ms
Iteration 8: 871221.945 ops/ms
Iteration 9: 872087.960 ops/ms
Iteration 10: 871954.737 ops/ms
Iteration 11: 866641.653 ops/ms
Iteration 12: 871745.541 ops/ms
Iteration 13: 873303.464 ops/ms
Iteration 14: 871289.619 ops/ms
Iteration 15: 871734.355 ops/ms
Iteration 16: 879997.366 ops/ms
Iteration 17: 871969.580 ops/ms
Iteration 18: 804149.821 ops/ms
Iteration 19: 874024.426 ops/ms
Iteration 20: 875282.521 ops/ms
Result: 864541.507 ±(99.9%) 14445.542 ops/ms [Average]
Statistics: (min, avg, max) = (804149.821, 864541.507, 879997.366), stdev = 16635.508
Confidence interval (99.9%): [850095.965, 878987.049]
# Run progress: 1.11% complete, ETA 01:11:46
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: benchmarks.DynamicProxy.directAdd
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Fork: 2 of 10
# Warmup Iteration 1: 857739.490 ops/ms
# Warmup Iteration 2: 869167.053 ops/ms
# Warmup Iteration 3: 866989.763 ops/ms
# Warmup Iteration 4: 867616.777 ops/ms
# Warmup Iteration 5: 855941.061 ops/ms
# Warmup Iteration 6: 863636.436 ops/ms
# Warmup Iteration 7: 869266.711 ops/ms
# Warmup Iteration 8: 864455.908 ops/ms
# Warmup Iteration 9: 865891.557 ops/ms
# Warmup Iteration 10: 864545.288 ops/ms
# Warmup Iteration 11: 785449.100 ops/ms
# Warmup Iteration 12: 871062.463 ops/ms
# Warmup Iteration 13: 865995.950 ops/ms
# Warmup Iteration 14: 869501.998 ops/ms
# Warmup Iteration 15: 880105.688 ops/ms
# Warmup Iteration 16: 870951.292 ops/ms
# Warmup Iteration 17: 869497.593 ops/ms
# Warmup Iteration 18: 789584.957 ops/ms
# Warmup Iteration 19: 865307.329 ops/ms
# Warmup Iteration 20: 864320.819 ops/ms
Iteration 1: 846892.297 ops/ms
Iteration 2: 858812.483 ops/ms
Iteration 3: 779040.228 ops/ms
Iteration 4: 866954.433 ops/ms
Iteration 5: 874218.456 ops/ms
Iteration 6: 871035.856 ops/ms
Iteration 7: 878649.265 ops/ms
Iteration 8: 791281.176 ops/ms
Iteration 9: 863840.816 ops/ms
Iteration 10: 870654.903 ops/ms
Iteration 11: 858951.775 ops/ms
Iteration 12: 781786.693 ops/ms
Iteration 13: 857076.130 ops/ms
Iteration 14: 869513.038 ops/ms
Iteration 15: 872952.031 ops/ms
Iteration 16: 871831.447 ops/ms
Iteration 17: 787480.350 ops/ms
Iteration 18: 870333.741 ops/ms
Iteration 19: 878597.978 ops/ms
Iteration 20: 868287.689 ops/ms
Result: 850909.539 ±(99.9%) 30180.380 ops/ms [Average]
Statistics: (min, avg, max) = (779040.228, 850909.539, 878649.265), stdev = 34755.771
Confidence interval (99.9%): [820729.159, 881089.919]
# Run progress: 10.00% complete, ETA 01:05:16
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: benchmarks.DynamicProxy.directAdd
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Fork: 10 of 10
# Warmup Iteration 1: 850773.837 ops/ms
# Warmup Iteration 2: 859996.736 ops/ms
# Warmup Iteration 3: 842337.855 ops/ms
# Warmup Iteration 4: 834024.179 ops/ms
# Warmup Iteration 5: 848325.927 ops/ms
# Warmup Iteration 6: 851807.203 ops/ms
# Warmup Iteration 7: 865749.630 ops/ms
# Warmup Iteration 8: 841759.249 ops/ms
# Warmup Iteration 9: 843872.638 ops/ms
# Warmup Iteration 10: 852743.625 ops/ms
# Warmup Iteration 11: 870366.746 ops/ms
# Warmup Iteration 12: 860670.067 ops/ms
# Warmup Iteration 13: 855269.930 ops/ms
# Warmup Iteration 14: 860215.809 ops/ms
# Warmup Iteration 15: 862334.297 ops/ms
# Warmup Iteration 16: 861751.244 ops/ms
# Warmup Iteration 17: 855697.310 ops/ms
# Warmup Iteration 18: 773933.681 ops/ms
# Warmup Iteration 19: 855363.310 ops/ms
# Warmup Iteration 20: 860512.882 ops/ms
Iteration 1: 784181.953 ops/ms
Iteration 2: 861926.105 ops/ms
Iteration 3: 854354.042 ops/ms
Iteration 4: 863663.976 ops/ms
Iteration 5: 869966.533 ops/ms
Iteration 6: 821378.328 ops/ms
Iteration 7: 866908.171 ops/ms
Iteration 8: 787129.306 ops/ms
Iteration 9: 780665.906 ops/ms
Iteration 10: 783710.954 ops/ms
Iteration 11: 869766.607 ops/ms
Iteration 12: 874771.924 ops/ms
Iteration 13: 874007.935 ops/ms
Iteration 14: 786184.900 ops/ms
Iteration 15: 867349.710 ops/ms
Iteration 16: 818710.228 ops/ms
Iteration 17: 786727.556 ops/ms
Iteration 18: 853130.618 ops/ms
Iteration 19: 869214.341 ops/ms
Iteration 20: 867435.571 ops/ms
Result: 837059.233 ±(99.9%) 33100.880 ops/ms [Average]
Statistics: (min, avg, max) = (780665.906, 837059.233, 874771.924), stdev = 38119.022
Confidence interval (99.9%): [803958.353, 870160.113]
# Run progress: 11.11% complete, ETA 01:04:28
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: benchmarks.DynamicProxy.proxiedAdd
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Fork: 1 of 10
# Warmup Iteration 1: 99681.741 ops/ms
# Warmup Iteration 2: 111011.439 ops/ms
# Warmup Iteration 3: 83221.079 ops/ms
# Warmup Iteration 4: 86550.458 ops/ms
# Warmup Iteration 5: 87100.410 ops/ms
# Warmup Iteration 6: 86719.841 ops/ms
# Warmup Iteration 7: 87628.795 ops/ms
# Warmup Iteration 8: 86472.207 ops/ms
# Warmup Iteration 9: 86111.395 ops/ms
# Warmup Iteration 10: 86871.991 ops/ms
# Warmup Iteration 11: 87797.001 ops/ms
# Warmup Iteration 12: 86590.364 ops/ms
# Warmup Iteration 13: 87005.565 ops/ms
# Warmup Iteration 14: 88105.287 ops/ms
# Warmup Iteration 15: 88517.748 ops/ms
# Warmup Iteration 16: 86863.272 ops/ms
# Warmup Iteration 17: 87413.754 ops/ms
# Warmup Iteration 18: 85960.142 ops/ms
# Warmup Iteration 19: 87216.054 ops/ms
# Warmup Iteration 20: 86368.302 ops/ms
Iteration 1: 85897.591 ops/ms
Iteration 2: 85818.520 ops/ms
Iteration 3: 86150.077 ops/ms
Iteration 4: 86313.090 ops/ms
Iteration 5: 86278.108 ops/ms
Iteration 6: 86504.070 ops/ms
Iteration 7: 85584.778 ops/ms
Iteration 8: 86987.707 ops/ms
Iteration 9: 85158.246 ops/ms
Iteration 10: 87069.476 ops/ms
Iteration 11: 88860.713 ops/ms
Iteration 12: 87230.651 ops/ms
Iteration 13: 88672.239 ops/ms
Iteration 14: 87435.816 ops/ms
Iteration 15: 83644.226 ops/ms
Iteration 16: 86858.133 ops/ms
Iteration 17: 86321.756 ops/ms
Iteration 18: 87300.606 ops/ms
Iteration 19: 85362.787 ops/ms
Iteration 20: 86763.998 ops/ms
Result: 86510.629 ±(99.9%) 1020.717 ops/ms [Average]
Statistics: (min, avg, max) = (83644.226, 86510.629, 88860.713), stdev = 1175.459
Confidence interval (99.9%): [85489.913, 87531.346]
We need a parser that extracts the measurement iterations from this output.
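A minimal sketch of such a parser; the regular expressions are written against the example above, and the tuple format is illustrative rather than ReBench's actual data-point model:

import re

BENCH_RE = re.compile(r"^# Benchmark:\s+(\S+)$")
# Measurement lines look like "Iteration 3: 863139.120 ops/ms";
# warmup lines carry a leading "# Warmup " prefix and are skipped here.
MEASURE_RE = re.compile(r"^Iteration\s+(\d+):\s+([0-9.]+)\s+(\S+)$")

def parse_jmh_output(output):
    benchmark = None
    points = []
    for line in output.splitlines():
        m = BENCH_RE.match(line)
        if m:
            benchmark = m.group(1)
            continue
        m = MEASURE_RE.match(line)
        if m:
            points.append((benchmark, int(m.group(1)),
                           float(m.group(2)), m.group(3)))
    return points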
When restarting ReBench based on an existing data file, it would be useful to first sort out all the runs that are already completed, to get a better estimate of the remaining time.
With large benchmark suites it can take hours to go through all runs, and it would be nice to get early feedback and to allow the results to be refined with more measurements later on.
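A sketch of that filtering step, assuming each run knows how many invocations the existing data file already holds (the attribute names are hypothetical):

from collections import namedtuple

Run = namedtuple("Run", "name completed_invocations")

def split_completed(runs, invocations_wanted):
    # Separate runs that already have enough data points from those that
    # still need executing, so the time estimate only counts the latter.
    done = [r for r in runs if r.completed_invocations >= invocations_wanted]
    todo = [r for r in runs if r.completed_invocations < invocations_wanted]
    return done, todo

done, todo = split_completed([Run("Bench1", 10), Run("Bench2", 3)], 10)
print([r.name for r in todo])  # -> ['Bench2']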
I am not using it anyway, because it makes reasoning about comparability of ratios hard for me.
And then, there seems to be an argument against it from a theoretical perspective: http://blog.regehr.org/archives/1024
I think it relies on the wrong number of total runs; we have already filtered out all the ones that are done.
Generally, the warning is confusing, and the right way to fix it is to solve issue #45.
ReBench should set a return code for the process when it ends but wasn't able to collect all desired data.
This is to avoid the need for wrapper scripts.
This is especially useful for round-robin or random execution.
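A sketch of what that could look like at the end of the main function, with a hypothetical count of successfully completed runs:

import sys

def finish(total_runs, completed_runs):
    # A non-zero exit status lets CI systems and shell scripts detect,
    # without wrapper scripts, that not all desired data was collected.
    sys.exit(0 if completed_runs == total_runs else 1)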
With this distinction, we can easily unify the benchmark names in post-processing.
We need to think about the codespeed reporting name as well; do we still need it?
There was a problem on serentity with using the latest version: a missing RunId module, I think.
The version that is currently deployed, however, also has issues with failing runs.
Some return value does not have enough components to be decomposed properly.
Sometimes we get a really unstable VM, or a VM that doesn't shut down properly after running the benchmarks. To be able to obtain at least some of the results, it would be useful to have a command-line switch to override the error checking.
Necessary adaptations include:
--- a/rebench/executor.py
+++ b/rebench/executor.py
@@ -151,15 +154,15 @@ class Executor:
stderr=subprocess.STDOUT,
shell=True,
timeout=run_id.bench_cfg.suite.max_runtime)
- if return_code != 0:
- run_id.indicate_failed_execution()
- run_id.report_run_failed(cmdline, return_code, output)
- if return_code == 126:
- logging.error(("Could not execute %s. A likely cause is that "
- "the file is not marked as executable.")
- % run_id.bench_cfg.vm.name)
- else:
- self._eval_output(output, run_id, gauge_adapter, cmdline)
+ #if return_code != 0:
+ # run_id.indicate_failed_execution()
+ # run_id.report_run_failed(cmdline, return_code, output)
+ # if return_code == 126:
+ # logging.error(("Could not execute %s. A likely cause is that "
+ # "the file is not marked as executable.")
+ # % run_id.bench_cfg.vm.name)
+ #else:
+ self._eval_output(output, run_id, gauge_adapter, cmdline)
return self._check_termination_condition(run_id, termination_check)
and
--- a/rebench/interop/rebench_log_adapter.py
+++ b/rebench/interop/rebench_log_adapter.py
@@ -47,9 +47,9 @@ class RebenchLogAdapter(GaugeAdapter):
current = DataPoint(run_id)
for line in data.split("\n"):
- if self.check_for_error(line):
- raise ResultsIndicatedAsInvalid(
- "Output of bench program indicated error.")
+ #if self.check_for_error(line):
+ # raise ResultsIndicatedAsInvalid(
+ # "Output of bench program indicated error.")
m = self.re_log_line.match(line)
if m:
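Instead of commenting the checks out, a command-line switch could guard them; a self-contained sketch, where include_faulty stands in for a hypothetical --faulty option:

import logging

def handle_run_result(return_code, output, eval_output, report_failed,
                      include_faulty=False):
    # With --faulty we still try to salvage whatever data points the
    # unstable VM managed to produce before failing.
    if return_code != 0 and not include_faulty:
        report_failed(return_code, output)
        if return_code == 126:
            logging.error("Could not execute the VM. A likely cause is "
                          "that the file is not marked as executable.")
    else:
        eval_output(output)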
This is required for use in CI environments.
Useful notifications would be, for example, on completion.
Remove the graph generation that's currently in ReBench and replace it with an R implementation.
See whether it's possible and efficient to have incremental graph generation.
How can we express the graphs' dependencies on benchmarks properly in ReBench?
Avoid having to keep them all in one file, also to make it easier to drop in new ones.
Most output is currently only displayed with the -d switch.
Make output such as the information that enables debugging of failed runs conditional on a run actually failing.
Display general progress output without requiring the -d switch.
Also, make the final reporting output human-readable; we already have machine-readable files.
Needed for debugging and problem analysis.
We need to be able to express in the config whether parallel execution is allowed.
Perhaps at least at the VM level.
Current use case: Graal+Truffle are highly parallel and use all cores, but RPython is strictly single-core.
Also, the interpreter versions probably do not interfere with each other.
We need to be able to configure some maximum degree of parallelism.
I am not entirely sure that parallel execution is going to be interference-free, so we definitely do not want to overload the machine.
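A sketch of a bounded scheduler honoring a per-VM parallelizability flag (the Run fields and the two-phase split are hypothetical, not ReBench's model):

from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor
import subprocess

Run = namedtuple("Run", "cmdline vm_allows_parallel")

def execute_run(run):
    subprocess.call(run.cmdline, shell=True)

def execute_all(runs, max_parallelism=4):
    # Parallel-safe runs share a bounded pool to avoid overloading the
    # machine; runs on single-core VMs (e.g., RPython) execute one by one.
    parallel = [r for r in runs if r.vm_allows_parallel]
    serial = [r for r in runs if not r.vm_allows_parallel]
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        list(pool.map(execute_run, parallel))
    for run in serial:
        execute_run(run)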
I think it consumes JSON for the data files, so that part should be easy.
https://github.com/scalameter/scalameter
License seems to be a 3-clause BSD version, so having the code side by side in the repo should be fine, I guess...
For example, warn when finding old ulimit entries, which are not supported anymore.
The profiling support is probably broken, and hasn't been used in years.
Remove it to simplify the code.
Currently, the support for Caliper's output is not yet adapted to the new ReBench implementation.
Relevant properties could be minimum runtime and error (standard deviation or confidence interval size).
It is demonstrated by configurator_test.py test_number_of_experiments_testconf.
Add incremental result reporting to codespeed.
This allows us to see results earlier, and also when the benchmark run was aborted for some reason.
ulimit does not work for tile-monitor or for wrapper scripts that do not themselves consume CPU time.
It needs to be replaced by something that times out with respect to wall-clock time.
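A sketch of a wall-clock-based replacement, using the timeout support of Python's subprocess module (max_runtime would come from the suite configuration):

import subprocess

def run_with_wallclock_limit(cmdline, max_runtime):
    # Unlike ulimit's CPU-time limit, this fires after max_runtime
    # wall-clock seconds even if the process sits idle.
    try:
        return subprocess.run(cmdline, shell=True,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              timeout=max_runtime)
    except subprocess.TimeoutExpired:
        return None  # treat as a timed-out run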
Currently, the warning is also shown when another config is executed and the 'unavailable' one merely appears in the same config file.
For continuous performance tracking during development, it is important to automatically account for changes in the warmup time benchmarks take, in order to keep track of the achievable peak performance. At the same time, it is still important to minimize the overall benchmark runtime to be able to experiment properly.
[Note: this is targeted towards micro- and macrobenchmarks with reasonably small runtimes to be practical.]
While Kalibera and Jones (2013, http://kar.kent.ac.uk/33611/) advocate a convincing manual method to determine whether a real steady state is reached and whether the measurements from the same VM invocation are independent, I need something more practical: something that is completely automated, robust, and parameterizable.
I think I am going to take a slightly parameterized version of Georges et al.'s method (2007, http://buytaert.net/files/oopsla07-georges.pdf), with the following parameters:
CoV, min_i, k, max_i, max_runtime
CoV: standard deviation over all measurements (at least the minimum number of iterations) divided by their mean (sd(m)/mean(m))
The reported data should include whether max_i was reached, the number of iterations i before reaching the 'reasonably-steady-state', and the k measurements (after i iterations).
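A sketch of the detection loop with these parameters (pure statistics, no benchmark plumbing; the 0.02 threshold is just an example, and max_runtime is left out):

import statistics

def cov(measurements):
    # Coefficient of variation: sd(m) / mean(m)
    return statistics.stdev(measurements) / statistics.mean(measurements)

def run_until_steady(measure, min_i=5, k=10, max_i=100, threshold=0.02):
    # Invoke measure() until the CoV over the last k measurements drops
    # below the threshold, then report those k data points; give up once
    # max_i is reached (max_runtime would additionally bound wall-clock time).
    m = []
    for i in range(max_i):
        m.append(measure())
        if len(m) >= max(min_i, k) and cov(m[-k:]) < threshold:
            return m[-k:]
    return m  # 'reasonably-steady-state' not reached within max_i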
Currently, a failing benchmark (Richards) is recognized properly, but ReBench fails to process the resulting exception and terminates.
Output:
Starting Richards benchmark ...
Results are incorrect
Traceback (most recent call last):
File "/usr/bin/rebench", line 9, in <module>
load_entry_point('ReBench==0.2.2', 'console_scripts', 'rebench')()
File "/home/smarr/Projects/ReBench/rebench/rebench.py", line 161, in main_func
return ReBench().run()
File "/home/smarr/Projects/ReBench/rebench/rebench.py", line 141, in run
self.execute_experiment()
File "/home/smarr/Projects/ReBench/rebench/rebench.py", line 156, in execute_experiment
executor.execute()
File "/home/smarr/Projects/ReBench/rebench/executor.py", line 177, in execute
self._scheduler.execute()
File "/home/smarr/Projects/ReBench/rebench/executor.py", line 70, in execute
completed = self._executor.execute_run(run)
File "/home/smarr/Projects/ReBench/rebench/executor.py", line 114, in execute_run
termination_check)
File "/home/smarr/Projects/ReBench/rebench/executor.py", line 148, in _generate_data_point
self._eval_output(output, run_id, perf_reader, cmdline)
File "/home/smarr/Projects/ReBench/rebench/executor.py", line 154, in _eval_output
data_points = perf_reader.parse_data(output, run_id)
File "/home/smarr/Projects/ReBench/rebench/performance.py", line 92, in parse_data
raise RuntimeError("Output of bench program indicated error.")
RuntimeError: Output of bench program indicated error.
It would be nice to pass environment variables on to the binary/VM.
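A sketch using subprocess; the per-VM env mapping is a hypothetical config entry:

import os
import subprocess

def execute_with_env(cmdline, extra_env):
    # Extend, rather than replace, the inherited environment so that
    # PATH and similar variables keep working.
    env = dict(os.environ)
    env.update(extra_env)
    return subprocess.call(cmdline, shell=True, env=env)

# e.g.: execute_with_env("./vm Bench1", {"JAVA_OPTS": "-Xmx2g"})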
Currently, the exception terminates ReBench; it shouldn't...
It looks like this happens when "all" is used and there are multiple experiments.
Currently, results are reported in microseconds, which is not useful.
The time resolution isn't good enough anyway, and measurement errors are well beyond microseconds, too.
We need to check whether the nice tool can be executed, i.e., whether sufficient permissions are available.
ReBench should report this and automatically suppress its usage to avoid failing runs. Another option would be to abort directly.
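A sketch of the availability check: try to raise the priority of a trivial command and fall back when that is not permitted (the fallback policy itself is the open question above):

import logging
import subprocess

def can_use_nice():
    try:
        proc = subprocess.run(["nice", "-n", "-20", "true"],
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.PIPE)
        # GNU nice may still run the command when it cannot raise the
        # priority, but it prints a diagnostic; treat that as failure.
        return proc.returncode == 0 and not proc.stderr
    except OSError:
        return False  # no nice binary at all

if not can_use_nice():
    logging.warning("Cannot use nice; executing benchmarks without it.")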
Having somewhere a "%(benchmark)" without the s at the end (i.e., not "%(benchmark)s") leads to a cryptic ValueError.
Handle the error and print a nice error message, perhaps even pointing out where the conversion character is missing.
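For reference, the underlying behavior, and a sketch of catching it to produce a friendlier message:

# "%(benchmark)" lacks the conversion character, so %-expansion fails
# with a rather unhelpful error:
template = "./vm %(benchmark)"
try:
    template % {"benchmark": "Richards"}
except ValueError as e:
    print(e)  # -> incomplete format
    print("Invalid placeholder in '%s'; did you mean '%%(benchmark)s'?"
          % template)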