
rebench's Issues

Remove use of context.py

The context-oriented programming style is only used sparingly.
While it is kind of nice, it is not really needed and adds unnecessary complexity.

Randomize execution order of benchmarks

To avoid systematic bias that might be caused by operating system caches or hardware/memory properties, the execution order of benchmark runs should be randomized.

This has the additional benefit of producing data points for more benchmarks early on, making it possible to see result trends sooner for long-running benchmark sets.

There needs to be an option to suppress the randomization, of course.
Also, we should have an estimate of what that means in terms of memory usage when many microbenchmarks are executed (we need to keep the data in memory to calculate confidence intervals, etc.).
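A minimal sketch of what the randomization could look like, assuming a list of run objects and a flag to suppress the shuffling (the names are illustrative, not ReBench's actual API):

import random

def order_runs(runs, randomize=True, seed=None):
    """Return the benchmark runs in a randomized order.

    Passing randomize=False suppresses the shuffling; a fixed seed
    makes the shuffled order reproducible.
    """
    if not randomize:
        return list(runs)
    rng = random.Random(seed)
    shuffled = list(runs)
    rng.shuffle(shuffled)
    return shuffled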

Properly handle threading of the IRC support

  • we need to synchronize the sending of reports with the thread
  • we need to properly end the thread when ReBench is done
  • it would also be good not to print the IRC client's debug output at the normal debugging level
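A common pattern for the first two points is a worker thread fed by a queue, with a sentinel value for a clean shutdown; a minimal sketch (the actual IRC send call is a placeholder):

import queue
import threading

_SHUTDOWN = object()  # sentinel that tells the worker thread to exit

class IrcReporter:
    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def report(self, message):
        # thread-safe: hands the message over to the worker thread
        self._queue.put(message)

    def close(self):
        # signal the worker to finish and wait for it
        self._queue.put(_SHUTDOWN)
        self._thread.join()

    def _run(self):
        while True:
            msg = self._queue.get()
            if msg is _SHUTDOWN:
                return
            self._send_to_irc(msg)

    def _send_to_irc(self, msg):
        pass  # placeholder: the actual IRC client call goes here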

Avoid duplicated execution of identical run configurations

If we have a complex configuration, it can happen that benchmarks with identical configurations are listed multiple times, producing separate data sets.

Try to avoid executing them more than necessary.

How do we determine that runs are identical? Either based on the configuration, which is tricky because we can use string expansion à la %(cores)s, or based on the resulting command line, which might also be tricky if there are subtle whitespace differences.
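One possible approach, sketched below, is to compare runs by their fully expanded command line after normalizing whitespace (illustrative code, not ReBench's actual data model):

def normalized_cmdline(cmdline):
    # collapse runs of whitespace so cosmetic differences do not matter
    return " ".join(cmdline.split())

def deduplicate_runs(runs, cmdline_of):
    """Keep only the first run for each normalized command line.

    `cmdline_of` maps a run to its fully expanded command line,
    i.e., after substitutions such as %(cores)s were applied.
    """
    seen = set()
    unique = []
    for run in runs:
        key = normalized_cmdline(cmdline_of(run))
        if key not in seen:
            seen.add(key)
            unique.append(run)
    return unique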

Support rerunning of selected experiments

Typical scenarios:

  • a benchmark changed
  • a VM changed

One might also want to rerun, or run specific experiments with specific parameters, but this is outside the scope of this feature request.

So, we want something like:

rebench test.conf TestExperiment vm:TestRunner1
rebench -r test.conf TestExperiment vm:TestRunner1 # -r for rerun (or perhaps -c, for clear)
rebench -r test.conf TestExperiment s:TestSuite1
rebench -r test.conf TestExperiment vm:TestRunner1 s:TestSuite1
rebench -r test.conf TestExperiment s:TestSuite1:Bench1

Add support for JMH

Add a performance reader for JMH.

Example output:

# Run progress: 0.00% complete, ETA 01:00:00
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: benchmarks.DynamicProxy.directAdd
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Fork: 1 of 10
# Warmup Iteration   1: 847885.303 ops/ms
# Warmup Iteration   2: 869209.997 ops/ms
# Warmup Iteration   3: 787127.216 ops/ms
# Warmup Iteration   4: 849333.002 ops/ms
# Warmup Iteration   5: 862511.213 ops/ms
# Warmup Iteration   6: 786574.891 ops/ms
# Warmup Iteration   7: 867692.766 ops/ms
# Warmup Iteration   8: 791901.852 ops/ms
# Warmup Iteration   9: 868440.246 ops/ms
# Warmup Iteration  10: 873144.727 ops/ms
# Warmup Iteration  11: 858841.746 ops/ms
# Warmup Iteration  12: 864258.483 ops/ms
# Warmup Iteration  13: 867792.566 ops/ms
# Warmup Iteration  14: 873802.641 ops/ms
# Warmup Iteration  15: 789308.386 ops/ms
# Warmup Iteration  16: 872348.119 ops/ms
# Warmup Iteration  17: 876049.520 ops/ms
# Warmup Iteration  18: 855590.678 ops/ms
# Warmup Iteration  19: 790754.207 ops/ms
# Warmup Iteration  20: 844763.982 ops/ms
Iteration   1: 851585.492 ops/ms
Iteration   2: 855210.272 ops/ms
Iteration   3: 863139.120 ops/ms
Iteration   4: 854572.548 ops/ms
Iteration   5: 848365.018 ops/ms
Iteration   6: 868452.069 ops/ms
Iteration   7: 874102.630 ops/ms
Iteration   8: 871221.945 ops/ms
Iteration   9: 872087.960 ops/ms
Iteration  10: 871954.737 ops/ms
Iteration  11: 866641.653 ops/ms
Iteration  12: 871745.541 ops/ms
Iteration  13: 873303.464 ops/ms
Iteration  14: 871289.619 ops/ms
Iteration  15: 871734.355 ops/ms
Iteration  16: 879997.366 ops/ms
Iteration  17: 871969.580 ops/ms
Iteration  18: 804149.821 ops/ms
Iteration  19: 874024.426 ops/ms
Iteration  20: 875282.521 ops/ms

Result: 864541.507 ±(99.9%) 14445.542 ops/ms [Average]
  Statistics: (min, avg, max) = (804149.821, 864541.507, 879997.366), stdev = 16635.508
  Confidence interval (99.9%): [850095.965, 878987.049]


# Run progress: 1.11% complete, ETA 01:11:46
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: benchmarks.DynamicProxy.directAdd
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Fork: 2 of 10
# Warmup Iteration   1: 857739.490 ops/ms
# Warmup Iteration   2: 869167.053 ops/ms
# Warmup Iteration   3: 866989.763 ops/ms
# Warmup Iteration   4: 867616.777 ops/ms
# Warmup Iteration   5: 855941.061 ops/ms
# Warmup Iteration   6: 863636.436 ops/ms
# Warmup Iteration   7: 869266.711 ops/ms
# Warmup Iteration   8: 864455.908 ops/ms
# Warmup Iteration   9: 865891.557 ops/ms
# Warmup Iteration  10: 864545.288 ops/ms
# Warmup Iteration  11: 785449.100 ops/ms
# Warmup Iteration  12: 871062.463 ops/ms
# Warmup Iteration  13: 865995.950 ops/ms
# Warmup Iteration  14: 869501.998 ops/ms
# Warmup Iteration  15: 880105.688 ops/ms
# Warmup Iteration  16: 870951.292 ops/ms
# Warmup Iteration  17: 869497.593 ops/ms
# Warmup Iteration  18: 789584.957 ops/ms
# Warmup Iteration  19: 865307.329 ops/ms
# Warmup Iteration  20: 864320.819 ops/ms
Iteration   1: 846892.297 ops/ms
Iteration   2: 858812.483 ops/ms
Iteration   3: 779040.228 ops/ms
Iteration   4: 866954.433 ops/ms
Iteration   5: 874218.456 ops/ms
Iteration   6: 871035.856 ops/ms
Iteration   7: 878649.265 ops/ms
Iteration   8: 791281.176 ops/ms
Iteration   9: 863840.816 ops/ms
Iteration  10: 870654.903 ops/ms
Iteration  11: 858951.775 ops/ms
Iteration  12: 781786.693 ops/ms
Iteration  13: 857076.130 ops/ms
Iteration  14: 869513.038 ops/ms
Iteration  15: 872952.031 ops/ms
Iteration  16: 871831.447 ops/ms
Iteration  17: 787480.350 ops/ms
Iteration  18: 870333.741 ops/ms
Iteration  19: 878597.978 ops/ms
Iteration  20: 868287.689 ops/ms

Result: 850909.539 ±(99.9%) 30180.380 ops/ms [Average]
  Statistics: (min, avg, max) = (779040.228, 850909.539, 878649.265), stdev = 34755.771
  Confidence interval (99.9%): [820729.159, 881089.919]


# Run progress: 10.00% complete, ETA 01:05:16
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: benchmarks.DynamicProxy.directAdd
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Fork: 10 of 10
# Warmup Iteration   1: 850773.837 ops/ms
# Warmup Iteration   2: 859996.736 ops/ms
# Warmup Iteration   3: 842337.855 ops/ms
# Warmup Iteration   4: 834024.179 ops/ms
# Warmup Iteration   5: 848325.927 ops/ms
# Warmup Iteration   6: 851807.203 ops/ms
# Warmup Iteration   7: 865749.630 ops/ms
# Warmup Iteration   8: 841759.249 ops/ms
# Warmup Iteration   9: 843872.638 ops/ms
# Warmup Iteration  10: 852743.625 ops/ms
# Warmup Iteration  11: 870366.746 ops/ms
# Warmup Iteration  12: 860670.067 ops/ms
# Warmup Iteration  13: 855269.930 ops/ms
# Warmup Iteration  14: 860215.809 ops/ms
# Warmup Iteration  15: 862334.297 ops/ms
# Warmup Iteration  16: 861751.244 ops/ms
# Warmup Iteration  17: 855697.310 ops/ms
# Warmup Iteration  18: 773933.681 ops/ms
# Warmup Iteration  19: 855363.310 ops/ms
# Warmup Iteration  20: 860512.882 ops/ms
Iteration   1: 784181.953 ops/ms
Iteration   2: 861926.105 ops/ms
Iteration   3: 854354.042 ops/ms
Iteration   4: 863663.976 ops/ms
Iteration   5: 869966.533 ops/ms
Iteration   6: 821378.328 ops/ms
Iteration   7: 866908.171 ops/ms
Iteration   8: 787129.306 ops/ms
Iteration   9: 780665.906 ops/ms
Iteration  10: 783710.954 ops/ms
Iteration  11: 869766.607 ops/ms
Iteration  12: 874771.924 ops/ms
Iteration  13: 874007.935 ops/ms
Iteration  14: 786184.900 ops/ms
Iteration  15: 867349.710 ops/ms
Iteration  16: 818710.228 ops/ms
Iteration  17: 786727.556 ops/ms
Iteration  18: 853130.618 ops/ms
Iteration  19: 869214.341 ops/ms
Iteration  20: 867435.571 ops/ms

Result: 837059.233 ±(99.9%) 33100.880 ops/ms [Average]
  Statistics: (min, avg, max) = (780665.906, 837059.233, 874771.924), stdev = 38119.022
  Confidence interval (99.9%): [803958.353, 870160.113]


# Run progress: 11.11% complete, ETA 01:04:28
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: benchmarks.DynamicProxy.proxiedAdd
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Fork: 1 of 10
# Warmup Iteration   1: 99681.741 ops/ms
# Warmup Iteration   2: 111011.439 ops/ms
# Warmup Iteration   3: 83221.079 ops/ms
# Warmup Iteration   4: 86550.458 ops/ms
# Warmup Iteration   5: 87100.410 ops/ms
# Warmup Iteration   6: 86719.841 ops/ms
# Warmup Iteration   7: 87628.795 ops/ms
# Warmup Iteration   8: 86472.207 ops/ms
# Warmup Iteration   9: 86111.395 ops/ms
# Warmup Iteration  10: 86871.991 ops/ms
# Warmup Iteration  11: 87797.001 ops/ms
# Warmup Iteration  12: 86590.364 ops/ms
# Warmup Iteration  13: 87005.565 ops/ms
# Warmup Iteration  14: 88105.287 ops/ms
# Warmup Iteration  15: 88517.748 ops/ms
# Warmup Iteration  16: 86863.272 ops/ms
# Warmup Iteration  17: 87413.754 ops/ms
# Warmup Iteration  18: 85960.142 ops/ms
# Warmup Iteration  19: 87216.054 ops/ms
# Warmup Iteration  20: 86368.302 ops/ms
Iteration   1: 85897.591 ops/ms
Iteration   2: 85818.520 ops/ms
Iteration   3: 86150.077 ops/ms
Iteration   4: 86313.090 ops/ms
Iteration   5: 86278.108 ops/ms
Iteration   6: 86504.070 ops/ms
Iteration   7: 85584.778 ops/ms
Iteration   8: 86987.707 ops/ms
Iteration   9: 85158.246 ops/ms
Iteration  10: 87069.476 ops/ms
Iteration  11: 88860.713 ops/ms
Iteration  12: 87230.651 ops/ms
Iteration  13: 88672.239 ops/ms
Iteration  14: 87435.816 ops/ms
Iteration  15: 83644.226 ops/ms
Iteration  16: 86858.133 ops/ms
Iteration  17: 86321.756 ops/ms
Iteration  18: 87300.606 ops/ms
Iteration  19: 85362.787 ops/ms
Iteration  20: 86763.998 ops/ms

Result: 86510.629 ±(99.9%) 1020.717 ops/ms [Average]
  Statistics: (min, avg, max) = (83644.226, 86510.629, 88860.713), stdev = 1175.459
  Confidence interval (99.9%): [85489.913, 87531.346]
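A gauge adapter for this format mainly needs to pick out the benchmark name and the measurement lines; a minimal parsing sketch (the regular expressions are an assumption based on the output above, not a finished adapter):

import re

# matches lines such as "Iteration   3: 863139.120 ops/ms";
# warmup lines start with "# Warmup Iteration" and are skipped
RE_ITERATION = re.compile(r"^Iteration\s+(\d+):\s+([\d.]+)\s+(\S+)")
RE_BENCHMARK = re.compile(r"^# Benchmark:\s+(\S+)")

def parse_jmh_output(output):
    """Yield (benchmark, iteration, value, unit) for measured iterations."""
    benchmark = None
    for line in output.splitlines():
        m = RE_BENCHMARK.match(line)
        if m:
            benchmark = m.group(1)
            continue
        m = RE_ITERATION.match(line)
        if m:
            yield benchmark, int(m.group(1)), float(m.group(2)), m.group(3)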

Add support for warmup

We need:

  • a configuration to define the number of warmup runs
  • a template parameter for the command
  • ignore measurements for those runs (let's ignore situations where the harness itself supports warmup for the moment), as sketched below
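A minimal sketch of the second and third points, assuming the data points of an invocation arrive as an ordered list (names are illustrative):

def expand_command(template, warmup_runs, **params):
    # make the warmup count available to the benchmark command,
    # e.g., "bench %(benchmark)s %(warmup)s" -> "bench Richards 5"
    return template % dict(params, warmup=warmup_runs)

def drop_warmup(data_points, warmup_runs):
    # ignore the measurements produced by the warmup iterations;
    # only the remaining data points enter the statistics
    return data_points[warmup_runs:]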

There might be a regression for failing runs.

There was a problem on serentity when using the latest version: a missing RunId module, I think.
The currently deployed version, however, also has issues with failing runs:
some return value does not have enough components to be unpacked properly.

Add command-line option to ignore return code and record also faulty/failing runs

Sometimes we have a really unstable VM, or a VM that does not shut down properly after running the benchmarks. To be able to obtain at least some of the results, it would be useful to have a command-line switch that overrides the error checking.

Necessary adaptations include:

--- a/rebench/executor.py
+++ b/rebench/executor.py
@@ -151,15 +154,15 @@ class Executor:
                                                           stderr=subprocess.STDOUT,
                                                           shell=True,
                                                           timeout=run_id.bench_cfg.suite.max_runtime)
-        if return_code != 0:
-            run_id.indicate_failed_execution()
-            run_id.report_run_failed(cmdline, return_code, output)
-            if return_code == 126:
-                logging.error(("Could not execute %s. A likely cause is that "
-                               "the file is not marked as executable.")
-                              % run_id.bench_cfg.vm.name)
-        else:
-            self._eval_output(output, run_id, gauge_adapter, cmdline)
+        #if return_code != 0:
+        #    run_id.indicate_failed_execution()
+        #    run_id.report_run_failed(cmdline, return_code, output)
+        #    if return_code == 126:
+        #        logging.error(("Could not execute %s. A likely cause is that "
+        #                       "the file is not marked as executable.")
+        #                      % run_id.bench_cfg.vm.name)
+        #else:
+        self._eval_output(output, run_id, gauge_adapter, cmdline)

         return self._check_termination_condition(run_id, termination_check)

and

--- a/rebench/interop/rebench_log_adapter.py
+++ b/rebench/interop/rebench_log_adapter.py
@@ -47,9 +47,9 @@ class RebenchLogAdapter(GaugeAdapter):
         current = DataPoint(run_id)

         for line in data.split("\n"):
-            if self.check_for_error(line):
-                raise ResultsIndicatedAsInvalid(
-                    "Output of bench program indicated error.")
+            #if self.check_for_error(line):
+            #    raise ResultsIndicatedAsInvalid(
+            #        "Output of bench program indicated error.")

             m = self.re_log_line.match(line)
             if m:
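Rather than commenting the checks out, the same effect could be achieved by threading a flag through to the executor; a rough sketch with a hypothetical ignore_return_code option (reusing the method names from the diff above, exact signatures are an assumption):

class Executor:
    def __init__(self, ignore_return_code=False):
        # set from a command-line switch such as --ignore-return-code
        self._ignore_return_code = ignore_return_code

    def _process_result(self, run_id, return_code, output,
                        gauge_adapter, cmdline):
        if return_code != 0 and not self._ignore_return_code:
            run_id.indicate_failed_execution()
            run_id.report_run_failed(cmdline, return_code, output)
        else:
            # with the flag set, faulty runs are still parsed and recorded
            self._eval_output(output, run_id, gauge_adapter, cmdline)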

Make output more useful, less verbose, and independent of debug output.

Most output is currently only displayed with the -d switch.
Output that helps with debugging failed runs should only be shown when a run actually fails.
General progress output should be displayed without requiring the -d switch.
And the final reporting output should be human-readable; we already have machine-readable files.

Add support for parallel execution of benchmarks

We need to be able to express in the config whether parallel execution is allowed,
perhaps at least at the VM level.

Current use case: Graal+Truffle are highly parallel, and use all cores, but RPython is strictly single-core.
Also, the interpreter versions probably do not interfere with each other.

We also need to be able to configure a maximum degree of parallelism.
I am not entirely sure that parallel execution is going to be interference-free, so we definitely do not want to overload the machine.
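A bounded thread pool is one way to cap the degree of parallelism; a minimal sketch, assuming each run knows whether its VM uses all cores (a hypothetical attribute, not ReBench's actual model):

from concurrent.futures import ThreadPoolExecutor

def execute_runs(runs, execute_run, max_parallelism=4):
    # VMs that use all cores (e.g., Graal+Truffle) get the machine to
    # themselves; single-core VMs (e.g., RPython interpreters) may share
    # a bounded pool without overloading the machine
    exclusive = [r for r in runs if r.vm_uses_all_cores]
    shareable = [r for r in runs if not r.vm_uses_all_cores]

    for r in exclusive:
        execute_run(r)

    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        for f in [pool.submit(execute_run, r) for r in shareable]:
            f.result()  # propagate any exceptions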

Fix Caliper support

Currently, the support for Caliper's output is not yet adapted to the new ReBench implementation.

Replace ulimit usage by wall-clock timeout

ulimit does not work for tile-monitor or for wrapper scripts that do not accumulate CPU time themselves.

It needs to be replaced by something that times out with respect to wall-clock time.
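Python's subprocess module supports wall-clock timeouts directly; a minimal sketch:

import subprocess

def run_with_wall_clock_timeout(cmdline, max_runtime):
    # communicate() raises TimeoutExpired after max_runtime seconds of
    # wall-clock time, regardless of how much CPU time the child used
    proc = subprocess.Popen(cmdline, shell=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    try:
        output, _ = proc.communicate(timeout=max_runtime)
        return proc.returncode, output
    except subprocess.TimeoutExpired:
        proc.kill()         # make sure the child does not linger
        proc.communicate()  # reap the killed process
        raise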

Add feature to discover 'reasonably-steady-state'

For continuous performance tracking during development, it is important to automatically account for changes in the warmup time benchmarks take, in order to keep track of the achievable peak performance. At the same time, it is still important to minimize the overall benchmark runtime, to keep experimentation practical.

[Note: this is targeted towards micro- and macrobenchmarks with reasonably small runtimes to be practical.]

While Kalibera and Jones (2013, http://kar.kent.ac.uk/33611/) advocate a convincing manual method to determine whether a real steady state is reached and whether the measurements from the same VM invocation are independent, I need something more practical: something that is completely automated, robust, and parameterizable.

I think I am going to use a slightly parameterized version of Georges et al.'s method (2007, http://buytaert.net/files/oopsla07-georges.pdf):

  • give a parameter for the desired coefficient of variation CoV
  • a parameter for minimum number of iterations min_i
  • a parameter for the number of measurements k
  • a parameter for the maximum number of measurements max_i
  • a parameter for the maximum runtime max_runtime

CoV: the standard deviation over the measurements (once at least the minimum number of iterations has been performed) divided by their mean, i.e., sd(m)/mean(m); see the sketch below

  • report whether a 'reasonably-steady-state' was reached; if not, report whether the timeout or max_i was hit
  • report number of necessary iterations i before reaching 'reasonably-steady-state'
  • report k measurements (after i iterations)
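A minimal sketch of this heuristic with the parameters above, checking the CoV over a sliding window of k measurements (the exact windowing is my assumption, loosely following Georges et al.):

import statistics
import time

def find_steady_state(measure, cov_threshold, min_i, k, max_i, max_runtime):
    """Run `measure()` until the CoV of the last k measurements (k >= 2)
    drops below cov_threshold. Returns (reached, i, window), where
    `reached` says whether a reasonably-steady state was found, `i` is
    the number of iterations before the window, and `window` holds the
    k measurements to report.
    """
    start = time.time()
    values = []
    while len(values) < max_i and time.time() - start < max_runtime:
        values.append(measure())
        if len(values) < max(min_i, k):
            continue  # not enough iterations yet
        window = values[-k:]
        cov = statistics.stdev(window) / statistics.mean(window)
        if cov < cov_threshold:
            return True, len(values) - k, window
    # timeout or max_i reached without finding a steady state
    return False, len(values), values[-k:]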

Failing benchmark recognized but terminates ReBench

Currently, a failing benchmark (Richards) is recognized properly, but ReBench fails to handle the resulting exception and terminates.

Output:

Starting Richards benchmark ... 
Results are incorrect

Traceback (most recent call last):

  File "/usr/bin/rebench", line 9, in <module>
    load_entry_point('ReBench==0.2.2', 'console_scripts', 'rebench')()
  File "/home/smarr/Projects/ReBench/rebench/rebench.py", line 161, in main_func
    return ReBench().run()
  File "/home/smarr/Projects/ReBench/rebench/rebench.py", line 141, in run
    self.execute_experiment()
  File "/home/smarr/Projects/ReBench/rebench/rebench.py", line 156, in execute_experiment
    executor.execute()
  File "/home/smarr/Projects/ReBench/rebench/executor.py", line 177, in execute
    self._scheduler.execute()
  File "/home/smarr/Projects/ReBench/rebench/executor.py", line 70, in execute
    completed = self._executor.execute_run(run)
  File "/home/smarr/Projects/ReBench/rebench/executor.py", line 114, in execute_run
    termination_check)
  File "/home/smarr/Projects/ReBench/rebench/executor.py", line 148, in _generate_data_point
    self._eval_output(output, run_id, perf_reader, cmdline)
  File "/home/smarr/Projects/ReBench/rebench/executor.py", line 154, in _eval_output
    data_points = perf_reader.parse_data(output, run_id)
  File "/home/smarr/Projects/ReBench/rebench/performance.py", line 92, in parse_data
    raise RuntimeError("Output of bench program indicated error.")
RuntimeError: Output of bench program indicated error.
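A plausible fix is to catch the adapter's error in the executor and record the run as failed instead of letting the exception propagate; a rough sketch reusing the method names that appear in the traceback (their exact signatures are an assumption):

def _eval_output(self, output, run_id, perf_reader, cmdline):
    try:
        return perf_reader.parse_data(output, run_id)
    except RuntimeError as err:
        # record the failure and continue with the remaining runs
        run_id.indicate_failed_execution()
        run_id.report_run_failed(cmdline, 0, str(err))
        return []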

Report results as milliseconds

Currently, results are reported in microseconds, which is not useful.
The time resolution isn't good enough anyway, and measurement errors are well beyond microseconds, too.

Add test whether `nice` can be executed and report

We need to check whether the nice tool can be executed, i.e., whether sufficient permissions are available.
ReBench should report the result and either automatically suppress the use of nice to avoid failing runs, or abort directly.
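A minimal sketch of such a check, running a harmless command at raised priority and inspecting the result:

import subprocess

def can_use_nice():
    """Check whether we may raise priority with nice (i.e., set a
    negative niceness), which typically requires elevated permissions.
    """
    try:
        result = subprocess.run(["nice", "-n", "-20", "true"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except OSError:
        return False  # nice itself is not available
    # some nice implementations warn but still run the command,
    # so treat any diagnostic output as a failure, too
    return result.returncode == 0 and not result.stderr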
