ctbench

Compiler-assisted benchmarking for the study of C++ metaprogram compile times.

ctbench allows you to declare and generate compile-time benchmark batches over given size ranges, run them, aggregate and wrangle Clang profiling data, and plot the results.

The project was made to fit the needs of scientific data collection and analysis. It is therefore not a one-shot profiler, but a set of tools that enables reproducible data gathering from user-defined, variably sized compile-time benchmarks, using Clang's time-trace feature to understand the impact of metaprogramming techniques on compile times. On top of that, ctbench can also measure compiler execution time, to support compilers such as GCC that lack a built-in profiler.

It has two main components: a C++ plotting toolset that can be used as a CLI program and as a library, and a CMake boilerplate library to generate benchmark and graph targets.

The CMake library contains all the boilerplate code to define benchmark targets compatible with the C++ plotting toolset called grapher.

Rule of Cheese can be used as an example project for using ctbench.

Examples

As an example, here are benchmark curves from the Poacher project. The benchmark case sources are available here.

[Figure: Clang ExecuteCompiler time curve from Poacher, generated by the compare_by plotter]

[Figure: Clang Total Frontend time curve from Poacher, generated by the compare_by plotter]

Using ctbench

Build prerequisites

ArchLinux and Ubuntu 23.04 are officially supported, as tests are compiled and executed on both of these Linux distributions. Other distributions, such as Fedora, or any Linux distro that provides CMake 3.25 or higher, should be compatible as well.

  • Required ArchLinux packages: boost boost-libs catch2 clang cmake curl fmt git llvm llvm-libs ninja nlohmann-json tar tbb unzip zip

  • Required Ubuntu packages: catch2 clang cmake curl git libboost-all-dev libclang-dev libfmt-dev libllvm15 libtbb-dev libtbb12 llvm llvm-dev ninja-build nlohmann-json3-dev pkg-config tar unzip zip

The Sciplot library is required too. It can be installed on ArchLinux using the sciplot-git AUR package (NB: the non-git package isn't up-to-date). Otherwise, you can install it for your whole system using CMake or locally using vcpkg:

git clone https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
./vcpkg/vcpkg install sciplot fmt

cmake --preset release \
  -DCMAKE_TOOLCHAIN_FILE=vcpkg/scripts/buildsystems/vcpkg.cmake

Note: The fmt dependency is needed, as vcpkg breaks fmt's CMake integration if you already have it installed.

Installing ctbench

git clone https://github.com/jpenuchot/ctbench
cd ctbench
cmake --preset release
cmake --build --preset release
sudo cmake --build --preset release --target install

An AUR package is available for easier installation and updates.

Integrating ctbench in your project

ctbench can be integrated into a CMake project using find_package:

find_package(ctbench REQUIRED)

The example project is provided as a reference for ctbench integration and usage. For more details, an exhaustive CMake API reference is available.
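
As a minimal sketch, a benchmark project's CMakeLists.txt could look like the following (the project name and source file are placeholders; the ctbench_add_benchmark call is explained in the next section):

cmake_minimum_required(VERSION 3.25)
project(my-benchmarks CXX)

# Import the ctbench CMake API (ctbench_add_benchmark, ctbench_add_graph, ...)
find_package(ctbench REQUIRED)

# Declare a benchmark case target, as described below
ctbench_add_benchmark(my_case my_case.cpp 1 32 1 10)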

Declaring a benchmark case target

A benchmark case is represented by a C++ file. It will be "instantiated", i.e. compiled with BENCHMARK_SIZE defined to each value in a range that you provide.

BENCHMARK_SIZE is intended to be used by the preprocessor to generate a benchmark instance of the desired size:

#include <boost/preprocessor/repetition/repeat.hpp>

// First we generate foo<int>().
// foo<int>() uses C++20 requirements to dispatch function calls across 16
// of its instances, according to the value of its integer template parameter.

#define FOO_MAX 16

#define DECL(z, i, nope)                                                       \
  template <int N>                                                             \
  requires(N % FOO_MAX == i) constexpr int foo() { return N * i; }

BOOST_PP_REPEAT(BENCHMARK_SIZE, DECL, FOO_MAX);
#undef DECL

// Now we generate the sum() function for instantiation

int sum() {
  int i = 0;

#define CALL(z, n, nop) i += foo<n>();
  BOOST_PP_REPEAT(BENCHMARK_SIZE, CALL, i);
#undef CALL
  return i;
}
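
To make the preprocessor mechanics concrete, here is roughly what this file expands to when BENCHMARK_SIZE is defined to 2 (expansion written by hand for illustration, whitespace edited for readability):

#define FOO_MAX 16

// Expansion of BOOST_PP_REPEAT(BENCHMARK_SIZE, DECL, FOO_MAX):
template <int N>
requires(N % FOO_MAX == 0) constexpr int foo() { return N * 0; }
template <int N>
requires(N % FOO_MAX == 1) constexpr int foo() { return N * 1; }

// Expansion of BOOST_PP_REPEAT(BENCHMARK_SIZE, CALL, i) inside sum():
int sum() {
  int i = 0;
  i += foo<0>(); // Resolves to the overload constrained with i == 0
  i += foo<1>(); // Resolves to the overload constrained with i == 1
  return i;
}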

By default, only compiler execution time is measured. If you want to generate plots from Clang's profiler data, add the following compile options to your CMake code:

add_compile_options(-ftime-trace -ftime-trace-granularity=1)

Note that plotting profiler data takes more time and will generate a lot of plot files.

Then you can declare a benchmark case target in CMake with the following:

ctbench_add_benchmark(function_selection.requires # Benchmark case name
  function_selection-requires.cpp                 # Benchmark case file
  1                                               # Range begin
  32                                              # Range end
  1                                               # Range step
  10)                                             # Iterations per size
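
This declares one compilation per size in the range [1, 32] with step 1, each repeated 10 times. Conceptually, each instance is compiled as if you ran something like the following (an illustrative command line, not the exact invocation ctbench generates):

clang++ -DBENCHMARK_SIZE=16 -ftime-trace -ftime-trace-granularity=1 \
  -c function_selection-requires.cpp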

Declaring a graph target

Once you have several benchmark cases, you can start writing a graph config.

Example configs can be found here, or by running ctbench-grapher-utils --plotter=<plotter> --command=get-default-config. A list of available plotters can be retrieved by running ctbench-grapher-utils --help.
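
For example, to list the available plotters and then dump the default config of the compare_by plotter used below:

ctbench-grapher-utils --help
ctbench-grapher-utils --plotter=compare_by --command=get-default-config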

{
  "plotter": "compare_by",
  "demangle": true,
  "draw_average": true,
  "draw_points": true,
  "key_ptrs": [
    "/name",
    "/args/detail"
  ],
  "legend_title": "Timings",
  "plot_file_extensions": [
    ".svg",
    ".png"
  ],
  "value_ptr": "/dur",
  "width": 1500,
  "height": 500,
  "x_label": "Benchmark size factor",
  "y_label": "Time (µs)"
}

This configuration uses the compare_by plotter. It compares features targeted by the JSON pointers in key_ptrs across all benchmark cases. This is the easiest way to extract and compare many relevant time-trace features at once.
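
The key_ptrs and value_ptr entries are JSON Pointers (RFC 6901) into individual time-trace events. A Clang time-trace event looks roughly like the following (simplified, with illustrative values): /name selects the event kind, /args/detail the symbol it refers to, and /dur its duration in microseconds.

{
  "name": "InstantiateFunction",
  "ts": 1234,
  "dur": 567,
  "args": { "detail": "foo<5>" }
}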

Back to CMake, you can now declare a graph target using this config to compare time spent in overall compiler execution, the frontend, and the backend across the benchmark cases you declared previously:

ctbench_add_graph(function_selection-feature_comparison-graph # Target name
  ${CONFIGS}/feature_comparison.json                          # Config
  function_selection.enable_if                                # First case
  function_selection.enable_if_t                              # Second case
  function_selection.if_constexpr                             # ...
  function_selection.control
  function_selection.requires)

For each group descriptor, a graph will be generated with one curve per benchmark case. In this case, you would get 3 graphs (ExecuteCompiler, Frontend, and Backend), each with 5 curves (enable_if, enable_if_t, if_constexpr, control, and requires).

Citing ctbench

@article{Penuchot2023,
  doi = {10.21105/joss.05165},
  url = {https://doi.org/10.21105/joss.05165},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {88},
  pages = {5165},
  author = {Jules Penuchot and Joel Falcou},
  title = {ctbench - compile-time benchmarking and analysis},
  journal = {Journal of Open Source Software},
}

Issues

Single size benchmarks

ctbench only supports variable size benchmarks; support for single size benchmarks for A/B comparisons would be nice too. It would require work on everything from the CMake API to the grapher core data structures, and possibly the CLI too.

Consider removing LLVM dependency

If I edit CMakeLists.txt to read:

cmake_minimum_required(VERSION 3.22)

and then run

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja ..
ninja

I see

/z/ctbench/grapher/lib/grapher/plotters/debug.cpp:1:10: fatal error: llvm/Support/raw_ostream.h: No such file or directory
    1 | #include <llvm/Support/raw_ostream.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/z/ctbench/grapher/grapher-utils.cpp:1:10: fatal error: llvm/Support/CommandLine.h: No such file or directory
    1 | #include <llvm/Support/CommandLine.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

Is it possible to remove this LLVM dependency in favour of using the C++ STL instead? If not, is it possible to make the LLVM linking mechanism more robust?

openjournals/joss-reviews#5165

Fix AUR package dependencies

To fix for later:

  • boost is missing
  • sciplot should be specified as sciplot-git (the regular sciplot package uses catch2-v2, which can't be installed alongside the new version)

[PREDICATE] Allow selecting events with a given parent

Add a predicate that matches events with a given parent. The matching could be done by name, by pointer, or simply by using other predicates to make it modular.

The main sub-issue for this one will be keeping track of the event tree structure.

Clearer docs

I'm having a hard time finding a good way to organize the documentation into something that's clear and concise. I'm currently not satisfied by the way JSON configuration values are documented as I feel like users have to dig way too much just to find JSON configuration information.

Additionally, the docs aren't clear about what's internal or not.

I'm leaving this issue open as long as I'm not satisfied with the way the docs are presented; any suggestions or criticism are welcome.

Optimization

nlohmann::json has a nice interface but performance doesn't scale for what we're doing (processing hundreds of megabytes of JSON trace events).

Running ctbench-grapher-plot through perf shows that nlohmann::json-related calls account for most of the overhead.

Making ctbench installable and packageable

I'm not a CMake expert, but maybe someone would be able to help make ctbench installable and usable with find_package. The end goal would be to end up with something that's easy to handle with makepkg (AUR), Conan, and vcpkg.

Generic compiler execution time measurement

One way to easily add (limited) support for GCC would be to measure compiler execution time instead of relying on internal profiling data. This could be useful to compare metaprogram performance scaling across a variety of compilers.

The main issues are:

  • How to measure compiler execution time from CMake
  • How to adapt current data wrangling code for a new kind of format

And so far my favorite solutions are:

  • Adding compiler execution time measurement to the clang time-trace wrapper
  • Using the wrapper to generate a file in the same format as time-trace files, but containing only compiler execution time data, as sketched below
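
As a minimal sketch, assuming the wrapper reuses the time-trace container format with a single top-level event, such a file might look like this (values illustrative):

{
  "traceEvents": [
    { "name": "ExecuteCompiler", "ts": 0, "dur": 1234567, "args": {} }
  ]
}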

Fix compiler execution time measurement

Compiler execution time measurement through ttw seems to be broken: the time values it reports aren't consistent with Clang's time-trace profiler. More work needs to be done to measure compiler execution time more precisely. Hyperfine seems to have that figured out, so it's worth having a look and reproducing their measurement method in ttw.

Set of ctbench-provided configs

I just figured that some configs I wrote in https://github.com/JPenuchot/rule-of-cheese/ are generic enough to be provided as examples. I'd like them to be baked into a default config directory, and maybe to implement a lookup directory list for configurations.

Review JOSS paper

Hey, please find my review of your tool below:

  • Regarding the manuscript I don't have major comments, other than that there are many typos/grammatical errors, especially in the summary.

  • I found the installation of the package a bit tedious:

    • there is no description of how to install on Mac/Windows. The latter can, I guess, be excused, but if the package is not easily installable/available on Mac, this excludes a significant fraction of the research community,
    • to install, I needed to use Docker and reproduce what you did here, and additionally google how to run ./vcpkg/vcpkg install sciplot fmt on ARM. I think it would be easiest to provide a Dockerfile in the example folder that does the exact installation with all dependencies (and have a GitHub action that tests on Mac M1).
  • I think providing install instructions using conda would be helpful, and would also make ctbench straightforward to use for Mac users. I think all required dependencies are on there.

  • I haven't done much C++ development in the last year, but seeing how much traction Meson is gaining, I find it unusual that it isn't provided as an alternative build system. Would that be easy to offer? Also, is the CMake 3.25 dependency really necessary?

  • I think providing a self-contained example with exact instructions on how the visualisations are created would be helpful. I have probably overlooked something, but it's not 100% clear from a quick look how you made them.

  • Create a CONTRIBUTING file with how-to's on contributing new code.

Cheers,
Simon

Explore using ROOT for graph generation

Turning graph saving off makes ctbench-grapher-plot execution a lot shorter (several seconds vs. several minutes). Generating graphs using ROOT instead of Sciplot might be a good way to cut graph generation times by a significant factor.

compare_by plotter: filenames too long for large symbols

The compare_by plotter saves plots with filenames generated using the key itself. This becomes problematic when the symbols grow large enough for filenames to be too long.

The issue is known and currently has no solution other than generating smaller symbols, e.g. benchmark driver functions with names that do not depend on benchmark instantiation size.

Solutions for this problem are welcome. I'm open to different name shortening strategies, generating index files to help find plot files, and other approaches. I will address this issue if it becomes really problematic.

Automated tests

  • CI
  • Test infrastructure
  • Tests implementation

Maybe: add an action for tests compiled with GCC?
