libscope

A systems-oriented C++11 benchmark support library bringing the following tools under one roof:

This work was started at the University of Illinois with Professor Wen-Mei Hwu's IMPACT research group in collaboration with IBM's T. J. Watson Research as the SCOPE project. This project reworks the SCOPE framework as a library.

The Comm|Scope multi-GPU communication benchmarking tool uses this library.

Quickstart

Get CMake 3.17+ (needed for FindCUDAToolkit)

Add to your CMakeLists.txt:

add_subdirectory(thirdparty/scope)
target_link_libraries(<target> scope::scope)

Include "scope/scope.hpp"

#include "scope/scope.hpp"

int main(int argc, char **argv) {
  // initialize scope framework things
  scope::init(&argc, argv);
  // run all registered benchmarks
  scope::run();
  // clean up scope
  scope::finalize();
}

Define a benchmark using google/benchmark. Scope bundles google/benchmark and supports all of its command-line flags.

How To

Command Line Flags

All Scope applications support the following command line options:

  • --cuda <device ID>: add GPU visibility (default: all). May be repeated to add more GPUs.
  • --numa <node ID>: add NUMA visibility (default: all). May be repeated to add more NUMA nodes.
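As a rough illustration of how repeatable flags accumulate, the sketch below collects every `--cuda` and `--numa` value into a list and treats an empty list as "all". It is not libscope's own parser: `Visibility` and `parse_visibility` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical container for the IDs selected on the command line.
// An empty vector is read here as "no restriction" (the documented default: all).
struct Visibility {
  std::vector<int> cuda;
  std::vector<int> numa;
};

// Hypothetical helper: scan argv for repeated `--cuda <id>` / `--numa <id>` pairs.
// Unrecognized arguments are skipped so other flags can pass through untouched.
inline Visibility parse_visibility(int argc, const char *const *argv) {
  assert(argv != nullptr);
  Visibility v;
  for (int i = 1; i + 1 < argc; ++i) {
    const std::string flag(argv[i]);
    if (flag == "--cuda") {
      v.cuda.push_back(std::atoi(argv[++i])); // consume the value after the flag
    } else if (flag == "--numa") {
      v.numa.push_back(std::atoi(argv[++i]));
    }
  }
  return v;
}
```

With this sketch, an invocation such as `bench --cuda 0 --cuda 2 --numa 1` would select GPUs 0 and 2 and NUMA node 1.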

CPU turbo (scope/turbo.hpp)

scope::init() records the CPU's current turbo state and attempts to disable turbo if the process has sufficient permissions (e.g. it is run with sudo). The original state is restored when scope exits on SIGINT or through finalize(). If scope exits any other way, use enable-turbo to enable CPU turbo again.

You may also programmatically control the CPU turbo state with the following library functions:

namespace turbo {
/* true if we are able to control the turbo state
 */
bool can_modify();

/* enable turbo
 */
Result enable();

/* disable turbo
 */
Result disable();

/* record current turbo state in `state`.
 */
Result get_state(State *state);

/* set turbo to `state`
 */
Result set_state(const State &state);

/* record the current turbo state into the global state
*/
Result get_state();

/* set turbo state from the global state
*/
Result set_state();
}
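One way to use these functions is a small guard that saves the turbo state, disables turbo, and restores the saved state when it goes out of scope. The sketch below is illustrative only. So that it compiles without libscope, it replaces scope/turbo.hpp with a stand-in namespace whose `Result` and `State` definitions are assumptions (only the function signatures come from the listing above), and `TurboOffGuard` is a hypothetical class, not part of the library.

```cpp
#include <cassert>

// Stand-in for scope/turbo.hpp so the sketch builds without libscope.
// Result and State are ASSUMED definitions; g_hw fakes the hardware state.
namespace turbo {
enum class Result { SUCCESS, FAILURE };
struct State { bool enabled; };
static State g_hw = {true};

inline bool can_modify() { return true; }
inline Result disable() { g_hw.enabled = false; return Result::SUCCESS; }
inline Result get_state(State *state) {
  assert(state != nullptr);
  *state = g_hw;
  return Result::SUCCESS;
}
inline Result set_state(const State &state) { g_hw = state; return Result::SUCCESS; }
} // namespace turbo

// Hypothetical RAII guard: turbo is off for the guard's lifetime,
// and the state observed at construction is put back on destruction.
class TurboOffGuard {
public:
  TurboOffGuard() : saved_(), active_(false) {
    if (turbo::can_modify() &&
        turbo::get_state(&saved_) == turbo::Result::SUCCESS) {
      active_ = (turbo::disable() == turbo::Result::SUCCESS);
    }
  }
  ~TurboOffGuard() {
    if (active_) turbo::set_state(saved_); // restore only if we changed it
  }
  TurboOffGuard(const TurboOffGuard &) = delete;
  TurboOffGuard &operator=(const TurboOffGuard &) = delete;

private:
  turbo::State saved_;
  bool active_;
};
```

Checking `can_modify()` first keeps the guard a no-op when the process lacks permission, which mirrors the "attempt to disable" behavior described for scope::init().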

CPU governor (scope/governor.hpp)

scope::init() records the current CPU governor and attempts to set it to maximum if the process has sufficient permissions (e.g. it is run with sudo). The original governor is restored when scope exits on SIGINT or through finalize(). If scope exits any other way, use set-minimum to restore the powersave governor.

You may also programmatically control the CPU governor with the following library functions:

namespace governor {

/* whether modifying the governor is supported
*/
bool can_modify();

/* "performance" on linux
*/
Result set_state_maximum();

/* "powersave" on linux
*/
Result set_state_minimum();

/* record the current CPU governors to `state`
*/
Result get_state(State *state);

/* set the CPU governor to `state`
*/
Result set_state(const State &state);

/* save the current governor, to be used with restore()
*/
Result record();

/* restore the governor last captured with record()
*/
Result restore();

} // namespace governor
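For context on what "performance" and "powersave" refer to: on Linux, each CPU's active governor is exposed as a one-line text file under the cpufreq sysfs directory. The sketch below reads that file. It is not libscope's implementation; `governor_path` and `read_governor` are hypothetical helpers, and writing to the same file (which is what changing the governor requires) normally needs root.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Conventional Linux sysfs location of the governor for one CPU.
inline std::string governor_path(int cpu) {
  assert(cpu >= 0);
  return "/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
         "/cpufreq/scaling_governor";
}

// Read a governor name (e.g. "performance" or "powersave") from `path`.
// Returns an empty string if the file is missing or unreadable, for
// example on systems without cpufreq support.
inline std::string read_governor(const std::string &path) {
  std::ifstream in(path.c_str());
  std::string name;
  if (in) std::getline(in, name); // getline drops the trailing newline
  return name;
}
```

Calling `read_governor(governor_path(0))` before and after a benchmark run is a simple way to check that the governor was actually changed and later restored.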

NUMA (scope/numa.hpp)

By default, scope is compiled with NUMA support (SCOPE_USE_NUMA=1). It can be turned off with cmake -DUSE_NUMA=0.

Either way, the following API is exposed in the numa namespace. If NUMA support is disabled, the API is consistent with a system that has a single NUMA domain with ID 0.

/* True if there is NUMA support and the system supports NUMA, false otherwise
 */
bool numa::available();

/* bind future processing and allocation by this thread to `node`.
If no NUMA support, does nothing
*/
void numa::bind_node(int node);

/* return the number of numa nodes
If no NUMA support, return 1
*/
int numa::node_count();

/* return the NUMA ids present in the system
 */
std::vector<int> numa::ids();

There is also a numa::ScopedBind class that is an RAII-wrapper around numa::bind_node()

// Code out here runs anywhere
{
numa::ScopedBind binder(13);
// this code now runs on node 13
}
// Code out here runs anywhere
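A common pattern is to repeat a measurement once per NUMA node, binding to each node in turn. The sketch below shows one way to do that with `numa::ids()` and `numa::ScopedBind`. So that it compiles without libscope, it uses a stand-in `numa` namespace that mirrors the documented behavior when NUMA support is disabled (a single domain with ID 0). The stub's `ScopedBind` destructor does nothing, which is an assumption, and `for_each_node` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for scope/numa.hpp, mirroring the documented behavior when NUMA
// support is disabled: one NUMA domain with ID 0 and a no-op bind.
namespace numa {
inline bool available() { return false; }
inline void bind_node(int) {}
inline int node_count() { return 1; }
inline std::vector<int> ids() { return std::vector<int>(1, 0); }

// ASSUMPTION: what the real destructor restores is not described above;
// this stub leaves the binding untouched.
class ScopedBind {
public:
  explicit ScopedBind(int node) { bind_node(node); }
  ~ScopedBind() {}
};
} // namespace numa

// Hypothetical helper: call `fn(node)` once per NUMA node, with the calling
// thread bound to that node for the duration of the call.
template <typename Fn> void for_each_node(Fn fn) {
  const std::vector<int> nodes = numa::ids();
  assert(!nodes.empty()); // the API always reports at least one domain
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    numa::ScopedBind binder(nodes[i]); // binding ends at the end of this iteration
    fn(nodes[i]);
  }
}
```

Because the API behaves like a single-node system when NUMA is disabled, the same loop runs unchanged on machines with and without NUMA support.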

Cache Control (scope/cache.hpp)

// flush the cache line containing p
void flush(void *p);

// mfence (amd64) or sync 0 (ppc64le)
void barrier_all();

// flush all cache lines for the n-byte region starting at p
void flush_all(void *p, const size_t n);
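To make the relationship between these three functions concrete, the sketch below shows how a region flush can be built from a per-line flush followed by a barrier. It is an illustration, not libscope's code: `flush_region` and `kAssumedLineBytes` are hypothetical, the 64-byte line size is an assumption (common on amd64, while POWER processors commonly use 128), and whether the library's flush_all issues a barrier itself is not stated here. The flush and barrier operations are passed in as callables so the sketch stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMED cache-line size in bytes; must be a power of two.
const std::size_t kAssumedLineBytes = 64;

// Hypothetical helper: call `flush_line` once for every cache line that
// overlaps the n-byte region starting at p, then call `barrier` once.
// Returns the number of lines flushed.
template <typename FlushFn, typename BarrierFn>
std::size_t flush_region(void *p, std::size_t n, FlushFn flush_line,
                         BarrierFn barrier) {
  assert(p != nullptr);
  if (n == 0) return 0;
  const std::uintptr_t mask =
      ~static_cast<std::uintptr_t>(kAssumedLineBytes - 1);
  const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(p);
  const std::uintptr_t first = begin & mask;          // line holding the first byte
  const std::uintptr_t last = (begin + n - 1) & mask; // line holding the last byte
  std::size_t count = 0;
  for (std::uintptr_t line = first;; line += kAssumedLineBytes) {
    flush_line(reinterpret_cast<void *>(line));
    ++count;
    if (line == last) break;
  }
  barrier(); // order the flushes before whatever is measured next
  return count;
}
```

Rounding both ends down to a line boundary matters for unaligned regions: an 8-byte region that starts 60 bytes into a line touches two lines, not one.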

Roadmap

Changelog

  • v1.1.2 (July 17, 2020)
    • fix a bug in getting available NUMA nodes
  • v1.1.1 (July 17, 2020)
    • fix a bug in getting available CUDA devices
  • v1.1.0 (July 17, 2020)
    • Re-raise INT, HUP, and KILL signals after cleanup
    • add --cuda and --numa flags
    • Cache NUMA configuration to improve benchmark registration performance
  • v1.0.0

Publications

@inproceedings{10.1145/3297663.3310299,
author = {Pearson, Carl and Dakkak, Abdul and Hashash, Sarah and Li, Cheng and Chung, I-Hsin and Xiong, Jinjun and Hwu, Wen-Mei},
title = {Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects},
year = {2019},
isbn = {9781450362399},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3297663.3310299},
doi = {10.1145/3297663.3310299},
booktitle = {Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering},
pages = {209–218},
numpages = {10},
keywords = {nvlink, numa, power, x86, benchmarking, cuda, gpu},
location = {Mumbai, India},
series = {ICPE ’19}
}
@article{DBLP:journals/corr/abs-1809-08311,
  author    = {Carl Pearson and
               Abdul Dakkak and
               Cheng Li and
               Sarah Hashash and
               Jinjun Xiong and
               Wen{-}Mei W. Hwu},
  title     = {{SCOPE:} {C3SR} Systems Characterization and Benchmarking Framework},
  journal   = {CoRR},
  volume    = {abs/1809.08311},
  year      = {2018},
  url       = {http://arxiv.org/abs/1809.08311},
  archivePrefix = {arXiv},
  eprint    = {1809.08311},
  timestamp = {Fri, 05 Oct 2018 11:34:52 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1809-08311.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{pearson2018numa,
  title={NUMA-aware data-transfer measurements for power/NVLink multi-GPU systems},
  author={Pearson, Carl and Chung, I-Hsin and Sura, Zehra and Hwu, Wen-Mei and Xiong, Jinjun},
  booktitle={International Conference on High Performance Computing},
  pages={448--454},
  year={2018},
  organization={Springer}
}

Acks

Thanks to Sarah Hashash (MIT), I-Hsin Chung (IBM T. J. Watson), and Jinjun Xiong (IBM T. J. Watson) for their support, guidance, and contributions.
