# memcpy-gemm

memcpy-gemm provides libraries and binaries for testing the communication
and compute performance of GPUs. The primary binary generated for this purpose
is the memcpy_gemm program. memcpy_gemm supports testing communication speed
between any GPU and its host or peers, and can test matrix-multiplication
compute performance in half, single, or double precision.

## Requirements:
memcpy-gemm requires the following tools for installation:
* bazel
* autoconf

memcpy-gemm relies on the following libraries for execution. When building, bazel
will pull the versions specified in the WORKSPACE file from internet sources:
* abseil
* glog
* googletest
* libnuma
* half
* re2

Currently, libnuma depends on autoconf for configuration, which we wrap in bazel
at build time.

memcpy-gemm also requires a CUDA installation. The default location is
/usr/local/cuda. To specify a different directory, export a "CUDA_PATH"
environment variable. Note that bazel will create a symlink to this path
as "cuda", which is how CUDA headers are referenced in the code. memcpy-gemm
is compatible with CUDA >= 9.0, and makes no guarantees about long-term support
for older CUDA versions going forward.
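
For example, if CUDA is installed somewhere other than /usr/local/cuda (the path
below is purely illustrative), export the location before invoking bazel:

```
# Hypothetical CUDA install path; substitute your own installation directory.
export CUDA_PATH=/opt/cuda-11.2
bazel build :memcpy_gemm
```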

A few CUDA sources in memcpy-gemm need to be compiled with NVCC. If CUDA_PATH is
correctly set (or the default location is correct), NVCC will be found and
invoked automatically. NVCC only supports certain host compilers, so you may
need to change the host compiler bazel uses. The quickest way to do this is via
the CC and CXX environment variables.

## Building:
With the proper dependencies installed, building the memcpy_gemm binary
should be as simple as:

`bazel build :memcpy_gemm`

The memcpy_gemm binary is placed in bazel-bin.

If Cuda and/or host compilers need to be configured, a build invocation may
look something like:
```
# In this example we have a custom CUDA location, and we specify GCC 9 because
# the default gcc (10 in this example) does not work with CUDA 11's NVCC.
CUDA_PATH=/usr/local/cuda-11.0 CC=gcc-9 CXX=g++-9 bazel build :memcpy_gemm
```

To run tests, use `bazel test :memcpy_gemm_test` or `bazel test :all`. Note that
memcpy_gemm_test assumes that one GPU is available, but makes no assumptions
about its performance characteristics.

## Creating a static build:
By default, when built on most Linux systems, memcpy-gemm dynamically links to
libgcc, libstdc++, and libc. While some dynamic dependencies on libc are unavoidable,
memcpy-gemm will link statically to libstdc++ and libgcc if compiled in the static configuration:

`bazel build --config=static :memcpy_gemm`

This mode is useful for sidestepping library dependencies when cross-compiling.

### Configuring static build
memcpy-gemm uses its own description of the C++ toolchain for static builds. On some Linux systems,
the toolchain include path to the standard C++ library may be incorrectly configured. When building in
static mode, this can lead to errors such as:

> this rule is missing dependency declarations for the following files included by (some library)
> '/usr/lib/....some header'
> '/usr/lib/....other header'

In the short term, this issue can be fixed by modifying the C++ toolchain descriptor file.
Open /(path/to/memcpy-gemm/source)/toolchain/cc_toolchain_config.bzl, search for the
cxx_builtin_include_directories variable, and add the path to your headers to the list.
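
A quick way to locate that list is shown below; the include directory mentioned
in the comments is only an example, and you should add whichever path the error
message reports:

```
# Find the built-in include directory list in the toolchain config.
grep -n cxx_builtin_include_directories toolchain/cc_toolchain_config.bzl
# Add the directory reported in the error, e.g. /usr/lib/gcc/x86_64-linux-gnu/9/include,
# to that list and rebuild.
```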

## Running:
To see the list of options for memcpy-gemm, run `/path/to/memcpy_gemm --help`.
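
As a rough sketch, a copy-bandwidth run might look like the following. The
--flows flag name and flow syntax are assumptions based on the flow notation
used in the Memory Usage section below; confirm the exact flags with --help:

```
# Illustrative invocation only; flag names are assumptions, verify with --help.
./bazel-bin/memcpy_gemm --flows="c0-g0-a0 g0-g1-a1"
```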

### Supported compute types

memcpy-gemm supports a variety of input/output/compute precisions. The supported
combinations depend on the GPU architecture and CUDA version. With CUDA
9 or later, the following data type combinations are allowed (cc = compute capability
of the device).

| `input_precision` | `output_precision` | `compute_precision` | Restrictions | Notes |
| ----------------- | ------------------ | ------------------- | ------------ | ----- |
| single            | single             | single              | none         | same as fp_precision = 'single' |
| double            | double             | double              | none         | same as fp_precision = 'double' |
| half              | half               | half                | cc >= 6.0    | same as fp_precision = 'half' |
| half              | single             | single              | cc >= 6.0    | commonly called 'mixed precision' |
| int8              | int32              | int32               | cc >= 7.0    | |
| int8              | single             | single              | cc >= 7.0    | On cc >= 7.5, can use tensor cores. |
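
For example, assuming the flag names match the backticked column headers above
(an assumption; confirm with --help), an int8-input, int32-output GEMM run on a
cc >= 7.0 GPU could be requested as:

```
# Assumed flag names taken from the table's column headers; verify with --help.
./bazel-bin/memcpy_gemm --gemm=true \
    --input_precision=int8 --output_precision=int32 --compute_precision=int32
```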

### Memory Usage

memcpy-gemm allocates CPU and GPU memory for both memcpy and GEMM operations. For
memcpy tests, memcpy-gemm allocates a buffer on both the source and destination
devices (be it a CPU or GPU), with the size equal to the number of kibibytes
specified by the buffer_size_KiB flag. Buffers are not reused between flows, so
(for example) with flows 'c0-g0-a0 g0-g1-a1', GPU0 will allocate two buffers, one
for each flow.
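
As an illustrative calculation (the buffer size here is arbitrary): with
buffer_size_KiB set to 4096 and the flows above, GPU0 holds 2 * 4096 KiB = 8 MiB
of memcpy buffers, while GPU1 and the host each hold 4 MiB.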

GEMM calls allocate two input arrays and one output array on each device specified
in the --gpus flag, if the --gemm option is set to true. The total number of bytes
allocated per device is then:

bytes = (dim1_matrix1 * dim2_matrix1 * sizeof(input precision)) + \
        (dim1_matrix2 * dim2_matrix2 * sizeof(input precision)) + \
        (dim1_output_matrix * dim2_output_matrix * sizeof(output precision))

If M = N = K, using N as the dimension, this simplifies to:

bytes = 2 * (N^2 * sizeof(input precision)) + N^2 * sizeof(output precision)

Supported precision sizes are:
* int8 - 1 byte
* half - 2 bytes
* bf16 - 2 bytes
* single - 4 bytes
* int32 - 4 bytes
* double - 8 bytes
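
As an illustrative worked example (N = 8192 is an arbitrary choice), a
mixed-precision GEMM with half inputs and a single-precision output allocates,
per GPU:

bytes = 2 * (8192^2 * 2) + 8192^2 * 4 = 268,435,456 + 268,435,456
      = 536,870,912 bytes (512 MiB)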

In addition, expect a small GPU memory overhead from cuBLAS environment initialization.
