google / memcpy-gemm Goto Github PK
View Code? Open in Web Editor NEWLicense: Apache License 2.0
License: Apache License 2.0
# memcpy-gemm memcpy-gemm provides libraries and binaries for testing the communication and compute performance of GPUs. The primary binary generated for this purpose is the memcpy_gemm program. memcpy_gemm supports testing communication speed between any GPU and it's host or peers, and can test the compute performance in half, single, or double precision for matrix multiplication. ## Requirements: memcpy-gemm requires the following tools for installation * bazel * autoconf Bazel relies on the following libraries for execution. When building, bazel will pull the versions specified in the WORKSPACE file from internet sources. * abseil * glog * googletest * libnuma * half * re2 Currently, libnuma depends on autoconf for configuration, which we wrap in bazel at build time. memcpy-gemm also requires a CUDA installation. The default location is /usr/local/cuda. To specify a different directory, export a "CUDA_PATH" environmental variable. Note that bazel will create a symlink to this path as "cuda", which is how cuda headers are referenced in the code. memcpy-gemm is compatibile with CUDA >= 9.0, and makes no guarantees about long-term support for older CUDA versions going forward. A few CUDA sources in memcpy-gemm need to be compiled with NVCC. If CUDA_PATH is correctly set (or the default location is correct), NVCC will be found and invoked automatically. The NVCC compiler has limits on which host compilers work with it - so you may need to modify the bazel host compiler. The quickest way to do this is via the CC and CXX environmental variables. ## Building: With the proper dependencies installed, building the memcpy_gemm binary should be as simple as: `bazel build :memcpy_gemm` and the memcpy_gemm binary is built in to bazel-bin. If Cuda and/or host compilers need to be configured, a build invocation may look something like: ``` # In this example, we have a custom CUDA location, and we need to specify # GCC 9, because the default gcc (10 in this example) doesn't work with CUDA 11 # NVCC CUDA_PATH=/usr/local/cuda-11.0 CC=gcc-9 CXX=g++-9 bazel build :memcpy_gemm ``` To run tests, "bazel test :memcpy_gemm_test" or "bazel test :all". Note that the memcpy_gemm_test assumes that one GPU is available, but makes no assumptions about its performance characteristics. ## Creating a static build: By default, when built on most linux systems memcpy-gemm dynamically links to libgcc, libstdc++, and libc. While some dynamic dependencies to libc are unavoidable, memcpy-gemm will link statically to libstdc++ and libgcc if compiled in the static configuration: `bazel build --config=static :memcpy_gemm` This mode is useful for sidestepping dependencies when building for cross-compilation. ### Configuring static build memcpy-gemm uses it's own description of the C++ toolchain for static builds. On some linux systems, the toolchain include path to the standard c++ library may be incorrectly configured. When building in static mode, this can lead to errors such as: > this rule is missing dependency declarations for the following files included by (some library) > '/usr/lib/....some header' > '/usr/lib/....other header' In the short term, this issue can be fixed by modifying the C++ toolchain descriptor file. Open /(path/to/memcpy-gemm/source)/toolchain/cc_toolchain_config.bzl. Search for the cxx_bultin_include_directories variable, and add the path to your headers to the list. ## Running: To see the list of options of memcpy-gemm, type "/path/to/memcpy_gemm --help". ### Supported compute types memcpy-gemm supports a variety of input/output/compute precisions. The specific supported precision depends on the GPU architecture and CUDA version. With CUDA 9 or later, the following data type combinations are allowed (cc=compute capability of device). | `input_precision` | `output_precision` | `compute_precision` | `restrictions` | `Notes` | ----------------- | ------------------ | ------------------- | -------------- | ------- | single | single | single | none | same as fp_precision = 'single' | double | double | double | none | same as fp_precision = 'double' | half | half | half | cc >= 6.0 | same as fp_precision = 'half' | half | single | single | cc >= 6.0 | commonly called 'mixed precision' | int8 | int32 | int32 | cc >= 7.0 | | int8 | single | single | cc >= 7.0 | On cc >= 7.5, can use tensor cores. ### Memory Usage memcpy-GEMM allocates CPU and GPU memory for both memcpy and GEMM operations. For memcpy tests, memcpy-gemm allocates a buffer on both the source and destination devices (be it a CPU or GPU), with the size equal to the number of kilobytes specified by the buffer_size_KiB flag. Buffers are not reused between flows, so (for example) with flows 'c0-g0-a0 g0-g1-a1', GPU0 will allocate 2 buffers, one for each flow. GEMM calls allocate two input arrays and one ouput array on each device specified in the --gpus flag, if the --gemm option is set to true. The total number of bytes allocated is then: bytes = (dim1_matrix1 * dim2_matrix1 * sizeof(input precision)) + \ (dim1_matrix2 * dim2_matrix2 * sizeof(input precision)) + \ (dim1_output_matrix * dim2_ouput_matrix * sizeof(output precision)) If M=N=K, using N as the dimension, this simplifies to: bytes = 2 * (N^2 * sizeof(input precision)) + N^2 * sizeof(output precision) Supported sizes of precisions are: int8 - 1 byte half - 2 bytes bf16 - 2 bytes single - 4 bytes int32 - 4 bytes double - 8 bytes In addition, expect a small GPU memory overhead from the cublas environment initializations.
A lot of functionality was not covered anyway, and we have no guarantees of backwards compatibility.
Hi Kevin, I ran in to a compile error below. Any idea what could be wrong or missing?
#bazel build :memcpy_gemm
INFO: Analyzed target //:memcpy_gemm (38 packages loaded, 1388 targets configured).
INFO: Found 1 target...
ERROR: missing input file '@libnuma//:config.h'
ERROR: /root/.cache/bazel/_bazel_root/0ba465a0ccf81b4aa04a864fc479aa2e/external/libnuma/BUILD:6:1: @libnuma//:numa: missing input file '@libnuma//:config.h'
Target //:memcpy_gemm failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /root/.cache/bazel/_bazel_root/0ba465a0ccf81b4aa04a864fc479aa2e/external/libnuma/BUILD:6:1 1 input file(s) do not exist
INFO: Elapsed time: 0.605s, Critical Path: 0.04s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
There is a lot of extra code with template specializations of CudaCublasInterface. Once CUDA 7 and 8 support is removed, we can hide the data types passed to cublasGemmEx through GetCudaComputeType and won't need the specializations.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.