coralgemm's Introduction

 ______                    __ _______
|      |.-----.----.---.-.|  |     __|.-----.--------.--------.
|   ---||  _  |   _|  _  ||  |    |  ||  -__|        |        |
|______||_____|__| |___._||__|_______||_____|__|__|__|__|__|__|

Matrix Multiply Stress Test

Prerequisites

Run sudo apt install rocm-dkms rocm-libs to install all prerequisites.

Installing

  • cd src
  • make

Running DGEMM

  • 16 GB devices (Radeon VII): ./gemm R_64F R_64F R_64F R_64F OP_N OP_T 8640 8640 8640 8640 8640 8640 9 300

  • 32 GB devices (MI60, MI100): ./gemm R_64F R_64F R_64F R_64F OP_N OP_T 8640 8640 8640 8640 8640 8640 18 300

Running SGEMM

  • 16 GB devices (Radeon VII): ./gemm R_32F R_32F R_32F R_32F OP_N OP_T 8640 8640 8640 8640 8640 8640 18 300

  • 32 GB devices (MI60, MI100): ./gemm R_32F R_32F R_32F R_32F OP_N OP_T 8640 8640 8640 8640 8640 8640 36 300
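The batch counts above roughly fill device memory. The following is a minimal sketch of that arithmetic, not code from the tool: it assumes column-major storage (the BLAS convention), that each of A, B, and C is allocated BATCH_COUNT times, and that nothing else of significant size is allocated. The function name footprint_gib and the byte-size table are illustrative.

```python
# Bytes per element for the precisions this README lists (standard sizes).
BYTES_PER_ELEMENT = {
    "R_32F": 4, "R_64F": 8, "C_32F": 8, "C_64F": 16, "R_8I": 1, "R_32I": 4,
}

def footprint_gib(prec_a, prec_b, prec_c, op_a, op_b,
                  m, n, k, lda, ldb, ldc, batch_count):
    """Estimate the total size of all A, B, and C matrices in GiB.

    Assumption: column-major storage, so a matrix occupies
    (leading dimension) x (number of stored columns) elements.
    """
    cols_a = k if op_a == "OP_N" else m   # A is stored M x K, or K x M if transposed
    cols_b = n if op_b == "OP_N" else k   # B is stored K x N, or N x K if transposed
    cols_c = n                            # C is always M x N
    per_batch = (lda * cols_a * BYTES_PER_ELEMENT[prec_a]
                 + ldb * cols_b * BYTES_PER_ELEMENT[prec_b]
                 + ldc * cols_c * BYTES_PER_ELEMENT[prec_c])
    return per_batch * batch_count / 2**30

# DGEMM on a 16 GB device, using the arguments from the command above.
dgemm_16 = footprint_gib("R_64F", "R_64F", "R_64F", "OP_N", "OP_T",
                         8640, 8640, 8640, 8640, 8640, 8640, 9)
print(f"{dgemm_16:.2f} GiB")  # 15.02 GiB
```

Under these assumptions, the SGEMM command with a batch count of 18 gives the same figure, because halving the element size offsets doubling the batch count.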

Command-Line Details

    ./gemm PRECISION_A
           PRECISION_B
           PRECISION_C
           COMPUTE_PRECISION
           OP_A
           OP_B
           M
           N
           K
           LDA
           LDB
           LDC
           BATCH_COUNT
           TIME_SPAN    runtime duration in seconds
           [batched]    run batched GEMM
           [strided]    run strided batched GEMM
           [ex]         use the Ex API
           [hostA]      A in host memory
           [hostB]      B in host memory
           [hostC]      C in host memory
           [coherentA]  if in host memory, A is coherent (not cached)
           [coherentB]  if in host memory, B is coherent (not cached)
           [coherentC]  if in host memory, C is coherent (not cached)
           [sharedA]    one A for all devices
           [sharedB]    one B for all devices
           [zeroBeta]   set beta to zero
           [testing]    perform a basic sanity check
           [times]      print time in microseconds in addition to GFLOPS
           [hostname]   print the hostname

When TIME_SPAN is set to 0, one warmup run is done, followed by one timing run, and printing of column labels is disabled.
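Because the fourteen positional arguments are easy to misorder, a small helper can assemble the command line. This is only a sketch: gemm_command is a hypothetical helper, not part of the tool, and it does not check which combinations of optional flags the tool accepts.

```python
import shlex

def gemm_command(prec_a, prec_b, prec_c, compute_prec, op_a, op_b,
                 m, n, k, lda, ldb, ldc, batch_count, time_span, *flags):
    """Build a ./gemm command line in the positional order documented above.

    Optional flags (e.g. "zeroBeta", "times") are appended verbatim.
    """
    args = ["./gemm", prec_a, prec_b, prec_c, compute_prec, op_a, op_b]
    args += [str(v) for v in (m, n, k, lda, ldb, ldc, batch_count, time_span)]
    args += list(flags)
    return shlex.join(args)

# DGEMM for 300 seconds with beta set to zero and times printed.
print(gemm_command("R_64F", "R_64F", "R_64F", "R_64F", "OP_N", "OP_T",
                   8640, 8640, 8640, 8640, 8640, 8640, 9, 300,
                   "zeroBeta", "times"))
```

The resulting string can be pasted into a shell or split again with shlex.split and passed to subprocess.run.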

Supported Precisions:

  • R_32F: float
  • R_64F: double
  • C_32F: float complex
  • C_64F: double complex
  • R_8I: 8-bit int
  • R_32I: 32-bit int

Supported Ops:

  • OP_N: non-transposed
  • OP_T: transposed
  • OP_C: conjugate-transposed

Details

  • benchmarks hipblas?gemm[Batched|StridedBatched][Ex]
  • allocates BATCH_COUNT matrices each of A, B, and C
  • initializes with hipRAND (random uniform, 0.0 to 1.0)
  • calls hipBLAS and collects execution times using std::chrono
  • sets alpha to 2.71828 and beta to 3.14159
  • for hipblas?gemm[Ex] launches a sequence of calls and takes the median time
  • for hipblas?gemm[Strided]Batched[Ex] launches one call and takes the overall time
  • reports the corresponding GFLOPS
  • repeats until TIME_SPAN exceeded
  • executes simultaneously on all devices
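The timing-to-GFLOPS step can be illustrated as follows. This is a sketch under stated assumptions, not the tool's source: it uses the conventional 2*M*N*K operation count for a real-valued GEMM (complex and integer precisions may be counted differently), and the function names are illustrative.

```python
from statistics import median

def gemm_gflops(m, n, k, seconds, batch_count=1):
    """GFLOPS for batch_count GEMMs that together took `seconds`.

    Assumption: one real GEMM costs 2*M*N*K floating-point operations.
    """
    return 2.0 * m * n * k * batch_count / seconds / 1e9

def gflops_from_sequence(m, n, k, times_s):
    """Non-batched case: take the median of per-call times, then convert."""
    return gemm_gflops(m, n, k, median(times_s))

# Hypothetical timings in seconds for three 8640 x 8640 x 8640 DGEMM calls.
print(gflops_from_sequence(8640, 8640, 8640, [0.25, 0.20, 0.21]))
```

Using the median rather than the mean keeps a single slow call, such as a first call that includes initialization, from skewing the reported rate.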

If testing is set, a primitive sanity test is run. Entries of A, B, and C are set to 1, and so are the factors alpha and beta. Then, after GEMM is run, all entries of C are checked to contain k+1. Note that performance is usually much higher when using integer initialization than when using random data.
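The expected value k+1 follows from C = alpha*A*B + beta*C: with every entry and both factors equal to 1, each dot product sums k ones and beta*C adds one more. A minimal pure-Python sketch of the same check, using a naive reference GEMM rather than hipBLAS (all names here are illustrative):

```python
def gemm_reference(alpha, a, b, beta, c):
    """Return alpha * A * B + beta * C for matrices given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(n)]
            for i in range(m)]

def ones(rows, cols):
    """Build a rows x cols matrix filled with 1."""
    return [[1] * cols for _ in range(rows)]

def sanity_check(m, n, k):
    """Mimic the described test: all-ones inputs, alpha = beta = 1."""
    result = gemm_reference(1, ones(m, k), ones(k, n), 1, ones(m, n))
    return all(entry == k + 1 for row in result for entry in row)

print(sanity_check(3, 4, 5))  # True
```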

Help

Jakub Kurzak ([email protected])

Contributors

jakurzak, afanfa, eugenehiew, mkv14
