
matrix-matrix-multiply's Introduction

Algorithms for matrix-matrix multiplication (dgemm)

The algorithms are taken from the books:

  1. David A. Patterson, John L. Hennessy "Computer Organization and Design. The Hardware/Software Interface. RISC-V Edition",
  2. David A. Patterson, John L. Hennessy "Computer Organization and Design. The Hardware/Software Interface. MIPS Edition"

The following algorithms are implemented:

  1. Basic, unoptimized, see src/basic.cpp
  2. Using AVX with 256-bit intrinsics, see src/avx256.cpp
  3. Using AVX with 512-bit intrinsics, see src/avx512.cpp
  4. Using AVX with 512-bit intrinsics and loop unrolling, see src/avx512_subword_parallel.cpp
  5. Basic, unoptimized, with blocking, see src/basic_blocked.cpp
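
As a rough illustration of items 1 and 5, here is a minimal sketch of an unoptimized dgemm and a blocked variant, written in the style of the Patterson and Hennessy examples. The function names, the `block` parameter, and the column-major layout are assumptions made for this sketch; the actual code lives in src/basic.cpp and src/basic_blocked.cpp and may differ.

```cpp
#include <cstddef>

// Hypothetical sketch, not the repository's code.
// Computes C += A * B for n x n matrices stored in column-major order.
void dgemm_basic_sketch(std::size_t n, const double* A, const double* B, double* C)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double cij = C[i + j * n];
            for (std::size_t k = 0; k < n; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

// Same computation, processed in block x block tiles so that each tile
// can stay in cache while it is reused. Assumes n is a multiple of block.
void dgemm_blocked_sketch(std::size_t n, std::size_t block,
                          const double* A, const double* B, double* C)
{
    for (std::size_t sj = 0; sj < n; sj += block)
        for (std::size_t si = 0; si < n; si += block)
            for (std::size_t sk = 0; sk < n; sk += block)
                // Multiply one tile of A by one tile of B into a tile of C.
                for (std::size_t i = si; i < si + block; ++i)
                    for (std::size_t j = sj; j < sj + block; ++j) {
                        double cij = C[i + j * n];
                        for (std::size_t k = sk; k < sk + block; ++k)
                            cij += A[i + k * n] * B[k + j * n];
                        C[i + j * n] = cij;
                    }
}
```

Both functions accumulate into C, so the caller is expected to zero C first if a plain product is wanted.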

How to build?

To build the system, execute the following commands:

  1. git clone https://github.com/romz-pl/matrix-matrix-multiply
  2. cd matrix-matrix-multiply
  3. mkdir build
  4. cd build
  5. cmake ..
  6. make
  7. ./src/dgemm

The command ./src/dgemm executes the program.

Results

  1. For a Core i7 CPU, with matrix size equal to 128, I obtained the following results averaged over 1000 randomly generated matrices:
         dgemm_basic:  elapsed-time=      1661
 dgemm_basic_blocked:  elapsed-time=      1260     speed-up=   1.31825
        dgemm_avx256:  elapsed-time=       443     speed-up=   3.74944
        dgemm_avx512:  elapsed-time=       233     speed-up=   7.12876
      dgemm_unrolled:  elapsed-time=       106     speed-up=   15.6698
       dgemm_blocked:  elapsed-time=       100     speed-up=     16.61
  2. For a Core i7 CPU, with matrix size equal to 640, I obtained the following results averaged over 10 randomly generated matrices:
         dgemm_basic:  elapsed-time=    241958
 dgemm_basic_blocked:  elapsed-time=    162224     speed-up=   1.49151
        dgemm_avx256:  elapsed-time=     66246     speed-up=   3.65242
        dgemm_avx512:  elapsed-time=     35604     speed-up=   6.79581
      dgemm_unrolled:  elapsed-time=     16634     speed-up=    14.546
       dgemm_blocked:  elapsed-time=     12981     speed-up=   18.6394
  3. For a Core i7 CPU, with matrix size equal to 1280, I obtained the following results averaged over 5 randomly generated matrices:
         dgemm_basic:  elapsed-time=   4592295
 dgemm_basic_blocked:  elapsed-time=   1626700     speed-up=   2.82307
        dgemm_avx256:  elapsed-time=   1227037     speed-up=   3.74259
        dgemm_avx512:  elapsed-time=    637091     speed-up=   7.20822
      dgemm_unrolled:  elapsed-time=    558080     speed-up=   8.22874
       dgemm_blocked:  elapsed-time=    181634     speed-up=   25.2832
  4. For a Core i7 CPU, with matrix size equal to 2560, I obtained the following results for a single randomly generated matrix:
         dgemm_basic:  elapsed-time=  62731813
 dgemm_basic_blocked:  elapsed-time=  16474759     speed-up=   3.80775
        dgemm_avx256:  elapsed-time=  17050012     speed-up=   3.67928
        dgemm_avx512:  elapsed-time=   9012450     speed-up=   6.96057
      dgemm_unrolled:  elapsed-time=   5958033     speed-up=   10.5289
       dgemm_blocked:  elapsed-time=   1837494     speed-up=   34.1399
  5. For a Core i7 CPU, with matrix size equal to 5120, I obtained the following results for a single randomly generated matrix:
         dgemm_basic:  elapsed-time=1154120417
 dgemm_basic_blocked:  elapsed-time= 137582063     speed-up=    8.3886
        dgemm_avx256:  elapsed-time= 297156247     speed-up=   3.88388
        dgemm_avx512:  elapsed-time= 144941094     speed-up=   7.96269
      dgemm_unrolled:  elapsed-time=  97428303     speed-up=   11.8458
       dgemm_blocked:  elapsed-time=  18558107     speed-up=   62.1896


matrix-matrix-multiply's Issues

I use a Ryzen 4800H, which has no AVX-512 support, and tried changing `n` in `check.cpp`, but got a strange failure.

After commenting out all functions related to AVX-512, everything works.

But an error is thrown after the following modification (initially I used 4*4, then tried 8*4, etc.):

diff --git a/src/check.cpp b/src/check.cpp
index 5172492..93e5876 100644
--- a/src/check.cpp
+++ b/src/check.cpp
@@ -43,16 +43,19 @@ static double calc_abs_sum(const uint32_t n, const double* c, const double* q)
 void check()
 {
     constexpr uint32_t trial_no = 11;
-    constexpr uint32_t n = 32 * 4;
-    constexpr double eps = 1e-6;
+    /* why 32/16/8 *4 work , but 4*4 fails*/
+    // constexpr uint32_t n = 8 * 4;
+    constexpr uint32_t n = 4 * 4;
 
     std::vector< Dgemm > all_dgemm =
     {
         {dgemm_basic_blocked, "dgemm_basic_blocked"},
         {dgemm_avx256, "dgemm_avx256"},
-        {dgemm_avx512, "dgemm_avx512"},
-        {dgemm_unrolled, "dgemm_unrolled"},
-        {dgemm_blocked, "dgemm_blocked"},
+        //{dgemm_avx512, "dgemm_avx512"},
+        //{dgemm_unrolled, "dgemm_unrolled"},
+        //{dgemm_blocked, "dgemm_blocked"},
     };

When running with n = 4 * 4, this error is thrown:

$ ~/matrix-matrix-multiply/build/src/dgemm
double free or corruption (!prev)
zsh: IOT instruction (core dumped)  ~/matrix-matrix-multiply/build/src/dgemm
$ coredumpctl debug
pwndbg> bt
#0  0x00007f9b22ea08ec in ?? () from /usr/lib/libc.so.6
#1  0x00007f9b22e51ea8 in raise () from /usr/lib/libc.so.6
#2  0x00007f9b22e3b53d in abort () from /usr/lib/libc.so.6
#3  0x00007f9b22e3c29e in ?? () from /usr/lib/libc.so.6
#4  0x00007f9b22eaa657 in ?? () from /usr/lib/libc.so.6
#5  0x00007f9b22eac7bc in ?? () from /usr/lib/libc.so.6
#6  0x00007f9b22eaee63 in free () from /usr/lib/libc.so.6
#7  0x000056328f5f9383 in check () at /home/czg/matrix-matrix-multiply/src/check.cpp:87
#8  0x000056328f5f8599 in main () at /home/czg/matrix-matrix-multiply/src/main.cpp:6
#9  0x00007f9b22e3c790 in ?? () from /usr/lib/libc.so.6
#10 0x00007f9b22e3c84a in __libc_start_main () from /usr/lib/libc.so.6
#11 0x000056328f5f85d5 in _start ()
pwndbg> up 7
   f 0   0x7f9b22ea08ec
   f 1   0x7f9b22e51ea8 raise+24
   f 2   0x7f9b22e3b53d abort+215
   f 3   0x7f9b22e3c29e
   f 4   0x7f9b22eaa657
   f 5   0x7f9b22eac7bc
   f 6   0x7f9b22eaee63 free+115
 ► f 7   0x56328f5f9383 check()+963
   f 8   0x56328f5f8599 main+9
   f 9   0x7f9b22e3c790
   f 10   0x7f9b22e3c84a __libc_start_main+138
   f 11   0x56328f5f85d5 _start+37
pwndbg> disassemble
...
   0x000056328f5f9378 <check()+952>:    48 89 ef                mov    rdi,rbp
   0x000056328f5f937b <check()+955>:    4c 89 f3                mov    rbx,r14
   0x000056328f5f937e <check()+958>:    e8 cd 05 00 00          call   0x56328f5f9950 <Mtx::~Mtx()>
=> 0x000056328f5f9383 <check()+963>:    4c 89 ff                mov    rdi,r15

I am not very familiar with C++, so I do not know why passing rbp to rdi as the parameter to free would throw an error.
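
A note on the `mov rdi,rbp` line: on x86-64 Linux the first argument is passed in `rdi`, and for a member function that argument is `this`. So `rbp` here is the address of an `Mtx` object handed to `Mtx::~Mtx()`, not a pointer handed directly to `free`. A destructor of this kind typically reads the buffer pointer stored in the object and frees that. Here is a minimal sketch of a class with that shape, assuming a single owned buffer from `malloc`; the name `MtxSketch` and its members are hypothetical, not the repository's `Mtx`.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the repository's Mtx class (assumed layout).
struct MtxSketch
{
    double* data;   // only member, so it sits at offset 0 of the object

    explicit MtxSketch(std::size_t n)
        : data(static_cast<double*>(std::malloc(n * n * sizeof(double)))) {}

    // The destructor receives `this`, reads the buffer pointer stored
    // in the object, and hands that pointer to free().
    ~MtxSketch() { std::free(data); }

    // Non-copyable, so two owners never free the same buffer.
    MtxSketch(const MtxSketch&) = delete;
    MtxSketch& operator=(const MtxSketch&) = delete;
};
```

If any code writes past the end of `data`, the allocator's bookkeeping next to the buffer can be damaged, and the failure may only surface later inside `free()`, even though the pointer itself is a valid heap address.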

After debugging directly with gdb, I saw that the program first calls free twice through operator delete, with strings like "dgemm_basic_blocked" as the parameter. I then checked the parameter of the free call that throws the error and found that the pointer is located in the heap, which should not cause an error:

$ gdb ~/matrix-matrix-multiply/build/src/dgemm -ex 'br free' -ex 'r' -ex 'c 2'
pwndbg> bt
#0  0x00007ffff7a65df0 in free () from /usr/lib/libc.so.6
#1  0x0000555555557383 in check () at /home/czg/matrix-matrix-multiply/src/check.cpp:87
#2  0x0000555555556599 in main () at /home/czg/matrix-matrix-multiply/src/main.cpp:6
pwndbg> up
pwndbg> disassemble
...
   0x0000555555557378 <check()+952>:    48 89 ef                mov    rdi,rbp
   0x000055555555737b <check()+955>:    4c 89 f3                mov    rbx,r14
   0x000055555555737e <check()+958>:    e8 cd 05 00 00          call   0x555555557950 <Mtx::~Mtx()>
=> 0x0000555555557383 <check()+963>:    4c 89 ff                mov    rdi,r15
pwndbg> disassemble 0x555555557950
Dump of assembler code for function Mtx::~Mtx():
   0x0000555555557950 <+0>:     mov    rdi,QWORD PTR [rdi]
   0x0000555555557953 <+3>:     jmp    0x555555556100 <free@plt>
pwndbg> x/g $rbp
0x7fffffffdea0: 0x000055555556e940
pwndbg> vmmap
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
             Start                End Perm     Size Offset File
    0x555555554000     0x555555556000 r--p     2000      0 /home/czg/matrix-matrix-multiply/build/src/dgemm
    0x555555556000     0x555555558000 r-xp     2000   2000 /home/czg/matrix-matrix-multiply/build/src/dgemm
    0x555555558000     0x555555559000 r--p     1000   4000 /home/czg/matrix-matrix-multiply/build/src/dgemm
    0x555555559000     0x55555555a000 r--p     1000   4000 /home/czg/matrix-matrix-multiply/build/src/dgemm
    0x55555555a000     0x55555555b000 rw-p     1000   5000 /home/czg/matrix-matrix-multiply/build/src/dgemm
    0x55555555b000     0x55555557c000 rw-p    21000      0 [heap]

As shown above, [$rbp]=0x000055555556e940, which lies between 0x55555555b000 and 0x55555557c000 (i.e. it is in the heap).
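
One possible explanation, offered as a guess rather than a diagnosis: a valid heap pointer can still trigger `double free or corruption` if an earlier out-of-bounds write damaged the allocator's metadata. The blocked dgemm in the Patterson and Hennessy book uses a fixed block size of 32 and assumes n is a multiple of it. If src/basic_blocked.cpp does the same (an assumption, since its source is not shown here), then n = 16 would make the kernel touch elements beyond the n x n buffer. The sketch below only computes the largest index such a kernel would touch; `BLOCKSIZE`, `max_index_touched`, and `fits_in_matrix` are hypothetical names.

```cpp
#include <cstddef>

// Assumption: block size of a book-style kernel, not read from the repository.
constexpr std::size_t BLOCKSIZE = 32;

// Largest column-major index that a kernel iterating over full
// block x block tiles would touch for an n x n matrix.
std::size_t max_index_touched(std::size_t n, std::size_t block)
{
    std::size_t max_idx = 0;
    for (std::size_t sj = 0; sj < n; sj += block)
        for (std::size_t si = 0; si < n; si += block) {
            // Last element of the tile that starts at (si, sj).
            const std::size_t idx = (si + block - 1) + (sj + block - 1) * n;
            if (idx > max_idx) max_idx = idx;
        }
    return max_idx;
}

// True when every touched index lies inside the n * n buffer.
bool fits_in_matrix(std::size_t n, std::size_t block)
{
    return max_index_touched(n, block) < n * n;
}
```

Under this assumption, `fits_in_matrix(16, BLOCKSIZE)` is false while n = 32, 64, and 128 pass, which is consistent with the sizes reported above. A guard such as `n % BLOCKSIZE == 0` before calling the kernel would turn the silent corruption into an explicit error.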
