DRT

Simulators for dynamic reflexive tiling and several accelerators.

Setup:

In src/makefile set the variable MKLROOT to your MKL library location
src/makefile contains the various targets. The relevant ones are:
- SpMSpM_ExTensor: executable for the ExTensor simulation
- SpMSpM_TACTile_twoInp: executable for the TACTile simulation
  - has options for static tiling and DRT tiling
- SpMSpM_OuterSpace: executable for the idealized OuterSPACE untiled baseline
- SpMSpM_OuterSpace_drt: executable for the idealized OuterSPACE baseline + DRT (with SUC options)
- SpMSpM_MatRaptor: executable for the idealized MatRaptor untiled baseline
- SpMSpM_MatRaptor_drt: executable for the idealized MatRaptor baseline + DRT (with SUC options)

To run the TACTile simulation (this assumes A is an IxJ matrix, B is a JxK matrix, and C is an IxK matrix):

src/SpMSpM_TACTile_twoInp -inp1=<path to first matrix> -inp2=<path to second matrix> 
 --tiledim=<micro tile dimension> 
 --staticdist=<rr|nnz|oracle>  # static distributor for the PEs 
 --intersect=<skip|parbi>  #skip --> ExTensor skip-based intersection, parbi --> TACTile cfi/parallel intersection unit \
 --tiling=<static|dynamic> 
 --aperc=0.05  # LLB percentage assigned for tensor a
 --bperc=0.35  # LLB percentage assigned for tensor b 
 --operc=0.6   # LLB percentage assigned for tensor o 
 --llbsize=30  # LLB total size (in MB)
 --constreuse=128 # # of micro tiles in a macro tile along the K dimension 
 --topbw=68.25   # Top DOT (DRAM) bandwidth in GB/s 
 --middlebw=2048 # Middle DOT NoC Bandwidth in GB/s 
 --itop= # num. of microtiles in a macrotile along the I dimension for SUC tiling 
 --jtop= # num. of microtiles in a macrotile along the J dimension for SUC tiling 
 --ktop= # num. of microtiles in a macrotile along the K dimension for SUC tiling

An example command run:
- src/SpMSpM_TACTile_twoInp --inp1=./data/test_amazon0302.mtx --inp2=./data/test_amazon0302.mtx --tiledim=32 --staticdist=rr --intersect=parbi --tiling=dynamic | tee out.txt
- 30MB is the default LLB size
- llb default partitioning: A = 5%, B=50%, C=45%

To run the ExTensor simulation: - src/SpMSpM_ExTensor --inp1=./data/test_amazon0302.mtx --inp2=./data/test_amazon0302.mtx --tiledim=32 --staticdist=rr --tiling=static --itop=32 --jtop=32 --ktop-32 | tee out.txt
If you are using slurm, you can use the python scripts in the run_... directories to launch similar runs as those found in the paper. Please replace the DATADIR, OUTDIR, and EXECDIR with your local file paths (to your dataset directory of mtx files, the output directory where you want results, and the directory of your executable, respectively). You can also use the slurmtest.py files to see example launch commands and configurations.

The simulator spits out text similar to the following (we only include relevant output below):

runtime: 0.002372, cycles: 2371708, busy_cycles: 110825158
Top DOT NoC:
 total_top: 0.159724 GBs, a_top: 53040696, b_top: 30309984, o_r_top: 0, o_w_top: 88151644
Middle DOT NoC:
 total_mid: 0.254072 GBs, a_mid: 53040696, b_mid: 134560180, o_r_mid: 0, o_w_mid: 85207216
...
A_csf: 30309984, B_csf: 30309984, O_csf: 88151644, O_COO: 62339776, bandwidth: 68.250000 GB/s

runtime: 0.002372, cycles: 2371708, busy_cycles: 110825158
- runtime in seconds of the accelerator on that workload
- cycles: # of cycles it would take the accelerator to execute
- busy cycles: # of total cycles across all PEs (that is, the cycles of each PE added up)
total_top: total number of GB transferred to and from the DRAM (in GB)
- a_top: total number of bytes transferred for the $A$ matrix (in DRAM)
- b_top: total number of bytes transferred for the $B$ matrix (in DRAM)
- o_r_top: total number of output tensor bytes read from the DRAM
- o_w_top: total number of output tensor bytes written to the DRAM
total_mid: total number of GB transferred to and from the LLB (in GB)
- a_mid: total number of bytes transferred for the $A$ matrix (in LLB)
- b_mid: total number of bytes transferred for the $B$ matrix (in LLB)
- o_r_mid: total number of bytes read in the LLB for the output tensor
- o_w_mid: total number of bytes written in the LLB for the output tensor
A_csf, B_csf, O_csf, O_COO: the size, in bytes, of each of the tensors for that particular data format.
bandwidth: the assumed DRAM bandwidth in GB/s

src: This contains the source files for the simulator.

scheduler_7.cpp: the main code for ExTensor modeling
scheduler_8.cpp: the main code for TACTile modeling
scheduler_8_diagonal_drt.cpp: source code for a diagonal variant of DRT in TACTile
scheduler_9.cpp: idealized model of OuterSPACE (no tiling)
scheduler_9_drt.cpp: idealized model of OuterSPACE + DRT
llb_mem.cpp: tracking memory transfers to and form the LLB

fpsg-uiuc / drt Goto Github PK

drt's Introduction

DRT

drt's People

Contributors

Stargazers

Watchers

Forkers

drt's Issues

Am I the first one who give the star to this excellent proj?

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent