Giter Club home page Giter Club logo

drt's Introduction

DRT

Simulators for dynamic reflexive tiling and several accelerators.

Setup:

  1. In src/makefile set the variable MKLROOT to your MKL library location

  2. src/makefile contains the various targets. The relevant ones are:

    • SpMSpM_ExTensor: executable for the ExTensor simulation
    • SpMSpM_TACTile_twoInp: executable for the TACTile simulation
      • has options for static tiling and DRT tiling
    • SpMSpM_OuterSpace: executable for the idealized OuterSPACE untiled baseline
    • SpMSpM_OuterSpace_drt: executable for the idealized OuterSPACE baseline + DRT (with SUC options)
    • SpMSpM_MatRaptor: executable for the idealized MatRaptor untiled baseline
    • SpMSpM_MatRaptor_drt: executable for the idealized MatRaptor baseline + DRT (with SUC options)
  3. To run the TACTile simulation (this assumes A is an IxJ matrix, B is a JxK matrix, and C is an IxK matrix):

    src/SpMSpM_TACTile_twoInp -inp1=<path to first matrix> -inp2=<path to second matrix> 
     --tiledim=<micro tile dimension> 
     --staticdist=<rr|nnz|oracle>  # static distributor for the PEs 
     --intersect=<skip|parbi>  #skip --> ExTensor skip-based intersection, parbi --> TACTile cfi/parallel intersection unit \
     --tiling=<static|dynamic> 
     --aperc=0.05  # LLB percentage assigned for tensor a
     --bperc=0.35  # LLB percentage assigned for tensor b 
     --operc=0.6   # LLB percentage assigned for tensor o 
     --llbsize=30  # LLB total size (in MB)
     --constreuse=128 # # of micro tiles in a macro tile along the K dimension 
     --topbw=68.25   # Top DOT (DRAM) bandwidth in GB/s 
     --middlebw=2048 # Middle DOT NoC Bandwidth in GB/s 
     --itop= # num. of microtiles in a macrotile along the I dimension for SUC tiling 
     --jtop= # num. of microtiles in a macrotile along the J dimension for SUC tiling 
     --ktop= # num. of microtiles in a macrotile along the K dimension for SUC tiling
    
    • An example command run:
      • src/SpMSpM_TACTile_twoInp --inp1=./data/test_amazon0302.mtx --inp2=./data/test_amazon0302.mtx --tiledim=32 --staticdist=rr --intersect=parbi --tiling=dynamic | tee out.txt
      • 30MB is the default LLB size
      • llb default partitioning: A = 5%, B=50%, C=45%
  4. To run the ExTensor simulation: - src/SpMSpM_ExTensor --inp1=./data/test_amazon0302.mtx --inp2=./data/test_amazon0302.mtx --tiledim=32 --staticdist=rr --tiling=static --itop=32 --jtop=32 --ktop-32 | tee out.txt

  5. If you are using slurm, you can use the python scripts in the run_... directories to launch similar runs as those found in the paper. Please replace the DATADIR, OUTDIR, and EXECDIR with your local file paths (to your dataset directory of mtx files, the output directory where you want results, and the directory of your executable, respectively). You can also use the slurmtest.py files to see example launch commands and configurations.


The simulator spits out text similar to the following (we only include relevant output below):

runtime: 0.002372, cycles: 2371708, busy_cycles: 110825158
Top DOT NoC:
 total_top: 0.159724 GBs, a_top: 53040696, b_top: 30309984, o_r_top: 0, o_w_top: 88151644
Middle DOT NoC:
 total_mid: 0.254072 GBs, a_mid: 53040696, b_mid: 134560180, o_r_mid: 0, o_w_mid: 85207216
...
A_csf: 30309984, B_csf: 30309984, O_csf: 88151644, O_COO: 62339776, bandwidth: 68.250000 GB/s
  • runtime: 0.002372, cycles: 2371708, busy_cycles: 110825158
    • runtime in seconds of the accelerator on that workload
    • cycles: # of cycles it would take the accelerator to execute
    • busy cycles: # of total cycles across all PEs (that is, the cycles of each PE added up)
  • total_top: total number of GB transferred to and from the DRAM (in GB)
    • a_top: total number of bytes transferred for the $A$ matrix (in DRAM)
    • b_top: total number of bytes transferred for the $B$ matrix (in DRAM)
    • o_r_top: total number of output tensor bytes read from the DRAM
    • o_w_top: total number of output tensor bytes written to the DRAM
  • total_mid: total number of GB transferred to and from the LLB (in GB)
    • a_mid: total number of bytes transferred for the $A$ matrix (in LLB)
    • b_mid: total number of bytes transferred for the $B$ matrix (in LLB)
    • o_r_mid: total number of bytes read in the LLB for the output tensor
    • o_w_mid: total number of bytes written in the LLB for the output tensor
  • A_csf, B_csf, O_csf, O_COO: the size, in bytes, of each of the tensors for that particular data format.
  • bandwidth: the assumed DRAM bandwidth in GB/s

src: This contains the source files for the simulator.

  • scheduler_7.cpp: the main code for ExTensor modeling
  • scheduler_8.cpp: the main code for TACTile modeling
  • scheduler_8_diagonal_drt.cpp: source code for a diagonal variant of DRT in TACTile
  • scheduler_9.cpp: idealized model of OuterSPACE (no tiling)
  • scheduler_9_drt.cpp: idealized model of OuterSPACE + DRT
  • llb_mem.cpp: tracking memory transfers to and form the LLB

drt's People

Contributors

malloc26 avatar cwfletcher avatar

Stargazers

Xiuhua Yang avatar

Watchers

 avatar  avatar

Forkers

ritvik1sharma

drt's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.