Comments (11)
This table outlines the features needed for the benchmark applications:
Benchmarks | Nested Loops | Reductions | Tiling | Blocking Operations | Index Addition | Filter | while loops |
if conditions |
Bitwise operations | Indirection | unroll |
Banking | Pipelining | Queues | LUTs | Resource Bind | Recursion |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gemm/ncubed |
✅ | ✅ | - | - | - | ✅ | ✅ | ✅ | ✅ | ||||||||
spatial/fir |
- | - | |||||||||||||||
gemm/blocked |
✅ | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | ✅ | ||||||||
stencil/stencil2d |
✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | |||||||||
stencil/stencil3d |
✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||
fft/strided |
✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ||||||||||
fft/transpose |
✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||
spmv/ellpack |
✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||
spmv/crs |
✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||
kmp/kmp |
✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |||||||||
sort/merge |
✅ | ||||||||||||||||
sort/radix |
✅ | ||||||||||||||||
bfs/bulk |
✅ | ||||||||||||||||
bfs/queue |
✅ | ||||||||||||||||
spatial/kmeans |
|||||||||||||||||
spatial/gda |
|||||||||||||||||
spatial/bs |
|||||||||||||||||
spatial/pagerank |
|||||||||||||||||
spatial/sw |
|||||||||||||||||
spatial/tq6 |
|||||||||||||||||
md/knn |
|||||||||||||||||
md/grid |
|||||||||||||||||
backprop/backprop |
|||||||||||||||||
nw/nw |
|||||||||||||||||
aes/aes |
|||||||||||||||||
viterbi/viterbi |
from dahlia.
Use #23
from dahlia.
gemm ncubed
- Remove library inclusion
#include "apcint.h"
It is needed by the host file too. So the header file seems a better place for it. - Local float variables? The fix Rachit mentioned works for this.
- reduction operation on
temp
. Reductions on local variables don't throw an error right now. It should make the distinction once reduction operators are implemented. - should get an error for multi-write due to unrolling multiwrite example as discussed during the 1/31 meeting.
from dahlia.
Hi! I don't have a strong feeling about whether MachSuite goes in #23 or its own issue (here). But either way, let's list all the benchmarks. This can help you prioritize the ones to work on next and keep a global view of how far along you are.
For the list of "gaps" indicating what we need before we can fully implement the benchmark, let's try to be as specific as possible and, when it's available, link to the existing issue that tracks that problem. For example, does "Local float variables?" mean that it's impossible to declare local variables with type float
? If so, that probably deserves its own issue (or at least a full explanation here).
from dahlia.
Yeah, I'd prefer specific issues that block benchmarks from being written.
from dahlia.
Hi, so Rachit prefers to have specific issues listed in #23. I'll repurpose this issue to keep track of MachSuite benchmark implementation.
from dahlia.
Language Features Required for MachSuite Apps
This post lists language features required to write each MachSuite app in Fuse.
gemm/ncubed
("Naive,O(n^3)
algorithm for dense matrix multiplication."):- nested loops
- reductions
gemm/blocked
("A blocked version of matrix multiplication, with better locality."):- the features needed for
gemm/ncubed
- tiling (requires index addition to access arrays; therefore, views are needed)
- the features needed for
fft/strided
("Recursive formulation of the Fast Fourier Transform."):- the features needed for
gemm/ncubed
- exponential (T: would you mind clarifying what this is @sa2257 ?)
(S: The app uses a complex for loop to avoid exponentials. Uncertain whether that's the best strategy for us. Maybe LUTs can be used to hardcode exponentials) - serialization
- bitwise operations
- the C code has more complex loops than Fuse supports; for example:
for(span=FFT_SIZE>>1; span; span>>=1, log++) { ... }
- the features needed for
stencil/stencil2d
("A two-dimensional stencil computation, using a 9-point square stencil."):- the features needed for
gemm/ncubed
- index addition (views needed?)
- mismatching matrix multiply (filtering)
- the features needed for
stencil/stencil3d
("A three-dimensional stencil computation, using a 7-point von Neumann stencil."):- the features needed for
stencil/stencil2d
- the features needed for
spmv/ellpack
("Sparse matrix-vector multiplication, using fixed-size neighbor lists."):- the features needed for
gemm/ncubed
- indirection (T: @sa2257, I'm not sure what this is referring to):
(S:si := val[j] * vec[cols[j]];
inspmv.fuse
line 22
, Do we support reasoning about this? Also, I haven't yet reasoned out whether we can find parallelism in this.):
- the features needed for
spmv/crs
("Sparse matrix-vector multiplication, using variable-length neighbor lists.")- the features needed for
spmv/ellpack
- index addition
- the features needed for
kmp
("The Knuth-Morris-Pratt string matching algorithm."):stencil/stencil2d
- while loops
- if conditions
fft/transpose
("A two-level FFT optimized for a small, fixed-size butterfly."):- the features needed for
fft/strided
- LUTs
- the features needed for
sort/merge
("The mergesort algorithm, on an integer array."):- the features needed for
stencil/stencil2d
- if methods
- a method of serialization
- recursion
- the features needed for
sort/radix
("Sorts an integer array by comparing 4-bits blocks at a time."):- the features needed for
sort/merge
- bitwise operation support
- the features needed for
bfs/bulk
("Data-oriented version of breadth-first search."):sort/merge
- some kind of "and" type like a
struct
; need a good way to represent nodes of a graph
bfs/queue
("The “expanding-horizon” version of breadth-first search."):- the features needed for
bfs/bulk
- queues
- the features needed for
Apps left:
- md
- backprop
- nw
- aes
- viterbi
from dahlia.
Optimizations Needed
gemm/ncubed - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
gemm/blocked - resource allocation (multiplier), memory allocation, array partitioning, loop unroll with tiling, pipeline
stencil/stencil2d - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
stencil/stencil3d - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
spmv/ellpack - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
spmv/crs - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
kmp - memory allocation, array partitioning, loop unroll, pipeline
bfs/bulk - memory allocation, array partitioning, loop unroll, pipeline
bfs/queue - memory allocation, array partitioning, loop unroll, pipeline
fft/strided - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
fft/transpose - allocation (multiplier and adder), memory allocation, array partitioning, loop unroll, pipeline
sort/merge - memory allocation, array partitioning, loop unroll, pipeline
sort/radix - memory allocation, array partitioning, loop unroll, pipeline
from dahlia.
Cool! Maybe a big table would be a good way to represent this information? The rows could be benchmarks, and the columns could be features. Then the cell could contain a checkmark if the benchmark needs that feature. (There could be two groups of columns: expressiveness/features and optimizations.) This way, we could easily see what will affect the most benchmarks.
Some things where I could use expanded definitions:
- "Memory allocation": Do the benchmarks call
malloc
? If so, do they really need to do this, or is it just a convenience? - "Resource allocation (multiplier)": What does this look like? What's the effect on the generated hardware?
- "Serialization": Is this like data structure serialization, or like preventing parallelism?
- "LUTs": What do these look like in the code? (Presumably you don't mean FPGA LUTs?)
- "mismatching matmul (filtering) views": This could use a little more explanation—is this within scope for our current thoughts on views, or beyond them?
- "queues": Are these memories, accessed in a special way? Do they also do synchronization?
from dahlia.
Okay. Sounds good.
- Memory and resource allocation means just binding a resource to a particular array/ function. Examples in the user guide are very ambiguous. Still trying to tease out how to use them.
- Serialization, I simply mean some way to ensure a certain block of logic happen strictly after another block. Blocking statements.
- Some data structure that will be already be populated at the time of execution. For example, to get sine and cosine, it is easier to store them than implement a function to generate them.
- I'm not entirely sure if it's within the scope, for instance to run a 3x1 filter on a 32 element array, inner
unroll
3 should allow 32 element array access if it's banked by 3 (mismatch with 32) or larger. (we can doctor the array size for simplicity) - I need to understand how queues get implemented in HLS. HLS allows FIFOs. Maybe it'll be a simple use of that. But I'm not clear how it gets implemented in MachSuite.
from dahlia.
I assume a combination of https://github.com/cucapra/fuse-benchmarks/issues/74 and direct issues are being used to track this now. @sa2257 reopen if you still need this.
from dahlia.
Related Issues (20)
- Add `@bound` attribute for loops with statically known bounds. HOT 1
- [Calyx] Don't generate conflicting assignments when lowering unrolled loops
- `split` views not lowered correctly
- [Calyx] Generate non-combinational reads for memories HOT 2
- Support for Constants and Dynamic Loops HOT 4
- Typechecker isSubtype does not handle alias type
- Typechecker Issue HOT 3
- Split views bypass typechecker and allow incorrect loop unrolling
- Parse error in let expression for bitwidth 32 and above
- Test has incorrect lowering HOT 3
- Parse issue for hexadecimal literals in cast expressions
- Issue in C++ backend when passing local array to function
- Question about using functions in unrolled loop HOT 1
- Scala 3 Migration HOT 1
- Possible regression: Incorrect Calyx IR generated for seq_mem_d1 HOT 6
- Idea to support for Records in the Calyx Backend HOT 4
- Dahlia's documentation publishing flow
- Scala 3 Migration Follow up: Context Parameters HOT 2
- Calyx back-end issues HOT 2
- About RTL interop with Dahlia via Calyx HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dahlia.