

CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms

An MLIR Based Compiler Framework for Emerging Architectures
Paper Link»

About The Project

Emerging compute-near-memory (CNM) and compute-in-memory (CIM) architectures have gained considerable attention in recent years, with some now commercially available. However, their programmability remains a significant challenge. These devices typically require very low-level code, directly using device-specific APIs, which restricts their usage to device experts. With Cinnamon, we are taking a step closer to bridging the substantial abstraction gap in application representation between what these architectures expect and what users typically write. The framework is based on MLIR, providing domain-specific and device-specific hierarchical abstractions. This repository includes the sources for these abstractions and the necessary transformations and conversion passes to progressively lower them. It emphasizes conversions to illustrate various intermediate representations (IRs) and transformations to demonstrate certain optimizations.

Getting Started

This is an example of how you can build the framework locally.

Prerequisites

CINM depends on a patched version of LLVM 18.1.6. Additionally, a number of software packages are required to build it, like CMake.

Download and Build

The repository contains a script, build.sh, that installs all required dependencies and builds the sources.

  • Clone the repo
    git clone https://github.com/tud-ccc/Cinnamon.git
  • Build the sources
    cd Cinnamon
    chmod +x build.sh
    ./build.sh

Usage

All benchmarks at the cinm abstraction level are in this repository under cinnamon/benchmarks/. The compile-benches.sh script compiles all of the benchmarks using the Cinnamon flow. The generated code and the intermediate IRs for each benchmark can be found under cinnamon/benchmarks/generated/.

chmod +x compile-benches.sh
./compile-benches.sh

Individual benchmarks can also be lowered manually, one conversion at a time. Each benchmark file has a comment at the top giving the command used to lower it to the upmem IR.

Roadmap

  • cinm, cnm and cim abstractions and their necessary conversions
  • The upmem abstraction, its conversions and connection to the target
  • The tiling transformation
  • PyTorch Front-end
  • The xbar abstraction
    • Associated conversions and transformations
    • Establishing the backend connection

See the open issues for a full list of proposed features (and known issues).

Contributing

If you have a suggestion, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Any other contribution is also greatly appreciated.

License

Distributed under the BSD 2-Clause License. See LICENSE.txt for more information.


Issues

Hbmpim dialect

Structure of the system:
Although the simulator is configurable, we will stick to the parameters from Samsung's HBM-PIM paper. Accordingly, a single call to the device can configure at most 16 banks, 64 channels, 1 rank, and 8 GRF registers.

cnm.workgroup { cnm.physical_dims = ["banks", "channels", "ranks", "grf"] } : !cnm.workgroup<16x64x1x8>

At the moment, the simulator implements different APIs for writing data to the device depending on the operation. For example, the preloadNoReplacement and preloadGemv function calls are used for element-wise operations and GEMV kernels, respectively. Because of this, cnm.scatter is not lowered directly in the conversion pass; instead, the respective function is created depending on the operation in the launch block.
For all of this to work, the launch block must be limited to a single operation.
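The per-call limits above suggest a simple legality check on workgroup shapes. A minimal sketch in Python, assuming the hypothetical helper name workgroup_fits (not part of the Cinnamon sources):

```python
# Hypothetical helper: validate a cnm.workgroup shape against the HBM-PIM
# per-call limits quoted above (16 banks, 64 channels, 1 rank, 8 GRF registers).
HBMPIM_MAX_DIMS = {"banks": 16, "channels": 64, "ranks": 1, "grf": 8}

def workgroup_fits(physical_dims, shape):
    """Return True if every dimension of the workgroup shape is within
    the per-call limit of the corresponding physical dimension."""
    if len(physical_dims) != len(shape):
        return False
    return all(size <= HBMPIM_MAX_DIMS[dim]
               for dim, size in zip(physical_dims, shape))

# The workgroup from the MLIR snippet above: !cnm.workgroup<16x64x1x8>
print(workgroup_fits(["banks", "channels", "ranks", "grf"], [16, 64, 1, 8]))  # True
print(workgroup_fits(["banks", "channels", "ranks", "grf"], [16, 64, 2, 8]))  # False: 2 ranks exceed the limit
```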

Flatten repo layout

We wrote that in the artifact description but it's annoying that the actual sources are in a cinnamon subdir. We should move the contents of that directory to the root.

Codegen should be able to generate one DPU binary per upmem.launch

Currently the UPMEM->DPU CPP pass generates one C file with a single main function.
However, we need to generate one binary per upmem.launch. The thread count is launch-specific and must be passed as a compiler option when compiling each DPU binary.

Draft design:

  • Outline all DPU kernels into single module
  • Have upmem->DPU CPP generate one function for each upmem.launch
  • Have a main that conditionally compiles exactly one of those functions
  • The conditional compilation variables along with the thread count are written in a header comment on line 1.
  • This comment is parsed by a script that then invokes the DPU compiler and generates as many binaries as there are upmem.launch blocks.
  • The binary path for each launch must be passed as argument to upmemrt_dpu_alloc.
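The header-comment step of the design above could look like the following sketch. The comment format ("// UPMEM-KERNEL: <name> NR_TASKLETS=<n>") and the per-kernel macro convention are assumptions for illustration; the real codegen may emit something different:

```python
import re
import subprocess

# Assumed header format emitted on line 1 of the generated C file, e.g.:
#   // UPMEM-KERNEL: kernel0 NR_TASKLETS=16
HEADER_RE = re.compile(r"//\s*UPMEM-KERNEL:\s*(\w+)\s+NR_TASKLETS=(\d+)")

def parse_kernel_header(line):
    """Extract (kernel_name, thread_count) from the first line of a
    generated DPU C file, or return None if the line doesn't match."""
    m = HEADER_RE.match(line)
    return (m.group(1), int(m.group(2))) if m else None

def compile_kernel(c_file):
    """Invoke the UPMEM DPU compiler, selecting the kernel via a
    conditional-compilation macro and setting the tasklet count."""
    with open(c_file) as f:
        header = f.readline()
    kernel, tasklets = parse_kernel_header(header)
    subprocess.run(["dpu-upmem-dpurte-clang",
                    f"-DNR_TASKLETS={tasklets}", f"-D{kernel}",
                    "-o", f"{kernel}.bin", c_file], check=True)

print(parse_kernel_header("// UPMEM-KERNEL: va_kernel NR_TASKLETS=16"))  # ('va_kernel', 16)
```

The binary path produced here would then be the argument handed to upmemrt_dpu_alloc for the matching launch.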

Current TODO items

High priority

  • #4
  • Linkage issue with memrefCopy
  • Checking that VA runs (@h4midf)
  • Add histogram op (@h4midf)
    • In CINM
    • In CINM->CNM
  • #7
  • #6
  • Write a README
  • Remove artifact-specific stuff

Cost model (deadline 04.08)

Currently we have cinm.compute with attributes for workgroup shape and DPU memory size.
We assume this specification is correct, that is, the lowering pass cannot change them.
They should be obtained through the cost model.

  • Implement a simple Samsung dialect
  • Implement a pass that annotates Samsung and Upmem kernels with their time estimation
    • Implement the upmem cost estimator in C++

Lower priority

  • Add verifier for shape of scatter map in UPMEM
  • Fix the GPU lowering, which was probably broken by recent changes to CNM

Optimization

  • Hoist buffer alloc and free outside of loops
  • Malloc avoidance
    • Avoid tensor reshapes that do a copy (especially a problem for VA)
    • Unify buffers across loop iterations
  • Affine map simplification with dimension sizes

Make sure that input memref to `upmem.scatter` is legal, that is, all scattered elements must be contiguous in memory

This is partly enforced by the CNM scatter map shape: the scatter map mandates that the last dimensions of the input have the same shape as the buffer. However, CNM scatter has since been made bufferizable, so if the memref has a custom layout or strides, the elements may not be contiguous anyway.

For instance

%3 = scf.for %arg6 = %c0 to %c1024 step %c32 iter_args(%arg7 = %alloc_0) -> (memref<1x128xi32>) {
          %subview_2 = memref.subview %arg0[%arg2, %arg6] [1, 32] [1, 1] : memref<1x1024xi32> to memref<1x32xi32, strided<[1024, 1], offset: ?>>
          %4 = upmem.alloc_dpus : !upmem.hierarchy<1x128x1>
          upmem.scatter %subview_2[132, 32, #map] onto %4 : memref<1x32xi32, strided<[1024, 1], offset: ?>> onto !upmem.hierarchy<1x128x1>

is this legal?
-> yes, because the stride in the second dimension is 1.
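The reasoning above generalizes: the scattered elements of a strided view are contiguous iff, for every dimension of size greater than one, the stride equals the product of the sizes of all faster-varying dimensions. A minimal sketch (is_contiguous is a hypothetical helper, not part of the Cinnamon sources):

```python
# Check whether a strided memref view addresses a contiguous range of
# elements: walking dimensions from innermost to outermost, each stride
# must equal the number of elements covered by the faster-varying dims.
def is_contiguous(sizes, strides):
    expected = 1
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True

# The subview from the example: shape 1x32, strides [1024, 1].
# Contiguous, because the inner stride is 1 and the outer dim has size 1.
print(is_contiguous([1, 32], [1024, 1]))  # True
print(is_contiguous([2, 32], [1024, 1]))  # False: the two rows are 1024 elements apart
```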
