
triton-puzzles's Introduction

Triton Puzzles

w/ Tejas Ramesh and Keren Zhou based on Triton-Viz

Open In Colab

Programming for accelerators such as GPUs is critical for modern AI systems. This often means programming directly in proprietary low-level languages such as CUDA. Triton is an open-source alternative that lets you code at a higher level and compile to accelerators such as GPUs.

Coding in Triton is very similar to coding in NumPy and PyTorch in both syntax and semantics. However, as a lower-level language, it has many details you need to keep track of. In particular, one area that learners struggle with is memory loading and storing, which is critical for speed on these devices.

This set of puzzles is meant to teach you how to use Triton from first principles in an interactive fashion. You will start with trivial examples and build your way up to real algorithms like Flash Attention and quantized neural networks. These puzzles do not need to run on a GPU, since they use a Triton interpreter.
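To give a flavor of the pattern the puzzles drill, here is a NumPy sketch (illustrative only, not code from the repo; block and vector sizes are assumed) of the masked load/store idea behind Triton's `tl.load`/`tl.store`: a program works on a fixed-size block of offsets, and a mask guards the lanes that fall outside the tensor.

```python
import numpy as np

B0 = 8                                   # block size (assumed for illustration)
N0 = 5                                   # actual vector length
x = np.arange(N0, dtype=np.float32)

offs = np.arange(B0)                     # like tl.arange(0, B0)
mask = offs < N0                         # guard out-of-bounds lanes
safe = np.clip(offs, 0, N0 - 1)          # keep indices legal for the "load"
vals = np.where(mask, x[safe], 0.0)      # like tl.load(x_ptr + offs, mask)

z = np.zeros(N0, dtype=np.float32)
z[offs[mask]] = vals[mask] + 10.0        # like tl.store(z_ptr + offs, vals + 10, mask)
```

In the actual puzzles the same idea is written inside a `@triton.jit` kernel; forgetting the mask on either the load or the store is the classic mistake.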

Discord: https://discord.gg/cudamode #triton-puzzles


If you are into this kind of thing, this is the 7th in a series of these puzzles.

triton-puzzles's People

Contributors

jokeren, srush, xffxff, zhaoyuecheng


triton-puzzles's Issues

Index Mixup

I think you mixed up the indices in questions 3/4. Your code actually does

$$z_{j,i} = x_i + y_j\text{ for } i = 1\ldots B_0,\ j = 1\ldots B_1$$

instead of

$$z_{i, j} = x_i + y_j\text{ for } i = 1\ldots B_0,\ j = 1\ldots B_1$$

j and i are switched on z, I think.

You can see it when computing e.g.

add_vec_spec(torch.tensor([1,2,3]), torch.tensor([10,20,30]))

which returns:

tensor([[11, 12, 13],
        [21, 22, 23],
        [31, 32, 33]])
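That output has i varying across the columns. For contrast, a spec matching $z_{i,j} = x_i + y_j$ would put i on the rows. A minimal NumPy sketch (my own stand-in mirroring the PyTorch spec, not the repo's code):

```python
import numpy as np

def add_vec_spec_fixed(x, y):
    # z[i, j] = x[i] + y[j]: i indexes rows (from x), j indexes columns (from y)
    return x[:, None] + y[None, :]

z = add_vec_spec_fixed(np.array([1, 2, 3]), np.array([10, 20, 30]))
# row 0 is [11, 21, 31]: x[0] added to every y[j]
```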

Anyways: Thanks for these puzzles :)

Edit: the same mixup seems to be in Q 5

Running in Google Colab Free Tier may trigger warnings and disconnects

I have received the following warning twice when running in the Google Colab free tier, and I'm wary of continuing to run the Triton Puzzles notebook because of it (I don't want to get banned):

(screenshot: Colab warning, 2024-04-20)

I have not modified the notebook in any way (other than attempting to solve the puzzles, of course).

From what I can gather, this may be a false positive due to the use of Triton-Viz and/or HTML output by the notebook.
Specifically, the Colab FAQ states that these activities are disallowed on the free tier:

bypassing the notebook UI to interact primarily via a web UI

I suspect that Triton-Viz and/or the HTML output by the notebook triggers this false positive (and thus the warning), although for me this has happened only sporadically - twice. (I have finished 8 of the 12 puzzles, in case that matters.)

Doing a bit more digging, it appears that Google cracked down on this when Stable Diffusion Web UI usage on the free tier skyrocketed, causing usage concerns. See the following for more details:

  1. https://www.reddit.com/r/StableDiffusion/comments/12t8tc7/is_colab_going_to_start_banning_people_who_use_it/
  2. https://www.reddit.com/r/StableDiffusion/comments/17cylsx/automatic1111s_colab_runtime_disconnected_message/

There's a comment in there from a Colab PM who states:

just the webui bit is getting warned - use the models in notebooks to your heart's content. stability is awesome and I really like them.

we prioritize interactive notebook compute on the free tier, and the webUI piece grew really big fast.

This suggests that web UIs may be the cause, or at least a factor, here. I don't know enough about Triton, but I doubt its usage here would be considered "disallowed" code.

I'm wondering if anyone else has experienced this? (The obvious mitigation would be just to purchase a plan or some credits)

Thanks,

Peter

[QST] Triton MLIR

@srush

Always appreciate your wonderful OSS educational contributions!

I'm relatively familiar with CUDA and Triton but less so with machine learning compilers, and I'm interested in getting into the weeds of Triton's compilation pipeline.

I've come across a few resources for learning MLIR as well as related projects such as TVM (which has a comprehensive set of tutorials / learning materials spearheaded by Tianqi Chen of CMU), but have yet to bridge the gap from basic MLIR to something on the scale of triton.

The overarching motivation -- other than the fact that ML compilers are super-interesting :) -- is that in a world of increased demand for ML training / inference but limited GPU (NVIDIA) supply, the ability to write code that is backend-agnostic is ever more important.

A few questions:

  • Are you aware of any resources for learning MLIR incrementally, ideally building from the basics to something like a toy Triton and, more ambitiously, to understanding enough of the Triton backend to contribute new optimization passes?
  • Is this something you're interested in and would possibly collaborate on?

I'd be willing to do as much of the heavy lifting as needed:

  • I'd envision a step-by-step walkthrough of each of the Triton tutorials, starting with vec-add.
  • The goal would be to understand how each pass of the compilation pipeline translates high-level python to performant device code.
  • Something that pulls apart each component of the C++ MLIR pipeline and provides greater visibility -- and hackability -- than simply observing the output of MLIR_ENABLE_DUMP.

cc @Jokeren

viz=False does not work

I don't really understand the visualizations, to be honest, so I wanted to turn them off, but viz=False leads to the following error:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-44-e13ba08403d2> in <cell line: 28>()
     26     tl.store(z_ptr + n0_range + n1_range * N0, z, mask)
     27 
---> 28 test(add_vec_block_kernel, add_vec_block_spec, nelem={"N0": 100, "N1": 90}, viz=False)

<ipython-input-3-91403af2aaed> in test(puzzle, puzzle_spec, nelem, B, viz)
     40     if viz:
     41         failures = triton_viz.launch()
---> 42     if not match or failures:
     43         print("Invalid Access:", failures)
     44         print("Yours:", z)

UnboundLocalError: local variable 'failures' referenced before assignment

Changing the relevant lines of the test code to something like this worked for me:

failures = None
if viz:
    failures = triton_viz.launch()
if not match or failures:
    if failures:
        print("Invalid Access:", failures)
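As a self-contained illustration of why that fix works (a sketch; `check` and `launch` are hypothetical stand-ins for the notebook's `test` and `triton_viz.launch`, and only the control flow matters): initializing `failures` before the conditional means the later check is safe whether or not the visualizer ran.

```python
def check(match, viz, launch):
    failures = None              # bound before the conditional
    if viz:
        failures = launch()      # only assigned when the visualizer runs
    if not match or failures:    # safe: `failures` always exists here
        if failures:
            print("Invalid Access:", failures)
        return False
    return True
```

With viz=False, `failures` simply stays None and the truthiness check passes harmlessly, instead of raising UnboundLocalError.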

Puzzle 6 - shape annotation for x

For "Puzzle 6: Fused Outer Multiplication - Backwards", x appears to have the same meaning as in the previous problem.
However, it is annotated as having shape [90, 100] instead of [100]. Is this a typo?

question about long softmax

I solved the long softmax puzzle, but I had to store the intermediate results to z_ptr, which may cause unnecessary memory I/O.

Essentially, I would like to know if there is a way in Triton to create a temporary array in shared memory and store intermediate results there.
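For what it's worth, the standard way to avoid materializing intermediates is the "online" softmax used by Flash Attention: keep only a running max and a running sum while streaming over blocks, so nothing but the final result is ever written to global memory. A NumPy sketch of the accumulation (illustrative only, and not necessarily the puzzle's intended solution; in Triton these accumulators would live in registers across the loop):

```python
import numpy as np

def online_softmax_denominator(x, block=4):
    """Stream over blocks, keeping only a running max m and running sum s
    of exp(x - m); no per-block intermediate results are written out."""
    m, s = -np.inf, 0.0
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        m_new = max(m, blk.max())
        # rescale the old sum to the new max before adding the new block
        s = s * np.exp(m - m_new) + np.exp(blk - m_new).sum()
        m = m_new
    return m, s

x = np.array([1.0, 3.0, 2.0, 0.5, 4.0, -1.0])
m, s = online_softmax_denominator(x)
softmax = np.exp(x - m) / s
```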
