Comments (8)
That would actually be amazing. At the moment though, I have very little knowledge on how to auto-diff C-like code. Are you aware of any project that does something similar that I could take a look at out of curiosity?
Edit: I have found clad: https://github.com/vgvassilev/clad/. I think that such an autodiff for Triton would be outside the scope of Triton -- much like clad is outside of clang. Ultimately, Triton only intends to be a lightweight replacement for CUDA when writing DNN ops.
from triton.
As far as I know, this project can do autodiff on LLVM IR: https://github.com/wsmoses/Enzyme.
@ptillet Does it help?
Yes, this helps. However, I feel like this is a bit out of the scope of Triton, which aims just to be a replacement for CUDA. My intuition is that automatic differentiation of a CUDA kernel could be hard in the general case, i.e., when atomics are used or, more generally, when different kernel instances touch the same memory location. For example, it is not always the case that a single forward-propagation kernel can be auto-differentiated into a single kernel.
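The asymmetry can be seen in a tiny example outside any GPU framework. Below is a plain-Python sketch (a hypothetical illustration, not Triton or CUDA code) of a scatter-add forward pass whose adjoint is a gather: the backward has a different memory-access pattern than the forward, which is why one forward kernel does not in general map onto one backward kernel.

```python
# Plain-Python sketch (hypothetical example, not Triton/CUDA code).
# forward: y[idx[i]] += x[i]  -- on a GPU this write needs atomics,
#          since several program instances may hit the same y slot.
# adjoint: dx[i] = dy[idx[i]] -- a plain gather, no atomics needed.

def scatter_add_forward(x, idx, out_size):
    y = [0.0] * out_size
    for i in range(len(x)):        # each iteration ~ one kernel instance
        y[idx[i]] += x[i]          # racing writes -> atomic add on a GPU
    return y

def scatter_add_backward(dy, idx):
    # each x[i] contributes with coefficient 1 to y[idx[i]],
    # so gradients flow back as a gather:
    return [dy[idx[i]] for i in range(len(idx))]

x = [1.0, 2.0, 3.0, 4.0]
idx = [0, 1, 0, 1]                 # overlapping writes
y = scatter_add_forward(x, idx, 2)
assert y == [4.0, 6.0]

# check dx against finite differences of loss = sum(w * y)
w = [0.5, -1.0]
dx = scatter_add_backward(w, idx)  # dloss/dy = w

eps = 1e-6
loss = sum(wj * yj for wj, yj in zip(w, y))
for i in range(len(x)):
    xp = x[:]
    xp[i] += eps
    yp = scatter_add_forward(xp, idx, 2)
    num = (sum(wj * yj for wj, yj in zip(w, yp)) - loss) / eps
    assert abs(num - dx[i]) < 1e-4
```

The forward loop parallelizes over inputs and races on outputs; the backward loop parallelizes over inputs with read-only access to `dy`. A single kernel generally cannot serve both roles.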
https://dspace.mit.edu/handle/1721.1/122623 is a paper which does this for GPUs.
@ptillet Enzyme maintainer here. We actually can do GPU these days, and I have been intrigued to see whether Enzyme can handle Triton without adding new custom primitives for the adjoint generation. From having read the paper and sifted through the code, Triton essentially generates a reduced subset of LLVM IR?
Is there a switch inside of Triton to inspect the generated IR? Would be happy to give a prototype a go, but agree that it wouldn't necessarily make sense to really pull automatic differentiation inside of Triton itself. Would just be curious to see if Enzyme works on Triton-generated kernels.
@ludgerpaehler Thanks for reaching out :) I think my main worry isn't so much whether it can be done, but how efficiently it can be done. I've thought about the problem a bit, and I don't think something like matmul could be auto-differentiated into an optimal backward pass -- let alone flash attention. There would likely still be some utility to having a slow-ish backward pass though, as long as it's not 5x slower.
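For concreteness, the hand-derived backward of C = A @ B with respect to a scalar loss is itself two matmuls, dA = dC · Bᵀ and dB = Aᵀ · dC, which is what lets a manual backward pass reuse the forward's tiling and tensor-core strategy; an IR-level autodiff has no guarantee of recovering that structure. A plain-Python sketch (hypothetical, not Triton code) checking the dA formula against finite differences for the loss sum(C):

```python
# Hand-derived backward of C = A @ B for loss L = sum(C):
#   dA = dC @ B^T,  dB = A^T @ dC,  with dC all-ones for this loss.
# These are themselves matmuls, so a hand-written backward kernel can
# use the same tiling strategy as the forward.

def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul(A, B)
loss = sum(sum(row) for row in C)

dC = [[1.0] * len(C[0]) for _ in C]   # dL/dC = 1 everywhere
dA = matmul(dC, transpose(B))         # analytic gradient of A

# finite-difference check of dA
eps = 1e-6
for i in range(2):
    for j in range(2):
        Ap = [row[:] for row in A]
        Ap[i][j] += eps
        lp = sum(sum(r) for r in matmul(Ap, B))
        assert abs((lp - loss) / eps - dA[i][j]) < 1e-3
```

The open question in the thread is whether a tool differentiating the lowered IR can rediscover that the adjoint is again a matmul, rather than emitting scalar loads, stores, and atomics.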
Triton does eventually generate LLVM IR (and then NVPTX), but I am not sure about the performance of any approach that tries to auto-differentiate that code, as the compiler has no way of knowing whether some shared-memory (or global-memory) values are used by different threads. One thing that could potentially work would be to auto-diff Triton-IR directly, but even there the existence of pointers in an SPMD model makes me think we'd have to at least double the amount of memory I/O (i.e., in the backward pass, pointer loads become atomic adds?), although we may retain optimal shared-memory / tensor-core utilization. That could still have good enough performance to be useful.
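The loads-become-atomic-adds point can be made concrete with a plain-Python sketch (hypothetical, not Triton code): when several program instances load the same address in the forward pass, their gradient contributions must be accumulated in the backward pass, and on a GPU that accumulation is an atomic add. A plain store would silently drop contributions:

```python
# Forward: each program instance i reads y[i] = x[idx[i]].
# Adjoint: dx[idx[i]] must ACCUMULATE dy[i]; if two instances share an
# index and each merely stores its gradient, one contribution is lost.
# On a GPU the accumulation therefore becomes an atomic add.

def gather_forward(x, idx):
    return [x[j] for j in idx]

def gather_backward_correct(dy, idx, in_size):
    dx = [0.0] * in_size
    for i, j in enumerate(idx):
        dx[j] += dy[i]            # atomic_add on a GPU
    return dx

def gather_backward_racy(dy, idx, in_size):
    dx = [0.0] * in_size
    for i, j in enumerate(idx):
        dx[j] = dy[i]             # plain store: last writer wins
    return dx

idx = [0, 0, 1]                   # two instances read x[0]
dy = [1.0, 2.0, 3.0]
assert gather_backward_correct(dy, idx, 2) == [3.0, 3.0]
assert gather_backward_racy(dy, idx, 2) == [2.0, 3.0]   # wrong gradient
```

This is the extra memory traffic mentioned above: every forward load whose address may be shared between instances turns into a read-modify-write in the backward pass.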
I think long term, what would be absolutely bonkers would be something like Enzyme capable of operating on MLIR dialects (provided some interface, of course). Is this an idea that has been thrown around?
@ptillet I don't want to say too much and overpromise, but it is something we are looking at very closely and something which we are very keen to pursue. Are you going to be at the US LLVM Dev meeting in San Jose? Would be happy to discuss more in-depth in person (or via mail) :)
Sorry for the delay. Things have been very busy :) I won't be at the LLVM Dev meeting as it is around the time I'm traveling to France to see my family. Happy to talk more about it via mail or on a call; feel free to shoot me an email at [email protected]