Comments (6)
I've reproduced this and found that we weren't enabling scoped alias analysis, so Enzyme, lacking that improved alias information, had to conservatively assume extra values needed caching. Happily, turning it on eliminates the caches completely (the change is going through a PR now).
from enzyme.
Thanks for the quick fix! 👍
from enzyme.
FYI, I cloned a fresh Enzyme revision, 6117bbd, which should contain the current fix, then rebuilt and reinstalled Enzyme.
But I still observe the bug with clang version 11.1.0.
I'll try a more recent clang (hopefully it will have a compatible CUDA version).
from enzyme.
The bug is also present with
clang version 12.0.1
freshly installed from https://github.com/llvm/llvm-project/tree/release/12.x
I'll try clang's mainline
from enzyme.
Just to confirm, can you paste the output of the analysis?
Specifically, the thing to look for is that there are no more lines like bugDense.cpp:21:31: remark: Caching instruction
There will still exist some lines like bugDense.cpp:21:21: remark: Load may need caching
which effectively say that you're overwriting out and if you need the old value of out, it needs to be cached because of this store. A different analysis (differential use analysis) will determine that Enzyme won't need the old value of out in the reverse, and thus there's no need to cache it (and thus no Caching instruction...)
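To illustrate why the old value of out is not needed in the reverse pass, here is a minimal hand-written sketch. It is not Enzyme's generated code; the function names matvec and matvec_reverse and the shadow arrays d_out, d_A and d_x are hypothetical, and the loop is assumed to have the shape of the statement at bugDense.cpp:21.

```cpp
#include <cstddef>

// Forward pass, assumed shape of the loop around bugDense.cpp:21:
//   out[i] += A[i*m + j] * x[j]
void matvec(const double* A, const double* x, double* out,
            std::size_t n, std::size_t m) {
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j)
      out[i] += A[i * m + j] * x[j];
}

// Hand-written reverse pass (illustrative only, hypothetical names).
// d_out is the incoming adjoint of out; d_A and d_x accumulate gradients.
// Only A, x and d_out are read: the overwritten values of out never
// appear, so there is nothing about out that would have to be cached.
void matvec_reverse(const double* A, const double* x, const double* d_out,
                    double* d_A, double* d_x,
                    std::size_t n, std::size_t m) {
  for (std::size_t i = n; i-- > 0;)
    for (std::size_t j = m; j-- > 0;) {
      d_A[i * m + j] += x[j] * d_out[i];
      d_x[j] += A[i * m + j] * d_out[i];
    }
}
```

The store to out is what triggers "Load may need caching", but since the adjoint of an accumulation does not depend on the accumulated value, no "Caching instruction" remark should follow for it.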
from enzyme.
OK, it still spits out plenty of lines, but looking more carefully there is no longer any "remark: Caching instruction".
👍 so I guess it should be OK now.
There is still a "remark: Load must be recomputed" though (but I care less about this, provided it's not quadratic).
Here is the new output:
clang bugDense.cpp -lstdc++ -lm -fno-exceptions -Rpass=enzyme -Xclang -load -Xclang /usr/local/lib/ClangEnzyme-12.so -O2 -o bin/bugDense
remark: Load may need caching %arrayidx9.promoted.i = load double, double* %arrayidx9.i, align 8, !tbaa !47, !alias.scope !32, !noalias !44 due to store double %add10.i, double* %arrayidx9.i, align 8, !dbg !49, !tbaa !47, !alias.scope !32, !noalias !44 [-Rpass=enzyme]
bugDense.cpp:21:21: remark: Load may need caching %10 = load double, double* %arrayidx.i, align 8, !dbg !56, !tbaa !47, !alias.scope !26, !noalias !57 due to tail call void @llvm.memset.p0i8.i64(i8* align 8 %v6.i.i34, i8 0, i64 %15, i1 false) #12, !dbg !79, !alias.scope !80, !noalias !83 [-Rpass=enzyme]
out[i] += A[i*m+j] *x[j];
^
bugDense.cpp:21:31: remark: Load may need caching %11 = load double, double* %arrayidx6.i, align 8, !dbg !58, !tbaa !47, !alias.scope !30, !noalias !59 due to tail call void @llvm.memset.p0i8.i64(i8* align 8 %v6.i.i34, i8 0, i64 %15, i1 false) #12, !dbg !79, !alias.scope !80, !noalias !83 [-Rpass=enzyme]
out[i] += A[i*m+j] *x[j];
^
remark: Load may need caching %arrayidx9.promoted.i43 = load double, double* %arrayidx9.i42, align 8, !tbaa !47, !alias.scope !74, !noalias !83 due to store double %add10.i54, double* %arrayidx9.i42, align 8, !dbg !86, !tbaa !47, !alias.scope !74, !noalias !83 [-Rpass=enzyme]
bugDense.cpp:21:31: remark: Load must be recomputed %11 = load double, double* %arrayidx6.i, align 8, !dbg !58, !tbaa !47, !alias.scope !30, !noalias !59 in reverse_invertfor.body4.i due to tail call void @llvm.memset.p0i8.i64(i8* align 8 %v6.i.i34, i8 0, i64 %15, i1 false) #12, !dbg !79, !alias.scope !80, !noalias !83 [-Rpass=enzyme]
from enzyme.