Comments (8)
Hi hxjz233,
I looked into this for a bit. Firstly: the MethodError: no method matching StridedView.StridedView(::FillArrays...) is an incredibly annoying feature of Zygote, which generates these types of arrays when working with sum. We currently do not have an implementation of TensorOperations that supports FillArrays, which is what makes that fail. (See the discussions here, here and this PR in Zygote.)
I do agree that we should probably fix that, but I haven't yet been able to find the time to write fallback TensorOperations methods for generic arrays.
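To see where these arrays come from in isolation, here is a minimal sketch (that the gradient of sum comes back as a lazy Fill is my understanding of Zygote's behaviour, so verify on your version):

using Zygote, FillArrays

x = rand(3, 3)
g = Zygote.gradient(sum, x)[1]
# The cotangent of sum is the same scalar in every entry, so Zygote
# represents it lazily as a FillArrays.Fill rather than a dense Matrix.
@show typeof(g)
# A subsequent @tensor pullback that tries to wrap g in a StridedView
# then fails, because TensorOperations has no methods for this array type.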
The freed reference is actually interesting, and it's something that I overlooked. We try to offload some of the memory pressure on the GPU by inserting CUDA.unsafe_free! on temporary tensors that are generated during the @(cu)tensor forward pass. We cannot do that in reverse-mode AD, as then of course these objects are no longer temporary. I'll try to get a fix going, and update you when I figure it out!
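To illustrate the hazard in isolation (a minimal sketch, not the actual TensorOperations code path):

using CUDA

tmp = CUDA.rand(Float32, 4, 4)
y = sum(tmp)             # forward-pass use of the temporary
CUDA.unsafe_free!(tmp)   # eagerly release the buffer to relieve memory pressure
# Fine in a pure forward pass, but if reverse-mode AD later needs tmp
# to compute a pullback, the next access errors (the exact message
# depends on the CUDA.jl version):
sum(tmp)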
Hi @hxjz233, my apologies, it seems this previous message went under my radar. I'll try to find some time to investigate this week; in the meantime I re-opened the issue so it's less likely that I'll miss it 😉 Should I forget, definitely don't hesitate to ping me once more!
Thanks for testing it out! It's probably a CUDA or Zygote issue.
My purpose was to write a line that puts a soft ceiling on a tensor that is iteratively updated (A here). Since I just found that norm(A, 2) is far faster than this clumsy maximum(abs.(A)) normalization before taking the gradient (and also faster than norm(A, Inf) and maximum(abs, A), which at least avoids allocating the intermediate abs.(A)), I think I will stick with that solution for now.
It can be a bit surprising for a beginner like me that finding the maximum element is not computationally cheaper than computing the 2-norm, on both CPU and GPU. But anyway, many thanks for the inspiration to try out other normalizations!
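For reference, this is roughly how one could compare the candidates (a sketch only; relative timings are machine-dependent, and CUDA.@sync is needed so the asynchronous GPU kernels are actually timed):

using CUDA, LinearAlgebra, BenchmarkTools

A = CUDA.rand(Float64, 1000, 1000)
@btime CUDA.@sync norm($A, 2)        # Euclidean norm, a single reduction
@btime CUDA.@sync norm($A, Inf)      # largest absolute value via norm
@btime CUDA.@sync maximum(abs, $A)   # same value, fused, no temporary
@btime CUDA.@sync maximum(abs.($A))  # allocates abs.(A) before reducing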
Closing this issue because this is indeed not TensorOperations-related. If you feel there is anything about those normalization functions worth mentioning, I would surely appreciate it!
Ok, I think at the very least that first PR should fix the freed reference errors. Do you mind testing it out, and letting me know if it resolves the issues?
Thanks for the quick fix! It works quite well for the freed-reference problem (and eventually enables my whole CUDA + TensorOperations + Zygote project when using real numbers).
The non-bitstype problem with use_complex=true, normalize=true persists. Let's keep in touch on that!
Hi there, it's been a while and I still find the earlier unsafe_free fix useful! But I remain confused by the non-bitstype problem. Is there a way to fix or understand it? BTW I am fine with the current sum implementation.
To reproduce, please use use_complex=true, normalize=true with the example code above. I see on my side:
Error message
ERROR: LoadError: GPU compilation of MethodInstance for (::GPUArrays.var"#broadcast_kernel#38")(::CUDA.CuKernelContext, ::CuDeviceMatrix{ComplexF64, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, Zygote.var"#1412#1416"{1, Int64}, Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{CuDeviceMatrix{ForwardDiff.Dual{Nothing, Float64, 2}, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, ::Int64) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, Zygote.var"#1412#1416"{1, Int64}, Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{CuDeviceMatrix{ForwardDiff.Dual{Nothing, Float64, 2}, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, which is not isbits:
.args is of type Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{CuDeviceMatrix{ForwardDiff.Dual{Nothing, Float64, 2}, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}} which is not isbits.
.1 is of type Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
.x is of type Matrix{Float64} which is not isbits.
Stacktrace:
[1] check_invocation(job::GPUCompiler.CompilerJob)
@ GPUCompiler D:\Julia\depot\packages\GPUCompiler\U36Ed\src\validation.jl:92
[2] macro expansion
@ D:\Julia\depot\packages\GPUCompiler\U36Ed\src\driver.jl:123 [inlined]
[3] macro expansion
@ D:\Julia\depot\packages\TimerOutputs\RsWnF\src\TimerOutput.jl:253 [inlined]
[4] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ GPUCompiler D:\Julia\depot\packages\GPUCompiler\U36Ed\src\driver.jl:121
[5] codegen
@ D:\Julia\depot\packages\GPUCompiler\U36Ed\src\driver.jl:110 [inlined]
[6] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
@ GPUCompiler D:\Julia\depot\packages\GPUCompiler\U36Ed\src\driver.jl:106
[7] compile
@ D:\Julia\depot\packages\GPUCompiler\U36Ed\src\driver.jl:98 [inlined]
[8] #1075
@ D:\Julia\depot\packages\CUDA\rXson\src\compiler\compilation.jl:247 [inlined]
[9] JuliaContext(f::CUDA.var"#1075#1077"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
@ GPUCompiler D:\Julia\depot\packages\GPUCompiler\U36Ed\src\driver.jl:47
[10] compile(job::GPUCompiler.CompilerJob)
@ CUDA D:\Julia\depot\packages\CUDA\rXson\src\compiler\compilation.jl:246
[11] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler D:\Julia\depot\packages\GPUCompiler\U36Ed\src\execution.jl:125
[12] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler D:\Julia\depot\packages\GPUCompiler\U36Ed\src\execution.jl:103
[13] macro expansion
@ D:\Julia\depot\packages\CUDA\rXson\src\compiler\execution.jl:359 [inlined]
[14] macro expansion
@ .\lock.jl:267 [inlined]
[15] cufunction(f::GPUArrays.var"#broadcast_kernel#38", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{ComplexF64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, Zygote.var"#1412#1416"{1, Int64}, Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{CuDeviceMatrix{ForwardDiff.Dual{Nothing, Float64, 2}, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA D:\Julia\depot\packages\CUDA\rXson\src\compiler\execution.jl:354
[16] cufunction
@ D:\Julia\depot\packages\CUDA\rXson\src\compiler\execution.jl:351 [inlined]
[17] macro expansion
@ D:\Julia\depot\packages\CUDA\rXson\src\compiler\execution.jl:104 [inlined]
[18] #launch_heuristic#1118
@ D:\Julia\depot\packages\CUDA\rXson\src\gpuarrays.jl:17 [inlined]
[19] launch_heuristic
@ D:\Julia\depot\packages\CUDA\rXson\src\gpuarrays.jl:15 [inlined]
[20] _copyto!
@ D:\Julia\depot\packages\GPUArrays\dAUOE\src\host\broadcast.jl:70 [inlined]
[21] copyto!
@ D:\Julia\depot\packages\GPUArrays\dAUOE\src\host\broadcast.jl:51 [inlined]
[22] copy
@ D:\Julia\depot\packages\GPUArrays\dAUOE\src\host\broadcast.jl:42 [inlined]
[23] materialize
@ .\broadcast.jl:873 [inlined]
[24] broadcast(::Zygote.var"#1412#1416"{1, Int64}, ::Matrix{Float64}, ::CuArray{ForwardDiff.Dual{Nothing, Float64, 2}, 2, CUDA.Mem.DeviceBuffer})
@ Base.Broadcast .\broadcast.jl:811
[25] #1411
@ D:\Julia\depot\packages\Zygote\jxHJc\src\lib\broadcast.jl:325 [inlined]
[26] ntuple
@ .\ntuple.jl:48 [inlined]
[27] bc_fwd_back
@ D:\Julia\depot\packages\Zygote\jxHJc\src\lib\broadcast.jl:324 [inlined]
[28] #4163#back
@ D:\Julia\depot\packages\ZygoteRules\M4xmc\src\adjoint.jl:72 [inlined]
[29] #291
@ D:\Julia\depot\packages\Zygote\jxHJc\src\lib\lib.jl:206 [inlined]
[30] #2173#back
@ D:\Julia\depot\packages\ZygoteRules\M4xmc\src\adjoint.jl:72 [inlined]
[31] Pullback
@ .\broadcast.jl:1311 [inlined]
[32] (::Zygote.Pullback{Tuple{typeof(Base.Broadcast.broadcasted), typeof(abs), CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.Pullback{Tuple{typeof(Base.Broadcast.broadcastable), CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}, Zygote.var"#2017#back#204"{typeof(identity)}, Zygote.var"#2849#back#673"{Zygote.var"#map_back#667"{typeof(Base.Broadcast.broadcastable), 1, Tuple{Tuple{}}, Tuple{Val{0}}, Tuple{}}}, Zygote.var"#2017#back#204"{typeof(identity)}, Zygote.var"#2173#back#293"{Zygote.var"#291#292"{Tuple{Tuple{Nothing, Nothing, Nothing}, Tuple{}}, Zygote.var"#4163#back#1426"{Zygote.var"#bc_fwd_back#1414"{1, CuArray{ForwardDiff.Dual{Nothing, Float64, 2}, 2, CUDA.Mem.DeviceBuffer}, Tuple{CuArray{ComplexF64, 2, CUDA.Mem.DeviceBuffer}}, Val{1}}}}}, Zygote.var"#2173#back#293"{Zygote.var"#291#292"{Tuple{Tuple{Nothing}, Tuple{}}, Zygote.var"#combine_styles_pullback#1168"{Tuple{Nothing, Nothing}}}}}})(Δ::Matrix{Float64})
@ Zygote D:\Julia\depot\packages\Zygote\jxHJc\src\compiler\interface2.jl:0
[33] Pullback
@ D:\MagBEC\juliatest\t_adjulia\cuTensorAD2.jl:22 [inlined]
[34] (::Zygote.Pullback{Tuple{var"##free_ref_or_nonbits#3", Bool, Bool, typeof(free_ref_or_nonbits), CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Any})(Δ::Float64)
@ Zygote D:\Julia\depot\packages\Zygote\jxHJc\src\compiler\interface2.jl:0
[35] Pullback
@ D:\MagBEC\juliatest\t_adjulia\cuTensorAD2.jl:5 [inlined]
[36] (::Zygote.Pullback{Tuple{typeof(Core.kwcall), NamedTuple{(:use_complex, :normalize), Tuple{Bool, Bool}}, typeof(free_ref_or_nonbits), CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Any})(Δ::Float64)
@ Zygote D:\Julia\depot\packages\Zygote\jxHJc\src\compiler\interface2.jl:0
[37] Pullback
@ D:\MagBEC\juliatest\t_adjulia\cuTensorAD2.jl:67 [inlined]
[38] (::Zygote.Pullback{Tuple{var"#f#5", CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.Pullback{Tuple{typeof(Core.kwcall), NamedTuple{(:use_complex, :normalize), Tuple{Bool, Bool}}, typeof(free_ref_or_nonbits), CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Any}, Zygote.var"#2017#back#204"{typeof(identity)}, Zygote.Pullback{Tuple{Type{NamedTuple{(:use_complex, :normalize)}}, Tuple{Bool, Bool}}, Tuple{Zygote.Pullback{Tuple{Type{NamedTuple{(:use_complex, :normalize), Tuple{Bool, Bool}}}, Tuple{Bool, Bool}}, Tuple{Zygote.var"#2224#back#315"{Zygote.Jnew{NamedTuple{(:use_complex, :normalize), Tuple{Bool, Bool}}, Nothing, true}}}}}}}})(Δ::Float64)
@ Zygote D:\Julia\depot\packages\Zygote\jxHJc\src\compiler\interface2.jl:0
[39] (::Zygote.var"#75#76"{Zygote.Pullback{Tuple{var"#f#5", CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.Pullback{Tuple{typeof(Core.kwcall), NamedTuple{(:use_complex, :normalize), Tuple{Bool, Bool}}, typeof(free_ref_or_nonbits), CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Any}, Zygote.var"#2017#back#204"{typeof(identity)}, Zygote.Pullback{Tuple{Type{NamedTuple{(:use_complex, :normalize)}}, Tuple{Bool, Bool}}, Tuple{Zygote.Pullback{Tuple{Type{NamedTuple{(:use_complex, :normalize), Tuple{Bool, Bool}}}, Tuple{Bool, Bool}}, Tuple{Zygote.var"#2224#back#315"{Zygote.Jnew{NamedTuple{(:use_complex, :normalize), Tuple{Bool, Bool}}, Nothing, true}}}}}}}}})(Δ::Float64)
@ Zygote D:\Julia\depot\packages\Zygote\jxHJc\src\compiler\interface.jl:91
[40] gradient(f::Function, args::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer})
@ Zygote D:\Julia\depot\packages\Zygote\jxHJc\src\compiler\interface.jl:148
[41] g
@ D:\MagBEC\juliatest\t_adjulia\cuTensorAD2.jl:68 [inlined]
[42] AD()
@ Main D:\MagBEC\juliatest\t_adjulia\cuTensorAD2.jl:72
[43] top-level scope
@ D:\MagBEC\juliatest\t_adjulia\cuTensorAD2.jl:76
I guess the original issue reporter is not able to reopen an issue that was closed by the contributors. Please feel free to let me know whether the problem is reproducible on your side, and reopen the issue if you feel it necessary. Many thanks! @lkdvos
I started playing around with this a bit, and I actually don't know what is going on here, but I think it's worth opening an issue with either CUDA or Zygote, as the issue is not actually TensorOperations-related. The following code produces the same error:
using CUDA, Zygote

A = CUDA.rand(ComplexF64, 2, 2)
function f(A)
    # soft ceiling: rescale by the largest absolute entry, then reduce
    normcoeff = maximum(abs.(A))
    A = A / normcoeff
    return maximum(abs.(A))
end
gradient(f, A)[1]
I don't know enough of the inner workings of Zygote nor CUDA to know where to point you to, but I hope they will be able to help you further.
PS: I also tried norm(A, Inf), hoping that it would work better, but my preliminary testing yielded the same results.
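For what it's worth, the kernel error above shows a host Matrix{Float64} cotangent (the Extruded{Matrix{Float64}} argument) being broadcast against a CuDeviceMatrix of ForwardDiff Duals inside Zygote's forward-mode broadcast fallback. A minimal sketch of that failure mode in isolation (this is my reading of the trace, not a confirmed diagnosis):

using CUDA

x = CUDA.rand(Float64, 2, 2)
y = rand(2, 2)   # a host array
# Mixing a CPU Matrix into a GPU broadcast makes the kernel closure
# capture a non-isbits argument, which triggers the same
# "KernelError: passing and using non-bitstype argument" as above.
x .* y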