Comments (4)
I strongly believe that the issue is related to the affine implementation of the batchnorm function in Knet. Here I will report all the observations I made, as well as my custom solution, although I might be wrong at some points (please correct me).
First of all, in our MWE, if I do not feed bparam (the parameter for batch normalization used in the affine implementation) to the batchnorm function, I do not get any error and the code works fine. Therefore, the code below runs without a problem:
using AutoGrad, Random, CUDA, Knet
ZDIM=50
XDIM=100
BATCH=10
# atype = Array{Float64}
atype = (CUDA.functional() ? KnetArray{Float32} : Array{Float32})
target = atype(randn(XDIM,BATCH))
w = Param(atype(randn(XDIM,ZDIM)))
bparam = Param(atype((bnparams(XDIM)))) # HERE I DEFINE A PARAMETER FOR BATCH NORMALIZATION
z0 = Param(atype(zeros(ZDIM,BATCH))) # I ADDED A BATCH DIMENSION TO z0
# WE DO NOT FEED bparam VECTOR INSIDE THE BELOW BATCH NORMALIZATION FUNCTION
decoder(z) = batchnorm(w*z, bnmoments(); training = true ) # AFTER A LINEAR LAYER, I ADDED BATCH NORMALIZATION OPERATION
qloss(x,y) = sum(abs2, x .- y)
function loss(x)
    d = @diff qloss(x, decoder(z0))
    z = -grad(d, z0)
    qloss(x, decoder(z))
end
J = @diff loss(target)
grad(J, w) |> summary |> println
Since I do not use bparam, the batch normalization function only uses the mu and ivar computed from the data (see _batchnorm4_fused in Knet.jl/src/ops20/batchnorm.jl, lines 19 to 54 at commit 0485870):

y .= (y .- mu) .* ivar    (Eq. 1)

However, what I want is the following:

y .= g .* (y .- mu) .* ivar .+ b    (Eq. 2)
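To make the two equations concrete, here is a plain, non-fused sketch of the same computation for a (features, batch) matrix. This is only an illustration of the math under simplified assumptions (per-feature batch statistics, g and b as column vectors, a hypothetical eps), not Knet's actual fused kernel:

using Statistics
# Reference for Eq. 1 and Eq. 2 on a (features, batch) matrix (illustration only)
function bn_reference(y, g, b; eps=1e-5)
    mu   = mean(y, dims=2)                                      # per-feature mean over the batch
    ivar = 1 ./ sqrt.(var(y, dims=2, corrected=false) .+ eps)   # per-feature inverse standard deviation
    yhat = (y .- mu) .* ivar                                    # Eq. 1: normalization only
    return g .* yhat .+ b                                       # Eq. 2: learnable affine transform on top
end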
Therefore, I wrote a custom batch normalization function based on Knet's batchnorm, as follows:
# Dimension helper: a broadcastable shape that keeps only the feature/channel dimension,
# e.g. (XDIM, 1) for a 2-D input or (1, 1, C, 1) for a 4-D input
@inline _wsize(y) = ((1 for _=1:ndims(y)-2)..., size(y)[end-1], 1)
function mybatchnorm(x, moments, bparam; training = true)
    bparam_dim = size(bparam, 1)
    g = reshape(bparam[1:bparam_dim ÷ 2], _wsize(x))                  # scale (first half of bparam)
    b = reshape(bparam[bparam_dim ÷ 2 + 1:bparam_dim], _wsize(x))     # bias (second half of bparam)
    return g .* batchnorm(x, moments; training = training) .+ b
end
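Note that mybatchnorm relies on the layout produced by bnparams, namely a single 1-D array holding the scale vector in its first half and the bias vector in its second half. A quick sanity check of that assumption (the default initialization should be ones for the scale half and zeros for the bias half):

using Knet
p = bnparams(3)    # 1-D array of length 6: scale (g) half followed by bias (b) half
@show p[1:3]       # expected: all ones  (default scale)
@show p[4:6]       # expected: all zeros (default bias)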
In mybatchnorm, I feed my learnable parameter bparam, which contains the g and b vectors used in Eq. 2, and return the affine transformation I need. I believe this corresponds exactly to what is implemented in Knet's batchnorm function. However, if I use this custom batch normalization function, I do not get any error while taking the derivative of the loss function. In conclusion, the following piece of code works:
using AutoGrad, Random, CUDA, Knet
# Dimension helper: a broadcastable shape that keeps only the feature/channel dimension
@inline _wsize(y) = ((1 for _=1:ndims(y)-2)..., size(y)[end-1], 1)
function mybatchnorm(x, moments, bparam; training = true)
    bparam_dim = size(bparam, 1)
    g = reshape(bparam[1:bparam_dim ÷ 2], _wsize(x))                  # scale (first half of bparam)
    b = reshape(bparam[bparam_dim ÷ 2 + 1:bparam_dim], _wsize(x))     # bias (second half of bparam)
    return g .* batchnorm(x, moments; training = training) .+ b
end
ZDIM=50
XDIM=100
BATCH=10
# atype = Array{Float64}
atype = (CUDA.functional() ? KnetArray{Float32} : Array{Float32})
target = atype(randn(XDIM,BATCH))
w = Param(atype(randn(XDIM,ZDIM)))
bparam = Param(atype((bnparams(XDIM)))) # HERE I DEFINE A PARAMETER FOR BATCH NORMALIZATION
z0 = Param(atype(zeros(ZDIM,BATCH))) # I ADDED A BATCH DIMENSION TO z0
decoder(z) = mybatchnorm(w*z, bnmoments(), bparam; training = true ) # AFTER A LINEAR LAYER, I ADDED BATCH NORMALIZATION OPERATION
qloss(x,y) = sum(abs2, x .- y)
function loss(x)
    d = @diff qloss(x, decoder(z0))
    z = -grad(d, z0)
    qloss(x, decoder(z))
end
J = @diff loss(target)
grad(J, w) |> summary |> println
I could not figure out the reason for the error I get with the affine implementation of the batchnorm function in Knet. I hope my observations help to figure it out together. I will keep working on it, and I would appreciate any comments that help identify the main cause of the issue.
Dear Barışcan, could you try the following:
- Try your model without the batchnorm operations.
- Try your model with Array and/or CuArray instead of KnetArray for array type.
- Send me a minimal working example (https://en.wikipedia.org/wiki/Minimal_working_example): a complete source file or notebook which I can run directly to get the error. Your explanations above are good, but I cannot run the code.
Here is an MWE that I tried with the same logic, but it does not reproduce the error:
using AutoGrad, Random, CUDA, Knet
ZDIM=50
XDIM=100
BATCH=10
atype = Array{Float64}
target = atype(randn(XDIM))
w = Param(atype(randn(XDIM,ZDIM)))
z0 = Param(atype(zeros(ZDIM)))
decoder(z) = w*z
qloss(x,y) = sum(abs2, x .- y)
function loss(x)
    d = @diff qloss(x, decoder(z0))
    z = -grad(d, z0)
    qloss(x, decoder(z))
end
J = @diff loss(target)
grad(J, w) |> summary |> println
Hello again,
I appreciate your comments, which were helpful for me to understand several issues.
When I do not use the batchnorm function, I am able to take the derivative of my loss_train(theta, x) function. I tested both Array{Float32} and KnetArray{Float32} as my array type; both work fine as long as I do not include batch normalization in the model code. However, batch normalization is an important component of the model in my opinion. When I modify your example as follows, I get exactly the same error (this might serve as a minimal working example in this case):
using AutoGrad, Random, CUDA, Knet
ZDIM=50
XDIM=100
BATCH=10
# atype = Array{Float64}
atype = (CUDA.functional() ? KnetArray{Float32} : Array{Float32})
target = atype(randn(XDIM,BATCH)) # I ADDED A BATCH DIMENSION TO target
w = Param(atype(randn(XDIM,ZDIM)))
bparam = Param(atype((bnparams(XDIM)))) # HERE I DEFINE A PARAMETER FOR BATCH NORMALIZATION
z0 = Param(atype(zeros(ZDIM,BATCH))) # I ADDED A BATCH DIMENSION TO z0
decoder(z) = batchnorm(w*z, bnmoments(), bparam; training = true ) # AFTER A LINEAR LAYER, I ADDED BATCH NORMALIZATION OPERATION
qloss(x,y) = sum(abs2, x .- y)
function loss(x)
    d = @diff qloss(x, decoder(z0))
    z = -grad(d, z0)
    qloss(x, decoder(z))
end
J = @diff loss(target)
grad(J, w) |> summary |> println
Now, I conclude that the error is related to the batch normalization layer. Without batchnorm, the optimization of the model goes fine. However, the final performance of the model is worse compared to the one that uses batch normalization (in PyTorch). Therefore, I cannot obtain exactly the same results as the official code for GON (https://github.com/BariscanBozkurt/GON/blob/master/Variational-GON.py). How can I arrange my code (or the modified example in this comment) so that I can take the derivative of a loss that uses a model with a batch normalization layer?
In the above minimal working example, I use KnetArray{Float32} as my array type. Now I realized that if I use Array{Float32} instead, as in the following code,
using AutoGrad, Random, CUDA, Knet
ZDIM=50
XDIM=100
BATCH=10
atype = Array{Float32}
# atype = (CUDA.functional() ? KnetArray{Float32} : Array{Float32})
target = atype(randn(XDIM, BATCH))
w = Param(atype(randn(XDIM,ZDIM)))
bparam = Param(atype((bnparams(XDIM)))) # HERE I DEFINE A PARAMETER FOR BATCH NORMALIZATION
z0 = Param(atype(zeros(ZDIM,BATCH))) # I ADDED A BATCH DIMENSION TO z0
decoder(z) = batchnorm(w*z, bnmoments(), bparam; training = true ) # AFTER A LINEAR LAYER, I ADDED BATCH NORMALIZATION OPERATION
qloss(x,y) = sum(abs2, x .- y)
function loss(x)
    d = @diff qloss(x, decoder(z0))
    z = -grad(d, z0)
    qloss(x, decoder(z))
end
J = @diff loss(target)
grad(J, w) |> summary |> println
I get the following error (which is very similar to the previous one):
Stacktrace:
[1] setindex!
@ ./array.jl:845 [inlined]
[2] setindex!
@ ./multidimensional.jl:639 [inlined]
[3] macro expansion
@ ./broadcast.jl:984 [inlined]
[4] macro expansion
@ ./simdloop.jl:77 [inlined]
[5] copyto!
@ ./broadcast.jl:983 [inlined]
[6] copyto!
@ ./broadcast.jl:936 [inlined]
[7] copyto!(x::AutoGrad.Result{Array{Float32, 4}}, y::Base.Broadcast.Broadcasted{Base.Broadcast.Style{AutoGrad.Value}, NTuple{4, Base.OneTo{Int64}}, typeof(identity), Tuple{AutoGrad.Result{Array{Float32, 4}}}})
@ AutoGrad ~/.julia/packages/AutoGrad/TTpeo/src/core.jl:55
[8] materialize!
@ ./broadcast.jl:894 [inlined]
[9] materialize!
@ ./broadcast.jl:891 [inlined]
[10] materialize!(dest::AutoGrad.Result{Array{Float32, 4}}, x::AutoGrad.Result{Array{Float32, 4}})
@ Base.Broadcast ./broadcast.jl:887
[11] batchnorm4_back(g::AutoGrad.Result{Array{Float32, 4}}, x::AutoGrad.Result{Array{Float32, 4}}, dy::AutoGrad.Result{Array{Float32, 4}}; eps::Float64, training::Bool, cache::Knet.Ops20.BNCache, moments::Knet.Ops20.BNMoments, o::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Knet.Ops20 ~/.julia/packages/Knet/RCkV0/src/ops20/batchnorm.jl:262
[12] #batchnorm4g#189
@ ~/.julia/packages/Knet/RCkV0/src/ops20/batchnorm.jl:296 [inlined]
[13] #back#196
@ ./none:0 [inlined]
[14] differentiate(::Function; o::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ AutoGrad ~/.julia/packages/AutoGrad/TTpeo/src/core.jl:165
[15] differentiate
@ ~/.julia/packages/AutoGrad/TTpeo/src/core.jl:135 [inlined]
[16] loss(x::Vector{Float32})
@ Main ./In[4]:17
[17] (::var"#15#16")()
@ Main ~/.julia/packages/AutoGrad/TTpeo/src/core.jl:205
[18] differentiate(::Function; o::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ AutoGrad ~/.julia/packages/AutoGrad/TTpeo/src/core.jl:144
[19] differentiate(::Function)
@ AutoGrad ~/.julia/packages/AutoGrad/TTpeo/src/core.jl:135
[20] top-level scope
@ In[4]:22
[21] eval
@ ./boot.jl:360 [inlined]
[22] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1116
[23] softscope_include_string(m::Module, code::String, filename::String)
@ SoftGlobalScope ~/.julia/packages/SoftGlobalScope/u4UzH/src/SoftGlobalScope.jl:65
[24] execute_request(socket::ZMQ.Socket, msg::IJulia.Msg)
@ IJulia ~/.julia/packages/IJulia/e8kqU/src/execute_request.jl:67
[25] #invokelatest#2
@ ./essentials.jl:708 [inlined]
[26] invokelatest
@ ./essentials.jl:706 [inlined]
[27] eventloop(socket::ZMQ.Socket)
@ IJulia ~/.julia/packages/IJulia/e8kqU/src/eventloop.jl:8
[28] (::IJulia.var"#15#18")()
@ IJulia ./task.jl:411
MethodError: Cannot `convert` an object of type AutoGrad.Result{Float32} to an object of type Float32
Closest candidates are:
convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:180
convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:136
...
Stacktrace:
[1] differentiate(::Function; o::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ AutoGrad ~/.julia/packages/AutoGrad/TTpeo/src/core.jl:148
[2] differentiate(::Function)
@ AutoGrad ~/.julia/packages/AutoGrad/TTpeo/src/core.jl:135
[3] top-level scope
@ In[4]:22
[4] eval
@ ./boot.jl:360 [inlined]
[5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1116