Comments (8)
I can post a pretrained model with the next release; this is a good suggestion!
The error you report looks like a CUDA.jl bug or a problem with unsupported drivers.
Are you using the master version of the repo or the v0.3 release?
Can you try again after a "Pkg.update()"?
from alphazero.jl.
Thanks for the quick answer! Initially I tried the current master branch, but after your suggestion I switched to release 0.3.0, where I got a new CUDA-related error. Updating packages and reinstalling an older version of the Nvidia driver didn't help.
The cuBLAS documentation states that CUBLAS_STATUS_EXECUTION_FAILED means:
The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons. To correct: check that the hardware, an appropriate version of the driver, and the cuBLAS library are correctly installed.
So it's hard to pin down the actual cause, but it's probably related to my setup, and I'll try to run it on a different PC. Strangely, self-play games and the explorer work fine, so at least CUDA is installed, but it fails during the forward pass.
from alphazero.jl.
Following the connect-four tutorial, I get the same error message: ERROR: LoadError: CUBLASError: the GPU program failed to execute (code 13, CUBLAS_STATUS_EXECUTION_FAILED)
My setup is Ubuntu 18.04 with Nvidia driver 440.95.01 and AlphaZero 0.3.0 (8ed9eb0b-7496-408d-8c8b-2119aeea02cd).
Any recommendations? Thanks in advance!
from alphazero.jl.
Would you mind also trying master?
from alphazero.jl.
On master I get the same error message as on v0.3.0.
from alphazero.jl.
Can I get the whole stacktrace?
from alphazero.jl.
Stacktrace:
[1] throw_api_error(::CUDA.CUBLAS.cublasStatus_t) at /home/martijn/.julia/packages/CUDA/dZvbp/lib/cublas/error.jl:47
[2] macro expansion at /home/martijn/.julia/packages/CUDA/dZvbp/lib/cublas/error.jl:58 [inlined]
[3] cublasSgemm_v2(::Ptr{Nothing}, ::Char, ::Char, ::Int64, ::Int64, ::Int64, ::Array{Float32,1}, ::CUDA.CuArray{Float32,2}, ::Int64, ::CUDA.CuArray{Float32,2}, ::Int64, ::Array{Float32,1}, ::CUDA.CuArray{Float32,2}, ::Int64) at /home/martijn/.julia/packages/CUDA/dZvbp/lib/utils/call.jl:93
[4] gemm!(::Char, ::Char, ::Float32, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,2}, ::Float32, ::CUDA.CuArray{Float32,2}) at /home/martijn/.julia/packages/CUDA/dZvbp/lib/cublas/wrappers.jl:721
[5] gemm_wrapper!(::CUDA.CuArray{Float32,2}, ::Char, ::Char, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,2}, ::Float32, ::Float32) at /home/martijn/.julia/packages/CUDA/dZvbp/lib/cublas/linalg.jl:188
[6] mul! at /home/martijn/.julia/packages/CUDA/dZvbp/lib/cublas/linalg.jl:222 [inlined]
[7] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/matmul.jl:208 [inlined]
[8] * at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/matmul.jl:160 [inlined]
[9] (::Flux.Dense{typeof(NNlib.relu),CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1}})(::CUDA.CuArray{Float32,2}) at /home/martijn/.julia/packages/Flux/05b38/src/layers/basic.jl:123
[10] Dense at /home/martijn/.julia/packages/Flux/05b38/src/layers/basic.jl:134 [inlined]
[11] applychain at /home/martijn/.julia/packages/Flux/05b38/src/layers/basic.jl:36 [inlined] (repeats 4 times)
[12] (::Flux.Chain{Tuple{Flux.Conv{2,4,typeof(identity),CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,1}},Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},typeof(Flux.flatten),Flux.Dense{typeof(NNlib.relu),CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1}},Flux.Dense{typeof(tanh),CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1}}}})(::CUDA.CuArray{Float32,4}) at /home/martijn/.julia/packages/Flux/05b38/src/layers/basic.jl:38
[13] forward(::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}) at /home/martijn/AlphaZero.jl/src/networks/flux.jl:162
[14] evaluate(::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,2}) at /home/martijn/AlphaZero.jl/src/networks/network.jl:253
[15] losses(::AlphaZero.FluxLib.ResNet{Game}, ::LearningParams, ::Float32, ::Float32, ::Tuple{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2}}) at /home/martijn/AlphaZero.jl/src/learning.jl:62
[16] learning_status(::AlphaZero.Trainer, ::Tuple{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2}}) at /home/martijn/AlphaZero.jl/src/learning.jl:142
[17] (::AlphaZero.var"#71#74"{AlphaZero.Trainer})(::Tuple{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2}}) at ./none:0
[18] iterate at ./generator.jl:47 [inlined]
[19] collect(::Base.Generator{Base.Generator{Array{Tuple{Array{Float32,2},Array{Float32,4},Array{Float32,2},Array{Float32,2},Array{Float32,2}},1},AlphaZero.Util.var"#9#11"{AlphaZero.var"#70#73"{AlphaZero.Trainer}}},AlphaZero.var"#71#74"{AlphaZero.Trainer}}) at ./array.jl:686
[20] learning_status(::AlphaZero.Trainer) at /home/martijn/AlphaZero.jl/src/learning.jl:157
[21] learning_step!(::Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}, ::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/martijn/AlphaZero.jl/src/training.jl:170
[22] macro expansion at ./timing.jl:310 [inlined]
[23] macro expansion at /home/martijn/AlphaZero.jl/src/report.jl:229 [inlined]
[24] train!(::Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}, ::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/martijn/AlphaZero.jl/src/training.jl:295
[25] resume!(::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/martijn/AlphaZero.jl/src/ui/session.jl:383
[26] top-level scope at /home/martijn/AlphaZero.jl/scripts/alphazero.jl:89
[27] include(::Function, ::Module, ::String) at ./Base.jl:380
[28] include(::Module, ::String) at ./Base.jl:368
[29] exec_options(::Base.JLOptions) at ./client.jl:296
[30] _start() at ./client.jl:506
in expression starting at /home/martijn/AlphaZero.jl/scripts/alphazero.jl:80
from alphazero.jl.
Thanks! I recommend that you open an issue on CUDA.jl with your exact config, the output of Pkg.status(), and the full stack trace.
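As a rough sketch of what such a report could contain, the script below prints the Julia and package versions, then tries a small Float32 matrix product on the GPU, which should go through the same cuBLAS gemm call that fails in the trace above. It assumes CUDA.jl is installed in the active environment. The 64x64 matrix size is arbitrary and not taken from this thread.

```julia
using Pkg, InteractiveUtils

versioninfo()        # Julia version, OS, and CPU details
Pkg.status()         # package versions in the active environment

using CUDA
CUDA.versioninfo()   # CUDA toolkit, driver, and detected GPUs

# Small Float32 matrix product on the GPU (sizes are arbitrary).
# For CuArray{Float32,2} operands, `*` is expected to dispatch to cuBLAS gemm.
A = CUDA.rand(Float32, 64, 64)
B = CUDA.rand(Float32, 64, 64)
C = A * B

# Copy the result back to the CPU to force the computation to complete.
println(size(Array(C)))
```

If this plain product already raises the CUBLASError, the problem probably sits in the CUDA.jl or driver setup rather than in AlphaZero.jl, and the script doubles as a short reproduction for the CUDA.jl issue.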
from alphazero.jl.