jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of DeepMind's AlphaZero algorithm.
Home Page: https://jonathan-laurent.github.io/AlphaZero.jl/stable/
License: MIT License
I've downloaded the codebase from GitHub (v0.3) and followed the instructions to run the sample, but I am getting iteration times substantially longer than what is mentioned in the documentation (8–16 hours per iteration vs. 30–50 minutes). I am wondering whether something is set incorrectly on my system or in the Julia environment. See the screen cap below, which shows 4% progress with an ETA of 16 hours and 37 minutes. My system is reasonably capable, but even on a higher-end machine (64 GB RAM, RTX 2080 Super) I still get much slower results.
Thoughts on the problem? Thank you.
I've started playing around with AlphaZero.jl over the last few days by implementing Othello and Hex. In the process I've run into similar issues a few times: as soon as I let the Game struct be more stateful, training throws weird errors (e.g. "MCTS.explore! must be called before MCTS.policy", but also others that were more verbose and even less helpful). I suspect that is the reason why these lines of code exist in Connect4:
function Base.copy(g::Game)
history = isnothing(g.history) ? nothing : copy(g.history)
Game(g.board, g.curplayer, g.finished, g.winner, copy(g.amask), history)
end
I am, generally speaking, at a loss when it comes to debugging this. I think I want to write a few tests to catch these kinds of errors, but I have no idea where to start. Any help would be appreciated.
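One cheap place to start: if errors only appear once the Game struct becomes stateful, a likely culprit is a `Base.copy` that shares mutable fields between the original and the copy. Below is a minimal sketch of such a test, using our own stand-in Game struct (not the library's types): mutate a copy and check that the original is unaffected.

```julia
# Stand-in for a stateful game (illustrative, not AlphaZero.jl's type).
mutable struct Game
    board::Vector{Int}
    history::Union{Nothing, Vector{Int}}
end

# A correct copy duplicates every mutable field.
Base.copy(g::Game) =
    Game(copy(g.board), isnothing(g.history) ? nothing : copy(g.history))

function copy_is_independent(g::Game)
    g2 = copy(g)
    push!(g2.board, 42)  # mutate the copy only
    isnothing(g2.history) || push!(g2.history, 42)
    # if copy were shallow, g.board would have grown too
    length(g2.board) == length(g.board) + 1
end

g = Game([1, 2, 3], Int[])
@assert copy_is_independent(g)
```

Running such a check on every mutable field of your Othello and Hex game structs would catch shared state before MCTS trips over it.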
Trying to train per the instructions in the "Training a Connect Four Agent" section.
Ubuntu 18.04 with an RTX 2080 Ti.
At first I thought it might be a Julia version issue.
I tried 1.4.0 and 1.3.1, but both fail with an error (1.4.0 outputs more warning-type info).
Perhaps I'm doing something wrong:
brian@1920x-Ubuntu:~$ julia
(Julia REPL banner: Version 1.3.1 (2019-12-30), official https://julialang.org/ release)
julia>
brian@1920x-Ubuntu:~$ git clone https://github.com/jonathan-laurent/AlphaZero.jl.git
Cloning into 'AlphaZero.jl'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 5859 (delta 15), reused 47 (delta 15), pack-reused 5812
Receiving objects: 100% (5859/5859), 8.56 MiB | 12.84 MiB/s, done.
Resolving deltas: 100% (3141/3141), done.
brian@1920x-Ubuntu:~$ cd AlphaZero.jl/
brian@1920x-Ubuntu:~/AlphaZero.jl$ julia --project -e "import Pkg; Pkg.instantiate()"
Updating registry at ~/.julia/registries/General
Updating git-repo https://github.com/JuliaRegistries/General.git
brian@1920x-Ubuntu:~/AlphaZero.jl$ julia --project --color=yes scripts/alphazero.jl --game connect-four train
CuArrays.jl SplittingPool statistics:
Initializing a new AlphaZero environment
Initial report
Number of network parameters: 617,480
Number of regularized network parameters: 617,408
Memory footprint per MCTS node: 380 bytes
Running benchmark: AlphaZero against MCTS (1000 rollouts)
UndefVarError: lib not defined
Stacktrace:
[1] broadcasted(::typeof(NNlib.relu), ::Knet.KnetArray{Float32,4}) at /home/brian/.julia/packages/Knet/vxHRi/src/unary.jl:17
[2] (::AlphaZero.KNets.BatchNorm)(::Knet.KnetArray{Float32,4}) at /home/brian/AlphaZero.jl/src/networks/knet/layers.jl:85
[3] (::AlphaZero.KNets.Chain)(::Knet.KnetArray{Float32,4}) at /home/brian/AlphaZero.jl/src/networks/knet/layers.jl:19
[4] forward(::ResNet{Game}, ::Knet.KnetArray{Float32,4}) at /home/brian/AlphaZero.jl/src/networks/knet.jl:148
[5] evaluate(::ResNet{Game}, ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,2}) at /home/brian/AlphaZero.jl/src/networks/network.jl:288
[6] evaluate_batch(::ResNet{Game}, ::Array{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},1}) at /home/brian/AlphaZero.jl/src/networks/network.jl:313
[7] inference_server(::AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}) at ./util.jl:288
[8] macro expansion at /home/brian/AlphaZero.jl/src/util.jl:64 [inlined]
[9] (::AlphaZero.MCTS.var"#21#23"{AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}})() at ./task.jl:333
***************** It hangs here, so after Ctrl-C:
^C
signal (2): Interrupt
in expression starting at /home/brian/AlphaZero.jl/scripts/alphazero.jl:70
epoll_pwait at /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/epoll_pwait.c:42
uv__io_poll at /workspace/srcdir/libuv/src/unix/linux-core.c:270
uv_run at /workspace/srcdir/libuv/src/unix/core.c:359
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:448
poptaskref at ./task.jl:660
wait at ./task.jl:667
wait at ./condition.jl:106
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
_wait at ./task.jl:238
sync_end at ./task.jl:278
macro expansion at ./task.jl:319 [inlined]
macro expansion at /home/brian/AlphaZero.jl/src/mcts.jl:427 [inlined]
macro expansion at ./util.jl:212 [inlined]
explore_async! at /home/brian/AlphaZero.jl/src/mcts.jl:426
explore! at /home/brian/AlphaZero.jl/src/mcts.jl:452 [inlined]
think at /home/brian/AlphaZero.jl/src/play.jl:176 [inlined]
#play_game#90 at /home/brian/AlphaZero.jl/src/play.jl:246
#play_game at ./none:0 [inlined]
#pit#93 at /home/brian/AlphaZero.jl/src/play.jl:296
unknown function (ip: 0x7efca1f99dd9)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2141 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
#pit at ./none:0
unknown function (ip: 0x7efca1f99a4a)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2141 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
macro expansion at /home/brian/AlphaZero.jl/src/benchmark.jl:111 [inlined]
macro expansion at ./util.jl:288 [inlined]
run at /home/brian/AlphaZero.jl/src/benchmark.jl:110
run_duel at /home/brian/AlphaZero.jl/src/ui/session.jl:252
run_benchmark at /home/brian/AlphaZero.jl/src/ui/session.jl:275
zeroth_iteration! at /home/brian/AlphaZero.jl/src/ui/session.jl:285
#Session#126 at /home/brian/AlphaZero.jl/src/ui/session.jl:356
Type at ./none:0
unknown function (ip: 0x7efca1f42f79)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2141 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1631 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:328
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:417
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:368 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:778
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:888
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7efcbc3d6c0f)
unknown function (ip: 0x7)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:897
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:814
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:873
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:878
include at ./boot.jl:328 [inlined]
include_relative at ./loading.jl:1105
include at ./Base.jl:31
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
exec_options at ./client.jl:287
_start at ./client.jl:460
jfptr__start_2084.clone_1 at /opt/julia-1.3.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
unknown function (ip: 0x401931)
unknown function (ip: 0x401533)
__libc_start_main at /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
unknown function (ip: 0x4015d4)
unknown function (ip: 0xffffffffffffffff)
Allocations: 159067857 (Pool: 159028147; Big: 39710); GC: 99
CuArrays.jl SplittingPool statistics:
Right now, a lot of the code uses custom logging solutions.
We should use the Logging standard library and ProgressLogging.jl instead. This would make the code simpler and allow custom logging backends other than ANSI terminals (e.g. JuliaHub).
Hello Jonathan,
thanks for this great project.
I would like to dive deeper into it, but I have a problem with Pkg...
julia> import Pkg; Pkg.add("AlphaZero")
Updating registry at C:\Users\H\.juliapro\JuliaPro_v1.4.2-1\registries\JuliaPro
ERROR: The following package names could not be resolved:
Do you have any helping advice?
Question from a beginner: I am wondering how to get a trained player post research phase with AlphaZero.jl to be used for inference in production phase, through a javascript web application that end users would run in their browser?
Is there an equivalent of importing a Keras network into TensorFlow.js that could leverage a Knet or Flux network trained with AlphaZero.jl?
Thanks!
I am currently implementing a board game called Tak. In this game it is possible to move stones around, so a stone can be moved back and forth. Theoretically, a terminal state (at least a draw) is reachable from every state by choosing the right actions. In practice, the MCTS decides to loop infinitely, resulting in:
Initializing a new AlphaZero environment
Initial report
Number of network parameters: 159,457
Number of regularized network parameters: 156,736
Memory footprint per MCTS node: 24056 bytes
Running benchmark: AlphaZero against MCTS (1000 rollouts)
StackOverflowError:
Stacktrace:
[1] check_win(board::Array{Union{Nothing, Tuple{Main.tak.TakEnv.Stone, Main.tak.TakEnv.Player}}, 3}, active_player::Main.tak.TakEnv.Player)
@ Main.tak.TakEnv ~/Programming/tak/src/TakEnv.jl:622
[2] play!(g::Main.tak.TakInterface.TakGame, action_idx::Int64)
@ Main.tak.TakInterface ~/Programming/tak/src/TakInterface.jl:81
[3] run_simulation!(env::AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}, game::Main.tak.TakInterface.TakGame; η::Vector{Float64}, root::Bool)
@ AlphaZero.MCTS ~/.julia/packages/AlphaZero/eAGva/src/mcts.jl:214
[4] run_simulation!(env::AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}, game::Main.tak.TakInterface.TakGame; η::Vector{Float64}, root::Bool) (repeats 11808 times)
@ AlphaZero.MCTS ~/.julia/packages/AlphaZero/eAGva/src/mcts.jl:218
[5] explore!(env::AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}, game::Main.tak.TakInterface.TakGame, nsims::Int64)
@ AlphaZero.MCTS ~/.julia/packages/AlphaZero/eAGva/src/mcts.jl:243
[6] think(p::MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}}, game::Main.tak.TakInterface.TakGame)
@ AlphaZero ~/.julia/packages/AlphaZero/eAGva/src/play.jl:198
[7] think
@ ~/.julia/packages/AlphaZero/eAGva/src/play.jl:259 [inlined]
[8] play_game(gspec::Main.tak.TakInterface.TakSpec, player::TwoPlayers{MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.Batchifier.BatchedOracle{AlphaZero.Batchifier.var"#6#7"}}}, MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}}}; flip_probability::Float64)
@ AlphaZero ~/.julia/packages/AlphaZero/eAGva/src/play.jl:308
[9] (::AlphaZero.var"#simulate_game#70"{TwoPlayers{MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.Batchifier.BatchedOracle{AlphaZero.Batchifier.var"#6#7"}}}, MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}}}, AlphaZero.Benchmark.var"#5#9"{ProgressMeter.Progress}, Simulator{AlphaZero.Benchmark.var"#4#8"{Env{Main.tak.TakInterface.TakSpec, SimpleNet, Tuple{BitVector, Main.tak.TakEnv.Player}}, AlphaZero.Benchmark.Duel}, AlphaZero.Benchmark.var"#net#6"{Env{Main.tak.TakInterface.TakSpec, SimpleNet, Tuple{BitVector, Main.tak.TakEnv.Player}}, AlphaZero.Benchmark.Duel}, typeof(record_trace)}, Main.tak.TakInterface.TakSpec, SimParams})(sim_id::Int64)
@ AlphaZero ~/.julia/packages/AlphaZero/eAGva/src/simulations.jl:232
[10] macro expansion
@ ~/.julia/packages/AlphaZero/eAGva/src/util.jl:187 [inlined]
[11] (::AlphaZero.Util.var"#9#10"{AlphaZero.var"#68#69"{AlphaZero.Benchmark.var"#5#9"{ProgressMeter.Progress}, Simulator{AlphaZero.Benchmark.var"#4#8"{Env{Main.tak.TakInterface.TakSpec, SimpleNet, Tuple{BitVector, Main.tak.TakEnv.Player}}, AlphaZero.Benchmark.Duel}, AlphaZero.Benchmark.var"#net#6"{Env{Main.tak.TakInterface.TakSpec, SimpleNet, Tuple{BitVector, Main.tak.TakEnv.Player}}, AlphaZero.Benchmark.Duel}, typeof(record_trace)}, Main.tak.TakInterface.TakSpec, SimParams, AlphaZero.var"#48#49"{Channel{Any}}, AlphaZero.var"#make#65"{Channel{Any}}}, UnitRange{Int64}, typeof(vcat), ReentrantLock})()
@ AlphaZero.Util ~/.julia/packages/ThreadPools/P1NVV/src/macros.jl:259
The part of the stack trace above [4] (which is in my implementation) varies; the cause is likely run_simulation!, which ends up in infinite recursion. From my understanding, UCT should place a lower weight on states that have been visited many times and thus should, by exclusion, eventually perform actions that lead to new states and ultimately to a terminal state. Since a normal game is roughly 100 moves deep, this mechanism should have kicked in well before 11,808 recursions. If I prohibit movement actions altogether, training works fine. I am not sure how to approach this problem; does anyone have experience with this?
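One common workaround for games with potentially unbounded move sequences (a sketch with our own names, not a fix inside AlphaZero.jl) is to declare the game drawn after a fixed move budget, so every rollout is guaranteed to reach a terminal state. The move counter then becomes part of the game state:

```julia
# Illustrative bound, well above a normal ~100-move game.
const MAX_MOVES = 200

mutable struct TakLikeGame
    nmoves::Int
    winner::Union{Nothing, Int}
end

# Terminal if someone won, or if the move budget is exhausted (a draw).
game_terminated(g::TakLikeGame) = !isnothing(g.winner) || g.nmoves >= MAX_MOVES

function play!(g::TakLikeGame)
    g.nmoves += 1  # a real implementation would also apply the action
    return g
end

g = TakLikeGame(0, nothing)
while !game_terminated(g)
    play!(g)
end
@assert game_terminated(g)
```

This mirrors how chess handles the same issue with its fifty-move rule. Note that if you cap the game this way, the move count should also be visible to the network (e.g. as an input plane), since it changes the value of a position.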
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
I upgraded my NVIDIA drivers to 455.28 and I use CUDA.jl v2.0.1.
Running the connect-four example on AlphaZero.jl master with JULIA_CUDA_VERSION 11.1, I get the following error message:
ERROR: LoadError: LoadError: InitError: CUDA.jl does not yet support CUDA with nvdisasm 11.1.74; please file an issue.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] parse_toolkit_version(::String, ::String) at /home/martijn/.julia/packages/CUDA/dZvbp/deps/discovery.jl:348
[3] use_local_cuda() at /home/martijn/.julia/packages/CUDA/dZvbp/deps/bindeps.jl:196
[4] __init_dependencies__() at /home/martijn/.julia/packages/CUDA/dZvbp/deps/bindeps.jl:359
[5] __runtime_init__() at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:110
[6] (::CUDA.var"#609#610"{Bool})() at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:32
[7] lock(::CUDA.var"#609#610"{Bool}, ::ReentrantLock) at ./lock.jl:161
[8] _functional(::Bool) at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:26
[9] functional(::Bool) at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:19
[10] functional at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:18 [inlined]
[11] __init__() at /home/martijn/.julia/packages/Knet/8aEsn/src/Knet.jl:26
[12] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:697
[13] _require_from_serialized(::String) at ./loading.jl:749
[14] _require(::Base.PkgId) at ./loading.jl:1040
[15] require(::Base.PkgId) at ./loading.jl:928
[16] require(::Module, ::Symbol) at ./loading.jl:923
[17] include(::Function, ::Module, ::String) at ./Base.jl:380
[18] include at ./Base.jl:368 [inlined]
[19] include(::String) at /home/martijn/AlphaZero.jl/src/AlphaZero.jl:6
[20] top-level scope at /home/martijn/AlphaZero.jl/src/AlphaZero.jl:71
[21] include(::Function, ::Module, ::String) at ./Base.jl:380
[22] include(::Module, ::String) at ./Base.jl:368
[23] top-level scope at none:2
[24] eval at ./boot.jl:331 [inlined]
[25] eval(::Expr) at ./client.jl:467
[26] top-level scope at ./none:3
during initialization of module Knet
in expression starting at /home/martijn/AlphaZero.jl/src/networks/knet.jl:13
in expression starting at /home/martijn/AlphaZero.jl/src/AlphaZero.jl:71
ERROR: LoadError: Failed to precompile AlphaZero [8ed9eb0b-7496-408d-8c8b-2119aeea02cd] to /home/martijn/.julia/compiled/v1.5/AlphaZero/zTkjo_5lnvn.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
[3] _require(::Base.PkgId) at ./loading.jl:1030
[4] require(::Base.PkgId) at ./loading.jl:928
[5] require(::Module, ::Symbol) at ./loading.jl:923
[6] include(::Function, ::Module, ::String) at ./Base.jl:380
[7] include(::Module, ::String) at ./Base.jl:368
[8] exec_options(::Base.JLOptions) at ./client.jl:296
[9] _start() at ./client.jl:506
in expression starting at /home/martijn/AlphaZero.jl/scripts/alphazero.jl:17
Is this an AlphaZero.jl related issue? Is it best to wait for an AlphaZero.jl update?
Any help is appreciated. Thanks.
(Reason for updating to 455.28 and CUDA.jl v2.0.1: #21 (comment) and JuliaGPU/CUDA.jl#447 (comment) )
Please note that it is not necessary to have multiple distributed workers to exploit several CPU cores (every worker spawns several threads anyway).
Originally posted by @jonathan-laurent in #18 (comment)
I can see on my platform monitoring that during training, only 1 vCPU out of 8 is used. Apart from setting the num_workers parameters in params.jl (which I left at the default value of 128 for all occurrences), is there something to be done in order to effectively use multiple cores?
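One thing worth checking (our own suggestion, not from this thread): Julia itself runs with a single thread unless told otherwise, regardless of how many workers a program spawns, so the launch command controls how many cores Julia can use at all. A sketch, reusing the training command from these issues:

```shell
# Expose several CPU threads to Julia via the environment variable...
JULIA_NUM_THREADS=8 julia --project --color=yes scripts/alphazero.jl --game connect-four train
# ...or, on Julia >= 1.5, equivalently via the -t flag:
julia -t 8 --project --color=yes scripts/alphazero.jl --game connect-four train
```

Whether a given AlphaZero.jl version actually distributes simulation work across those threads is a separate question, but without one of these settings no Julia program can use more than one.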
[removed]
(I was asking how to add a chess implementation, then I found the guidelines about adding a new game)
I first tried the instructions in the README, but they fail due to hardcoded paths in the Manifest.
The package ships a Manifest as well as a Project.toml. I think it would be sufficient to ship just the Project.toml. In fact, I had to delete the Manifest to get dev to work correctly: the Manifest contains some paths hardcoded to the author's computer.
And another issue occurred. Wrong CUDA version?
UndefVarError: lib not defined
Stacktrace:
[1] maximum(::CUDA.CuArray{Float32,2}; dims::Int64) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\Knet\exwCE\src\cuarrays\reduction.jl:56
[2] softmax(::CUDA.CuArray{Float32,2}; dims::Int64) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\NNlib\PI8Xh\src\softmax.jl:29
[3] softmax(::CUDA.CuArray{Float32,2}) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\NNlib\PI8Xh\src\softmax.jl:29
[4] applychain(::Tuple{typeof(NNlib.softmax)}, ::CUDA.CuArray{Float32,2}) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\Flux\IjMZL\src\layers\basic.jl:36 (repeats 5 times)
[5] (::Flux.Chain{Tuple{Flux.Conv{2,4,typeof(identity),CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,1}},Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},typeof(Flux.flatten),Flux.Dense{typeof(identity),CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1}},typeof(NNlib.softmax)}})(::CUDA.CuArray{Float32,4}) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\Flux\IjMZL\src\layers\basic.jl:38
[6] forward(::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}) at D:\test2\src\networks\flux.jl:163
[7] evaluate(::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,2}) at D:\test2\src\networks\network.jl:253
[8] evaluate_batch(::AlphaZero.FluxLib.ResNet{Game}, ::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}) at D:\test2\src\networks\network.jl:283
[9] fill_and_evaluate(::AlphaZero.FluxLib.ResNet{Game}, ::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}; batch_size::Int64, fill::Bool) at D:\test2\src\play.jl:346
[10] (::AlphaZero.var"#101#102"{Bool,AlphaZero.FluxLib.ResNet{Game},Int64})(::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}) at D:\test2\src\play.jl:388
[11] macro expansion at D:\test2\src\batchifier.jl:47 [inlined]
[12] macro expansion at D:\test2\src\util.jl:56 [inlined]
[13] (::AlphaZero.Batchifier.var"#1#3"{AlphaZero.var"#101#102"{Bool,AlphaZero.FluxLib.ResNet{Game},Int64},Int64,Channel{Any}})() at .\threadingconstructs.jl:126
Hello! I'm new to DL/RL and excited to try AlphaZero for my project, which is basically training a single agent for a new game.
However, following your instructions to train a Connect Four agent right after installation, I get the following errors. They are apparently not blocking, since training actually runs, but quite slowly, so I'm not sure whether the GPU is being used or not.
Should I do something to update CUDAnative?
The problem is that CUDAnative seems to be deprecated: https://github.com/JuliaGPU/CUDAnative.jl
I can't figure out whether there is a real problem here or AlphaZero manages to work as expected despite these precompilation errors.
Can you enlighten me? Thank you!
(base) ubuntu@bonbonrectangle-dev:~/AlphaZero.jl$ julia --project --color=yes scripts/alphazero.jl --game connect-four train
ERROR: LoadError: LoadError: LoadError: UndefVarError: AddrSpacePtr not defined
Stacktrace:
[1] getproperty(::Module, ::Symbol) at ./Base.jl:26
[2] top-level scope at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/device/cuda/wmma.jl:56
[3] include(::Function, ::Module, ::String) at ./Base.jl:380
[4] include at ./Base.jl:368 [inlined]
[5] include(::String) at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/CUDAnative.jl:1
[6] top-level scope at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/device/cuda.jl:14
[7] include(::Function, ::Module, ::String) at ./Base.jl:380
[8] include at ./Base.jl:368 [inlined]
[9] include(::String) at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/CUDAnative.jl:1
[10] top-level scope at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/CUDAnative.jl:70
[11] include(::Function, ::Module, ::String) at ./Base.jl:380
[12] include(::Module, ::String) at ./Base.jl:368
[13] top-level scope at none:2
[14] eval at ./boot.jl:331 [inlined]
[15] eval(::Expr) at ./client.jl:467
[16] top-level scope at ./none:3
in expression starting at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/device/cuda/wmma.jl:55
in expression starting at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/device/cuda.jl:14
in expression starting at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/CUDAnative.jl:70
ERROR: LoadError: Failed to precompile CUDAnative [be33ccc6-a3ff-5ff2-a52e-74243cff1e17] to /home/ubuntu/.julia/compiled/v1.5/CUDAnative/4Zu2W_yJnFE.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
[3] _require(::Base.PkgId) at ./loading.jl:1030
[4] require(::Base.PkgId) at ./loading.jl:928
[5] require(::Module, ::Symbol) at ./loading.jl:923
[6] include(::Function, ::Module, ::String) at ./Base.jl:380
[7] include(::Module, ::String) at ./Base.jl:368
[8] top-level scope at none:2
[9] eval at ./boot.jl:331 [inlined]
[10] eval(::Expr) at ./client.jl:467
[11] top-level scope at ./none:3
in expression starting at /home/ubuntu/.julia/packages/CuArrays/YFdj7/src/CuArrays.jl:3
┌ Warning: CUDA is installed, but CuArrays.jl fails to load
│ exception =
│ Failed to precompile CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae] to /home/ubuntu/.julia/compiled/v1.5/CuArrays/7YFE0_yJnFE.ji.
│ Stacktrace:
│ [1] error(::String) at ./error.jl:33
│ [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
│ [3] _require(::Base.PkgId) at ./loading.jl:1030
│ [4] require(::Base.PkgId) at ./loading.jl:928
│ [5] require(::Module, ::Symbol) at ./loading.jl:923
│ [6] top-level scope at /home/ubuntu/.julia/packages/Knet/bTNMd/src/cuarray.jl:5
│ [7] include(::Function, ::Module, ::String) at ./Base.jl:380
│ [8] include at ./Base.jl:368 [inlined]
│ [9] include(::String) at /home/ubuntu/.julia/packages/Knet/bTNMd/src/Knet.jl:1
│ [10] top-level scope at /home/ubuntu/.julia/packages/Knet/bTNMd/src/Knet.jl:116
│ [11] include(::Function, ::Module, ::String) at ./Base.jl:380
│ [12] include(::Module, ::String) at ./Base.jl:368
│ [13] top-level scope at none:2
│ [14] eval at ./boot.jl:331 [inlined]
│ [15] eval(::Expr) at ./client.jl:467
│ [16] top-level scope at ./none:3
│ [17] eval(::Module, ::Any) at ./boot.jl:331
│ [18] exec_options(::Base.JLOptions) at ./client.jl:272
│ [19] _start() at ./client.jl:506
└ @ Knet ~/.julia/packages/Knet/bTNMd/src/cuarray.jl:8
If Oracle has only one function, evaluate, why not just replace it with a Function? That way, you'll have to maintain and document less code, potential users will not have to understand Julian object-oriented programming to implement a new heuristic, and you will not have to explain what an Oracle is. (I have learned this lesson over and over again trying to write similar code for others to use 😃.)
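The suggested simplification can be sketched as follows (all names here are hypothetical, not AlphaZero.jl's API): the player stores any callable returning a (policy, value) pair, instead of a subtype of an abstract Oracle type with a single evaluate method.

```julia
# A player parameterized by a plain function instead of an Oracle subtype.
struct FnPlayer
    oracle::Function  # state -> (policy::Vector{Float64}, value::Float64)
end

evaluate(p::FnPlayer, state) = p.oracle(state)

# A rollout-style heuristic is then just a closure:
uniform_oracle(nactions) = _state -> (fill(1 / nactions, nactions), 0.0)

p = FnPlayer(uniform_oracle(7))
policy, value = evaluate(p, nothing)
```

One trade-off worth noting: a bare `Function` field is not concretely typed, so dispatch-heavy inner loops may prefer a type parameter (`struct FnPlayer{F}; oracle::F; end`), which keeps the "just pass a function" ergonomics while remaining type-stable.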
I noticed yesterday that I cannot get past iteration 1. So today I ran training a few times in a row, and the checkpoint evaluation after iteration 1 always fails. The strange thing is that it sometimes fails after 10% completion and sometimes after 70%.
I'm trying to come up with a simple way to store each player's score inside the state.
I can't seem to find a good way besides storing both players' scores in it, i.e.:
current_player_score
other_player_score
and swapping between them...
Is there a smarter way to do this? I would rather each player not be aware of the other player's score, but short of going via the move history I can't figure out a good way to do this.
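For what it's worth, the swap-on-turn-change encoding described above is a common pattern for two-player games, since it keeps the state in the current player's perspective. A minimal sketch (all names ours, illustrative only):

```julia
# Both scores stored from the perspective of the player to move.
mutable struct ScoreState
    cur_score::Int    # score of the player about to move
    other_score::Int  # score of the opponent
end

function end_turn!(s::ScoreState, gained::Int)
    s.cur_score += gained
    # change of perspective: the opponent becomes the current player
    s.cur_score, s.other_score = s.other_score, s.cur_score
    return s
end

s = ScoreState(0, 0)
end_turn!(s, 3)  # player 1 scores 3; now player 2 to move
end_turn!(s, 5)  # player 2 scores 5; now player 1 to move
```

If the opponent's score should stay hidden from each player, the swap can live in the full game state while the per-player observation exposes only `cur_score`.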
I could not find v0.4.0 in the repo. Is there a pretrained model (Connect Four) I could play with?
I tried to train on my custom game but I always get the same assertion error.
ERROR: LoadError: AssertionError: iszero(π[(.~)(symmask)])
Stacktrace:
[1] apply_symmetry(::Type{Game}, ::AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}, ::Tuple{Array{Union{Nothing, Bool},1},Array{Int64,1}}) at C:\Users\dave7895\AlphaZero.jl\src\memory.jl:94
[2] (::AlphaZero.var"#28#31"{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},DataType})(::Tuple{Array{Union{Nothing, Bool},1},Array{Int64,1}}) at .\none:0
[3] iterate at .\generator.jl:47 [inlined]
[4] iterate(::Base.Iterators.Flatten{Base.Generator{Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1},AlphaZero.var"#29#30"{DataType}}}, ::Tuple{Int64,Base.Generator{Array{Tuple{Array{Union{Nothing, Bool},1},Array{Int64,1}},1},AlphaZero.var"#28#31"{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},DataType}},Int64}) at .\iterators.jl:1058
[5] grow_to!(::Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1}, ::Base.Iterators.Flatten{Base.Generator{Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1},AlphaZero.var"#29#30"{DataType}}}, ::Tuple{Int64,Base.Generator{Array{Tuple{Array{Union{Nothing, Bool},1},Array{Int64,1}},1},AlphaZero.var"#28#31"{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},DataType}},Int64}) at .\array.jl:756
[6] grow_to!(::Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1}, ::Base.Iterators.Flatten{Base.Generator{Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1},AlphaZero.var"#29#30"{DataType}}}) at .\array.jl:729
[7] _collect at .\array.jl:639 [inlined]
[8] collect at .\array.jl:603 [inlined]
[9] augment_with_symmetries at C:\Users\dave7895\AlphaZero.jl\src\memory.jl:101 [inlined]
[10] learning_step!(::Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}, ::Session{Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}}) at C:\Users\dave7895\AlphaZero.jl\src\training.jl:158
[11] macro expansion at .\util.jl:308 [inlined]
[12] macro expansion at C:\Users\dave7895\AlphaZero.jl\src\report.jl:231 [inlined]
[13] train!(::Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}, ::Session{Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}}) at C:\Users\dave7895\AlphaZero.jl\src\training.jl:273
[14] resume!(::Session{Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}}) at C:\Users\dave7895\AlphaZero.jl\src\ui\session.jl:452
[15] top-level scope at C:\Users\dave7895\AlphaZero.jl\scripts\alphazero.jl:82
[16] include(::Module, ::String) at .\Base.jl:377
[17] exec_options(::Base.JLOptions) at .\client.jl:288
[18] _start() at .\client.jl:484
in expression starting at C:\Users\dave7895\AlphaZero.jl\scripts\alphazero.jl:81
Hi @jonathan-laurent ,
This project is really awesome!
Since you mentioned it in the doc Develop-support-for-a-more-general-game-interface, I'd like to write down some thoughts and discuss them with you.
Here I'll mainly focus on the Game Interface and MCTS parts. Along the way, the design differences between AlphaZero.jl, ReinforcementLearningBase.jl, and OpenSpiel are also listed.
To implement a new game, we have some assumptions according to the Game Interface:
If I understand correctly, the two main concepts are Game and Board.
In OpenSpiel, those two concepts are almost the same (the Board is named state in OpenSpiel), except that the state is not contained in the Game; the Game is just a static description (history lives in the state, not the game).
In RLBase, the Game is treated as an AbstractEnvironment and the Board is just the observation of the env from the perspective of a player.
In this view, most of the interfaces in this package align with those in RLBase. A detailed mapping follows:
- AbstractGame -> AbstractEnv
- board -> observe
- Action -> get_action_space
- white_playing -> get_current_player
- white_reward -> get_reward
- board_symmetric -> missing in RLBase (a new trait would be needed to specify whether the state of a game is symmetric or not)
- available_actions -> get_legal_actions
- actions_mask -> get_legal_actions_mask
- play! -> (env::AbstractEnv)(action)
- heuristic_value -> missing in RLBase
- vectorize_board -> get_state
- symmetries -> missing in RLBase
- game_terminated -> get_terminal
- num_actions -> length(action_space)
- board_dim -> size(rand(observation_space))
- random_symmetric_state -> missing in RLBase

I think it won't be very difficult to adapt this package to use OpenSpiel.jl or even the interfaces in RLBase.
I really like the implementation of async MCTS in this package. I would like to see it separated into a standalone package.
The naming of some types is slightly strange to me. For example, there's an Oracle{Game} abstract type. If I understand it correctly, it is used in the rollout step to select an action. The first time I saw the name Oracle, I supposed its subtypes must implement some smart algorithms 😆. But in MCTS it is usually a lightweight method, am I right?
The implementation of Worker assumes that there are only two players in the game. Do you have any idea how to expand it to apply to multi-player games?
At first glance, I thought the async MCTS used some kind of root-level or tree-level parallelization. But I can't find multi-threading used anywhere in the code. It seems that the async part mainly collects a batch of states and gets the evaluation results once for all of them. Am I right here? It would be great if you could share some implementation considerations here 😄
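For readers following along, the batching pattern described in this question can be illustrated with a tiny self-contained toy (names are ours, not AlphaZero.jl's): several cooperative tasks each submit a state plus a reply channel to one server task, which blocks until it has a full batch and then answers everyone with a single "batched" evaluation. No OS threads are involved; the concurrency is purely task-based.

```julia
# Server: gather a full batch, evaluate once, dispatch the replies.
function inference_server(requests::Channel, batch_size::Int)
    batch = [take!(requests) for _ in 1:batch_size]
    states = first.(batch)
    results = 2 .* states  # stand-in for a single batched network call
    foreach((req, r) -> put!(req[2], r), batch, results)
end

requests = Channel{Tuple{Int, Channel{Int}}}(32)
server = @async inference_server(requests, 4)

# Workers: each submits its state and blocks on its private reply channel,
# yielding control so the other workers can fill up the batch.
workers = [@async begin
    reply = Channel{Int}(1)
    put!(requests, (i, reply))
    take!(reply)
end for i in 1:4]

results = fetch.(workers)
```

The point of the pattern is that the expensive evaluation runs once per batch instead of once per simulation, which is exactly what makes GPU inference worthwhile during self-play.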
Also cc @jbrea 😄
Hello Jonathan,
Could you please explain in a few words what the function test_symmetry(Game, state, (symstate, aperm)) in game.jl is meant to do?
I'm quite new to Julia, and I must confess I have a hard time working out what it does.
Currently, it's where test_game fails for my game.
Thank you!
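For reference, here is my current guess of the property being checked (a hedged sketch, not the actual test_symmetry code): a symmetry consists of a transformed state symstate plus a permutation aperm of the action indices, and the two must stay consistent, e.g. the legal actions of the symmetric state should be the permuted legal actions of the original.

```julia
# Hedged sketch of a symmetry-consistency check, illustrated on a 1D "board"
# whose symmetry is a left-right flip. This is illustrative code only.

state = [1, 0, 0]            # toy board: 1 = occupied, 0 = empty
symstate = reverse(state)    # the symmetric state
aperm = [3, 2, 1]            # action i in `state` maps to aperm[i] in `symstate`

legal_mask(s) = s .== 0      # an action is legal on an empty cell

mask = legal_mask(state)
symmask = legal_mask(symstate)

# Consistency property: permuting the symmetric mask recovers the original.
@assert symmask[aperm] == mask
println("symmetry consistent")
```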
The only reason you may want to spawn several Julia processes on the same machine is to use multiple GPUs.
Originally posted by @jonathan-laurent in #18 (comment)
To make sure I understand correctly:
It's possible to use multiple GPUs on the same machine by spawning several Julia processes. But is spawning several processes actually required to use multiple GPUs?
This feature will be officially released with v0.4, right?
Is it possible to skip the benchmark without having to implement one?
I am trying to implement the game of Go (with only limited success...) and I need to test the game frequently. Is there any way to skip the initial benchmark when only using play?
The current networks library is based on Knet. There used to be one based on Flux, which could be set up as the default by changing this line:
Line 69 in 64ab68b
The Flux backend is currently broken, but it should not be hard to fix as soon as FluxML/Flux.jl#1044 is included in a new release.
Thank you for your hard work. I'm trying to run Connect Four example according to the manual:
git clone --branch v0.2.1 https://github.com/jonathan-laurent/AlphaZero.jl.git
cd AlphaZero.jl
julia --project -e "import Pkg; Pkg.instantiate()"
julia --project --color=yes scripts/alphazero.jl --game connect-four train
The training sequence worked for me. On the other hand, the following command, described in the Examining the current agent section, fails:
julia --project --color=yes scripts/alphazero.jl --game connect-four explore
with the error message below:
CuArrays.jl SplittingPool statistics:
- 0 pool allocations: 0 bytes in 0.0s
- 0 CUDA allocations: 0 bytes in 0.0s
CuArrays.jl SplittingPool statistics:
- 0 pool allocations: 0 bytes in 0.0s
- 0 CUDA allocations: 0 bytes in 0.0s
Loading environment
Loading network from: sessions/connect-four/bestnn.data
Loading network from: sessions/connect-four/curnn.data
Loading memory from: sessions/connect-four/mem.data
Loaded iteration counter from: sessions/connect-four/iter.txt
Starting interactive exploration
Red plays:
1 2 3 4 5 6 7
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
ERROR: LoadError: MethodError: no method matching think(::MctsPlayer{Game,AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}}, ::Game, ::Int64)
Closest candidates are:
think(::MctsPlayer, ::Any) at /home/terasaki/tmp/AlphaZero.jl/src/play.jl:214
think(::AbstractPlayer, ::Any) at /home/terasaki/tmp/AlphaZero.jl/src/play.jl:20
think(::RandomPlayer, ::Any) at /home/terasaki/tmp/AlphaZero.jl/src/play.jl:71
...
Stacktrace:
[1] state_statistics(::Game, ::MctsPlayer{Game,AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}}, ::Int64, ::AlphaZero.MemoryBuffer{StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}) at /home/terasaki/tmp/AlphaZero.jl/src/ui/explorer.jl:62
[2] compute_and_print_state_statistics(::Explorer{Game}) at /home/terasaki/tmp/AlphaZero.jl/src/ui/explorer.jl:151
[3] start_explorer(::Explorer{Game}) at /home/terasaki/tmp/AlphaZero.jl/src/ui/explorer.jl:294
[4] start_explorer(::Session{Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}}) at /home/terasaki/tmp/AlphaZero.jl/src/ui/session.jl:440
[5] top-level scope at /home/terasaki/tmp/AlphaZero.jl/scripts/alphazero.jl:77
[6] include(::Module, ::String) at ./Base.jl:377
[7] exec_options(::Base.JLOptions) at ./client.jl:288
[8] _start() at ./client.jl:484
in expression starting at /home/terasaki/tmp/AlphaZero.jl/scripts/alphazero.jl:74
CuArrays.jl SplittingPool statistics:
- 0 pool allocations: 0 bytes in 0.0s
- 0 CUDA allocations: 0 bytes in 0.0s
Here is my environment with GPU 1080Ti:
julia> versioninfo()
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Genuine Intel(R) CPU 0000 @ 2.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-8.0.1 (ORCJIT, broadwell)
Environment:
JULIA_NUM_THREADS = 40
Any ideas?
Probably I am misunderstanding how to implement a game for AlphaZero.jl, but I am having a hard time understanding how I can enumerate possible actions with only the static data of GameSpec.
From what I understand, the function actions(::AbstractGameSpec)
should return all possible actions of the game, while actions_mask(::AbstractGameEnv)
returns a boolean mask indicating which of those actions are legal in the current state.
A game with a very limited number of possible actions is fairly easy to implement in this setup. But how would I, for example, implement chess? For chess, the state determines which actions are possible, and since there are many possible actions, it seems inefficient to enumerate them all and pass a boolean mask indicating which are playable. But then again, I am probably misunderstanding something.
Hope someone can get me on track. Would really love to use this package!
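To make my current understanding concrete, here is the fixed-action-space-plus-mask pattern as I read it (toy code, not AlphaZero.jl's API). As far as I know this is standard for AlphaZero-style methods because the policy network needs a fixed-size output; for chess, the original AlphaZero paper uses a fixed move encoding of 8×8×73 = 4672 entries with exactly this kind of masking.

```julia
# Minimal illustration of a fixed action space plus a per-state legality
# mask, on tic-tac-toe. Names here are made up for the example.

const ALL_ACTIONS = 1:9          # one action per cell, fixed for the game

struct State
  board::Vector{Int}             # 0 = empty, 1/2 = players
end

actions_mask(s::State) = s.board .== 0   # legal iff the cell is empty

# Masked policy: zero out illegal moves, then renormalize.
function masked_policy(policy::Vector{Float64}, s::State)
  masked = policy .* actions_mask(s)
  masked ./ sum(masked)
end

s = State([1, 0, 0, 0, 2, 0, 0, 0, 0])
p = fill(1 / length(ALL_ACTIONS), length(ALL_ACTIONS))  # uniform prior
println(masked_policy(p, s))     # zeros at cells 1 and 5, 1/7 elsewhere
```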
Hi Jonathan!
I am a lead contributor to OpenSpiel, a framework for RL in games. OpenSpiel is a general game framework and we have a Julia API thanks to @findmyway. Someone from our team, @michalsustr, pointed me to your project. Looks great!
We have many games implemented and tested; I wonder if it would be possible to add support for them in your project? For the next two weeks, we are making a concentrated effort to add functionality to OpenSpiel, so it might be a good time to look at this if anybody is interested (see google-deepmind/open_spiel#251).
Curious to hear your thoughts on this!
Hello,
I am interested in implementing supervised learning in AlphaZero.jl. Since it's mentioned on the contribution page, I assume it hasn't been implemented yet? Has anyone already thought about this?
I would like to implement the following features:
Does anyone have an idea where I should best start?
Hey - you previously mentioned you were working on this; any ETA as to when you think it will be completed?
If not, and if I were to attempt this myself, do you have any pointers or suggestions as to how you would approach it?
I'd like to simplify the Minmax and MCTS players' move selection by cutting branches as early as possible, namely when providing the list of available actions to these players' think() function. I would need a Game instance to know its players' types so that it could adapt actions_mask accordingly. I can't seem to find a way to get this data from a Game instance, nor to store it at game creation.
Maybe that's because AlphaZero.jl strictly requires that there be no coupling between games and players?
Or maybe I should instead use the GameInterface.heuristic_value() function to indirectly cut branches? But then it would only work for Minmax players, since MCTS players don't use the heuristic.
May I ask for advice on how to proceed?
(Don't hesitate to tell me if I'm asking too many questions or if I should ask them somewhere else.)
The reason I want to do this in the first place is that there are so many possible moves in each state of my game that even just testing whether my implementation is correct takes ages, let alone running big benchmarks. :-S
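For reference, the kind of depth-limited cutting I have in mind with heuristic_value looks roughly like this (a generic sketch on a toy game, not the package's MinMax player):

```julia
# Generic depth-limited negamax: below the depth cutoff we stop expanding
# and score the position with a heuristic, which bounds the search cost
# when the branching factor is large.

# Toy game: a pile of `n` stones, take 1 or 2, taking the last stone wins.
available_actions(n) = n >= 2 ? [1, 2] : [1]
heuristic(n) = n % 3 == 0 ? -1.0 : 1.0   # known theory for this toy game

function negamax(n, depth)
  n == 0 && return -1.0          # previous player took the last stone: loss
  depth == 0 && return heuristic(n)
  maximum(-negamax(n - a, depth - 1) for a in available_actions(n))
end

println(negamax(4, 10))  # 1.0: a pile of 4 is a win for the player to move
```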
It seems this repository is missing a website link.
https://jonathan-laurent.github.io/AlphaZero.jl/stable/ or https://jonathan-laurent.github.io/AlphaZero.jl/dev/ are suitable candidates for that.
When training the connect four agent, the training process crashes with an out of memory error about every 24 hours and must then be restarted.
It would be interesting to see if this happens again after updating the dependencies and/or switching to Flux as a DL backend.
Configuration
Stacktrace
ERROR: LoadError: Out of GPU memory trying to allocate 21.000 MiB
Effective GPU memory usage: 99.73% (7.772 GiB/7.793 GiB)
CuArrays GPU memory usage: 6.767 GiB
SplittingPool usage: 2.207 GiB (2.179 GiB allocated, 28.807 MiB cached)
SplittingPool efficiency: 32.20% (2.179 GiB requested, 6.767 GiB allocated)
Stacktrace:
[1] alloc at /home/jonathan/.julia/packages/CuArrays/rNxse/src/memory.jl:162 [inlined]
[2] CuArrays.CuArray{UInt8,1,P} where P(::UndefInitializer, ::Tuple{Int64}) at /home/jonathan/.julia/packages/CuArrays/rNxse/src/array.jl:90
[3] CuArray at /home/jonathan/.julia/packages/CuArrays/rNxse/src/array.jl:98 [inlined]
[4] CuArray at /home/jonathan/.julia/packages/CuArrays/rNxse/src/array.jl:99 [inlined]
[5] KnetPtrCu(::Int64) at /home/jonathan/.julia/packages/Knet/FSBq5/src/cuarray.jl:90
[6] KnetPtr at /home/jonathan/.julia/packages/Knet/FSBq5/src/kptr.jl:102 [inlined]
[7] KnetArray at /home/jonathan/.julia/packages/Knet/FSBq5/src/karray.jl:82 [inlined]
[8] similar at /home/jonathan/.julia/packages/Knet/FSBq5/src/karray.jl:164 [inlined]
[9] similar at /home/jonathan/.julia/packages/Knet/FSBq5/src/karray.jl:167 [inlined]
[10] broadcasted(::typeof(+), ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,4}) at /home/jonathan/.julia/packages/Knet/FSBq5/src/binary.jl:37
[11] +(::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,4}) at /home/jonathan/.julia/packages/Knet/FSBq5/src/binary.jl:232
[12] forw(::Function, ::AutoGrad.Result{Knet.KnetArray{Float32,4}}, ::Vararg{AutoGrad.Result{Knet.KnetArray{Float32,4}},N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tup$
e{}}}) at /home/jonathan/.julia/packages/AutoGrad/pTNVv/src/core.jl:66
[13] forw at /home/jonathan/.julia/packages/AutoGrad/pTNVv/src/core.jl:65 [inlined]
[14] +(::AutoGrad.Result{Knet.KnetArray{Float32,4}}, ::AutoGrad.Result{Knet.KnetArray{Float32,4}}) at ./none:0
[15] (::AlphaZero.KNets.SkipConnection)(::AutoGrad.Result{Knet.KnetArray{Float32,4}}) at /home/jonathan/AlphaZero.jl/src/networks/knet/layers.jl:104
[16] (::AlphaZero.KNets.Chain)(::AutoGrad.Result{Knet.KnetArray{Float32,4}}) at /home/jonathan/AlphaZero.jl/src/networks/knet/layers.jl:19 (repeats 2 times)
[17] forward(::ResNet{Game}, ::Knet.KnetArray{Float32,4}) at /home/jonathan/AlphaZero.jl/src/networks/knet.jl:148
[18] evaluate(::ResNet{Game}, ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,2}) at /home/jonathan/AlphaZero.jl/src/networks/network.jl:285
[19] losses(::ResNet{Game}, ::LearningParams, ::Float32, ::Float32, ::Tuple{Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,4},Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,2},Knet.KnetArray{Float32$
2}}) at /home/jonathan/AlphaZero.jl/src/learning.jl:62
[20] (::AlphaZero.var"#loss#50"{AlphaZero.Trainer})(::Knet.KnetArray{Float32,2}, ::Vararg{Any,N} where N) at /home/jonathan/AlphaZero.jl/src/learning.jl:102
[21] (::Knet.var"#693#694"{Knet.Minimize{Base.Generator{Array{Tuple{Array{Float32,2},Array{Float32,4},Array{Float32,2},Array{Float32,2},Array{Float32,2}},1},AlphaZero.Util.var"#7#9"{AlphaZero.var"#47#51$
{AlphaZero.Trainer}}}},Tuple{Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,4},Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,2}}})() at /home/jonathan/.julia/packages/AutoG$
ad/pTNVv/src/core.jl:205
[22] differentiate(::Function; o::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/jonathan/.julia/packages/AutoGrad/pTNVv/src/core.jl:144
[23] differentiate at /home/jonathan/.julia/packages/AutoGrad/pTNVv/src/core.jl:135 [inlined]
[24] iterate at /home/jonathan/.julia/packages/Knet/FSBq5/src/train.jl:23 [inlined]
[25] iterate at ./iterators.jl:140 [inlined]
[26] iterate at ./iterators.jl:139 [inlined]
[27] train!(::AlphaZero.var"#49#53"{Array{Float32,1}}, ::ResNet{Game}, ::Adam, ::Function, ::Base.Generator{Array{Tuple{Array{Float32,2},Array{Float32,4},Array{Float32,2},Array{Float32,2},Array{Float32,2
}},1},AlphaZero.Util.var"#7#9"{AlphaZero.var"#47#51"{AlphaZero.Trainer}}}) at /home/jonathan/AlphaZero.jl/src/networks/knet.jl:119
[28] training_epoch!(::AlphaZero.Trainer) at /home/jonathan/AlphaZero.jl/src/learning.jl:113
[29] macro expansion at ./util.jl:302 [inlined]
[30] learning!(::Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}, ::Session{Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}}) at /home/jonathan/AlphaZero.jl/src/traini
ng.jl:165
[31] macro expansion at ./util.jl:302 [inlined]
[32] macro expansion at /home/jonathan/AlphaZero.jl/src/report.jl:241 [inlined]
[33] train!(::Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}, ::Session{Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}}) at /home/jonathan/AlphaZero.jl/src/training.
jl:266
[34] resume!(::Session{Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}}) at /home/jonathan/AlphaZero.jl/src/ui/session.jl:384
[35] top-level scope at /home/jonathan/AlphaZero.jl/scripts/alphazero.jl:68
[36] include(::Module, ::String) at ./Base.jl:377
[37] exec_options(::Base.JLOptions) at ./client.jl:288
[38] _start() at ./client.jl:484
Hi Jonathan!
I've been trying to tune AlphaZero.jl hyperparameters recently and have run into some problems. With master (commit 91bb698) and nothing changed, I find that self-play takes more and more time:
iter1: 49m gpu 33% cpu 300%
iter2: 2h2m gpu 15% cpu 330%
iter3: 7h30m gpu 4% cpu 230%
Memory has 54G free.
This is so strange.
Below is my system info:
cpu: Intel(R) Core(TM) i9-10940X CPU @ 3.30GH 14 physical cores 28 threads
memory: 64G
gpu: NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 , RTX2080ti
OS: ubuntu18.04
julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i9-10940X CPU @ 3.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, cascadelake)
julia> Threads.nthreads()
28
I think it's fine if either the CPU or the GPU is fully utilized, but no matter how I change the parameters, I just can't achieve it. Even worse, iter2 uses less GPU than iter1, and iter3 even less.
In order to split game simulations across different threads, we are using a homemade Util.mapreduce primitive that is a bit complex and unintuitive. It would be better to use something more standard such as tmap.
Status: @michelangelo21 is looking at this.
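The tmap idea in miniature (a generic sketch of the intended behavior, not the actual replacement):

```julia
# Threaded map in miniature: spawn one task per item and fetch the results
# in input order, letting Julia's scheduler spread tasks across threads.

function tmap_sketch(f, xs)
  tasks = [Threads.@spawn f(x) for x in xs]
  fetch.(tasks)   # results come back in the same order as the inputs
end

# E.g. running several game simulations in parallel:
simulate(i) = i^2        # stand-in for one self-play game
println(tmap_sketch(simulate, 1:4))  # [1, 4, 9, 16]
```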
First of all, it's very cool to see AlphaZero implemented in Julia! I have always thought that Julia is a really good tool for this type of thing.
This comment is related to #4, but maybe is a slightly different perspective. It would be very cool to also have a version of this that works with MDPs, either using RLBase as @findmyway suggested in #4, or POMDPs.jl (which I and some colleagues work on), or RLInterface.jl. This would need to be a considerably different implementation because the MCTS would be approximating an expectimax tree instead of a minimax tree.
Just thought I should start this issue as a stub in case anyone wants to pick it up, and to point to some MDP definition interfaces in Julia, and to clarify that the game and MDP versions would need to be different. I and my students would definitely use the package if it had support for MDPs.
Hi guys, I'm getting this error on the master branch, right after self-play has finished. Below the error you'll find that Julia sees CUDA and the device correctly, but it throws a cuDNN error. Can you help?
ERROR: LoadError: CUDNNError: CUDNN_STATUS_EXECUTION_FAILED (code 8)
Stacktrace:
[1] throw_api_error(::CUDA.CUDNN.cudnnStatus_t) at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/error.jl:19
[2] macro expansion at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/error.jl:30 [inlined]
[3] cudnnBatchNormalizationForwardTraining(::Ptr{Nothing}, ::CUDA.CUDNN.cudnnBatchNormMode_t, ::Base.RefValue{Float32}, ::Base.RefValue{Float32}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,4}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,4}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::Float32, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::Float32, ::CUDA.CuPtr{Nothing}, ::CUDA.CuPtr{Nothing}) at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/utils/call.jl:93
[4] cudnnBNForward!(::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::Float32; cache::Nothing, alpha::Int64, beta::Int64, eps::Float32, training::Bool) at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/batchnorm.jl:55
[5] #batchnorm#478 at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/batchnorm.jl:26 [inlined]
[6] #adjoint#17 at /home/sdeveshj/.julia/packages/Flux/05b38/src/cuda/cudnn.jl:6 [inlined]
[7] _pullback at /home/sdeveshj/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:53 [inlined]
[8] BatchNorm at /home/sdeveshj/.julia/packages/Flux/05b38/src/cuda/cudnn.jl:3 [inlined] (repeats 2 times)
[9] applychain at /home/sdeveshj/.julia/packages/Flux/05b38/src/layers/basic.jl:36 [inlined]
[10] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}}}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[11] applychain at /home/sdeveshj/.julia/packages/Flux/05b38/src/layers/basic.jl:36 [inlined]
[12] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{Flux.Conv{2,2,typeof(identity),CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,1}},Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}}}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[13] Chain at /home/sdeveshj/.julia/packages/Flux/05b38/src/layers/basic.jl:38 [inlined]
[14] _pullback(::Zygote.Context, ::Flux.Chain{Tuple{Flux.Conv{2,2,typeof(identity),CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,1}},Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}}}}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[15] forward at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:161 [inlined]
[16] _pullback(::Zygote.Context, ::typeof(AlphaZero.Network.forward), ::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[17] evaluate at /home/sdeveshj/AlphaZero.jl/src/networks/network.jl:253 [inlined]
[18] _pullback(::Zygote.Context, ::typeof(AlphaZero.Network.evaluate), ::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,2}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[19] losses at /home/sdeveshj/AlphaZero.jl/src/learning.jl:62 [inlined]
[20] _pullback(::Zygote.Context, ::typeof(AlphaZero.losses), ::AlphaZero.FluxLib.ResNet{Game}, ::LearningParams, ::Float32, ::Float32, ::Tuple{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2}}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[21] L at /home/sdeveshj/AlphaZero.jl/src/learning.jl:113 [inlined]
[22] _pullback(::Zygote.Context, ::AlphaZero.var"#L#54"{AlphaZero.Trainer}, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,2}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[23] adjoint at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/lib/lib.jl:172 [inlined]
[24] _pullback at /home/sdeveshj/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
[25] #1 at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:83 [inlined]
[26] _pullback(::Zygote.Context, ::AlphaZero.FluxLib.var"#1#2"{AlphaZero.var"#L#54"{AlphaZero.Trainer},Tuple{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2}}}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[27] pullback(::Function, ::Zygote.Params) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface.jl:172
[28] lossgrads(::Function, ::Zygote.Params) at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:73
[29] train!(::AlphaZero.var"#53#55"{Array{Float32,1}}, ::AlphaZero.FluxLib.ResNet{Game}, ::Adam, ::Function, ::Base.Iterators.Take{Base.Iterators.Stateful{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Repeated{Nothing},AlphaZero.Util.var"#12#13"{AlphaZero.var"#50#52",Tuple{Array{Float32,2},Array{Float32,4},Array{Float32,2},Array{Float32,2},Array{Float32,2}},Int64,Bool}}},Tuple{Any,Tuple{Nothing,Base.Generator{_A,AlphaZero.Util.var"#9#11"{AlphaZero.var"#50#52"}} where _A,Any}}}}, ::Int64) at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:82
[30] batch_updates!(::AlphaZero.Trainer, ::Int64) at /home/sdeveshj/AlphaZero.jl/src/learning.jl:116
[31] macro expansion at ./timing.jl:310 [inlined]
[32] learning_step!(::Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}, ::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/sdeveshj/AlphaZero.jl/src/training.jl:185
[33] macro expansion at ./timing.jl:310 [inlined]
[34] macro expansion at /home/sdeveshj/AlphaZero.jl/src/report.jl:229 [inlined]
[35] train!(::Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}, ::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/sdeveshj/AlphaZero.jl/src/training.jl:295
[36] resume!(::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/sdeveshj/AlphaZero.jl/src/ui/session.jl:383
[37] top-level scope at /home/sdeveshj/AlphaZero.jl/scripts/alphazero.jl:89
[38] include(::Function, ::Module, ::String) at ./Base.jl:380
[39] include(::Module, ::String) at ./Base.jl:368
[40] exec_options(::Base.JLOptions) at ./client.jl:296
[41] _start() at ./client.jl:506
in expression starting at /home/sdeveshj/AlphaZero.jl/scripts/alphazero.jl:80
julia> Libdl.dlpath("libcuda")
"/usr/lib/x86_64-linux-gnu/libcuda.so"
julia> Libdl.dlpath("libcudnn")
"/usr/lib/cuda/lib64/libcudnn.so"
(@v1.5) pkg> activate .
Activating environment at `~/AlphaZero.jl/Project.toml`
julia> CUDA.device()
CuDevice(0): GeForce RTX 2070 SUPER
julia> has_cuda()
true
julia> CUDA.version()
v"11.1.0"
Hello!
Thanks for the great documentation! I was looking for working examples of AlphaZero for a small game, and this repo looks very promising! Unfortunately, there is no pretrained model; if you could post one in GitHub releases, it would help a lot.
I tried to train it myself but was unsuccessful. After the self-play session, an error occurred:
LoadError: CUBLASError: the GPU program failed to execute (code 13, CUBLAS_STATUS_EXECUTION_FAILED)
My setup is a clean Ubuntu 18.04 with Nvidia driver 450. I cloned the master branch, installed the dependencies, and ran training:
julia --project -e "import Pkg; Pkg.instantiate()"
julia --project --color=yes scripts/alphazero.jl --game connect-four train
If it is an environment issue, maybe you could recommend some docker image?
AlphaZero crashes when trying to load parameters from a JSON file, as the subtypes of OptimiserSpec do not implement the subtypekey field required by JSON3.
As a consequence, when loading a session from disk with the Session constructor or when using load_env, it is important to provide params explicitly.
To replicate
If you already have a valid connect four session in sessions/connect-four, just run scripts/duel.jl after replacing this line
Line 22 in c7deb67
with params=nothing)
This results in the following stacktrace.
ERROR: LoadError: ArgumentError: invalid json abstract type: didn't find subtypekey
Stacktrace:
[1] #read#49 at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:950 [inlined]
[2] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:888 [inlined]
[3] #readvalue#48 at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:861 [inlined]
[4] readvalue at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:842 [inlined]
[5] #read#47 at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:824 [inlined]
[6] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:805 [inlined]
[7] readvalue(::Base.CodeUnits{UInt8,String}, ::Int64, ::Int64, ::Type{LearningParams}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{
(),Tuple{}}}) at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:861
[8] readvalue at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:842 [inlined]
[9] read(::JSON3.Struct, ::Base.CodeUnits{UInt8,String}, ::Int64, ::Int64, ::UInt8, ::Type{Params}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{}
,NamedTuple{(),Tuple{}}}) at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:824
[10] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:805 [inlined]
[11] read(::String, ::Type{Params}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/jonathan/.julia/packages/JSON
3/ItGdr/src/structs.jl:308
[12] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:298 [inlined]
[13] #read#7 at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:294 [inlined]
[14] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:294 [inlined]
[15] #202 at /home/jonathan/AlphaZero.jl/src/ui/session.jl:130 [inlined]
[16] open(::AlphaZero.var"#202#204", ::String, ::Vararg{String,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ./io.jl:298
[17] open at ./io.jl:296 [inlined]
[18] load_env(::Type{Game}, ::Type{ResNet{Game}}, ::AlphaZero.Log.Logger, ::String; params::Nothing) at /home/jonathan/AlphaZero.jl/src/ui/session.jl:129
[19] run_duel(::Type{Game}, ::Type{ResNet{Game}}, ::String, ::AlphaZero.Benchmark.Duel; params::Nothing) at /home/jonathan/AlphaZero.jl/src/ui/session.jl:510
[20] top-level scope at /home/jonathan/AlphaZero.jl/scripts/duel.jl:15
[21] include(::String) at ./client.jl:439
[22] top-level scope at none:0
in expression starting at /home/jonathan/AlphaZero.jl/scripts/duel.jl:15
While attempting to utilize AlphaZero for Tetris, I keep running into this error when running on the GPU. I have reproduced it on two separate machines, and it happens consistently when launching a checkpoint evaluation. I am wondering if someone has insight into what might be causing this.
Repo:
https://gitlab.com/samdickinson314/tetrisai
include("runner.jl")
Launching a checkpoint evaluation
CUDNNError: CUDNN_STATUS_EXECUTION_FAILED (code 8)
Stacktrace:
[1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
@ CUDA.CUDNN C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\error.jl:22
[2] macro expansion
@ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\error.jl:39 [inlined]
[3] cudnnActivationForward(handle::Ptr{Nothing}, activationDesc::CUDA.CUDNN.cudnnActivationDescriptor, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float32, 4}, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float32, 4})
@ CUDA.CUDNN C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\utils\call.jl:26
[4] #cudnnActivationForwardAD#657
@ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\activation.jl:48 [inlined]
[5] #cudnnActivationForwardWithDefaults#656
@ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\activation.jl:42 [inlined]
[6] #cudnnActivationForward!#653
@ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\activation.jl:22 [inlined]
[7] #35
@ C:\Users\dickisp1\.julia\packages\NNlibCUDA\Oc2CZ\src\cudnn\activations.jl:13 [inlined]
[8] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(NNlib.relu), Tuple{CUDA.CuArray{Float32, 4}}})
@ NNlibCUDA C:\Users\dickisp1\.julia\packages\NNlibCUDA\Oc2CZ\src\cudnn\activations.jl:30
[9] (::Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}})(x::CUDA.CuArray{Float32, 4}, cache::Nothing)
@ Flux.CUDAint C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\cuda\cudnn.jl:9
[10] BatchNorm
@ C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\cuda\cudnn.jl:6 [inlined]
[11] applychain(fs::Tuple{Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, 
Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}}, x::CUDA.CuArray{Float32, 4}) (repeats 2 times)
@ Flux C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\layers\basic.jl:37
[12] (::Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, 
Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}}})(x::CUDA.CuArray{Float32, 4})
@ Flux C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\layers\basic.jl:39
[13] forward(nn::ResNet, state::CUDA.CuArray{Float32, 4})
@ AlphaZero.FluxLib C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\networks\flux.jl:142
[14] forward_normalized(nn::ResNet, state::CUDA.CuArray{Float32, 4}, actions_mask::CUDA.CuArray{Float32, 2})
@ AlphaZero.Network C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\networks\network.jl:264
[15] evaluate_batch(nn::ResNet, batch::Vector{NamedTuple{(:board, :current_piece, :next_piece, :score, :pieces_placed, :seed), Tuple{StaticArrays.SMatrix{22, 10, Bool, 220}, Int64, Int64, Int64, Int64, Int64}}})
@ AlphaZero.Network C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\networks\network.jl:312
[16] fill_and_evaluate(net::ResNet, batch::Vector{NamedTuple{(:board, :current_piece, :next_piece, :score, :pieces_placed, :seed), Tuple{StaticArrays.SMatrix{22, 10, Bool, 220}, Int64, Int64, Int64, Int64, Int64}}}; batch_size::Int64, fill_batches::Bool)
@ AlphaZero C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\simulations.jl:32
[17] (::AlphaZero.var"#36#37"{Int64, Bool, ResNet})(batch::Vector{NamedTuple{(:board, :current_piece, :next_piece, :score, :pieces_placed, :seed), Tuple{StaticArrays.SMatrix{22, 10, Bool, 220}, Int64, Int64, Int64, Int64, Int64}}})
@ AlphaZero C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\simulations.jl:54
[18] macro expansion
@ C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\batchifier.jl:68 [inlined]
[19] macro expansion
@ C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\util.jl:20 [inlined]
[20] (::AlphaZero.Batchifier.var"#2#4"{Int64, AlphaZero.var"#36#37"{Int64, Bool, ResNet}, Channel{Any}})()
@ AlphaZero.Batchifier C:\Users\dickisp1\.julia\packages\ThreadPools\ROFEh\src\macros.jl:261
Interrupted by the user
It would be really nice if the package could be registered in the General registry. That would make it extremely easy for folks to try out and build on top of.
Currently, it seems that each MCTS node stores a vector whose length equals the size of the action encoding, holding Q values, network probabilities, and edge visit counts. For some games this is suboptimal. Take Tak, or the slightly less famous chess.
In chess, you need a vector of length >4k to one-hot encode actions, yet the branching factor is about 31. In Tak on the 5x5 board, the encoding length is >2k with a branching factor of roughly 60. This means only a fraction of the actions are actually legal in any given state. With Tak, the consequence is that I am swapping quite a bit on my machine. Using sparse storage for these games would save a lot of memory.
I see two implementation options. The first is straightforward: replace the vector with a sparse vector, which needs almost no code change and could probably reuse the existing code with one extra initialization parameter. The second option is to use a dense vector of length sum(actions_mask(state)), where the i-th element corresponds to the i-th element of findall(actions_mask(state)). This has a slight computational advantage, especially if generating the action mask is nontrivial for a state, and it even saves the indexing vector.
What are your thoughts on this? If you are going to rewrite the MCTS anyway, maybe you could take this into consideration?
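To make the second option concrete, here is a minimal sketch of what compact per-node storage could look like (the struct, field names, and `actions_mask` helper are mine for illustration, not AlphaZero.jl's actual API):

```julia
# Sketch: store per-node statistics only for legal actions, indexed by
# their position among the legal moves rather than by the global
# one-hot action index.

struct CompactNodeStats
    legal::Vector{Int}   # global indices of legal actions (findall of the mask)
    Q::Vector{Float64}   # Q value per legal action
    P::Vector{Float64}   # prior probability per legal action
    N::Vector{Int}       # visit count per legal action
end

function CompactNodeStats(actions_mask::AbstractVector{Bool})
    legal = findall(actions_mask)
    n = length(legal)
    return CompactNodeStats(legal, zeros(n), zeros(n), zeros(Int, n))
end

# Translate a global action index into its compact slot. A linear scan is
# fine for small branching factors; a Dict lookup could be used otherwise.
compact_index(s::CompactNodeStats, action::Int) =
    findfirst(==(action), s.legal)
```

With a chess-like encoding (>4k actions, ~31 legal), this shrinks each node's statistics from the encoding length down to the branching factor, at the cost of the index translation on lookup.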
Any tips for running this on Azure without paying JuliaHub's steep premium?
I'm trying to leverage spot pricing, which is about 1/10th to 1/20th of JuliaHub's pricing.
I found this:
https://github.com/microsoft/AzureClusterlessHPC.jl
I'm not entirely sure how JuliaHub handles running this code on multiple machines together. Is there a command to connect multiple instances, or something built in similar to Ray? Or will it be an incredibly painful process of setting the code up for use with the package I linked above?
I would like to train an agent on a cloud-based multi-GPU platform and then migrate it to a much cheaper platform used only for inference. The first migration step would be towards a secondary instance of AlphaZero.jl, assuming that the resources needed for testing the quality of the agent may be greatly reduced compared with the resources needed to train it.
I have two related questions:
When running the example training session with the command:
julia --project --color=yes scripts/alphazero.jl --game connect-four train
I get the following error regarding CuDNN:
Using 1 distributed worker(s).
Initializing a new AlphaZero environment
Initial report
Number of network parameters: 1,667,912
Number of regularized network parameters: 1,667,776
Memory footprint per MCTS node: 326 bytes
Running benchmark: AlphaZero against MCTS (1000 rollouts)
AssertionError: This functionality is unavailabe as CUDNN is missing.
Stacktrace:
[1] macro expansion at /home/user/.julia/packages/CUDA/h38pe/deps/bindeps.jl:74 [inlined]
[2] macro expansion at /home/user/.julia/packages/CUDA/h38pe/src/initialization.jl:51 [inlined]
[3] libcudnn at /home/user/.julia/packages/CUDA/h38pe/deps/bindeps.jl:73 [inlined]
[4] (::CUDA.CUDNN.var"#19247#cache_fptr!#9")() at /home/user/.julia/packages/CUDA/h38pe/lib/utils/call.jl:31
[5] macro expansion at /home/user/.julia/packages/CUDA/h38pe/lib/utils/call.jl:39 [inlined]
[6] unsafe_cudnnCreate(::Base.RefValue{Ptr{Nothing}}) at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/libcudnn.jl:39
[7] macro expansion at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/base.jl:6 [inlined]
[8] macro expansion at /home/user/.julia/packages/CUDA/h38pe/src/memory.jl:312 [inlined]
[9] cudnnCreate() at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/base.jl:3
[10] #516 at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/CUDNN.jl:44 [inlined]
[11] get!(::CUDA.CUDNN.var"#516#519"{CUDA.CuContext}, ::IdDict{Any,Any}, ::Any) at ./iddict.jl:152
[12] handle() at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/CUDNN.jl:43
[13] forw(::Function, ::AutoGrad.Param{Knet.KnetArray{Float32,4}}, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Symbol,Tuple{Int64,Int64},Tuple{Symbol},NamedTuple{(:padding,),Tuple{Tuple{Int64,Int64}}}}) at /home/user/.julia/packages/AutoGrad/VFrAv/src/core.jl:66
[14] #conv4#356 at ./none:0 [inlined]
[15] (::AlphaZero.KNets.Conv)(::Knet.KnetArray{Float32,4}) at /home/user/AlphaZero.jl/src/networks/knet/layers.jl:60
[16] (::AlphaZero.KNets.Chain)(::Knet.KnetArray{Float32,4}) at /home/user/AlphaZero.jl/src/networks/knet/layers.jl:19
[17] forward(::ResNet{Game}, ::Knet.KnetArray{Float32,4}) at /home/user/AlphaZero.jl/src/networks/knet.jl:149
[18] evaluate(::ResNet{Game}, ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,2}) at /home/user/AlphaZero.jl/src/networks/network.jl:253
[19] evaluate_batch(::ResNet{Game}, ::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}) at /home/user/AlphaZero.jl/src/networks/network.jl:283
[20] fill_and_evaluate(::ResNet{Game}, ::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}; batch_size::Int64, fill::Bool) at /home/user/AlphaZero.jl/src/play.jl:346
[21] (::AlphaZero.var"#101#102"{Bool,ResNet{Game},Int64})(::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}) at /home/user/AlphaZero.jl/src/play.jl:388
[22] macro expansion at /home/user/AlphaZero.jl/src/batchifier.jl:47 [inlined]
[23] macro expansion at /home/user/AlphaZero.jl/src/util.jl:56 [inlined]
[24] (::AlphaZero.Batchifier.var"#1#3"{AlphaZero.var"#101#102"{Bool,ResNet{Game},Int64},Int64,Channel{Any}})() at ./threadingconstructs.jl:169
The computer is running CentOS 7.8, with an RTX 2080 Ti (CUDA version 11).
Regarding the @unimplemented macro, you may want to consider the "Not Implemented Exceptions" note in this blog post: https://white.ucc.asn.au/2020/04/19/Julia-Antipatterns.html
Less code is easier to maintain 😃
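For context, the blog post's suggestion boils down to this: instead of a fallback method that throws a custom "unimplemented" error, declare the generic function with no fallback and let Julia raise a MethodError, which also lets callers introspect the interface with `hasmethod`. A sketch (`vectorize_state` here is a made-up interface function, not AlphaZero.jl's actual one):

```julia
abstract type AbstractGame end

# Interface declaration with no fallback method: calling it on a game
# that does not implement it raises an informative MethodError.
function vectorize_state end

struct MyGame <: AbstractGame end

# Callers can check whether a game type implements the interface:
implements_vectorize(G) = hasmethod(vectorize_state, Tuple{G})
```

A game opts in simply by defining `vectorize_state(::MyGame) = ...`, with no macro machinery needed.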
Hi Jonathan,
As I try to understand the core of AlphaZero.jl, I have a question about the input to the neural network. Looking at src/learning.jl, I believe the network receives batches of input, but I couldn't figure out what exactly is fed to the network, specifically the part data = (W, X, A, P, V) used as training input. Maybe you could tell me?
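One common reading of such a tuple in AlphaZero-style training, guessed from the field names and not confirmed against src/learning.jl, is: W = per-sample weights, X = vectorized board states, A = legal-action masks, P = policy targets from MCTS visit counts, V = value targets from game outcomes. A toy batch under that assumption (all dimensions illustrative) might look like:

```julia
# Hypothetical layout of one training batch, assuming the common
# AlphaZero convention; the actual meaning in AlphaZero.jl may differ.
num_actions, batch_size = 7, 4       # e.g. connect-four has 7 columns
state_dims = (7, 6, 2)               # board planes per player (illustrative)

W = ones(Float32, 1, batch_size)                  # per-sample weights
X = rand(Float32, state_dims..., batch_size)      # vectorized states
A = trues(num_actions, batch_size)                # legal-action masks
P = fill(Float32(1 / num_actions), num_actions, batch_size)  # policy targets
V = zeros(Float32, 1, batch_size)                 # value targets in [-1, 1]

data = (W, X, A, P, V)
```

Under this reading, the network consumes X (masked by A) and is trained so its policy head matches P and its value head matches V, with W weighting each sample's contribution to the loss.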
Hello!
Not an issue, but a question regarding GameInterface.symmetries (cf. the documentation):
symmetries(::Type{G}, state) where {G <: AbstractGame}
Return the vector of all pairs (s, σ) where:
- s is the image of state by a nonidentical symmetry
- σ is the associated actions permutation, as an integer vector of size num_actions(Game).
When applying to the game g, with state state1 and actions mask actions_mask1, the symmetry corresponding to a given (state2, σ) pair, I suppose AlphaZero.jl assigns state2 as the new state of g and determines the updated actions mask actions_mask2 according to one of the following propositions, but which one?
actions_mask2[action_index] == actions_mask1[σ(action_index)]
actions_mask2[σ(action_index)] == actions_mask1[action_index]
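To make the two propositions concrete in Julia indexing terms (this just illustrates the difference; it does not assert which convention AlphaZero.jl actually uses):

```julia
# σ is a permutation of action indices, given as an integer vector.
σ = [3, 1, 2]
actions_mask1 = [true, false, true]

# Proposition 1: actions_mask2[i] == actions_mask1[σ[i]],
# i.e. actions_mask2 is actions_mask1 gathered through σ.
mask2_prop1 = actions_mask1[σ]           # [true, true, false]

# Proposition 2: actions_mask2[σ[i]] == actions_mask1[i],
# i.e. actions_mask2 is actions_mask1 scattered through σ,
# equivalently gathered through the inverse permutation.
mask2_prop2 = actions_mask1[invperm(σ)]  # [false, true, true]
```

The two only coincide when σ is its own inverse, which is why the choice of convention matters for non-involutive symmetries such as rotations.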
In other words, how is the permutation supposed to be used?
(Does it really matter in the end? ;-)
Tried installing this on Ubuntu 20.04 with Julia 1.5.2, and got this error after calling julia --project -e "import Pkg; Pkg.instantiate()":
Error: Error building `Knet`:
│ In file included from /usr/local/cuda-10.1/bin/../targets/x86_64-linux/include/cuda_runtime.h:83,
│ from <command-line>:
│ /usr/local/cuda-10.1/bin/../targets/x86_64-linux/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
│ 138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
│ | ^~~~~
│ [ Info: cuda1.jl
│ [ Info: `/usr/local/cuda-10.1/bin/nvcc -O3 --use_fast_math -Wno-deprecated-gpu-targets --compiler-options '-O3 -Wall -fPIC' -c cuda1.cu`
│ ERROR: LoadError: failed process: Process(`/usr/local/cuda-10.1/bin/nvcc -O3 --use_fast_math -Wno-deprecated-gpu-targets --compiler-options '-O3 -Wall -fPIC' -c cuda1.cu`, ProcessExited(1)) [1]
│
│ Stacktrace:
│ [1] pipeline_error at ./process.jl:525 [inlined]
│ [2] run(::Cmd; wait::Bool) at ./process.jl:440
│ [3] run at ./process.jl:438 [inlined]
│ [4] inforun(::Cmd) at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:9
│ [5] build_nvcc() at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:75
│ [6] build() at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:87
│ [7] top-level scope at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:93
│ [8] include(::String) at ./client.jl:457
│ [9] top-level scope at none:5
│ in expression starting at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:93
Hey! I've nearly finished rigging up my game and am excited to start running it. I had some quick questions I wanted to run by you, just some sanity checks to make sure my understanding is correct, since if I'm wrong these will likely break my training.
State Vectorization
1.a) Why, in the examples, is this always done from White's side?
1.b) When is state vectorization executed? Each turn or at the end of a game?
1.c) Is state vectorization done for both black and white?
1.d) What should be included in state vectorization?
Other connect-four example questions
2.a) I'm pretty sure about this, but to confirm: I've included the logic for update_status! and update_actions_mask! in my play! function. Is this fine, or is that logic required to live in its own functions?
2.b) In connect-four there's a function GI.clone. I assume this is for something external, specific to connect-four?
Other
3.a) What exactly is an observation? Everything in the state of the game environment, or only what goes into the state vectorization?
Once again, really appreciate your help as well as other members of the community with their feedback. Thanks a ton for even making this in the first place it's amazing!