jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of DeepMind's AlphaZero algorithm.
Home Page: https://jonathan-laurent.github.io/AlphaZero.jl/stable/
License: MIT License
I've downloaded the codebase from GitHub (v0.3) and followed the instructions to run the sample, but I am getting iteration times substantially longer than what is mentioned in the documentation (8–16 hours per iteration vs. 30–50 minutes). I am wondering whether something is set incorrectly on my system or in the Julia environment. See the screen cap below, which shows 4% progress with an ETA of 16 hours and 37 minutes. My system is reasonably capable, but even on a higher-end machine (64 GB RAM, RTX 2080 Super) I still get much slower results.
Thoughts on the problem? Thank you.
I've started playing around with AlphaZero.jl over the last few days by implementing Othello and Hex. In the process I've run into similar issues a few times: as soon as I let the Game struct be more stateful, training throws weird errors (e.g. "MCTS.explore! must be called before MCTS.policy", but also others that were more verbose and even less helpful). I suspect that is the reason why these lines of code exist in Connect4:
function Base.copy(g::Game)
history = isnothing(g.history) ? nothing : copy(g.history)
Game(g.board, g.curplayer, g.finished, g.winner, copy(g.amask), history)
end
I am, generally speaking, at a loss when it comes to debugging this. I think I want to write a few tests to catch these kinds of errors, but I have no idea where to start. Any help would be appreciated.
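One cheap place to start: if errors only appear once the Game struct becomes stateful, a likely culprit is a `Base.copy` that shares mutable fields between the original and the copy. Below is a minimal sketch of such a test, using our own stand-in Game struct (not the library's types): mutate a copy and check that the original is unaffected.

```julia
# Stand-in for a stateful game (illustrative, not AlphaZero.jl's type).
mutable struct Game
    board::Vector{Int}
    history::Union{Nothing, Vector{Int}}
end

# A correct copy duplicates every mutable field.
Base.copy(g::Game) =
    Game(copy(g.board), isnothing(g.history) ? nothing : copy(g.history))

function copy_is_independent(g::Game)
    g2 = copy(g)
    push!(g2.board, 42)  # mutate the copy only
    isnothing(g2.history) || push!(g2.history, 42)
    # if copy were shallow, g.board would have grown too
    length(g2.board) == length(g.board) + 1
end

g = Game([1, 2, 3], Int[])
@assert copy_is_independent(g)
```

Running such a check on every mutable field of your Othello and Hex game structs would catch shared state before MCTS trips over it.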
Trying to train per the instructions in the "Training a Connect Four Agent" section.
Ubuntu 18.04 with an RTX 2080 Ti.
At first I thought it might be a Julia version issue.
I tried 1.4.0 and 1.3.1, but both fail with an error (1.4.0 outputs more warning-type info).
Perhaps I'm doing something wrong:
brian@1920x-Ubuntu:~$ julia
(Julia REPL banner: Version 1.3.1 (2019-12-30), official https://julialang.org/ release)
julia>
brian@1920x-Ubuntu:~$ git clone https://github.com/jonathan-laurent/AlphaZero.jl.git
Cloning into 'AlphaZero.jl'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 5859 (delta 15), reused 47 (delta 15), pack-reused 5812
Receiving objects: 100% (5859/5859), 8.56 MiB | 12.84 MiB/s, done.
Resolving deltas: 100% (3141/3141), done.
brian@1920x-Ubuntu:~$ cd AlphaZero.jl/
brian@1920x-Ubuntu:~/AlphaZero.jl$ julia --project -e "import Pkg; Pkg.instantiate()"
Updating registry at ~/.julia/registries/General
Updating git-repo https://github.com/JuliaRegistries/General.git
brian@1920x-Ubuntu:~/AlphaZero.jl$ julia --project --color=yes scripts/alphazero.jl --game connect-four train
CuArrays.jl SplittingPool statistics:
Initializing a new AlphaZero environment
Initial report
Number of network parameters: 617,480
Number of regularized network parameters: 617,408
Memory footprint per MCTS node: 380 bytes
Running benchmark: AlphaZero against MCTS (1000 rollouts)
UndefVarError: lib not defined
Stacktrace:
[1] broadcasted(::typeof(NNlib.relu), ::Knet.KnetArray{Float32,4}) at /home/brian/.julia/packages/Knet/vxHRi/src/unary.jl:17
[2] (::AlphaZero.KNets.BatchNorm)(::Knet.KnetArray{Float32,4}) at /home/brian/AlphaZero.jl/src/networks/knet/layers.jl:85
[3] (::AlphaZero.KNets.Chain)(::Knet.KnetArray{Float32,4}) at /home/brian/AlphaZero.jl/src/networks/knet/layers.jl:19
[4] forward(::ResNet{Game}, ::Knet.KnetArray{Float32,4}) at /home/brian/AlphaZero.jl/src/networks/knet.jl:148
[5] evaluate(::ResNet{Game}, ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,2}) at /home/brian/AlphaZero.jl/src/networks/network.jl:288
[6] evaluate_batch(::ResNet{Game}, ::Array{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},1}) at /home/brian/AlphaZero.jl/src/networks/network.jl:313
[7] inference_server(::AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}) at ./util.jl:288
[8] macro expansion at /home/brian/AlphaZero.jl/src/util.jl:64 [inlined]
[9] (::AlphaZero.MCTS.var"#21#23"{AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}})() at ./task.jl:333
***************** It hangs here, so after Ctrl-C:
^C
signal (2): Interrupt
in expression starting at /home/brian/AlphaZero.jl/scripts/alphazero.jl:70
epoll_pwait at /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/epoll_pwait.c:42
uv__io_poll at /workspace/srcdir/libuv/src/unix/linux-core.c:270
uv_run at /workspace/srcdir/libuv/src/unix/core.c:359
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:448
poptaskref at ./task.jl:660
wait at ./task.jl:667
wait at ./condition.jl:106
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
_wait at ./task.jl:238
sync_end at ./task.jl:278
macro expansion at ./task.jl:319 [inlined]
macro expansion at /home/brian/AlphaZero.jl/src/mcts.jl:427 [inlined]
macro expansion at ./util.jl:212 [inlined]
explore_async! at /home/brian/AlphaZero.jl/src/mcts.jl:426
explore! at /home/brian/AlphaZero.jl/src/mcts.jl:452 [inlined]
think at /home/brian/AlphaZero.jl/src/play.jl:176 [inlined]
#play_game#90 at /home/brian/AlphaZero.jl/src/play.jl:246
#play_game at ./none:0 [inlined]
#pit#93 at /home/brian/AlphaZero.jl/src/play.jl:296
unknown function (ip: 0x7efca1f99dd9)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2141 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
#pit at ./none:0
unknown function (ip: 0x7efca1f99a4a)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2141 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
macro expansion at /home/brian/AlphaZero.jl/src/benchmark.jl:111 [inlined]
macro expansion at ./util.jl:288 [inlined]
run at /home/brian/AlphaZero.jl/src/benchmark.jl:110
run_duel at /home/brian/AlphaZero.jl/src/ui/session.jl:252
run_benchmark at /home/brian/AlphaZero.jl/src/ui/session.jl:275
zeroth_iteration! at /home/brian/AlphaZero.jl/src/ui/session.jl:285
#Session#126 at /home/brian/AlphaZero.jl/src/ui/session.jl:356
Type at ./none:0
unknown function (ip: 0x7efca1f42f79)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2141 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1631 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:328
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:417
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:368 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:778
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:888
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7efcbc3d6c0f)
unknown function (ip: 0x7)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:897
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:814
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:873
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:878
include at ./boot.jl:328 [inlined]
include_relative at ./loading.jl:1105
include at ./Base.jl:31
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
exec_options at ./client.jl:287
_start at ./client.jl:460
jfptr__start_2084.clone_1 at /opt/julia-1.3.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
unknown function (ip: 0x401931)
unknown function (ip: 0x401533)
__libc_start_main at /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
unknown function (ip: 0x4015d4)
unknown function (ip: 0xffffffffffffffff)
Allocations: 159067857 (Pool: 159028147; Big: 39710); GC: 99
CuArrays.jl SplittingPool statistics:
Right now, a lot of the code uses custom logging solutions.
We should use the Logging standard library and ProgressLogging.jl instead. This would make the code simpler and allow custom logging backends other than ANSI terminals (e.g. JuliaHub).
Hello Jonathan,
thanks for this great project.
I would like to dive deeper into it, but I have a problem with Pkg...
julia> import Pkg; Pkg.add("AlphaZero")
Updating registry at C:\Users\H\.juliapro\JuliaPro_v1.4.2-1\registries\JuliaPro
ERROR: The following package names could not be resolved:
Do you have any helping advice?
Question from a beginner: I am wondering how to get a trained player post research phase with AlphaZero.jl to be used for inference in production phase, through a javascript web application that end users would run in their browser?
Is there an equivalent of importing a Keras network into TensorFlow.js that could leverage a Knet or Flux network trained with AlphaZero.jl?
Thanks!
I am currently implementing a board game called Tak. In this game it is possible to move stones around, so a stone can be moved back and forth. Theoretically, a terminal state (at least a draw) is reachable from every state by choosing the right actions. In practice, the MCTS decides to loop infinitely, resulting in:
Initializing a new AlphaZero environment
Initial report
Number of network parameters: 159,457
Number of regularized network parameters: 156,736
Memory footprint per MCTS node: 24056 bytes
Running benchmark: AlphaZero against MCTS (1000 rollouts)
StackOverflowError:
Stacktrace:
[1] check_win(board::Array{Union{Nothing, Tuple{Main.tak.TakEnv.Stone, Main.tak.TakEnv.Player}}, 3}, active_player::Main.tak.TakEnv.Player)
@ Main.tak.TakEnv ~/Programming/tak/src/TakEnv.jl:622
[2] play!(g::Main.tak.TakInterface.TakGame, action_idx::Int64)
@ Main.tak.TakInterface ~/Programming/tak/src/TakInterface.jl:81
[3] run_simulation!(env::AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}, game::Main.tak.TakInterface.TakGame; η::Vector{Float64}, root::Bool)
@ AlphaZero.MCTS ~/.julia/packages/AlphaZero/eAGva/src/mcts.jl:214
[4] run_simulation!(env::AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}, game::Main.tak.TakInterface.TakGame; η::Vector{Float64}, root::Bool) (repeats 11808 times)
@ AlphaZero.MCTS ~/.julia/packages/AlphaZero/eAGva/src/mcts.jl:218
[5] explore!(env::AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}, game::Main.tak.TakInterface.TakGame, nsims::Int64)
@ AlphaZero.MCTS ~/.julia/packages/AlphaZero/eAGva/src/mcts.jl:243
[6] think(p::MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}}, game::Main.tak.TakInterface.TakGame)
@ AlphaZero ~/.julia/packages/AlphaZero/eAGva/src/play.jl:198
[7] think
@ ~/.julia/packages/AlphaZero/eAGva/src/play.jl:259 [inlined]
[8] play_game(gspec::Main.tak.TakInterface.TakSpec, player::TwoPlayers{MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.Batchifier.BatchedOracle{AlphaZero.Batchifier.var"#6#7"}}}, MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}}}; flip_probability::Float64)
@ AlphaZero ~/.julia/packages/AlphaZero/eAGva/src/play.jl:308
[9] (::AlphaZero.var"#simulate_game#70"{TwoPlayers{MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.Batchifier.BatchedOracle{AlphaZero.Batchifier.var"#6#7"}}}, MctsPlayer{AlphaZero.MCTS.Env{Tuple{BitVector, Main.tak.TakEnv.Player}, AlphaZero.MCTS.RolloutOracle{Main.tak.TakInterface.TakSpec}}}}, AlphaZero.Benchmark.var"#5#9"{ProgressMeter.Progress}, Simulator{AlphaZero.Benchmark.var"#4#8"{Env{Main.tak.TakInterface.TakSpec, SimpleNet, Tuple{BitVector, Main.tak.TakEnv.Player}}, AlphaZero.Benchmark.Duel}, AlphaZero.Benchmark.var"#net#6"{Env{Main.tak.TakInterface.TakSpec, SimpleNet, Tuple{BitVector, Main.tak.TakEnv.Player}}, AlphaZero.Benchmark.Duel}, typeof(record_trace)}, Main.tak.TakInterface.TakSpec, SimParams})(sim_id::Int64)
@ AlphaZero ~/.julia/packages/AlphaZero/eAGva/src/simulations.jl:232
[10] macro expansion
@ ~/.julia/packages/AlphaZero/eAGva/src/util.jl:187 [inlined]
[11] (::AlphaZero.Util.var"#9#10"{AlphaZero.var"#68#69"{AlphaZero.Benchmark.var"#5#9"{ProgressMeter.Progress}, Simulator{AlphaZero.Benchmark.var"#4#8"{Env{Main.tak.TakInterface.TakSpec, SimpleNet, Tuple{BitVector, Main.tak.TakEnv.Player}}, AlphaZero.Benchmark.Duel}, AlphaZero.Benchmark.var"#net#6"{Env{Main.tak.TakInterface.TakSpec, SimpleNet, Tuple{BitVector, Main.tak.TakEnv.Player}}, AlphaZero.Benchmark.Duel}, typeof(record_trace)}, Main.tak.TakInterface.TakSpec, SimParams, AlphaZero.var"#48#49"{Channel{Any}}, AlphaZero.var"#make#65"{Channel{Any}}}, UnitRange{Int64}, typeof(vcat), ReentrantLock})()
@ AlphaZero.Util ~/.julia/packages/ThreadPools/P1NVV/src/macros.jl:259
The part of the stack trace above [4] (which is in my implementation) varies; the cause is likely run_simulation!, which ends up in infinite recursion. From my understanding, UCT should place a lower weight on states that have been visited many times and thus should, by exclusion, eventually perform actions that lead to new states and ultimately to a terminal state. Since a normal game is roughly 100 moves deep, this mechanism should have kicked in well before 11,808 recursions. If I prohibit movement actions altogether, training works fine. I am not sure how to approach this problem; does anyone have experience with this?
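One common workaround for games with potentially unbounded move sequences (a sketch with our own names, not a fix inside AlphaZero.jl) is to declare the game drawn after a fixed move budget, so every rollout is guaranteed to reach a terminal state. The move counter then becomes part of the game state:

```julia
# Illustrative bound, well above a normal ~100-move game.
const MAX_MOVES = 200

mutable struct TakLikeGame
    nmoves::Int
    winner::Union{Nothing, Int}
end

# Terminal if someone won, or if the move budget is exhausted (a draw).
game_terminated(g::TakLikeGame) = !isnothing(g.winner) || g.nmoves >= MAX_MOVES

function play!(g::TakLikeGame)
    g.nmoves += 1  # a real implementation would also apply the action
    return g
end

g = TakLikeGame(0, nothing)
while !game_terminated(g)
    play!(g)
end
@assert game_terminated(g)
```

This mirrors how chess handles the same issue with its fifty-move rule. Note that if you cap the game this way, the move count should also be visible to the network (e.g. as an input plane), since it changes the value of a position.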
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
I upgraded my NVIDIA drivers to 455.28 and I use CUDA.jl v2.0.1.
Running the connect-four example on AlphaZero.jl master with JULIA_CUDA_VERSION 11.1, I get the following error message:
ERROR: LoadError: LoadError: InitError: CUDA.jl does not yet support CUDA with nvdisasm 11.1.74; please file an issue.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] parse_toolkit_version(::String, ::String) at /home/martijn/.julia/packages/CUDA/dZvbp/deps/discovery.jl:348
[3] use_local_cuda() at /home/martijn/.julia/packages/CUDA/dZvbp/deps/bindeps.jl:196
[4] __init_dependencies__() at /home/martijn/.julia/packages/CUDA/dZvbp/deps/bindeps.jl:359
[5] __runtime_init__() at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:110
[6] (::CUDA.var"#609#610"{Bool})() at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:32
[7] lock(::CUDA.var"#609#610"{Bool}, ::ReentrantLock) at ./lock.jl:161
[8] _functional(::Bool) at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:26
[9] functional(::Bool) at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:19
[10] functional at /home/martijn/.julia/packages/CUDA/dZvbp/src/initialization.jl:18 [inlined]
[11] __init__() at /home/martijn/.julia/packages/Knet/8aEsn/src/Knet.jl:26
[12] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:697
[13] _require_from_serialized(::String) at ./loading.jl:749
[14] _require(::Base.PkgId) at ./loading.jl:1040
[15] require(::Base.PkgId) at ./loading.jl:928
[16] require(::Module, ::Symbol) at ./loading.jl:923
[17] include(::Function, ::Module, ::String) at ./Base.jl:380
[18] include at ./Base.jl:368 [inlined]
[19] include(::String) at /home/martijn/AlphaZero.jl/src/AlphaZero.jl:6
[20] top-level scope at /home/martijn/AlphaZero.jl/src/AlphaZero.jl:71
[21] include(::Function, ::Module, ::String) at ./Base.jl:380
[22] include(::Module, ::String) at ./Base.jl:368
[23] top-level scope at none:2
[24] eval at ./boot.jl:331 [inlined]
[25] eval(::Expr) at ./client.jl:467
[26] top-level scope at ./none:3
during initialization of module Knet
in expression starting at /home/martijn/AlphaZero.jl/src/networks/knet.jl:13
in expression starting at /home/martijn/AlphaZero.jl/src/AlphaZero.jl:71
ERROR: LoadError: Failed to precompile AlphaZero [8ed9eb0b-7496-408d-8c8b-2119aeea02cd] to /home/martijn/.julia/compiled/v1.5/AlphaZero/zTkjo_5lnvn.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
[3] _require(::Base.PkgId) at ./loading.jl:1030
[4] require(::Base.PkgId) at ./loading.jl:928
[5] require(::Module, ::Symbol) at ./loading.jl:923
[6] include(::Function, ::Module, ::String) at ./Base.jl:380
[7] include(::Module, ::String) at ./Base.jl:368
[8] exec_options(::Base.JLOptions) at ./client.jl:296
[9] _start() at ./client.jl:506
in expression starting at /home/martijn/AlphaZero.jl/scripts/alphazero.jl:17
Is this an AlphaZero.jl related issue? Is it best to wait for an AlphaZero.jl update?
Any help is appreciated. Thanks.
(Reason for updating to 455.28 and CUDA.jl v2.0.1: #21 (comment) and JuliaGPU/CUDA.jl#447 (comment) )
Please note that it is not necessary to have multiple distributed workers to exploit several CPU cores (every worker spawns several threads anyway).
Originally posted by @jonathan-laurent in #18 (comment)
I can see on my platform monitoring that during training, only 1 vCPU out of 8 is used. Apart from setting the num_workers parameters in params.jl (which I left at the default value of 128 for all occurrences), is there something to be done in order to effectively use multiple cores?
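One thing worth checking (our own suggestion, not from this thread): Julia itself runs with a single thread unless told otherwise, regardless of how many workers a program spawns, so the launch command controls how many cores Julia can use at all. A sketch, reusing the training command from these issues:

```shell
# Expose several CPU threads to Julia via the environment variable...
JULIA_NUM_THREADS=8 julia --project --color=yes scripts/alphazero.jl --game connect-four train
# ...or, on Julia >= 1.5, equivalently via the -t flag:
julia -t 8 --project --color=yes scripts/alphazero.jl --game connect-four train
```

Whether a given AlphaZero.jl version actually distributes simulation work across those threads is a separate question, but without one of these settings no Julia program can use more than one.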
[removed]
(I was asking how to add a chess implementation, then I found the guidelines about adding a new game)
I first tried the instructions in the README, but they fail due to hardcoded paths in the Manifest.
The package ships a Manifest as well as a Project.toml. I think it would be sufficient to ship just the Project.toml. In fact, I had to delete the Manifest to get dev to work correctly: the Manifest contains some paths hardcoded to the author's computer.
And another issue occurred. Wrong CUDA version?
UndefVarError: lib not defined
Stacktrace:
[1] maximum(::CUDA.CuArray{Float32,2}; dims::Int64) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\Knet\exwCE\src\cuarrays\reduction.jl:56
[2] softmax(::CUDA.CuArray{Float32,2}; dims::Int64) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\NNlib\PI8Xh\src\softmax.jl:29
[3] softmax(::CUDA.CuArray{Float32,2}) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\NNlib\PI8Xh\src\softmax.jl:29
[4] applychain(::Tuple{typeof(NNlib.softmax)}, ::CUDA.CuArray{Float32,2}) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\Flux\IjMZL\src\layers\basic.jl:36 (repeats 5 times)
[5] (::Flux.Chain{Tuple{Flux.Conv{2,4,typeof(identity),CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,1}},Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},typeof(Flux.flatten),Flux.Dense{typeof(identity),CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1}},typeof(NNlib.softmax)}})(::CUDA.CuArray{Float32,4}) at C:\Users\Hieros.juliapro\JuliaPro_v1.4.2-1\packages\Flux\IjMZL\src\layers\basic.jl:38
[6] forward(::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}) at D:\test2\src\networks\flux.jl:163
[7] evaluate(::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,2}) at D:\test2\src\networks\network.jl:253
[8] evaluate_batch(::AlphaZero.FluxLib.ResNet{Game}, ::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}) at D:\test2\src\networks\network.jl:283
[9] fill_and_evaluate(::AlphaZero.FluxLib.ResNet{Game}, ::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}; batch_size::Int64, fill::Bool) at D:\test2\src\play.jl:346
[10] (::AlphaZero.var"#101#102"{Bool,AlphaZero.FluxLib.ResNet{Game},Int64})(::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}) at D:\test2\src\play.jl:388
[11] macro expansion at D:\test2\src\batchifier.jl:47 [inlined]
[12] macro expansion at D:\test2\src\util.jl:56 [inlined]
[13] (::AlphaZero.Batchifier.var"#1#3"{AlphaZero.var"#101#102"{Bool,AlphaZero.FluxLib.ResNet{Game},Int64},Int64,Channel{Any}})() at .\threadingconstructs.jl:126
Hello! I'm new to DL/RL and excited to try AlphaZero for my project, which is basically training a single agent for a new game.
However, following your instructions to train a Connect Four agent right after installation, I get the following errors. They are apparently not blocking, since training actually runs, but quite slowly, so I'm not sure whether the GPU is being used or not.
Should I do something to update CUDAnative?
The problem is that CUDAnative seems to be deprecated: https://github.com/JuliaGPU/CUDAnative.jl
I can't figure out whether there is a real problem here or AlphaZero manages to work as expected despite these precompilation errors.
Can you enlighten me? Thank you!
(base) ubuntu@bonbonrectangle-dev:~/AlphaZero.jl$ julia --project --color=yes scripts/alphazero.jl --game connect-four train
ERROR: LoadError: LoadError: LoadError: UndefVarError: AddrSpacePtr not defined
Stacktrace:
[1] getproperty(::Module, ::Symbol) at ./Base.jl:26
[2] top-level scope at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/device/cuda/wmma.jl:56
[3] include(::Function, ::Module, ::String) at ./Base.jl:380
[4] include at ./Base.jl:368 [inlined]
[5] include(::String) at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/CUDAnative.jl:1
[6] top-level scope at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/device/cuda.jl:14
[7] include(::Function, ::Module, ::String) at ./Base.jl:380
[8] include at ./Base.jl:368 [inlined]
[9] include(::String) at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/CUDAnative.jl:1
[10] top-level scope at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/CUDAnative.jl:70
[11] include(::Function, ::Module, ::String) at ./Base.jl:380
[12] include(::Module, ::String) at ./Base.jl:368
[13] top-level scope at none:2
[14] eval at ./boot.jl:331 [inlined]
[15] eval(::Expr) at ./client.jl:467
[16] top-level scope at ./none:3
in expression starting at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/device/cuda/wmma.jl:55
in expression starting at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/device/cuda.jl:14
in expression starting at /home/ubuntu/.julia/packages/CUDAnative/ierw8/src/CUDAnative.jl:70
ERROR: LoadError: Failed to precompile CUDAnative [be33ccc6-a3ff-5ff2-a52e-74243cff1e17] to /home/ubuntu/.julia/compiled/v1.5/CUDAnative/4Zu2W_yJnFE.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
[3] _require(::Base.PkgId) at ./loading.jl:1030
[4] require(::Base.PkgId) at ./loading.jl:928
[5] require(::Module, ::Symbol) at ./loading.jl:923
[6] include(::Function, ::Module, ::String) at ./Base.jl:380
[7] include(::Module, ::String) at ./Base.jl:368
[8] top-level scope at none:2
[9] eval at ./boot.jl:331 [inlined]
[10] eval(::Expr) at ./client.jl:467
[11] top-level scope at ./none:3
in expression starting at /home/ubuntu/.julia/packages/CuArrays/YFdj7/src/CuArrays.jl:3
┌ Warning: CUDA is installed, but CuArrays.jl fails to load
│ exception =
│ Failed to precompile CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae] to /home/ubuntu/.julia/compiled/v1.5/CuArrays/7YFE0_yJnFE.ji.
│ Stacktrace:
│ [1] error(::String) at ./error.jl:33
│ [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
│ [3] _require(::Base.PkgId) at ./loading.jl:1030
│ [4] require(::Base.PkgId) at ./loading.jl:928
│ [5] require(::Module, ::Symbol) at ./loading.jl:923
│ [6] top-level scope at /home/ubuntu/.julia/packages/Knet/bTNMd/src/cuarray.jl:5
│ [7] include(::Function, ::Module, ::String) at ./Base.jl:380
│ [8] include at ./Base.jl:368 [inlined]
│ [9] include(::String) at /home/ubuntu/.julia/packages/Knet/bTNMd/src/Knet.jl:1
│ [10] top-level scope at /home/ubuntu/.julia/packages/Knet/bTNMd/src/Knet.jl:116
│ [11] include(::Function, ::Module, ::String) at ./Base.jl:380
│ [12] include(::Module, ::String) at ./Base.jl:368
│ [13] top-level scope at none:2
│ [14] eval at ./boot.jl:331 [inlined]
│ [15] eval(::Expr) at ./client.jl:467
│ [16] top-level scope at ./none:3
│ [17] eval(::Module, ::Any) at ./boot.jl:331
│ [18] exec_options(::Base.JLOptions) at ./client.jl:272
│ [19] _start() at ./client.jl:506
└ @ Knet ~/.julia/packages/Knet/bTNMd/src/cuarray.jl:8
If Oracle has only one function, evaluate, why not just replace it with a Function? That way, you'll have to maintain and document less code, potential users will not have to understand Julian object-oriented programming to implement a new heuristic, and you will not have to explain what an Oracle is. (I have learned this lesson over and over again trying to write similar code for others to use 😃.)
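The suggested simplification can be sketched as follows (all names here are hypothetical, not AlphaZero.jl's API): the player stores any callable returning a (policy, value) pair, instead of a subtype of an abstract Oracle type with a single evaluate method.

```julia
# A player parameterized by a plain function instead of an Oracle subtype.
struct FnPlayer
    oracle::Function  # state -> (policy::Vector{Float64}, value::Float64)
end

evaluate(p::FnPlayer, state) = p.oracle(state)

# A rollout-style heuristic is then just a closure:
uniform_oracle(nactions) = _state -> (fill(1 / nactions, nactions), 0.0)

p = FnPlayer(uniform_oracle(7))
policy, value = evaluate(p, nothing)
```

One trade-off worth noting: a bare `Function` field is not concretely typed, so dispatch-heavy inner loops may prefer a type parameter (`struct FnPlayer{F}; oracle::F; end`), which keeps the "just pass a function" ergonomics while remaining type-stable.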
I noticed yesterday that I cannot get past iteration 1. So today I ran training a few times in a row, and the checkpoint evaluation after iteration 1 always fails. The strange thing is that it sometimes fails after 10% completion and sometimes after 70%.
I'm trying to come up with a simple way to store each player's score inside the state.
I can't seem to find a good way besides storing both players' scores in it, i.e.:
current_player_score
other_player_score
and swapping between them...
Is there a smarter way to do this? I would rather each player not be aware of the other player's score, but short of going via the move history I can't figure out a good way to do this.
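For what it's worth, the swap-on-turn-change encoding described above is a common pattern for two-player games, since it keeps the state in the current player's perspective. A minimal sketch (all names ours, illustrative only):

```julia
# Both scores stored from the perspective of the player to move.
mutable struct ScoreState
    cur_score::Int    # score of the player about to move
    other_score::Int  # score of the opponent
end

function end_turn!(s::ScoreState, gained::Int)
    s.cur_score += gained
    # change of perspective: the opponent becomes the current player
    s.cur_score, s.other_score = s.other_score, s.cur_score
    return s
end

s = ScoreState(0, 0)
end_turn!(s, 3)  # player 1 scores 3; now player 2 to move
end_turn!(s, 5)  # player 2 scores 5; now player 1 to move
```

If the opponent's score should stay hidden from each player, the swap can live in the full game state while the per-player observation exposes only `cur_score`.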
I could not find v0.4.0 in the repo. Is there a pretrained model (Connect Four) I could play with?
I tried to train on my custom game but I always get the same assertion error.
ERROR: LoadError: AssertionError: iszero(π[(.~)(symmask)])
Stacktrace:
[1] apply_symmetry(::Type{Game}, ::AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}, ::Tuple{Array{Union{Nothing, Bool},1},Array{Int64,1}}) at C:\Users\dave7895\AlphaZero.jl\src\memory.jl:94
[2] (::AlphaZero.var"#28#31"{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},DataType})(::Tuple{Array{Union{Nothing, Bool},1},Array{Int64,1}}) at .\none:0
[3] iterate at .\generator.jl:47 [inlined]
[4] iterate(::Base.Iterators.Flatten{Base.Generator{Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1},AlphaZero.var"#29#30"{DataType}}}, ::Tuple{Int64,Base.Generator{Array{Tuple{Array{Union{Nothing, Bool},1},Array{Int64,1}},1},AlphaZero.var"#28#31"{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},DataType}},Int64}) at .\iterators.jl:1058
[5] grow_to!(::Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1}, ::Base.Iterators.Flatten{Base.Generator{Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1},AlphaZero.var"#29#30"{DataType}}}, ::Tuple{Int64,Base.Generator{Array{Tuple{Array{Union{Nothing, Bool},1},Array{Int64,1}},1},AlphaZero.var"#28#31"{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},DataType}},Int64}) at .\array.jl:756
[6] grow_to!(::Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1}, ::Base.Iterators.Flatten{Base.Generator{Array{AlphaZero.TrainingSample{StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}},1},AlphaZero.var"#29#30"{DataType}}}) at .\array.jl:729
[7] _collect at .\array.jl:639 [inlined]
[8] collect at .\array.jl:603 [inlined]
[9] augment_with_symmetries at C:\Users\dave7895\AlphaZero.jl\src\memory.jl:101 [inlined]
[10] learning_step!(::Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}, ::Session{Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}}) at C:\Users\dave7895\AlphaZero.jl\src\training.jl:158
[11] macro expansion at .\util.jl:308 [inlined]
[12] macro expansion at C:\Users\dave7895\AlphaZero.jl\src\report.jl:231 [inlined]
[13] train!(::Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}, ::Session{Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}}) at C:\Users\dave7895\AlphaZero.jl\src\training.jl:273
[14] resume!(::Session{Env{Game,SimpleNet{Game},StaticArrays.SArray{Tuple{25},Union{Nothing, Bool},1,25}}}) at C:\Users\dave7895\AlphaZero.jl\src\ui\session.jl:452
[15] top-level scope at C:\Users\dave7895\AlphaZero.jl\scripts\alphazero.jl:82
[16] include(::Module, ::String) at .\Base.jl:377
[17] exec_options(::Base.JLOptions) at .\client.jl:288
[18] _start() at .\client.jl:484
in expression starting at C:\Users\dave7895\AlphaZero.jl\scripts\alphazero.jl:81
Hi @jonathan-laurent ,
This project is really awesome!
Since you mentioned it in the doc Develop-support-for-a-more-general-game-interface, I'd like to write down some thoughts and discuss them with you.
Here I'll mainly focus on the Game Interface and MCTS parts. Along the way, the design differences between AlphaZero.jl, ReinforcementLearningBase.jl, and OpenSpiel are also listed.
To implement a new game, we have some assumptions according to the Game Interface:
If I understand correctly, the two main concepts are Game and Board.
In OpenSpiel, those two concepts are almost the same (the Board is named state in OpenSpiel), except that the state is not contained in the Game; the Game is just a static description (history lives in the state, not the game).
In RLBase, the Game is treated as an AbstractEnvironment and the Board is just the observation of the env from the perspective of a player.
In this view, most of the interfaces in this package align with those in RLBase. A detailed mapping follows:
- AbstractGame -> AbstractEnv
- board -> observe
- Action -> get_action_space
- white_playing -> get_current_player
- white_reward -> get_reward
- board_symmetric -> missing in RLBase (a new trait would be needed to specify whether the state of a game is symmetric or not)
- available_actions -> get_legal_actions
- actions_mask -> get_legal_actions_mask
- play! -> (env::AbstractEnv)(action)
- heuristic_value -> missing in RLBase
- vectorize_board -> get_state
- symmetries -> missing in RLBase
- game_terminated -> get_terminal
- num_actions -> length(action_space)
- board_dim -> size(rand(observation_space))
- random_symmetric_state -> missing in RLBase

I think it won't be very difficult to adapt this package to use OpenSpiel.jl or even the interfaces in RLBase.
I really like the implementation of async MCTS in this package. I would like to see it separated into a standalone package.
The naming of some types is slightly strange to me. For example, there's an Oracle{Game} abstract type. If I understand it correctly, it is used in the rollout step to select an action. The first time I saw the name Oracle, I supposed its subtypes must implement some smart algorithms 😆. But in MCTS it is usually a lightweight method, am I right?
The implementation of Worker assumes that there are only two players in the game. Do you have any idea how to expand it to apply to multi-player games?
At first glance, I thought the async MCTS used some kind of root-level or tree-level parallelization. But I can't find multi-threading used anywhere in the code. It seems that the async part mainly collects a batch of states and gets the evaluation results once for all of them. Am I right here? It would be great if you could share some implementation considerations here 😄
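For readers following along, the batching pattern described in this question can be illustrated with a tiny self-contained toy (names are ours, not AlphaZero.jl's): several cooperative tasks each submit a state plus a reply channel to one server task, which blocks until it has a full batch and then answers everyone with a single "batched" evaluation. No OS threads are involved; the concurrency is purely task-based.

```julia
# Server: gather a full batch, evaluate once, dispatch the replies.
function inference_server(requests::Channel, batch_size::Int)
    batch = [take!(requests) for _ in 1:batch_size]
    states = first.(batch)
    results = 2 .* states  # stand-in for a single batched network call
    foreach((req, r) -> put!(req[2], r), batch, results)
end

requests = Channel{Tuple{Int, Channel{Int}}}(32)
server = @async inference_server(requests, 4)

# Workers: each submits its state and blocks on its private reply channel,
# yielding control so the other workers can fill up the batch.
workers = [@async begin
    reply = Channel{Int}(1)
    put!(requests, (i, reply))
    take!(reply)
end for i in 1:4]

results = fetch.(workers)
```

The point of the pattern is that the expensive evaluation runs once per batch instead of once per simulation, which is exactly what makes GPU inference worthwhile during self-play.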
Also cc @jbrea 😄
Hello Jonathan,
Could you please explain in a few words what the function test_symmetry(Game, state, (symstate, aperm)) in game.jl is meant to do?
I'm quite new to Julia, and I must confess I have a hard time working out what it does.
Currently, it's where test_game fails for my game.
Thank you!
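For reference, here is my current guess of the property being checked (a hedged sketch, not the actual test_symmetry code): a symmetry consists of a transformed state symstate plus a permutation aperm of the action indices, and the two must stay consistent, e.g. the legal actions of the symmetric state should be the permuted legal actions of the original.

```julia
# Hedged sketch of a symmetry-consistency check, illustrated on a 1D "board"
# whose symmetry is a left-right flip. This is illustrative code only.

state = [1, 0, 0]            # toy board: 1 = occupied, 0 = empty
symstate = reverse(state)    # the symmetric state
aperm = [3, 2, 1]            # action i in `state` maps to aperm[i] in `symstate`

legal_mask(s) = s .== 0      # an action is legal on an empty cell

mask = legal_mask(state)
symmask = legal_mask(symstate)

# Consistency property: permuting the symmetric mask recovers the original.
@assert symmask[aperm] == mask
println("symmetry consistent")
```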
The only reason you may want to spawn several Julia processes on the same machine is to use multiple GPUs.
Originally posted by @jonathan-laurent in #18 (comment)
To make sure I understand correctly:
It's possible to use multiple GPUs on the same machine by spawning several Julia processes. But is spawning several processes actually required to use multiple GPUs?
This feature will be officially released with v0.4, right?
Is it possible to skip the benchmark without having to implement one?
I am trying to implement the game of Go (with only limited success...) and I need to test the game frequently. Is there any way to skip the initial benchmark when only using play?
The current networks library is based on Knet. There used to be one based on Flux, which could be set up as the default by changing this line:
Line 69 in 64ab68b
The Flux backend is currently broken, but it should not be hard to fix as soon as FluxML/Flux.jl#1044 is included in a new release.
Thank you for your hard work. I'm trying to run Connect Four example according to the manual:
git clone --branch v0.2.1 https://github.com/jonathan-laurent/AlphaZero.jl.git
cd AlphaZero.jl
julia --project -e "import Pkg; Pkg.instantiate()"
julia --project --color=yes scripts/alphazero.jl --game connect-four train
The training sequence worked for me. On the other hand, the following command, described in the Examining the current agent section, fails:
julia --project --color=yes scripts/alphazero.jl --game connect-four explore
with the error message below:
CuArrays.jl SplittingPool statistics:
- 0 pool allocations: 0 bytes in 0.0s
- 0 CUDA allocations: 0 bytes in 0.0s
CuArrays.jl SplittingPool statistics:
- 0 pool allocations: 0 bytes in 0.0s
- 0 CUDA allocations: 0 bytes in 0.0s
Loading environment
Loading network from: sessions/connect-four/bestnn.data
Loading network from: sessions/connect-four/curnn.data
Loading memory from: sessions/connect-four/mem.data
Loaded iteration counter from: sessions/connect-four/iter.txt
Starting interactive exploration
Red plays:
1 2 3 4 5 6 7
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
ERROR: LoadError: MethodError: no method matching think(::MctsPlayer{Game,AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}}, ::Game, ::Int64)
Closest candidates are:
think(::MctsPlayer, ::Any) at /home/terasaki/tmp/AlphaZero.jl/src/play.jl:214
think(::AbstractPlayer, ::Any) at /home/terasaki/tmp/AlphaZero.jl/src/play.jl:20
think(::RandomPlayer, ::Any) at /home/terasaki/tmp/AlphaZero.jl/src/play.jl:71
...
Stacktrace:
[1] state_statistics(::Game, ::MctsPlayer{Game,AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}}, ::Int64, ::AlphaZero.MemoryBuffer{StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}) at /home/terasaki/tmp/AlphaZero.jl/src/ui/explorer.jl:62
[2] compute_and_print_state_statistics(::Explorer{Game}) at /home/terasaki/tmp/AlphaZero.jl/src/ui/explorer.jl:151
[3] start_explorer(::Explorer{Game}) at /home/terasaki/tmp/AlphaZero.jl/src/ui/explorer.jl:294
[4] start_explorer(::Session{Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}}) at /home/terasaki/tmp/AlphaZero.jl/src/ui/session.jl:440
[5] top-level scope at /home/terasaki/tmp/AlphaZero.jl/scripts/alphazero.jl:77
[6] include(::Module, ::String) at ./Base.jl:377
[7] exec_options(::Base.JLOptions) at ./client.jl:288
[8] _start() at ./client.jl:484
in expression starting at /home/terasaki/tmp/AlphaZero.jl/scripts/alphazero.jl:74
CuArrays.jl SplittingPool statistics:
- 0 pool allocations: 0 bytes in 0.0s
- 0 CUDA allocations: 0 bytes in 0.0s
Here is my environment with GPU 1080Ti:
julia> versioninfo()
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Genuine Intel(R) CPU 0000 @ 2.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-8.0.1 (ORCJIT, broadwell)
Environment:
JULIA_NUM_THREADS = 40
Any ideas?
Probably I am misunderstanding how to implement a game for AlphaZero.jl, but I am having a hard time understanding how I can enumerate possible actions with only the static data of GameSpec.
From what I understand, the function actions(::AbstractGameSpec)
should return all possible actions of the game, while actions_mask(::AbstractGameEnv)
returns a boolean mask indicating which of those actions are legal in the current state.
A game with a very limited number of possible actions is fairly easy to implement in this setup. But how would I, for example, implement chess? For chess, the state determines which actions are possible, and since there are many possible actions, it seems inefficient to enumerate them all and pass a boolean mask indicating which are playable. But then again, I am probably misunderstanding something.
Hope someone can get me on track. Would really love to use this package!
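To make my current understanding concrete, here is the fixed-action-space-plus-mask pattern as I read it (toy code, not AlphaZero.jl's API). As far as I know this is standard for AlphaZero-style methods because the policy network needs a fixed-size output; for chess, the original AlphaZero paper uses a fixed move encoding of 8×8×73 = 4672 entries with exactly this kind of masking.

```julia
# Minimal illustration of a fixed action space plus a per-state legality
# mask, on tic-tac-toe. Names here are made up for the example.

const ALL_ACTIONS = 1:9          # one action per cell, fixed for the game

struct State
  board::Vector{Int}             # 0 = empty, 1/2 = players
end

actions_mask(s::State) = s.board .== 0   # legal iff the cell is empty

# Masked policy: zero out illegal moves, then renormalize.
function masked_policy(policy::Vector{Float64}, s::State)
  masked = policy .* actions_mask(s)
  masked ./ sum(masked)
end

s = State([1, 0, 0, 0, 2, 0, 0, 0, 0])
p = fill(1 / length(ALL_ACTIONS), length(ALL_ACTIONS))  # uniform prior
println(masked_policy(p, s))     # zeros at cells 1 and 5, 1/7 elsewhere
```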
Hi Jonathan!
I am a lead contributor to OpenSpiel, a framework for RL in games. OpenSpiel is a general game framework and we have a Julia API thanks to @findmyway. Someone from our team, @michalsustr, pointed me to your project. Looks great!
We have many games implemented and tested; I wonder if it would be possible to add support for them in your project? For the next two weeks, we are making a concentrated effort to add functionality to OpenSpiel, so it might be a good time to look at this if anybody is interested (see google-deepmind/open_spiel#251).
Curious to hear your thoughts on this!
Hello,
I am interested in implementing supervised learning in AlphaZero.jl. Since it's mentioned on the contribution page, I assume it hasn't been implemented yet? Has anyone already thought about this?
I would like to implement the following features:
Does anyone have an idea where I should best start?
Hey - you previously mentioned you were working on this; any ETA as to when you think it will be completed?
If not, and if I were to attempt this myself, do you have any pointers or suggestions as to how you would approach it?
I'd like to simplify the Minmax and MCTS players' move selection by cutting branches as early as possible, namely when providing the list of available actions to these players' think() function. I would need a Game instance to know its players' types so that it could adapt actions_mask accordingly. I can't seem to find a way to get this data from a Game instance, nor to store it at game creation.
Maybe that's because AlphaZero.jl strictly requires that there be no coupling between games and players?
Or maybe I should instead use the GameInterface.heuristic_value() function to indirectly cut branches? But then it would only work for Minmax players, since MCTS players don't use the heuristic.
May I ask for advice on how to proceed?
(Don't hesitate to tell me if I'm asking too many questions or if I should ask them somewhere else.)
The reason I want to do this in the first place is that there are so many possible moves in each state of my game that even just testing whether my implementation is correct takes ages, let alone running big benchmarks. :-S
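For reference, the kind of depth-limited cutting I have in mind with heuristic_value looks roughly like this (a generic sketch on a toy game, not the package's MinMax player):

```julia
# Generic depth-limited negamax: below the depth cutoff we stop expanding
# and score the position with a heuristic, which bounds the search cost
# when the branching factor is large.

# Toy game: a pile of `n` stones, take 1 or 2, taking the last stone wins.
available_actions(n) = n >= 2 ? [1, 2] : [1]
heuristic(n) = n % 3 == 0 ? -1.0 : 1.0   # known theory for this toy game

function negamax(n, depth)
  n == 0 && return -1.0          # previous player took the last stone: loss
  depth == 0 && return heuristic(n)
  maximum(-negamax(n - a, depth - 1) for a in available_actions(n))
end

println(negamax(4, 10))  # 1.0: a pile of 4 is a win for the player to move
```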
It seems this repository is missing a website link.
https://jonathan-laurent.github.io/AlphaZero.jl/stable/ or https://jonathan-laurent.github.io/AlphaZero.jl/dev/ are suitable candidates for that.
When training the connect four agent, the training process crashes with an out of memory error about every 24 hours and must then be restarted.
It would be interesting to see if this happens again after updating the dependencies and/or switching to Flux as a DL backend.
Configuration
Stacktrace
ERROR: LoadError: Out of GPU memory trying to allocate 21.000 MiB
Effective GPU memory usage: 99.73% (7.772 GiB/7.793 GiB)
CuArrays GPU memory usage: 6.767 GiB
SplittingPool usage: 2.207 GiB (2.179 GiB allocated, 28.807 MiB cached)
SplittingPool efficiency: 32.20% (2.179 GiB requested, 6.767 GiB allocated)
Stacktrace:
[1] alloc at /home/jonathan/.julia/packages/CuArrays/rNxse/src/memory.jl:162 [inlined]
[2] CuArrays.CuArray{UInt8,1,P} where P(::UndefInitializer, ::Tuple{Int64}) at /home/jonathan/.julia/packages/CuArrays/rNxse/src/array.jl:90
[3] CuArray at /home/jonathan/.julia/packages/CuArrays/rNxse/src/array.jl:98 [inlined]
[4] CuArray at /home/jonathan/.julia/packages/CuArrays/rNxse/src/array.jl:99 [inlined]
[5] KnetPtrCu(::Int64) at /home/jonathan/.julia/packages/Knet/FSBq5/src/cuarray.jl:90
[6] KnetPtr at /home/jonathan/.julia/packages/Knet/FSBq5/src/kptr.jl:102 [inlined]
[7] KnetArray at /home/jonathan/.julia/packages/Knet/FSBq5/src/karray.jl:82 [inlined]
[8] similar at /home/jonathan/.julia/packages/Knet/FSBq5/src/karray.jl:164 [inlined]
[9] similar at /home/jonathan/.julia/packages/Knet/FSBq5/src/karray.jl:167 [inlined]
[10] broadcasted(::typeof(+), ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,4}) at /home/jonathan/.julia/packages/Knet/FSBq5/src/binary.jl:37
[11] +(::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,4}) at /home/jonathan/.julia/packages/Knet/FSBq5/src/binary.jl:232
[12] forw(::Function, ::AutoGrad.Result{Knet.KnetArray{Float32,4}}, ::Vararg{AutoGrad.Result{Knet.KnetArray{Float32,4}},N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tup$
e{}}}) at /home/jonathan/.julia/packages/AutoGrad/pTNVv/src/core.jl:66
[13] forw at /home/jonathan/.julia/packages/AutoGrad/pTNVv/src/core.jl:65 [inlined]
[14] +(::AutoGrad.Result{Knet.KnetArray{Float32,4}}, ::AutoGrad.Result{Knet.KnetArray{Float32,4}}) at ./none:0
[15] (::AlphaZero.KNets.SkipConnection)(::AutoGrad.Result{Knet.KnetArray{Float32,4}}) at /home/jonathan/AlphaZero.jl/src/networks/knet/layers.jl:104
[16] (::AlphaZero.KNets.Chain)(::AutoGrad.Result{Knet.KnetArray{Float32,4}}) at /home/jonathan/AlphaZero.jl/src/networks/knet/layers.jl:19 (repeats 2 times)
[17] forward(::ResNet{Game}, ::Knet.KnetArray{Float32,4}) at /home/jonathan/AlphaZero.jl/src/networks/knet.jl:148
[18] evaluate(::ResNet{Game}, ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,2}) at /home/jonathan/AlphaZero.jl/src/networks/network.jl:285
[19] losses(::ResNet{Game}, ::LearningParams, ::Float32, ::Float32, ::Tuple{Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,4},Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,2},Knet.KnetArray{Float32$
2}}) at /home/jonathan/AlphaZero.jl/src/learning.jl:62
[20] (::AlphaZero.var"#loss#50"{AlphaZero.Trainer})(::Knet.KnetArray{Float32,2}, ::Vararg{Any,N} where N) at /home/jonathan/AlphaZero.jl/src/learning.jl:102
[21] (::Knet.var"#693#694"{Knet.Minimize{Base.Generator{Array{Tuple{Array{Float32,2},Array{Float32,4},Array{Float32,2},Array{Float32,2},Array{Float32,2}},1},AlphaZero.Util.var"#7#9"{AlphaZero.var"#47#51$
{AlphaZero.Trainer}}}},Tuple{Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,4},Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,2},Knet.KnetArray{Float32,2}}})() at /home/jonathan/.julia/packages/AutoG$
ad/pTNVv/src/core.jl:205
[22] differentiate(::Function; o::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/jonathan/.julia/packages/AutoGrad/pTNVv/src/core.jl:144
[23] differentiate at /home/jonathan/.julia/packages/AutoGrad/pTNVv/src/core.jl:135 [inlined]
[24] iterate at /home/jonathan/.julia/packages/Knet/FSBq5/src/train.jl:23 [inlined]
[25] iterate at ./iterators.jl:140 [inlined]
[26] iterate at ./iterators.jl:139 [inlined]
[27] train!(::AlphaZero.var"#49#53"{Array{Float32,1}}, ::ResNet{Game}, ::Adam, ::Function, ::Base.Generator{Array{Tuple{Array{Float32,2},Array{Float32,4},Array{Float32,2},Array{Float32,2},Array{Float32,2
}},1},AlphaZero.Util.var"#7#9"{AlphaZero.var"#47#51"{AlphaZero.Trainer}}}) at /home/jonathan/AlphaZero.jl/src/networks/knet.jl:119
[28] training_epoch!(::AlphaZero.Trainer) at /home/jonathan/AlphaZero.jl/src/learning.jl:113
[29] macro expansion at ./util.jl:302 [inlined]
[30] learning!(::Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}, ::Session{Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}}) at /home/jonathan/AlphaZero.jl/src/traini
ng.jl:165
[31] macro expansion at ./util.jl:302 [inlined]
[32] macro expansion at /home/jonathan/AlphaZero.jl/src/report.jl:241 [inlined]
[33] train!(::Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}, ::Session{Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}}) at /home/jonathan/AlphaZero.jl/src/training.
jl:266
[34] resume!(::Session{Env{Game,ResNet{Game},StaticArrays.SArray{Tuple{7,6},UInt8,2,42}}}) at /home/jonathan/AlphaZero.jl/src/ui/session.jl:384
[35] top-level scope at /home/jonathan/AlphaZero.jl/scripts/alphazero.jl:68
[36] include(::Module, ::String) at ./Base.jl:377
[37] exec_options(::Base.JLOptions) at ./client.jl:288
[38] _start() at ./client.jl:484
Hi Jonathan!
I've been trying to tune AlphaZero.jl hyperparameters recently and have run into some problems. With master (commit 91bb698) and nothing changed, I find that self-play takes more and more time:
iter1: 49m gpu 33% cpu 300%
iter2: 2h2m gpu 15% cpu 330%
iter3: 7h30m gpu 4% cpu 230%
Memory has 54G free.
This is so strange.
Below is my system info:
cpu: Intel(R) Core(TM) i9-10940X CPU @ 3.30GH 14 physical cores 28 threads
memory: 64G
gpu: NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 , RTX2080ti
OS: ubuntu18.04
julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i9-10940X CPU @ 3.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, cascadelake)
julia> Threads.nthreads()
28
I think it's fine if either the CPU or the GPU is fully utilized, but no matter how I change the parameters, I just can't achieve it. Even worse, iter2 uses less GPU than iter1, and iter3 even less.
In order to split game simulations across different threads, we are using a homemade Util.mapreduce primitive that is a bit complex and unintuitive. It would be better to use something more standard such as tmap.
Status: @michelangelo21 is looking at this.
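The tmap idea in miniature (a generic sketch of the intended behavior, not the actual replacement):

```julia
# Threaded map in miniature: spawn one task per item and fetch the results
# in input order, letting Julia's scheduler spread tasks across threads.

function tmap_sketch(f, xs)
  tasks = [Threads.@spawn f(x) for x in xs]
  fetch.(tasks)   # results come back in the same order as the inputs
end

# E.g. running several game simulations in parallel:
simulate(i) = i^2        # stand-in for one self-play game
println(tmap_sketch(simulate, 1:4))  # [1, 4, 9, 16]
```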
First of all, it's very cool to see AlphaZero implemented in Julia! I have always thought that Julia is a really good tool for this type of thing.
This comment is related to #4, but maybe is a slightly different perspective. It would be very cool to also have a version of this that works with MDPs, either using RLBase as @findmyway suggested in #4, or POMDPs.jl (which I and some colleagues work on), or RLInterface.jl. This would need to be a considerably different implementation because the MCTS would be approximating an expectimax tree instead of a minimax tree.
Just thought I should start this issue as a stub in case anyone wants to pick it up, and to point to some MDP definition interfaces in Julia, and to clarify that the game and MDP versions would need to be different. I and my students would definitely use the package if it had support for MDPs.
Hi guys, I'm getting this error on the master branch, right after self-play has finished. Below the error you'll find that Julia sees CUDA and the device correctly, but it throws a cuDNN error. Can you help?
ERROR: LoadError: CUDNNError: CUDNN_STATUS_EXECUTION_FAILED (code 8)
Stacktrace:
[1] throw_api_error(::CUDA.CUDNN.cudnnStatus_t) at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/error.jl:19
[2] macro expansion at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/error.jl:30 [inlined]
[3] cudnnBatchNormalizationForwardTraining(::Ptr{Nothing}, ::CUDA.CUDNN.cudnnBatchNormMode_t, ::Base.RefValue{Float32}, ::Base.RefValue{Float32}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,4}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,4}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::Float32, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::Float32, ::CUDA.CuPtr{Nothing}, ::CUDA.CuPtr{Nothing}) at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/utils/call.jl:93
[4] cudnnBNForward!(::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::Float32; cache::Nothing, alpha::Int64, beta::Int64, eps::Float32, training::Bool) at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/batchnorm.jl:55
[5] #batchnorm#478 at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/batchnorm.jl:26 [inlined]
[6] #adjoint#17 at /home/sdeveshj/.julia/packages/Flux/05b38/src/cuda/cudnn.jl:6 [inlined]
[7] _pullback at /home/sdeveshj/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:53 [inlined]
[8] BatchNorm at /home/sdeveshj/.julia/packages/Flux/05b38/src/cuda/cudnn.jl:3 [inlined] (repeats 2 times)
[9] applychain at /home/sdeveshj/.julia/packages/Flux/05b38/src/layers/basic.jl:36 [inlined]
[10] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}}}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[11] applychain at /home/sdeveshj/.julia/packages/Flux/05b38/src/layers/basic.jl:36 [inlined]
[12] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{Flux.Conv{2,2,typeof(identity),CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,1}},Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}}}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[13] Chain at /home/sdeveshj/.julia/packages/Flux/05b38/src/layers/basic.jl:38 [inlined]
[14] _pullback(::Zygote.Context, ::Flux.Chain{Tuple{Flux.Conv{2,2,typeof(identity),CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,1}},Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}}}}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[15] forward at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:161 [inlined]
[16] _pullback(::Zygote.Context, ::typeof(AlphaZero.Network.forward), ::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[17] evaluate at /home/sdeveshj/AlphaZero.jl/src/networks/network.jl:253 [inlined]
[18] _pullback(::Zygote.Context, ::typeof(AlphaZero.Network.evaluate), ::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,2}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[19] losses at /home/sdeveshj/AlphaZero.jl/src/learning.jl:62 [inlined]
[20] _pullback(::Zygote.Context, ::typeof(AlphaZero.losses), ::AlphaZero.FluxLib.ResNet{Game}, ::LearningParams, ::Float32, ::Float32, ::Tuple{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2}}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[21] L at /home/sdeveshj/AlphaZero.jl/src/learning.jl:113 [inlined]
[22] _pullback(::Zygote.Context, ::AlphaZero.var"#L#54"{AlphaZero.Trainer}, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,2}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[23] adjoint at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/lib/lib.jl:172 [inlined]
[24] _pullback at /home/sdeveshj/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
[25] #1 at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:83 [inlined]
[26] _pullback(::Zygote.Context, ::AlphaZero.FluxLib.var"#1#2"{AlphaZero.var"#L#54"{AlphaZero.Trainer},Tuple{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2}}}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
[27] pullback(::Function, ::Zygote.Params) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface.jl:172
[28] lossgrads(::Function, ::Zygote.Params) at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:73
[29] train!(::AlphaZero.var"#53#55"{Array{Float32,1}}, ::AlphaZero.FluxLib.ResNet{Game}, ::Adam, ::Function, ::Base.Iterators.Take{Base.Iterators.Stateful{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Repeated{Nothing},AlphaZero.Util.var"#12#13"{AlphaZero.var"#50#52",Tuple{Array{Float32,2},Array{Float32,4},Array{Float32,2},Array{Float32,2},Array{Float32,2}},Int64,Bool}}},Tuple{Any,Tuple{Nothing,Base.Generator{_A,AlphaZero.Util.var"#9#11"{AlphaZero.var"#50#52"}} where _A,Any}}}}, ::Int64) at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:82
[30] batch_updates!(::AlphaZero.Trainer, ::Int64) at /home/sdeveshj/AlphaZero.jl/src/learning.jl:116
[31] macro expansion at ./timing.jl:310 [inlined]
[32] learning_step!(::Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}, ::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/sdeveshj/AlphaZero.jl/src/training.jl:185
[33] macro expansion at ./timing.jl:310 [inlined]
[34] macro expansion at /home/sdeveshj/AlphaZero.jl/src/report.jl:229 [inlined]
[35] train!(::Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}, ::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/sdeveshj/AlphaZero.jl/src/training.jl:295
[36] resume!(::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/sdeveshj/AlphaZero.jl/src/ui/session.jl:383
[37] top-level scope at /home/sdeveshj/AlphaZero.jl/scripts/alphazero.jl:89
[38] include(::Function, ::Module, ::String) at ./Base.jl:380
[39] include(::Module, ::String) at ./Base.jl:368
[40] exec_options(::Base.JLOptions) at ./client.jl:296
[41] _start() at ./client.jl:506
in expression starting at /home/sdeveshj/AlphaZero.jl/scripts/alphazero.jl:80
julia> Libdl.dlpath("libcuda")
"/usr/lib/x86_64-linux-gnu/libcuda.so"
julia> Libdl.dlpath("libcudnn")
"/usr/lib/cuda/lib64/libcudnn.so"
(@v1.5) pkg> activate .
Activating environment at `~/AlphaZero.jl/Project.toml`
julia> CUDA.device()
CuDevice(0): GeForce RTX 2070 SUPER
julia> has_cuda()
true
julia> CUDA.version()
v"11.1.0"
Hello!
Thanks for the great documentation! I was looking for working examples of AlphaZero for a small game, and this repo looks very promising! Unfortunately, there is no pretrained model; if you could post one in GitHub releases, it would help a lot.
I tried to train it myself but was unsuccessful. After the self-play session, an error occurred:
LoadError: CUBLASError: the GPU program failed to execute (code 13, CUBLAS_STATUS_EXECUTION_FAILED)
My setup is a clean Ubuntu 18.04 with Nvidia driver 450. I cloned the master branch, installed the dependencies, and ran training:
julia --project -e "import Pkg; Pkg.instantiate()"
julia --project --color=yes scripts/alphazero.jl --game connect-four train
If it is an environment issue, maybe you could recommend some docker image?
AlphaZero crashes when trying to load parameters from a JSON file, as the subtypes of OptimiserSpec do not implement the subtypekey field required by JSON3.
As a consequence, when loading a session from disk with the Session constructor or when using load_env, it is important to provide params explicitly.
To replicate
If you already have a valid connect four session in sessions/connect-four, just run scripts/duel.jl after replacing this line
Line 22 in c7deb67
with params=nothing)
This results in the following stacktrace.
ERROR: LoadError: ArgumentError: invalid json abstract type: didn't find subtypekey
Stacktrace:
[1] #read#49 at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:950 [inlined]
[2] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:888 [inlined]
[3] #readvalue#48 at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:861 [inlined]
[4] readvalue at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:842 [inlined]
[5] #read#47 at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:824 [inlined]
[6] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:805 [inlined]
[7] readvalue(::Base.CodeUnits{UInt8,String}, ::Int64, ::Int64, ::Type{LearningParams}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{
(),Tuple{}}}) at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:861
[8] readvalue at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:842 [inlined]
[9] read(::JSON3.Struct, ::Base.CodeUnits{UInt8,String}, ::Int64, ::Int64, ::UInt8, ::Type{Params}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{}
,NamedTuple{(),Tuple{}}}) at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:824
[10] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:805 [inlined]
[11] read(::String, ::Type{Params}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/jonathan/.julia/packages/JSON
3/ItGdr/src/structs.jl:308
[12] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:298 [inlined]
[13] #read#7 at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:294 [inlined]
[14] read at /home/jonathan/.julia/packages/JSON3/ItGdr/src/structs.jl:294 [inlined]
[15] #202 at /home/jonathan/AlphaZero.jl/src/ui/session.jl:130 [inlined]
[16] open(::AlphaZero.var"#202#204", ::String, ::Vararg{String,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ./io.jl:298
[17] open at ./io.jl:296 [inlined]
[18] load_env(::Type{Game}, ::Type{ResNet{Game}}, ::AlphaZero.Log.Logger, ::String; params::Nothing) at /home/jonathan/AlphaZero.jl/src/ui/session.jl:129
[19] run_duel(::Type{Game}, ::Type{ResNet{Game}}, ::String, ::AlphaZero.Benchmark.Duel; params::Nothing) at /home/jonathan/AlphaZero.jl/src/ui/session.jl:510
[20] top-level scope at /home/jonathan/AlphaZero.jl/scripts/duel.jl:15
[21] include(::String) at ./client.jl:439
[22] top-level scope at none:0
in expression starting at /home/jonathan/AlphaZero.jl/scripts/duel.jl:15
While attempting to utilize AlphaZero for Tetris, I keep running into this error when running on the GPU. I have reproduced it on two separate machines, and it happens consistently when launching a checkpoint evaluation. I am wondering if someone has insight into what might be causing this.
Repo:
https://gitlab.com/samdickinson314/tetrisai
include("runner.jl")
Launching a checkpoint evaluation
CUDNNError: CUDNN_STATUS_EXECUTION_FAILED (code 8)
Stacktrace:
[1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
@ CUDA.CUDNN C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\error.jl:22
[2] macro expansion
@ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\error.jl:39 [inlined]
[3] cudnnActivationForward(handle::Ptr{Nothing}, activationDesc::CUDA.CUDNN.cudnnActivationDescriptor, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float32, 4}, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float32, 4})
@ CUDA.CUDNN C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\utils\call.jl:26
[4] #cudnnActivationForwardAD#657
@ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\activation.jl:48 [inlined]
[5] #cudnnActivationForwardWithDefaults#656
@ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\activation.jl:42 [inlined]
[6] #cudnnActivationForward!#653
@ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\activation.jl:22 [inlined]
[7] #35
@ C:\Users\dickisp1\.julia\packages\NNlibCUDA\Oc2CZ\src\cudnn\activations.jl:13 [inlined]
[8] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(NNlib.relu), Tuple{CUDA.CuArray{Float32, 4}}})
@ NNlibCUDA C:\Users\dickisp1\.julia\packages\NNlibCUDA\Oc2CZ\src\cudnn\activations.jl:30
[9] (::Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}})(x::CUDA.CuArray{Float32, 4}, cache::Nothing)
@ Flux.CUDAint C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\cuda\cudnn.jl:9
[10] BatchNorm
@ C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\cuda\cudnn.jl:6 [inlined]
[11] applychain(fs::Tuple{Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, 
Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}}, x::CUDA.CuArray{Float32, 4}) (repeats 2 times)
@ Flux C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\layers\basic.jl:37
[12] (::Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, 
Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}}})(x::CUDA.CuArray{Float32, 4})
@ Flux C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\layers\basic.jl:39
[13] forward(nn::ResNet, state::CUDA.CuArray{Float32, 4})
@ AlphaZero.FluxLib C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\networks\flux.jl:142
[14] forward_normalized(nn::ResNet, state::CUDA.CuArray{Float32, 4}, actions_mask::CUDA.CuArray{Float32, 2})
@ AlphaZero.Network C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\networks\network.jl:264
[15] evaluate_batch(nn::ResNet, batch::Vector{NamedTuple{(:board, :current_piece, :next_piece, :score, :pieces_placed, :seed), Tuple{StaticArrays.SMatrix{22, 10, Bool, 220}, Int64, Int64, Int64, Int64, Int64}}})
@ AlphaZero.Network C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\networks\network.jl:312
[16] fill_and_evaluate(net::ResNet, batch::Vector{NamedTuple{(:board, :current_piece, :next_piece, :score, :pieces_placed, :seed), Tuple{StaticArrays.SMatrix{22, 10, Bool, 220}, Int64, Int64, Int64, Int64, Int64}}}; batch_size::Int64, fill_batches::Bool)
@ AlphaZero C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\simulations.jl:32
[17] (::AlphaZero.var"#36#37"{Int64, Bool, ResNet})(batch::Vector{NamedTuple{(:board, :current_piece, :next_piece, :score, :pieces_placed, :seed), Tuple{StaticArrays.SMatrix{22, 10, Bool, 220}, Int64, Int64, Int64, Int64, Int64}}})
@ AlphaZero C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\simulations.jl:54
[18] macro expansion
@ C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\batchifier.jl:68 [inlined]
[19] macro expansion
@ C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\util.jl:20 [inlined]
[20] (::AlphaZero.Batchifier.var"#2#4"{Int64, AlphaZero.var"#36#37"{Int64, Bool, ResNet}, Channel{Any}})()
@ AlphaZero.Batchifier C:\Users\dickisp1\.julia\packages\ThreadPools\ROFEh\src\macros.jl:261
Interrupted by the user
It would be really nice if the package could be registered in the General registry. That would make it extremely easy for folks to try out and build on top of.
Currently, it seems that each MCTS node stores a vector whose length equals the size of the action encoding, holding Q values, network probabilities, and edge visit counts. For some games this is suboptimal. Take Tak, or the slightly less famous chess.
In chess, you need a vector of length >4k to one-hot encode actions, yet the branching factor is about 31. In Tak on the 5x5 board, the encoding length is >2k with a branching factor of roughly 60. This means only a fraction of the actions are actually legal in any given state. With Tak, the consequence is that I am swapping quite a bit on my machine. Using sparse storage for these games would save a lot of memory.
I see two implementation options. The first is straightforward: replace the vector with a sparse vector, which needs almost no code change and could probably reuse the existing code with one extra initialization parameter. The second option is to use a dense vector of length sum(actions_mask(state)), where the i-th element corresponds to the i-th element of findall(actions_mask(state)). This has a slight computational advantage, especially if generating the action mask is nontrivial for a state, and it even saves the indexing vector.
What are your thoughts on this? If you are going to rewrite the MCTS anyway, maybe you could take this into consideration?
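To make the second option concrete, here is a minimal sketch of what compact per-node storage could look like (the struct, field names, and `actions_mask` helper are mine for illustration, not AlphaZero.jl's actual API):

```julia
# Sketch: store per-node statistics only for legal actions, indexed by
# their position among the legal moves rather than by the global
# one-hot action index.

struct CompactNodeStats
    legal::Vector{Int}   # global indices of legal actions (findall of the mask)
    Q::Vector{Float64}   # Q value per legal action
    P::Vector{Float64}   # prior probability per legal action
    N::Vector{Int}       # visit count per legal action
end

function CompactNodeStats(actions_mask::AbstractVector{Bool})
    legal = findall(actions_mask)
    n = length(legal)
    return CompactNodeStats(legal, zeros(n), zeros(n), zeros(Int, n))
end

# Translate a global action index into its compact slot. A linear scan is
# fine for small branching factors; a Dict lookup could be used otherwise.
compact_index(s::CompactNodeStats, action::Int) =
    findfirst(==(action), s.legal)
```

With a chess-like encoding (>4k actions, ~31 legal), this shrinks each node's statistics from the encoding length down to the branching factor, at the cost of the index translation on lookup.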
Any tips for running this on Azure without paying JuliaHub's steep premium?
I'm trying to leverage spot pricing, which is about 1/10th to 1/20th of JuliaHub's pricing.
I found this:
https://github.com/microsoft/AzureClusterlessHPC.jl
I'm not entirely sure how JuliaHub handles running this code on multiple machines together. Is there a command to connect multiple instances, or something built in similar to Ray? Or will it be an incredibly painful process of setting the code up for use with the package I linked above?
I would like to train an agent on a cloud-based multi-GPU platform and then migrate it to a much cheaper platform used only for inference. The first migration step would be towards a secondary instance of AlphaZero.jl, assuming that the resources needed for testing the quality of the agent may be greatly reduced compared with the resources needed to train it.
I have two related questions:
When running the example training session with the command:
julia --project --color=yes scripts/alphazero.jl --game connect-four train
I get the following error regarding CuDNN:
Using 1 distributed worker(s).
Initializing a new AlphaZero environment
Initial report
Number of network parameters: 1,667,912
Number of regularized network parameters: 1,667,776
Memory footprint per MCTS node: 326 bytes
Running benchmark: AlphaZero against MCTS (1000 rollouts)
AssertionError: This functionality is unavailabe as CUDNN is missing.
Stacktrace:
[1] macro expansion at /home/user/.julia/packages/CUDA/h38pe/deps/bindeps.jl:74 [inlined]
[2] macro expansion at /home/user/.julia/packages/CUDA/h38pe/src/initialization.jl:51 [inlined]
[3] libcudnn at /home/user/.julia/packages/CUDA/h38pe/deps/bindeps.jl:73 [inlined]
[4] (::CUDA.CUDNN.var"#19247#cache_fptr!#9")() at /home/user/.julia/packages/CUDA/h38pe/lib/utils/call.jl:31
[5] macro expansion at /home/user/.julia/packages/CUDA/h38pe/lib/utils/call.jl:39 [inlined]
[6] unsafe_cudnnCreate(::Base.RefValue{Ptr{Nothing}}) at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/libcudnn.jl:39
[7] macro expansion at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/base.jl:6 [inlined]
[8] macro expansion at /home/user/.julia/packages/CUDA/h38pe/src/memory.jl:312 [inlined]
[9] cudnnCreate() at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/base.jl:3
[10] #516 at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/CUDNN.jl:44 [inlined]
[11] get!(::CUDA.CUDNN.var"#516#519"{CUDA.CuContext}, ::IdDict{Any,Any}, ::Any) at ./iddict.jl:152
[12] handle() at /home/user/.julia/packages/CUDA/h38pe/lib/cudnn/CUDNN.jl:43
[13] forw(::Function, ::AutoGrad.Param{Knet.KnetArray{Float32,4}}, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Symbol,Tuple{Int64,Int64},Tuple{Symbol},NamedTuple{(:padding,),Tuple{Tuple{Int64,Int64}}}}) at /home/user/.julia/packages/AutoGrad/VFrAv/src/core.jl:66
[14] #conv4#356 at ./none:0 [inlined]
[15] (::AlphaZero.KNets.Conv)(::Knet.KnetArray{Float32,4}) at /home/user/AlphaZero.jl/src/networks/knet/layers.jl:60
[16] (::AlphaZero.KNets.Chain)(::Knet.KnetArray{Float32,4}) at /home/user/AlphaZero.jl/src/networks/knet/layers.jl:19
[17] forward(::ResNet{Game}, ::Knet.KnetArray{Float32,4}) at /home/user/AlphaZero.jl/src/networks/knet.jl:149
[18] evaluate(::ResNet{Game}, ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,2}) at /home/user/AlphaZero.jl/src/networks/network.jl:253
[19] evaluate_batch(::ResNet{Game}, ::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}) at /home/user/AlphaZero.jl/src/networks/network.jl:283
[20] fill_and_evaluate(::ResNet{Game}, ::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}; batch_size::Int64, fill::Bool) at /home/user/AlphaZero.jl/src/play.jl:346
[21] (::AlphaZero.var"#101#102"{Bool,ResNet{Game},Int64})(::Array{NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}},1}) at /home/user/AlphaZero.jl/src/play.jl:388
[22] macro expansion at /home/user/AlphaZero.jl/src/batchifier.jl:47 [inlined]
[23] macro expansion at /home/user/AlphaZero.jl/src/util.jl:56 [inlined]
[24] (::AlphaZero.Batchifier.var"#1#3"{AlphaZero.var"#101#102"{Bool,ResNet{Game},Int64},Int64,Channel{Any}})() at ./threadingconstructs.jl:169
The computer is running CentOS 7.8, with an RTX 2080 Ti (CUDA version 11).
Regarding the @unimplemented macro, you may want to consider the "Not Implemented Exceptions" note in this blog post: https://white.ucc.asn.au/2020/04/19/Julia-Antipatterns.html
Less code is easier to maintain 😃
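For context, the blog post's suggestion boils down to this: instead of a fallback method that throws a custom "unimplemented" error, declare the generic function with no fallback and let Julia raise a MethodError, which also lets callers introspect the interface with `hasmethod`. A sketch (`vectorize_state` here is a made-up interface function, not AlphaZero.jl's actual one):

```julia
abstract type AbstractGame end

# Interface declaration with no fallback method: calling it on a game
# that does not implement it raises an informative MethodError.
function vectorize_state end

struct MyGame <: AbstractGame end

# Callers can check whether a game type implements the interface:
implements_vectorize(G) = hasmethod(vectorize_state, Tuple{G})
```

A game opts in simply by defining `vectorize_state(::MyGame) = ...`, with no macro machinery needed.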
Hi Jonathan,
As I try to understand the core of AlphaZero.jl, I have a question about the input to the neural network. Looking at src/learning.jl, I believe the network receives batches of input, but I couldn't figure out what exactly is fed to the network, specifically the part data = (W, X, A, P, V) used as training input. Maybe you could tell me?
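One common reading of such a tuple in AlphaZero-style training, guessed from the field names and not confirmed against src/learning.jl, is: W = per-sample weights, X = vectorized board states, A = legal-action masks, P = policy targets from MCTS visit counts, V = value targets from game outcomes. A toy batch under that assumption (all dimensions illustrative) might look like:

```julia
# Hypothetical layout of one training batch, assuming the common
# AlphaZero convention; the actual meaning in AlphaZero.jl may differ.
num_actions, batch_size = 7, 4       # e.g. connect-four has 7 columns
state_dims = (7, 6, 2)               # board planes per player (illustrative)

W = ones(Float32, 1, batch_size)                  # per-sample weights
X = rand(Float32, state_dims..., batch_size)      # vectorized states
A = trues(num_actions, batch_size)                # legal-action masks
P = fill(Float32(1 / num_actions), num_actions, batch_size)  # policy targets
V = zeros(Float32, 1, batch_size)                 # value targets in [-1, 1]

data = (W, X, A, P, V)
```

Under this reading, the network consumes X (masked by A) and is trained so its policy head matches P and its value head matches V, with W weighting each sample's contribution to the loss.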
Hello!
Not an issue, but a question regarding GameInterface.symmetries (cf. the documentation):
symmetries(::Type{G}, state) where {G <: AbstractGame}
Return the vector of all pairs (s, σ) where:
- s is the image of state by a nonidentical symmetry
- σ is the associated actions permutation, as an integer vector of size num_actions(Game).
When applying to the game g, with state state1 and actions mask actions_mask1, the symmetry corresponding to a given (state2, σ) pair, I suppose AlphaZero.jl assigns state2 as the new state of g and determines the updated actions mask actions_mask2 according to one of the following propositions, but which one?
actions_mask2[action_index] == actions_mask1[σ(action_index)]
actions_mask2[σ(action_index)] == actions_mask1[action_index]
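To make the two propositions concrete in Julia indexing terms (this just illustrates the difference; it does not assert which convention AlphaZero.jl actually uses):

```julia
# σ is a permutation of action indices, given as an integer vector.
σ = [3, 1, 2]
actions_mask1 = [true, false, true]

# Proposition 1: actions_mask2[i] == actions_mask1[σ[i]],
# i.e. actions_mask2 is actions_mask1 gathered through σ.
mask2_prop1 = actions_mask1[σ]           # [true, true, false]

# Proposition 2: actions_mask2[σ[i]] == actions_mask1[i],
# i.e. actions_mask2 is actions_mask1 scattered through σ,
# equivalently gathered through the inverse permutation.
mask2_prop2 = actions_mask1[invperm(σ)]  # [false, true, true]
```

The two only coincide when σ is its own inverse, which is why the choice of convention matters for non-involutive symmetries such as rotations.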
In other words, how is the permutation supposed to be used?
(Does it really matter in the end? ;-)
Tried installing this on Ubuntu 20.04 with Julia 1.5.2, and got this error after calling julia --project -e "import Pkg; Pkg.instantiate()":
Error: Error building `Knet`:
│ In file included from /usr/local/cuda-10.1/bin/../targets/x86_64-linux/include/cuda_runtime.h:83,
│ from <command-line>:
│ /usr/local/cuda-10.1/bin/../targets/x86_64-linux/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
│ 138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
│ | ^~~~~
│ [ Info: cuda1.jl
│ [ Info: `/usr/local/cuda-10.1/bin/nvcc -O3 --use_fast_math -Wno-deprecated-gpu-targets --compiler-options '-O3 -Wall -fPIC' -c cuda1.cu`
│ ERROR: LoadError: failed process: Process(`/usr/local/cuda-10.1/bin/nvcc -O3 --use_fast_math -Wno-deprecated-gpu-targets --compiler-options '-O3 -Wall -fPIC' -c cuda1.cu`, ProcessExited(1)) [1]
│
│ Stacktrace:
│ [1] pipeline_error at ./process.jl:525 [inlined]
│ [2] run(::Cmd; wait::Bool) at ./process.jl:440
│ [3] run at ./process.jl:438 [inlined]
│ [4] inforun(::Cmd) at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:9
│ [5] build_nvcc() at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:75
│ [6] build() at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:87
│ [7] top-level scope at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:93
│ [8] include(::String) at ./client.jl:457
│ [9] top-level scope at none:5
│ in expression starting at /home/andriy/.julia/packages/Knet/bTNMd/deps/build.jl:93
Hey! I've nearly finished rigging up my game and am excited to start running it. I had some quick questions I wanted to run by you, just some sanity checks to make sure my understanding is correct, since if I'm wrong these will likely break my training.
State Vectorization
1.a) Why, in the examples, is this always done from White's side?
1.b) When is state vectorization executed? Each turn or at the end of a game?
1.c) Is state vectorization done for both black and white?
1.d) What should be included in state vectorization?
Other connect-four example questions
2.a) I'm pretty sure about this, but to confirm: I've included the logic for update_status! and update_actions_mask! in my play! function. Is this fine, or is that logic required to live in its own functions?
2.b) In connect-four there's a function GI.clone. I assume this is for something external, specific to connect-four?
Other
3.a) What exactly is an observation? Everything in the state of the game environment, or only what goes into the state vectorization?
Once again, really appreciate your help as well as other members of the community with their feedback. Thanks a ton for even making this in the first place it's amazing!