The networks were trained on about 250,000 professional Foxy games. These are the final validation scores, measured on 10,000 held-out professional games (all from before AlphaGo, so we may have to pick a different dataset later):
Depth | Width | Policy (top-1) | Policy (top-3) | Policy (top-5) | Value |
---|---|---|---|---|---|
9 | 128 | 0.4857 | 0.7269 | 0.8161 | 0.6878 |
9 | 256 | 0.5018 | 0.7456 | 0.8327 | 0.6977 |
16 | 192 | 0.5032 | 0.7434 | 0.8325 | 0.6985 |
23 | 160 | 0.5039 | 0.7409 | 0.8261 | 0.6943 |
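For reference, the policy columns are top-k accuracies: the fraction of positions where the professional move is among the network's k highest-scoring moves. A minimal sketch of that metric (the array shapes and the toy data are illustrative assumptions, not the actual evaluation code):

```python
import numpy as np

def top_k_accuracy(policy_logits, true_moves, k):
    """Fraction of positions whose true move is among the k
    highest-scoring moves predicted by the network.

    policy_logits: (n_positions, n_moves) scores over all moves
    true_moves:    (n_positions,) index of the professional move
    """
    # indices of the k best-scoring moves for every position
    top_k = np.argsort(-policy_logits, axis=1)[:, :k]
    hits = (top_k == true_moves[:, None]).any(axis=1)
    return hits.mean()

# toy example with 3 positions and 4 candidate moves
logits = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.4, 0.3, 0.2, 0.1],
                   [0.2, 0.2, 0.5, 0.1]])
moves = np.array([1, 2, 2])          # true move indices
print(top_k_accuracy(logits, moves, 1))  # 2 of 3 correct at top-1
```

For a 19x19 board the real move space would be 362 entries (361 intersections plus pass).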
Tournament
The results for 9x256, 16x192, and 23x160 look very similar. This could be due to a few different reasons:
- Limitations in the training data.
- Limitations in the test data.
- The models are effectively equivalent.
Play testing of the networks may reveal more interesting details. The play test will be set up such that:
- Each engine runs with the same version of the code (`7cf8e0`).
- Each engine is allowed one second of thinking time per move.
```
dg-16x192 v dg-9x128 (37/200 games)
unknown results: 1    2.70%
board size: 19    komi: 7.5
             wins          black         white        avg cpu
dg-16x192    28 75.68%     11 57.89%     17 94.44%     865.58
dg-9x128      8 21.62%      1  5.56%      7 36.84%     881.52
                           12 32.43%     24 64.86%
```

```
dg-16x192 v dg-23x160 (37/200 games)
unknown results: 1    2.70%
board size: 19    komi: 7.5
             wins          black         white        avg cpu
dg-16x192    23 62.16%     10 52.63%     13 72.22%     708.71
dg-23x160    13 35.14%      4 22.22%      9 47.37%     683.93
                           14 37.84%     22 59.46%
```

```
dg-16x192 v dg-9x256 (37/200 games)
unknown results: 3    8.11%
board size: 19    komi: 7.5
             wins          black         white        avg cpu
dg-16x192    18 48.65%      8 42.11%     10 55.56%     838.37
dg-9x256     16 43.24%      8 44.44%      8 42.11%     858.27
                           16 43.24%     18 48.65%
```

```
dg-9x128 v dg-23x160 (36/200 games)
unknown results: 5   13.89%
board size: 19    komi: 7.5
             wins          black         white        avg cpu
dg-9x128     10 27.78%      5 27.78%      5 27.78%     855.59
dg-23x160    21 58.33%     10 55.56%     11 61.11%     779.41
                           15 41.67%     16 44.44%
```

```
dg-9x128 v dg-9x256 (36/200 games)
unknown results: 2    5.56%
board size: 19    komi: 7.5
             wins          black         white        avg cpu
dg-9x128      7 19.44%      1  5.56%      6 33.33%     849.99
dg-9x256     27 75.00%     11 61.11%     16 88.89%     868.63
                           12 33.33%     22 61.11%
```

```
dg-23x160 v dg-9x256 (36/200 games)
board size: 19    komi: 7.5
             wins          black         white        avg cpu
dg-23x160    15 41.67%      8 44.44%      7 38.89%     820.20
dg-9x256     21 58.33%     11 61.11%     10 55.56%     787.45
                           19 52.78%     17 47.22%
```
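With only ~36 completed games per pairing, the win rates above carry substantial sampling error, which is worth keeping in mind before reading too much into small gaps. A rough normal-approximation confidence interval (a sketch, not part of the tournament tooling):

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """95% normal-approximation confidence interval for a win rate."""
    p = wins / games
    half = z * math.sqrt(p * (1.0 - p) / games)
    return p - half, p + half

# e.g. dg-16x192 vs dg-9x256 ended 18-16 over 34 decided games
lo, hi = win_rate_interval(18, 34)
print(f"{lo:.2f} .. {hi:.2f}")  # roughly 0.36 .. 0.70
```

An interval that wide easily covers 50%, so that pairing alone cannot distinguish the two networks.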
Elo
```
dg-9x128:0.6.3      0.00
dg-23x160:0.6.3   138.14
dg-9x256:0.6.3    210.86
dg-16x192:0.6.3   229.86
```
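These ratings are consistent with the pairwise results: under the standard logistic Elo model, a win probability p corresponds to a rating difference of 400·log10(p/(1−p)). A quick sanity check, ignoring the unknown results:

```python
import math

def elo_diff(p):
    """Rating difference implied by win probability p (logistic Elo model)."""
    return 400.0 * math.log10(p / (1.0 - p))

# dg-16x192 beat dg-9x128 28-8 in the decided games of their pairing
print(round(elo_diff(28 / 36)))  # 218, in the same ballpark as the 229.86 gap
```

The fitted ratings differ slightly from any single pairing because they pool evidence across all six matches.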
Performance
All times are in nanoseconds, as reported by the `bench batch_size` command. As expected, the deeper models are more expensive to compute in practice, despite having roughly the same FLOPs, since they incur more cuDNN kernel-launch overhead:
Depth | Width | Batch Size (1) | Batch Size (4) | Batch Size (8) | Batch Size (16) | Batch Size (32) | Batch Size (256) |
---|---|---|---|---|---|---|---|
9 | 128 | 896,035 | 730,125 | 750,913 | 849,179 | 1,406,923 | 9,292,403 |
9 | 256 | 1,098,342 | 1,210,698 | 1,327,191 | 2,256,043 | 3,714,270 | 27,456,126 |
16 | 192 | 1,449,075 | 1,602,011 | 1,677,403 | 2,856,924 | 4,679,059 | 34,597,654 |
23 | 160 | 2,221,460 | 2,405,039 | 2,571,189 | 4,217,678 | 6,849,057 | 48,325,717 |
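Since the `bench` times are per batch, throughput in positions per second is simply batch_size divided by the latency in seconds. A quick conversion using two entries from the table above:

```python
def positions_per_second(batch_size, time_ns):
    """Convert a per-batch latency in nanoseconds to throughput."""
    return batch_size / (time_ns * 1e-9)

# 16x192 at the two extremes of the benchmark table
print(round(positions_per_second(1, 1_449_075)))     # 690 positions/s
print(round(positions_per_second(256, 34_597_654)))  # 7399 positions/s
```

The roughly 10x gain from batching is why good GPU utilization during search matters so much.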
from dream-go.
Evaluation
At the end of the day, the only metric that matters is the playing strength of the final network, and based on the evidence provided in the previous post I suggest we use the architecture with the highest Elo:
16x192
Discussion
Some interesting observations one can make based on the data above:
- The test accuracy of the trained neural network is not a good indicator of playing strength:
  - Is this due to a bad test dataset? I should probably seek out a new one in the future.
- The runtime performance does not scale very well with deep architectures, instead favoring wide architectures:
  - This follows naturally from how matrix multiplication is implemented on the GPU.
  - It also follows from the fact that each CUDA kernel launch has an associated overhead.
- The playing strength seems to favor deep architectures over wide architectures:
  - This may not be obvious, but consider that we used a time limit during play and that deep architectures are slower. This means the deep architectures performed, on average, fewer rollouts than the wide architectures. Yet their playing strength is about the same, so the deep architectures must have a higher strength per rollout.
  - This also matches computer go community common sense, where it is believed that a deep network is necessary to correctly read capturing races and life & death.
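The strength-per-rollout argument can be made concrete with the batch-1 latencies from the Performance table: in a fixed one-second budget each architecture completes roughly 1s / latency evaluations, so equal playing strength at fewer evaluations implies more value per evaluation. A rough illustration (this ignores batching and tree-search overhead, so it is only an order-of-magnitude sketch):

```python
# batch-1 latencies in nanoseconds, from the Performance table
latency_ns = {
    "9x128":  896_035,
    "9x256":  1_098_342,
    "16x192": 1_449_075,
    "23x160": 2_221_460,
}

budget_ns = 1_000_000_000  # one second of thinking time per move
for name, t in latency_ns.items():
    evals = budget_ns // t
    # relative to the fastest (shallowest) network
    rel = evals / (budget_ns // latency_ns["9x128"])
    print(f"{name}: ~{evals} evals/move ({rel:.0%} of 9x128)")
```

For example, 16x192 gets only about 62% as many evaluations per move as 9x128, yet it won their head-to-head match decisively.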