Comments (4)
The 40-layer tower that DeepMind used had 11,895,775 weights and a regularization coefficient of 1e-4, while our 20-layer tower has 5,997,535 weights, so our regularization coefficient should be approximately 2e-4.
0.0001 * (11,895,775 / 5,997,535) = 0.000198344
```
40-layer tower weights:
    3 * 3 * 34 * 128
  + 39 * (2 * 3 * 3 * 128 * 128)
  + 128 * 1 + 361 * 256 + 256 + 256 + 1
  + 128 * 2 + 722 * 362 + 362
  = 11,895,775

20-layer tower weights:
    3 * 3 * 34 * 128
  + 19 * (2 * 3 * 3 * 128 * 128)
  + 128 * 1 + 361 * 256 + 256 + 256 + 1
  + 128 * 2 + 722 * 362 + 362
  = 5,997,535
```
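The arithmetic above can be checked with a small script. This is a sketch of the same breakdown, not dream-go's actual code; the function name and the per-term labels (which follow the AlphaGo Zero head structure) are mine:

```python
def tower_weights(layers: int) -> int:
    """Parameter count for a residual tower as broken down above.

    The four terms mirror the arithmetic in the comment: the input
    convolution, the residual blocks (`layers - 1` blocks of two
    3x3x128x128 convolutions each), the value head, and the policy head.
    """
    input_conv = 3 * 3 * 34 * 128
    residual_blocks = (layers - 1) * (2 * 3 * 3 * 128 * 128)
    value_head = 128 * 1 + 361 * 256 + 256 + 256 + 1
    policy_head = 128 * 2 + 722 * 362 + 362
    return input_conv + residual_blocks + value_head + policy_head

assert tower_weights(40) == 11_895_775
assert tower_weights(20) == 5_997_535

# Scale the regularization coefficient by the ratio of weight counts.
scaled = 1e-4 * tower_weights(40) / tower_weights(20)
print(round(scaled, 9))  # 0.000198344
```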
from dream-go.
Instead of treating DeepMind as an oracle, we might also wish to experiment with the regularization coefficient ourselves, since setting it higher helps avoid overfitting. This is especially useful because we are severely lacking in data compared to other similar projects.
I suggest training a 20-layer tower with the following coefficients on human games and observing their tournament and testing performance.
- 1e-4: what the current network is trained with.
- 2e-4: as suggested by the previous post.
- 1e-3: to represent an extreme coefficient.
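To make the role of the coefficient concrete, here is a minimal sketch of how an L2 penalty scales with it; the function, the tiny example weights, and the structure are illustrative only, not dream-go's training code:

```python
def l2_penalty(weight_tensors, coefficient):
    """coefficient * sum of squared weights: the penalty term added to
    the training loss. A larger coefficient pulls the weights harder
    towards zero, trading some fitting capacity for less overfitting."""
    return coefficient * sum(w * w for tensor in weight_tensors for w in tensor)

# Two small illustrative "tensors" (flattened lists of weights).
weights = [[0.5, -1.0, 0.25], [2.0, -0.5]]

penalty_small = l2_penalty(weights, 1e-4)
penalty_large = l2_penalty(weights, 1e-3)
# The 10x larger coefficient gives a 10x larger penalty.
assert abs(penalty_large - 10 * penalty_small) < 1e-12
```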
The results of the training procedure for these networks can be found in Table 1, and they seem to suggest that 2e-4 is the "sweet spot", since the change in the coefficient when compared to 1e-4 is much larger than the change in the accuracy. 1e-3 does not seem to be a valid option, since the weights are too constrained to properly exploit the structure.
Table 1: Accuracy of each model on a separate set of professional games.

| Coefficient | Steps | Value | Policy (Top 1) | Policy (Top 3) | Policy (Top 5) |
|---|---|---|---|---|---|
| 1e-4 | 355,717 | 73.1% | 49.2% | 73.8% | 83.4% |
| 2e-4 | 95,295 | 68.4% | 48.2% | 73.6% | 81.8% |
| 1e-3 | 24,257 | 58.1% | 38.4% | 61.8% | 71.3% |
Tournament

Each cell indicates ROW vs COL.

| - | 1e-4 | 2e-4 | 1e-3 |
|---|---|---|---|
| 1e-4 | - | 8 - 2 | 9 - 1 |
| 2e-4 | - | - | 6 - 3 |
| 1e-3 | - | - | - |
from dream-go.
I suggest the learning schedule in Table 1 based on empirical data gathered from training the 8-layer tower and previous 20-layer towers. This schedule is a lot steeper than the one suggested by DeepMind; the reason is that we are now using the Adam optimizer, which typically converges a lot faster than a pure momentum optimizer, so we need fewer steps.
This schedule takes about 22 hours to fully train on my computer with two GTX 1080 Ti's.
Table 1: Learning rate schedule.

| Step | Learning Rate |
|---|---|
| 0 | 3e-3 |
| 12,000 | 1e-3 |
| 27,000 | 3e-4 |
| 45,000 | 1e-4 |
| 66,000 | 3e-5 |
| 90,000 | 1e-5 |
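A piecewise-constant schedule like the one in Table 1 can be sketched as a plain function of the global step. The boundaries and rates come from the table; the function itself is illustrative, not dream-go's actual training code:

```python
# Step boundaries from Table 1, and the rate in effect before each one;
# RATES has one extra entry for steps at or beyond the last boundary.
BOUNDARIES = [12_000, 27_000, 45_000, 66_000, 90_000]
RATES = [3e-3, 1e-3, 3e-4, 1e-4, 3e-5, 1e-5]

def learning_rate(step: int) -> float:
    """Return the learning rate in effect at a given global step."""
    for boundary, rate in zip(BOUNDARIES, RATES):
        if step < boundary:
            return rate
    return RATES[-1]

assert learning_rate(0) == 3e-3
assert learning_rate(12_000) == 1e-3   # new rate takes effect at the boundary
assert learning_rate(100_000) == 1e-5
```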
from dream-go.
The tournament results and testing accuracy seem to agree on a hierarchy where 1e-4 > 2e-4 > 1e-3. It is unclear whether the same result would occur given an equal number of training steps.
In the meantime I suggest we keep the currently trained network as the state of the art, but keep the larger regularization coefficient (2e-4) for any future networks trained on self-play. The reasoning is that self-play weights are otherwise prone to overfitting, and the loss in playing strength from the larger coefficient is not too large (even if it is non-trivial). I would, however, recommend that when training on human data we decrease the regularization coefficient to 1e-4.
from dream-go.