
Comments (4)

kblomdahl commented on June 14, 2024

The 40 layer tower that DeepMind used had 11,895,775 weights and a regularization coefficient of 1e-4, while our 20 layer tower has 5,997,535 weights so our regularization coefficient should be approximately 2e-4.

0.0001 * (11,895,775 / 5,997,535) = 0.000198344

40-layer tower weights:

3 * 3 * 34 * 128
+ 39 * (2 * 3 * 3 * 128 * 128)
+ 128 * 1 + 361 * 256 + 256 + 256 + 1
+ 128 * 2 + 722 * 362 + 362
= 11895775

20-layer tower weights:

3 * 3 * 34 * 128
+ 19 * (2 * 3 * 3 * 128 * 128)
+ 128 * 1 + 361 * 256 + 256 + 256 + 1
+ 128 * 2 + 722 * 362 + 362
= 5997535
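The two sums above can be checked with a short script. This is a sketch that reproduces the arithmetic in this comment (34 input planes, 128 filters, AlphaGo Zero-style value and policy heads, one residual block fewer than the nominal layer count); the function name is illustrative.

```python
def tower_weights(blocks):
    """Approximate weight count for a residual tower with `blocks` residual blocks."""
    input_conv = 3 * 3 * 34 * 128                # 3x3 conv, 34 planes -> 128 filters
    residual = blocks * (2 * 3 * 3 * 128 * 128)  # two 3x3 convs per block
    value_head = 128 * 1 + 361 * 256 + 256 + 256 + 1
    policy_head = 128 * 2 + 722 * 362 + 362
    return input_conv + residual + value_head + policy_head

w40 = tower_weights(39)  # the "40 layer" tower has 39 residual blocks
w20 = tower_weights(19)  # the "20 layer" tower has 19 residual blocks
print(w40, w20)              # 11895775 5997535
print(1e-4 * w40 / w20)      # ~1.98e-4, the scaled regularization coefficient
```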

from dream-go.

kblomdahl commented on June 14, 2024

Instead of treating DeepMind as an oracle, we might also wish to play with the regularization coefficient a bit ourselves since we can set it higher to avoid overfitting. This is especially useful because we are severely lacking in data compared to other similar projects.

I suggest training a 20 layer tower with the following coefficients on human games and observing their tournament and testing performance.

  • 1e-4 This is what the current network is trained with.
  • 2e-4 As suggested by the previous post.
  • 1e-3 To represent an extreme coefficient.

The results of the training procedure for these networks can be found in Table 1, and they suggest that 2e-4 is the "sweet spot": the coefficient doubles compared to 1e-4, while the accuracy changes only slightly. 1e-3 does not seem to be a viable option, since the weights are too constrained to properly exploit the structure of the data.

Table 1: Accuracy of each model on a separate set of professional games.
Coefficient   Steps     Value   Policy Top 1   Policy Top 3   Policy Top 5
1e-4          355,717   73.1%   49.2%          73.8%          83.4%
2e-4          95,295    68.4%   48.2%          73.6%          81.8%
1e-3          24,257    58.1%   38.4%          61.8%          71.3%
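For context, a coefficient like 2e-4 enters training as an L2 penalty added to the loss, i.e. loss = data_loss + c * sum(w²). A minimal sketch of that term, with illustrative names rather than anything from the dream-go code base:

```python
def l2_penalty(weights, coefficient):
    """L2 regularization term: coefficient * sum of squared weights."""
    return coefficient * sum(w * w for w in weights)

# A larger coefficient penalizes the same weights more heavily,
# pulling them toward zero and constraining the model.
weights = [0.5, -0.25, 1.0]
print(l2_penalty(weights, 1e-4))  # 1.3125e-4
print(l2_penalty(weights, 2e-4))  # 2.625e-4
```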

Tournament

Each cell indicates ROW vs COL.

         1e-4    2e-4    1e-3
1e-4     -       8 - 2   9 - 1
2e-4     -       -       6 - 3
1e-3     -       -       -


kblomdahl commented on June 14, 2024

I suggest the learning rate schedule in Table 1, based on empirical data gathered from training the 8 layer tower and previous 20 layer towers. This schedule is much steeper than the one suggested by DeepMind; the reason is that we are now using the Adam optimizer, which typically converges much faster than a pure momentum optimizer, so we need fewer steps.

This schedule takes about 22 hours to fully train on my computer with two GTX 1080 Ti's.

Table 1: Learning rate schedule

Step     Learning Rate
0        3e-3
12,000   1e-3
27,000   3e-4
45,000   1e-4
66,000   3e-5
90,000   1e-5
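The schedule above is piecewise constant in the global step. A minimal sketch, assuming the rate switches exactly at the listed step boundaries:

```python
def learning_rate(step):
    """Return the learning rate for a given global step, per the table above."""
    schedule = [(90_000, 1e-5), (66_000, 3e-5), (45_000, 1e-4),
                (27_000, 3e-4), (12_000, 1e-3), (0, 3e-3)]
    # Walk the boundaries from highest to lowest and return the first match.
    for boundary, rate in schedule:
        if step >= boundary:
            return rate

print(learning_rate(0))        # 3e-3
print(learning_rate(30_000))   # 3e-4
print(learning_rate(100_000))  # 1e-5
```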


kblomdahl commented on June 14, 2024

The tournament results and testing accuracy both suggest a hierarchy where 1e-4 > 2e-4 > 1e-3. It is unclear whether, given an equal number of training steps, the same result would occur.

In the meantime I suggest we keep the currently trained network as the state of the art, but keep the larger regularization coefficient (2e-4) for any future networks trained on self-play. The reasoning is that self-play training tends to overfit the weights, and the loss in playing strength from the larger coefficient is not too large (even if it is non-trivial).

I would, however, recommend that when training on human data we decrease the regularization coefficient to 1e-4.

