Comments (7)

Zeta36 commented on July 23, 2024

In chess, AlphaZero outperformed Stockfish after just 4 hours (300k steps)

Wow!!

from reversi-alpha-zero.

mokemokechicken commented on July 23, 2024

Hi @apollo-time

I think the main differences are as follows.

(See pages 3~4 of the paper.)

AlphaZero:

  • It does not augment the training data and does not transform the board position during MCTS (for generality).
  • The evaluation step is omitted; self-play is always performed with the newest model parameters. (!)
  • The hyper-parameters were not tuned by Bayesian optimization (past parameters are reused, except for the policy noise).

So, MCTS is also used without transforming the board position.

mokemokechicken commented on July 23, 2024

The rules of Go are invariant to rotation and reflection. This fact was exploited in AlphaGo
and AlphaGo Zero in two ways. First, training data was augmented by generating 8 symmetries
for each position. Second, during MCTS, board positions were transformed using a randomly
selected rotation or reflection before being evaluated by the neural network, so that the Monte-Carlo
evaluation is averaged over different biases.

Oh..., I didn't generate 8 symmetries for each position...
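
For reference, the "8 symmetries" augmentation quoted above could be sketched roughly like this. It is only a minimal illustration, not code from reversi-alpha-zero: the function names (`rotate90`, `mirror`, `eight_symmetries`) and the list-of-rows board layout are assumptions, and a real implementation would more likely use array operations. The policy is transformed together with the board so that each move probability stays attached to its square.

```python
def rotate90(grid):
    """Rotate a square grid (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def mirror(grid):
    """Reflect a grid left-to-right."""
    return [row[::-1] for row in grid]


def eight_symmetries(board, policy):
    """Return 8 (board, policy) pairs: 4 rotations, each with its mirror image.

    Both arguments are N x N lists of rows (a hypothetical layout). The same
    transform is applied to both so that the training target stays consistent.
    A non-board move such as "pass" would need separate handling.
    """
    pairs = []
    b = [list(row) for row in board]
    p = [list(row) for row in policy]
    for _ in range(4):
        pairs.append((b, p))
        pairs.append((mirror(b), mirror(p)))
        b, p = rotate90(b), rotate90(p)
    return pairs
```

Each self-play position would then contribute 8 training samples instead of 1. AlphaZero skips this step, so it only matters when following the AlphaGo Zero recipe.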

gooooloo commented on July 23, 2024

In reversi, would it be better for α to be 0.3 ~ 0.5?

Agreed. Say there are about 180 legal actions on average in 19x19 Go, and maybe around 10 in Reversi. Then, going by the new paper, 10 times 0.03 (= 0.3) seems more reasonable.
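
As a rough check of that arithmetic, here is a small sketch of the "inverse proportion" rule anchored at Go. The helper name `scaled_alpha` is made up for illustration, and the move counts (about 180 for Go, about 10 for Reversi) are just the estimates from this comment, not measured values.

```python
def scaled_alpha(avg_legal_moves, ref_alpha=0.03, ref_moves=180):
    """Scale Dirichlet alpha in inverse proportion to the number of legal moves.

    ref_alpha is the Go value quoted from the paper; ref_moves is the rough
    per-position move count assumed in this thread.
    """
    if avg_legal_moves <= 0:
        raise ValueError("avg_legal_moves must be positive")
    return ref_alpha * ref_moves / avg_legal_moves


# Strict inverse proportion with ~10 legal moves in Reversi gives about 0.54,
# while the "10 times 0.03" shortcut gives 0.3.
print(round(scaled_alpha(10), 2))
```

Since the paper only talks about the approximate number of legal moves, both numbers are ballpark figures; they happen to sit near the two ends of the 0.3 ~ 0.5 range suggested above.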

mokemokechicken commented on July 23, 2024

Dirichlet noise Dir(α) was added to the prior probabilities in the
root node; this was scaled in inverse proportion to the approximate number of legal moves in a
typical position, to a value of α = {0.3, 0.15, 0.03} for chess, shogi and Go respectively.

In reversi, would it be better for α to be 0.3 ~ 0.5?
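
To make the quoted passage concrete, a minimal sketch of adding Dirichlet noise to the root priors might look like the following. The mixing weight `epsilon = 0.25` is the value given in the AlphaGo Zero paper, not something stated in this thread, and `alpha = 0.3` is just the candidate value under discussion. The function names are hypothetical, and the Dirichlet sample is built from normalised gamma draws so that only the standard library is needed.

```python
import random


def sample_dirichlet(alpha, n, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over n outcomes.

    Uses the standard construction: normalise n independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def add_root_noise(priors, alpha=0.3, epsilon=0.25, rng=None):
    """Mix Dirichlet noise into the root priors: (1 - eps) * p + eps * noise.

    priors: probabilities over the legal moves at the root (should sum to 1).
    alpha and epsilon are assumptions for illustration, see the text above.
    """
    rng = rng or random.Random()
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


noisy = add_root_noise([0.5, 0.3, 0.2], rng=random.Random(0))
print(noisy, sum(noisy))
```

A smaller α concentrates the noise on a few moves, while a larger α spreads it more evenly, which is presumably why it is scaled with the typical number of legal moves.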

mokemokechicken commented on July 23, 2024

Illegal moves are masked out by
setting their probabilities to zero, and re-normalising the probabilities for remaining moves.

Re-normalising over the legal moves may be important because of the balance between value and policy.
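
The masking step in the quote could be sketched like this. The name `mask_and_renormalise` and the flat-list policy format are assumptions for illustration, and the uniform fallback is my own choice for the corner case where the network puts zero mass on every legal move; the paper does not describe that case.

```python
def mask_and_renormalise(policy, legal_moves):
    """Zero out illegal moves and rescale the rest so they sum to 1.

    policy: list of probabilities indexed by move.
    legal_moves: iterable of indices of the legal moves.
    """
    legal = set(legal_moves)
    if not legal:
        raise ValueError("no legal moves to normalise over")
    masked = [p if i in legal else 0.0 for i, p in enumerate(policy)]
    total = sum(masked)
    if total <= 0.0:
        # Assumed fallback: spread the probability uniformly over legal moves.
        return [1.0 / len(legal) if i in legal else 0.0 for i in range(len(policy))]
    return [p / total for p in masked]


print(mask_and_renormalise([0.1, 0.2, 0.3, 0.4], [1, 3]))
```

One way to read the point about balance: without re-normalising, the priors of the legal moves sum to less than 1, so the policy-driven exploration term in the MCTS selection rule becomes smaller relative to the value term.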

apollo-time commented on July 23, 2024

What is the main difference between AlphaGo Zero and AlphaZero?
Is the MCTS architecture the same?
