Comments (7)
In chess, AlphaZero outperformed Stockfish after just 4 hours (300k steps)
Wow!!
from reversi-alpha-zero.
Hi @apollo-time
I think the main differences are as follows.
Pages 3-4 of the paper:
AlphaZero:
- AlphaZero does not augment the training data and does not transform the board position during MCTS (for generality).
- The evaluation step is omitted. Self-play always uses the newest model parameters. (!)
- Hyper-parameters were not tuned by Bayesian optimization (past parameters are reused, except for the policy noise).
So, MCTS is also used without transforming the board position.
The rules of Go are invariant to rotation and reflection. This fact was exploited in AlphaGo
and AlphaGo Zero in two ways. First, training data was augmented by generating 8 symmetries
for each position. Second, during MCTS, board positions were transformed using a randomly
selected rotation or reflection before being evaluated by the neural network, so that the Monte-Carlo
evaluation is averaged over different biases.
Oh..., I didn't generate 8 symmetries for each position...
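For reference, the 8-symmetry augmentation described in the quote could be sketched roughly as below. This is only an illustration, not code from reversi-alpha-zero: the names `rotate90`, `flip_lr`, `eight_symmetries` and `augment` are hypothetical, boards are plain nested lists, and the policy target is assumed to be a square grid of the same size as the board (a separate "pass" entry, if any, would be left untouched).

```python
from typing import List, Tuple

Grid = List[List[float]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_lr(grid: Grid) -> Grid:
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


def eight_symmetries(grid: Grid) -> List[Grid]:
    """Return the 4 rotations of the grid, each followed by its mirror image."""
    out = []
    current = [list(row) for row in grid]
    for _ in range(4):
        out.append(current)
        out.append(flip_lr(current))
        current = rotate90(current)
    return out


def augment(board: Grid, policy: Grid) -> List[Tuple[Grid, Grid]]:
    """Pair each transformed board with the identically transformed policy.

    The value target is invariant under these transforms, so it can be
    copied unchanged onto all 8 samples.
    """
    return list(zip(eight_symmetries(board), eight_symmetries(policy)))
```

The same transform has to be applied to the board and to its policy target, otherwise the move probabilities no longer point at the right squares.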
In Reversi, would an α of 0.3 to 0.5 be better?
Agreed. Say there are about 180 legal actions on average in 19x19 Go, and perhaps around 10 in Reversi. Following the new paper, 10 times 0.03 (α ≈ 0.3) seems more reasonable.
Dirichlet noise Dir(α) was added to the prior probabilities in the
root node; this was scaled in inverse proportion to the approximate number of legal moves in a
typical position, to a value of α = {0.3, 0.15, 0.03} for chess, shogi and Go respectively.
In Reversi, would an α of 0.3 to 0.5 be better?
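To make the question concrete, here is a rough sketch of adding Dir(α) noise to the root priors. It is not taken from reversi-alpha-zero: `sample_dirichlet` and `add_root_noise` are hypothetical names, the default α = 0.3 is just the lower end of the range asked about above, and the mixing weight ε = 0.25 is the value reported for AlphaGo Zero, which is not stated in this thread. The priors are assumed to cover legal moves only.

```python
import random
from typing import List, Optional


def sample_dirichlet(alpha: float, n: int, rng: random.Random) -> List[float]:
    """Sample a symmetric Dirichlet(alpha) over n outcomes from normalised Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:
        # Assumption: if every draw underflows to 0 (possible for tiny alpha),
        # fall back to a uniform vector rather than dividing by zero.
        return [1.0 / n] * n
    return [d / total for d in draws]


def add_root_noise(
    priors: List[float],
    alpha: float = 0.3,      # assumed value for Reversi, see the question above
    epsilon: float = 0.25,   # assumed mixing weight, not given in this thread
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Mix Dirichlet noise into the priors of the legal moves at the root node."""
    rng = rng or random.Random()
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]
```

A smaller α concentrates the noise on fewer moves, which is why the paper uses smaller values for games with more legal moves per position.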
Illegal moves are masked out by
setting their probabilities to zero, and re-normalising the probabilities for remaining moves.
Re-normalising over the legal moves may be important for the balance between value and policy.
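The masking step quoted above might look like the following minimal sketch. The function name `mask_and_renormalise` is hypothetical, and the uniform fallback for the case where the network puts zero mass on every legal move is an assumption of this example, not something the paper or this thread specifies.

```python
from typing import List


def mask_and_renormalise(policy: List[float], legal: List[bool]) -> List[float]:
    """Zero the probabilities of illegal moves and rescale the rest to sum to 1."""
    masked = [p if ok else 0.0 for p, ok in zip(policy, legal)]
    total = sum(masked)
    if total <= 0.0:
        # Assumption: no probability mass on any legal move, so use a uniform
        # distribution over the legal moves instead of dividing by zero.
        n_legal = sum(1 for ok in legal if ok)
        if n_legal == 0:
            raise ValueError("no legal moves to normalise over")
        return [1.0 / n_legal if ok else 0.0 for ok in legal]
    return [p / total for p in masked]
```

For example, a raw policy of `[0.5, 0.25, 0.25]` with the middle move illegal becomes roughly `[0.667, 0.0, 0.333]`.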
What is the main difference between AlphaGo Zero and AlphaZero?
Is the MCTS architecture the same?