Comments (8)
Why not use a discount rate γ?
It is a difficult question.
Conversely, I think the reasons to use a discount rate are:
- Distant steps have a weaker causal relationship with the outcome.
- In general, it is better to get rewards early.
By that reasoning, the reasons not to use a discount rate are:
- In perfect-information games like Go and Reversi, every move is related to the final reward.
- Winning the game quickly has little value in itself.
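To make the trade-off concrete, here is a minimal sketch of how a leaf value could be credited to the nodes on a search path with and without a discount. The function name `backup_values` and its parameters are hypothetical and are not taken from reversi-alpha-zero; with `gamma=1.0` it behaves like the undiscounted backup discussed in this thread.

```python
def backup_values(leaf_value, path_length, gamma=1.0):
    """Return the value credited to each node on the path, root first.

    leaf_value is seen from the player to move at the leaf. The sign
    flips at every ply because the two players alternate, and each step
    away from the leaf is additionally scaled by gamma.
    """
    values = []
    v = leaf_value
    for _ in range(path_length):
        values.append(v)
        v = -gamma * v  # one ply closer to the root
    values.reverse()  # leaf-first order becomes root-first order
    return values


undiscounted = backup_values(1.0, 3)           # gamma = 1: full credit at every depth
discounted = backup_values(1.0, 3, gamma=0.5)  # credit shrinks towards the root
```

With γ < 1 the nodes near the root receive a weaker signal, which matches the intuition that distant steps are less causally related, but it also rewards quick wins, which has little meaning in Reversi.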
from reversi-alpha-zero.
Just for your reference, I share the search tree between the two players; see the code here: https://github.com/gooooloo/alpha-zero-in-python/blob/master/src/reversi_zero/agent/player.py
But I don't think this makes a big difference. Many other settings are much more important, such as the number of simulations, the resignation threshold, the performance trade-off between the self/opt/eval modules, etc.
Hi @apollo-time
- Why isn't Q updated from N and W at this point?
- Shouldn't it be W = W + virtual loss when the player is white?
Thank you, very good point!
That is a serious bug in the virtual loss handling (the virtual loss on W had no effect).
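As an illustration of the sign issue, here is a minimal sketch. It assumes W is stored from black's perspective, so a virtual loss lowers W when black is to move and raises it when white is to move. The names `EdgeStats`, `apply_virtual_loss`, `revert_and_backup`, and the value of `VIRTUAL_LOSS` are hypothetical and are not the repository's actual code.

```python
VIRTUAL_LOSS = 3  # hypothetical value, chosen only for illustration


class EdgeStats:
    """Visit count N and total value W for one (state, action) edge."""

    def __init__(self):
        self.n = 0
        self.w = 0.0

    @property
    def q(self):
        # Q is derived from W and N on demand, so it never goes stale.
        return self.w / self.n if self.n else 0.0


def apply_virtual_loss(edge, black_to_move):
    """Make the edge look worse for the player to move while it is in flight."""
    sign = 1 if black_to_move else -1
    edge.n += VIRTUAL_LOSS
    edge.w -= sign * VIRTUAL_LOSS  # assumption: W is from black's perspective


def revert_and_backup(edge, black_to_move, value_for_black):
    """Undo the virtual loss and record the real evaluation as one visit."""
    sign = 1 if black_to_move else -1
    edge.n += 1 - VIRTUAL_LOSS
    edge.w += sign * VIRTUAL_LOSS + value_for_black
```

Under this assumption, applying the same `W -= virtual loss` for both colors would make an in-flight edge look more attractive to white, so other threads would be pulled towards it instead of away from it.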
- Why isn't the tree shared between the two players?
Because if the black and white models are different, the MCTS results are also different.
reversi-alpha-zero/src/reversi_zero/worker/self_play.py
Lines 63 to 64 in 527ce6c
I see that the two players use the same model in self-play mode.
Yes, that's right.
Although it is a little difficult to implement, sharing tree search results may be useful for saving computation cost.
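One way this could look, as a rough sketch: both players hold a reference to the same statistics object, so visits recorded by one are visible to the other. This only makes sense when both players use the same model, as in self-play. The classes `SharedSearchStats` and `Player` below are hypothetical and are not the repository's classes.

```python
from collections import defaultdict


class SharedSearchStats:
    """Visit counts and total values keyed by (position, action)."""

    def __init__(self):
        self.n = defaultdict(int)
        self.w = defaultdict(float)

    def record(self, position, action, value):
        self.n[(position, action)] += 1
        self.w[(position, action)] += value

    def visits(self, position, action):
        # .get avoids creating an entry for edges that were never visited
        return self.n.get((position, action), 0)


class Player:
    """Hypothetical player that searches using a stats object."""

    def __init__(self, stats=None):
        # Passing the same object to both players shares the search tree.
        self.stats = stats if stats is not None else SharedSearchStats()


shared = SharedSearchStats()
black, white = Player(shared), Player(shared)
black.stats.record("some-position", 19, 0.5)
```

Here `white.stats.visits("some-position", 19)` sees the visit that black recorded, so white does not have to repeat that simulation.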
I see that DeepMind backs up the reward to the parent nodes without modification.
Why not use a discount rate γ?
But I think that when the game is long, the first step is not as closely related to the final result as the final step is.
In Reversi there is effectively only one kind of first move, so it does not matter there, but in Go and chess the first move could turn out to be a bad move.