Comments (7)
Now your model plays to a draw with NTest:13. Good job!
No, my model couldn't draw with Ntest:13.
My record of (0, 10, 0) means (win, loss, draw).
It is confusing...
I noticed the design of share_mtcs_info_in_self_play. It shares MCTS info among different games played by the same model. This differs from the AlphaGo Zero/AlphaZero papers, but I imagine it could improve self-play quality a lot. How does it work out in practice?
I cannot measure the effect clearly, but I feel it helps, especially with a small sim_per_move (~100).
And how much memory does it use?
I don't think it increases memory very much.
An expanded node consumes about 200 bytes (64*3 + α) of memory.
When sim_per_move is 400, 400 * 200 B = 80 kB per move, so 60 moves/game * 80 kB ≈ 4.8 MB per game.
In practice, I can't observe the increase.
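The estimate above can be double-checked with quick arithmetic. Note this is an upper bound: at most one new node is expanded per simulation, and the tree is shared between moves, so fewer nodes are actually allocated.

```python
# Back-of-the-envelope check of the per-game memory estimate above.
node_bytes = 200            # ~64*3 bytes of stats plus overhead per expanded node
sims_per_move = 400         # at most one new node is expanded per simulation
moves_per_game = 60         # typical length of a Reversi game

per_move = sims_per_move * node_bytes   # 80_000 B = 80 kB per move
per_game = moves_per_game * per_move    # 4_800_000 B = 4.8 MB per game

print(per_move, per_game)  # 80000 4800000
```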
from reversi-alpha-zero.
Thanks for the explanation! @mokemokechicken
My record of (0, 10, 0) means (win, loss, draw).
I see. Sorry, I didn't read your description carefully. You do have that in the .md file; I had only checked the commit diff.
An expanded node consumes about 200 bytes (64*3 + α) of memory.
When sim_per_move is 400, 400 * 200 B = 80 kB per move, so 60 moves/game * 80 kB ≈ 4.8 MB per game.
Good analysis! And you clear that buffer every time a new model is loaded. Suppose you play 1000 games per model (at 7.2 seconds per game, that is 2 hours); then it is about 4.8 GB at most.
I feel it helps, especially with a small sim_per_move (~100)
Interesting. I was thinking it would also help with a big sim_per_move (e.g. 800), unless the NN-predicted node values are already quite accurate.
Another thought: maybe you could share that info across self-play processes. I imagine a simple (imperfect) way: when a game is done, compute the delta, send it to a "shared MCTS info manager", wait for the manager to apply this delta, then pull the new MCTS info for the next game.
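The delta-merging idea could be sketched like this (a hypothetical single-process illustration, not the repository's code; in practice the manager would sit behind a queue or a multiprocessing proxy):

```python
from collections import defaultdict

class SharedMctsInfoManager:
    """Hypothetical sketch: merges per-game MCTS deltas into shared stats.

    Keys are (state, move) pairs; values are the cumulative visit count N
    and total value W, as in the AlphaZero-style statistics."""

    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0.0])  # key -> [N, W]

    def apply_delta(self, delta):
        # delta: {key: (dN, dW)} computed by a worker after one finished game
        for key, (dn, dw) in delta.items():
            entry = self.stats[key]
            entry[0] += dn
            entry[1] += dw

    def snapshot(self):
        # Workers pull a copy of the shared info before starting the next game.
        return {k: tuple(v) for k, v in self.stats.items()}

# Usage: two workers report deltas that touch the same position.
manager = SharedMctsInfoManager()
manager.apply_delta({("state0", "c4"): (10, 6.0)})
manager.apply_delta({("state0", "c4"): (5, 2.0), ("state0", "d3"): (3, 1.5)})
print(manager.snapshot()[("state0", "c4")])  # (15, 8.0)
```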
Another thought: maybe you could share that info across self-play processes. I imagine a simple (imperfect) way: when a game is done, compute the delta, send it to a "shared MCTS info manager", wait for the manager to apply this delta, then pull the new MCTS info for the next game.
Yes, it is possible and interesting.
I was also concerned that sharing MCTS info could have bad effects; for example, it could reduce the contribution of each model update.
So your idea is better, because sharing MCTS info only among games of the same model avoids that problem.
By the way, I noticed that the current implementation is not enough.
The "value" of the tree search results is not propagated to the parent (and ancestor) nodes.
It is necessary to keep all nodes back to the root (initial state) and add the newly searched values (N and W) to them after simulations.
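Propagating the searched values back to every ancestor is the standard MCTS backup step; a minimal sketch (illustrative, not the repository's implementation) might look like this:

```python
class Node:
    def __init__(self):
        self.n = 0     # visit count N
        self.w = 0.0   # total value W

def backpropagate(path, leaf_value):
    # path: nodes from the root down to the evaluated leaf.
    # leaf_value: evaluation from the perspective of the player to move at the leaf.
    v = leaf_value
    for node in reversed(path):
        node.n += 1
        node.w += v
        v = -v  # players alternate, so the sign flips each ply up the tree

root, child, leaf = Node(), Node(), Node()
backpropagate([root, child, leaf], 1.0)
print(root.n, root.w, child.w)  # 1 1.0 -1.0
```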
It is necessary to keep all nodes back to the root (initial state) and add the newly searched values (N and W) to them after simulations.
If I understand correctly, won't that increase memory usage a lot? Well, it seems to depend on how fast self-play is compared to model updating...
If I understand correctly, won't that increase memory usage a lot? Well, it seems to depend on how fast self-play is compared to model updating...
It will not increase memory usage at all, because it only updates the N and W of the parent (and ancestor) nodes.
However, I have reconsidered: simply adding to N is not good, because almost the same moves would be selected in the next game.
I will try implementing the concept to make this clear and test it.
Doesn't sharing MCTS info discourage exploration? The initial values of n, w, and q are inherited from the shared MCTS info; the process/thread then runs simulations that add to these three quantities and decides its move based on them. But that final move decision seems to be heavily influenced by the shared info. If the exploration of the first turn is poorly performed, then the second turn's outcome becomes similar, and so does the n-th turn's. On the other hand, if enough variation is achieved while using the shared info, then the shared info is useful only for the first and possibly second turns. Thoughts?
As you pointed out, I think sharing MCTS info does discourage exploration.
The aim of sharing MCTS info is to encourage searching alternative moves at difficult positions.
A difficult position is one where several moves have almost the same (N, W).
By sharing the info, the losing side is encouraged to select another move in the next game, because it knows the previous move was bad and another looks good.
It is like a "post mortem" in chess, for example.
※ I fixed Black's first move to "C4" to make sharing effective.
I investigated some series of games: they always played different moves in the middle of the game, so my aim is achieved to a certain extent.
However, it is true that sharing discourages exploration.
The shared info is reset every N games (currently N=5).
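The "loser picks another move" effect can be illustrated with the standard PUCT selection rule, argmax over Q + U (a sketch with made-up numbers, not the repository's code): once the lost game's negative values are merged into the shared stats, the heavily-searched move's Q is poor and selection switches to the alternative.

```python
import math

def puct_select(children, c_puct=1.0):
    # children: move -> {"n": visit count, "w": total value, "p": prior probability}
    total_n = sum(ch["n"] for ch in children.values()) + 1

    def score(ch):
        q = ch["w"] / ch["n"] if ch["n"] else 0.0          # mean value Q = W/N
        u = c_puct * ch["p"] * math.sqrt(total_n) / (1 + ch["n"])  # exploration bonus U
        return q + u

    return max(children, key=lambda move: score(children[move]))

# Shared stats after a lost game: "d3" was searched heavily but has a poor Q,
# so the next game starts by preferring the alternative "c5".
children = {
    "d3": {"n": 300, "w": -120.0, "p": 0.5},
    "c5": {"n": 100, "w": 10.0, "p": 0.5},
}
print(puct_select(children))  # c5
```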