Comments (7)
Now your model plays to a draw with NTest:13. Good job!
No, my model couldn't draw with Ntest:13.
My record of (0, 10, 0) means (win, loss, draw).
It is confusing...
I noticed the design of share_mtcs_info_in_self_play. It shares MCTS info among different games played by the same model. This differs from the AlphaGo Zero/AlphaZero papers, but I imagine it could improve self-play quality a lot. How does it work out in practice?
I cannot measure the effect clearly, but I feel it helps, especially with a small sim_per_move (~100).
And how much memory does it use?
I don't think it increases memory very much.
An expanded node consumes about 200 bytes (64*3 + α) of memory.
When sim_per_move is 400, 400 * 200 B = 80 kB per move, so 60 moves/game * 80 kB ≈ 4.8 MB per game.
In practice, I can't observe the increase.
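The estimate above can be double-checked with quick arithmetic. Note this is an upper bound: at most one new node is expanded per simulation, and the tree is shared between moves, so fewer nodes are actually allocated.

```python
# Back-of-the-envelope check of the per-game memory estimate above.
node_bytes = 200            # ~64*3 bytes of stats plus overhead per expanded node
sims_per_move = 400         # at most one new node is expanded per simulation
moves_per_game = 60         # typical length of a Reversi game

per_move = sims_per_move * node_bytes   # 80_000 B = 80 kB per move
per_game = moves_per_game * per_move    # 4_800_000 B = 4.8 MB per game

print(per_move, per_game)  # 80000 4800000
```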
from reversi-alpha-zero.
Thanks for the explanation! @mokemokechicken
My record of (0, 10, 0) means (win, loss, draw).
I see. Sorry, I didn't read your description carefully. You do have that in the .md file; I had only checked the commit diff.
An expanded node consumes about 200 bytes (64*3 + α) of memory.
When sim_per_move is 400, 400 * 200 B = 80 kB per move, so 60 moves/game * 80 kB ≈ 4.8 MB per game.
Good analysis! And you clear that buffer every time a new model is loaded. Suppose you play 1000 games per model (at 7.2 seconds per game, that is 2 hours); then it is about 4.8 GB at most.
I feel it helps, especially with a small sim_per_move (~100)
Interesting. I was thinking it would also help with a big sim_per_move (e.g. 800), unless the NN-predicted node values are already quite accurate.
Another thought: maybe you could share that info across self-play processes. I imagine a simple (imperfect) way: when a game is done, compute the delta, send it to a "shared MCTS info manager", wait for the manager to apply this delta, then pull the new MCTS info for the next game.
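The delta-merging idea could be sketched like this (a hypothetical single-process illustration, not the repository's code; in practice the manager would sit behind a queue or a multiprocessing proxy):

```python
from collections import defaultdict

class SharedMctsInfoManager:
    """Hypothetical sketch: merges per-game MCTS deltas into shared stats.

    Keys are (state, move) pairs; values are the cumulative visit count N
    and total value W, as in the AlphaZero-style statistics."""

    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0.0])  # key -> [N, W]

    def apply_delta(self, delta):
        # delta: {key: (dN, dW)} computed by a worker after one finished game
        for key, (dn, dw) in delta.items():
            entry = self.stats[key]
            entry[0] += dn
            entry[1] += dw

    def snapshot(self):
        # Workers pull a copy of the shared info before starting the next game.
        return {k: tuple(v) for k, v in self.stats.items()}

# Usage: two workers report deltas that touch the same position.
manager = SharedMctsInfoManager()
manager.apply_delta({("state0", "c4"): (10, 6.0)})
manager.apply_delta({("state0", "c4"): (5, 2.0), ("state0", "d3"): (3, 1.5)})
print(manager.snapshot()[("state0", "c4")])  # (15, 8.0)
```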
Another thought: maybe you could share that info across self-play processes. I imagine a simple (imperfect) way: when a game is done, compute the delta, send it to a "shared MCTS info manager", wait for the manager to apply this delta, then pull the new MCTS info for the next game.
Yes, it is possible and interesting.
I was also concerned that sharing MCTS info could have bad effects; for example, it could reduce the contribution of each model update.
So your idea is better, because sharing MCTS info only among games of the same model avoids that problem.
By the way, I noticed that the current implementation is not enough.
The "value" of the tree search results is not propagated to the parent (and ancestor) nodes.
It is necessary to keep all nodes back to the root (initial state) and add the newly searched values (N and W) to them after simulations.
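Propagating the searched values back to every ancestor is the standard MCTS backup step; a minimal sketch (illustrative, not the repository's implementation) might look like this:

```python
class Node:
    def __init__(self):
        self.n = 0     # visit count N
        self.w = 0.0   # total value W

def backpropagate(path, leaf_value):
    # path: nodes from the root down to the evaluated leaf.
    # leaf_value: evaluation from the perspective of the player to move at the leaf.
    v = leaf_value
    for node in reversed(path):
        node.n += 1
        node.w += v
        v = -v  # players alternate, so the sign flips each ply up the tree

root, child, leaf = Node(), Node(), Node()
backpropagate([root, child, leaf], 1.0)
print(root.n, root.w, child.w)  # 1 1.0 -1.0
```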
It is necessary to keep all nodes back to the root (initial state) and add the newly searched values (N and W) to them after simulations.
If I understand correctly, won't that increase memory usage a lot? Well, it seems to depend on how fast self-play is compared to model updating...
If I understand correctly, won't that increase memory usage a lot? Well, it seems to depend on how fast self-play is compared to model updating...
It will not increase memory usage at all, because it only updates the N and W of the parent (and ancestor) nodes.
However, I have reconsidered: simply adding to N is not good, because almost the same moves would be selected in the next game.
I will try implementing the concept to make this clear and test it.
Doesn't sharing MCTS info discourage exploration? The initial values of n, w, and q are inherited from the shared MCTS info; the process/thread then runs simulations that add to these three quantities and decides its move based on them. But that final move decision seems to be heavily influenced by the shared info. If the exploration of the first turn is poorly performed, then the second turn's outcome becomes similar, and so does the n-th turn's. On the other hand, if enough variation is achieved while using the shared info, then the shared info is useful only for the first and possibly second turns. Thoughts?
As you pointed out, I think sharing MCTS info does discourage exploration.
The aim of sharing MCTS info is to encourage searching alternative moves at difficult positions.
A difficult position is one where several moves have almost the same (N, W).
By sharing the info, the losing side is encouraged to select another move in the next game, because it knows the previous move was bad and another looks good.
It is like a "post mortem" in chess, for example.
※ I fixed Black's first move to "C4" to make sharing effective.
I investigated some series of games: they always played different moves in the middle of the game, so my aim is achieved to a certain extent.
However, it is true that sharing discourages exploration.
The shared info is reset every N games (currently N=5).
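The "loser picks another move" effect can be illustrated with the standard PUCT selection rule, argmax over Q + U (a sketch with made-up numbers, not the repository's code): once the lost game's negative values are merged into the shared stats, the heavily-searched move's Q is poor and selection switches to the alternative.

```python
import math

def puct_select(children, c_puct=1.0):
    # children: move -> {"n": visit count, "w": total value, "p": prior probability}
    total_n = sum(ch["n"] for ch in children.values()) + 1

    def score(ch):
        q = ch["w"] / ch["n"] if ch["n"] else 0.0          # mean value Q = W/N
        u = c_puct * ch["p"] * math.sqrt(total_n) / (1 + ch["n"])  # exploration bonus U
        return q + u

    return max(children, key=lambda move: score(children[move]))

# Shared stats after a lost game: "d3" was searched heavily but has a poor Q,
# so the next game starts by preferring the alternative "c5".
children = {
    "d3": {"n": 300, "w": -120.0, "p": 0.5},
    "c5": {"n": 100, "w": 10.0, "p": 0.5},
}
print(puct_select(children))  # c5
```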