
About MCTS about reversi-alpha-zero HOT 8 OPEN

mokemokechicken commented on July 23, 2024

About MCTS

from reversi-alpha-zero.

Comments (8)

mokemokechicken commented on July 23, 2024

Why not use a discount rate γ?

It is a difficult question.

I think the reasons to use a discount rate are:

Distant steps have a weaker causal relationship with the reward.
Generally, it is better to get rewards early.

Conversely, the reasons not to use a discount rate are:

In perfect-information games like Go and Reversi, every move is related to the final reward.
Winning the game quickly has little value in itself.
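To make the trade-off concrete, here is a minimal sketch of a backup step with an optional discount. It is not the code from this repository: `EdgeStats`, `backup`, and the `gamma` parameter are hypothetical names chosen for illustration. With `gamma=1.0` the leaf value reaches every ancestor unchanged apart from the sign flip between players, which is the undiscounted behaviour discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Per-edge MCTS statistics (hypothetical container)."""
    n: int = 0      # visit count N
    w: float = 0.0  # total value W
    q: float = 0.0  # mean value Q = W / N


def backup(path, leaf_value, gamma=1.0):
    """Propagate leaf_value from the leaf edge back to the root edge.

    path is ordered root -> leaf. leaf_value is taken from the point of
    view of the player who chose the last edge. The sign flips at each
    step because the players alternate; gamma < 1 additionally shrinks
    the value the further it travels from the leaf.
    """
    value = leaf_value
    for edge in reversed(path):
        edge.n += 1
        edge.w += value
        edge.q = edge.w / edge.n
        value = -gamma * value


undiscounted = [EdgeStats() for _ in range(3)]
backup(undiscounted, 1.0)             # gamma = 1.0: magnitude is preserved

discounted = [EdgeStats() for _ in range(3)]
backup(discounted, 1.0, gamma=0.5)    # gamma = 0.5: root sees a weaker signal
```

With the discount, the root edge receives only a fraction of the leaf value, which matches the intuition that early moves are less tied to the outcome. Without it, every move on the path is credited equally.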


gooooloo commented on July 23, 2024

Just for your reference, I share the search tree between the two players; see the code here: https://github.com/gooooloo/alpha-zero-in-python/blob/master/src/reversi_zero/agent/player.py

But I don't think this makes a big difference. Many other settings are much more important, such as the number of simulations, the resignation threshold, and the performance trade-off between the self/opt/eval modules.


mokemokechicken commented on July 23, 2024

Hi @apollo-time

Why not update Q with W/N at this point?

Shouldn't it be W = W + virtual loss when the player is white?

Thank you, that is a very good point!
It is a serious bug in the virtual loss handling (the virtual loss on W had no effect).

Why not share the tree between the two players?

Because if the black and white models are different, their MCTS results are also different.
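A minimal sketch of the two points raised in the questions, under stated assumptions: W is stored from black's point of view (my reading of the question, not confirmed by the repository), and `EdgeTable`, `VIRTUAL_LOSS`, and the method names are hypothetical. It is not the fix that was applied here. It only shows Q being recomputed whenever N or W changes, and the sign of the virtual loss depending on the side to move.

```python
from collections import defaultdict

VIRTUAL_LOSS = 3  # hypothetical constant, chosen for illustration


class EdgeTable:
    """N, W, Q per edge key, with W kept from black's point of view (assumption)."""

    def __init__(self):
        self.n = defaultdict(float)
        self.w = defaultdict(float)
        self.q = defaultdict(float)

    def _refresh_q(self, key):
        # Keep Q in sync with W / N so the selection step sees the virtual loss.
        self.q[key] = self.w[key] / self.n[key] if self.n[key] else 0.0

    def apply_virtual_loss(self, key, is_white):
        # A pretend loss for the mover: lowers W for black, raises it for white.
        sign = 1.0 if is_white else -1.0
        self.n[key] += VIRTUAL_LOSS
        self.w[key] += sign * VIRTUAL_LOSS
        self._refresh_q(key)

    def revert_and_backup(self, key, is_white, value):
        # Undo the virtual loss and record the real result in one step.
        # value is the evaluation from black's point of view.
        sign = 1.0 if is_white else -1.0
        self.n[key] += 1 - VIRTUAL_LOSS
        self.w[key] += value - sign * VIRTUAL_LOSS
        self._refresh_q(key)


table = EdgeTable()
table.apply_virtual_loss("black-edge", is_white=False)   # Q drops while in flight
table.revert_and_backup("black-edge", is_white=False, value=0.5)
```

If Q were left stale after `apply_virtual_loss`, a selection rule that reads Q would not be discouraged from revisiting the same edge, which is the symptom described as "the virtual loss on W had no effect".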


apollo-time commented on July 23, 2024

reversi-alpha-zero/src/reversi_zero/worker/self_play.py

Lines 63 to 64 in 527ce6c

 self.black = ReversiPlayer(self.config, self.model, enable_resign=enable_resign) 

 self.white = ReversiPlayer(self.config, self.model, enable_resign=enable_resign)

I see that the two players use the same model in self-play mode.


mokemokechicken commented on July 23, 2024

Yes, that's right.
Although it is a little difficult to implement, sharing tree search results may be useful to save computation costs.
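One possible shape for this, as a rough sketch only: both players hold a reference to the same statistics object, so simulations run for one side are visible to the other. `SharedTree` and `Player` are hypothetical names and are not the `ReversiPlayer` API. The sketch also assumes both players use the same model, since statistics from different models should not be mixed.

```python
from collections import defaultdict


class SharedTree:
    """Visit counts and total values keyed by (state_key, action)."""

    def __init__(self):
        self.n = defaultdict(int)
        self.w = defaultdict(float)

    def record(self, state_key, action, value):
        # Store the result of one simulation through this edge.
        self.n[(state_key, action)] += 1
        self.w[(state_key, action)] += value

    def q(self, state_key, action):
        # Mean value of the edge, or 0.0 if it has never been visited.
        n = self.n.get((state_key, action), 0)
        return self.w[(state_key, action)] / n if n else 0.0


class Player:
    """Hypothetical player that searches using a tree passed in by the caller."""

    def __init__(self, name, tree):
        self.name = name
        self.tree = tree


tree = SharedTree()
black = Player("black", tree)
white = Player("white", tree)

# A simulation recorded through black's player is visible to white.
black.tree.record("some-position", 19, 0.2)
shared_q = white.tree.q("some-position", 19)
```

The state key must include the side to move, otherwise identical boards reached with different players to move would share statistics incorrectly.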


apollo-time commented on July 23, 2024

I see that DeepMind backs up the reward to the parent nodes without modification.
Why not use a discount rate γ?


apollo-time commented on July 23, 2024

But I think that when the game is long, the first move is not as closely related to the final result as the last move is.


mokemokechicken commented on July 23, 2024

Reversi has effectively only one kind of first move, so it does not matter there, but in Go or chess the first move could well turn out to be a bad one.

