zeta36 / muzero

A simple implementation of the MuZero algorithm for the Connect 4 game

License: GNU General Public License v3.0

Jupyter Notebook 68.69% Python 31.31%

deepmind jupyter-notebook muzero python pytorch

muzero's Issues

ValueError: object too deep for desired array

Hi, thank you for sharing the code.

I ran into this issue when running the notebook (.ipynb) in Colab.

Solution: ValueError: object too deep for desired array

The function softmax_sample should be changed to:

import numpy

def softmax_sample(distribution, temperature: float):
  # distribution is a list of (visit count, action index) pairs.
  if temperature == 0:
    temperature = 1
  distribution = numpy.array(distribution)
  # Apply the temperature to the visit counts only, not to the action indices.
  counts = distribution[:,0]**(1/temperature)
  p_sum = counts.sum()
  sample_temp = counts/p_sum
  action = distribution[int(numpy.argmax(numpy.random.multinomial(1, sample_temp, 1)))][1]
  return 0, int(action)

This change is needed because distribution is a 2D array in which every element has two values, like this:
[[ 0. 0.]
[ 4. 1.]
[ 1. 2.]
[ 0. 3.]
[ 0. 4.]
[ 0. 5.]
[ 0. 6.]
[ 0. 7.]
[ 0. 8.]
[ 0. 9.]
[ 0. 10.]
[ 0. 11.]........
The first value is the visit count and the second value is the action index.
p_sum should be computed from the first value, so we use distribution[:,0].
When choosing the action index we should return the second value, so we use
distribution[int(numpy.argmax(numpy.random.multinomial(1, sample_temp, 1)))][1]
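
To illustrate the role of the two columns, here is a minimal sketch that uses the first rows of the array shown above. The helper name `sample_action` is hypothetical and not part of the repository, and the temperature is left out for brevity.

```python
import numpy

# First rows of the array shown above: (visit count, action index).
distribution = numpy.array([[0., 0.], [4., 1.], [1., 2.], [0., 3.]])

def sample_action(distribution):
    # Hypothetical helper: sample a row in proportion to its visit count.
    counts = distribution[:, 0]        # first column: visit counts
    probs = counts / counts.sum()      # 1D probabilities for multinomial
    row = int(numpy.argmax(numpy.random.multinomial(1, probs, 1)))
    return int(distribution[row][1])   # second column: action index

samples = [sample_action(distribution) for _ in range(1000)]
# Only actions with a nonzero visit count (1 and 2 here) can be drawn.
print(sorted(set(samples)))
```

Rows with a visit count of zero get probability zero, so their actions never appear among the samples.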

sample_position error

Thanks for sharing this code. I found an error on line 552: the call to len is missing.

def sample_position(self, game) -> int:
  # Sample position from game either uniformly or according to some priority.
  return numpy.random.choice(game.history)

I think it should be:

def sample_position(self, game) -> int:
  # Sample position from game either uniformly or according to some priority.
  return numpy.random.choice(len(game.history))
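
The difference between the two calls can be seen in a small sketch. The `history` list below is hypothetical stand-in data, not the repository's Game class.

```python
import numpy

# Hypothetical stand-in for game.history: one action per move played.
history = [3, 3, 4, 2, 5]

# Passing an int n makes numpy.random.choice draw from range(n),
# so the result is always a valid position (index) into the history.
position = int(numpy.random.choice(len(history)))
print(0 <= position < len(history))  # True

# Passing the list itself draws one of its elements (an action here),
# which is not a position.
element = int(numpy.random.choice(history))
print(element in history)  # True
```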

softmax_sample error

Thanks for sharing this code. When I run it, I get an error saying that the input to the numpy multinomial function is invalid:

File "mtrand.pyx", line 4639, in mtrand.RandomState.multinomial
ValueError: object too deep for desired array

It seems that the distribution variable in softmax_sample has a shape of [7,2], where its [:,0] values are the visit counts and its [:,1] values are the action indexes.
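
A minimal sketch of how such a shape can trigger the error, assuming hypothetical visit counts and the legacy `numpy.random.multinomial` function, which expects a 1D `pvals` argument. The exact call in the notebook may differ.

```python
import numpy

# Hypothetical (visit count, action index) pairs with shape (7, 2).
pairs = numpy.array(
    [[0, 0], [4, 1], [1, 2], [0, 3], [0, 4], [0, 5], [0, 6]], dtype=float)
pvals_2d = pairs / pairs.sum()

try:
    numpy.random.multinomial(1, pvals_2d, 1)
    raised = False
except ValueError as error:
    raised = True
    print("ValueError:", error)

# Using only the visit-count column gives the 1D pvals that multinomial expects.
counts = pairs[:, 0]
sample = numpy.random.multinomial(1, counts / counts.sum(), 1)
print(sample.shape)  # (1, 7)
```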

I tried a simple fix of adjusting distribution in the softmax_sample function:

def softmax_sample(distribution, temperature: float):
  if temperature == 0:
    temperature = 1
  distribution = numpy.array(distribution)**(1/temperature)
  distribution = distribution[:,0]
  p_sum = distribution.sum()
  sample_temp = distribution/p_sum
  return 0, numpy.argmax(numpy.random.multinomial(1, sample_temp, 1))

Is that a correct way to solve the issue? Does the problem stem from a different NumPy version (I am using 1.17.1)?
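
One way to probe the first question is to compare the row index that numpy.argmax returns with the action stored in the second column. The sketch below assumes a hypothetical case in which only some actions are present, so rows and actions no longer line up. Whether that can happen in this repository is not confirmed here.

```python
import numpy

# Hypothetical pairs where only actions 2, 5 and 6 are present.
pairs = numpy.array([[0, 2], [7, 5], [0, 6]], dtype=float)

counts = pairs[:, 0]
probs = counts / counts.sum()  # [0, 1, 0]: only the middle row can be drawn
row = int(numpy.argmax(numpy.random.multinomial(1, probs, 1)))

print(row)                 # 1: position of the sampled row
print(int(pairs[row, 1]))  # 5: action stored in that row
```

If the rows list every action in order, the row index and the action coincide, and returning the argmax directly gives the same result. Otherwise the value in the second column is the one that identifies the action.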
