The categorical-dqn from floringogianu

categorical-dqn's Issues

categorical update problem when l=u

In def _get_categorical(self, next_states, rewards, mask),
when "b" happens to be an integer (e.g., bellman_op is clamped to self.v_max, so b=51),
the floor and ceil index values, "l" and "u", will be equal.
This seems to break the distribution projection: if the mass is split using the weights (u - b) and (b - l), both weights are zero when l = u, so the probability at "b" is projected nowhere.
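
One possible way to handle the l = u case is sketched below in plain Python. This is not the repository's code: the function name project_distribution, its arguments, and the small support used in the usage note are hypothetical, and the weights (u - b) and (b - l) are assumed to follow the usual categorical projection. The idea is to assign the whole probability mass to the atom when "b" is an integer instead of splitting it with two zero weights.

```python
import math


def project_distribution(probs, support, reward, gamma, v_min, v_max):
    """Project the distribution of reward + gamma * z back onto `support`.

    Hypothetical helper for illustration; assumes `support` is an evenly
    spaced grid from v_min to v_max and `probs` sums to 1.
    """
    n_atoms = len(support)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    projected = [0.0] * n_atoms
    for p, z in zip(probs, support):
        # Bellman update for this atom, clamped to the support range.
        tz = min(max(reward + gamma * z, v_min), v_max)
        # Fractional index of tz on the support grid.
        b = (tz - v_min) / delta_z
        # Guard against floating-point rounding pushing b off the grid.
        b = min(max(b, 0.0), float(n_atoms - 1))
        l, u = math.floor(b), math.ceil(b)
        if l == u:
            # b is an integer: (u - b) and (b - l) are both 0, so give the
            # whole mass to that atom instead of dropping it.
            projected[l] += p
        else:
            projected[l] += p * (u - b)
            projected[u] += p * (b - l)
    return projected
```

For example, with support [0, 1, 2, 3, 4], v_min = 0, v_max = 4, gamma = 0.99 and a reward of 10, every atom is clamped to v_max, so "b" is the last index for all of them and the projected distribution should still sum to 1. Without the l == u branch, the same input would produce an all-zero target.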

Excessive clamping

Is the clamping of the network's output in the update a bit excessive? For x in [0, 1] (valid probabilities), log(x) -> -infinity as x -> 0, so clamping the minimum value makes sense, but log(1) = 0, so the maximum value causes no issue. Pinging @tudor-berariu as well.
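
The argument above can be illustrated with a small plain-Python sketch. It assumes the clamped probabilities feed a cross-entropy style loss, which the issue does not spell out; the function names are hypothetical, and the bounds 0.001, 0.01 and 0.99 are the ones quoted in this issue. Clamping only the lower bound keeps log() finite, while an upper clamp puts a nonzero floor under the loss even for a perfect prediction.

```python
import math


def cross_entropy_min_clamp(target, predicted, eps=1e-3):
    """Cross-entropy with only the lower bound clamped (hypothetical helper)."""
    # max(q, eps) keeps log() away from -infinity; log(1) = 0 needs no guard.
    return -sum(t * math.log(max(q, eps)) for t, q in zip(target, predicted))


def cross_entropy_two_sided(target, predicted, lo=0.01, hi=0.99):
    """Cross-entropy with both bounds clamped, for comparison only."""
    return -sum(
        t * math.log(min(max(q, lo), hi)) for t, q in zip(target, predicted)
    )


# A prediction that matches the target exactly.
target = [1.0, 0.0, 0.0]
predicted = [1.0, 0.0, 0.0]

loss_min_only = cross_entropy_min_clamp(target, predicted)
loss_two_sided = cross_entropy_two_sided(target, predicted)
```

Here loss_min_only is zero, whereas loss_two_sided equals -log(0.99), so the two-sided clamp never lets the loss reach its true minimum. Whether that alone explains the Q-value gap described in this issue is not something this sketch can establish.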

Empirically, this might be an issue. I'm running my Rainbow agent with a minimum clamp of 0.001 (arbitrarily chosen), and get the following rewards and Q-values on Space Invaders (the Q-values are in line with what is reported in the Double DQN paper; unfortunately I do not have reported Q-values for Rainbow):

Whereas with a minimum clamp of 0.01 and a maximum clamp of 0.99, as in this repo, I get the following, which indicates that the clamping prevents the network from accurately estimating Q (note that this is the first time I've ever seen Q-values so far from what I got above, so the issue clearly lies with the clamping):
