msinto93 / d4pg

TensorFlow implementation of a Deep Distributed Distributional Deterministic Policy Gradients (D4PG) network, trained on OpenAI Gym environments.
License: MIT License
Is it based on the cumulative reward?
A number of people would be interested in this as a reference implementation.
Could you share learning curves with/without Distributional loss and perhaps a comparison of your implementation with SAC (twin) and DDPG from other implementations?
It would add confidence for anyone considering using this implementation.
Is it possible to run the training on a cluster?
What modifications do you think would be necessary?
Best regards
Alessandro
Hi,
thanks a lot for sharing the code.
I'm trying to make this work on custom environments (robotic simulators). I edited the network to support image states, and everything seems fine except for two issues:
1. The first problem arises in `utils/network.py`, line 123, where the Actor scales its gradients:

```python
def train_step(...):
    ...
    self.grads_scaled = list(map(lambda x: tf.divide(x, batch_size), self.grads))
```

I got an unsupported-operand NoneType/int error. I couldn't solve this for a couple of days, so I just commented out this line and used the unscaled gradients, which worked, but I'm fairly sure something is wrong with that.
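A likely cause (an assumption, since the stack trace isn't shown): after editing the network, `tf.gradients()` returns `None` for any variable not connected to the loss, and dividing `None` by an integer raises exactly this `TypeError`. A minimal sketch of the guard, illustrated with plain Python numbers standing in for TF tensors:

```python
def scale_grads(grads, batch_size):
    """Divide each gradient by batch_size, skipping None entries.

    tf.gradients() yields None for variables that are disconnected from
    the loss (e.g. layers left unused after a network edit); dividing
    None by an int is what triggers the NoneType/int error above.
    """
    return [None if g is None else g / batch_size for g in grads]

# Stand-in gradient list containing one disconnected variable:
print(scale_grads([4.0, None, 10.0], batch_size=2))  # [2.0, None, 5.0]
```

The deeper fix is usually to find out *why* a variable is disconnected (a leftover dense layer after switching to convolutions, for instance) rather than silently skipping it.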
2. With this change, training starts, but no learning can be seen:

```
loss 0.0001 avg_return=-1000.00 0% 1/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 1/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 3/1000000 ....
.....
loss 0.0001 avg_return=-1000.00 0% 2050/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 2051/1000000 ....
......
loss 0.0001 avg_return=-1000.00 0% 5000/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 5001/1000000 ....
```

From the output above, the loss stays the same and near zero, and the average return doesn't change during training, so the model may not be learning.
I didn't continue training beyond 5000 steps, because I wasn't confident it was learning and training in my particular environment needs heavy resources.
So, is case 1 OK? If not, how can I fix it?
And is case 2 normal; should I continue training?
Any help or suggestions would be great.
Thanks a lot again,
best regards.
Thank you for sharing the code.
I'm confused about the target network's Z-atom values at line 127 of `learner.py`.
If you want the target network's Z-atom values, shouldn't that be the `output_logits`?
If you instead use `self.z_atoms = tf.lin_space(v_min, v_max, num_atoms)` (`utils/network.py`, line 48) as `target_Z_atoms`, it will always be the same. Doesn't that make no sense?
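For context (this is my understanding of C51/D4PG in general, not a claim about how this repo wires it): the support `z_atoms` is indeed fixed; what varies per sample is the *target* support after the Bellman shift, r + γⁿ·z, which is then projected back onto the fixed atoms before the cross-entropy loss. A NumPy sketch of the shift, with hypothetical small values for the support:

```python
import numpy as np

v_min, v_max, num_atoms = -10.0, 10.0, 5
z_atoms = np.linspace(v_min, v_max, num_atoms)  # fixed support, never changes

def shifted_target_atoms(rewards, terminals, gamma_n):
    """Bellman-shift the fixed support: r + gamma^n * z.

    At terminal transitions the bootstrap term is zeroed, so the target
    support collapses to the reward itself. Output shape: (batch, atoms).
    """
    return rewards[:, None] + gamma_n * (1.0 - terminals[:, None]) * z_atoms[None, :]

print(shifted_target_atoms(np.array([1.0, 0.0]), np.array([0.0, 1.0]), gamma_n=0.5))
```

So even with a constant `z_atoms`, each batch element gets a different shifted support; the target network's `output_logits` supply the probabilities that sit on those shifted atoms.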
In your code, you use threading.Event() to control the competition between the 'Learner' and the 'Agent' when the 'Learner' wants to delete extra samples from the experience buffer while the 'Agent' wants to add samples to it.
But wouldn't it be a problem that, when several 'Agent' threads all compete to add samples to the experience buffer, there is no lock to control this competition among them?