
d4pg's People

Contributors

msinto93

d4pg's Issues

Could you add learning curves to the README?

A number of people would be interested in this as a reference implementation.

Could you share learning curves with and without the distributional loss, and perhaps a comparison of your implementation with SAC (twin) and DDPG from other implementations?

It would add confidence in potentially using this implementation.

Distributed computing

Is it possible to run the training on a cluster?
What modifications do you think would be necessary?

Best regards
Alessandro
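One natural first step toward running across machines is splitting the actors off into their own processes, since each actor only needs to ship transitions back to the process that owns the replay buffer. Below is a minimal sketch of that pattern using only the standard library; the names (`run_actor`, `collect`) and the stand-in transition tuples are illustrative, not from the D4PG repo, and a real cluster setup would replace `multiprocessing.Queue` with a network transport.

```python
# Sketch: actors in separate processes feed transitions to a central
# collector through a queue. Names and data are illustrative.
import multiprocessing as mp
import random

def run_actor(actor_id, queue, n_steps):
    # Each actor process would roll out its own environment copy and ship
    # (state, action, reward, next_state, done) tuples to the learner.
    for step in range(n_steps):
        transition = (actor_id, step, random.random())  # stand-in for a real transition
        queue.put(transition)
    queue.put(None)  # sentinel: this actor is done

def collect(n_actors=2, n_steps=3):
    queue = mp.Queue()
    procs = [mp.Process(target=run_actor, args=(i, queue, n_steps))
             for i in range(n_actors)]
    for p in procs:
        p.start()
    buffer, done = [], 0
    while done < n_actors:
        item = queue.get()
        if item is None:
            done += 1
        else:
            buffer.append(item)
    for p in procs:
        p.join()
    return buffer

if __name__ == "__main__":
    transitions = collect()
    print(len(transitions))  # 2 actors x 3 steps each
```

The same structure maps onto a cluster if the queue is replaced by a remote transition store, with the learner process periodically broadcasting updated network weights back to the actors.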

On custom envs

Hi,
thanks a lot for sharing the code.
I'm trying to get this working on custom envs (robotic simulators). I edited the network to support image states, and everything looks good except for two problems:

1. The first problem arises in utils/network.py, line 123, where the Actor scales its gradients:

```python
def train_step(...):
    ...
    self.grads_scaled = list(map(lambda x: tf.divide(x, batch_size), self.grads))
```

I got an unsupported operation NoneType/Int error. After a couple of days I couldn't solve it, so I commented out this line and used the unscaled gradients, which worked.
But I'm fairly sure something is wrong with this.
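The likely cause of that error: `tf.gradients` returns `None` for any variable the loss does not depend on (which can easily happen after swapping in a new image-based network), and `tf.divide(None, batch_size)` then fails. The usual fix is to guard against `None` before scaling. The sketch below shows the guard with plain Python values standing in for TF tensors; in the actual code it would read `[tf.divide(g, batch_size) if g is not None else None for g in self.grads]`.

```python
# Sketch: skip None entries (variables not connected to the loss)
# instead of dividing them, which raises the NoneType error.
def scale_grads(grads, batch_size):
    return [g / batch_size if g is not None else None for g in grads]

grads = [8.0, None, 4.0]      # None = variable not connected to the loss
print(scale_grads(grads, 4))  # [2.0, None, 1.0]
```

Note that a `None` gradient is itself a warning sign: it means part of the edited network receives no learning signal at all, which may also relate to the flat training curves in problem 2.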

2. With that line commented out, training starts, but no learning is visible:

```
loss 0.0001 avg_return=-1000.00 0% 1/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 2/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 3/1000000 ....
.....
loss 0.0001 avg_return=-1000.00 0% 2050/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 2051/1000000 ....
......
loss 0.0001 avg_return=-1000.00 0% 5000/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 5001/1000000 ....
```

As the output above shows, the loss stays the same and near zero, and the average return does not change during training (the model may not be learning).
I didn't continue training past 5000 steps because I wasn't confident it was learning, and training on my particular env needs heavy resources.

So, is my workaround in case 1 OK? If not, how can I fix it?
And is the behavior in case 2 normal? Should I continue training?

Any help or suggestions would be great.
Thanks a lot again,
best regards.

A bug during training

Thank you for your code, it has helped me a lot. In learner.py, line 131: `target_Z_atoms[terminals_batch, :] = 0.0`. I think `terminals_batch` should contain the indices of the terminated samples, but the `terminals_batch` used here is just 1s and 0s. Hoping for your reply!
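The distinction the issue raises is real in NumPy: indexing with an integer 0/1 array selects rows 0 and 1 repeatedly, while indexing with a *boolean* array masks the rows marked `True`. So the line is only correct if `terminals_batch` has boolean dtype. A minimal sketch, assuming `terminals_batch` is a 0/1 integer array as the issue describes:

```python
# Sketch: zeroing the target atom rows for terminal transitions.
# An int 0/1 array used as an index selects rows 0 and 1; casting to
# bool makes it act as a per-row mask instead.
import numpy as np

target_Z_atoms = np.ones((4, 3))          # batch of 4 transitions, 3 atoms each
terminals_batch = np.array([0, 1, 0, 1])  # 1 marks a terminal transition

masked = target_Z_atoms.copy()
masked[terminals_batch.astype(bool), :] = 0.0  # rows 1 and 3 zeroed
print(masked.sum())  # only the two non-terminal rows keep their ones
```

If the replay buffer already stores terminals as a boolean array, the original line is fine; otherwise the cast (or `np.asarray(terminals_batch, dtype=bool)`) is the fix.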

ABOUT: target network's Z-atom values

Thank you for sharing the code.
I'm confused about the target network's Z-atom values on line 127 of learner.py.
If you want the target network's Z-atom values, shouldn't that be the `output_logits`?
If you use `self.z_atoms = tf.lin_space(v_min, v_max, num_atoms)` (utils/network.py, line 48) as `target_Z_atoms`, it will always be the same. Doesn't that make no sense?
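For context on why a constant support can still be correct: in a categorical distributional critic (as in C51/D4PG), the atom positions `z_atoms` are a fixed support chosen up front, and only the probabilities placed on those atoms (a softmax over `output_logits`) are learned. The target network contributes the probabilities, not new atom positions. A small sketch with illustrative numbers:

```python
# Sketch of the categorical value distribution: fixed support, learned
# probabilities. Numbers are illustrative, not from the repo's config.
import numpy as np

v_min, v_max, num_atoms = -10.0, 10.0, 5
z_atoms = np.linspace(v_min, v_max, num_atoms)  # fixed support, never trained

logits = np.zeros(num_atoms)                    # stand-in target-network logits
probs = np.exp(logits) / np.exp(logits).sum()   # softmax -> uniform here

q_value = (z_atoms * probs).sum()               # expected Q under the distribution
print(q_value)  # uniform probs over a symmetric support give 0.0
```

So passing the constant `z_atoms` around as `target_Z_atoms` is by design; what the Bellman update then shifts (and zeroes at terminals) are the atom positions used to project the target distribution back onto the fixed support.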

Something about the replay buffer

In your code, you use `threading.Event()` to coordinate the Learner and the Agent when the Learner wants to delete extra samples from the experience buffer while an Agent wants to add samples.
But wouldn't it be a problem when several agent threads all compete to add samples to the experience buffer, since there is no lock to coordinate the agent threads among themselves?
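For reference, the standard way to make concurrent adds safe is a `threading.Lock` around every buffer mutation. A minimal sketch (the class and method names are illustrative, not the repo's): note that in CPython a bare `list.append` happens to be atomic under the GIL, but any compound update, appending plus maintaining a size counter or priorities, or trimming while another thread adds, does need the lock.

```python
# Sketch: lock-guarded buffer shared by several agent threads.
import threading

class ReplayBuffer:
    def __init__(self):
        self._data = []
        self._lock = threading.Lock()

    def add(self, sample):
        # The lock serializes mutations; without it, compound updates
        # (append + counter, trim-while-add) could interleave and corrupt
        # the buffer's bookkeeping.
        with self._lock:
            self._data.append(sample)

    def __len__(self):
        with self._lock:
            return len(self._data)

buffer = ReplayBuffer()
threads = [threading.Thread(target=lambda i=i: [buffer.add(i) for _ in range(100)])
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(buffer))  # 4 threads x 100 adds each
```

The same lock would then also need to be taken by the Learner's delete path, so that the `Event`-based handoff and the agents' adds can never run against a half-updated buffer.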
