msinto93 / d4pg

TensorFlow implementation of a Deep Distributed Distributional Deterministic Policy Gradients (D4PG) network, trained on OpenAI Gym environments.
License: MIT License
Is it based on the cumulative reward?
A number of people would be interested in this as a reference implementation.
Could you share learning curves with/without Distributional loss and perhaps a comparison of your implementation with SAC (twin) and DDPG from other implementations?
It would add confidence for anyone considering using this implementation.
Is it possible to run the training on a cluster?
What modifications do you think would be necessary?
Best regards
Alessandro
Hi,
thanks a lot for sharing the code.
I'm trying to make this work on custom environments (robotic simulators). I edited the network to support image states, and everything seems fine except for two issues:
1. The first problem arises in `utils/network.py`, line 123, where the Actor scales its gradients:

```python
def train_step(...):
    ...
    self.grads_scaled = list(map(lambda x: tf.divide(x, batch_size), self.grads))
```

I got an unsupported-operand NoneType/int error. I couldn't solve this for a couple of days, so I just commented out this line and used the unscaled gradients, which worked, but I'm fairly sure something is wrong with that.
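A likely cause (an assumption, since the stack trace isn't shown): after editing the network, `tf.gradients()` returns `None` for any variable not connected to the loss, and dividing `None` by an integer raises exactly this `TypeError`. A minimal sketch of the guard, illustrated with plain Python numbers standing in for TF tensors:

```python
def scale_grads(grads, batch_size):
    """Divide each gradient by batch_size, skipping None entries.

    tf.gradients() yields None for variables that are disconnected from
    the loss (e.g. layers left unused after a network edit); dividing
    None by an int is what triggers the NoneType/int error above.
    """
    return [None if g is None else g / batch_size for g in grads]

# Stand-in gradient list containing one disconnected variable:
print(scale_grads([4.0, None, 10.0], batch_size=2))  # [2.0, None, 5.0]
```

The deeper fix is usually to find out *why* a variable is disconnected (a leftover dense layer after switching to convolutions, for instance) rather than silently skipping it.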
2. With this change, training starts, but no learning can be seen:

```
loss 0.0001 avg_return=-1000.00 0% 1/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 1/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 3/1000000 ....
.....
loss 0.0001 avg_return=-1000.00 0% 2050/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 2051/1000000 ....
......
loss 0.0001 avg_return=-1000.00 0% 5000/1000000 ....
loss 0.0001 avg_return=-1000.00 0% 5001/1000000 ....
```

From the output above, the loss stays the same and near zero, and the average return doesn't change during training, so the model may not be learning.
I didn't continue training beyond 5000 steps, because I wasn't confident it was learning and training in my particular environment needs heavy resources.
So, is case 1 OK? If not, how can I fix it?
And is case 2 normal; should I continue training?
Any help or suggestions would be great.
Thanks a lot again,
best regards.
Thank you for sharing the code.
I'm confused about the target network's Z-atom values at line 127 of `learner.py`.
If you want the target network's Z-atom values, shouldn't that be the `output_logits`?
If you instead use `self.z_atoms = tf.lin_space(v_min, v_max, num_atoms)` (`utils/network.py`, line 48) as `target_Z_atoms`, it will always be the same. Doesn't that make no sense?
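For context (this is my understanding of C51/D4PG in general, not a claim about how this repo wires it): the support `z_atoms` is indeed fixed; what varies per sample is the *target* support after the Bellman shift, r + γⁿ·z, which is then projected back onto the fixed atoms before the cross-entropy loss. A NumPy sketch of the shift, with hypothetical small values for the support:

```python
import numpy as np

v_min, v_max, num_atoms = -10.0, 10.0, 5
z_atoms = np.linspace(v_min, v_max, num_atoms)  # fixed support, never changes

def shifted_target_atoms(rewards, terminals, gamma_n):
    """Bellman-shift the fixed support: r + gamma^n * z.

    At terminal transitions the bootstrap term is zeroed, so the target
    support collapses to the reward itself. Output shape: (batch, atoms).
    """
    return rewards[:, None] + gamma_n * (1.0 - terminals[:, None]) * z_atoms[None, :]

print(shifted_target_atoms(np.array([1.0, 0.0]), np.array([0.0, 1.0]), gamma_n=0.5))
```

So even with a constant `z_atoms`, each batch element gets a different shifted support; the target network's `output_logits` supply the probabilities that sit on those shifted atoms.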
In your code, you use threading.Event() to control the competition between the 'Learner' and the 'Agent' when the 'Learner' wants to delete extra samples from the experience buffer while the 'Agent' wants to add samples to it.
But wouldn't it be a problem that, when several 'Agent' threads all compete to add samples to the experience buffer, there is no lock to control this competition among them?