kaixhin / acer
Actor-critic with experience replay
License: MIT License
Hello @Kaixhin,
Line 97 in f22b07c
z_star_p
Shouldn't z_star_p be detached here, by using z_star_p.detach()? The paper says: "In the second stage, we take advantage of back-propagation. Specifically, the updated gradient with respect to φ_θ, that is z*, is back-propagated through the network to compute the derivatives with respect to the parameters."
Please let me know what you think.
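For readers following the thread, here is a minimal sketch of the two-stage trust region update the quote describes, using the closed-form solution from the ACER paper. The names phi, g, k and delta are illustrative assumptions, not necessarily the repository's exact variables; the point is that the scaling coefficient is detached, so only z* itself is back-propagated through the network output:

import torch

def trust_region_backward(phi, g, k, delta):
  # phi: policy network output with requires_grad=True
  # g: ACER policy gradient w.r.t. phi; k: gradient of the KL divergence w.r.t. phi
  # Stage 1: closed-form solution z* = g - max(0, (k.g - delta) / ||k||^2) * k;
  # the coefficient is detached so it is treated as a constant
  scale = ((k * g).sum() - delta).clamp(min=0) / (k * k).sum().clamp(min=1e-10)
  z_star = g - scale.detach() * k
  # Stage 2: back-propagate z* through the network to get parameter gradients
  # (negated here on the assumption that the optimiser minimises a loss)
  phi.backward(gradient=-z_star)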
In the current form (without a minus sign in front of _trust_region_loss), the reward obtained just sits at ~9 on CartPole; it might take off after more steps, but I haven't tried it.
With the minus sign in front, the reward obtained starts changing immediately.
Line 25 in 5b7ca5d
Hello Kaixhin,
I do not get the idea behind using a memory capacity greater than T_MAX. Shouldn't the implementation have a memory capacity smaller than T_MAX? That looks more intuitive to me. Please let me know what you think.
mcelog error on Aspire 3 A314: fatal kernel error
First of all, thanks for publishing this work! It has been really helpful to me for learning many RL concepts in practice!
In train.py, the gradient transfer from the local model to the shared model is performed by:
# Transfers gradients from thread-specific model to shared model
def _transfer_grads_to_shared_model(model, shared_model):
  for param, shared_param in zip(model.parameters(), shared_model.parameters()):
    if shared_param.grad is not None:
      return
    shared_param._grad = param.grad
I have a few questions, as this function looks sub-optimal to me:
Is the check shared_param.grad is not None there to avoid multiple workers copying gradients at the same time?
If my understanding is not completely wrong, I was wondering if there is a reason behind not using a Lock mechanism when transferring the gradients. Could the function be as follows?
# Transfers gradients from thread-specific model to shared model
def _transfer_grads_to_shared_model(model, shared_model, lock):
  with lock:
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
      shared_param._grad = param.grad
The same may also apply to the optimizer.step() call.
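For completeness, here is a minimal sketch of how such a lock could be created and shared across workers. The Linear model, the worker count and the train body are hypothetical stand-ins for illustration, not the repository's actual setup:

import torch
import torch.multiprocessing as mp

def train(rank, shared_model, lock):
  local_model = torch.nn.Linear(4, 2)  # stand-in for the thread-specific model
  local_model.load_state_dict(shared_model.state_dict())
  local_model(torch.randn(1, 4)).sum().backward()  # dummy loss to produce gradients
  with lock:  # serialise the gradient transfer across workers
    for param, shared_param in zip(local_model.parameters(), shared_model.parameters()):
      shared_param._grad = param.grad

if __name__ == '__main__':
  shared_model = torch.nn.Linear(4, 2)
  shared_model.share_memory()  # make parameters visible to all worker processes
  lock = mp.Lock()
  processes = [mp.Process(target=train, args=(rank, shared_model, lock)) for rank in range(4)]
  for p in processes:
    p.start()
  for p in processes:
    p.join()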
Line 71 in 5b7ca5d
Shouldn't the code be F.kl_div(distribution, ref_distribution, size_average=False)? Why is there a log of the distribution?
Could you please let me know why there is a negative sign? I think that since we have already defined the KL divergence in the step before, we do not need a negative sign here. Please let me know how you see this.
Line 81 in f22b07c
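One possible reading of the log question (my own understanding, not the maintainer's reply): PyTorch's F.kl_div expects its first argument to be log-probabilities, so the .log() is required for the result to be a proper KL divergence. A minimal check, using the newer reduction='sum' in place of the deprecated size_average=False:

import torch
import torch.nn.functional as F

p = torch.tensor([0.7, 0.2, 0.1])  # hypothetical current policy probabilities
q = torch.tensor([0.5, 0.3, 0.2])  # hypothetical reference (average) policy

kl = F.kl_div(p.log(), q, reduction='sum')  # KL(q || p); first argument in log-space
manual = (q * (q.log() - p.log())).sum()    # the same quantity by definition
print(torch.allclose(kl, manual))           # True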
Hey,
in the paper only one trajectory is sampled for each off-policy update, while here you use 16. For low-dimensional inputs this won't be too much slower, but for higher dimensions this might be an issue? What do you think?
Hey,
nice work! One quick question: you mentioned "The agent also receives the previous action and reward", but this part is not from the ACER algorithm, only from the navigation paper, right?
Line 24 in 3676af0
def sample(self, maxlen=0):
  L = len(self.memory)
  if L > 0:
    e = random.randrange(L)
    mem = self.memory[e]
    T = len(mem)
    # Take a random subset of trajectory if maxlen specified, otherwise return full trajectory
    if maxlen > 0 and T > maxlen + 1:
      t = random.randrange(T - maxlen - 1)  # Include next state after final "maxlen" state
      return mem[t:t + maxlen + 1]
    else:
      return mem
  else:
    return None
Can we write the code in this way? I do not get the reason behind using while True.
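For context, one plausible reading of the original loop (an assumption on my part, not confirmed in the thread): while True retries until it draws a trajectory long enough for a maxlen-sized window, which the if/else version above does not guarantee. A sketch of that behaviour:

import random

def sample(self, maxlen=0):
  # Retry until the sampled trajectory is long enough for a maxlen-sized window
  while True:
    mem = self.memory[random.randrange(len(self.memory))]
    T = len(mem)
    if maxlen <= 0 or T > maxlen + 1:
      break
  if maxlen > 0:
    t = random.randrange(T - maxlen - 1)  # Include next state after final "maxlen" state
    return mem[t:t + maxlen + 1]
  return mem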
Hello @Kaixhin. I am very glad to find your implementation of ACER in PyTorch, and I want to do something based on it. However, I want to test on Atari games (pixel games) instead of CartPole (control games). There are a lot of hyperparameters; I wonder if I need to tune them or keep them as you provided? Looking forward to your reply.