
acer's People

Contributors

ethancaballero, feryal, jsbyysheng, kaixhin, nat-d, random-user-x


acer's Issues

Trust Region Updates

Hello @Kaixhin,

ACER/train.py

Line 97 in f22b07c

trust_loss += (param * z_star_p).sum()

I think we should freeze the value of z_star_p by calling z_star_p.detach(). Quoting the paper:

In the second stage, we take advantage of back-propagation. Specifically, the updated gradient with
respect to φθ, that is z_∗, is back-propagated through the network to compute the derivatives with
respect to the parameters. 

Please let me know what you think.
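A minimal sketch (not the ACER code itself) of what detaching changes: with .detach(), gradients stop flowing through z_star, so only `param` receives a gradient from the trust-region term. The tensor names here are illustrative stand-ins.

```python
import torch

# Hypothetical stand-ins for the parameter and the trust-region
# correction z_star from the quoted line of train.py.
param = torch.ones(3, requires_grad=True)
z_star = torch.full((3,), 2.0, requires_grad=True)

# Freezing z_star with .detach() treats it as a constant.
trust_loss = (param * z_star.detach()).sum()
trust_loss.backward()

print(param.grad)   # gradient equals the (frozen) z_star values
print(z_star.grad)  # None: no gradient flows into z_star
```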

_trust_region_loss variations

In its current form (without a minus sign in front of _trust_region_loss), the reward obtained just sits at ~9 on CartPole; it might take off after more steps, but I haven't tried that.
With a minus sign in front, the reward obtained starts improving immediately.

Doubts on Episodic Memory

ACER/main.py

Line 25 in 5b7ca5d

parser.add_argument('--memory-capacity', type=int, default=1000000, metavar='CAPACITY', help='Experience replay memory capacity')

Hello Kaixhin,

I do not get the idea behind using a memory capacity greater than T_max. Shouldn't the implementation have a memory capacity less than T_max? That looks more intuitive to me. Please let me know what you think.

Mcelog

mcelog reports a fatal kernel error on an Acer Aspire 3 A314.

Doubt about gradient transfer to shared model

First of all, thanks for publishing this work! It's been really helpful to me for learning many RL concepts in practice!

In train.py the gradient transfer from the local model to the shared model is performed by:

# Transfers gradients from thread-specific model to shared model
def _transfer_grads_to_shared_model(model, shared_model):
  for param, shared_param in zip(model.parameters(), shared_model.parameters()):
    if shared_param.grad is not None:
      return
    shared_param._grad = param.grad

I have a few questions as this function looks sub-optimal to me:

  • Does the function return when shared_param.grad is not None in order to prevent multiple workers from copying gradients at the same time?
  • If that is the case, it looks like those gradients are thrown away; am I correct?

If my understanding is not completely wrong, I was wondering whether there is a reason for not using a lock when transferring the gradients. Could the function be as follows?

# Transfers gradients from thread-specific model to shared model
def _transfer_grads_to_shared_model(model, shared_model, lock):
  with lock:
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
      shared_param._grad = param.grad

The same may also apply to the optimizer.step() call.
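The race the lock would prevent can be sketched without any PyTorch at all. Below, a hypothetical DummyParam class stands in for model parameters; the lock serialises the copy so concurrent workers cannot interleave partial gradient transfers:

```python
import threading

# Hypothetical stand-in for a model parameter (not a torch tensor).
class DummyParam:
  def __init__(self):
    self.grad = None

# Lock-protected version of the gradient transfer: only one worker
# copies its gradients into the shared parameters at a time.
def transfer_grads(model_params, shared_params, lock):
  with lock:
    for param, shared_param in zip(model_params, shared_params):
      shared_param.grad = param.grad

lock = threading.Lock()
local = [DummyParam() for _ in range(2)]
shared = [DummyParam() for _ in range(2)]
local[0].grad, local[1].grad = 0.1, 0.2

# Several workers transferring concurrently still leave shared
# gradients in a consistent state.
threads = [threading.Thread(target=transfer_grads, args=(local, shared, lock))
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print([p.grad for p in shared])  # [0.1, 0.2]
```

In the actual repo the workers are processes, so a torch.multiprocessing lock would be needed rather than a threading one, but the shape of the fix is the same.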

KL Divergence

ACER/train.py

Line 71 in 5b7ca5d

kl = F.kl_div(distribution.log(), ref_distribution, size_average=False)

Shouldn't the code be F.kl_div(distribution, ref_distribution, size_average=False)? Why is there a log of the distribution?
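For reference, F.kl_div expects *log*-probabilities as its first argument, which is why .log() appears in the quoted line. A small check (using reduction='sum', the modern spelling of size_average=False):

```python
import torch
import torch.nn.functional as F

# Illustrative distributions, not the actual policy outputs.
p = torch.tensor([0.1, 0.2, 0.7])  # current policy
q = torch.tensor([0.3, 0.3, 0.4])  # reference (average) policy

# F.kl_div(input, target) computes sum(target * (log(target) - input)),
# so passing log-probabilities as input yields KL(q || p).
kl_via_f = F.kl_div(p.log(), q, reduction='sum')
kl_manual = (q * (q / p).log()).sum()
print(kl_via_f, kl_manual)  # the two values match
```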

Doubts

ACER/train.py

Line 81 in f22b07c

(-kl).backward(retain_graph=True)

Could you please let me know why there is a negative sign here? Since we have already defined the KL divergence in the previous step, I think we do not need a negative sign. Please let me know what you think.

batch_size for off-policy learning

Hey,
in the paper only one trajectory is sampled for each off-policy learning step, while here you use 16. For low-dimensional inputs this won't be much slower, but for higher-dimensional inputs it might be an issue. What do you think?

feed the previous action to lstm

Hey,
nice work! One quick question: you mentioned that "The agent also receives the previous action and reward", but this part is not from the ACER algorithm itself, only from the navigation paper, right?

Doubts on memory

ACER/memory.py

Line 24 in 3676af0

def sample(self, maxlen=0):

  def sample(self, maxlen=0):
    L = len(self.memory)
    if L > 0:
      e = random.randrange(L)
      mem = self.memory[e]
      T = len(mem)
      # Take a random subset of trajectory if maxlen specified, otherwise return full trajectory
      if maxlen > 0 and T > maxlen + 1:
        t = random.randrange(T - maxlen - 1)  # Include next state after final "maxlen" state
        return mem[t:t + maxlen + 1]
      else:
        return mem
    else:
      return None

Could the code be written this way? I do not get the reason behind using while (True).
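The proposed rewrite can be checked standalone. Below, a plain list of trajectories stands in for self.memory (a hypothetical stand-in, not the actual Memory class), showing the two branches: None for an empty memory, and a slice of maxlen + 1 states when maxlen is given:

```python
import random

# Standalone version of the proposed rewrite of Memory.sample.
def sample(memory, maxlen=0):
  if not memory:
    return None
  mem = random.choice(memory)  # pick a random trajectory
  T = len(mem)
  # Take a random subset if maxlen specified, otherwise the full trajectory
  if maxlen > 0 and T > maxlen + 1:
    t = random.randrange(T - maxlen - 1)  # include next state after final "maxlen" state
    return mem[t:t + maxlen + 1]
  return mem

trajectory = list(range(10))
print(sample([], maxlen=3))                 # None for empty memory
print(len(sample([trajectory], maxlen=3)))  # 4: maxlen + 1 states
```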

Configurations for Atari games

Hello, @Kaixhin. I am very glad to have found your PyTorch implementation of ACER, and I want to build something on top of it. However, I want to test on Atari games (pixel inputs) instead of CartPole (a control task). There are a lot of hyperparameters; I wonder whether I need to tune them or can keep them as you provide them. Looking forward to your reply.
