
ddpg's Issues

bias annealing weight updates

I could be wrong, but it does not seem that you are annealing the bias with importance sampling as suggested in the paper (section 3.4).

w_i = (1/N * 1/P(i))^beta

I think you would have to multiply your gradients by this w_i term.
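
For illustration, a minimal NumPy sketch of those weights, following Schaul et al. (the names priorities and beta are placeholders, not identifiers from this repo):

    import numpy as np

    def is_weights(priorities, beta):
        # P(i): normalized sampling probabilities
        p = priorities / priorities.sum()
        n = len(priorities)
        # w_i = (1/N * 1/P(i))^beta, normalized by max(w) as in the paper
        w = (1.0 / (n * p)) ** beta
        return w / w.max()

    # each sampled transition's TD error (and hence its gradient
    # contribution) would then be scaled by its weight before the update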

Error: No such file or directory

Hi,

I'm running the code as-is for the InvertedPendulum-v1 environment. The output log looks like:

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
File "run.py", line 120, in
experiment.run(main)
File "/home/cuiyi/ddpg/experiment.py", line 41, in run
create(scriptdir, f, n)
File "/home/cuiyi/ddpg/experiment.py", line 77, in create
mkdir(path)
File "/home/cuiyi/ddpg/experiment.py", line 255, in mkdir
os.mkdir(path)
OSError: [Errno 2] No such file or directory: '../ddpg-results/experiment1'

What should I do to fix this?

Thanks.
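
For reference, os.mkdir raises OSError [Errno 2] when the parent directory does not exist, and ../ddpg-results appears to be missing here. A sketch of a likely workaround (not a confirmed fix for this repo) is to create the intermediate directories first:

    import os

    path = '../ddpg-results/experiment1'
    # os.makedirs creates missing intermediate directories; os.mkdir does not
    if not os.path.exists(path):
        os.makedirs(path)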

DDPG actor output saturates

Hello~ I have a question about DDPG.
When my action dimension is 1, the result is good, but when my action dimension is 2 (with tanh and sigmoid activation functions), the actor's output saturates.
Here is the result I mentioned: https://github.com/m5823779/DDPG
By the way, I use batch normalization only in my actor network.
Do you know where the problem is?
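
One possible cause, offered as a hedged sketch rather than a diagnosis: the DDPG paper initializes the final actor layer from a uniform distribution in [-3e-3, 3e-3] so that tanh/sigmoid outputs start near zero rather than in their saturated regions (names below are illustrative, not from either repo):

    import tensorflow as tf

    action_dim = 2                                 # e.g. the 2-D action above
    h = tf.placeholder(tf.float32, [None, 400])    # last hidden-layer activations

    # final-layer init from U[-3e-3, 3e-3], as in the DDPG paper, keeps the
    # pre-activations small so tanh/sigmoid do not start out saturated
    init = tf.random_uniform_initializer(-3e-3, 3e-3)
    w_out = tf.get_variable('w_out', [400, action_dim], initializer=init)
    b_out = tf.get_variable('b_out', [action_dim], initializer=init)
    action = tf.tanh(tf.matmul(h, w_out) + b_out)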

Need your help to understand a step

Could you pinpoint the code where the actor's parameters (weights) are updated?

I am particularly looking for the step where the gradient of the critic is calculated with respect to the action variables and the gradient of the actor with respect to theta. The product of these gradients, summed over the batch, is used to update the actor (as in the algorithm given in the paper).
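
In general terms, that step looks like the following TensorFlow 1.x sketch (a toy stand-in graph with hypothetical names, not this repo's identifiers):

    import tensorflow as tf

    # toy stand-in graph
    s = tf.placeholder(tf.float32, [None, 3])
    p_theta = [tf.get_variable('w_p', [3, 1])]     # actor parameters theta
    actions = tf.matmul(s, p_theta[0])             # mu(s; theta)
    w_q = tf.get_variable('w_q', [1, 1])           # critic parameters
    q = tf.matmul(actions, w_q)                    # Q(s, mu(s))

    dq_da = tf.gradients(q, actions)[0]            # gradient of critic wrt actions
    # chain rule: dQ/dtheta = (da/dtheta)^T dQ/da; negated so a minimizer ascends Q
    actor_grads = tf.gradients(actions, p_theta, grad_ys=-dq_da)
    train_p = tf.train.AdamOptimizer(1e-4).apply_gradients(list(zip(actor_grads, p_theta)))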

Why should terminal states be removed from the minibatch?

In replay_memory.py:

    indices = np.zeros(size,dtype=np.int)
    for k in range(size):
      # find random index 
      invalid = True
      while invalid:
        # sample index ignore wrapping over buffer
        i = random.randint(0, self.n-2)
        # if i-th sample is current one or is terminal: get new index
        if i != self.i and not self.terminals[i]:
          invalid = False

      indices[k] = i

This part excludes some candidate indices from sampling.

Would indices = np.random.randint(0, self.n, size) be more appropriate?
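
For context, a hedged reading of the layout (assuming the buffer stores consecutive frames, so transition i is built from entries i and i+1; names are illustrative, not the repo's):

    import numpy as np

    def minibatch(observations, actions, rewards, terminals, indices):
        s = observations[indices]
        a = actions[indices]
        r = rewards[indices]
        # if terminals[i] were True, observations[i + 1] would be the first
        # frame of the NEXT episode, so terminal (and wrap-around) indices
        # must be skipped or masked; a plain randint over [0, n) breaks this.
        s_next = observations[indices + 1]
        return s, a, r, s_next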

Well done! But is it working?

Hi,

I was looking for such a repo to understand how to implement ddpg. Thanks for sharing.

I tried Reacher-v1; however, it does not seem to converge. Is this repo currently working, or is it still under construction?

Also, have you considered using Keras to make things cleaner?

Cheers!

Reacher-v1 not training

Hi, I have just tried running Reacher-v1 for 1,000,000 timesteps with the default settings and it didn't learn anything (it just gets stuck at a test reward of -12). It looks like you got it running with some settings, though; what were they?

Unrealistic rewards for InvertedDoublePendulum

Hi,

I'm running the code as-is for the InvertedDoublePendulum-v1 environment. The output log looks like:

[2016-09-29 02:55:12,968] Making new env: InvertedDoublePendulum-v1
[2016-09-29 02:55:13,029] OpenGL_accelerate module loaded
[2016-09-29 02:55:13,076] Using accelerated ArrayDatatype
outdir: ddpg-results/IP/
True action space: [-1.], [ 1.] 
True state space: [-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf], [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf  inf] 
Filtered action space: [-1.], [ 1.]
Filtered state space: [-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf], [ inf  inf  inf  inf  inf  inf  inf  inf  inf   inf  inf]
((11,), (1,))
{'_entry_point': 'gym.envs.mujoco:InvertedDoublePendulumEnv',
 '_env_name': 'InvertedDoublePendulum',
 '_kwargs': {},
 '_local_only': False,
 'id': 'InvertedDoublePendulum-v1',
 'nondeterministic': False,
 'reward_threshold': 9100.0,
 'tags': [], 
 'timestep_limit': 1000,
 'trials': 100}
Average test return 94.6561916032 after 0 timesteps of training
Average training return 64.8650261368 after 10004 timesteps of training
Average test return 94.4441357631 after 10004 timesteps of training
Average training return 62.8453825653 after 20006 timesteps of training
Average test return 94.4936849296 after 20006 timesteps of training
Average training return 63.6538282778 after 30008 timesteps of training
Average test return 94.9548271625 after 30008 timesteps of training
Average training return 63.9039428219 after 40011 timesteps of training
Average test return 94.2871854837 after 40011 timesteps of training
Average training return 63.2686654373 after 50014 timesteps of training
Average test return 98.8836603337 after 50014 timesteps of training
Average training return 145.89652752 after 60042 timesteps of training
Average test return 295.657725759 after 60042 timesteps of training
Average training return 192.307169483 after 70066 timesteps of training
Average test return 257.732447567 after 70066 timesteps of training
Average training return 226.691339415 after 80067 timesteps of training
Average test return 473.731095604 after 80067 timesteps of training
Average training return 255.541847852 after 90069 timesteps of training
Average test return 435.084465257 after 90069 timesteps of training
Average training return 254.536465181 after 100089 timesteps of training
Average test return 630.270166648 after 100089 timesteps of training
Average training return 250.049665622 after 110105 timesteps of training
Average test return 2436.58758156 after 110105 timesteps of training
Average training return 244.717938695 after 120121 timesteps of training
Average test return 93368.0844892 after 120121 timesteps of training

And then the code just exits (although I'd asked it to train for 1 million timesteps). Do you experience this sort of behavior as well? I'm guessing there is a subtle bug in the code that allows it to score episodic returns as high as 94k.


Concurrent Read Write on tf Variable?

I've seen several possible concurrent reads and writes in your code.

  • train_p and train_q have no control dependency on each other; train_q updates q_theta, which is also used in train_p, so the two may conflict with each other.
  • train_p updates p_theta_target, which is used in train_q, and the order of these updates is likewise undefined.
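
One way to serialize the two updates, as a sketch over a toy graph (hypothetical losses and variables, not the repo's code):

    import tensorflow as tf

    # toy stand-in graph
    x = tf.placeholder(tf.float32, [None, 4])
    q_theta = [tf.get_variable('q_w', [4, 1])]
    p_theta = [tf.get_variable('p_w', [4, 1])]
    q_loss = tf.reduce_mean(tf.square(tf.matmul(x, q_theta[0])))
    p_loss = -tf.reduce_mean(tf.matmul(x, p_theta[0]))

    train_q = tf.train.AdamOptimizer(1e-3).minimize(q_loss, var_list=q_theta)
    # force the critic update to finish before the actor op runs, so the two
    # cannot read and write the same variables concurrently
    with tf.control_dependencies([train_q]):
        train_p = tf.train.AdamOptimizer(1e-4).minimize(p_loss, var_list=p_theta)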
