
rail-berkeley / softlearning


Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Home Page: https://sites.google.com/view/sac-and-applications

License: Other

Shell 0.60% Python 99.40%
deep-learning deep-neural-networks deep-reinforcement-learning machine-learning reinforcement-learning soft-actor-critic

softlearning's People

Contributors

alacarter, azhou42, ben-eysenbach, brandontrabucco, dependabot[bot], haarnoja, hartikainen, henry-zhang-bohan, hrtang, johannespitz, nflu, sjoerdvansteenkiste, vitchyr


softlearning's Issues

Resume training

Hi,
I am trying to resume a training run, and I think this is done via the --restore parameter? But when I try it, I get an error message saying that a file ending in ...tune_metadata was not found, and indeed there is no file with that suffix among my checkpoints. What is the best way to resume experiments?

Conda issues installing patchelf

I'll try to find a work-around. Here's the output. Thanks!

11:04:39 ~/git_repos/softlearning:master
$ conda env create -f environment.yml
Solving environment: failed

ResolvePackageNotFound:

  • patchelf=0.9

No module named examples.development.simulate_policy

In the README.md, the following command is mentioned in order to simulate the resulting policy:
python -m examples.development.simulate_policy […]
However, I could not find this module in the development directory.
When trying to run, python outputs the following error:
/home/user/miniconda3/envs/softlearning/bin/python: No module named examples.development.simulate_policy

[question] Action Smoothing

Hello,
In the blog and in the code, action smoothing is mentioned but never explained...
In the code, one apparently does something like:

# smoothing coeff (0 -> no smoothing, 1 -> max smoothing)
alpha = some_value
beta = sqrt(1 - alpha ** 2)
# raw latents is sampled from a MultivariateNormalDiag with zero mean and unit std
smoothing_latent = alpha * smoothing_latent + raw_latents
latent = beta * smoothing_latent

action = mean + std * latent
action = tanh(action)

My question is: where does this smoothing come from, and why not use, for instance, exponential smoothing?

Also, can raw_latent be seen as the noise?
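For context, here is a quick numpy sketch (purely illustrative, not code from the repo) of why beta = sqrt(1 - alpha ** 2) shows up: it renormalizes the smoothed latent back to unit variance, so the marginal noise scale matches the unsmoothed case while consecutive latents become temporally correlated:

import numpy as np

# Illustrative check: AR(1)-style smoothing with beta = sqrt(1 - alpha ** 2)
# keeps the smoothed latent approximately unit-variance.
alpha = 0.9
beta = np.sqrt(1.0 - alpha ** 2)

rng = np.random.default_rng(0)
smoothing_latent = np.zeros(8)  # hypothetical latent dimension
latents = []
for _ in range(100000):
    raw_latents = rng.standard_normal(8)  # N(0, I), as in the snippet above
    smoothing_latent = alpha * smoothing_latent + raw_latents
    latents.append(beta * smoothing_latent)

print(np.var(latents))  # ~1.0: same marginal scale as the raw noise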

[Enhancement] Make replay buffer memory-efficient

Currently, the replay buffer stores each observation twice (since it stores a tuple of state, action, next_state, reward, done for every transition). The buffer consumes large amounts of memory for environments with high-dimensional observations (like images). For example, the memory consumption for an experiment with 48x48 RGB images and 1 million timesteps is about 56 GB, which could be cut down to about 28 GB.
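A minimal sketch of one way to do this (illustrative only, not the repo's replay pool API): store every observation exactly once and recover next_state by indexing the following slot. Episode-boundary bookkeeping is omitted for brevity; a real implementation would have to avoid sampling transitions whose successor slot has been overwritten by a reset observation.

import numpy as np

class CompactReplayBuffer:
    # Illustrative buffer: observations[i] is the state at step i, and the next
    # state of transition i is observations[(i + 1) % capacity], so images are
    # never stored twice.
    def __init__(self, capacity, obs_shape, action_dim):
        self.capacity = capacity
        self.observations = np.zeros((capacity, *obs_shape), dtype=np.uint8)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.terminals = np.zeros(capacity, dtype=bool)
        self.pos, self.size = 0, 0

    def add(self, obs, action, reward, terminal, next_obs):
        self.observations[self.pos] = obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.terminals[self.pos] = terminal
        # next_obs lives in the following slot; it doubles as the next step's obs.
        self.observations[(self.pos + 1) % self.capacity] = next_obs
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        return dict(
            observations=self.observations[idx],
            actions=self.actions[idx],
            rewards=self.rewards[idx],
            terminals=self.terminals[idx],
            next_observations=self.observations[(idx + 1) % self.capacity],
        )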

Union pool

Hi,

Seems like the union pool implementation is unfinished. Is that going to be done soon?

Thanks!

Pusher low level policy for ('any',-1) is not learning

Hi,

I am running your code. Pusher low-level policy trained for (-1, 'any') works fine but ('any',-1) doesn't do anything. Is there any fix for it or do you have your low-level trained models that I can use for my work?

Gaussian mixture policies

Are GMM policies stable with the Q-only formulation (without V function)? I see that this repository doesn't contain GMM policies while the old one (haarnoja/sac) does.
I am trying to get them working in rlkit, but it seems like GMM policies are difficult to train without the V function.

Code slower than before refactoring

For some reason, training is currently tens of percent slower (in terms of wall-clock time) than it was prior to the latest refactor. We need to figure out what slows it down and fix it.

Progress on DeepMind Control Suite?

Hi!

I noticed that a recent commit was pushed to the repo to support running SAC on the DeepMind Control Suite.

I was wondering whether the current code base is ready to run on the DeepMind Control Suite and, if not, what else remains to be done? Maybe I could help. Thanks!

unstable training curve for default SQL

Hi,

First of all, thanks for the brilliant papers and for open-sourcing the code.

I was running SQL on HalfCheetah with the default settings, using the command:
--universe=gym --domain=HalfCheetah --task=v2 --algorithm=SQL --exp-name=my-sql-experiment-2 --checkpoint-frequency=1000.

It uses a Gaussian policy and a reward scale of 30, which I think implies very low entropy regularization.

However, I obtained very unstable training and evaluation return curves, shown below:

[training and evaluation return curves omitted]

I was wondering whether there is anything wrong with the default SQL setting, and how did you test SQL? I tried lowering the reward scale, which leads to a lower but somewhat more stable return curve.

Thanks!

Module 'gym' has no attribute 'register' on MacOS Mojave 10.14.4

Hi All, when I tried to run a reward learning task (https://github.com/avisingh599/reward-learning-rl) with softlearning environment, the following error occurred: "AttributeError: module 'gym' has no attribute 'register'"

However, when I ran import gym and gym.register() in a separate Python script in PyCharm, it worked fine, i.e. the register function could be found in gym. I had a look at previous Softlearning issues and think this is a gym adapter issue, but I am not sure how to manually add this environment/task to gym_adapter in the Softlearning package. Many thanks for your help!
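For reference, registering a custom environment with gym itself usually looks roughly like the sketch below (the id and entry point here are made up). Since the Softlearning gym adapter ultimately builds environments through gym's registry, an id registered this way should be resolvable, although where exactly to hook the registration into gym_adapter depends on the adapter code:

import gym
from gym.envs.registration import register

# Hypothetical environment; id and entry_point are placeholders, not real modules.
register(
    id='MyRewardLearningEnv-v0',
    entry_point='my_package.envs:MyRewardLearningEnv',
    max_episode_steps=200,
)

env = gym.make('MyRewardLearningEnv-v0')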


Environment seeding doesn't allow reproducibility

Hi Hartikainen,

Thank you for maintaining the repo. Sorry if this is done elsewhere in the code, but shouldn't we set the seed of the environments (both training and eval) after creating them here for reproducibility purposes?

Gym maintains its own internal numpy random state and uses it to sample the initial state. Setting the seed of the global numpy module does not affect this internal state.
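As an illustration of what that would look like (not the repo's actual code; the variable holding the seed is hypothetical), the environments would be seeded right after they are created:

# Illustrative only: seed both environments right after creation so that gym's
# internal np_random state, not the global numpy seed, is what gets fixed.
seed = variant['run_params']['seed']  # hypothetical location of the run seed

training_environment.seed(seed)
evaluation_environment.seed(seed + 1)  # offset so evaluation episodes differ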

Thanks!

Import error. Trying to rebuild mujoco_py.

$  softlearning run_example_local examples.development \
>     --universe=gym \
>     --domain=HalfCheetah \
>     --task=v3 \
>     --exp-name=my-sac-experiment-1 \
>     --checkpoint-frequency=1000

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

WARNING: Logging before flag parsing goes to stderr.
I0418 01:08:50.825603 140032189581056 acceleratesupport.py:13] OpenGL_accelerate module loaded
I0418 01:08:50.832047 140032189581056 arraydatatype.py:270] Using accelerated ArrayDatatype
I0418 01:08:51.017610 140032189581056 __init__.py:34] MuJoCo library version is: 200
2019-04-18 01:08:51,105 INFO node.py:439 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-04-18_01-08-51_27162/logs.
2019-04-18 01:08:51,211 INFO services.py:364 -- Waiting for redis server at 127.0.0.1:20587 to respond...
2019-04-18 01:08:51,320 INFO services.py:364 -- Waiting for redis server at 127.0.0.1:52635 to respond...
2019-04-18 01:08:51,321 INFO services.py:761 -- Starting Redis shard with 10.0 GB max memory.
2019-04-18 01:08:51,337 WARNING services.py:1301 -- Warning: Capping object memory store to 20.0GB. To increase this further, specify `object_store_memory` when calling ray.init() or ray start.
2019-04-18 01:08:51,337 INFO services.py:1449 -- Starting the Plasma object store with 20.0 GB memory using /dev/shm.
2019-04-18 01:08:51,885 INFO tune.py:139 -- Did not find checkpoint file in /home/yrli/ray_results/gym/HalfCheetah/v3/2019-04-18T01-08-51-my-sac-experiment-1.
2019-04-18 01:08:51,885 INFO tune.py:145 -- Starting a new experiment.
2019-04-18 01:08:51,892 INFO web_server.py:241 -- Starting Tune Server...
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/56 CPUs, 0/8 GPUs
Memory usage on this node: 4.8/270.1 GB

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 56/56 CPUs, 0/8 GPUs
Memory usage on this node: 4.9/270.1 GB
Result logdir: /home/yrli/ray_results/gym/HalfCheetah/v3/2019-04-18T01-08-51-my-sac-experiment-1
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - id=f24a78d2-seed=4956:       RUNNING

(pid=27322) 
(pid=27322) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
(pid=27322) For more information, please see:
(pid=27322)   * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
(pid=27322)   * https://github.com/tensorflow/addons
(pid=27322) If you depend on functionality not listed there, please file an issue.
(pid=27322) 
(pid=27322) 2019-04-18 01:08:55.340353: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=27322) Using seed 4956
(pid=27322) 2019-04-18 01:08:55.424360: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
(pid=27322) 2019-04-18 01:08:55.424399: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:161] retrieving CUDA diagnostic information for host: 64.site
(pid=27322) 2019-04-18 01:08:55.424408: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:168] hostname: 64.site
(pid=27322) 2019-04-18 01:08:55.424460: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:192] libcuda reported version is: 410.104.0
(pid=27322) 2019-04-18 01:08:55.424498: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:196] kernel reported version is: 410.104.0
(pid=27322) 2019-04-18 01:08:55.424507: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version seems to match DSO: 410.104.0
(pid=27322) 2019-04-18 01:08:55.426316: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400060000 Hz
(pid=27322) 2019-04-18 01:08:55.429028: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5d18810 executing computations on platform Host. Devices:
(pid=27322) 2019-04-18 01:08:55.429054: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
(pid=27322) Import error. Trying to rebuild mujoco_py.
(pid=27322) Compiling /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.pyx because it depends on /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/pxd/mujoco.pxd.
(pid=27322) [1/1] Cythonizing /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.pyx
(pid=27322) running build_ext
(pid=27322) building 'mujoco_py.cymj' extension
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/gl
(pid=27322) gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py -I/home/yrli/.mujoco/mujoco200/include -I/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/numpy/core/include -I/home/yrli/anaconda3/envs/softlearning/include/python3.6m -c /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.c -o /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.o -fopenmp -w
(pid=27322) gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py -I/home/yrli/.mujoco/mujoco200/include -I/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/numpy/core/include -I/home/yrli/anaconda3/envs/softlearning/include/python3.6m -c /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.c -o /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.o -fopenmp -w
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6/mujoco_py
(pid=27322) gcc -pthread -shared -L/home/yrli/anaconda3/envs/softlearning/lib -Wl,-rpath=/home/yrli/anaconda3/envs/softlearning/lib,--no-as-needed -L/home/yrli/anaconda3/envs/softlearning/lib -Wl,-rpath=/home/yrli/anaconda3/envs/softlearning/lib,--no-as-needed /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.o /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.o -L/home/yrli/.mujoco/mujoco200/bin -L/home/yrli/anaconda3/envs/softlearning/lib -Wl,--enable-new-dtags,-R/home/yrli/.mujoco/mujoco200/bin -lmujoco200 -lglewosmesa -lOSMesa -lGL -lpython3.6m -o /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6/mujoco_py/cymj.cpython-36m-x86_64-linux-gnu.so -fopenmp
2019-04-18 01:09:53,990 ERROR trial_runner.py:426 -- Error processing event.
Traceback (most recent call last):
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 389, in _process_events
    result = self.trial_executor.fetch_result(trial)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 252, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 2288, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=27322, host=64.site)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/mujoco/mujoco_env.py", line 11, in <module>
    import mujoco_py
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/__init__.py", line 3, in <module>
    from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/builder.py", line 503, in <module>
    cymj = load_cython_ext(mujoco_path)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/builder.py", line 106, in load_cython_ext
    mod = load_dynamic_ext('cymj', cext_so_path)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/builder.py", line 124, in load_dynamic_ext
    return loader.load_module()
ImportError: dlopen: cannot load any more object with static TLS

During handling of the above exception, another exception occurred:

ray_ExperimentRunner:train() (pid=27322, host=64.site)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/trainable.py", line 150, in train
    result = self._train()
  File "/data1/yrli/softlearning/examples/development/main.py", line 77, in _train
    self._build()
  File "/data1/yrli/softlearning/examples/development/main.py", line 44, in _build
    get_environment_from_params(environment_params['training']))
  File "/data1/yrli/softlearning/softlearning/environments/utils.py", line 33, in get_environment_from_params
    return get_environment(universe, domain, task, environment_kwargs)
  File "/data1/yrli/softlearning/softlearning/environments/utils.py", line 24, in get_environment
    return ADAPTERS[universe](domain, task, **environment_params)
  File "/data1/yrli/softlearning/softlearning/environments/adapters/gym_adapter.py", line 66, in __init__
    env = gym.envs.make(env_id, **kwargs)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/registration.py", line 183, in make
    return registry.make(id, **kwargs)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/registration.py", line 125, in make
    env = spec.make(**kwargs)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/registration.py", line 88, in make
    cls = load(self._entry_point)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/registration.py", line 17, in load
    mod = importlib.import_module(mod_name)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/mujoco/__init__.py", line 1, in <module>
    from gym.envs.mujoco.mujoco_env import MujocoEnv
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/mujoco/mujoco_env.py", line 13, in <module>
    raise error.DependencyNotInstalled("{}. (HINT: you need to install mujoco_py, and also perform the setup instructions here: https://github.com/openai/mujoco-py/.)".format(e))
gym.error.DependencyNotInstalled: dlopen: cannot load any more object with static TLS. (HINT: you need to install mujoco_py, and also perform the setup instructions here: https://github.com/openai/mujoco-py/.)

Bound std of Gaussian policy via beta-sigmoid?

If I understand correctly, the log-std of the Gaussian policy is clipped to a min/max range and the std is recovered by exponentiation.

I am curious whether using a beta-sigmoidal function to model the log-variance would be a tiny bit more stable, because it allows smooth lower/upper bounds and a less sharp gradient at large magnitudes.

e.g.

logvar = ...  # raw network output
var = 1 / (1 + self.beta * torch.exp(-logvar))  # sigmoid-like squashing with slope beta
var = min_var + (max_var - min_var) * var       # rescale into [min_var, max_var]
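For comparison, a small numpy sketch of both parameterizations (the bounds below are made-up values, not the repo's defaults): the clip-then-exp baseline has hard kinks and zero gradient outside the clip range, while the sigmoid variant saturates smoothly at both ends:

import numpy as np

log_std_min, log_std_max = -20.0, 2.0  # hypothetical clip range for the baseline
min_var, max_var = 1e-6, 1.0           # hypothetical bounds for the sigmoid variant
beta = 1.0

raw = np.linspace(-30.0, 30.0, 7)      # raw network outputs

# Baseline: clip the log-std, then exponentiate (hard bounds).
std_clipped = np.exp(np.clip(raw, log_std_min, log_std_max))

# Proposed: squash the raw output into [min_var, max_var] with a sigmoid (smooth bounds).
var_sigmoid = min_var + (max_var - min_var) / (1.0 + beta * np.exp(-raw))
std_sigmoid = np.sqrt(var_sigmoid)

print(std_clipped)
print(std_sigmoid)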

Automating alpha

Alpha loss is defined as (code):
alpha_loss = -tf.reduce_mean(log_alpha * tf.stop_gradient(log_pis + self._target_entropy))

In other words:
alpha_loss = log_alpha * (- <Negative constant>)
alpha_loss = log_alpha * (<Positive constant>)

So minimizing alpha_loss means minimizing log_alpha, which means that alpha always goes to zero no matter what, and this is indeed what I'm observing in my experiments.

I'm obviously forgetting something, but I haven't been able to figure out what.
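For what it's worth, a small numpy sketch of the gradient sign (purely illustrative): log_pis is re-evaluated under the current policy at every update, so log_pis + target_entropy is not a fixed constant. Its sign flips depending on whether the policy's entropy is above or below the target, so the gradient on log_alpha can point either way:

import numpy as np

target_entropy = -3.0  # e.g. -|A| for a hypothetical 3-dimensional action space

def d_loss_d_log_alpha(log_pis):
    # Gradient of  alpha_loss = -mean(log_alpha * (log_pis + target_entropy))
    # w.r.t. log_alpha. log_pis is constant within one update (stop_gradient),
    # but it changes between updates as the policy changes.
    return -np.mean(log_pis + target_entropy)

# Policy entropy above the target (-log_pi = 5 > -3): positive gradient,
# so gradient descent decreases log_alpha and alpha shrinks.
print(d_loss_d_log_alpha(np.array([-5.0])))  #  8.0

# Policy entropy below the target (-log_pi = -10 < -3): negative gradient,
# so gradient descent increases log_alpha and alpha grows.
print(d_loss_d_log_alpha(np.array([10.0])))  # -7.0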

Error occurs while running

When I run the command given in the README:

python -m examples.development.main \
    --mode=local \
    --universe=gym \
    --domain=HalfCheetah \
    --task=v2 \
    --exp-name=my-sac-experiment-1 \
    --checkpoint-frequency=1000  # Save the checkpoint to resume training later

to train the agent, the following error occurs:

pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "/home/deepglint/anaconda2/envs/softlearning/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/deepglint/anaconda2/envs/softlearning/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/deepglint/softlearning/examples/development/main.py", line 14, in <module>
    from softlearning.algorithms.utils import get_algorithm_from_variant
  File "/home/deepglint/softlearning/softlearning/algorithms/__init__.py", line 2, in <module>
    from .sac import SAC
  File "/home/deepglint/softlearning/softlearning/algorithms/sac.py", line 47
    **kwargs,
            ^
SyntaxError: invalid syntax
I would very much appreciate it if you could suggest a solution.

Additional information for baseline algorithms

Hi,

Would you please share information about the baselines, e.g., DDPG, TD3, and PPO, along with their hyperparameter settings? (If you used open-source code, could you share the links?)

While this repository only covers soft-learning algorithms, I think refactoring the earlier baseline algorithms into this framework could be quite useful.

Thanks.

SAC Hyperparameters MountainCarContinuous-v0 - Env with deceptive reward

Hello,

I've tried in vain to find suitable hyperparameters for SAC in order to solve MountainCarContinuous-v0.

Even with hyperparameter tuning (see the "add-trpo" branch of rl baselines zoo), I was not able to solve it consistently (if it finds the goal during random exploration, then it works; otherwise it gets stuck in a local minimum).
I also ran into this issue when trying SAC on another environment with a deceptive reward (a bit-flipping env, trying to apply HER + SAC, see here).

Did you manage to solve that problem? If so, what hyperparameters did you use?

Note: I am using the SAC implementation from stable-baselines, which works pretty well on all other problems (where the reward is dense).

Unable to reproduce result on HalfCheetah-v2

I am unable to obtain the result as reported in the paper on the openai environment HalfCheetah-v2. The commit used to obtain this result is 1f6147c, which isn't too long ago. The result is averaged over 5 random initial seeds.

[HalfCheetah-v2 learning curve omitted]

Do you know what might be causing this issue? Thank you!

I am able to obtain the result as reported (or close to it) in the paper on the remaining environments, posted here for reference.

[Ant, Walker, Humanoid, and Hopper learning curves omitted]

SAC checkpointing fails when using fixed entropy coefficient

@aviralkumar2907 ran into a problem where setting the target entropy to a fixed value makes the checkpointing crash:

Traceback (most recent call last):
  File "/home/kristian/github/hartikainen/ray/python/ray/tune/trial_runner.py", line 399, in _process_events
    self._checkpoint_trial_if_needed(trial)
  File "/home/kristian/github/hartikainen/ray/python/ray/tune/trial_runner.py", line 430, in _checkpoint_trial_if_needed
    self.trial_executor.save(trial, storage=Checkpoint.DISK)
  File "/home/kristian/github/hartikainen/ray/python/ray/tune/ray_trial_executor.py", line 317, in save
    trial._checkpoint.value = ray.get(trial.runner.save.remote())
  File "/home/kristian/github/hartikainen/ray/python/ray/worker.py", line 2211, in get
    raise value
ray.worker.RayTaskError: ray_ExperimentRunner:save() (pid=4371, host=jensen2)
  File "/home/kristian/github/hartikainen/ray/python/ray/tune/trainable.py", line 226, in save
    checkpoint = self._save(checkpoint_dir)
  File "/home/kristian/github/hartikainen/softlearning/examples/development/main.py", line 120, in _save
    tf_checkpoint = self._get_tf_checkpoint()
  File "/home/kristian/github/hartikainen/softlearning/examples/development/main.py", line 87, in _get_tf_checkpoint
    tf_checkpoint = tf.train.Checkpoint(**self.algorithm.tf_saveables)
  File "/home/kristian/github/hartikainen/softlearning/softlearning/algorithms/sac.py", line 430, in tf_saveables
    '_alpha_optimizer': self._alpha_optimizer,
AttributeError: 'SAC' object has no attribute '_alpha_optimizer'
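A minimal sketch of one possible workaround (illustrative; only _alpha_optimizer appears in the traceback, the other attribute names below are placeholders): build tf_saveables conditionally so that a fixed, non-trained alpha does not require an alpha optimizer:

@property
def tf_saveables(self):
    # Sketch: include the alpha optimizer only when alpha is actually learned.
    saveables = {
        '_policy_optimizer': self._policy_optimizer,  # placeholder name
        '_Q_optimizer': self._Q_optimizer,            # placeholder name
    }
    if getattr(self, '_alpha_optimizer', None) is not None:
        saveables['_alpha_optimizer'] = self._alpha_optimizer
    return saveables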

MultivariateNormalDiag log_prob with target_entropy and alpha

Thank you for sharing the code.

For Gaussian policies, in this implementation as well as the original SAC repository, the log_prob method of the MultivariateNormalDiag class is used to compute the log probability of an action. This method returns a probability density, not a probability, so log probabilities can be greater than 0. The issue arises in the learning objective for alpha. I can set target_entropy = 0.0, in which case you'd expect alpha to go to 0 (an entropy of 0 indicates a deterministic policy), but this is not the case, since log_pi can be either greater than or less than 0.
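To make the density-vs-probability point concrete, a quick numpy check (illustrative only): for a Gaussian with a small standard deviation, the density at the mean exceeds 1, so its log is positive, and the differential entropy is correspondingly negative:

import numpy as np

def gaussian_log_density(x, mean, std):
    # log N(x; mean, std^2) is a log-density, not a log-probability, so it can exceed 0.
    return -0.5 * np.log(2 * np.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

std = 0.1
print(gaussian_log_density(0.0, 0.0, std))        # ~= 1.38 > 0
print(0.5 * np.log(2 * np.pi * np.e * std ** 2))  # differential entropy ~= -0.88 < 0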

Is there something simple I'm missing here?

Thank you again.

InvalidGitRepositoryError

Invalid git repository (last line of the screenshot below).

Please tell me which path (repository) it is looking for.
[screenshot omitted]

Run on real robot

Hi,
is there currently some implementation to run this outside of simulation?

No module named 'softlearning.utils'

I used virtualenv instead of conda on Ubuntu 16.04.
I ran python setup.py install after installing all the requirements.
When I ran the first example to train the agent, I got the following error:
No module named softlearning.utils
Any idea what caused this? Thanks!

(venv) jc@jc-Precision-5510:~/research/softlearning$ softlearning run_example_local examples.development     --universe=gym     --domain=HalfCheetah     --task=v3     --exp-name=my-sac-experiment-1     --checkpoint-frequency=1000

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Traceback (most recent call last):
  File "/home/jc/research/softlearning/venv/bin/softlearning", line 11, in <module>
    load_entry_point('softlearning==0.0.1', 'console_scripts', 'softlearning')()
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/scripts/console_scripts.py", line 202, in main
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/scripts/console_scripts.py", line 71, in run_example_local_cmd
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/examples/instrument.py", line 203, in run_example_local
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/examples/development/__init__.py", line 21, in get_parser
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/examples/utils.py", line 8, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/algorithms/__init__.py", line 1, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/algorithms/sql.py", line 9, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/algorithms/rl_algorithm.py", line 12, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/samplers/__init__.py", line 4, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/samplers/remote_sampler.py", line 10, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/samplers/utils.py", line 5, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/replay_pools/__init__.py", line 4, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/replay_pools/trajectory_replay_pool.py", line 8, in <module>
ModuleNotFoundError: No module named 'softlearning.utils'

Simulate policy not working anymore

In simulate_policy.py I had to replace
render_kwargs={'mode': args.render_mode}
on line 70,

and in base_policy.py, line 83, I replaced it with
super(LatentSpacePolicy, self).__init__(kwargs['observation_keys'])

GPU issues

I run the command line 'CUDA_VISIBLE_DEVICES=3 python -m examples.development.main --mode=local --universe=gym --domain=Hopper --task=v2 --exp-name=test --checkpoint-frequency=1000 --cpus=16 --gpus 1 --trial-cpus 16 --trial-gpus 1'

But when I check nvidia-smi, no GPU is being used.

My question is: how do I get the code to run on the GPU?

Thanks!

Concatenating dm_control observations causes error due to uneven shapes

The code exits with an error when I try running the humanoid run task, because of this line:

flattened_observation = np.concatenate([

The reason is that the 'head_height' entry of the observation is 0-dimensional, so np.concatenate complains that all items in the sequence must have the same number of dimensions.

Originally posted by @quanvuong in #69 (comment)
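A minimal sketch of one way around this (illustrative; the helper name is made up, only 'head_height' comes from the actual dm_control observation): promote 0-dimensional entries to 1-D before concatenating:

import numpy as np
from collections import OrderedDict

def flatten_observation(observation):
    # Promote scalar (0-d) entries such as 'head_height' to 1-d so that
    # np.concatenate accepts them alongside the vector-valued entries.
    return np.concatenate([
        np.atleast_1d(value).ravel() for value in observation.values()
    ])

observation = OrderedDict([
    ('head_height', np.float64(1.4)),  # 0-dimensional entry
    ('joint_angles', np.zeros(21)),    # ordinary 1-dimensional entry
])
print(flatten_observation(observation).shape)  # (22,)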

How to set random seed?

Thanks for the repo!

How can I set the random seed for a run? There doesn’t seem to be an option to set the seed using command line arguments.

Bug in HER replay pool (and multi goal setup not finished)

Hi,

I noticed there is a bug in the HER replay pool. I would send a PR, but it seems that the whole multi-goal setup is not nearly ready at the moment. There is no support in the policies (no policy has a goal_keys attribute), and I couldn't really figure out what env.goal_key_map is supposed to represent. Consequently, all the tests for the HER replay pool are failing, and the bug wasn't caught by them.

The bug:

def REPLACE_FULL_OBSERVATION(original_batch,
                             resampled_batch,
                             where_resampled,
                             environment):
    batch_flat = flatten(original_batch)
    resampled_batch_flat = flatten(original_batch)  # wrong
    goal_keys = [
        key for key in batch_flat.keys()
        if key[0] == 'goals'
    ]
    for key in goal_keys:
        assert (batch_flat[key][where_resampled].shape
                == resampled_batch_flat[key].shape)
        batch_flat[key][where_resampled] = (
            resampled_batch_flat[key])

    return unflatten(batch_flat)

should be

def REPLACE_FULL_OBSERVATION(original_batch,
                             resampled_batch,
                             where_resampled,
                             environment):
    batch_flat = flatten(original_batch)
    resampled_batch_flat = flatten(resampled_batch)  # correct
    goal_keys = [
        key for key in batch_flat.keys()
        if key[0] == 'goals'
    ]
    for key in goal_keys:
        assert (batch_flat[key][where_resampled].shape
                == resampled_batch_flat[key].shape)
        batch_flat[key][where_resampled] = (
            resampled_batch_flat[key])

    return unflatten(batch_flat)

Since I am trying to experiment with hierarchical RL with multiple goals, I am more than happy to contribute to the multi-goal setup. From the existing code, though, I couldn't figure out the big picture.

All the best,

Lukas

Error on Docker/GPU installation

Hi, thank you for sharing your source code and interesting results!

I've run the following command for Docker/GPU installation:

export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/docker-compose.dev.gpu.yml \
        up \
        -d \
        --force-recreate

After that, I got the following error message:

Step 19/23 : RUN echo "${MJKEY}" > /root/.mujoco/mjkey.txt     && sed -i -e 's/^tensorflow==/tensorflow-gpu==/g' /tmp/requirements.txt     && conda env update -f /tmp/environment.yml     && rm /root/.mujoco/mjkey.txt     && rm /tmp/requirements.txt     && rm /tmp/environment.yml
 ---> Running in 9d088bf80325
Solving environment: ...working... done
ruamel_yaml-0.15.46  | 245 KB    | ########## | 100% 
ncurses-6.1          | 958 KB    | ########## | 100% 
python-3.6.5         | 29.4 MB   | ########## | 100% 
pip-18.1             | 1.8 MB    | ########## | 100% 
chardet-3.0.4        | 189 KB    | ########## | 100% 
pycosat-0.6.3        | 104 KB    | ########## | 100% 
requests-2.21.0      | 85 KB     | ########## | 100% 
six-1.12.0           | 22 KB     | ########## | 100% 
wheel-0.32.3         | 35 KB     | ########## | 100% 
certifi-2018.11.29   | 146 KB    | ########## | 100% 
urllib3-1.24.1       | 149 KB    | ########## | 100% 
cryptography-2.3.1   | 585 KB    | ########## | 100% 
zlib-1.2.11          | 120 KB    | ########## | 100% 
setuptools-40.6.3    | 625 KB    | ########## | 100% 
cffi-1.11.5          | 212 KB    | ########## | 100% 
patchelf-0.9         | 71 KB     | ########## | 100% 
pycparser-2.19       | 174 KB    | ########## | 100% 
idna-2.8             | 133 KB    | ########## | 100% 
pysocks-1.6.8        | 22 KB     | ########## | 100% 
asn1crypto-0.24.0    | 155 KB    | ########## | 100% 
pyopenssl-18.0.0     | 82 KB     | ########## | 100% 
conda-4.5.12         | 1.0 MB    | ########## | 100% 
Downloading and Extracting Packages
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting git+https://github.com/openai/gym.git@49cd48020f6760630a7317cb3529a22de6f12f2e#[all] (from -r /tmp/./requirements.txt (line 36))
  Cloning https://github.com/openai/gym.git (to revision 49cd48020f6760630a7317cb3529a22de6f12f2e) to ./pip-req-build-8ky3z5dn
Collecting git+https://github.com/vitchyr/multiworld.git@d76b3dae2e8cbca02924f93d6cc0239c552f6408 (from -r /tmp/./requirements.txt (line 50))
  Cloning https://github.com/vitchyr/multiworld.git (to revision d76b3dae2e8cbca02924f93d6cc0239c552f6408) to ./pip-req-build-g3i3y_w5
Collecting git+https://github.com/hartikainen/serializable.git@76516385a3a716ed4a2a9ad877e2d5cbcf18d4e6 (from -r /tmp/./requirements.txt (line 83))
  Cloning https://github.com/hartikainen/serializable.git (to revision 76516385a3a716ed4a2a9ad877e2d5cbcf18d4e6) to ./pip-req-build-q72w5nqc
Collecting absl-py==0.6.1 (from -r /tmp/./requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/0c/63/f505d2d4c21db849cf80bad517f0065a30be6b006b0a5637f1b95584a305/absl-py-0.6.1.tar.gz (94kB)
Requirement already satisfied: asn1crypto==0.24.0 in /opt/conda/envs/softlearning/lib/python3.6/site-packages (from -r /tmp/./requirements.txt (line 2)) (0.24.0)
Collecting astor==0.7.1 (from -r /tmp/./requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/35/6b/11530768cac581a12952a2aad00e1526b89d242d0b9f59534ef6e6a1752f/astor-0.7.1-py2.py3-none-any.whl
Collecting atomicwrites==1.2.1 (from -r /tmp/./requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/3a/9a/9d878f8d885706e2530402de6417141129a943802c084238914fa6798d97/atomicwrites-1.2.1-py2.py3-none-any.whl
Collecting attrs==18.2.0 (from -r /tmp/./requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/3a/e1/5f9023cc983f1a628a8c2fd051ad19e76ff7b142a0faf329336f9a62a514/attrs-18.2.0-py2.py3-none-any.whl
Collecting awscli==1.16.67 (from -r /tmp/./requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/aa/e5/ebd5896ad5ae353d23bea05ebb8edd3d49f1471784f6afa12a9cf11710de/awscli-1.16.67-py2.py3-none-any.whl (1.4MB)
Collecting boto3==1.9.57 (from -r /tmp/./requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/bf/a1/2fedb80d3eefe024580aaff3e81106058b6f99698295edfca51199162bd5/boto3-1.9.57-py2.py3-none-any.whl (128kB)
Collecting botocore==1.12.57 (from -r /tmp/./requirements.txt (line 8))
  Downloading https://files.pythonhosted.org/packages/f1/37/eb8f5a76e1cb16ecabb7c92f7504c37030c8b727d550021b2bb34dc2a082/botocore-1.12.57-py2.py3-none-any.whl (5.1MB)
Collecting cachetools==3.0.0 (from -r /tmp/./requirements.txt (line 9))
  Downloading https://files.pythonhosted.org/packages/76/7e/08cd3846bebeabb6b1cfc4af8aae649d90249b4aeed080bddb5297f1d73b/cachetools-3.0.0-py2.py3-none-any.whl
Requirement already satisfied: certifi==2018.11.29 in /opt/conda/envs/softlearning/lib/python3.6/site-packages (from -r /tmp/./requirements.txt (line 10)) (2018.11.29)
Requirement already satisfied: cffi==1.11.5 in /opt/conda/envs/softlearning/lib/python3.6/site-packages (from -r /tmp/./requirements.txt (line 11)) (1.11.5)
Requirement already satisfied: chardet==3.0.4 in /opt/conda/envs/softlearning/lib/python3.6/site-packages (from -r /tmp/./requirements.txt (line 12)) (3.0.4)
Collecting Click==7.0 (from -r /tmp/./requirements.txt (line 13))
  Downloading https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl (81kB)
Collecting cloudpickle==0.6.1 (from -r /tmp/./requirements.txt (line 14))
  Downloading https://files.pythonhosted.org/packages/fc/87/7b7ef3038b4783911e3fdecb5c566e3a817ce3e890e164fc174c088edb1e/cloudpickle-0.6.1-py2.py3-none-any.whl
Collecting colorama==0.3.9 (from -r /tmp/./requirements.txt (line 15))
  Downloading https://files.pythonhosted.org/packages/db/c8/7dcf9dbcb22429512708fe3a547f8b6101c0d02137acbd892505aee57adf/colorama-0.3.9-py2.py3-none-any.whl
Collecting conda==4.5.11 (from -r /tmp/./requirements.txt (line 16))
  Could not find a version that satisfies the requirement conda==4.5.11 (from -r /tmp/./requirements.txt (line 16)) (from versions: 3.0.6, 3.5.0, 3.7.0, 3.17.0, 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.7, 4.0.8, 4.0.9, 4.1.2, 4.1.6, 4.2.6, 4.2.7, 4.3.13, 4.3.16)
No matching distribution found for conda==4.5.11 (from -r /tmp/./requirements.txt (line 16))


CondaValueError: pip returned an error

ERROR: Service 'softlearning-dev-gpu' failed to build: The command '/bin/sh -c echo "${MJKEY}" > /root/.mujoco/mjkey.txt     && sed -i -e 's/^tensorflow==/tensorflow-gpu==/g' /tmp/requirements.txt     && conda env update -f /tmp/environment.yml     && rm /root/.mujoco/mjkey.txt     && rm /tmp/requirements.txt     && rm /tmp/environment.yml' returned a non-zero code: 1

I solved this issue by removing conda==4.5.11 from requirements.txt (line 16).

self._Serializable__initialize(locals()) missing. serializable package missing

Trying to install...

I seem to be stuck because self._Serializable__initialize does not exist. I believe this is because the serializable package is not installed, and much Google searching doesn't turn it up either.
The git+https://github.com/hartikainen/serializable.git@76516385a3a716ed4a2a9ad877e2d5cbcf18d4e6 entry in requirements.txt does not install, and the package doesn't seem to exist anywhere else on the web.

Where can I get it?

Parallelization

Hi,
as far as I understand it, SAC currently only supports training with a single agent?

Are there plans to support distributed training, as is done in Surreal?

Hierarchical training and reward set

Hi,
I found your paper "Latent Space Policies for Hierarchical Reinforcement Learning" very interesting and was glad you published the code. Motivated by your results, I'd like to implement the ant maze with hierarchical policies and compound skills / different rewards.
I couldn't find answers to the following questions. It would be great if you could help me out!

I assume that I have to pretrain a lower-level policy first. How do I freeze the low-level weights in the next step, and how can I add a high-level policy on top?

In the paper you mentioned a set of K reward functions. Where can I define the reward set?

Thank you!

Checkpointing should not store cumulative replay pool

Right now our checkpointing code saves the full replay pool at every single checkpoint. This has become a problem with the image experiments, since the snapshot size grows to gigabytes. One solution could be to save only the experience gathered since the latest checkpoint and reconstruct the replay pool from the previous checkpoints when restoring.
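A rough sketch of that incremental idea (illustrative only; the pool accessors used here are hypothetical): each checkpoint stores just the transitions added since the previous one, and restoring replays the chain of deltas in order:

import pickle

def save_pool_delta(pool, last_saved_size, checkpoint_path):
    # Persist only the transitions added since the previous checkpoint.
    delta = pool.get_transitions(range(last_saved_size, pool.size))  # hypothetical accessor
    with open(checkpoint_path, 'wb') as f:
        pickle.dump(delta, f)
    return pool.size  # becomes last_saved_size for the next checkpoint

def restore_pool(pool, checkpoint_paths):
    # Rebuild the full pool by replaying every saved delta in checkpoint order.
    for path in sorted(checkpoint_paths):
        with open(path, 'rb') as f:
            for transition in pickle.load(f):
                pool.add_sample(**transition)  # hypothetical add method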
