
rail-berkeley / softlearning


Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Home Page: https://sites.google.com/view/sac-and-applications

License: Other

Shell 0.60% Python 99.40%
deep-learning deep-neural-networks deep-reinforcement-learning machine-learning reinforcement-learning soft-actor-critic

softlearning's People

Contributors

alacarter, azhou42, ben-eysenbach, brandontrabucco, dependabot[bot], haarnoja, hartikainen, henry-zhang-bohan, hrtang, johannespitz, nflu, sjoerdvansteenkiste, vitchyr


softlearning's Issues

Resume training

Hi,
I am trying to resume a training run, and I think this is done via the --restore parameter? But when I try it, I get an error message saying that a file ending in ...tune_metadata was not found, and indeed there is no file with that suffix among my checkpoints. What is the best way to resume experiments?

Conda issues installing patchelf

I'll try to find a work-around. Here's the output. Thanks!

11:04:39 ~/git_repos/softlearning:master
$ conda env create -f environment.yml
Solving environment: failed

ResolvePackageNotFound:

  • patchelf=0.9

No module named examples.development.simulate_policy

In the README.md, the following command is mentioned in order to simulate the resulting policy:
python -m examples.development.simulate_policy […]
However, I could not find this module in the development directory.
When trying to run, python outputs the following error:
/home/user/miniconda3/envs/softlearning/bin/python: No module named examples.development.simulate_policy

[question] Action Smoothing

Hello,
In the blog and in the code, action smoothing is mentioned but never explained...
In the code, one apparently does something like:

# smoothing coeff (0 -> no smoothing, 1 -> max smoothing)
alpha = some_value
beta = sqrt(1 - alpha ** 2)
# raw latents is sampled from a MultivariateNormalDiag with zero mean and unit std
smoothing_latent = alpha * smoothing_latent + raw_latents
latent = beta * smoothing_latent

action = mean + std * latent
action = tanh(action)

My question is: where does this smoothing come from, and why not use, for instance, exponential smoothing?

Also, can raw_latent be seen as the noise?
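For context, here is a quick numpy sketch (purely illustrative, not code from the repo) of why beta = sqrt(1 - alpha ** 2) shows up: it renormalizes the smoothed latent back to unit variance, so the marginal noise scale matches the unsmoothed case while consecutive latents become temporally correlated:

import numpy as np

# Illustrative check: AR(1)-style smoothing with beta = sqrt(1 - alpha ** 2)
# keeps the smoothed latent approximately unit-variance.
alpha = 0.9
beta = np.sqrt(1.0 - alpha ** 2)

rng = np.random.default_rng(0)
smoothing_latent = np.zeros(8)  # hypothetical latent dimension
latents = []
for _ in range(100000):
    raw_latents = rng.standard_normal(8)  # N(0, I), as in the snippet above
    smoothing_latent = alpha * smoothing_latent + raw_latents
    latents.append(beta * smoothing_latent)

print(np.var(latents))  # ~1.0: same marginal scale as the raw noise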

[Enhancement] Make replay buffer memory-efficient

Currently, the replay buffer stores each observation twice (since it stores a tuple of state, action, next_state, reward, done for every transition). The buffer consumes large amounts of memory for environments with high-dimensional observations (like images). For example, the memory consumption for an experiment with 48x48 RGB images and 1 million timesteps is about 56 GB, which could be cut down to about 28 GB.
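A minimal sketch of one way to do this (illustrative only, not the repo's replay pool API): store every observation exactly once and recover next_state by indexing the following slot. Episode-boundary bookkeeping is omitted for brevity; a real implementation would have to avoid sampling transitions whose successor slot has been overwritten by a reset observation.

import numpy as np

class CompactReplayBuffer:
    # Illustrative buffer: observations[i] is the state at step i, and the next
    # state of transition i is observations[(i + 1) % capacity], so images are
    # never stored twice.
    def __init__(self, capacity, obs_shape, action_dim):
        self.capacity = capacity
        self.observations = np.zeros((capacity, *obs_shape), dtype=np.uint8)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.terminals = np.zeros(capacity, dtype=bool)
        self.pos, self.size = 0, 0

    def add(self, obs, action, reward, terminal, next_obs):
        self.observations[self.pos] = obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.terminals[self.pos] = terminal
        # next_obs lives in the following slot; it doubles as the next step's obs.
        self.observations[(self.pos + 1) % self.capacity] = next_obs
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        return dict(
            observations=self.observations[idx],
            actions=self.actions[idx],
            rewards=self.rewards[idx],
            terminals=self.terminals[idx],
            next_observations=self.observations[(idx + 1) % self.capacity],
        )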

Union pool

Hi,

Seems like the union pool implementation is unfinished. Is that going to be done soon?

Thanks!

Pusher low level policy for ('any',-1) is not learning

Hi,

I am running your code. Pusher low-level policy trained for (-1, 'any') works fine but ('any',-1) doesn't do anything. Is there any fix for it or do you have your low-level trained models that I can use for my work?

Gaussian mixture policies

Are GMM policies stable with the Q-only formulation (without V function)? I see that this repository doesn't contain GMM policies while the old one (haarnoja/sac) does.
I am trying to get them working in rlkit, but it seems like GMM policies are difficult to train without the V function.

Code slower than before refactoring

For some reason, training is currently tens of percent slower (in terms of wall-clock time) than it was prior to the latest refactor. We need to figure out what slows it down and fix it.

Progress on DeepMind Control Suite?

Hi!

I noticed that a recent commit was pushed to the repo to support running SAC on the DeepMind Control Suite.

I was wondering whether the current code base is ready to run on the DeepMind Control Suite and, if not, what else remains to be done? Maybe I could help. Thanks!

unstable training curve for default SQL

Hi,

First of all, thanks for the brilliant papers and for open-sourcing the code.

I was running SQL on HalfCheetah with the default settings, using the command:
--universe=gym --domain=HalfCheetah --task=v2 --algorithm=SQL --exp-name=my-sql-experiment-2 --checkpoint-frequency=1000.

It uses a Gaussian policy and a reward scale of 30, which I think implies very low entropy regularization.

However, I obtained very unstable training and evaluation return curves, shown below:

[training and evaluation return curves omitted]

I was wondering whether there is anything wrong with the default SQL setting, and how did you test SQL? I tried lowering the reward scale, which leads to a lower but somewhat more stable return curve.

Thanks!

Module 'gym' has no attribute 'register' on MacOS Mojave 10.14.4

Hi All, when I tried to run a reward learning task (https://github.com/avisingh599/reward-learning-rl) with softlearning environment, the following error occurred: "AttributeError: module 'gym' has no attribute 'register'"

However, when I ran import gym and gym.register() in a separate Python script in PyCharm, it worked fine, i.e. the register function could be found in gym. I had a look at previous Softlearning issues and think this is a gym adapter issue, but I am not sure how to manually add this environment/task to gym_adapter in the Softlearning package. Many thanks for your help!
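For reference, registering a custom environment with gym itself usually looks roughly like the sketch below (the id and entry point here are made up). Since the Softlearning gym adapter ultimately builds environments through gym's registry, an id registered this way should be resolvable, although where exactly to hook the registration into gym_adapter depends on the adapter code:

import gym
from gym.envs.registration import register

# Hypothetical environment; id and entry_point are placeholders, not real modules.
register(
    id='MyRewardLearningEnv-v0',
    entry_point='my_package.envs:MyRewardLearningEnv',
    max_episode_steps=200,
)

env = gym.make('MyRewardLearningEnv-v0')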


Environment seeding doesn't allow reproducibility

Hi Hartikainen,

Thank you for maintaining the repo. Sorry if this is done elsewhere in the code, but shouldn't we set the seed of the environments (both training and eval) after creating them here for reproducibility purposes?

Gym maintains its own internal numpy random state and uses it to sample the initial state. Setting the seed of the global numpy module does not affect this internal state.
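As an illustration of what that would look like (not the repo's actual code; the variable holding the seed is hypothetical), the environments would be seeded right after they are created:

# Illustrative only: seed both environments right after creation so that gym's
# internal np_random state, not the global numpy seed, is what gets fixed.
seed = variant['run_params']['seed']  # hypothetical location of the run seed

training_environment.seed(seed)
evaluation_environment.seed(seed + 1)  # offset so evaluation episodes differ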

Thanks!

Import error. Trying to rebuild mujoco_py.

$  softlearning run_example_local examples.development \
>     --universe=gym \
>     --domain=HalfCheetah \
>     --task=v3 \
>     --exp-name=my-sac-experiment-1 \
>     --checkpoint-frequency=1000

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

WARNING: Logging before flag parsing goes to stderr.
I0418 01:08:50.825603 140032189581056 acceleratesupport.py:13] OpenGL_accelerate module loaded
I0418 01:08:50.832047 140032189581056 arraydatatype.py:270] Using accelerated ArrayDatatype
I0418 01:08:51.017610 140032189581056 __init__.py:34] MuJoCo library version is: 200
2019-04-18 01:08:51,105 INFO node.py:439 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-04-18_01-08-51_27162/logs.
2019-04-18 01:08:51,211 INFO services.py:364 -- Waiting for redis server at 127.0.0.1:20587 to respond...
2019-04-18 01:08:51,320 INFO services.py:364 -- Waiting for redis server at 127.0.0.1:52635 to respond...
2019-04-18 01:08:51,321 INFO services.py:761 -- Starting Redis shard with 10.0 GB max memory.
2019-04-18 01:08:51,337 WARNING services.py:1301 -- Warning: Capping object memory store to 20.0GB. To increase this further, specify `object_store_memory` when calling ray.init() or ray start.
2019-04-18 01:08:51,337 INFO services.py:1449 -- Starting the Plasma object store with 20.0 GB memory using /dev/shm.
2019-04-18 01:08:51,885 INFO tune.py:139 -- Did not find checkpoint file in /home/yrli/ray_results/gym/HalfCheetah/v3/2019-04-18T01-08-51-my-sac-experiment-1.
2019-04-18 01:08:51,885 INFO tune.py:145 -- Starting a new experiment.
2019-04-18 01:08:51,892 INFO web_server.py:241 -- Starting Tune Server...
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/56 CPUs, 0/8 GPUs
Memory usage on this node: 4.8/270.1 GB

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 56/56 CPUs, 0/8 GPUs
Memory usage on this node: 4.9/270.1 GB
Result logdir: /home/yrli/ray_results/gym/HalfCheetah/v3/2019-04-18T01-08-51-my-sac-experiment-1
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - id=f24a78d2-seed=4956:       RUNNING

(pid=27322) 
(pid=27322) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
(pid=27322) For more information, please see:
(pid=27322)   * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
(pid=27322)   * https://github.com/tensorflow/addons
(pid=27322) If you depend on functionality not listed there, please file an issue.
(pid=27322) 
(pid=27322) 2019-04-18 01:08:55.340353: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=27322) Using seed 4956
(pid=27322) 2019-04-18 01:08:55.424360: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
(pid=27322) 2019-04-18 01:08:55.424399: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:161] retrieving CUDA diagnostic information for host: 64.site
(pid=27322) 2019-04-18 01:08:55.424408: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:168] hostname: 64.site
(pid=27322) 2019-04-18 01:08:55.424460: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:192] libcuda reported version is: 410.104.0
(pid=27322) 2019-04-18 01:08:55.424498: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:196] kernel reported version is: 410.104.0
(pid=27322) 2019-04-18 01:08:55.424507: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version seems to match DSO: 410.104.0
(pid=27322) 2019-04-18 01:08:55.426316: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400060000 Hz
(pid=27322) 2019-04-18 01:08:55.429028: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5d18810 executing computations on platform Host. Devices:
(pid=27322) 2019-04-18 01:08:55.429054: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
(pid=27322) Import error. Trying to rebuild mujoco_py.
(pid=27322) Compiling /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.pyx because it depends on /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/pxd/mujoco.pxd.
(pid=27322) [1/1] Cythonizing /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.pyx
(pid=27322) running build_ext
(pid=27322) building 'mujoco_py.cymj' extension
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/gl
(pid=27322) gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py -I/home/yrli/.mujoco/mujoco200/include -I/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/numpy/core/include -I/home/yrli/anaconda3/envs/softlearning/include/python3.6m -c /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.c -o /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.o -fopenmp -w
(pid=27322) gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py -I/home/yrli/.mujoco/mujoco200/include -I/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/numpy/core/include -I/home/yrli/anaconda3/envs/softlearning/include/python3.6m -c /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.c -o /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.o -fopenmp -w
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6
(pid=27322) creating /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6/mujoco_py
(pid=27322) gcc -pthread -shared -L/home/yrli/anaconda3/envs/softlearning/lib -Wl,-rpath=/home/yrli/anaconda3/envs/softlearning/lib,--no-as-needed -L/home/yrli/anaconda3/envs/softlearning/lib -Wl,-rpath=/home/yrli/anaconda3/envs/softlearning/lib,--no-as-needed /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/cymj.o /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.o -L/home/yrli/.mujoco/mujoco200/bin -L/home/yrli/anaconda3/envs/softlearning/lib -Wl,--enable-new-dtags,-R/home/yrli/.mujoco/mujoco200/bin -lmujoco200 -lglewosmesa -lOSMesa -lGL -lpython3.6m -o /home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.0_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6/mujoco_py/cymj.cpython-36m-x86_64-linux-gnu.so -fopenmp
2019-04-18 01:09:53,990 ERROR trial_runner.py:426 -- Error processing event.
Traceback (most recent call last):
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 389, in _process_events
    result = self.trial_executor.fetch_result(trial)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 252, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 2288, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=27322, host=64.site)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/mujoco/mujoco_env.py", line 11, in <module>
    import mujoco_py
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/__init__.py", line 3, in <module>
    from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/builder.py", line 503, in <module>
    cymj = load_cython_ext(mujoco_path)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/builder.py", line 106, in load_cython_ext
    mod = load_dynamic_ext('cymj', cext_so_path)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/mujoco_py/builder.py", line 124, in load_dynamic_ext
    return loader.load_module()
ImportError: dlopen: cannot load any more object with static TLS

During handling of the above exception, another exception occurred:

ray_ExperimentRunner:train() (pid=27322, host=64.site)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/trainable.py", line 150, in train
    result = self._train()
  File "/data1/yrli/softlearning/examples/development/main.py", line 77, in _train
    self._build()
  File "/data1/yrli/softlearning/examples/development/main.py", line 44, in _build
    get_environment_from_params(environment_params['training']))
  File "/data1/yrli/softlearning/softlearning/environments/utils.py", line 33, in get_environment_from_params
    return get_environment(universe, domain, task, environment_kwargs)
  File "/data1/yrli/softlearning/softlearning/environments/utils.py", line 24, in get_environment
    return ADAPTERS[universe](domain, task, **environment_params)
  File "/data1/yrli/softlearning/softlearning/environments/adapters/gym_adapter.py", line 66, in __init__
    env = gym.envs.make(env_id, **kwargs)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/registration.py", line 183, in make
    return registry.make(id, **kwargs)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/registration.py", line 125, in make
    env = spec.make(**kwargs)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/registration.py", line 88, in make
    cls = load(self._entry_point)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/registration.py", line 17, in load
    mod = importlib.import_module(mod_name)
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/mujoco/__init__.py", line 1, in <module>
    from gym.envs.mujoco.mujoco_env import MujocoEnv
  File "/home/yrli/anaconda3/envs/softlearning/lib/python3.6/site-packages/gym/envs/mujoco/mujoco_env.py", line 13, in <module>
    raise error.DependencyNotInstalled("{}. (HINT: you need to install mujoco_py, and also perform the setup instructions here: https://github.com/openai/mujoco-py/.)".format(e))
gym.error.DependencyNotInstalled: dlopen: cannot load any more object with static TLS. (HINT: you need to install mujoco_py, and also perform the setup instructions here: https://github.com/openai/mujoco-py/.)

Bound std of Gaussian policy via beta-sigmoid?

If I understand correctly, the log-std of the Gaussian policy is clipped to a min/max range and the std is recovered by exponentiation.

I am curious whether using a beta-sigmoidal function to model the log-variance would be a tiny bit more stable, because it allows smooth lower/upper bounds and a less sharp gradient at large magnitudes.

e.g.

logvar = ...  # raw network output
var = 1 / (1 + self.beta * torch.exp(-logvar))  # sigmoid-like squashing with slope beta
var = min_var + (max_var - min_var) * var       # rescale into [min_var, max_var]
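For comparison, a small numpy sketch of both parameterizations (the bounds below are made-up values, not the repo's defaults): the clip-then-exp baseline has hard kinks and zero gradient outside the clip range, while the sigmoid variant saturates smoothly at both ends:

import numpy as np

log_std_min, log_std_max = -20.0, 2.0  # hypothetical clip range for the baseline
min_var, max_var = 1e-6, 1.0           # hypothetical bounds for the sigmoid variant
beta = 1.0

raw = np.linspace(-30.0, 30.0, 7)      # raw network outputs

# Baseline: clip the log-std, then exponentiate (hard bounds).
std_clipped = np.exp(np.clip(raw, log_std_min, log_std_max))

# Proposed: squash the raw output into [min_var, max_var] with a sigmoid (smooth bounds).
var_sigmoid = min_var + (max_var - min_var) / (1.0 + beta * np.exp(-raw))
std_sigmoid = np.sqrt(var_sigmoid)

print(std_clipped)
print(std_sigmoid)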

Automating alpha

Alpha loss is defined as (code):
alpha_loss = -tf.reduce_mean(log_alpha * tf.stop_gradient(log_pis + self._target_entropy))

In other words:
alpha_loss = log_alpha * (- <Negative constant>)
alpha_loss = log_alpha * (<Positive constant>)

So minimizing alpha_loss means minimizing log_alpha, which means that alpha always goes to zero no matter what, and this is indeed what I'm observing in my experiments.

I'm obviously forgetting something, but I haven't been able to figure out what.
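For what it's worth, a small numpy sketch of the gradient sign (purely illustrative): log_pis is re-evaluated under the current policy at every update, so log_pis + target_entropy is not a fixed constant. Its sign flips depending on whether the policy's entropy is above or below the target, so the gradient on log_alpha can point either way:

import numpy as np

target_entropy = -3.0  # e.g. -|A| for a hypothetical 3-dimensional action space

def d_loss_d_log_alpha(log_pis):
    # Gradient of  alpha_loss = -mean(log_alpha * (log_pis + target_entropy))
    # w.r.t. log_alpha. log_pis is constant within one update (stop_gradient),
    # but it changes between updates as the policy changes.
    return -np.mean(log_pis + target_entropy)

# Policy entropy above the target (-log_pi = 5 > -3): positive gradient,
# so gradient descent decreases log_alpha and alpha shrinks.
print(d_loss_d_log_alpha(np.array([-5.0])))  #  8.0

# Policy entropy below the target (-log_pi = -10 < -3): negative gradient,
# so gradient descent increases log_alpha and alpha grows.
print(d_loss_d_log_alpha(np.array([10.0])))  # -7.0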

Error occurs while running

When I run the command given in the README:

python -m examples.development.main \
    --mode=local \
    --universe=gym \
    --domain=HalfCheetah \
    --task=v2 \
    --exp-name=my-sac-experiment-1 \
    --checkpoint-frequency=1000  # Save the checkpoint to resume training later

to train the agent, the following error occurs:

pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "/home/deepglint/anaconda2/envs/softlearning/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/deepglint/anaconda2/envs/softlearning/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/deepglint/softlearning/examples/development/main.py", line 14, in <module>
    from softlearning.algorithms.utils import get_algorithm_from_variant
  File "/home/deepglint/softlearning/softlearning/algorithms/__init__.py", line 2, in <module>
    from .sac import SAC
  File "/home/deepglint/softlearning/softlearning/algorithms/sac.py", line 47
    **kwargs,
            ^
SyntaxError: invalid syntax
I would very much appreciate it if you could suggest a solution.

Additional information for baseline algorithms

Hi,

Would you please share information about the baselines, e.g., DDPG, TD3, and PPO, along with their hyperparameter settings? (If you used open-source code, could you share the links?)

While this repository only covers soft-learning algorithms, I think refactoring the earlier baseline algorithms into this framework could be quite useful.

Thanks.

SAC Hyperparameters MountainCarContinuous-v0 - Env with deceptive reward

Hello,

I've tried in vain to find suitable hyperparameters for SAC in order to solve MountainCarContinuous-v0.

Even with hyperparameter tuning (see the "add-trpo" branch of rl baselines zoo), I was not able to solve it consistently (if it finds the goal during random exploration, then it works; otherwise it gets stuck in a local minimum).
I also ran into this issue when trying SAC on another environment with a deceptive reward (a bit-flipping env, trying to apply HER + SAC, see here).

Did you manage to solve that problem? If so, what hyperparameters did you use?

Note: I am using the SAC implementation from stable-baselines, which works pretty well on all other problems (where the reward is dense).

Unable to reproduce result on HalfCheetah-v2

I am unable to obtain the result as reported in the paper on the openai environment HalfCheetah-v2. The commit used to obtain this result is 1f6147c, which isn't too long ago. The result is averaged over 5 random initial seeds.

[HalfCheetah-v2 learning curve omitted]

Do you know what might be causing this issue? Thank you!

I am able to obtain the result as reported (or close to it) in the paper on the remaining environments, posted here for reference.

[Ant, Walker, Humanoid, and Hopper learning curves omitted]

SAC checkpointing fails when using fixed entropy coefficient

@aviralkumar2907 ran into a problem where setting the target entropy to a fixed value makes the checkpointing crash:

Traceback (most recent call last):
  File "/home/kristian/github/hartikainen/ray/python/ray/tune/trial_runner.py", line 399, in _process_events
    self._checkpoint_trial_if_needed(trial)
  File "/home/kristian/github/hartikainen/ray/python/ray/tune/trial_runner.py", line 430, in _checkpoint_trial_if_needed
    self.trial_executor.save(trial, storage=Checkpoint.DISK)
  File "/home/kristian/github/hartikainen/ray/python/ray/tune/ray_trial_executor.py", line 317, in save
    trial._checkpoint.value = ray.get(trial.runner.save.remote())
  File "/home/kristian/github/hartikainen/ray/python/ray/worker.py", line 2211, in get
    raise value
ray.worker.RayTaskError: ray_ExperimentRunner:save() (pid=4371, host=jensen2)
  File "/home/kristian/github/hartikainen/ray/python/ray/tune/trainable.py", line 226, in save
    checkpoint = self._save(checkpoint_dir)
  File "/home/kristian/github/hartikainen/softlearning/examples/development/main.py", line 120, in _save
    tf_checkpoint = self._get_tf_checkpoint()
  File "/home/kristian/github/hartikainen/softlearning/examples/development/main.py", line 87, in _get_tf_checkpoint
    tf_checkpoint = tf.train.Checkpoint(**self.algorithm.tf_saveables)
  File "/home/kristian/github/hartikainen/softlearning/softlearning/algorithms/sac.py", line 430, in tf_saveables
    '_alpha_optimizer': self._alpha_optimizer,
AttributeError: 'SAC' object has no attribute '_alpha_optimizer'
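A minimal sketch of one possible workaround (illustrative; only _alpha_optimizer appears in the traceback, the other attribute names below are placeholders): build tf_saveables conditionally so that a fixed, non-trained alpha does not require an alpha optimizer:

@property
def tf_saveables(self):
    # Sketch: include the alpha optimizer only when alpha is actually learned.
    saveables = {
        '_policy_optimizer': self._policy_optimizer,  # placeholder name
        '_Q_optimizer': self._Q_optimizer,            # placeholder name
    }
    if getattr(self, '_alpha_optimizer', None) is not None:
        saveables['_alpha_optimizer'] = self._alpha_optimizer
    return saveables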

MultivariateNormalDiag log_prob with target_entropy and alpha

Thank you for sharing the code.

For Gaussian policies, in this implementation as well as the original SAC repository, the log_prob method of the MultivariateNormalDiag class is used to compute the log probability of an action. This method returns a probability density, not a probability, so log probabilities can be greater than 0. The issue arises in the learning objective for alpha. I can set target_entropy = 0.0, in which case you'd expect alpha to go to 0 (an entropy of 0 indicates a deterministic policy), but this is not the case, since log_pi can be either greater than or less than 0.
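To make the density-vs-probability point concrete, a quick numpy check (illustrative only): for a Gaussian with a small standard deviation, the density at the mean exceeds 1, so its log is positive, and the differential entropy is correspondingly negative:

import numpy as np

def gaussian_log_density(x, mean, std):
    # log N(x; mean, std^2) is a log-density, not a log-probability, so it can exceed 0.
    return -0.5 * np.log(2 * np.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

std = 0.1
print(gaussian_log_density(0.0, 0.0, std))        # ~= 1.38 > 0
print(0.5 * np.log(2 * np.pi * np.e * std ** 2))  # differential entropy ~= -0.88 < 0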

Is there something simple I'm missing here?

Thank you again.

InvalidGitRepositoryError

Invalid git repository (last line of the screenshot below).

Please tell me which path (repository) it is looking for.
[screenshot omitted]

Run on real robot

Hi,
is there currently some implementation to run this outside of simulation?

No module named 'softlearning.utils'

I used virtualenv instead of conda on Ubuntu 16.04.
I ran python setup.py install after installing all the requirements.
When I ran the first example to train the agent, I got the following error:
No module named softlearning.utils
Any idea what caused this? Thanks!

(venv) jc@jc-Precision-5510:~/research/softlearning$ softlearning run_example_local examples.development     --universe=gym     --domain=HalfCheetah     --task=v3     --exp-name=my-sac-experiment-1     --checkpoint-frequency=1000

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Traceback (most recent call last):
  File "/home/jc/research/softlearning/venv/bin/softlearning", line 11, in <module>
    load_entry_point('softlearning==0.0.1', 'console_scripts', 'softlearning')()
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/scripts/console_scripts.py", line 202, in main
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/scripts/console_scripts.py", line 71, in run_example_local_cmd
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/examples/instrument.py", line 203, in run_example_local
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/examples/development/__init__.py", line 21, in get_parser
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/examples/utils.py", line 8, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/algorithms/__init__.py", line 1, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/algorithms/sql.py", line 9, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/algorithms/rl_algorithm.py", line 12, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/samplers/__init__.py", line 4, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/samplers/remote_sampler.py", line 10, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/samplers/utils.py", line 5, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/replay_pools/__init__.py", line 4, in <module>
  File "/home/jc/research/softlearning/venv/lib/python3.6/site-packages/softlearning-0.0.1-py3.6.egg/softlearning/replay_pools/trajectory_replay_pool.py", line 8, in <module>
ModuleNotFoundError: No module named 'softlearning.utils'

Simulate policy not working anymore

In simulate_policy.py I had to replace
render_kwargs={'mode': args.render_mode}
on line 70,

and in base_policy.py, line 83, I replaced it with
super(LatentSpacePolicy, self).__init__(kwargs['observation_keys'])

GPU issues

I run the command line 'CUDA_VISIBLE_DEVICES=3 python -m examples.development.main --mode=local --universe=gym --domain=Hopper --task=v2 --exp-name=test --checkpoint-frequency=1000 --cpus=16 --gpus 1 --trial-cpus 16 --trial-gpus 1'

But when I check nvidia-smi, no GPU is being used.

My question is: how do I get the code to run on the GPU?

Thanks!

Concatenating dm_control observations causes error due to uneven shapes

The code exits with an error when I try running the humanoid run task, because of this line:

flattened_observation = np.concatenate([

The reason is that the 'head_height' entry of the observation is 0-dimensional, so np.concatenate complains that all items in the sequence must have the same number of dimensions.

Originally posted by @quanvuong in #69 (comment)
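A minimal sketch of one way around this (illustrative; the helper name is made up, only 'head_height' comes from the actual dm_control observation): promote 0-dimensional entries to 1-D before concatenating:

import numpy as np
from collections import OrderedDict

def flatten_observation(observation):
    # Promote scalar (0-d) entries such as 'head_height' to 1-d so that
    # np.concatenate accepts them alongside the vector-valued entries.
    return np.concatenate([
        np.atleast_1d(value).ravel() for value in observation.values()
    ])

observation = OrderedDict([
    ('head_height', np.float64(1.4)),  # 0-dimensional entry
    ('joint_angles', np.zeros(21)),    # ordinary 1-dimensional entry
])
print(flatten_observation(observation).shape)  # (22,)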

How to set random seed?

Thanks for the repo!

How can I set the random seed for a run? There doesn’t seem to be an option to set the seed using command line arguments.

Bug in HER replay pool (and multi goal setup not finished)

Hi,

I noticed there is a bug in the HER replay pool. I would send a PR, but it seems that the whole multi-goal setup is not nearly ready at the moment. There is no support in the policies (no policy has a goal_keys attribute), and I couldn't really figure out what env.goal_key_map is supposed to represent. Consequently, all the tests for the HER replay pool are failing, and the bug wasn't caught by them.

The bug:

def REPLACE_FULL_OBSERVATION(original_batch,
                             resampled_batch,
                             where_resampled,
                             environment):
    batch_flat = flatten(original_batch)
    resampled_batch_flat = flatten(original_batch)  # wrong
    goal_keys = [
        key for key in batch_flat.keys()
        if key[0] == 'goals'
    ]
    for key in goal_keys:
        assert (batch_flat[key][where_resampled].shape
                == resampled_batch_flat[key].shape)
        batch_flat[key][where_resampled] = (
            resampled_batch_flat[key])

    return unflatten(batch_flat)

should be

def REPLACE_FULL_OBSERVATION(original_batch,
                             resampled_batch,
                             where_resampled,
                             environment):
    batch_flat = flatten(original_batch)
    resampled_batch_flat = flatten(resampled_batch)  # correct
    goal_keys = [
        key for key in batch_flat.keys()
        if key[0] == 'goals'
    ]
    for key in goal_keys:
        assert (batch_flat[key][where_resampled].shape
                == resampled_batch_flat[key].shape)
        batch_flat[key][where_resampled] = (
            resampled_batch_flat[key])

    return unflatten(batch_flat)

Since I am trying to experiment with hierarchical RL with multiple goals, I am more than happy to contribute to the multi-goal setup. From the existing code, though, I couldn't figure out the big picture.

All the best,

Lukas

Error on Docker/GPU installation

Hi, thank you for sharing your source code and interesting results!

I've run the following command for Docker/GPU installation:

export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/docker-compose.dev.gpu.yml \
        up \
        -d \
        --force-recreate

After that, I got the following error message:

Step 19/23 : RUN echo "${MJKEY}" > /root/.mujoco/mjkey.txt     && sed -i -e 's/^tensorflow==/tensorflow-gpu==/g' /tmp/requirements.txt     && conda env update -f /tmp/environment.yml     && rm /root/.mujoco/mjkey.txt     && rm /tmp/requirements.txt     && rm /tmp/environment.yml
 ---> Running in 9d088bf80325
Solving environment: ...working... done
ruamel_yaml-0.15.46  | 245 KB    | ########## | 100% 
ncurses-6.1          | 958 KB    | ########## | 100% 
python-3.6.5         | 29.4 MB   | ########## | 100% 
pip-18.1             | 1.8 MB    | ########## | 100% 
chardet-3.0.4        | 189 KB    | ########## | 100% 
pycosat-0.6.3        | 104 KB    | ########## | 100% 
requests-2.21.0      | 85 KB     | ########## | 100% 
six-1.12.0           | 22 KB     | ########## | 100% 
wheel-0.32.3         | 35 KB     | ########## | 100% 
certifi-2018.11.29   | 146 KB    | ########## | 100% 
urllib3-1.24.1       | 149 KB    | ########## | 100% 
cryptography-2.3.1   | 585 KB    | ########## | 100% 
zlib-1.2.11          | 120 KB    | ########## | 100% 
setuptools-40.6.3    | 625 KB    | ########## | 100% 
cffi-1.11.5          | 212 KB    | ########## | 100% 
patchelf-0.9         | 71 KB     | ########## | 100% 
pycparser-2.19       | 174 KB    | ########## | 100% 
idna-2.8             | 133 KB    | ########## | 100% 
pysocks-1.6.8        | 22 KB     | ########## | 100% 
asn1crypto-0.24.0    | 155 KB    | ########## | 100% 
pyopenssl-18.0.0     | 82 KB     | ########## | 100% 
conda-4.5.12         | 1.0 MB    | ########## | 100% 
Downloading and Extracting Packages
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting git+https://github.com/openai/gym.git@49cd48020f6760630a7317cb3529a22de6f12f2e#[all] (from -r /tmp/./requirements.txt (line 36))
  Cloning https://github.com/openai/gym.git (to revision 49cd48020f6760630a7317cb3529a22de6f12f2e) to ./pip-req-build-8ky3z5dn
Collecting git+https://github.com/vitchyr/multiworld.git@d76b3dae2e8cbca02924f93d6cc0239c552f6408 (from -r /tmp/./requirements.txt (line 50))
  Cloning https://github.com/vitchyr/multiworld.git (to revision d76b3dae2e8cbca02924f93d6cc0239c552f6408) to ./pip-req-build-g3i3y_w5
Collecting git+https://github.com/hartikainen/serializable.git@76516385a3a716ed4a2a9ad877e2d5cbcf18d4e6 (from -r /tmp/./requirements.txt (line 83))
  Cloning https://github.com/hartikainen/serializable.git (to revision 76516385a3a716ed4a2a9ad877e2d5cbcf18d4e6) to ./pip-req-build-q72w5nqc
Collecting absl-py==0.6.1 (from -r /tmp/./requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/0c/63/f505d2d4c21db849cf80bad517f0065a30be6b006b0a5637f1b95584a305/absl-py-0.6.1.tar.gz (94kB)
Requirement already satisfied: asn1crypto==0.24.0 in /opt/conda/envs/softlearning/lib/python3.6/site-packages (from -r /tmp/./requirements.txt (line 2)) (0.24.0)
Collecting astor==0.7.1 (from -r /tmp/./requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/35/6b/11530768cac581a12952a2aad00e1526b89d242d0b9f59534ef6e6a1752f/astor-0.7.1-py2.py3-none-any.whl
Collecting atomicwrites==1.2.1 (from -r /tmp/./requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/3a/9a/9d878f8d885706e2530402de6417141129a943802c084238914fa6798d97/atomicwrites-1.2.1-py2.py3-none-any.whl
Collecting attrs==18.2.0 (from -r /tmp/./requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/3a/e1/5f9023cc983f1a628a8c2fd051ad19e76ff7b142a0faf329336f9a62a514/attrs-18.2.0-py2.py3-none-any.whl
Collecting awscli==1.16.67 (from -r /tmp/./requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/aa/e5/ebd5896ad5ae353d23bea05ebb8edd3d49f1471784f6afa12a9cf11710de/awscli-1.16.67-py2.py3-none-any.whl (1.4MB)
Collecting boto3==1.9.57 (from -r /tmp/./requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/bf/a1/2fedb80d3eefe024580aaff3e81106058b6f99698295edfca51199162bd5/boto3-1.9.57-py2.py3-none-any.whl (128kB)
Collecting botocore==1.12.57 (from -r /tmp/./requirements.txt (line 8))
  Downloading https://files.pythonhosted.org/packages/f1/37/eb8f5a76e1cb16ecabb7c92f7504c37030c8b727d550021b2bb34dc2a082/botocore-1.12.57-py2.py3-none-any.whl (5.1MB)
Collecting cachetools==3.0.0 (from -r /tmp/./requirements.txt (line 9))
  Downloading https://files.pythonhosted.org/packages/76/7e/08cd3846bebeabb6b1cfc4af8aae649d90249b4aeed080bddb5297f1d73b/cachetools-3.0.0-py2.py3-none-any.whl
Requirement already satisfied: certifi==2018.11.29 in /opt/conda/envs/softlearning/lib/python3.6/site-packages (from -r /tmp/./requirements.txt (line 10)) (2018.11.29)
Requirement already satisfied: cffi==1.11.5 in /opt/conda/envs/softlearning/lib/python3.6/site-packages (from -r /tmp/./requirements.txt (line 11)) (1.11.5)
Requirement already satisfied: chardet==3.0.4 in /opt/conda/envs/softlearning/lib/python3.6/site-packages (from -r /tmp/./requirements.txt (line 12)) (3.0.4)
Collecting Click==7.0 (from -r /tmp/./requirements.txt (line 13))
  Downloading https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl (81kB)
Collecting cloudpickle==0.6.1 (from -r /tmp/./requirements.txt (line 14))
  Downloading https://files.pythonhosted.org/packages/fc/87/7b7ef3038b4783911e3fdecb5c566e3a817ce3e890e164fc174c088edb1e/cloudpickle-0.6.1-py2.py3-none-any.whl
Collecting colorama==0.3.9 (from -r /tmp/./requirements.txt (line 15))
  Downloading https://files.pythonhosted.org/packages/db/c8/7dcf9dbcb22429512708fe3a547f8b6101c0d02137acbd892505aee57adf/colorama-0.3.9-py2.py3-none-any.whl
Collecting conda==4.5.11 (from -r /tmp/./requirements.txt (line 16))
  Could not find a version that satisfies the requirement conda==4.5.11 (from -r /tmp/./requirements.txt (line 16)) (from versions: 3.0.6, 3.5.0, 3.7.0, 3.17.0, 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.7, 4.0.8, 4.0.9, 4.1.2, 4.1.6, 4.2.6, 4.2.7, 4.3.13, 4.3.16)
No matching distribution found for conda==4.5.11 (from -r /tmp/./requirements.txt (line 16))


CondaValueError: pip returned an error

ERROR: Service 'softlearning-dev-gpu' failed to build: The command '/bin/sh -c echo "${MJKEY}" > /root/.mujoco/mjkey.txt     && sed -i -e 's/^tensorflow==/tensorflow-gpu==/g' /tmp/requirements.txt     && conda env update -f /tmp/environment.yml     && rm /root/.mujoco/mjkey.txt     && rm /tmp/requirements.txt     && rm /tmp/environment.yml' returned a non-zero code: 1

I solved this issue by removing conda==4.5.11 from requirements.txt (line 16).

self._Serializable__initialize(locals()) missing. serializable package missing

Trying to install...

I seem to be stuck because self._Serializable__initialize does not exist. I believe this is because the serializable package is not installed, and much Google searching doesn't turn it up either.
The git+https://github.com/hartikainen/serializable.git@76516385a3a716ed4a2a9ad877e2d5cbcf18d4e6 entry in requirements.txt does not install, and the package doesn't seem to exist anywhere else on the web.

Where can I get it?

Parallelization

Hi,
as far as I understand it, SAC currently only supports training with a single agent?

Are there plans to support distributed training, as is done in Surreal?

Hierarchical training and reward set

Hi,
I found your paper "Latent Space Policies for Hierarchical Reinforcement Learning" very interesting and was glad you published the code. Motivated by your results, I'd like to implement the ant maze with hierarchical policies and compound skills / different rewards.
I couldn't find answers to the following questions. It would be great if you could help me out!

I assume that I have to pretrain a lower-level policy first. How do I freeze the low-level weights in the next step, and how can I add a high-level policy on top?

In the paper you mentioned a set of K reward functions. Where can I define the reward set?

Thank you!

Checkpointing should not store cumulative replay pool

Right now our checkpointing code saves the full replay pool at every single checkpoint. This has become a problem with the image experiments, since the snapshot size grows to gigabytes. One solution could be to save only the experience gathered since the latest checkpoint and reconstruct the replay pool from the previous checkpoints when restoring.
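A rough sketch of that incremental idea (illustrative only; the pool accessors used here are hypothetical): each checkpoint stores just the transitions added since the previous one, and restoring replays the chain of deltas in order:

import pickle

def save_pool_delta(pool, last_saved_size, checkpoint_path):
    # Persist only the transitions added since the previous checkpoint.
    delta = pool.get_transitions(range(last_saved_size, pool.size))  # hypothetical accessor
    with open(checkpoint_path, 'wb') as f:
        pickle.dump(delta, f)
    return pool.size  # becomes last_saved_size for the next checkpoint

def restore_pool(pool, checkpoint_paths):
    # Rebuild the full pool by replaying every saved delta in checkpoint order.
    for path in sorted(checkpoint_paths):
        with open(path, 'rb') as f:
            for transition in pickle.load(f):
                pool.add_sample(**transition)  # hypothetical add method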
