google-deepmind / bsuite
bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent
License: Apache License 2.0
I ran into a small problem when building the PPO OpenAI baselines agent from the bsuite_tutorial.
import bsuite
from baselines.common.vec_env import dummy_vec_env
from baselines.ppo2 import ppo2
from bsuite.utils import gym_wrapper
import tensorflow as tf

SAVE_PATH_PPO = './demo_results/bsuite/ppo'

def _load_env():
    # Load the bsuite environment with CSV logging, then expose it through the Gym API.
    raw_env = bsuite.load_and_record(
        bsuite_id='bandit_noise/0',
        save_path=SAVE_PATH_PPO, logging_mode='csv', overwrite=True)
    return gym_wrapper.GymFromDMEnv(raw_env)

env = dummy_vec_env.DummyVecEnv([_load_env])
steps,episode,total_return,episode_len,episode_return,total_regret
1,1,[49.09808016],1,[0.67640523],[51.5]
2,2,[49.09808016],1,[0.74001572],[51.5]
3,3,[49.09808016],1,[0.7978738],[51.5]
4,4,[49.09808016],1,[0.62408932],[51.5]
ppo2.learn(
env=env, network='mlp', lr=1e-3, gamma=.99,
total_timesteps=10000, nsteps=100)
**output**
input shape is (1, 1)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-2-d47907e196cf> in <module>
1 ppo2.learn(
2 env=env, network='mlp', lr=1e-3, gamma=.99,
----> 3 total_timesteps=10000, nsteps=100)
~/anaconda3/envs/drl/lib/python3.6/site-packages/baselines/ppo2/ppo2.py in learn(network, env, total_timesteps, eval_env, seed, nsteps, ent_coef, lr, vf_coef, max_grad_norm, gamma, lam, log_interval, nminibatches, noptepochs, cliprange, save_interval, load_path, model_fn, **network_kwargs)
177 # or if it's just worse than predicting nothing (ev =< 0)
178 # print( returns.shape,values.shape)
--> 179 ev = explained_variance(values, returns)
180 logger.logkv("misc/serial_timesteps", update*nsteps)
181 logger.logkv("misc/nupdates", update)
~/anaconda3/envs/drl/lib/python3.6/site-packages/baselines/common/math_util.py in explained_variance(ypred, y)
34
35 """
---> 36 assert y.ndim == 1 and ypred.ndim == 1
37 vary = np.var(y)
38 return np.nan if vary==0 else 1 - np.var(y-ypred)/vary
AssertionError:
I found this is due to mismatched shapes, values (100, 1) and returns (10000, 1), just before the call to explained_variance(values, returns).
When I add one line in 'baselines/ppo2/runner.py', it seems to run correctly.
...
#batch of steps to batch of rollouts
mb_obs = np.asarray(mb_obs, dtype=self.obs.dtype)
mb_rewards = np.asarray(mb_rewards, dtype=np.float32)
mb_actions = np.asarray(mb_actions)
mb_values = np.asarray(mb_values, dtype=np.float32)
mb_values = mb_values.reshape(mb_rewards.shape)  # <<< add this line
mb_neglogpacs = np.asarray(mb_neglogpacs, dtype=np.float32)
mb_dones = np.asarray(mb_dones, dtype=np.bool)
last_values = self.model.value(tf.constant(self.obs))._numpy()
...
Stepping environment...
--------------------------------------------
| eplenmean | nan |
| eprewmean | nan |
| fps | 271 |
| loss/approxkl | 2.5486004e-08 |
| loss/clipfrac | 0.0 |
| loss/policy_entropy | 2.3978922 |
| loss/policy_loss | -2.7894964e-09 |
| loss/value_loss | 0.061606925 |
| misc/explained_variance | 0 |
| misc/nupdates | 100 |
| misc/serial_timesteps | 10000 |
| misc/time_elapsed | 37.5 |
| misc/total_timesteps | 10000 |
--------------------------------------------
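The assertion in the traceback can be reproduced without baselines. The sketch below copies the three lines of explained_variance quoted above into a standalone function, calls it with arrays that carry a trailing singleton axis, and then with flattened arrays. The array contents and the names values_2d and returns_2d are made up for illustration; the actual shapes inside ppo2 depend on the runner.

```python
import numpy as np

def explained_variance(ypred, y):
    # Same body as the baselines helper quoted in the traceback above.
    assert y.ndim == 1 and ypred.ndim == 1
    vary = np.var(y)
    return np.nan if vary == 0 else 1 - np.var(y - ypred) / vary

# Hypothetical rollout data with a trailing singleton axis, shape (100, 1).
returns_2d = np.linspace(0.0, 1.0, 100, dtype=np.float32).reshape(100, 1)
values_2d = returns_2d * 0.5

try:
    explained_variance(values_2d, returns_2d)
    raised = False
except AssertionError:
    raised = True

# Dropping the extra axis satisfies the ndim == 1 check.
ev = explained_variance(values_2d.reshape(-1), returns_2d.reshape(-1))
print(raised, ev)
```

Flattening at the call site is only one option; the reshape inside runner.py described above addresses the same mismatch earlier in the pipeline.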
Several runs on deep_sea/0 (i.e., DeepSea with N=10) take longer than 100 episodes, and some take even longer than 2^10=1024 episodes, when running the default_agent BootDQN with no modifications.
To reproduce, this is the code I am running in Colab with a GPU runtime:
# first install bsuite[baselines]
import bsuite
from bsuite.baselines import experiment
from bsuite.baselines.tf import dqn
from bsuite.baselines.tf import boot_dqn
SAVE_PATH_DQN = './logs/test_boot'
env = bsuite.load_and_record("deep_sea/0", save_path=SAVE_PATH_DQN, overwrite=True)
agent = boot_dqn.default_agent(
    obs_spec=env.observation_spec(),
    action_spec=env.action_spec()
)
experiment.run(agent, env, num_episodes=env.bsuite_num_episodes)
I reran this multiple times and have had a few runs with > 1024 bad episodes.
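For context on the 2^10=1024 figure, here is a rough sketch of what a purely random policy would do. It assumes a simplified model in which reaching the goal requires one specific action out of two at each of N steps; it says nothing about how BootDQN itself should behave. The function name dithering_stats is hypothetical.

```python
def dithering_stats(n: int):
    """Success odds for a uniformly random policy in a simplified depth-n chain."""
    p_success = 0.5 ** n                 # one correct action out of two, n times in a row
    expected_episodes = 1.0 / p_success  # mean of a geometric distribution
    # Probability that 2**n independent episodes all fail.
    p_all_fail = (1.0 - p_success) ** (2 ** n)
    return p_success, expected_episodes, p_all_fail

p, expected, p_fail = dithering_stats(10)
print(p, expected, p_fail)
```

Under this model, roughly a third of random-policy runs would still be unsolved after 1024 episodes, which is the baseline the reported runs are being compared against.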
Hi, I was wondering whether there are any environments/experiments with a continuous action space (Box()), or whether there are plans to include them?
Hello Ian and others!
I'm having a look at bsuite after Ian Osband's talk at the Simons Institute Deep RL workshop. After spending a few minutes browsing the documentation and source code here on GitHub, I have a suggestion for improving the documentation.
My first question when browsing this project was: "The radar plot on the readme is lovely, I wonder what experiments contribute to a good ____________ score", where the blank is e.g. 'generalization'.
After browsing the source code for a few minutes, this isn't immediately obvious. I can see a little bit of information regarding this in the example colab notebook. It would be nice to promote this mapping to a 'first class' member of the documentation somewhere :)
I noticed you changed the optimizer and some hyper-parameters in DQN compared to those in the "Nature" paper. On my side, I can't reproduce the results with either of the two settings. Could you share a learning curve for "Breakout"? I have been struggling with hyper-parameter optimization for two months. Thanks.
I might have missed this, but are bsuite's cartpole observation parameters the same as OpenAI Gym's?
Thanks,
kb
this is probably my goof, but i wanted to play with this + baselines so i cloned the repo and installed on ubuntu 16.04, and the step to install baselines overwrote my pip installed tf 2.0.0-rc1
better support for tf2.0 would be awesome, i like the idea of bsuite a lot. Thanks for sharing
When running the following in openai_ppo:
python3 ./run.py --bsuite_id=SWEEP
I get this error:
ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?
I ran test.sh and all the tests pass.
Here is the log:
`
[Last finished: bandit/15]: 3%|███▋ | 15/468 [34:10<88:11:07, 700.81s/it]concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3.6/concurrent/futures/process.py", line 153, in _process_chunk
return [fn(*args) for args in chunk]
File "/usr/lib/python3.6/concurrent/futures/process.py", line 153, in
return [fn(*args) for args in chunk]
File "./run.py", line 81, in run
gamma=FLAGS.agent_discount,
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/ppo2.py", line 108, in learn
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/model.py", line 111, in init
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 595, in apply_gradients
self._create_slots(var_list)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/adam.py", line 135, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 1153, in _zeros_slot
new_slot_variable = slot_creator.create_zeros_slot(var, op_name)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py", line 183, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py", line 157, in create_slot_with_initializer
dtype)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py", line 65, in _create_slot_var
validate_shape=validate_shape)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1479, in get_variable
aggregation=aggregation)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1220, in get_variable
aggregation=aggregation)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 547, in get_variable
aggregation=aggregation)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 499, in _true_getter
aggregation=aggregation)
File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 848, in _get_single_variable
traceback.format_list(tb))))
ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/model.py", line 111, in init
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/ppo2.py", line 108, in learn
File "./run.py", line 81, in run
gamma=FLAGS.agent_discount,
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./run.py", line 107, in
app.run(main)
File "/home/amdfanboy/.local/lib/python3.6/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/amdfanboy/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "./run.py", line 100, in main
pool.map_mpi(run, bsuite_sweep)
File "/home/amdfanboy/github/bsuite/bsuite/baselines/utils/pool.py", line 53, in map_mpi
for bsuite_id in pool.map(run_fn, bsuite_ids):
File "/usr/lib/python3.6/concurrent/futures/process.py", line 366, in _chain_from_iterable_of_lists
for element in iterable:
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/model.py", line 111, in init
File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/ppo2.py", line 108, in learn
File "./run.py", line 81, in run
gamma=FLAGS.agent_discount,
`
Good morning,
You inserted an error in the setup.py file in your last commit.
I report the error I receive below when executing the pip install command.
pip install \bsuite
ERROR: Command errored out with exit status 1:
command: 'C:\Users\Torquato\Anaconda3\envs\nlp\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Torquato\\Downloads\\bsuite\\setup.py'"'"'; __file__='"'"'C:\\Users\\Torquato\\Downloads\\bsuite\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
cwd: C:\Users\Torquato\Downloads\bsuite\
Complete output (1 lines):
error in bsuite setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
I believe that the line that causes the problem is highlighted in the code below.
baselines_jax_require = [
'dm-haiku',
'dm-tree',
'jax',
'jaxlib',
'git+git://github.com/deepmind/rlax.git' <------------------
'tqdm',
]
Pip is not able to recognise that string as a valid requirement specifier.
Thank you very much.
Michelangelo Conserva
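One detail visible in the quoted list: there is no comma after the git URL, so Python joins it with the next string literal. The sketch below shows that effect using the list exactly as quoted, next to a copy with the comma restored. Whether the bare git URL is itself accepted by setuptools as a requirement specifier is a separate question that this sketch does not settle.

```python
# The list as quoted in the report: no comma after the git URL.
broken = [
    'dm-haiku',
    'dm-tree',
    'jax',
    'jaxlib',
    'git+git://github.com/deepmind/rlax.git'
    'tqdm',
]

# With the comma restored, the list has six separate entries.
fixed = [
    'dm-haiku',
    'dm-tree',
    'jax',
    'jaxlib',
    'git+git://github.com/deepmind/rlax.git',
    'tqdm',
]

print(len(broken), broken[-1])
print(len(fixed), fixed[-2:])
```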
Hi There!
Thanks very much for bsuite, it is a great resource for reproducible research.
I have a question on the framework.
I am setting up some pedagogical implementations of canonical RL algorithms, SARSA among them.
Is there any design pattern you had in mind for n-step methods, or for any method that requires access to experience from longer transitions?
I am currently solving the issue with SARSA by computing the next action with the select_action method inside the update function.
What about n-step methods or model-based methods?
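One possible pattern for the n-step case, sketched under assumptions: the agent keeps a small buffer of recent transitions and only learns from a transition once n steps of reward have accumulated. The class NStepAccumulator and its method names are hypothetical and not part of bsuite; an agent's update function could call add with each new transition and learn from whatever it returns.

```python
from collections import deque

class NStepAccumulator:
    """Hypothetical helper: turns single-step transitions into n-step ones."""

    def __init__(self, n: int, discount: float):
        self._n = n
        self._discount = discount
        self._buffer = deque()

    def add(self, obs, action, reward, next_obs, done):
        """Stores one transition and returns any n-step transitions now ready."""
        self._buffer.append((obs, action, reward))
        ready = []
        if len(self._buffer) == self._n:
            ready.append(self._pop(next_obs, done))
        if done:
            # Flush the shorter tails at the end of the episode.
            while self._buffer:
                ready.append(self._pop(next_obs, done))
        return ready

    def _pop(self, last_obs, done):
        # Discounted sum of the rewards currently held in the buffer.
        n_step_return = sum(
            (self._discount ** k) * r
            for k, (_, _, r) in enumerate(self._buffer))
        steps = len(self._buffer)
        obs, action, _ = self._buffer.popleft()
        return obs, action, n_step_return, last_obs, done, steps

# Usage on a made-up 4-step episode with reward 1.0 at every step.
acc = NStepAccumulator(n=3, discount=0.5)
out = []
for t in range(4):
    out += acc.add(obs=t, action=0, reward=1.0, next_obs=t + 1, done=(t == 3))
print([item[2] for item in out])
```

Each returned tuple also carries the number of steps it spans, so a learner can bootstrap with the matching power of the discount. Model-based methods would need a different structure and are not covered by this sketch.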
I thought that it might be useful to be able to use OpenAI Gym environments within bsuite since there are so many of them. I noticed that there is a wrapper here that converts bsuite environments to OpenAI Gym environments, so in my fork I made a reverse wrapper that converts an OpenAI Gym environment to a bsuite environment here. It's pretty untested right now, but if you are interested I would be happy to clean it up and make a PR. I think this could be a useful feature.
Hi Ian,
I was trying to run the baseline agents on some of my environments. However, I couldn't get exact reproducibility. I think this is because numpy's own RNG is used for action selection, e.g., here:
https://github.com/deepmind/bsuite/blob/f4d12fb029c533ec610902a9565860bf377db556/bsuite/baselines/tf/dqn/agent.py#L78
Is this by design?
Greetings,
Raghu.
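For readers who want reproducible action selection regardless of what the global generator is doing, one common approach is to give the agent its own seeded generator. The sketch below is a generic illustration of that idea and does not describe how the bsuite baselines are written; EpsilonGreedy and rollout are hypothetical names.

```python
import numpy as np

class EpsilonGreedy:
    """Hypothetical action selector that owns its random number generator."""

    def __init__(self, num_actions: int, epsilon: float, seed: int):
        self._num_actions = num_actions
        self._epsilon = epsilon
        # A private generator: unaffected by other code that calls np.random.*.
        self._rng = np.random.RandomState(seed)

    def select_action(self, q_values) -> int:
        if self._rng.rand() < self._epsilon:
            return int(self._rng.randint(self._num_actions))
        return int(np.argmax(q_values))

def rollout(seed: int, steps: int = 50):
    selector = EpsilonGreedy(num_actions=3, epsilon=0.5, seed=seed)
    actions = []
    for _ in range(steps):
        np.random.rand()  # Unrelated use of the global generator.
        actions.append(selector.select_action([0.1, 0.9, 0.2]))
    return actions

same = rollout(seed=0) == rollout(seed=0)
print(same)
```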
Hi, I have one simple question about DQN's loss here.
Why do you use tf.reduce_sum instead of tf.reduce_mean here?
Is there a reason for it? Were the experiments in the paper run with a loss that sums over the batch?
Sorry for asking such a simple question, but I would really appreciate it if you could answer it.
Anyway, this is a great project!!
Thank you :)
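As background for the question, the two reductions differ only by a constant factor equal to the batch size. The sketch below checks this on a made-up batch of errors using plain Python; it does not say which reduction the paper's experiments used. With plain gradient descent the factor acts like a rescaled learning rate, while optimizers that normalize gradient magnitude may be less sensitive to it.

```python
def sum_loss(td_errors):
    """Squared-error loss summed over the batch."""
    return sum(0.5 * e * e for e in td_errors)

def mean_loss(td_errors):
    """Squared-error loss averaged over the batch."""
    return sum_loss(td_errors) / len(td_errors)

def grad_wrt_prediction(td_errors, reduce):
    """Gradient of the loss with respect to each prediction (error = pred - target)."""
    scale = 1.0 if reduce == "sum" else 1.0 / len(td_errors)
    return [scale * e for e in td_errors]

td_errors = [0.5, -1.0, 2.0, 0.25]  # made-up batch of 4 errors
g_sum = grad_wrt_prediction(td_errors, "sum")
g_mean = grad_wrt_prediction(td_errors, "mean")
ratios = [s / m for s, m in zip(g_sum, g_mean)]
loss_ratio = sum_loss(td_errors) / mean_loss(td_errors)
print(loss_ratio, ratios)
```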
The public trfl builds don't support tf2.* and hence pip install bsuite[baselines] fails.
(base) ➜ Developer conda create -y -n bsuite python=3.6.9
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.7.12
latest version: 4.8.3
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /usr/local/Caskroom/miniconda/base/envs/bsuite
added / updated specs:
- python=3.6.9
The following NEW packages will be INSTALLED:
ca-certificates pkgs/main/osx-64::ca-certificates-2020.1.1-0
certifi pkgs/main/osx-64::certifi-2020.4.5.1-py36_0
libcxx pkgs/main/osx-64::libcxx-4.0.1-hcfea43d_1
libcxxabi pkgs/main/osx-64::libcxxabi-4.0.1-hcfea43d_1
libedit pkgs/main/osx-64::libedit-3.1.20181209-hb402a30_0
libffi pkgs/main/osx-64::libffi-3.2.1-h475c297_4
ncurses pkgs/main/osx-64::ncurses-6.2-h0a44026_0
openssl pkgs/main/osx-64::openssl-1.1.1f-h1de35cc_0
pip pkgs/main/osx-64::pip-20.0.2-py36_1
python pkgs/main/osx-64::python-3.6.9-h359304d_0
readline pkgs/main/osx-64::readline-7.0-h1de35cc_5
setuptools pkgs/main/osx-64::setuptools-46.1.3-py36_0
sqlite pkgs/main/osx-64::sqlite-3.31.1-ha441bb4_0
tk pkgs/main/osx-64::tk-8.6.8-ha441bb4_0
wheel pkgs/main/osx-64::wheel-0.34.2-py36_0
xz pkgs/main/osx-64::xz-5.2.4-h1de35cc_4
zlib pkgs/main/osx-64::zlib-1.2.11-h1de35cc_3
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate bsuite
#
# To deactivate an active environment, use
#
# $ conda deactivate
(base) ➜ Developer conda activate bsuite
(bsuite) ➜ Developer pip install git+https://github.com/deepmind/bsuite.git#egg=bsuite[baselines]
Collecting bsuite[baselines]
Cloning https://github.com/deepmind/bsuite.git to /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/bsuite
Running command git clone -q https://github.com/deepmind/bsuite.git /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/bsuite
Processing /Users/omega/Library/Caches/pip/wheels/8e/28/49/fad4e7f0b9a1227708cbbee4487ac8558a7334849cb81c813d/absl_py-0.9.0-cp36-none-any.whl
Collecting dm_env
Using cached dm_env-1.2-py3-none-any.whl (22 kB)
Collecting matplotlib
Using cached matplotlib-3.2.1-cp36-cp36m-macosx_10_9_x86_64.whl (12.4 MB)
Collecting numpy
Using cached numpy-1.18.2-cp36-cp36m-macosx_10_9_x86_64.whl (15.2 MB)
Collecting pandas
Using cached pandas-1.0.3-cp36-cp36m-macosx_10_9_x86_64.whl (10.2 MB)
Collecting plotnine
Using cached plotnine-0.6.0-py3-none-any.whl (4.1 MB)
Collecting scipy
Using cached scipy-1.4.1-cp36-cp36m-macosx_10_6_intel.whl (28.5 MB)
Collecting scikit-image
Using cached scikit_image-0.16.2-cp36-cp36m-macosx_10_6_intel.whl (30.4 MB)
Collecting six
Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Processing /Users/omega/Library/Caches/pip/wheels/7c/06/54/bc84598ba1daf8f970247f550b175aaaee85f68b4b0c5ab2c6/termcolor-1.1.0-cp36-none-any.whl
Collecting dm-sonnet
Using cached dm_sonnet-2.0.0-py3-none-any.whl (254 kB)
Collecting dm-tree
Using cached dm_tree-0.1.4-cp36-cp36m-macosx_10_9_x86_64.whl (93 kB)
Collecting tensorflow
Using cached tensorflow-2.1.0-cp36-cp36m-macosx_10_11_x86_64.whl (120.8 MB)
Collecting tensorflow_probability
Using cached tensorflow_probability-0.9.0-py2.py3-none-any.whl (3.2 MB)
Collecting trfl@ git+git://github.com/deepmind/trfl.git#egg=trfl
Cloning git://github.com/deepmind/trfl.git to /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl
Running command git clone -q git://github.com/deepmind/trfl.git /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl
Collecting tqdm
Using cached tqdm-4.45.0-py2.py3-none-any.whl (60 kB)
Collecting kiwisolver>=1.0.1
Using cached kiwisolver-1.2.0-cp36-cp36m-macosx_10_9_x86_64.whl (60 kB)
Collecting python-dateutil>=2.1
Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1
Using cached pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
Collecting cycler>=0.10
Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting pytz>=2017.2
Using cached pytz-2019.3-py2.py3-none-any.whl (509 kB)
Collecting descartes>=1.1.0
Using cached descartes-1.1.0-py3-none-any.whl (5.8 kB)
Collecting statsmodels>=0.9.0
Using cached statsmodels-0.11.1-cp36-cp36m-macosx_10_13_x86_64.whl (8.4 MB)
Collecting patsy>=0.4.1
Using cached patsy-0.5.1-py2.py3-none-any.whl (231 kB)
Collecting mizani>=0.6.0
Using cached mizani-0.6.0-py2.py3-none-any.whl (61 kB)
Collecting PyWavelets>=0.4.0
Using cached PyWavelets-1.1.1-cp36-cp36m-macosx_10_9_x86_64.whl (4.3 MB)
Collecting imageio>=2.3.0
Using cached imageio-2.8.0-py3-none-any.whl (3.3 MB)
Collecting networkx>=2.0
Using cached networkx-2.4-py3-none-any.whl (1.6 MB)
Collecting pillow>=4.3.0
Using cached Pillow-7.1.1-cp36-cp36m-macosx_10_10_x86_64.whl (2.2 MB)
Processing /Users/omega/Library/Caches/pip/wheels/32/42/7f/23cae9ff6ef66798d00dc5d659088e57dbba01566f6c60db63/wrapt-1.12.1-cp36-cp36m-macosx_10_7_x86_64.whl
Collecting tabulate>=0.7.5
Using cached tabulate-0.8.7-py3-none-any.whl (24 kB)
Collecting keras-applications>=1.0.8
Using cached Keras_Applications-1.0.8-py3-none-any.whl (50 kB)
Processing /Users/omega/Library/Caches/pip/wheels/5c/2e/7e/a1d4d4fcebe6c381f378ce7743a3ced3699feb89bcfbdadadd/gast-0.2.2-cp36-none-any.whl
Collecting grpcio>=1.8.6
Using cached grpcio-1.28.1-cp36-cp36m-macosx_10_9_x86_64.whl (2.6 MB)
Collecting protobuf>=3.8.0
Using cached protobuf-3.11.3-cp36-cp36m-macosx_10_9_x86_64.whl (1.3 MB)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /usr/local/Caskroom/miniconda/base/envs/bsuite/lib/python3.6/site-packages (from tensorflow->bsuite[baselines]) (0.34.2)
Collecting tensorboard<2.2.0,>=2.1.0
Using cached tensorboard-2.1.1-py3-none-any.whl (3.8 MB)
Collecting keras-preprocessing>=1.1.0
Using cached Keras_Preprocessing-1.1.0-py2.py3-none-any.whl (41 kB)
Collecting opt-einsum>=2.3.2
Using cached opt_einsum-3.2.0-py3-none-any.whl (63 kB)
Collecting tensorflow-estimator<2.2.0,>=2.1.0rc0
Using cached tensorflow_estimator-2.1.0-py2.py3-none-any.whl (448 kB)
Collecting astor>=0.6.0
Using cached astor-0.8.1-py2.py3-none-any.whl (27 kB)
Collecting google-pasta>=0.1.6
Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting cloudpickle>=1.2.2
Using cached cloudpickle-1.3.0-py2.py3-none-any.whl (26 kB)
Collecting decorator
Using cached decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Collecting palettable
Using cached palettable-3.3.0-py2.py3-none-any.whl (111 kB)
Collecting h5py
Using cached h5py-2.10.0-cp36-cp36m-macosx_10_6_intel.whl (3.0 MB)
Requirement already satisfied: setuptools in /usr/local/Caskroom/miniconda/base/envs/bsuite/lib/python3.6/site-packages (from protobuf>=3.8.0->tensorflow->bsuite[baselines]) (46.1.3.post20200330)
Collecting google-auth-oauthlib<0.5,>=0.4.1
Using cached google_auth_oauthlib-0.4.1-py2.py3-none-any.whl (18 kB)
Collecting google-auth<2,>=1.6.3
Downloading google_auth-1.14.0-py2.py3-none-any.whl (88 kB)
|████████████████████████████████| 88 kB 707 kB/s
Collecting werkzeug>=0.11.15
Using cached Werkzeug-1.0.1-py2.py3-none-any.whl (298 kB)
Collecting markdown>=2.6.8
Using cached Markdown-3.2.1-py2.py3-none-any.whl (88 kB)
Collecting requests<3,>=2.21.0
Using cached requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting requests-oauthlib>=0.7.0
Using cached requests_oauthlib-1.3.0-py2.py3-none-any.whl (23 kB)
Collecting rsa<4.1,>=3.1.4
Using cached rsa-4.0-py2.py3-none-any.whl (38 kB)
Collecting pyasn1-modules>=0.2.1
Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting cachetools<5.0,>=2.0.0
Using cached cachetools-4.1.0-py3-none-any.whl (10 kB)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/Caskroom/miniconda/base/envs/bsuite/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->bsuite[baselines]) (2020.4.5.1)
Collecting idna<3,>=2.5
Using cached idna-2.9-py2.py3-none-any.whl (58 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
Using cached urllib3-1.25.8-py2.py3-none-any.whl (125 kB)
Collecting chardet<4,>=3.0.2
Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting oauthlib>=3.0.0
Using cached oauthlib-3.1.0-py2.py3-none-any.whl (147 kB)
Collecting pyasn1>=0.1.3
Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Building wheels for collected packages: bsuite, trfl
Building wheel for bsuite (setup.py) ... done
Created wheel for bsuite: filename=bsuite-0.0.0-py3-none-any.whl size=177123 sha256=1d0d8738f92032e854e3ec9211a76fbc031b484dbd62dc75b27acfaf729bab93
Stored in directory: /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-ephem-wheel-cache-mkj55_9i/wheels/7b/5e/ac/15fb44dea4f625a5cf4801445436f8a50d023233f734fc7d41
Building wheel for trfl (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/Caskroom/miniconda/base/envs/bsuite/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"'; __file__='"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-wheel-7q14rby_
cwd: /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/
Complete output (5 lines):
running bdist_wheel
running build
running build_py
creating build
error: could not create 'build': File exists
----------------------------------------
ERROR: Failed building wheel for trfl
Running setup.py clean for trfl
Successfully built bsuite
Failed to build trfl
Installing collected packages: six, absl-py, dm-tree, numpy, dm-env, kiwisolver, python-dateutil, pyparsing, cycler, matplotlib, pytz, pandas, scipy, descartes, patsy, statsmodels, palettable, mizani, plotnine, PyWavelets, pillow, imageio, decorator, networkx, scikit-image, termcolor, wrapt, tabulate, dm-sonnet, h5py, keras-applications, gast, grpcio, protobuf, pyasn1, rsa, pyasn1-modules, cachetools, google-auth, idna, urllib3, chardet, requests, oauthlib, requests-oauthlib, google-auth-oauthlib, werkzeug, markdown, tensorboard, keras-preprocessing, opt-einsum, tensorflow-estimator, astor, google-pasta, tensorflow, cloudpickle, tensorflow-probability, trfl, tqdm, bsuite
Running setup.py install for trfl ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/Caskroom/miniconda/base/envs/bsuite/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"'; __file__='"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-record-_kglrl6r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/Caskroom/miniconda/base/envs/bsuite/include/python3.6m/trfl
cwd: /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/
Complete output (5 lines):
running install
running build
running build_py
creating build
error: could not create 'build': File exists
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/Caskroom/miniconda/base/envs/bsuite/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"'; __file__='"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-record-_kglrl6r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/Caskroom/miniconda/base/envs/bsuite/include/python3.6m/trfl Check the logs for full command output.
(bsuite) ➜ Developer python -c "import trfl"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'trfl'
After installation, I was getting a 'cannot import Random' error: there was a conflict between your bsuite/baselines/random/random.py and the system random.Random class (as suggested here). After refactoring to random_baseline.RandomBaseline everything seems to work.
I didn't find a common parent interface for all the bsuite environments, but a common pattern is to have a method _get_observation to collect the current observation.
Catch, however, is the only environment to have an _observation method in place of a _get_observation one.
https://github.com/deepmind/bsuite/blob/6d8f64997ca256473c3d10be021431facc5a14d7/bsuite/environments/catch.py#L109-L114
Is there any specific reason why?
If not, would it be reasonable to homogenise the interface and make Catch compliant?
Context:
I usually use a simple interface to interoperate between gym, bsuite, dm_env and other common libraries, and the lack of a shared interface for bsuite.Environments is an obstacle.
See also #44 for a tentative edit of Catch. It does not modify the parent Environment yet.
Thanks,
Edu
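Until the naming is unified, a caller can paper over the difference with a small lookup helper. The sketch below is a hypothetical workaround and not part of bsuite; CatchLike and OtherLike are stand-in classes that only mimic the two naming styles described above. Relying on private methods is fragile, so this is a stopgap at best.

```python
def current_observation(env):
    """Returns the observation from whichever private accessor the env defines.

    Hypothetical helper: tries `_get_observation` first, then `_observation`.
    """
    for name in ("_get_observation", "_observation"):
        method = getattr(env, name, None)
        if callable(method):
            return method()
    raise AttributeError(f"{type(env).__name__} exposes no observation accessor")

# Stand-in classes that mimic the two naming styles.
class CatchLike:
    def _observation(self):
        return "catch-board"

class OtherLike:
    def _get_observation(self):
        return "other-obs"

print(current_observation(CatchLike()), current_observation(OtherLike()))
```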
I wanted to propose adding a pendulum experiment to bsuite. I think it fits the targeted, simple, challenging, scalable, fast criteria outlined in the bsuite paper. Also, now that #8 has been merged, DMEnvFromGym can be used to convert the OpenAI pendulum environment to a bsuite environment without having to reimplement it, i.e. something like
env = DMEnvFromGym(gym.make('Pendulum-v0'))
If there is interest I would be happy to work on it. Also, let me know if there are any concerns with implementing homegrown environments vs importing them from third parties like OpenAI.
In the notebook (bit.ly/bsuite-agents), I only found descriptions of six scores (basic, noise, scale, exploration, memory, and credit assignment). How is the generalization score computed? Thank you!
Hi,
while working on a PyTorch DQN agent for BSuite experiments, I noticed quite bad results on the mnist and mountain car experiments. I see that a similar question was addressed here, but the thread was closed.
To further investigate, I created a new conda environment, downloaded and installed a fresh copy of BSuite and ran the DQN agent from the baselines. The only settings I've changed were "bsuite_id" to "SWEEP" and the save path.
When you compare the results from both agents with the barplot on page 16 of the BSuite manuscript, you notice that both agents have worse performance on mnist and mountaincar and better performance on catch.
Were there any changes on the environments that I missed? The DQN agent from the manuscript did use the default parameters from the baseline directory, correct?
Thanks,
Peter
The threshold is set to 0.8 by default instead of 0.9, although the paper states: "The summary ‘score’ computes the percentage of runs for which the average regret drops below 0.9 faster than the 2^N episodes expected by dithering."
def find_solution(df_in: pd.DataFrame,
                  sweep_vars: Sequence[Text] = None,
                  merge: bool = True,
                  thresh: float = 0.8) -> pd.DataFrame:
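To see why the choice of threshold matters, consider a simplified reading of the rule quoted from the paper: find the first episode at which average regret falls below the threshold. The sketch below is not the library's find_solution; first_episode_below and the regret data are made up for illustration.

```python
from typing import Optional, Sequence

def first_episode_below(total_regret: Sequence[float],
                        thresh: float) -> Optional[int]:
    """Returns the first 1-based episode whose average regret is below thresh."""
    for episode, regret in enumerate(total_regret, start=1):
        if regret / episode < thresh:
            return episode
    return None

# Made-up cumulative regret: one unit per episode for 8 episodes, then none.
total_regret = [float(min(k, 8)) for k in range(1, 21)]

at_09 = first_episode_below(total_regret, thresh=0.9)
at_08 = first_episode_below(total_regret, thresh=0.8)
print(at_09, at_08)
```

In this toy data the stricter 0.8 threshold is reached two episodes later than 0.9, so a run near the episode budget could count as solved under one threshold and unsolved under the other.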
In dm_control there was the ability to render control tasks like cartpole swingup, even though the environment only had a dynamics-based observation space. It would be nice to have that ability here, especially since the environment is from dm_control.
It looks like the rendering function in the Gym wrapper just returns the last observation (in both human and rgb_array mode), which doesn't really work for a lot of tasks in bsuite when the observation is not an rgb_array.
Is there some way to grab RGB output for bsuite environments?
For now, I hacked in the cartpole-specific viewer from Gym to cartpole.py and cartpole_swingup.py. It works and it doesn't look horrible, but it's not exactly ideal.
How do I set a seed in a bsuite environment instance? In the notebook, the output of sweep.SETTINGS has a seed attribute which is not None:
Loaded bsuite_id: bandit_noise/0.
bsuite_id=bandit_noise/0, settings={'noise_scale': 0.1, 'seed': 0}, num_episodes=10000
Loaded bsuite_id: bandit_noise/1.
bsuite_id=bandit_noise/1, settings={'noise_scale': 0.1, 'seed': 1}, num_episodes=10000
Loaded bsuite_id: bandit_noise/2.
bsuite_id=bandit_noise/2, settings={'noise_scale': 0.1, 'seed': 2}, num_episodes=10000
Loaded bsuite_id: bandit_noise/3.
bsuite_id=bandit_noise/3, settings={'noise_scale': 0.1, 'seed': 3}, num_episodes=10000
but when I printed it again on my own computer, seed was None (if I do that in the notebook, seed was None but there's an extra mapping_seed which was not None).
I tried two methods to seed the environment: (1) sweep.SETTINGS[bsuite_id]['seed']=0; (2) calling env.seed() after wrapping it as an OpenAI Gym env. Neither worked (multiple experiments, same seed, different results). A minimal example demonstrating that these two seeding methods do not work:
import random
import torch as t
import numpy as np
import bsuite
from bsuite import sweep
from bsuite.utils import gym_wrapper

def set_seed(seed, deterministic=True):
    random.seed(seed)
    np.random.seed(seed)
    t.manual_seed(seed)
    t.cuda.manual_seed_all(seed)
    t.cuda.manual_seed(seed)
    if deterministic:
        t.backends.cudnn.deterministic = True
        t.backends.cudnn.benchmark = False

set_seed(0)
bsuite_id = 'cartpole_swingup/0'
raw_env = bsuite.load_from_id(bsuite_id)

# method 1
sweep.SETTINGS[bsuite_id]['seed']=0
for episode in range(10):
    timestep = raw_env.reset()
    total_reward = 0
    while not timestep.last():
        action = np.random.choice(raw_env.action_spec().num_values)
        timestep = raw_env.step(action)
        total_reward += timestep.reward
    print(episode,total_reward)

# method 2
env = gym_wrapper.GymFromDMEnv(raw_env)
env.seed(seed=0)
for episode in range(10):
    timestep = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = np.random.choice(raw_env.action_spec().num_values)
        sn,r,done,_ = env.step(action)
        total_reward += r
    print(episode,total_reward)
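One detail in the example above is that method 1 edits sweep.SETTINGS only after raw_env has already been created by load_from_id. The following is a library-independent sketch of why that ordering could matter, assuming an environment reads its seed once at construction time. ToyEnv, SETTINGS and this load_from_id are hypothetical stand-ins for illustration, not bsuite code, and whether bsuite behaves the same way is not confirmed here.

```python
import random

# Hypothetical stand-ins for illustration only; this is not bsuite's API.
SETTINGS = {"toy/0": {"seed": None}}


class ToyEnv:
    """Toy environment that reads its seed once, at construction time."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def step(self):
        return self._rng.random()


def load_from_id(env_id):
    # Settings are consumed here; later edits do not reach existing envs.
    return ToyEnv(**SETTINGS[env_id])


def rollout(env, n=5):
    return [env.step() for _ in range(n)]


# Built before the seed is set: keeps its original (unseeded) generator.
env_before = load_from_id("toy/0")
SETTINGS["toy/0"]["seed"] = 0

# Built after the seed is set: both instances start from the same state.
env_after_1 = load_from_id("toy/0")
env_after_2 = load_from_id("toy/0")

unseeded = rollout(env_before)
seeded_1 = rollout(env_after_1)
seeded_2 = rollout(env_after_2)
print(seeded_1 == seeded_2)
print(unseeded == seeded_1)
```

In this toy setting, only the environments created after the settings change are reproducible, which suggests creating the environment after the seed is in place when comparing runs.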
Hey there!
Thanks for open sourcing this tool for better understanding the behavior of RL agents :)
There seems to be an error in the colab tutorial when executing the "load bsuite environments as OpenAI gym" cell:
#@title Simple to load bsuite environments as OpenAI gym
from bsuite.utils import gym_wrapper
raw_env = bsuite.load_from_id(bsuite_id='memory_len/0')
env = gym_wrapper.GymWrapper(raw_env)
isinstance(env, gym.Env)
Should
env = gym_wrapper.GymWrapper(raw_env)
be
env = gym_wrapper.GymFromDMEnv(raw_env)
as the documentation indicates?
My apologies for not submitting a PR here; I was not able to access the Colab doc.
Have a nice day!
Use collections.abc instead
bsuite/logging/logging_utils.py
76: if not isinstance(path_collection, collections.Mapping):
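A minimal sketch of the suggested change, using only the standard library. The abstract base classes live in collections.abc, and the old collections.Mapping alias was removed in Python 3.10. The helper name is_mapping is hypothetical and only wraps the check for illustration.

```python
import collections.abc


def is_mapping(path_collection):
    """Return True if the argument behaves like a read-only mapping."""
    return isinstance(path_collection, collections.abc.Mapping)


print(is_mapping({"agent": "/some/path"}))
print(is_mapping(["/some/path"]))
```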
Hi,
I am observing a strange behavior by the tensorflow default boot dqn agent that I am a bit baffled by.
When running sweeps over multiple environments, the agent loses its expected behavior after the first iteration and does not seem to explore. I've tried to debug for some time but haven't figured out the cause.
Code for reproduction (double-checked in a newly installed env):
import bsuite
from bsuite.baselines.tf import boot_dqn
from bsuite import sweep
from bsuite.baselines import experiment

bsuite_id = "DEEP_SEA"
log_dir = "./logs/"
bsuite_sweep = getattr(sweep, bsuite_id)[:3]
for id in bsuite_sweep:
    env = bsuite.load_and_record(id, save_path=log_dir, overwrite=True)
    agent = boot_dqn.default_agent(
        obs_spec=env.observation_spec(),
        action_spec=env.action_spec(),
    )
    experiment.run(agent, env, num_episodes=300)
Iterations 2 and 3 do not reach the end of the chain within 300 episodes, nor over much longer training horizons (see also the colab link for results).
In contrast, the jax agent reliably produces the expected results in this loop (i.e., when <bsuite.baselines.tf> is replaced with <bsuite.baselines.jax>).
The same can be observed in colab:
https://colab.research.google.com/drive/1hnJMDLG-aXCKKsjFqVd6YWGY4luz29ku?usp=sharing
best,
anyboby
np.int and the analogous aliases have reached the end of their deprecation period and were removed in NumPy 1.24.0:
https://numpy.org/doc/stable/release/1.24.0-notes.html#expired-deprecations
This causes:
import bsuite
# Output
AttributeError: module 'numpy' has no attribute 'int'
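As a sketch of the usual replacement (not taken from the bsuite source): the removed np.int alias was just the builtin int, so either the builtin or an explicit fixed-width type such as np.int64 can be used where a dtype is needed. The variable names below are illustrative.

```python
import numpy as np

# The builtin int works anywhere np.int was used as a dtype.
counts = np.zeros(3, dtype=int)

# np.int64 pins the width explicitly when that matters.
indices = np.arange(4).astype(np.int64)

print(counts.dtype.kind, indices.dtype)
```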
What value should be assigned to “experiments = {}”?
code:
bsuite/analysis/results.ipynb
`
#@ title loading results from local data:
experiments = {} # Add results here
DF, SWEEP_VARS = sqlite_load.load_bsuite(experiments)
`