google-deepmind / bsuite

bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent

License: Apache License 2.0

Python 56.74% Jupyter Notebook 13.47% TeX 29.49% Shell 0.31%

bsuite's Issues

bsuite_tutorial problem when build PPO OpenAI baseline agent

I ran into a small problem when building the PPO OpenAI baselines agent in the bsuite_tutorial.

I logged results to a CSV file using the following code:

import bsuite
from baselines.common.vec_env import dummy_vec_env
from baselines.ppo2 import ppo2
from bsuite.utils import gym_wrapper
import tensorflow as tf

SAVE_PATH_PPO = './demo_results/bsuite/ppo'

def _load_env():
    # Load the bsuite environment with CSV logging, then wrap it as a Gym env.
    raw_env = bsuite.load_and_record(
        bsuite_id='bandit_noise/0',
        save_path=SAVE_PATH_PPO, logging_mode='csv', overwrite=True)
    return gym_wrapper.GymFromDMEnv(raw_env)

env = dummy_vec_env.DummyVecEnv([_load_env])

I got a bsuite_id_-_bandit_noise-0.csv file like this:

steps,episode,total_return,episode_len,episode_return,total_regret
1,1,[49.09808016],1,[0.67640523],[51.5]
2,2,[49.09808016],1,[0.74001572],[51.5]
3,3,[49.09808016],1,[0.7978738],[51.5]
4,4,[49.09808016],1,[0.62408932],[51.5]

When I ran the next cell, I got an assertion error.

ppo2.learn(
    env=env, network='mlp', lr=1e-3, gamma=.99,
    total_timesteps=10000, nsteps=100)

**output**
input shape is (1, 1)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-2-d47907e196cf> in <module>
      1 ppo2.learn(
      2     env=env, network='mlp', lr=1e-3, gamma=.99,
----> 3     total_timesteps=10000, nsteps=100)

~/anaconda3/envs/drl/lib/python3.6/site-packages/baselines/ppo2/ppo2.py in learn(network, env, total_timesteps, eval_env, seed, nsteps, ent_coef, lr, vf_coef, max_grad_norm, gamma, lam, log_interval, nminibatches, noptepochs, cliprange, save_interval, load_path, model_fn, **network_kwargs)
    177             # or if it's just worse than predicting nothing (ev =< 0)
    178 #             print( returns.shape,values.shape)
--> 179             ev = explained_variance(values, returns)
    180             logger.logkv("misc/serial_timesteps", update*nsteps)
    181             logger.logkv("misc/nupdates", update)

~/anaconda3/envs/drl/lib/python3.6/site-packages/baselines/common/math_util.py in explained_variance(ypred, y)
     34 
     35     """
---> 36     assert y.ndim == 1 and ypred.ndim == 1
     37     vary = np.var(y)
     38     return np.nan if vary==0 else 1 - np.var(y-ypred)/vary

AssertionError:

I found that this is due to a shape mismatch between values (100, 1) and returns (10000, 1) before the call to explained_variance(values, returns).
After adding one line to 'baselines/ppo2/runner.py', it seems to run correctly:

...
       #batch of steps to batch of rollouts
        mb_obs = np.asarray(mb_obs, dtype=self.obs.dtype)
        mb_rewards = np.asarray(mb_rewards, dtype=np.float32)
        mb_actions = np.asarray(mb_actions)
        mb_values = np.asarray(mb_values, dtype=np.float32)
        mb_values = mb_values.reshape(mb_rewards.shape)  # <<< add this line
        
        mb_neglogpacs = np.asarray(mb_neglogpacs, dtype=np.float32)
        mb_dones = np.asarray(mb_dones, dtype=np.bool)
        last_values = self.model.value(tf.constant(self.obs))._numpy()
...

final result

Stepping environment...
--------------------------------------------
| eplenmean               | nan            |
| eprewmean               | nan            |
| fps                     | 271            |
| loss/approxkl           | 2.5486004e-08  |
| loss/clipfrac           | 0.0            |
| loss/policy_entropy     | 2.3978922      |
| loss/policy_loss        | -2.7894964e-09 |
| loss/value_loss         | 0.061606925    |
| misc/explained_variance | 0              |
| misc/nupdates           | 100            |
| misc/serial_timesteps   | 10000          |
| misc/time_elapsed       | 37.5           |
| misc/total_timesteps    | 10000          |
--------------------------------------------

P.S. I use TF 2.1.0 and checked out the tf2 branch after cloning baselines.
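
The assertion in the traceback can be reproduced outside of baselines. The following is a minimal sketch, assuming only NumPy: explained_variance is copied from the lines shown in the traceback, while the array sizes and the random data are illustrative and not taken from an actual PPO run.

```python
import numpy as np


def explained_variance(ypred, y):
    # Same check as in the traceback: both inputs must be 1-D.
    assert y.ndim == 1 and ypred.ndim == 1
    vary = np.var(y)
    return np.nan if vary == 0 else 1 - np.var(y - ypred) / vary


nsteps = 100  # illustrative size, not taken from a real rollout
rng = np.random.default_rng(0)
returns = rng.normal(size=nsteps).astype(np.float32)          # shape (100,)
values_2d = rng.normal(size=(nsteps, 1)).astype(np.float32)   # shape (100, 1)

# A trailing singleton axis is enough to trip the assertion.
try:
    explained_variance(values_2d, returns)
    failed = False
except AssertionError:
    failed = True

# Reshaping the values to match the returns lets the call go through.
values_1d = values_2d.reshape(returns.shape)
ev = explained_variance(values_1d, returns)
print(failed, values_1d.shape, ev)
```

This only illustrates why a reshape of this kind can silence the assertion; it does not show where the extra axis comes from inside baselines.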

BootDQN+ not matching claimed performance

Several runs on deep_sea/0 (i.e., DeepSea with N=10) take longer than 100 episodes, and some take even longer than 2^10 = 1024 episodes, when running the default_agent BootDQN with no modifications.

To reproduce, this is the code I am running in Colab with a GPU runtime:

# first install bsuite[baselines]
import bsuite
from bsuite.baselines import experiment
from bsuite.baselines.tf import dqn
from bsuite.baselines.tf import boot_dqn

SAVE_PATH_DQN = './logs/test_boot'
env = bsuite.load_and_record("deep_sea/0", save_path=SAVE_PATH_DQN, overwrite=True)
agent = boot_dqn.default_agent(
      obs_spec=env.observation_spec(),
      action_spec=env.action_spec()
)
experiment.run(agent, env, num_episodes=env.bsuite_num_episodes)

I reran this multiple times and have had a few runs with > 1024 bad episodes.

environment/experiment with continuous action space, Box()

Hi, I was wondering whether there are any environments/experiments with a continuous action space, Box(), or whether there are plans to include them?

Documentation: Clarify mapping from high-level agent properties to experiments and environments

Hello Ian and others!

I'm having a look at bsuite after Ian Osband's talk at the Simons Institute Deep RL workshop. After spending a few minutes browsing the documentation and source code here on GitHub I had a suggestion for improving the documentation.

My first question when browsing this project was: "The radar plot in the README is lovely; I wonder which experiments contribute to a good ____________ score", where the blank is, e.g., 'generalization'.

After browsing the source code for a few minutes, this isn't immediately obvious. I can see a little information about this in the example Colab notebook. It would be nice to promote this mapping to a 'first class' member of the documentation somewhere :)

Can't reproduce DQN performance

I noticed you changed the optimizer and some hyperparameters in DQN compared to those in the "Nature" paper. On my side, I can't reproduce the results with either of the two settings. Could you share a learning curve for "Breakout"? I have been struggling with hyperparameter optimization for two months. Thanks.

dm_env: Broken Link

The link to dm_env in README.md is broken.

Cartpole environment observation parameters

I might have missed this, but are bsuite's cartpole observation parameters the same as those in OpenAI's Gym?

Thanks,
kb

bsuite uninstalls tensorflow-gpu==2.0.0-rc1 when installing baselines via cloned repo

This is probably my goof, but I wanted to play with this + baselines, so I cloned the repo and installed it on Ubuntu 16.04, and the step to install baselines overwrote my pip-installed TF 2.0.0-rc1.

Better support for TF 2.0 would be awesome. I like the idea of bsuite a lot. Thanks for sharing.

ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

When running in openai_ppo
python3 ./run.py --bsuite_id=SWEEP

I get this error:
ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

I ran test.sh and all the tests pass.

python 3.6
tensorflow 1.14
Ubuntu 18.10

Here is the log:

[Last finished: bandit/15]: 3%|███▋ | 15/468 [34:10<88:11:07, 700.81s/it]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 153, in <listcomp>
    return [fn(*args) for args in chunk]
  File "./run.py", line 81, in run
    gamma=FLAGS.agent_discount,
  File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/ppo2.py", line 108, in learn
  File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/model.py", line 111, in __init__
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 595, in apply_gradients
    self._create_slots(var_list)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/adam.py", line 135, in _create_slots
    self._zeros_slot(v, "m", self._name)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 1153, in _zeros_slot
    new_slot_variable = slot_creator.create_zeros_slot(var, op_name)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py", line 183, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py", line 157, in create_slot_with_initializer
    dtype)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py", line 65, in _create_slot_var
    validate_shape=validate_shape)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1479, in get_variable
    aggregation=aggregation)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1220, in get_variable
    aggregation=aggregation)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 547, in get_variable
    aggregation=aggregation)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 499, in _true_getter
    aggregation=aggregation)
  File "/home/amdfanboy/vbsuite/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 848, in _get_single_variable
    traceback.format_list(tb))))
ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

  File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/model.py", line 111, in __init__
  File "/home/amdfanboy/github/baselines-youtube/baselines/ppo2/ppo2.py", line 108, in learn
  File "./run.py", line 81, in run
    gamma=FLAGS.agent_discount,

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./run.py", line 107, in <module>
    app.run(main)
  File "/home/amdfanboy/.local/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/amdfanboy/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "./run.py", line 100, in main
    pool.map_mpi(run, bsuite_sweep)
  File "/home/amdfanboy/github/bsuite/bsuite/baselines/utils/pool.py", line 53, in map_mpi
    for bsuite_id in pool.map(run_fn, bsuite_ids):
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 366, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
ValueError: Variable ppo2_model/pi/mlp_fc0/w/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

setup.py broken after last commit

Good morning,

The last commit introduced an error in the setup.py file.
Below is the error I get when running the pip install command.

pip install \bsuite
    ERROR: Command errored out with exit status 1:
     command: 'C:\Users\Torquato\Anaconda3\envs\nlp\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Torquato\\Downloads\\bsuite\\setup.py'"'"'; __file__='"'"'C:\\Users\\Torquato\\Downloads\\bsuite\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
         cwd: C:\Users\Torquato\Downloads\bsuite\
    Complete output (1 lines):
    error in bsuite setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I believe that the line that causes the problem is highlighted in the code below.

baselines_jax_require = [
    'dm-haiku',
    'dm-tree',
    'jax',
    'jaxlib',
    'git+git://github.com/deepmind/rlax.git'   <------------------
    'tqdm',
]

Pip is not able to recognise that line as a valid requirement specifier.

Thank you very much.
Michelangelo Conserva
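
A side observation on the snippet as quoted: the highlighted line has no trailing comma, so Python joins it with the next string literal. The sketch below reproduces the list exactly as pasted (minus the arrow annotation) to show the effect. Whether the comma is actually missing in setup.py, or was lost when the snippet was pasted, is not known from this report.

```python
# The list as quoted in the issue, without the arrow annotation.
baselines_jax_require = [
    'dm-haiku',
    'dm-tree',
    'jax',
    'jaxlib',
    'git+git://github.com/deepmind/rlax.git'
    'tqdm',
]

# Adjacent string literals are concatenated, so the list has 5 entries, not 6.
for requirement in baselines_jax_require:
    print(repr(requirement))
```

Restoring the comma alone would still leave a bare VCS URL, which is not a name-based requirement specifier. The trfl dependency in the install log elsewhere on this page uses the `name @ url` form instead.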

The signature for `update` does not allow for sarsa or n-step methods?

Hi There!

Thanks very much for bsuite, it is a great resource for reproducible research.

I have a question about the framework.
I am setting up pedagogical implementations of canonical RL algorithms, including SARSA.

Is there any design pattern you had in mind for n-step methods or any method that requires access to experience from longer transitions?
I am currently solving the issue with sarsa by computing the next action with the select_action method in the update function.
What about n-step methods or model-based methods?
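
One possible pattern, offered as a sketch and not as a design endorsed by the bsuite authors, is to delay each update by one step so that the next action is already known when the update is applied. The code below assumes that update is called once per transition with (timestep, action, new_timestep) in order; Step is a hypothetical stand-in for a dm_env-style timestep, and DelayedSarsa is a hypothetical tabular agent.

```python
import collections

import numpy as np

# Hypothetical stand-in for a dm_env-style TimeStep (not the real class).
Step = collections.namedtuple("Step", ["observation", "reward", "last"])


class DelayedSarsa:
    """Tabular SARSA that completes each update one step late."""

    def __init__(self, num_states, num_actions, step_size=0.5, discount=0.9):
        self.q = np.zeros((num_states, num_actions))
        self.step_size = step_size
        self.discount = discount
        self._pending = None  # (state, action, reward, next_state)

    def _apply(self, state, action, target):
        self.q[state, action] += self.step_size * (target - self.q[state, action])

    def update(self, step, action, new_step):
        # `action` was taken in `step`, so it is the "next action" of the
        # transition stored on the previous call.
        if self._pending is not None:
            s, a, r, s_next = self._pending
            self._apply(s, a, r + self.discount * self.q[s_next, action])
            self._pending = None
        if new_step.last:
            # No next action exists at episode end, so bootstrap from zero.
            self._apply(step.observation, action, new_step.reward)
        else:
            self._pending = (step.observation, action,
                             new_step.reward, new_step.observation)


# Two identical two-step episodes: 0 --a1--> 1 --a0--> terminal (reward 1).
agent = DelayedSarsa(num_states=3, num_actions=2)
for _ in range(2):
    agent.update(Step(0, None, False), 1, Step(1, 0.0, False))
    agent.update(Step(1, 0.0, False), 0, Step(2, 1.0, True))
print(agent.q)
```

The same buffering idea extends to n-step returns by keeping the last n transitions instead of one, at the cost of updates that lag further behind the environment.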

Converting Openai gym environments to bsuite environments

I thought that it might be useful to be able to use OpenAI Gym environments within bsuite, since there are so many of them. I noticed that there is a wrapper here that converts bsuite environments to OpenAI Gym environments, so in my fork I made a reverse wrapper that converts an OpenAI Gym environment to a bsuite environment here. It's pretty untested right now, but if you are interested I would be happy to clean it up and make a PR. I think this could be a useful feature.

Using the agent's RNG, and not numpy's, to select actions

Hi Ian,

I was trying to run the baseline agents on some of my environments. However, I couldn't get exact reproducibility. I think this is because numpy's own RNG is used for action selection, e.g., here:
https://github.com/deepmind/bsuite/blob/f4d12fb029c533ec610902a9565860bf377db556/bsuite/baselines/tf/dqn/agent.py#L78

Is this by design?

Greetings,
Raghu.
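
To illustrate the distinction raised here, the following sketch shows an epsilon-greedy policy that draws only from its own seeded generator. It is an assumption-laden illustration and not the bsuite DQN agent: EpsilonGreedy and rollout are hypothetical names, and the Q-values are fixed placeholders.

```python
import numpy as np


class EpsilonGreedy:
    """Epsilon-greedy selection driven by a private, seeded RNG."""

    def __init__(self, num_actions, epsilon, seed):
        self._num_actions = num_actions
        self._epsilon = epsilon
        self._rng = np.random.RandomState(seed)  # owned by this object

    def select_action(self, q_values):
        if self._rng.rand() < self._epsilon:
            return int(self._rng.randint(self._num_actions))
        return int(np.argmax(q_values))


def rollout(seed, steps=50):
    policy = EpsilonGreedy(num_actions=4, epsilon=0.5, seed=seed)
    q_values = np.array([0.1, 0.4, 0.2, 0.3])  # placeholder values
    actions = []
    for _ in range(steps):
        np.random.rand()  # unrelated draw from the global RNG
        actions.append(policy.select_action(q_values))
    return actions


print(rollout(0) == rollout(0))
```

Because the policy never touches the global np.random state, other code that draws from the global generator cannot change the action sequence for a given seed.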

Question about DQN's loss

Hi, I have one simple question about DQN's loss here.

Why do you use tf.reduce_sum instead of tf.reduce_mean here?
Is there a reason for it? Were the experiments in the paper run with a loss that sums over the batch?

Sorry for asking such a simple question, but I would really appreciate it if you answer my question.

Anyway, this is a great project !!
Thank you :)
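
One consequence of the choice can be stated without knowing the authors' reasons: for a fixed batch size, a summed loss equals the mean loss times the batch size, so its gradient is scaled by the same factor. The sketch below checks this with NumPy on a linear squared-error loss, which is an illustrative stand-in and not the DQN loss from the repository; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 32
features = rng.normal(size=(batch_size, 4))
targets = rng.normal(size=batch_size)
weights = rng.normal(size=4)

td_error = features @ weights - targets

# Gradient of 0.5 * sum(td_error ** 2) with respect to the weights.
grad_sum = features.T @ td_error
# Gradient of 0.5 * mean(td_error ** 2) with respect to the weights.
grad_mean = features.T @ td_error / batch_size

print(np.allclose(grad_sum, batch_size * grad_mean))
```

With plain SGD this is equivalent to rescaling the learning rate by the batch size; adaptive optimizers respond differently to gradient scale.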

dependency on trfl breaks TF2

The public trfl builds don't support tf2.*, and hence pip install bsuite[baselines] fails.

(base) ➜  Developer conda create -y -n bsuite python=3.6.9
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.7.12
  latest version: 4.8.3

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /usr/local/Caskroom/miniconda/base/envs/bsuite

  added / updated specs:
    - python=3.6.9


The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/osx-64::ca-certificates-2020.1.1-0
  certifi            pkgs/main/osx-64::certifi-2020.4.5.1-py36_0
  libcxx             pkgs/main/osx-64::libcxx-4.0.1-hcfea43d_1
  libcxxabi          pkgs/main/osx-64::libcxxabi-4.0.1-hcfea43d_1
  libedit            pkgs/main/osx-64::libedit-3.1.20181209-hb402a30_0
  libffi             pkgs/main/osx-64::libffi-3.2.1-h475c297_4
  ncurses            pkgs/main/osx-64::ncurses-6.2-h0a44026_0
  openssl            pkgs/main/osx-64::openssl-1.1.1f-h1de35cc_0
  pip                pkgs/main/osx-64::pip-20.0.2-py36_1
  python             pkgs/main/osx-64::python-3.6.9-h359304d_0
  readline           pkgs/main/osx-64::readline-7.0-h1de35cc_5
  setuptools         pkgs/main/osx-64::setuptools-46.1.3-py36_0
  sqlite             pkgs/main/osx-64::sqlite-3.31.1-ha441bb4_0
  tk                 pkgs/main/osx-64::tk-8.6.8-ha441bb4_0
  wheel              pkgs/main/osx-64::wheel-0.34.2-py36_0
  xz                 pkgs/main/osx-64::xz-5.2.4-h1de35cc_4
  zlib               pkgs/main/osx-64::zlib-1.2.11-h1de35cc_3


Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate bsuite
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) ➜  Developer conda activate bsuite
(bsuite) ➜  Developer pip install git+https://github.com/deepmind/bsuite.git#egg=bsuite[baselines]
Collecting bsuite[baselines]
  Cloning https://github.com/deepmind/bsuite.git to /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/bsuite
  Running command git clone -q https://github.com/deepmind/bsuite.git /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/bsuite
Processing /Users/omega/Library/Caches/pip/wheels/8e/28/49/fad4e7f0b9a1227708cbbee4487ac8558a7334849cb81c813d/absl_py-0.9.0-cp36-none-any.whl
Collecting dm_env
  Using cached dm_env-1.2-py3-none-any.whl (22 kB)
Collecting matplotlib
  Using cached matplotlib-3.2.1-cp36-cp36m-macosx_10_9_x86_64.whl (12.4 MB)
Collecting numpy
  Using cached numpy-1.18.2-cp36-cp36m-macosx_10_9_x86_64.whl (15.2 MB)
Collecting pandas
  Using cached pandas-1.0.3-cp36-cp36m-macosx_10_9_x86_64.whl (10.2 MB)
Collecting plotnine
  Using cached plotnine-0.6.0-py3-none-any.whl (4.1 MB)
Collecting scipy
  Using cached scipy-1.4.1-cp36-cp36m-macosx_10_6_intel.whl (28.5 MB)
Collecting scikit-image
  Using cached scikit_image-0.16.2-cp36-cp36m-macosx_10_6_intel.whl (30.4 MB)
Collecting six
  Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Processing /Users/omega/Library/Caches/pip/wheels/7c/06/54/bc84598ba1daf8f970247f550b175aaaee85f68b4b0c5ab2c6/termcolor-1.1.0-cp36-none-any.whl
Collecting dm-sonnet
  Using cached dm_sonnet-2.0.0-py3-none-any.whl (254 kB)
Collecting dm-tree
  Using cached dm_tree-0.1.4-cp36-cp36m-macosx_10_9_x86_64.whl (93 kB)
Collecting tensorflow
  Using cached tensorflow-2.1.0-cp36-cp36m-macosx_10_11_x86_64.whl (120.8 MB)
Collecting tensorflow_probability
  Using cached tensorflow_probability-0.9.0-py2.py3-none-any.whl (3.2 MB)
Collecting trfl@ git+git://github.com/deepmind/trfl.git#egg=trfl
  Cloning git://github.com/deepmind/trfl.git to /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl
  Running command git clone -q git://github.com/deepmind/trfl.git /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl
Collecting tqdm
  Using cached tqdm-4.45.0-py2.py3-none-any.whl (60 kB)
Collecting kiwisolver>=1.0.1
  Using cached kiwisolver-1.2.0-cp36-cp36m-macosx_10_9_x86_64.whl (60 kB)
Collecting python-dateutil>=2.1
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1
  Using cached pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
Collecting cycler>=0.10
  Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting pytz>=2017.2
  Using cached pytz-2019.3-py2.py3-none-any.whl (509 kB)
Collecting descartes>=1.1.0
  Using cached descartes-1.1.0-py3-none-any.whl (5.8 kB)
Collecting statsmodels>=0.9.0
  Using cached statsmodels-0.11.1-cp36-cp36m-macosx_10_13_x86_64.whl (8.4 MB)
Collecting patsy>=0.4.1
  Using cached patsy-0.5.1-py2.py3-none-any.whl (231 kB)
Collecting mizani>=0.6.0
  Using cached mizani-0.6.0-py2.py3-none-any.whl (61 kB)
Collecting PyWavelets>=0.4.0
  Using cached PyWavelets-1.1.1-cp36-cp36m-macosx_10_9_x86_64.whl (4.3 MB)
Collecting imageio>=2.3.0
  Using cached imageio-2.8.0-py3-none-any.whl (3.3 MB)
Collecting networkx>=2.0
  Using cached networkx-2.4-py3-none-any.whl (1.6 MB)
Collecting pillow>=4.3.0
  Using cached Pillow-7.1.1-cp36-cp36m-macosx_10_10_x86_64.whl (2.2 MB)
Processing /Users/omega/Library/Caches/pip/wheels/32/42/7f/23cae9ff6ef66798d00dc5d659088e57dbba01566f6c60db63/wrapt-1.12.1-cp36-cp36m-macosx_10_7_x86_64.whl
Collecting tabulate>=0.7.5
  Using cached tabulate-0.8.7-py3-none-any.whl (24 kB)
Collecting keras-applications>=1.0.8
  Using cached Keras_Applications-1.0.8-py3-none-any.whl (50 kB)
Processing /Users/omega/Library/Caches/pip/wheels/5c/2e/7e/a1d4d4fcebe6c381f378ce7743a3ced3699feb89bcfbdadadd/gast-0.2.2-cp36-none-any.whl
Collecting grpcio>=1.8.6
  Using cached grpcio-1.28.1-cp36-cp36m-macosx_10_9_x86_64.whl (2.6 MB)
Collecting protobuf>=3.8.0
  Using cached protobuf-3.11.3-cp36-cp36m-macosx_10_9_x86_64.whl (1.3 MB)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /usr/local/Caskroom/miniconda/base/envs/bsuite/lib/python3.6/site-packages (from tensorflow->bsuite[baselines]) (0.34.2)
Collecting tensorboard<2.2.0,>=2.1.0
  Using cached tensorboard-2.1.1-py3-none-any.whl (3.8 MB)
Collecting keras-preprocessing>=1.1.0
  Using cached Keras_Preprocessing-1.1.0-py2.py3-none-any.whl (41 kB)
Collecting opt-einsum>=2.3.2
  Using cached opt_einsum-3.2.0-py3-none-any.whl (63 kB)
Collecting tensorflow-estimator<2.2.0,>=2.1.0rc0
  Using cached tensorflow_estimator-2.1.0-py2.py3-none-any.whl (448 kB)
Collecting astor>=0.6.0
  Using cached astor-0.8.1-py2.py3-none-any.whl (27 kB)
Collecting google-pasta>=0.1.6
  Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting cloudpickle>=1.2.2
  Using cached cloudpickle-1.3.0-py2.py3-none-any.whl (26 kB)
Collecting decorator
  Using cached decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Collecting palettable
  Using cached palettable-3.3.0-py2.py3-none-any.whl (111 kB)
Collecting h5py
  Using cached h5py-2.10.0-cp36-cp36m-macosx_10_6_intel.whl (3.0 MB)
Requirement already satisfied: setuptools in /usr/local/Caskroom/miniconda/base/envs/bsuite/lib/python3.6/site-packages (from protobuf>=3.8.0->tensorflow->bsuite[baselines]) (46.1.3.post20200330)
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Using cached google_auth_oauthlib-0.4.1-py2.py3-none-any.whl (18 kB)
Collecting google-auth<2,>=1.6.3
  Downloading google_auth-1.14.0-py2.py3-none-any.whl (88 kB)
     |████████████████████████████████| 88 kB 707 kB/s
Collecting werkzeug>=0.11.15
  Using cached Werkzeug-1.0.1-py2.py3-none-any.whl (298 kB)
Collecting markdown>=2.6.8
  Using cached Markdown-3.2.1-py2.py3-none-any.whl (88 kB)
Collecting requests<3,>=2.21.0
  Using cached requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting requests-oauthlib>=0.7.0
  Using cached requests_oauthlib-1.3.0-py2.py3-none-any.whl (23 kB)
Collecting rsa<4.1,>=3.1.4
  Using cached rsa-4.0-py2.py3-none-any.whl (38 kB)
Collecting pyasn1-modules>=0.2.1
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting cachetools<5.0,>=2.0.0
  Using cached cachetools-4.1.0-py3-none-any.whl (10 kB)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/Caskroom/miniconda/base/envs/bsuite/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->bsuite[baselines]) (2020.4.5.1)
Collecting idna<3,>=2.5
  Using cached idna-2.9-py2.py3-none-any.whl (58 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Using cached urllib3-1.25.8-py2.py3-none-any.whl (125 kB)
Collecting chardet<4,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting oauthlib>=3.0.0
  Using cached oauthlib-3.1.0-py2.py3-none-any.whl (147 kB)
Collecting pyasn1>=0.1.3
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Building wheels for collected packages: bsuite, trfl
  Building wheel for bsuite (setup.py) ... done
  Created wheel for bsuite: filename=bsuite-0.0.0-py3-none-any.whl size=177123 sha256=1d0d8738f92032e854e3ec9211a76fbc031b484dbd62dc75b27acfaf729bab93
  Stored in directory: /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-ephem-wheel-cache-mkj55_9i/wheels/7b/5e/ac/15fb44dea4f625a5cf4801445436f8a50d023233f734fc7d41
  Building wheel for trfl (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/local/Caskroom/miniconda/base/envs/bsuite/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"'; __file__='"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-wheel-7q14rby_
       cwd: /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/
  Complete output (5 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  error: could not create 'build': File exists
  ----------------------------------------
  ERROR: Failed building wheel for trfl
  Running setup.py clean for trfl
Successfully built bsuite
Failed to build trfl
Installing collected packages: six, absl-py, dm-tree, numpy, dm-env, kiwisolver, python-dateutil, pyparsing, cycler, matplotlib, pytz, pandas, scipy, descartes, patsy, statsmodels, palettable, mizani, plotnine, PyWavelets, pillow, imageio, decorator, networkx, scikit-image, termcolor, wrapt, tabulate, dm-sonnet, h5py, keras-applications, gast, grpcio, protobuf, pyasn1, rsa, pyasn1-modules, cachetools, google-auth, idna, urllib3, chardet, requests, oauthlib, requests-oauthlib, google-auth-oauthlib, werkzeug, markdown, tensorboard, keras-preprocessing, opt-einsum, tensorflow-estimator, astor, google-pasta, tensorflow, cloudpickle, tensorflow-probability, trfl, tqdm, bsuite
    Running setup.py install for trfl ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/local/Caskroom/miniconda/base/envs/bsuite/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"'; __file__='"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-record-_kglrl6r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/Caskroom/miniconda/base/envs/bsuite/include/python3.6m/trfl
         cwd: /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/
    Complete output (5 lines):
    running install
    running build
    running build_py
    creating build
    error: could not create 'build': File exists
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/Caskroom/miniconda/base/envs/bsuite/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"'; __file__='"'"'/private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-install-picf14vk/trfl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/24/1z9y7lhs7371zshgyqfgrzb40000gn/T/pip-record-_kglrl6r/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/Caskroom/miniconda/base/envs/bsuite/include/python3.6m/trfl Check the logs for full command output.
(bsuite) ➜  Developer python -c "import trfl"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'trfl'

Cannot import Random

After installation, I was getting a 'cannot import Random' error. There was a conflict between your bsuite/baselines/random/random.py and the system random.Random class (as suggested here). After refactoring to random_baseline.RandomBaseline, everything seems to work.

`Catch._observation` does not follow the other environments with `_get_observation`

I didn't find a common parent interface for all the bsuite environments, but a common pattern is to have a _get_observation method that collects the current observation.

Catch, however, is the only environment to have an _observation method in place of a _get_observation one.
https://github.com/deepmind/bsuite/blob/6d8f64997ca256473c3d10be021431facc5a14d7/bsuite/environments/catch.py#L109-L114

Is there any specific reason why?
If not, would it be reasonable to homogenise the interface and make Catch compliant?

Context:
I usually use a simple interface to interoperate between gym, bsuite, dm_env and other common libraries, and the lack of a shared interface for bsuite.Environments is an obstacle.

See also #44 for a tentative edit of Catch.
It does not modify the parent Environment yet.

Thanks,
Edu
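
Until the interface is unified, a caller could paper over the naming difference with a small helper. This is a hedged sketch: get_observation, FakeCatch and FakeCartpole are hypothetical names used for illustration, and relying on private methods is fragile by nature.

```python
def get_observation(env):
    """Returns the current observation via whichever private method exists."""
    for name in ("_get_observation", "_observation"):
        method = getattr(env, name, None)
        if callable(method):
            return method()
    raise AttributeError(
        f"{type(env).__name__} has no known observation method")


class FakeCatch:  # hypothetical stand-in exposing only `_observation`
    def _observation(self):
        return "catch-board"


class FakeCartpole:  # hypothetical stand-in exposing `_get_observation`
    def _get_observation(self):
        return "cartpole-state"


print(get_observation(FakeCatch()), get_observation(FakeCartpole()))
```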

Adding a pendulum environment/experiment

I wanted to propose adding a pendulum experiment to bsuite. I think it fits the targeted, simple, challenging, scalable, fast criteria outlined in the bsuite paper. Also, now that #8 has been merged, DMEnvFromGym can be used to convert the OpenAI pendulum environment to a bsuite environment without having to reimplement it, i.e., something like:

env = DMEnvFromGym(gym.make('Pendulum-v0'))

If there is interest, I would be happy to work on it. Also, let me know if there are any concerns with implementing homegrown environments vs. importing them from third parties like OpenAI.

How is the 'generalization' score computed?

In the notebook (bit.ly/bsuite-agents), I only found descriptions of 6 scores (basic, noise, scale, exploration, memory, and credit assignment). How is the generalization score computed? Thank you!

DQN mnist & mountain car performance

Hi,

while working on a PyTorch DQN agent for BSuite experiments, I noticed quite bad results on the mnist and mountain car experiments. I see that a similar question was addressed here, but the thread was closed.

To investigate further, I created a new conda environment, downloaded and installed a fresh copy of bsuite, and ran the DQN agent from the baselines. The only settings I changed were "bsuite_id" (to "SWEEP") and the save path.

When you compare the results from both agents with the bar plot on page 16 of the bsuite manuscript, you notice that both agents perform worse on mnist and mountain car and better on catch.

Were there any changes to the environments that I missed? The DQN agent from the manuscript did use the default parameters from the baselines directory, correct?

Thanks,
Peter

In deep_sea the threshold is set to 0.8 instead of 0.9 as the paper indicates.

The threshold is set to 0.8 by default instead of 0.9 as the paper indicates: "The summary ‘score’ computes the percentage of runs for which the average regret drops below 0.9 faster than the 2^N episodes expected by dithering."

def find_solution(df_in: pd.DataFrame,
sweep_vars: Sequence[Text] = None,
merge: bool = True,
thresh: float = 0.8) -> pd.DataFrame:

Rendering control environments

In dm_control there was the ability to render control tasks like cartpole swingup, even though the environment only had a dynamics-based observation space. It would be nice to have that ability here, especially since the environment is from dm_control.
It looks like the rendering function in the Gym wrapper just returns the last observation (in both human and rgb_array mode), which doesn't really work for a lot of tasks in bsuite when the observation is not an rgb_array.

Is there some way to grab RGB output for bsuite environments?

For now, I hacked the cartpole-specific viewer from Gym into cartpole.py and cartpole_swingup.py. It works and it doesn't look horrible, but it's not exactly ideal.

Environment seeding

How do I set a seed in a bsuite environment instance? In the notebook, the output of sweep.SETTINGS has a seed attribute which is not None:

Loaded bsuite_id: bandit_noise/0.
bsuite_id=bandit_noise/0, settings={'noise_scale': 0.1, 'seed': 0}, num_episodes=10000
Loaded bsuite_id: bandit_noise/1.
bsuite_id=bandit_noise/1, settings={'noise_scale': 0.1, 'seed': 1}, num_episodes=10000
Loaded bsuite_id: bandit_noise/2.
bsuite_id=bandit_noise/2, settings={'noise_scale': 0.1, 'seed': 2}, num_episodes=10000
Loaded bsuite_id: bandit_noise/3.
bsuite_id=bandit_noise/3, settings={'noise_scale': 0.1, 'seed': 3}, num_episodes=10000

but when I printed it on my own computer, seed was None (when I print it again in the notebook, seed is also None, but there is an extra mapping_seed which is not None).

I tried two methods to seed the environment: (1) setting sweep.SETTINGS[bsuite_id]['seed']=0; (2) calling env.seed() after wrapping it as an OpenAI gym env. Neither worked (multiple experiments, same seed, different results). A minimal example demonstrating that neither seeding method works:

import random
import torch as t
import numpy as np
import bsuite
from bsuite import sweep
from bsuite.utils import gym_wrapper

def set_seed(seed, deterministic=True):
    random.seed(seed)
    np.random.seed(seed)
    t.manual_seed(seed)
    t.cuda.manual_seed_all(seed)
    t.cuda.manual_seed(seed)
    if deterministic:
        t.backends.cudnn.deterministic = True
        t.backends.cudnn.benchmark = False

set_seed(0)
bsuite_id = 'cartpole_swingup/0'
raw_env = bsuite.load_from_id(bsuite_id)

# method 1
sweep.SETTINGS[bsuite_id]['seed']=0
for episode in range(10):
    timestep = raw_env.reset()
    total_reward = 0
    while not timestep.last():
        action = np.random.choice(raw_env.action_spec().num_values)
        timestep = raw_env.step(action)
        total_reward += timestep.reward
    print(episode,total_reward)

# method 2
env = gym_wrapper.GymFromDMEnv(raw_env)
env.seed(seed=0)
for episode in range(10):
    timestep = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = np.random.choice(raw_env.action_spec().num_values)
        sn,r,done,_ = env.step(action)
        total_reward += r
    print(episode,total_reward)

bsuite_tutorial.ipynb error - load bsuite environments as OpenAI gym-

Hey there!
Thanks for open-sourcing this tool for better understanding the behavior of RL agents :)

There seems to be an error in the Colab tutorial when executing the "load bsuite environments as OpenAI gym" cell:

#@title Simple to load bsuite environments as OpenAI gym

from bsuite.utils import gym_wrapper
raw_env = bsuite.load_from_id(bsuite_id='memory_len/0')
env = gym_wrapper.GymWrapper(raw_env)
isinstance(env, gym.Env)

Might

env = gym_wrapper.GymWrapper(raw_env)

need to be

env = gym_wrapper.GymFromDMEnv(raw_env)

as the documentation indicates?

My apologies for not submitting a PR here; I was not able to access the Colab doc.

Have a nice day!

Westworld host attribute matrix

Hi DeepMind.

No issue here; I just came to say that this project reminds me, scarily, of Westworld's host attribute matrix 😄

Even some attributes are similar, such as exploration == curiosity.

Cheers!

Importing ABC directly from collections will be removed in Python 3.10

Use collections.abc instead

bsuite/logging/logging_utils.py
76:  if not isinstance(path_collection, collections.Mapping):
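A minimal sketch of the replacement check, using only the standard library. `is_mapping` is a hypothetical helper name for illustration; in bsuite the change would simply be to the isinstance call on the line above.

```python
import collections.abc


def is_mapping(path_collection):
    """True for dict-like objects, using the non-deprecated ABC location."""
    return isinstance(path_collection, collections.abc.Mapping)
```

collections.abc.Mapping has been available since Python 3.3, so the change does not drop support for any Python 3 version that the old spelling worked on.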

Tensorflow BOOT DQN agent loses performance after first iteration

Hi,

I am observing strange behavior from the default TensorFlow boot DQN agent that I am a bit baffled by.
When running sweeps over multiple environments, the agent loses its expected behavior after the first iteration and does not seem to explore. I've tried to debug this for some time but haven't figured out the cause.

Code for reproduction (double-checked in a newly installed env):

import bsuite
from bsuite.baselines.tf import boot_dqn
from bsuite import sweep
from bsuite.baselines import experiment

bsuite_id = "DEEP_SEA"
log_dir = "./logs/"
bsuite_sweep = getattr(sweep, bsuite_id)[:3]

for id in bsuite_sweep:
    env = bsuite.load_and_record(id, save_path=log_dir, overwrite=True)
    agent = boot_dqn.default_agent(
        obs_spec=env.observation_spec(),
        action_spec=env.action_spec(),
    )
    
    experiment.run(agent, env, num_episodes=300)

Iterations 2 and 3 do not reach the end of the chain within 300 episodes, nor over much longer training horizons (see also the Colab link for results).

In contrast, the jax agent produces the expected results reliably in this loop (i.e., by replacing <bsuite.baselines.tf> with <bsuite.baselines.jax>).

The same can be observed in colab:
https://colab.research.google.com/drive/1hnJMDLG-aXCKKsjFqVd6YWGY4luz29ku?usp=sharing

best,
anyboby

Incompatible with numpy>=1.24

np.int and analogous aliases have reached the end of their deprecation period and were removed in NumPy 1.24.0:
https://numpy.org/doc/stable/release/1.24.0-notes.html#expired-deprecations

This causes:

import bsuite

# Output
AttributeError: module 'numpy' has no attribute 'int'

How to add the results to results.py? What should the results format be?

What value should be assigned to "experiments = {}"?

code:
bsuite/analysis/results.ipynb

#@title loading results from local data:

experiments = {} # Add results here

DF, SWEEP_VARS = sqlite_load.load_bsuite(experiments)