
agents's Introduction

👋 Hi there

I am a Machine Learning Engineer at Bloomberg. I believe in learning by doing, and I always welcome new, exciting challenges that will help me grow.

I graduated from Princeton University with a Bachelor's Degree in Mathematics, with minors in (1) Computer Science and (2) Machine Learning.

⚡ Fun fact ⚡: When I type, I use all five fingers of my left hand, but just two fingers of my right hand.

📫 How to reach me

LinkedIn · Twitter

agents's People

Contributors

adammichaelwood, ageron, alexlee-gk, anandijain, bartokg, bkungfoo, cclauss, cornellgit, dantup, ebrevdo, efiko, egonina, eholly-g, jaingaurav, jmribeiro, kbanoop, kuanghuei, marek-at-work, mhe500, mmoffitt, nealwu, npfp, oars, pana1990, peterzhizhin, samfishman, seungjaeryanlee, sguada, tfboyd, vcarbune


agents's Issues

How to use `replay_buffer.as_dataset()` for minibatches

I tried to use the replay_buffer.as_dataset() the same way as the TD3 example:

dataset = replay_buffer.as_dataset(
    sample_batch_size=30,
    num_steps=64 + 1,
    num_parallel_calls=1
).prefetch(3)
iterator = iter(dataset)

def train_step():
  experience, _ = next(iterator)
  loss_info = tf_agent.train(experience)
  # TODO(seungjaeryanlee): Can't use for loop
  # AttributeError: Tensor.op is meaningless when eager execution is enabled.
  # for experience, _ in dataset:
  #   loss_info = tf_agent.train(experience)
  return loss_info
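
For reference, here is how I understand what each draw from the iterator above returns (a sanity-check sketch on my end; please correct me if I have the shapes wrong):

# Continues from the snippet above. With sample_batch_size=30 and
# num_steps=64 + 1, each field of the sampled trajectory should have a
# leading shape of [sample_batch_size, num_steps, ...] = [30, 65, ...].
experience, buffer_info = next(iterator)
print(experience.observation.shape)  # expected: (30, 65) + observation_shape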

For LunarLander-v2, I thought it would also work with sample_batch_size = 30 and num_steps = 128+1, but it gives the following error:

python tf_agents/agents/ppo/examples/v2/train_eval_gym.py   --root_dir=$HOME/tmp/rndppo/gym/LunarLander-v2/   --logtostderr --use_rnd
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2019-07-30 00:54:19.740202: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1696250000 Hz
2019-07-30 00:54:19.740708: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5637b089c420 executing computations on platform Host. Devices:
2019-07-30 00:54:19.740771: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
I0730 00:54:19.771466 140352013358912 parallel_py_environment.py:81] Spawning all processes.
I0730 00:54:20.329198 140352013358912 parallel_py_environment.py:88] All processes started.
W0730 00:54:20.984402 140352013358912 module_wrapper.py:136] From /home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/util/module_wrapper.py:163: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.

W0730 00:54:22.374413 140352013358912 deprecation.py:323] From /home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py:317: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0730 00:54:31.984498 140352013358912 deprecation.py:323] From /home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/training/optimizer.py:172: BaseResourceVariable.constraint (from tensorflow.python.ops.resource_variable_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Apply a constraint manually following the optimizer update step.
2019-07-30 00:55:14.501165: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: {{function_node __inference_Dataset_map_get_next_347}} assertion failed: [TFUniformReplayBuffer is empty. Make sure to add items before sampling the buffer.] [Condition x > y did not hold element-wise:x (TFUniformReplayBuffer/get_next/Select_1:0) = ] [0] [y (TFUniformReplayBuffer/get_next/Select:0) = ] [0]
         [[{{node TFUniformReplayBuffer/get_next/assert_greater/Assert/AssertGuard/else/_1/Assert}}]]
         [[IteratorGetNext]]
Traceback (most recent call last):
  File "tf_agents/agents/ppo/examples/v2/train_eval_gym.py", line 346, in <module>
    app.run(main)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tf_agents/agents/ppo/examples/v2/train_eval_gym.py", line 341, in main
    num_eval_episodes=FLAGS.num_eval_episodes)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/config.py", line 1032, in wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/config.py", line 1009, in wrapper
    return fn(*new_args, **new_kwargs)
  File "tf_agents/agents/ppo/examples/v2/train_eval_gym.py", line 292, in train_eval
    total_loss, _ = train_step()
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 451, in __call__
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 665, in _filtered_call
    self.captured_inputs)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 778, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 471, in call
    ctx=ctx)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError:   assertion failed: [TFUniformReplayBuffer is empty. Make sure to add items before sampling the buffer.] [Condition x > y did not hold element-wise:x (TFUniformReplayBuffer/get_next/Select_1:0) = ] [0] [y (TFUniformReplayBuffer/get_next/Select:0) = ] [0]
         [[{{node TFUniformReplayBuffer/get_next/assert_greater/Assert/AssertGuard/else/_1/Assert}}]]
         [[IteratorGetNext]] [Op:__inference_train_step_69111]

Function call stack:
train_step -> train_step

  In call to configurable 'train_eval' (<function train_eval at 0x7fa63fdd11e0>)

The error does not seem to appear when num_steps=64+1 or smaller.

On a similar note, in the TD3 example, am I understanding correctly that it only calls next(iterator) once, so it is only using a single minibatch?
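
If several minibatches per training call are wanted, I imagine something like the following (just a sketch on my side; num_minibatches is a name I made up, not a flag in the example scripts):

num_minibatches = 10  # made-up value, not a flag in the example scripts

def train_step_multi():
  # Draw several minibatches from the iterator and train on each in turn.
  loss_info = None
  for _ in range(num_minibatches):
    experience, _ = next(iterator)
    loss_info = tf_agent.train(experience)
  return loss_info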

Thank you!

PPO train_eval_atari throws error

NOTE: This might be an issue related to the exploding loss, since it only happened in Venture (which has an exploding loss) and not in Pong (no exploding loss).

E0828 09:13:05.150099 140230284850944 parallel_py_environment.py:390] Error in environment process: Traceback (most recent call last):
  File "/mnt/rlee0201/git/agents/tf_agents/environments/parallel_py_environment.py", line 377, in _worker
    result = getattr(env, name)(*args, **kwargs)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_wrappers.py", line 86, in _step
    time_step = self._env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/gym_wrapper.py", line 178, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_wrappers.py", line 57, in step
    observation, reward, done, info = self._env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_preprocessing.py", line 146, in step
    _, reward, game_over, info = self.env.step(action)
  File "/mnt/rlee0201/anaconda3/envs/gsoc/lib/python3.7/site-packages/gym/envs/atari/atari_env.py", line 113, in step
    action = self._action_set[a]
IndexError: index 18 is out of bounds for axis 0 with size 18

E0828 09:13:05.152348 140230284850944 parallel_py_environment.py:390] Error in environment process: Traceback (most recent call last):
  File "/mnt/rlee0201/git/agents/tf_agents/environments/parallel_py_environment.py", line 377, in _worker
    result = getattr(env, name)(*args, **kwargs)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_wrappers.py", line 86, in _step
    time_step = self._env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/gym_wrapper.py", line 178, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_wrappers.py", line 57, in step
    observation, reward, done, info = self._env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_preprocessing.py", line 146, in step
    _, reward, game_over, info = self.env.step(action)
  File "/mnt/rlee0201/anaconda3/envs/gsoc/lib/python3.7/site-packages/gym/envs/atari/atari_env.py", line 113, in step
    action = self._action_set[a]
IndexError: index 18 is out of bounds for axis 0 with size 18

tf.clip_by_value results in an error in _init_rnd_normalizer

Upon searching online, it seems like this happens when the memory is too small?

2019-07-25 04:09:29.063615: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-25 04:09:29.088399: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2712000000 Hz
2019-07-25 04:09:29.088872: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ec3676cd00 executing computations on platform Host. Devices:
2019-07-25 04:09:29.088950: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
I0725 04:09:29.253317 139884285708032 parallel_py_environment.py:81] Spawning all processes.
I0725 04:09:31.808124 139884285708032 parallel_py_environment.py:88] All processes started.
W0725 04:09:40.928066 139884285708032 deprecation.py:323] From /home/rlee/git/agents/tf_agents/replay_buffers/tf_uniform_replay_buffer.py:540: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Traceback (most recent call last):
  File "tf_agents/agents/ppo/examples/v2/train_eval_atari.py", line 335, in <module>
    app.run(main)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tf_agents/agents/ppo/examples/v2/train_eval_atari.py", line 330, in main
    num_eval_episodes=FLAGS.num_eval_episodes)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/config.py", line 1032, in wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/config.py", line 1009, in wrapper
    return fn(*new_args, **new_kwargs)
  File "tf_agents/agents/ppo/examples/v2/train_eval_atari.py", line 247, in train_eval
    tf_agent._init_rnd_normalizer(experience=trajectories)
  File "/home/rlee/git/agents/tf_agents/agents/ppo/rndppo_agent.py", line 330, in _init_rnd_normalizer
    intrinsic_rewards, _ = self.rnd_loss(time_steps, debug_summaries=self._debug_summaries)
  File "/home/rlee/git/agents/tf_agents/agents/ppo/rndppo_agent.py", line 777, in rnd_loss
    self._observation_clip_value)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/ops/clip_ops.py", line 83, in clip_by_value
    t_min = math_ops.minimum(values, clip_value_max)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6443, in minimum
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: Could not find valid device for node.
Node:{{node Minimum}}
All kernels registered for op Minimum :
  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='XLA_CPU'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_INT64, DT_BFLOAT16, DT_HALF]
  device='XLA_CPU_JIT'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_INT64, DT_BFLOAT16, DT_HALF]
 [Op:Minimum]
  In call to configurable 'train_eval' (<function train_eval at 0x7f39591f71e0>)
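
My guess (not verified) is that the observations reaching rnd_loss are uint8 Atari frames, and the kernel list above shows no uint8 registration for Minimum, which tf.clip_by_value uses internally. Casting before clipping might sidestep it; a runnable sketch with stand-in values:

import tensorflow as tf

# Sketch only: `observation` and `observation_clip_value` are stand-ins for
# the values used inside rnd_loss (time_steps.observation and
# self._observation_clip_value in the traceback above).
observation = tf.zeros([1, 84, 84, 4], dtype=tf.uint8)
observation_clip_value = 5.0
clipped = tf.clip_by_value(
    tf.cast(observation, tf.float32),
    -observation_clip_value,
    observation_clip_value)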

RND fails on LunarLander-v2

The problem is gone when I don't normalize the observation by dividing by 255. The high value estimation loss does not seem to matter.


It used to work, but now it gives worse performance than vanilla PPO. I suspect it has something to do with

  1. overly high value estimation loss OR
  2. observation normalization
[Plots attached, not reproduced here: Average Return and Value Estimation Loss curves for RND vs. PPO.]
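
For clarity, this is roughly the normalization toggle I am referring to (a sketch of my own preprocessing, not code from the repo; the observation tensor is a toy stand-in):

import tensorflow as tf

# Toy stand-in for a LunarLander-v2 observation batch, for illustration only.
raw_observation = tf.random.uniform([1, 8], minval=-1.0, maxval=1.0)
obs_normalized = raw_observation / 255.0   # variant where RND fails for me
obs_unnormalized = raw_observation         # variant where the problem is gone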

Action out of bounds when running PPO on Atari

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/tf_py_environment.py", line 203, in _step_py
    self._time_step = self._env.step(packed)

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/parallel_py_environment.py", line 135, in _step
    time_steps = [promise() for promise in time_steps]

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/parallel_py_environment.py", line 135, in <listcomp>
    time_steps = [promise() for promise in time_steps]

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/parallel_py_environment.py", line 337, in _receive
    raise Exception(stacktrace)

Exception: Traceback (most recent call last):
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/parallel_py_environment.py", line 376, in _worker
    result = getattr(env, name)(*args, **kwargs)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/atari_wrappers.py", line 86, in _step
    time_step = self._env.step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/gym_wrapper.py", line 178, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/atari_wrappers.py", line 57, in step
    observation, reward, done, info = self._env.step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/atari_preprocessing.py", line 146, in step
    _, reward, game_over, info = self.env.step(action)
  File "/home/seungjaeryanlee/anaconda3/envs/gsoc/lib/python3.6/site-packages/gym/envs/atari/atari_env.py", line 113, in step
    action = self._action_set[a]
IndexError: index 18 is out of bounds for axis 0 with size 18

         [[{{node driver_loop/body/_1/step/step_py_func}}]] [Op:__inference_run_44015]

Function call stack:
run

  In call to configurable 'train_eval' (<function train_eval at 0x7fcde7d1f7b8>)
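
As a debugging aid on my side, I am considering asserting that the sampled action stays inside the environment's action set before stepping. A runnable sketch with stand-in values (my own code, not the repo's):

import tensorflow as tf

# My own debugging sketch, not repo code. On this Atari game the action set
# has 18 entries (indices 0..17), so an action of 18 is out of range, which
# matches the IndexError above. A guard like this before env.step would
# catch it earlier.
num_actions = 18                          # stand-in for len(env._action_set)
action = tf.constant(17, dtype=tf.int64)  # replace with the policy's sampled action
tf.debugging.assert_less(action, tf.cast(num_actions, action.dtype))
tf.debugging.assert_greater_equal(action, tf.constant(0, dtype=action.dtype))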
