
agents's Introduction

👋 Hi there

I am a Machine Learning Engineer at Bloomberg. I believe in learning by doing, and I always welcome new, exciting challenges that will help me grow.

I graduated from Princeton University with a Bachelor's Degree in Mathematics, with minors in (1) Computer Science and (2) Machine Learning.

⚡ Fun fact ⚡: When I type, I use all five fingers of my left hand, but just two fingers of my right hand.

📫 How to reach me

LinkedIn · Twitter

agents's People

Contributors

adammichaelwood, ageron, alexlee-gk, anandijain, bartokg, bkungfoo, cclauss, cornellgit, dantup, ebrevdo, efiko, egonina, eholly-g, jaingaurav, jmribeiro, kbanoop, kuanghuei, marek-at-work, mhe500, mmoffitt, nealwu, npfp, oars, pana1990, peterzhizhin, samfishman, seungjaeryanlee, sguada, tfboyd, vcarbune


agents's Issues

How to use `replay_buffer.as_dataset()` for minibatches

I tried to use the replay_buffer.as_dataset() the same way as the TD3 example:

dataset = replay_buffer.as_dataset(
    sample_batch_size=30,
    num_steps=64 + 1,
    num_parallel_calls=1
).prefetch(3)
iterator = iter(dataset)

def train_step():
  experience, _ = next(iterator)
  loss_info = tf_agent.train(experience)
  # TODO(seungjaeryanlee): Can't use for loop
  # AttributeError: Tensor.op is meaningless when eager execution is enabled.
  # for experience, _ in dataset:
  #   loss_info = tf_agent.train(experience)
  return loss_info
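
For reference, here is how I understand what each draw from the iterator above returns (a sanity-check sketch on my end; please correct me if I have the shapes wrong):

# Continues from the snippet above. With sample_batch_size=30 and
# num_steps=64 + 1, each field of the sampled trajectory should have a
# leading shape of [sample_batch_size, num_steps, ...] = [30, 65, ...].
experience, buffer_info = next(iterator)
print(experience.observation.shape)  # expected: (30, 65) + observation_shape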

For LunarLander-v2, I thought it would also work with sample_batch_size = 30 and num_steps = 128+1, but it gives the following error:

python tf_agents/agents/ppo/examples/v2/train_eval_gym.py   --root_dir=$HOME/tmp/rndppo/gym/LunarLander-v2/   --logtostderr --use_rnd
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2019-07-30 00:54:19.740202: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1696250000 Hz
2019-07-30 00:54:19.740708: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5637b089c420 executing computations on platform Host. Devices:
2019-07-30 00:54:19.740771: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
I0730 00:54:19.771466 140352013358912 parallel_py_environment.py:81] Spawning all processes.
I0730 00:54:20.329198 140352013358912 parallel_py_environment.py:88] All processes started.
W0730 00:54:20.984402 140352013358912 module_wrapper.py:136] From /home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/util/module_wrapper.py:163: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.

W0730 00:54:22.374413 140352013358912 deprecation.py:323] From /home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py:317: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0730 00:54:31.984498 140352013358912 deprecation.py:323] From /home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/training/optimizer.py:172: BaseResourceVariable.constraint (from tensorflow.python.ops.resource_variable_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Apply a constraint manually following the optimizer update step.
2019-07-30 00:55:14.501165: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: {{function_node __inference_Dataset_map_get_next_347}} assertion failed: [TFUniformReplayBuffer is empty. Make sure to add items before sampling the buffer.] [Condition x > y did not hold element-wise:x (TFUniformReplayBuffer/get_next/Select_1:0) = ] [0] [y (TFUniformReplayBuffer/get_next/Select:0) = ] [0]
         [[{{node TFUniformReplayBuffer/get_next/assert_greater/Assert/AssertGuard/else/_1/Assert}}]]
         [[IteratorGetNext]]
Traceback (most recent call last):
  File "tf_agents/agents/ppo/examples/v2/train_eval_gym.py", line 346, in <module>
    app.run(main)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tf_agents/agents/ppo/examples/v2/train_eval_gym.py", line 341, in main
    num_eval_episodes=FLAGS.num_eval_episodes)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/config.py", line 1032, in wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/config.py", line 1009, in wrapper
    return fn(*new_args, **new_kwargs)
  File "tf_agents/agents/ppo/examples/v2/train_eval_gym.py", line 292, in train_eval
    total_loss, _ = train_step()
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 451, in __call__
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 665, in _filtered_call
    self.captured_inputs)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 778, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 471, in call
    ctx=ctx)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError:   assertion failed: [TFUniformReplayBuffer is empty. Make sure to add items before sampling the buffer.] [Condition x > y did not hold element-wise:x (TFUniformReplayBuffer/get_next/Select_1:0) = ] [0] [y (TFUniformReplayBuffer/get_next/Select:0) = ] [0]
         [[{{node TFUniformReplayBuffer/get_next/assert_greater/Assert/AssertGuard/else/_1/Assert}}]]
         [[IteratorGetNext]] [Op:__inference_train_step_69111]

Function call stack:
train_step -> train_step

  In call to configurable 'train_eval' (<function train_eval at 0x7fa63fdd11e0>)

The error does not seem to appear when num_steps=64+1 or smaller.

On a similar note, in the TD3 example, am I understanding correctly that it only calls next(iterator) once, so it is only using a single minibatch?
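
If several minibatches per training call are wanted, I imagine something like the following (just a sketch on my side; num_minibatches is a name I made up, not a flag in the example scripts):

num_minibatches = 10  # made-up value, not a flag in the example scripts

def train_step_multi():
  # Draw several minibatches from the iterator and train on each in turn.
  loss_info = None
  for _ in range(num_minibatches):
    experience, _ = next(iterator)
    loss_info = tf_agent.train(experience)
  return loss_info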

Thank you!

PPO train_eval_atari throws error

NOTE: This might be an issue related to the exploding loss, since it only happened in Venture (which has an exploding loss) and not in Pong (no exploding loss).

E0828 09:13:05.150099 140230284850944 parallel_py_environment.py:390] Error in environment process: Traceback (most recent call last):
  File "/mnt/rlee0201/git/agents/tf_agents/environments/parallel_py_environment.py", line 377, in _worker
    result = getattr(env, name)(*args, **kwargs)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_wrappers.py", line 86, in _step
    time_step = self._env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/gym_wrapper.py", line 178, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_wrappers.py", line 57, in step
    observation, reward, done, info = self._env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_preprocessing.py", line 146, in step
    _, reward, game_over, info = self.env.step(action)
  File "/mnt/rlee0201/anaconda3/envs/gsoc/lib/python3.7/site-packages/gym/envs/atari/atari_env.py", line 113, in step
    action = self._action_set[a]
IndexError: index 18 is out of bounds for axis 0 with size 18

E0828 09:13:05.152348 140230284850944 parallel_py_environment.py:390] Error in environment process: Traceback (most recent call last):
  File "/mnt/rlee0201/git/agents/tf_agents/environments/parallel_py_environment.py", line 377, in _worker
    result = getattr(env, name)(*args, **kwargs)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_wrappers.py", line 86, in _step
    time_step = self._env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/gym_wrapper.py", line 178, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_wrappers.py", line 57, in step
    observation, reward, done, info = self._env.step(action)
  File "/mnt/rlee0201/git/agents/tf_agents/environments/atari_preprocessing.py", line 146, in step
    _, reward, game_over, info = self.env.step(action)
  File "/mnt/rlee0201/anaconda3/envs/gsoc/lib/python3.7/site-packages/gym/envs/atari/atari_env.py", line 113, in step
    action = self._action_set[a]
IndexError: index 18 is out of bounds for axis 0 with size 18

tf.clip_by_value results in an error in _init_rnd_normalizer

Upon searching online, it seems like this happens when the memory is too small?

2019-07-25 04:09:29.063615: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-25 04:09:29.088399: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2712000000 Hz
2019-07-25 04:09:29.088872: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ec3676cd00 executing computations on platform Host. Devices:
2019-07-25 04:09:29.088950: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
I0725 04:09:29.253317 139884285708032 parallel_py_environment.py:81] Spawning all processes.
I0725 04:09:31.808124 139884285708032 parallel_py_environment.py:88] All processes started.
W0725 04:09:40.928066 139884285708032 deprecation.py:323] From /home/rlee/git/agents/tf_agents/replay_buffers/tf_uniform_replay_buffer.py:540: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Traceback (most recent call last):
  File "tf_agents/agents/ppo/examples/v2/train_eval_atari.py", line 335, in <module>
    app.run(main)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tf_agents/agents/ppo/examples/v2/train_eval_atari.py", line 330, in main
    num_eval_episodes=FLAGS.num_eval_episodes)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/config.py", line 1032, in wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/gin/config.py", line 1009, in wrapper
    return fn(*new_args, **new_kwargs)
  File "tf_agents/agents/ppo/examples/v2/train_eval_atari.py", line 247, in train_eval
    tf_agent._init_rnd_normalizer(experience=trajectories)
  File "/home/rlee/git/agents/tf_agents/agents/ppo/rndppo_agent.py", line 330, in _init_rnd_normalizer
    intrinsic_rewards, _ = self.rnd_loss(time_steps, debug_summaries=self._debug_summaries)
  File "/home/rlee/git/agents/tf_agents/agents/ppo/rndppo_agent.py", line 777, in rnd_loss
    self._observation_clip_value)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/ops/clip_ops.py", line 83, in clip_by_value
    t_min = math_ops.minimum(values, clip_value_max)
  File "/home/rlee/anaconda3/envs/gsoc/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6443, in minimum
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: Could not find valid device for node.
Node:{{node Minimum}}
All kernels registered for op Minimum :
  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='XLA_CPU'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_INT64, DT_BFLOAT16, DT_HALF]
  device='XLA_CPU_JIT'; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_INT64, DT_BFLOAT16, DT_HALF]
 [Op:Minimum]
  In call to configurable 'train_eval' (<function train_eval at 0x7f39591f71e0>)
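
My guess (not verified) is that the observations reaching rnd_loss are uint8 Atari frames, and the kernel list above shows no uint8 registration for Minimum, which tf.clip_by_value uses internally. Casting before clipping might sidestep it; a runnable sketch with stand-in values:

import tensorflow as tf

# Sketch only: `observation` and `observation_clip_value` are stand-ins for
# the values used inside rnd_loss (time_steps.observation and
# self._observation_clip_value in the traceback above).
observation = tf.zeros([1, 84, 84, 4], dtype=tf.uint8)
observation_clip_value = 5.0
clipped = tf.clip_by_value(
    tf.cast(observation, tf.float32),
    -observation_clip_value,
    observation_clip_value)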

RND fails on LunarLander-v2

The problem is gone when I don't normalize the observation by dividing by 255. The high value estimation loss does not seem to matter.


It used to work, but now it gives worse performance than vanilla PPO. I suspect it has something to do with

  1. overly high value estimation loss OR
  2. observation normalization
[Plots attached, not reproduced here: Average Return and Value Estimation Loss curves for RND vs. PPO.]
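
For clarity, this is roughly the normalization toggle I am referring to (a sketch of my own preprocessing, not code from the repo; the observation tensor is a toy stand-in):

import tensorflow as tf

# Toy stand-in for a LunarLander-v2 observation batch, for illustration only.
raw_observation = tf.random.uniform([1, 8], minval=-1.0, maxval=1.0)
obs_normalized = raw_observation / 255.0   # variant where RND fails for me
obs_unnormalized = raw_observation         # variant where the problem is gone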

Action out of bounds when running PPO on Atari

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/tf_py_environment.py", line 203, in _step_py
    self._time_step = self._env.step(packed)

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/parallel_py_environment.py", line 135, in _step
    time_steps = [promise() for promise in time_steps]

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/parallel_py_environment.py", line 135, in <listcomp>
    time_steps = [promise() for promise in time_steps]

  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/parallel_py_environment.py", line 337, in _receive
    raise Exception(stacktrace)

Exception: Traceback (most recent call last):
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/parallel_py_environment.py", line 376, in _worker
    result = getattr(env, name)(*args, **kwargs)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/atari_wrappers.py", line 86, in _step
    time_step = self._env.step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/gym_wrapper.py", line 178, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/atari_wrappers.py", line 57, in step
    observation, reward, done, info = self._env.step(action)
  File "/home/seungjaeryanlee/git/agents/tf_agents/environments/atari_preprocessing.py", line 146, in step
    _, reward, game_over, info = self.env.step(action)
  File "/home/seungjaeryanlee/anaconda3/envs/gsoc/lib/python3.6/site-packages/gym/envs/atari/atari_env.py", line 113, in step
    action = self._action_set[a]
IndexError: index 18 is out of bounds for axis 0 with size 18

         [[{{node driver_loop/body/_1/step/step_py_func}}]] [Op:__inference_run_44015]

Function call stack:
run

  In call to configurable 'train_eval' (<function train_eval at 0x7fcde7d1f7b8>)
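
As a debugging aid on my side, I am considering asserting that the sampled action stays inside the environment's action set before stepping. A runnable sketch with stand-in values (my own code, not the repo's):

import tensorflow as tf

# My own debugging sketch, not repo code. On this Atari game the action set
# has 18 entries (indices 0..17), so an action of 18 is out of range, which
# matches the IndexError above. A guard like this before env.step would
# catch it earlier.
num_actions = 18                          # stand-in for len(env._action_set)
action = tf.constant(17, dtype=tf.int64)  # replace with the policy's sampled action
tf.debugging.assert_less(action, tf.cast(num_actions, action.dtype))
tf.debugging.assert_greater_equal(action, tf.constant(0, dtype=action.dtype))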
