
tensorforce's Introduction

Tensorforce: a TensorFlow library for applied reinforcement learning


This project is no longer maintained!

Introduction

Tensorforce is an open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice. Tensorforce is built on top of Google's TensorFlow framework and requires Python 3.

Tensorforce follows a set of high-level design choices which differentiate it from other similar libraries:

  • Modular component-based design: Feature implementations, above all, strive to be as generally applicable and configurable as possible, potentially at some cost of faithfully resembling details of the introducing paper.
  • Separation of RL algorithm and application: Algorithms are agnostic to the type and structure of inputs (states/observations) and outputs (actions/decisions), as well as the interaction with the application environment.
  • Full-on TensorFlow models: The entire reinforcement learning logic, including control flow, is implemented in TensorFlow, to enable portable computation graphs independent of application programming language, and to facilitate the deployment of models.


Installation

A stable version of Tensorforce is periodically updated on PyPI and installed as follows:

pip3 install tensorforce

To always use the latest version of Tensorforce, install the GitHub version instead:

git clone https://github.com/tensorforce/tensorforce.git
pip3 install -e tensorforce

Note on installation on M1 Macs: At the moment TensorFlow, which is a core dependency of Tensorforce, cannot be installed directly on M1 Macs. Follow the "M1 Macs" section in the documentation for a workaround.

Environments may require additional packages, for which setup options are available (ale, gym, retro, vizdoom, carla; or envs for all environments); however, some also require additional tools to be installed separately (see the environments documentation). Other setup options include tfa for TensorFlow Addons and tune for HpBandSter, which is required for the tune.py script. An example of installing such extras is shown below.
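
For example, to install Tensorforce together with the extra dependencies for a particular environment or tool (a minimal illustration using the extra names listed above; check the installation documentation for the exact extras):

pip3 install tensorforce[gym]
pip3 install tensorforce[envs]        # all environment extras
pip3 install tensorforce[tfa,tune]    # TensorFlow Addons and HpBandSter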

Note on GPU usage: Different from (un)supervised deep learning, RL does not always benefit from running on a GPU, depending on environment and agent configuration. In particular for environments with low-dimensional state spaces (i.e., no images), it is hence worth trying to run on CPU only.

Quickstart example code

from tensorforce import Agent, Environment

# Pre-defined or custom environment
environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

# Instantiate a Tensorforce agent
agent = Agent.create(
    agent='tensorforce',
    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=3e-4),
    policy=dict(network='auto'),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20)
)

# Train for 300 episodes
for _ in range(300):

    # Initialize episode
    states = environment.reset()
    terminal = False

    while not terminal:
        # Episode timestep
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()
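
The same act-observe loop can alternatively be driven by Tensorforce's Runner utility, which also takes care of episode bookkeeping and progress reporting (a minimal sketch; see the runner documentation for the full set of arguments):

from tensorforce.execution import Runner

# Wraps the training loop above, including the reset/act/execute/observe calls
runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=300)
runner.close()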

Command line usage

Tensorforce comes with a range of example configurations for different popular reinforcement learning environments. For instance, to run Tensorforce's implementation of the popular Proximal Policy Optimization (PPO) algorithm on the OpenAI Gym CartPole environment, execute the following line:

python3 run.py --agent benchmarks/configs/ppo.json --environment gym \
    --level CartPole-v1 --episodes 100

For more information check out the documentation.

Features

  • Network layers: Fully-connected, 1- and 2-dimensional convolutions, embeddings, pooling, RNNs, dropout, normalization, and more; plus support for Keras layers.
  • Network architecture: Support for multi-state inputs and layer (block) reuse, simple definition of directed acyclic graph structures via register/retrieve layers, plus support for arbitrary architectures.
  • Memory types: Simple batch buffer memory, random replay memory.
  • Policy distributions: Bernoulli distribution for boolean actions, categorical distribution for (finite) integer actions, Gaussian distribution for continuous actions, Beta distribution for range-constrained continuous actions, multi-action support.
  • Reward estimation: Configuration options for estimation horizon, future reward discount, state/state-action/advantage estimation, and for whether to consider terminal and horizon states.
  • Training objectives: (Deterministic) policy gradient, state-(action-)value approximation.
  • Optimization algorithms: Various gradient-based optimizers provided by TensorFlow like Adam/AdaDelta/RMSProp/etc, evolutionary optimizer, natural-gradient-based optimizer, plus a range of meta-optimizers.
  • Exploration: Randomized actions, sampling temperature, variable noise.
  • Preprocessing: Clipping, deltafier, sequence, image processing.
  • Regularization: L2 and entropy regularization.
  • Execution modes: Parallelized execution of multiple environments based on Python's multiprocessing and socket.
  • Optimized act-only SavedModel extraction.
  • TensorBoard support.

By combining these modular components in different ways, a variety of popular deep reinforcement learning models/features can be replicated:

Note that, in general, the replication is not 100% faithful, since the models as described in the corresponding papers often involve additional minor tweaks and modifications which are hard to support with a modular design (and it is arguably questionable whether supporting them is important or desirable). On the upside, these models are just a few examples from the multitude of module combinations supported by Tensorforce; one such pre-configured combination is shown below.
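
For instance, rather than assembling the components by hand as in the quickstart, a pre-configured agent such as PPO can be instantiated directly from its agent key (a hedged sketch; the batch_size value is illustrative):

from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

# Pre-configured PPO agent; unspecified hyperparameters keep their defaults
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)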

Environment adapters

  • Arcade Learning Environment, a simple object-oriented framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games.
  • CARLA, an open-source simulator for autonomous driving research.
  • OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms which supports teaching agents everything from walking to playing games like Pong or Pinball.
  • OpenAI Retro, which lets you turn classic video games into Gym environments for reinforcement learning and comes with integrations for ~1000 games.
  • OpenSim, reinforcement learning with musculoskeletal models.
  • PyGame Learning Environment, a learning environment which allows a quick start to reinforcement learning in Python.
  • ViZDoom, which allows developing AI bots that play Doom using only visual information.

Support, feedback and donating

Please get in touch via mail or on Gitter if you have questions, feedback, ideas for features/collaboration, or if you seek support for applying Tensorforce to your problem.

If you want to support the Tensorforce core team (see below), please also consider donating: GitHub Sponsors or Liberapay.

Core team and contributors

Tensorforce is currently developed and maintained by Alexander Kuhnle.

Earlier versions of Tensorforce (<= 0.4.2) were developed by Michael Schaarschmidt, Alexander Kuhnle and Kai Fricke.

The advanced parallel execution functionality was originally contributed by Jean Rabault (@jerabaul29) and Vincent Belus (@vbelus). Moreover, the pretraining feature was largely developed in collaboration with Hongwei Tang (@thw1021) and Jean Rabault (@jerabaul29).

The CARLA environment wrapper is currently developed by Luca Anzalone (@luca96).

We are very grateful to our open-source contributors (listed according to GitHub, updated periodically):

Islandman93, sven1977, Mazecreator, wassname, lefnire, daggertye, trickmeyer, mkempers, mryellow, ImpulseAdventure, janislavjankov, andrewekhalel, HassamSheikh, skervim, beflix, coord-e, benelot, tms1337, vwxyzjn, erniejunior, Deathn0t, petrbel, nrhodes, batu, yellowbee686, tgianko, AdamStelmaszczyk, BorisSchaeling, christianhidber, Davidnet, ekerazha, gitter-badger, kborozdin, Kismuz, mannsi, milesmcc, nagachika, neitzal, ngoodger, perara, sohakes, tomhennigan.

Cite Tensorforce

Please cite the framework as follows:

@misc{tensorforce,
  author       = {Kuhnle, Alexander and Schaarschmidt, Michael and Fricke, Kai},
  title        = {Tensorforce: a TensorFlow library for applied reinforcement learning},
  howpublished = {Web page},
  url          = {https://github.com/tensorforce/tensorforce},
  year         = {2017}
}

If you use the parallel execution functionality, please additionally cite it as follows:

@article{rabault2019accelerating,
  title        = {Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach},
  author       = {Rabault, Jean and Kuhnle, Alexander},
  journal      = {Physics of Fluids},
  volume       = {31},
  number       = {9},
  pages        = {094105},
  year         = {2019},
  publisher    = {AIP Publishing}
}

If you use Tensorforce in your research, you may additionally consider citing the following paper:

@article{lift-tensorforce,
  author       = {Schaarschmidt, Michael and Kuhnle, Alexander and Ellis, Ben and Fricke, Kai and Gessert, Felix and Yoneki, Eiko},
  title        = {{LIFT}: Reinforcement Learning in Computer Systems by Learning From Demonstrations},
  journal      = {CoRR},
  volume       = {abs/1808.07903},
  year         = {2018},
  url          = {http://arxiv.org/abs/1808.07903},
  archivePrefix = {arXiv},
  eprint       = {1808.07903}
}


tensorforce's Issues

Load and test a learned policy

Hi, I was wondering if there's currently a straightforward way to load a saved policy and run that policy with an environment without training updates, or do I have to write my own runner for this purpose? Thanks.
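
For reference, with the current API an agent saved during or after training can typically be restored and run purely for evaluation, without further updates, along the following lines (a hedged sketch; the directory, format, and the independent/deterministic act flags should be checked against the saving-and-loading documentation):

from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

# Restore a previously saved agent (directory and format are illustrative)
agent = Agent.load(directory='saved-model', format='checkpoint', environment=environment)

# Evaluation episode: independent=True means no experience is recorded, so no updates happen
states = environment.reset()
terminal = False
while not terminal:
    actions = agent.act(states=states, independent=True, deterministic=True)
    states, terminal, reward = environment.execute(actions=actions)

agent.close()
environment.close()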

Configs should not change when passing to an agent

Currently, a configuration contains additional default and internal values after the initialization of an agent. This should not be the case, instead the agent could, for instance, create a copy of the configuration before modification.

simple_q_agent example

When I try to run the simple_q_agent.py script I get the following error:

  File "/Users/aidanrocke/Desktop/open_ai_solutions/tensor_force/examples/simple_dqn.py", line 214, in main
    runner.run(max_episodes, max_timesteps, episode_finished=episode_finished)

  File "/Users/aidanrocke/tensorforce/tensorforce/execution/runner.py", line 58, in run
    action = self.agent.get_action(processed_state, self.episode)

  File "/Users/aidanrocke/tensorforce/tensorforce/agents/memory_agent.py", line 94, in get_action
    action = self.model.get_action(*args, **kwargs)

AttributeError: 'NoneType' object has no attribute 'get_action'

(Example of) support for multi-valued Box actions?

When trying to run the TRPO agent on BipedalWalker, as follows, I run into:

foo$ PYTHONPATH=. python examples/openai_gym.py BipedalWalker-v2 -D -a TRPOAgent -c examples/configs/trpo_agent.json -n examples/configs/trpo_network.json
....
File "/../tensorforce/tensorforce/environments/openai_gym.py", line 67, in execute
 state, reward, terminal, _ = self.gym.step(action)
File "/usr/local/lib/python2.7/dist-packages/gym/core.py", line 99, in step
 return self._step(action)
File "/usr/local/lib/python2.7/dist-packages/gym/wrappers/time_limit.py", line 36, in _step
 observation, reward, done, info = self.env.step(action)
File "/usr/local/lib/python2.7/dist-packages/gym/core.py", line 99, in step
 return self._step(action)
File "/usr/local/lib/python2.7/dist-packages/gym/envs/box2d/bipedal_walker.py", line 372, in _step
 self.joints[1].motorSpeed     = float(SPEED_KNEE    * np.sign(action[1]))
IndexError: list index out of range

Looking at OpenAIGym.actions, it doesn't seem to unravel that environment's Box(4) action space as wanted - am I just failing to configure the agent as required, or are such action spaces not handled right now?

Cannot install

on docker, this just hangs:

Step 7/8 : RUN pip install tensorforce[tf] -e .
 ---> Running in 55d5d05d7049
Obtaining file:///code/tensorforce

Gaussian distribution parameters ignored

If I create a distribution with Gaussian(distribution=(0, 0.1)), the parameters (0, 0.1) are ignored and instead the result from Gaussian.create_tf_operations is used. At the very least I would expect the parameters that I pass to Gaussian to be used as initial guesses for the parameterization.

In general the initial variance of the policy cannot be specified right now. In practice that's an important tuning parameter. The easiest way to do this might be to allow users to pass an instance of the distribution as part of the config, rather than the class.

Lastly, the sigmoid rescaling of the policy within Gaussian seems hacky. What if I already provide a custom network that has properly scaled actions? In that case I wouldn't want another sigmoid nonlinearity to be applied. I think this would better fit into the network_builder.

result logging and policy saving

Hi, maybe I'm missing something but where do you save the various training metrics (returns, entropy, etc) and is there a mechanism to save the trained model or do we have to implement that. Thanks!

Make Gaussian initial std configurable

From #26:

'Another thing I noticed in continuous state spaces is that the standard deviation of the Gaussian (exploration) noise is not parameterized. That seems like a bad default for this kind of on-policy method. It's an easy fix since the required code in the Gaussian class is just commented out, but enabling this does not seem possible without low-level adjustments at the moment.'

API: allow update from external batch

Agent API needs to allow to pass in a batch of experiences to update from - for use cases where data is collected in a way where passing it sample by sample to TensorForce isn't needed/creates too much I/O.

Min/max values for continuous actions

Currently it is possible to define min_value and max_value for continuous actions, but this value is never actually used. Part of the problem is that the so far only continuous distribution Gaussian does not naturally bound its possible samples.

Option for the experience-sampling strategy in Replay.get_batch

Currently, Replay.get_batch returns samples as one contiguous range of the original sequence of experiences. I'd like to get a batch whose samples are picked from memory at random, to reduce the bias from correlated samples. I would like to add an option to change the sampling strategy in Replay.get_batch, as illustrated below.

See #59
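
For illustration, the difference between the two strategies boils down to drawing indices uniformly at random instead of taking one contiguous slice (a standalone NumPy sketch, not Tensorforce code):

import numpy as np

memory = list(range(1000))  # stand-in for stored experiences
batch_size = 64

# Current behaviour: one contiguous slice of consecutive experiences
start = np.random.randint(0, len(memory) - batch_size)
sequential_batch = memory[start:start + batch_size]

# Proposed option: indices drawn uniformly at random, reducing correlation within the batch
indices = np.random.randint(0, len(memory), size=batch_size)
random_batch = [memory[i] for i in indices]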

Quick start example raises TypeError

Fresh install. Command from http://tensorforce.readthedocs.io/en/latest/#quick-start:

python examples/openai_gym.py CartPole-v0 -a TRPOAgent -c examples/configs/trpo_agent.json -n examples/configs/trpo_network.json

Gives:

[2017-07-24 22:51:58,560] Making new env: CartPole-v0
Traceback (most recent call last):
  File "examples/openai_gym.py", line 121, in <module>
    main()
  File "examples/openai_gym.py", line 70, in main
    agent = agents[args.agent](config=agent_config)
  File "/home/tensorforce/tensorforce/agents/batch_agent.py", line 50, in __init__
    super(BatchAgent, self).__init__(config)
  File "/home/tensorforce/tensorforce/agents/agent.py", line 143, in __init__
    self.model = self.__class__.model(config)
  File "/home/tensorforce/tensorforce/models/trpo_model.py", line 54, in __init__
    super(TRPOModel, self).__init__(config)
  File "/home/tensorforce/tensorforce/models/policy_gradient_model.py", line 81, in __init__
    self.baseline = Baseline.from_config(config=config.baseline)
  File "/home/tensorforce/tensorforce/core/baselines/baseline.py", line 43, in from_config
    predefined=tensorforce.core.baselines.baselines
  File "/home/tensorforce/tensorforce/util.py", line 123, in get_object
    return obj(**full_kwargs)
TypeError: __init__() takes at least 2 arguments (1 given)

obj from util.py:119 is <class 'tensorforce.core.baselines.mlp.MLPBaseline'>, kwargs is None and full_kwargs is {}.

MLPBaseline's __init__ indeed takes at least 2 arguments.

Clean up dtype configuration

Ideally, we would want to allow to specify float precisions everywhere. Currently, we only use this in a few classes and inconsistently.

Documentation for epsilon decay

It's not the linear decay based on the remaining timesteps that I was expecting.

self.epsilon -= ((self.epsilon - self.epsilon_final) / self.epsilon_timesteps) * timestep

So over 100 steps it takes about 30-40 steps to get "close" to epsilon_final.

There is potential for an optional decay mode.
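
To make the difference concrete, here is the quoted update next to the linear schedule the reporter expected (a standalone sketch, not the library code):

# Quoted decay: the decrement grows with the timestep, so epsilon approaches
# epsilon_final well before epsilon_timesteps is reached.
def quoted_decay(epsilon, epsilon_final, epsilon_timesteps, timestep):
    return epsilon - ((epsilon - epsilon_final) / epsilon_timesteps) * timestep

# Expected linear decay: interpolates from epsilon_initial to epsilon_final
# over exactly epsilon_timesteps steps, then stays constant.
def linear_decay(epsilon_initial, epsilon_final, epsilon_timesteps, timestep):
    fraction = min(timestep / epsilon_timesteps, 1.0)
    return epsilon_initial + fraction * (epsilon_final - epsilon_initial)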

Calling finalize() on the graph

The runner should probably call finalize on the graph, but if the runner is not used, we should also call finalize internally somewhere.

Incorrect number of columns computing lower triangular matrix in NAF agent

In naf_model.py, lines 71-79:

if num_actions > 1:
    offset = num_actions
    l_columns = list()
    for zeros, size in enumerate(xrange(num_actions - 1, 0, -1), 1):
        column = tf.pad(l_entries[:, offset: offset + size], ((0, 0), (zeros, 0)))
        l_columns.append(column)
        offset += size
    l_matrix += tf.stack(l_columns, 1)

I believe the number of columns given to tf.stack is incorrect (one too few). I think there needs to be an extra column, e.g. by adding something like:

l_columns.append(tf.zeros_like(l_columns[0]))

Is this correct?

The error I'm getting is:

ValueError: Dimensions must be equal, but are 59 and 58 for 'training_outputs/add' (op: 'Add') with input shapes: [?,59,59], [?,58,59].

from the line

l_matrix += tf.stack(l_columns, 1)
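
The dimension mismatch can be reproduced with a small standalone example: stacking only num_actions - 1 padded columns yields a (num_actions - 1) x num_actions tensor, so one extra all-zero column (as suggested above) is needed before it can be added to the square diagonal term (illustrative NumPy, not the library code):

import numpy as np

num_actions = 4
# Flat vector holding the num_actions * (num_actions - 1) / 2 off-diagonal entries
l_entries = np.arange(1, num_actions * (num_actions - 1) // 2 + 1, dtype=float)

offset = 0
l_columns = []
for zeros, size in enumerate(range(num_actions - 1, 0, -1), 1):
    # Pad each slice with leading zeros so every column has length num_actions
    column = np.pad(l_entries[offset: offset + size], (zeros, 0))
    l_columns.append(column)
    offset += size

print(np.stack(l_columns, 0).shape)  # (3, 4): one column short of square

# Suggested fix: append an all-zero column to obtain the square (4, 4) off-diagonal part
l_columns.append(np.zeros_like(l_columns[0]))
print(np.stack(l_columns, 0).shape)  # (4, 4)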

Investigate occasional NaN in TRPO

TRPO occasionally fails to produce a robust update, with the Lagrange multiplier being None; need to check whether the gradient computation can produce None.

Prioritized replay index out-of-range

Traceback (most recent call last):
  File "examples/openai_gym.py", line 121, in <module>
    main()
  File "examples/openai_gym.py", line 112, in main
    runner.run(args.episodes, args.max_timesteps, episode_finished=episode_finished)
  File "/home/yellow/work/tf/tensorforce/tensorforce/execution/runner.py", line 144, in run
    self.agent.observe(reward=reward, terminal=terminal)
  File "/home/yellow/work/tf/tensorforce/tensorforce/agents/dqn_agent.py", line 94, in observe
    super(DQNAgent, self).observe(reward=reward, terminal=terminal)
  File "/home/yellow/work/tf/tensorforce/tensorforce/agents/memory_agent.py", line 84, in observe
    internal=self.current_internal
  File "/home/yellow/work/tf/tensorforce/tensorforce/core/memories/prioritized_replay.py", line 55, in add_observation
    priority, _ = self.observations.pop(self.positive_priority_index)
IndexError: pop index out of range

setup.py and tensorflow with/without gpu

Hi,

Unless the goal is not to support tensorflow with gpu, I would recommend to move the tensorflow requirement to "extra_requires". I have seen this pattern in both sonnet and tensor2tensor.

For example:

from setuptools import setup

extra_packages = {
    'tensorflow': ['tensorflow>=1.0.1'],
    'tensorflow with gpu': ['tensorflow-gpu>=1.0.1']
}

install_requires = [
    'numpy',
    'six',
    'scipy',
    'pillow',
    'pytest'
]

setup_requires = ['numpy', 'recommonmark', 'mistune']

setup(
    name='tensorforce',
    version='0.2',
    description='Reinforcement learning for TensorFlow',
    url='http://github.com/reinforceio/tensorforce',
    author='reinforce.io',
    author_email='[email protected]',
    license='Apache 2.0',
    packages=['tensorforce'],
    install_requires=install_requires,
    extras_require=extra_packages,  # note: the setuptools keyword is extras_require
    setup_requires=setup_requires,
    zip_safe=False
)

Regards,

Pedro

PS Will spend my weekend understanding tensorforce. Great work!

Issues with multiple continuous actions

Hi,
first of all, thanks for the hard work that is going into this project. You are saving me a ton of work.
Second, I encountered some strange behavior when trying to define an agent with multiple continuous actions. All code below was run in a Jupyter notebook with Anaconda and Python 3.5:

#Configuration, adapted from config in readme
config = Configuration(
    batch_size=100,
    states=dict(shape=(4,), type='float'),
    actions=dict(opt_a = dict(continuous=True, min_value = 0, max_value = 2),
                opt_b = dict(continuous=True, min_value = 0, max_value = 2)),
    network=layered_network_builder([dict(type='dense', size=50), dict(type='dense', size=50)])
)

# Create a TRPO agent
agent = TRPOAgent(config=config)

This code crashes with the trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-b10cf4edc1d7> in <module>()
      1 # Create a VPGA agent
----> 2 agent = TRPOAgent(config=config)

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/agents/batch_agent.py in __init__(self, config)
     48     def __init__(self, config):
     49         config.default(BatchAgent.default_config)
---> 50         super(BatchAgent, self).__init__(config)
     51         self.batch_size = config.batch_size
     52         self.batch = None

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/agents/agent.py in __init__(self, config)
    141         self.actions_config = config.actions
    142 
--> 143         self.model = self.__class__.model(config)
    144 
    145         self.episode = 0

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/models/trpo_model.py in __init__(self, config)
     52     def __init__(self, config):
     53         config.default(TRPOModel.default_config)
---> 54         super(TRPOModel, self).__init__(config)
     55 
     56         self.override_line_search = config.override_line_search

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/models/policy_gradient_model.py in __init__(self, config)
     81             self.baseline = Baseline.from_config(config=config.baseline)
     82 
---> 83         super(PolicyGradientModel, self).__init__(config)
     84 
     85         # advantage estimation

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/models/model.py in __init__(self, config)
    118                 scope = scope_context.__enter__()
    119 
--> 120             self.create_tf_operations(config)
    121 
    122             if config.distributed:

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/models/trpo_model.py in create_tf_operations(self, config)
    117 
    118             gradients = tf.gradients(fixed_kl_divergence, variables)
--> 119             gradient_vector_product = [tf.reduce_sum(g * t) for (g, t) in zip(gradients, tangents)]
    120 
    121             self.flat_variable_helper = FlatVarHelper(variables)

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/models/trpo_model.py in <listcomp>(.0)
    117 
    118             gradients = tf.gradients(fixed_kl_divergence, variables)
--> 119             gradient_vector_product = [tf.reduce_sum(g * t) for (g, t) in zip(gradients, tangents)]
    120 
    121             self.flat_variable_helper = FlatVarHelper(variables)

/Users/jannes/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py in r_binary_op_wrapper(y, x)
    895   def r_binary_op_wrapper(y, x):
    896     with ops.name_scope(None, op_name, [x, y]) as name:
--> 897       x = ops.convert_to_tensor(x, dtype=y.dtype.base_dtype, name="x")
    898       return func(x, y, name=name)
    899 

/Users/jannes/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
    649       name=name,
    650       preferred_dtype=preferred_dtype,
--> 651       as_ref=False)
    652 
    653 

/Users/jannes/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
    714 
    715         if ret is None:
--> 716           ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    717 
    718         if ret is NotImplemented:

/Users/jannes/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    174                                          as_ref=False):
    175   _ = as_ref
--> 176   return constant(v, dtype=dtype, name=name)
    177 
    178 

/Users/jannes/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name, verify_shape)
    163   tensor_value = attr_value_pb2.AttrValue()
    164   tensor_value.tensor.CopyFrom(
--> 165       tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
    166   dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
    167   const_tensor = g.create_op(

/Users/jannes/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
    358   else:
    359     if values is None:
--> 360       raise ValueError("None values not supported.")
    361     # if dtype is provided, forces numpy array to be the type
    362     # provided if possible.

ValueError: None values not supported.

I tried different agents and encountered another strange behavior:

# Create a VPG agent
agent = VPGAgent(config=config)
state = np.array([1,2,3,4])
agent.act(state)

Crashes with:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-73-565d0bd87882> in <module>()
----> 1 agent.act(state)

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/agents/agent.py in act(self, state, deterministic)
    194 
    195         # model action
--> 196         self.current_action, self.next_internal = self.model.get_action(state=self.current_state, internal=self.current_internal, deterministic=deterministic)
    197 
    198         # exploration

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/models/model.py in get_action(self, state, internal, deterministic)
    219         fetches.update({n: internal_output for n, internal_output in enumerate(self.internal_outputs)})
    220 
--> 221         feed_dict = {state_input: (state[name],) for name, state_input in self.state.items()}
    222         feed_dict.update({internal_input: (internal[n],) for n, internal_input in enumerate(self.internal_inputs)})
    223         feed_dict[self.deterministic] = deterministic

/Users/jannes/AnacondaProjects/tensorforce/tensorforce/models/model.py in <dictcomp>(.0)
    219         fetches.update({n: internal_output for n, internal_output in enumerate(self.internal_outputs)})
    220 
--> 221         feed_dict = {state_input: (state[name],) for name, state_input in self.state.items()}
    222         feed_dict.update({internal_input: (internal[n],) for n, internal_input in enumerate(self.internal_inputs)})
    223         feed_dict[self.deterministic] = deterministic

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

But when I redefine config, that is, I run

#Configuration, adapted from config in readme
config = Configuration(
    batch_size=100,
    states=dict(shape=(4,), type='float'),
    actions=dict(opt_a = dict(continuous=True, min_value = 0, max_value = 2),
                opt_b = dict(continuous=True, min_value = 0, max_value = 2)),
    network=layered_network_builder([dict(type='dense', size=50), dict(type='dense', size=50)])
)

again, it does not crash, but it occasionally outputs negative values for actions, although min_value = 0
{'opt_a': 0.28892395, 'opt_b': -0.10657883}
The PPO agent displays the same behavior as the VPG Agent.

I have tried this with many slightly different configurations; it seems to be a consistent issue.
Please let me know if you need any more code / info / data to reproduce the issue. Kindly, Jannes
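
For reference, in the current API the same setup would be written as a states/actions specification with explicit bounds for each named action (a hedged sketch; the agent type, batch_size, and shapes are illustrative):

from tensorforce import Agent

# Two named continuous actions, each bounded to [0, 2], mirroring the config above
agent = Agent.create(
    agent='ppo', batch_size=10, max_episode_timesteps=500,
    states=dict(type='float', shape=(4,)),
    actions=dict(
        opt_a=dict(type='float', shape=(), min_value=0.0, max_value=2.0),
        opt_b=dict(type='float', shape=(), min_value=0.0, max_value=2.0),
    ),
)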

Check state type in act()

Currently, not all iterables seem to work in agent.act(); e.g., a tuple is expected, and an nd-array of the correct shape can cause a TensorFlow freeze without any error message.

Act needs to either:

  • Check incoming state type and shape against the given state config and raise an Error
  • Convert other types to tuples

Error running the example

[egor@host tensorforce]$ python examples/openai_gym.py CartPole-v0 -a TRPOAgent -c examples/configs/trpo_cartpole.json -n examples/configs/trpo_cartpole_network.json -s /home/egor/Software/tensorforce/examples/output
[2017-07-19 00:35:06,206] Making new env: CartPole-v0
2017-07-19 00:35:06.922073: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-19 00:35:06.922107: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-19 00:35:06.922116: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-19 00:35:06.922128: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-19 00:35:06.922135: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[2017-07-19 00:35:06,977] Starting TRPOAgent for Environment 'OpenAIGym(CartPole-v0)'
[2017-07-19 00:35:08,600] Finished episode 50 after 12 timesteps
[2017-07-19 00:35:08,600] Episode reward: 12.0
[2017-07-19 00:35:08,600] Average of last 500 rewards: 2.332
[2017-07-19 00:35:08,600] Average of last 100 rewards: 11.66
Saving agent after episode 100
Traceback (most recent call last):
  File "examples/openai_gym.py", line 121, in <module>
    main()
  File "examples/openai_gym.py", line 112, in main
    runner.run(args.episodes, args.max_timesteps, episode_finished=episode_finished)
  File "/home/egor/Software/tensorforce/tensorforce/execution/runner.py", line 158, in run
    self.agent.save_model(self.save_path)
  File "/home/egor/Software/tensorforce/tensorforce/agents/agent.py", line 238, in save_model
    self.model.save_model(path)
  File "/home/egor/Software/tensorforce/tensorforce/models/model.py", line 274, in save_model
    self.saver.save(self.session, path)
AttributeError: 'NoneType' object has no attribute 'save'

TRPO struggling with CartPole-v0 from quick start

After running python examples/quickstart.py (3000 episodes), the average reward from the last 100 episodes is only 33.38. I would expect it to be close to the maximum, 200, especially since it reached it a couple of times before, e.g. on episode 1469; however, it later deteriorates.

I also tried running it with provided command:

python examples/openai_gym.py CartPole-v0 -a TRPOAgent -c examples/configs/trpo_cartpole.json -n examples/configs/trpo_cartpole_network.json

However the results were also unsatisfactory:

[2017-07-24 23:58:58,363] Finished episode 4050 after 61 timesteps
[2017-07-24 23:58:58,363] Episode reward: 61.0
[2017-07-24 23:58:58,363] Average of last 500 rewards: 63.346
[2017-07-24 23:58:58,364] Average of last 100 rewards: 62.33
