
gym's Introduction


Important Notice

The team that has been maintaining Gym since 2021 has moved all future development to Gymnasium, a drop-in replacement for Gym (import gymnasium as gym), and Gym will not be receiving any future updates. Please switch over to Gymnasium as soon as you're able to do so. If you'd like to read more about the story behind this switch, please check out this blog post.

Gym

Gym is an open source Python library for developing and comparing reinforcement learning algorithms by providing a standard API to communicate between learning algorithms and environments, as well as a standard set of environments compliant with that API. Since its release, Gym's API has become the field standard for doing this.

The Gym documentation website is at https://www.gymlibrary.dev/, and you can propose fixes and changes to it here.

Gym also has a Discord server for development purposes that you can join here: https://discord.gg/nHg2JRN489

Installation

To install the base Gym library, use pip install gym.

This does not include dependencies for all families of environments (there's a massive number, and some can be problematic to install on certain systems). You can install the dependencies for one family with, e.g., pip install gym[atari], or use pip install gym[all] to install all dependencies.

We support Python 3.7, 3.8, 3.9 and 3.10 on Linux and macOS. We will accept PRs related to Windows, but do not officially support it.
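
As a quick sanity check after installing, you can import the package and create one of the classic-control environments, which ship with the base install; a minimal sketch:

import gym

print(gym.__version__)                         # confirm which Gym version is installed
env = gym.make("CartPole-v1")                  # classic-control envs need no extra dependencies
print(env.action_space, env.observation_space)
env.close()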

API

The Gym API models environments as simple Python env classes. Creating environment instances and interacting with them is very simple: here's an example using the "CartPole-v1" environment:

import gym
env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()
env.close()
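
If you want to watch the environment while it runs, recent Gym releases (0.26+, the same ones that use the five-value step() signature shown above) accept a render_mode argument at creation time; a minimal sketch:

import gym

# render_mode="human" opens a window; "rgb_array" instead returns frames as numpy arrays.
# Rendering the classic-control envs also needs the pygame dependency (gym[classic_control]).
env = gym.make("CartPole-v1", render_mode="human")
observation, info = env.reset(seed=42)

for _ in range(200):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        observation, info = env.reset()
env.close()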

Notable Related Libraries

Please note that this is an incomplete list, and just includes libraries that the maintainers most commonly point newcomers to when asked for recommendations.

  • CleanRL is a learning library based on the Gym API. It is designed to cater to newer people in the field and provides very good reference implementations.
  • Tianshou is a learning library that's geared towards very experienced users and is designed to allow for ease in complex algorithm modifications.
  • RLlib is a learning library that allows for distributed training and inference and supports an extraordinarily large number of features throughout the reinforcement learning space.
  • PettingZoo is like Gym, but for environments with multiple agents.

Environment Versioning

Gym keeps strict versioning for reproducibility reasons. All environments end in a suffix like "-v0". When changes are made to environments that might impact learning results, the number is increased by one to prevent potential confusion.
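
Concretely, the version is part of the ID passed to gym.make(), so pinning it in your code keeps results comparable across Gym releases:

import gym

# Pinning "-v0" explicitly: if the dynamics or reward were ever changed, the new
# behaviour would be registered under a new suffix (e.g. a hypothetical "-v1")
# and the old ID would keep its original behaviour.
env = gym.make("MountainCar-v0")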

MuJoCo Environments

The latest "_v4" and future versions of the MuJoCo environments will no longer depend on mujoco-py. Instead mujoco will be the required dependency for future gym MuJoCo environment versions. Old gym MuJoCo environment versions that depend on mujoco-py will still be kept but unmaintained. To install the dependencies for the latest gym MuJoCo environments use pip install gym[mujoco]. Dependencies for old MuJoCo environments can still be installed by pip install gym[mujoco_py].

Citation

A whitepaper from when Gym first came out is available at https://arxiv.org/pdf/1606.01540, and can be cited with the following BibTeX entry:

@misc{1606.01540,
  Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},
  Title = {OpenAI Gym},
  Year = {2016},
  Eprint = {arXiv:1606.01540},
}

Release Notes

There used to be release notes for all the new Gym versions here. New release notes are now published on the GitHub releases page, like most other libraries do. Old notes can be viewed here.

gym's People

Contributors

andrewtanjs, catherio, christopherhesse, gdb, gianlucadecola, iaroslav-ai, ikamensh, instance01, jessefarebro, jietang, jjshoots, jkterry1, jonasschneider, joschu, markus28, nottombrown, olegklimov, ppaquette, pseudo-rnd-thoughts, pzhokhov, rafaelcosman, redtachyon, shelhamer, siemanko, tlbtlbtlb, trigaten, tristandeleu, vwxyzjn, younik, zuoxingdong

gym's Issues

Bugs in Taxi-v1 environment

I think the strings for the last action of North and South are switched. Also sometimes there seems to be a passenger to pick up, but no destination.

Are there any plans to extend the environment in a probabilistic manner? Like in the original paper, where when choosing an action there is a 20% chance to end up somewhere else.

Env should provide reward range

In some algorithms, like R-Max, it is necessary to know the maximum reward given at any single step by the environment. Just like continuous state environments provide ranges for each dimension, I suggest that they should provide reward ranges too. Note that this is not the accumulated reward range, but the single-step reward range.
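
For reference, later Gym versions do expose a reward_range attribute on Env (defaulting to (-inf, inf)); individual environments are not guaranteed to narrow it, so treat it as a hint rather than a contract:

import gym

env = gym.make("CartPole-v1")
print(env.reward_range)   # a (min, max) tuple; many environments leave it at (-inf, inf)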

RepeatCopy description does not fit task

The description of RepeatCopy and the task don't match.

A generic input is [m x1 x2 ... xk] and the desired output is [x1 x2 ... xk x1 x2 ... xk x1 x2 ... xk], where the number of copies is given by m. Thus the goal is to copy the input m times, where m can only be 2 or 3.

However, m is always 3 and it is never displayed as an observation. Furthermore, the expected output is not just a copy: the second copy is reversed, as in [x1 x2 ... xk xk ... x2 x1 x1 x2 ... xk].

Env.isTerminal(observation)

I'm working with model-based RL, and it is necessary to know whether some observation is terminal without experiencing it. But this information is buried in the environment's step function when assigning a value to "done". So, for instance, I have to hardcode the condition for each environment in my algorithm (like "if observation[0] >= 0.5" in Mountain Car). If there were an "isTerminal(observation)" method in Env, hardcoding wouldn't be necessary.
I don't know if it is ok according to the Gym philosophy or if you think it is too much work for too little, so I'll leave it here for discussion.

Edit: according to the project's coding style, I think the method's name would be is_terminal instead.
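
Gym never gained such a method, so one interim option is to keep the hardcoded condition out of the algorithm by attaching it to the environment through a small wrapper; a hypothetical sketch (the predicate shown is specific to MountainCar):

import gym

class TerminalCheckWrapper(gym.Wrapper):
    """Hypothetical wrapper exposing an is_terminal(observation) predicate."""

    def __init__(self, env, predicate):
        super().__init__(env)
        self._predicate = predicate

    def is_terminal(self, observation):
        return self._predicate(observation)

# MountainCar terminates once the cart position reaches the goal (x >= 0.5).
env = TerminalCheckWrapper(gym.make("MountainCar-v0"),
                           predicate=lambda obs: obs[0] >= 0.5)
print(env.is_terminal([0.6, 0.0]))   # True, without stepping the environment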

Monitoring causes OOM Error when python holds more than 50% of available memory

tl;dr It would be useful if gym supported a recording method which does not involve subprocess.Popen.

When monitoring is enabled, ImageEncoder executes the encoding in a subprocess with Popen.

Popen requests from the OS as much extra memory as Python is currently using. (See this StackOverflow answer for a more detailed explanation.)

This can be troublesome when running a memory-intensive training process (like deep learning).
When training consumes more than 50% of physical memory*, Popen throws OSError: [Errno 12] Cannot allocate memory.

* To avoid this, one can create a swap file, which increases the virtual memory size.
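
A practical workaround while that remains the case: the ffmpeg subprocess is only launched when video capture is enabled, so disabling video (while keeping episode statistics) avoids the doubled memory reservation entirely. A sketch assuming the gym.wrappers.Monitor interface of older Gym releases:

import gym
from gym.wrappers import Monitor   # older Gym releases; newer ones use RecordVideo instead

env = gym.make("CartPole-v0")
# video_callable=lambda episode_id: False means no episode is ever recorded, so no
# Popen fork happens; stats are still written to the output directory.
env = Monitor(env, "/tmp/cartpole-stats", video_callable=lambda episode_id: False)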

Fix all instances of tobytes vs tostring

File "MsPacman.py", line 4, in <module>
env.render()
File "/Library/Python/2.7/site-packages/gym/core.py", line 146, in render
return self._render(mode=mode, close=close)
File "/Library/Python/2.7/site-packages/gym/envs/atari/atari_env.py", line 95, in _render
self.viewer.imshow(img)
File "/Library/Python/2.7/site-packages/gym/envs/classic_control/rendering.py", line 287, in imshow
image = pyglet.image.ImageData(self.width, self.height, 'RGB', arr.tobytes(), pitch=self.width * -3)
AttributeError: 'numpy.ndarray' object has no attribute 'tobytes'

Knowledge transfer for black box optimization environment

I have a suggestion for environment(s) where I can contribute.

Consider a certain distribution of black-box optimization problems, e.g. model selection / parameter tuning for computer vision applications. When solving a new problem from such a distribution, one would like to exploit knowledge obtained from solving previous ones.

AFAIK, with the most popular approaches for black-box optimization, like Bayesian Optimization, every new problem is solved independently, and thus domain knowledge is not exploited. One can expect, however, that exploiting such knowledge would make a significant difference.

I can imagine solving this with episodic RL, where one episode is solving one random problem from the distribution. The feedback to the agent is a feature description of the sample (e.g. a description of the dataset) and a reward (e.g. accuracy with the predicted model parameters). The overall reward for the episode is the maximum achieved score (accuracy).

I have simple code in this repo which generates random datasets (completely / partially artificial) and compares model selection with BO, random and genetic algorithms. The dataset-generation code could be converted into environment(s), and the simple optimization methods used as a baseline.

Does this sound interesting to you? I think this might be especially well suited for model-based reinforcement learning, as it tries to be more data-efficient than other RL approaches, and fitting a model is a computationally expensive procedure.
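
To make the proposal concrete, such a distribution of black-box problems could be wrapped as an episodic Gym environment roughly as follows; this is only an illustrative sketch (the hidden-optimum "problem" and the quadratic "score" are placeholders, not code from the linked repo):

import numpy as np
import gym
from gym import spaces

class BlackBoxTuningEnv(gym.Env):
    """Illustrative sketch: each episode is one randomly drawn black-box problem."""

    def __init__(self, n_params=3, horizon=20):
        self.horizon = horizon
        # Action: the next candidate parameter vector to evaluate.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(n_params,), dtype=np.float32)
        # Observation: the last candidate tried plus the score it obtained.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(n_params + 1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Sampling a hidden optimum stands in for drawing a dataset / tuning problem.
        self._optimum = self.np_random.uniform(-1.0, 1.0, size=self.action_space.shape)
        self._t = 0
        return np.zeros(self.observation_space.shape, dtype=np.float32), {}

    def step(self, action):
        # The negated squared distance to the hidden optimum stands in for accuracy.
        score = float(-np.sum((np.asarray(action) - self._optimum) ** 2))
        self._t += 1
        terminated = self._t >= self.horizon
        obs = np.concatenate([np.asarray(action), [score]]).astype(np.float32)
        return obs, score, terminated, False, {}

env = BlackBoxTuningEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())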

Inline support for IPython notebooks

It would be awesome to see inline support for IPython notebooks, similar to how %matplotlib inline prevents the secondary Python application windows from opening. It seems like an ideal use case for quickly iterating on and sharing OpenAI environments (although it is very easy to set up).

I've been unable to find any demos online of people using %pyglet inline, which is discouraging. Would this need to be built out at the pyglet level?

Make tips more prominent; also make score easy to access

From chat:

@gdb I vote for adding helpful tips like video_callable=lambda count: False somewhere prominent. I found it very helpful here https://github.com/openai/gym/blob/master/gym/monitoring/monitor.py#L73 (after knowing to look for it from having read about it earlier in chat here).
Is there a way that I can easily print the performance of my agent using the monitor, without uploading? I've briefly tried inspecting the tmp file. That would be a nice thing to add somewhere in the intro docs too. I imagine folks will want to iterate quickly on their code and do this all the time.
episode_rewards in the tmp file looks like the right thing. I guess I could track this myself. It would be nice to give folks a way to print this out out-of-the-box, though.
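
One low-tech option that works independently of the monitor is to accumulate the episode return yourself and print it; a minimal sketch using the current step() signature from the README example above:

import gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=0)
episode_return, episode = 0.0, 0
for _ in range(5000):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    episode_return += reward
    if terminated or truncated:
        episode += 1
        print(f"episode {episode}: return {episode_return}")
        episode_return = 0.0
        observation, info = env.reset()
env.close()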

Add Hex

Since this is also a connection game, simpler than Go, yet still hard enough on bigger boards (to my knowledge, no efficient algorithm exists for boards of size 11+).

MuJoCo env could not be set up

import gym
env=gym.make('Humanoid-v0')
[2016-05-07 00:16:38,716] Making new env: Humanoid-v0
Traceback (most recent call last):
File "", line 1, in
File "gym/envs/registration.py", line 76, in make
return spec.make()
File "gym/envs/registration.py", line 51, in make
cls = load(self._entry_point)
File "gym/envs/registration.py", line 13, in load
result = entry_point.load(False)
File "/usr/local/anaconda/lib/python2.7/site-packages/setuptools-20.7.0-py2.7.egg/pkg_resources/init.py", line 2229, in load
File "/usr/local/anaconda/lib/python2.7/site-packages/setuptools-20.7.0-py2.7.egg/pkg_resources/init.py", line 2235, in resolve
ImportError: No module named None

Python 3.5 incompatibility: no attribute 'itervalues'

Hi there,

I was trying to run a code snippet from the docs when I ran into what seems to be a compatibility issue. For completeness, here's the snippet:

import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in xrange(1000):
    env.render()
    env.step(env.action_space.sample()) # take a random action

And here's the error I'm getting.

$ /opt/local/bin/python3.5 demo.py 
Traceback (most recent call last):
  File "demo.py", line 1, in <module>
    import gym
  File "/Users/bart/Documents/Programming/Python/gym/gym/__init__.py", line 47, in <module>
    from gym.scoreboard.api import upload
  File "/Users/bart/Documents/Programming/Python/gym/gym/scoreboard/__init__.py", line 562, in <module>
    registry.finalize()
  File "/Users/bart/Documents/Programming/Python/gym/gym/scoreboard/registration.py", line 41, in finalize
    registered_ids = set(env_id for group in self.groups.itervalues() for env_id in group['envs'])
AttributeError: 'collections.OrderedDict' object has no attribute 'itervalues'

As Python 3 support is experimental and you requested bug reports, I thought I'd let you know about this one.

Comment is not informative

Comment :

Action '[ 0.28445863 1.73202721 -1.63753301]' is not contained within action space 'Box(3,)'. HINT: Try using a value like '[-0.2986672 -0.03983531 0.62766094]' instead.

is not informative. Particularly the part "try using a value like '[-0.2986672 -0.03983531 0.62766094]'". Could you either remove this part of the message, or describe what the desired ranges are?

Environment freezes for no reason

I think there is a bug in the Breakout environment where the game freezes (no new ball is generated) after a life is lost (e.g. from 4 -> 3).

freeze example 1, freeze example 2, normal example

I don't know if this happens in other environments because I'm only using Breakout to train my model. I don't think this occurs frequently, but while testing my trained model I found 2 freezing cases among 100 test cases. I only use env.step(action) when there is no terminal signal.

Could a freeze occur when a specific key is pressed in a specific state, such as the state right after the ball is lost?
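
For what it's worth, in ALE Breakout a new ball is only served after the FIRE action, so a policy that never happens to press FIRE after losing a life will look exactly like this freeze. A sketch of the usual workaround, assuming the Atari extras are installed (check env.unwrapped.get_action_meanings() to confirm that action 1 is FIRE in your build):

import gym

env = gym.make("Breakout-v4")
print(env.unwrapped.get_action_meanings())   # typically ['NOOP', 'FIRE', 'RIGHT', 'LEFT']
observation, info = env.reset(seed=0)
lives = info.get("lives")
for _ in range(1000):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if lives is not None and info.get("lives", lives) < lives:
        # A life was just lost: press FIRE once so the next ball is served.
        observation, reward, terminated, truncated, info = env.step(1)
    lives = info.get("lives", lives)
    if terminated or truncated:
        observation, info = env.reset()
        lives = info.get("lives")
env.close()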

Support for SMDP

Hi,
Does this toolkit support semi-MDPs, or only MDP reinforcement learning?
I am currently experimenting with the Options framework, and I am building everything from scratch. I am trying to find a quick and well tested solution for this.
Cheers,
Omar

TensorFlow capitalization

Minor point, but the docs and website refer to "Tensorflow", but the capitalization should be "TensorFlow". :)

Box2D make environment error: RAND_LIMIT_swigconstant

Installed the most recent gym, and when running env = gym.make('LunarLander-v0') for the Box2D environments it displays the error:

[2016-05-10 19:13:27,417] Making new env: LunarLander-v0
Traceback (most recent call last):
  File "/home/jesse/AI/OpenAI/openai/Environments/CartPole-v0/CartPole-v0.py", line 58, in <module>
    main()
  File "/home/jesse/AI/OpenAI/openai/Environments/CartPole-v0/CartPole-v0.py", line 13, in main
    env = gym.make('LunarLander-v0')
  File "/home/jesse/AI/OpenAI/gym/gym/envs/registration.py", line 79, in make
    return spec.make()
  File "/home/jesse/AI/OpenAI/gym/gym/envs/registration.py", line 54, in make
    cls = load(self._entry_point)
  File "/home/jesse/AI/OpenAI/gym/gym/envs/registration.py", line 13, in load
    result = entry_point.load(False)
  File "/home/jesse/anaconda3/envs/openai/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/home/jesse/anaconda3/envs/openai/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/jesse/AI/OpenAI/gym/gym/envs/box2d/__init__.py", line 1, in <module>
    from gym.envs.box2d.lunar_lander import LunarLander
  File "/home/jesse/AI/OpenAI/gym/gym/envs/box2d/lunar_lander.py", line 4, in <module>
    import Box2D
  File "/home/jesse/anaconda3/envs/openai/lib/python2.7/site-packages/Box2D/__init__.py", line 20, in <module>
    from .Box2D import *
  File "/home/jesse/anaconda3/envs/openai/lib/python2.7/site-packages/Box2D/Box2D.py", line 435, in <module>
    _Box2D.RAND_LIMIT_swigconstant(_Box2D)
AttributeError: 'module' object has no attribute 'RAND_LIMIT_swigconstant'

_Box2D is being loaded correctly, as I can modify the Box2D.py file to display _Box2D.RAND_LIMIT before the _Box2D.RAND_LIMIT_swigconstant(_Box2D) call. It seems to be affecting all _swig_constant calls (commenting out the offending line then throws an error on _Box2D.b2_pi_swigconstant(_Box2D)). Does another library need to be installed to get this to work?

Python 3 incompatibility for frozen_lake.py

Running gym with Python 3.5, task FrozenLake-v0, episodes never terminate even when the agent reaches 'H' or 'G'.

It seems there's some problem with python3 str/bytes.
For example, frozen_lake.py line 71:
isd = np.array(desc == 'S').astype('float64').ravel()
The comparison desc == 'S' will return a single pure-Python False, rather than a numpy array of booleans. Changing it to desc == b'S' works.
Lines 103, 104, 110, and 111 also need modification:

done = bytes(newletter).decode() in 'GH'
rew = float(newletter == b'G')
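
The underlying gotcha is a NumPy/Python 3 interaction: the map is stored with dtype 'c' (single bytes), and comparing a bytes array against a str does not broadcast element-wise; a small demonstration:

import numpy as np

desc = np.asarray([list("SFFF"), list("FHFH")], dtype='c')   # items are bytes, e.g. b'S'
print(desc == 'S')    # under Python 3 this yields a single False (with a warning), not an array
print(desc == b'S')   # comparing against bytes gives the expected boolean mask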

Build error by running "pip install -e .[all]"

I got error messages like:
Failed building wheel for pachi-py
and
ld: targeted OS version does not support use of thread local variables in _fast_srandom for architecture x86_64 clang: error: linker command failed with exit code 1 (use -v to see invocation) error: command 'g++' failed with exit status 1

I do have cmake installed:
cmake version 3.5.1

pip version is 8.0.2

OS version is Mac OS X 10.9.5

Configure CI for Windows and OSX

Our code is developed primarily on OSX, so it works well there. It's not developed or tested on Windows, so while it mostly works there today, we can't guarantee it'll keep working over time.

I would love it if someone in the community wanted to configure Windows and OSX CI, so we can be sure we support all of these platforms:

  • Travis CI supports OSX; we should just use them.
  • AppVeyor seems like a plausible Windows CI service, though I haven't used them.

Support for other languages

I see that you plan to support other languages soon. So I was wondering if there's a plan to support Torch in the near future.

Cartpole observations can occur outside of observation space limits

Cartpole is set to fail if the angle of the pole is greater than 0.21; however, this value is also used as the observation space limit. This means that the final observation of any episode where Cartpole fails from having too high an angle will be outside the observation_space bounds.

A simple fix is just to set the observation space limit for that value to twice the failure point. I will have a pull request up shortly.
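
The mismatch is easy to confirm with observation_space.contains(); a sketch that runs random episodes until it catches an out-of-bounds terminal observation (whether it still triggers depends on the Gym version, since the bounds were later widened along the lines proposed here):

import gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=0)
for _ in range(10000):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated and not env.observation_space.contains(observation):
        print("terminal observation outside observation_space:", observation)
        break
    if terminated or truncated:
        observation, info = env.reset()
env.close()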

video_recorder trying to load StringIO with Python 3.5

I am using Python 3.5.1 with virtualenvwrapper and a minimal gym installation via:

pip install gym

You can reproduce this issue with the basic CartPole example in introduction:

import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample()) # take a random action

Running this script results in the following error

  File "cartpole.py", line 1, in <module>
    import gym
  File "/Users/ozkanm/.virtualenvs/gym/lib/python3.5/site-packages/gym/__init__.py", line 4, in <module>
    from gym.core import Env, Space
  File "/Users/ozkanm/.virtualenvs/gym/lib/python3.5/site-packages/gym/core.py", line 4, in <module>
    from gym import error, monitoring
  File "/Users/ozkanm/.virtualenvs/gym/lib/python3.5/site-packages/gym/monitoring/__init__.py", line 1, in <module>
    from gym.monitoring.monitor import Monitor, load_results, monitors as _monitors
  File "/Users/ozkanm/.virtualenvs/gym/lib/python3.5/site-packages/gym/monitoring/monitor.py", line 12, in <module>
    from gym.monitoring import stats_recorder, video_recorder
  File "/Users/ozkanm/.virtualenvs/gym/lib/python3.5/site-packages/gym/monitoring/video_recorder.py", line 9, in <module>
    import StringIO
ImportError: No module named 'StringIO'

env.render() crashes python

Hey,

I am just trying out the most basic tutorial, cart-pole-v0. When I render the video, it plays and then crashes afterward (every time). Everything else seems to work fine, i.e. without rendering. Don't know if this is a problem on my end or not; I am using Anaconda with Python 3.5.

Inverted double pendulum state vector meaning?

What does each state dimension represent on the inverted double pendulum environment? I was expecting to get the cart position/velocity plus the angles/angular velocities of each joint. That would give 6 state variables, but the state vector returned by gym has 11 dimensions.

video rendering doesn't work on windows

Hello,

I am able to run the gym on my windows machine, for the most part. However, despite ffmpeg being installed and added to the PATH (and detected), I get the following error when trying to render video:

Clearing 2 monitor files from previous run (because force=True was provided)
[2016-04-27 16:31:49,407] Starting new video recorder writing to C:\tmp\random-agent-results\openaigym.video.0.17236.video000000.mp4
/dev/stdin: No such file or directory
Traceback (most recent call last):
  File "Gym.py", line 76, in <module>
    ob = env.reset()
  File "C:\Users\BlackBox\Anaconda3\envs\py27\lib\site-packages\gym-0.0.3-py2.7.egg\gym\core.py", line 87, in reset
    self.monitor._after_reset(observation)
  File "C:\Users\BlackBox\Anaconda3\envs\py27\lib\site-packages\gym-0.0.3-py2.7.egg\gym\monitoring\monitor.py", line 237, in _after_reset
    self.video_recorder.capture_frame()
  File "C:\Users\BlackBox\Anaconda3\envs\py27\lib\site-packages\gym-0.0.3-py2.7.egg\gym\monitoring\video_recorder.py", line 111, in capture_frame
    self._encode_image_frame(frame)
  File "C:\Users\BlackBox\Anaconda3\envs\py27\lib\site-packages\gym-0.0.3-py2.7.egg\gym\monitoring\video_recorder.py", line 161, in _encode_image_frame
    self.encoder.capture_frame(frame)
  File "C:\Users\BlackBox\Anaconda3\envs\py27\lib\site-packages\gym-0.0.3-py2.7.egg\gym\monitoring\video_recorder.py", line 284, in capture_frame
    self.proc.stdin.write(frame.tobytes())
IOError: [Errno 22] Invalid argument
[2016-04-27 16:31:50,016] VideoRecorder encoder exited with status 1

When not using video, and explicitly calling env.render(), it works fine.

It seems to be a Linux-specific thing? Is there a quick way to fix this to get it to run on Windows?

Instructions on developing a new game environment

May I ask if you can provide general instructions on developing a new game environment?

I want to train an agent for a racing game, but the games provided by default do not contain the game I want. Therefore I am considering developing a new game environment for it. General instructions from you on this would help me a lot.

Best,
Bin

Recording video on a server

Greetings!

I am trying to run this on a server (accessed via ssh)

save_path = '/tmp/whatever'

subm_env = gym.make(GAME_TITLE)

subm_env.monitor.start(save_path,force=True)

observation = subm_env.reset()

action = 0

observation, reward, done, info = subm_env.step(action)

And it returns this error

[2016-04-30 22:53:30,482] Making new env: SpaceInvaders-v0
[2016-04-30 22:53:30,512] Creating monitor directory /tmp/whatever
[2016-04-30 22:53:30,527] Starting new video recorder writing to /tmp/whatever/openaigym.video.13.18369.video000000.mp4

---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-42-16c8f8e17e6b> in <module>()
     10 action = 0
     11 
---> 12 observation, reward, done, info = subm_env.step(action)

/home/jheuristic/yozhik/gym/gym/core.pyc in step(self, action)
     80         self.monitor._before_step(action)
     81         observation, reward, done, info = self._step(action)
---> 82         done = self.monitor._after_step(observation, reward, done, info)
     83         return observation, reward, done, info
     84 

/home/jheuristic/yozhik/gym/gym/monitoring/monitor.pyc in _after_step(self, observation, reward, done, info)
    209         self.stats_recorder.after_step(observation, reward, done, info)
    210         # Record video
--> 211         self.video_recorder.capture_frame()
    212 
    213         return done

/home/jheuristic/yozhik/gym/gym/monitoring/video_recorder.pyc in capture_frame(self)
    109                 self._encode_ansi_frame(frame)
    110             else:
--> 111                 self._encode_image_frame(frame)
    112 
    113     def close(self):

/home/jheuristic/yozhik/gym/gym/monitoring/video_recorder.pyc in _encode_image_frame(self, frame)
    159 
    160         try:
--> 161             self.encoder.capture_frame(frame)
    162         except error.InvalidFrame as e:
    163             logger.warn('Tried to pass invalid video frame, marking as broken: %s', e)

/home/jheuristic/yozhik/gym/gym/monitoring/video_recorder.pyc in capture_frame(self, frame)
    282             raise error.InvalidFrame("Your frame has data type {}, but we require uint8 (i.e. RGB values from 0-255).".format(frame.dtype))
    283 
--> 284         self.proc.stdin.write(frame.tobytes())
    285 
    286     def close(self):

IOError: [Errno 32] Broken pipe

Is there any way to record video in my setup?

p.s. the environment is Atari Space Invaders and the code is here

p.c. on a previous recently closed issue - any comments on how to make tutorial code more readable? (since that was the purpose of the issue in the first place)

Error try to run monitoring

When trying to run the following code on Ubuntu MATE 14.04 LTS on an Intel i7:

import gym
env = gym.make("CartPole-v0")
env.monitor.start("/tmp/cartpole-experiment-1")
for i_episode in xrange(20):
    observation = env.reset()
    for t in xrange(100):
        env.render()
        print observation
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print "Episode finished after {} timesteps".format(t+1)
            break

env.monitor.close()

I get the following error:

[2016-04-28 10:59:12,799] Making new env: CartPole-v0
[2016-04-28 10:59:12,800] Creating monitor directory /tmp/cartpole-experiment-1
[2016-04-28 10:59:12,801] Starting new video recorder writing to /tmp/cartpole-experiment-1/openaigym.video.0.8846.video000000.mp4
avconv version 9.18-6:9.18-0ubuntu0.14.04.1, Copyright (c) 2000-2014 the Libav developers
  built on Mar 16 2015 13:19:10 with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
Traceback (most recent call last):
  File "temp.py", line 5, in <module>
    observation = env.reset()
  File "/usr/local/lib/python2.7/dist-packages/gym/core.py", line 87, in reset
    self.monitor._after_reset(observation)
  File "/usr/local/lib/python2.7/dist-packages/gym/monitoring/monitor.py", line 237, in _after_reset
    self.video_recorder.capture_frame()
  File "/usr/local/lib/python2.7/dist-packages/gym/monitoring/video_recorder.py", line 111, in capture_frame
    self._encode_image_frame(frame)
  File "/usr/local/lib/python2.7/dist-packages/gym/monitoring/video_recorder.py", line 161, in _encode_image_frame
    self.encoder.capture_frame(frame)
  File "/usr/local/lib/python2.7/dist-packages/gym/monitoring/video_recorder.py", line 284, in capture_frame
    self.proc.stdin.write(frame.tobytes())
AttributeError: 'numpy.ndarray' object has no attribute 'tobytes'
[2016-04-28 10:59:13,196] Finished writing results. You can upload them to the scoreboard via gym.upload('/tmp/cartpole-experiment-1')

Clearly the frame being passed to encoder.capture_frame() is an ndarray. How can I solve this?
PS: Without the monitor, the code runs perfectly.

Machine learning to build/simulate environments

Firstly, I like to thank everyone involved with this project - I believe this has potential to really speed up machine/reinforcement learning progress.

As far as I can tell, this project focuses predominantly on experimenting with RL algorithms for achieving a goal in an explicitly defined environment (e.g. the Cartpole environment is hand-written based on physics). I was wondering if there are plans to investigate machine learning algorithms that learn how to best model an environment and agent behaviour within it? I haven't found such methods being implemented previously, however I feel it is just as important an aspect for progressing machine learning/robots in real-world (physical) applications. For example, biology is extremely complicated and I imagine we don't have a good set of equations describing how cancer grows; thus we can't learn how to beat it if we can't model it in the first place.

I hope I'm making sense, and any information of this topic would be greatly appreciated. Not sure if a GitHub issue is the best place to raise this question, but hopefully it can encourage discussion between as wide an audience as possible.

(On a related note, are there any plans to implement more "worthwhile" environments? I agree with the need for generality regarding any designed algorithms, and using simpler environments as a testing platform is useful for quick experimentation; however, would it be worthwhile to e.g. implement the equations of nuclear fusion (if this hasn't already been done within another simulation platform) to experiment with how to make it more stable? Thanks again for any help in advance!)

[working deep RL demo][need help]Lasagne+Agentnet baselines

Greetings!
We happen to have just open-sourced a Lasagne-based library for reinforcement learning algorithm design.

On the bright side,

  • it is capable of sustaining virtually any custom RL (and even non-RL) architecture using minimalistic Lasagne network design
  • it has most of the generic reinforcement learning algorithms (Q-learning, K-step algos, SARSA, Advantage Actor-Critic)
  • It is also capable of interacting with any external environment with a simple wrapper.

On the gloomy one, it was made public ~4 days ago and doesn't have a community yet. Prior to that, it had only been used by several Yandex researchers for tinkering.

I would very much like to provide a set of baseline training/testing stands for several problems (and I will do so shortly), for people to be able to experiment with NN architecture, but I'm a bit doubtful about

  • whether the interface of the library doesn't have any flaws worth immediate fixing
  • whether there is someone interested in tinkering with openai baselines, provided they're readable and performing reasonably.

The most basic Reinforcement Learning pipeline looks like this

The questions are, again, whether there is anyone interested in having such baselines for gym problems, and if so, what possible API improvements you would recommend.

Documentation improvement

core.py states:
reward (float) : amount of reward due to the previous action
but as I understand it, the reward may not necessarily be due to the previous action only.

More discrete environments (Box2D)

Hello,

I've found very few discrete environments to play with. It's MountainCar and CartPole, and that's all if you don't count Atari. All MuJoCo games have continuous actions.

I happen to have a few Box2D reinforcement learning environments that I want to port. I think it's a good idea because Box2D is open-source and easy to install (pip install box2d), like the rest of gym. It should be a good start for new people.

  1. CartPole swing up, physics, friction, more reward if kept in center (3 actions).

  2. MoonLander (6-wide state, 4 actions).

  3. PipedalWalker (16-wide state, 4 actuated joints, 3*3*3*3 = 81 actions).

Greg, John, please tell me if you don't need it and aren't going to merge it, and I'll stop.

can not pip install -e '.[all]'

I don't know what's happening: pip install -e . is okay, but when I execute pip install -e '.[all]', it says:
Exception:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 209, in main
status = self.run(options, args)
File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 305, in run
wheel_cache
File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 280, in populate_requirement_set
wheel_cache=wheel_cache
File "/usr/lib/python2.7/dist-packages/pip/req/req_install.py", line 136, in from_editable
editable_req, default_vcs)
File "/usr/lib/python2.7/dist-packages/pip/req/req_install.py", line 1146, in parse_editable
'placeholder' + extras
File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/init.py", line 2833, in parse
req, = parse_requirements(s)
File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/init.py", line 2781, in parse_requirements
yield Requirement(line)
File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/init.py", line 2790, in init
raise RequirementParseError(str(e))
RequirementParseError: Invalid requirement, parse error at "'__placeh'"

python3

It would be great if gym could support Python 3, for future-proofing and to support transition efforts.

occasional exception when resetting the environment

Symptom is an exception while resetting the environment during monitoring:

File "/Users/julie/github/gym/gym/core.py", line 102, in reset
    self.monitor._after_reset(observation)
  File "/Users/julie/github/gym/gym/monitoring/monitor.py", line 250, in _after_reset
    self.video_recorder.capture_frame()
  File "/Users/julie/github/gym/gym/monitoring/video_recorder.py", line 105, in capture_frame
    frame = self.env.render(mode=render_mode)
  File "/Users/julie/github/gym/gym/core.py", line 153, in render
    return self._render(mode=mode, close=close)
  File "/Users/julie/github/gym/gym/envs/classic_control/cartpole.py", line 132, in _render
    return self.viewer.get_array()
  File "/Users/julie/github/gym/gym/envs/classic_control/rendering.py", line 102, in get_array
    arr = arr.reshape(self.height, self.width, 4)
ValueError: total size of new array must be unchanged

Unfortunately I don't have exact steps to reproduce, as it seems to happen sporadically when using the CartPole-v0 environment (I haven't tried others). I will give more details as they are uncovered.

If it helps, information about my python environment is attached.

pythonenv.txt

Allow arbitrary size of graphics display

Some of the games (e.g., MsPacman, MontezumaRevenge) are rendered in very small windows which can't be changed. It would make sense to allow a size parameter when rendering so that the graphics can be upscaled (even at the cost of pixelation) for presentation / demonstration purposes.
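
Until something like that exists, the rgb_array render mode hands you the raw frame, so upscaling can be done on your side before display; a sketch using nearest-neighbour upscaling with np.kron, assuming the Atari extras are installed and a Gym version that takes render_mode at creation:

import numpy as np
import gym

env = gym.make("MsPacman-v4", render_mode="rgb_array")
env.reset(seed=0)
frame = env.render()                 # native-resolution RGB frame as a numpy array
scale = 4
# Repeat every pixel `scale` times along both spatial axes (nearest-neighbour upscaling).
big = np.kron(frame, np.ones((scale, scale, 1), dtype=frame.dtype))
print(frame.shape, "->", big.shape)
env.close()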

ipython render window can't be closed

If you attempt to create a notebook with the first CartPole example, the code runs but the rendered window cannot be closed:

(screenshot: gym-ipython)

Neither the standard x, nor ctrl-c, nor terminating the kernel through the notebook UI causes the window to close. If you kill the parent ipython process from the command line, that will kill all the child windows as expected.

System: Ubuntu 14.04, 64-bit.

Pretty cool this works at all though!

Python 3.5: TypeError, "is not JSON serializable"

While trying to run examples/agents/random_agent.py from 7b91967 with Python 3.5.1, I get:

Traceback (most recent call last):
  File "examples/agents/random_agent.py", line 36, in <module>
    ob = env.reset()
  File "/home/pierre-luc/anaconda3/lib/python3.5/site-packages/gym-0.0.7-py3.5.egg/gym/core.py", line 96, in reset
    self.monitor._after_reset(observation)
  File "/home/pierre-luc/anaconda3/lib/python3.5/site-packages/gym-0.0.7-py3.5.egg/gym/monitoring/monitor.py", line 228, in _after_reset
    self._close_video_recorder()
  File "/home/pierre-luc/anaconda3/lib/python3.5/site-packages/gym-0.0.7-py3.5.egg/gym/monitoring/monitor.py", line 243, in _close_video_recorder
    self.video_recorder.close()
  File "/home/pierre-luc/anaconda3/lib/python3.5/site-packages/gym-0.0.7-py3.5.egg/gym/monitoring/video_recorder.py", line 144, in close
    self.write_metadata()
  File "/home/pierre-luc/anaconda3/lib/python3.5/site-packages/gym-0.0.7-py3.5.egg/gym/monitoring/video_recorder.py", line 150, in write_metadata
    json.dump(self.metadata, f)
  File "/home/pierre-luc/anaconda3/lib/python3.5/json/__init__.py", line 178, in dump
    for chunk in iterable:
  File "/home/pierre-luc/anaconda3/lib/python3.5/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/pierre-luc/anaconda3/lib/python3.5/json/encoder.py", line 403, in _iterencode_dict
    yield from chunks
  File "/home/pierre-luc/anaconda3/lib/python3.5/json/encoder.py", line 403, in _iterencode_dict
    yield from chunks
  File "/home/pierre-luc/anaconda3/lib/python3.5/json/encoder.py", line 436, in _iterencode
    o = _default(o)
  File "/home/pierre-luc/anaconda3/lib/python3.5/json/encoder.py", line 180, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'avconv 9.18-6:9.18-0ubuntu0.14.04.1\nlibavutil     52.  3. 0 / 52.  3. 0\nlibavcodec    54. 35. 0 / 54. 35. 0\nlibavformat   54. 20. 4 / 54. 20. 4\nlibavdevice   53.  2. 0 / 53.  2. 0\nlibavfi
lter    3.  3. 0 /  3.  3. 0\nlibavresample  1.  0. 1 /  1.  0. 1\nlibswscale     2.  1. 1 /  2.  1. 1\n' is not JSON serializable

The error is due to the fact that subprocess.check_output on line 257 of gym/monitoring/video_recorder.py returns a bytes string, but json.dump expects a str type.

I fixed the problem on my system with str() on the output of check_output:

@property
def version_info(self):
    return {
        'backend': self.backend,
        'version': str(subprocess.check_output([self.backend, '-version'])),
        'cmdline': self.cmdline,
    }

There is probably a better way to fix this problem portably using six.
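
A slightly cleaner variant, which avoids embedding the b'...' repr in the JSON, is to decode the bytes explicitly; the same local patch with .decode():

@property
def version_info(self):
    # check_output returns bytes under Python 3; decode so json.dump accepts it as str.
    return {
        'backend': self.backend,
        'version': subprocess.check_output([self.backend, '-version']).decode('utf-8'),
        'cmdline': self.cmdline,
    }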

Env.step() with no action

Intuitively, an algorithm should at any point also have the choice of not acting at all.

At least in the example of control/physics environments, being able to not act might sometimes be the "best" choice. Defining "best" here can be a long discussion, but as an example an algorithm might be optimizing for efficiency (reward/effort), rather than absolute reward.
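
Most environments don't reserve an explicit "do nothing" action (Atari's NOOP being the exception), but for continuous-control environments the all-zeros action is a natural stand-in, and a wrapper can expose it; a hypothetical sketch:

import numpy as np
import gym

class NoOpStepWrapper(gym.Wrapper):
    """Hypothetical wrapper: advance the environment while applying zero force/torque."""

    def step_noop(self):
        # Only meaningful for Box action spaces whose zero vector means "apply no control".
        noop = np.zeros(self.env.action_space.shape, dtype=self.env.action_space.dtype)
        return self.env.step(noop)

env = NoOpStepWrapper(gym.make("Pendulum-v1"))
observation, info = env.reset(seed=0)
observation, reward, terminated, truncated, info = env.step_noop()
env.close()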

Box2d won't find some RAND_LIMIT_swigconstant

Hello!

It's probably some silly mistake on my side, but I wasn't able to fix it by random lever pulling, as usual.

Installing Box2D as in the instructions (using pip install -e .[all]) throws an error when trying to use some of the Box2D examples.

Code that reproduces the issue:

import gym
atari = gym.make('LunarLander-v0')
atari.reset()
[2016-05-16 02:14:25,430] Making new env: LunarLander-v0

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-f89e78f4410b> in <module>()
      1 import gym
----> 2 atari = gym.make('LunarLander-v0')
      3 atari.reset()
      4 #plt.imshow(atari.render('rgb_array'))

/home/jheuristic/yozhik/gym/gym/envs/registration.pyc in make(self, id)
     77         logger.info('Making new env: %s', id)
     78         spec = self.spec(id)
---> 79         return spec.make()
     80 
     81     def all(self):

/home/jheuristic/yozhik/gym/gym/envs/registration.pyc in make(self)
     52             raise error.Error('Attempting to make deprecated env {}. (HINT: is there a newer registered version of this env?)'.format(self.id))
     53 
---> 54         cls = load(self._entry_point)
     55         env = cls(**self._kwargs)
     56 

/home/jheuristic/yozhik/gym/gym/envs/registration.pyc in load(name)
     11 def load(name):
     12     entry_point = pkg_resources.EntryPoint.parse('x={}'.format(name))
---> 13     result = entry_point.load(False)
     14     return result
     15 

/home/jheuristic/thenv/local/lib/python2.7/site-packages/pkg_resources/__init__.pyc in load(self, require, *args, **kwargs)
   2378         if require:
   2379             self.require(*args, **kwargs)
-> 2380         return self.resolve()
   2381 
   2382     def resolve(self):

/home/jheuristic/thenv/local/lib/python2.7/site-packages/pkg_resources/__init__.pyc in resolve(self)
   2384         Resolve the entry point from its module and attrs.
   2385         """
-> 2386         module = __import__(self.module_name, fromlist=['__name__'], level=0)
   2387         try:
   2388             return functools.reduce(getattr, self.attrs, module)

/home/jheuristic/yozhik/gym/gym/envs/box2d/__init__.py in <module>()
----> 1 from gym.envs.box2d.lunar_lander import LunarLander
      2 from gym.envs.box2d.bipedal_walker import BipedalWalker, BipedalWalkerHardcore

/home/jheuristic/yozhik/gym/gym/envs/box2d/lunar_lander.py in <module>()
      3 from six.moves import xrange
      4 
----> 5 import Box2D
      6 from Box2D.b2 import (edgeShape, circleShape, fixtureDef, polygonShape, revoluteJointDef, contactListener)
      7 

/home/jheuristic/thenv/local/lib/python2.7/site-packages/Box2D/__init__.py in <module>()
     18 # 3. This notice may not be removed or altered from any source distribution.
     19 #
---> 20 from .Box2D import *
     21 __author__ = '$Date$'
     22 __version__ = '2.3.1'

/home/jheuristic/thenv/local/lib/python2.7/site-packages/Box2D/Box2D.py in <module>()
    433     return _Box2D.b2CheckPolygon(shape, additional_checks)
    434 
--> 435 _Box2D.RAND_LIMIT_swigconstant(_Box2D)
    436 RAND_LIMIT = _Box2D.RAND_LIMIT
    437 

AttributeError: 'module' object has no attribute 'RAND_LIMIT_swigconstant'

What didn't help:

pip uninstall gym
apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl
git clone https://github.com/openai/gym
cd gym
pip install -e .[all] --upgrade

The OS is Ubuntu 14.04 Server x64
It may be a clue that I am running the thing from inside a Python 2 virtualenv (with all the numpys, etc. installed).
