This repository contains a number of environments and algorithms for exploration in RL, with a particular focus on model-based RL.
- Only considers continuous actions
- Only open-source implementations (i.e. not MuJoCo)
```bash
pip install torch
pip install gym
pip install roboschool==1.0.48
pip install box2d-py
```
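As a quick smoke test, each dependency can be exercised by instantiating one of the environments it registers (the env IDs below are the standard Gym/Roboschool/Box2D ones, not this repository's custom variants):

```python
import gym
import roboschool  # noqa: F401 -- importing registers the Roboschool envs

# One standard env per dependency; the custom environments in this
# repository register their own IDs.
for env_id in ["MountainCarContinuous-v0",   # classic control (gym)
               "RoboschoolHalfCheetah-v1",   # roboschool
               "LunarLanderContinuous-v2"]:  # box2d-py
    env = gym.make(env_id)
    env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()
    print(env_id, "OK")
```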
A continuous-action version of the mountain car problem. A reward of +1 is achieved when the car escapes the valley.
Used in:
- VIME
- MAX
- Parameter Space Noise for Exploration
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- Surprise-based intrinsic motivation for deep RL
A pole that starts facing down. The aim is to swing the pole upright. A reward of +1 is achieved when cos(angle) > 0.8 (see the sketch after the list below).
Used in:
- VIME
- Parameter Space Noise for Exploration
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- Surprise-based intrinsic motivation for deep RL
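The sparse swing-up reward can be expressed as a thin wrapper. The sketch below assumes the observation exposes cos(angle) as its first component, as in Gym's Pendulum-v0; the env used in these papers may lay out its state differently.

```python
import gym

class SparseSwingUpReward(gym.Wrapper):
    """Replace the dense reward with +1 whenever cos(angle) > 0.8."""

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        # Assumes obs = [cos(angle), sin(angle), angular velocity].
        reward = 1.0 if obs[0] > 0.8 else 0.0
        return obs, reward, done, info

env = SparseSwingUpReward(gym.make("Pendulum-v0"))
```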
Yields a reward of +1 if the agent reaches the upright position (given some threshold). This is a continuous-action version of Acrobot.
Used in:
- VIME
- Parameter Space Noise for Exploration
- Information Maximizing Exploration with a Latent Dynamics Model
- Implicit generative modelling for efficient exploration
A reward of +1 is achieved when the cheetah moves more than five units along the x-axis (a sketch of the sparse reward follows the list below).
Note: the reward function does not work as expected.
Used in:
- VIME
- Parameter Space Noise for Exploration
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- EMI: Exploration with Mutual Information
- Surprise-based intrinsic motivation for deep RL
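A minimal sketch of this sparse reward as a wrapper; the threshold parameter also covers the ten-unit variant described next. Reading the torso x-coordinate via `body_xyz` is a Roboschool convention and should be treated as an assumption here:

```python
import gym
import roboschool  # noqa: F401

class SparseCheetahReward(gym.Wrapper):
    """+1 once the torso has moved more than `threshold` units along x."""

    def __init__(self, env, threshold=5.0):
        super().__init__(env)
        self.threshold = threshold

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        x = self.env.unwrapped.body_xyz[0]  # torso x-coordinate (assumed attribute)
        reward = 1.0 if x > self.threshold else 0.0
        return obs, reward, done, info

env = SparseCheetahReward(gym.make("RoboschoolHalfCheetah-v1"), threshold=5.0)
```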
A reward of +1 is achieved when the cheetah moves more than ten units along the x-axis.
This task implements a separate exploration phase in which no reward is provided. Exploration performance is then measured implicitly via performance on a downstream task; the downstream tasks are running and flipping (a sketch of the protocol follows below):
Used in:
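A sketch of the two-phase protocol; `agent` and `task_reward` are hypothetical placeholders (`task_reward` maps a transition to the downstream reward, e.g. running or flipping, and is hidden from the agent during phase one):

```python
def evaluate_exploration(env, agent, explore_steps, eval_episodes, task_reward):
    """Reward-free exploration followed by a downstream evaluation task."""
    # Phase 1: the agent acts and learns (e.g. a dynamics model)
    # without ever seeing a task reward.
    obs = env.reset()
    for _ in range(explore_steps):
        action = agent.explore_action(obs)
        next_obs, _, done, _ = env.step(action)  # env reward is discarded
        agent.observe(obs, action, next_obs)
        obs = env.reset() if done else next_obs

    # Phase 2: exploit whatever was learned to maximise the
    # now-revealed downstream reward.
    returns = []
    for _ in range(eval_episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            action = agent.exploit_action(obs, task_reward)
            next_obs, _, done, _ = env.step(action)
            total += task_reward(obs, action, next_obs)
            obs = next_obs
        returns.append(total)
    return sum(returns) / len(returns)
```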
Navigate an ant through a U-shaped maze. Exploration performance is measured as the fraction of states visited (see the sketch below). Currently only implemented in MuJoCo.
Used in:
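A common way to compute the fraction of states visited is to discretise the maze's (x, y) plane and count occupied bins; the bounds and resolution below are placeholders for the actual U-maze geometry:

```python
import numpy as np

def visitation_fraction(xy_positions, low=(-4.0, -4.0), high=(12.0, 12.0), bins=20):
    """Fraction of (x, y) bins visited at least once (bounds are hypothetical)."""
    grid = np.zeros((bins, bins), dtype=bool)
    for x, y in xy_positions:
        i = int((x - low[0]) / (high[0] - low[0]) * bins)
        j = int((y - low[1]) / (high[1] - low[1]) * bins)
        if 0 <= i < bins and 0 <= j < bins:
            grid[i, j] = True
    return grid.mean()
```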
Requires discrete actions
Used in:
Currently only implemented in MuJoCo.
- VIME
- Parameter Space Noise for Exploration
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- EMI: Exploration with Mutual Information
Used in:
- Receding Horizon Curiosity
- VIME: Variational Information Maximizing Exploration
- Model-based active exploration
- Curiosity-Driven Exploration by Self-Supervised Prediction
- Count-Based Exploration with Neural Density Models
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- Diversity is All You Need
- Large-Scale Study of Curiosity-Driven Learning
- Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control
- Decomposition of Uncertainty for Active Learning and Reliable Reinforcement Learning in Stochastic Systems
- Self-Supervised Exploration via Disagreement
- SMiRL: Surprise Minimizing RL in Dynamic Environments
- A survey on intrinsic motivation in reinforcement learning
- InfoBot: Transfer and Exploration via the Information Bottleneck
- Approximate Bayesian inference in spatial environments
- EMI: Exploration with Mutual Information
- Learning latent state representation for speeding up exploration
- Exploration by uncertainty in reward space
- Unsupervised Exploration with Deep Model-Based Reinforcement Learning
- Information Maximizing Exploration with a Latent Dynamics Model
- Bayesian Curiosity for Efficient Exploration in Reinforcement Learning
- Surprise-based intrinsic motivation for deep RL
- Implicit generative modelling for efficient exploration
- rllab
In Parameter Space Noise for Exploration, the authors demonstrate that Hopper and Walker2d do not require exploration because their rewards are well shaped, but that HalfCheetah does: it tends to converge to a local minimum of flipping onto its back and wiggling.
In Large-Scale Study of Curiosity-Driven Learning, the authors show that naive exploration can solve MountainCar, CartPole, LunarLander and Acrobot.