This repository contains a number of environments and algorithms for exploration in RL, with a particular focus on model-based RL.
- Only considers continuous actions
- Only open-source implementations (i.e. not MuJoCo)
```bash
pip install torch
pip install gym
pip install roboschool==1.0.48
pip install box2d-py
```
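As a quick smoke test, each dependency can be exercised by instantiating one of the environments it registers (the env IDs below are the standard Gym/Roboschool/Box2D ones, not this repository's custom variants):

```python
import gym
import roboschool  # noqa: F401 -- importing registers the Roboschool envs

# One standard env per dependency; the custom environments in this
# repository register their own IDs.
for env_id in ["MountainCarContinuous-v0",   # classic control (gym)
               "RoboschoolHalfCheetah-v1",   # roboschool
               "LunarLanderContinuous-v2"]:  # box2d-py
    env = gym.make(env_id)
    env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()
    print(env_id, "OK")
```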
A continuous-action version of the mountain car problem. A reward of +1 is achieved when the car escapes the valley.
Used in:
- VIME
- MAX
- Parameter Space Noise for Exploration
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- Surprise-based intrinsic motivation for deep RL
A pole that starts facing down. The aim is to swing the pole upright. A reward of +1 is achieved when cos(angle) > 0.8 (see the sketch after the list below).
Used in:
- VIME
- Parameter Space Noise for Exploration
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- Surprise-based intrinsic motivation for deep RL
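The sparse swing-up reward can be expressed as a thin wrapper. The sketch below assumes the observation exposes cos(angle) as its first component, as in Gym's Pendulum-v0; the env used in these papers may lay out its state differently.

```python
import gym

class SparseSwingUpReward(gym.Wrapper):
    """Replace the dense reward with +1 whenever cos(angle) > 0.8."""

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        # Assumes obs = [cos(angle), sin(angle), angular velocity].
        reward = 1.0 if obs[0] > 0.8 else 0.0
        return obs, reward, done, info

env = SparseSwingUpReward(gym.make("Pendulum-v0"))
```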
Yields a reward of +1 if the agent reaches the upright position (given some threshold). This is a continuous-action version of Acrobot.
Used in:
- VIME
- Parameter Space Noise for Exploration
- Information Maximizing Exploration with a Latent Dynamics Model
- Implicit generative modelling for efficient exploration
A reward of +1 is achieved when the cheetah moves more than five units along the x-axis (a sketch of the sparse reward follows the list below).
Note: the reward function does not work as expected.
Used in:
- VIME
- Parameter Space Noise for Exploration
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- EMI: Exploration with Mutual Information
- Surprise-based intrinsic motivation for deep RL
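A minimal sketch of this sparse reward as a wrapper; the threshold parameter also covers the ten-unit variant described next. Reading the torso x-coordinate via `body_xyz` is a Roboschool convention and should be treated as an assumption here:

```python
import gym
import roboschool  # noqa: F401

class SparseCheetahReward(gym.Wrapper):
    """+1 once the torso has moved more than `threshold` units along x."""

    def __init__(self, env, threshold=5.0):
        super().__init__(env)
        self.threshold = threshold

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        x = self.env.unwrapped.body_xyz[0]  # torso x-coordinate (assumed attribute)
        reward = 1.0 if x > self.threshold else 0.0
        return obs, reward, done, info

env = SparseCheetahReward(gym.make("RoboschoolHalfCheetah-v1"), threshold=5.0)
```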
A reward of +1 is achieved when the cheetah moves more than ten units along the x-axis.
This task implements a separate exploration phase in which no reward is provided. Exploration performance is then measured implicitly via performance on a downstream task; the downstream tasks are running and flipping (a sketch of the protocol follows below):
Used in:
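A sketch of the two-phase protocol; `agent` and `task_reward` are hypothetical placeholders (`task_reward` maps a transition to the downstream reward, e.g. running or flipping, and is hidden from the agent during phase one):

```python
def evaluate_exploration(env, agent, explore_steps, eval_episodes, task_reward):
    """Reward-free exploration followed by a downstream evaluation task."""
    # Phase 1: the agent acts and learns (e.g. a dynamics model)
    # without ever seeing a task reward.
    obs = env.reset()
    for _ in range(explore_steps):
        action = agent.explore_action(obs)
        next_obs, _, done, _ = env.step(action)  # env reward is discarded
        agent.observe(obs, action, next_obs)
        obs = env.reset() if done else next_obs

    # Phase 2: exploit whatever was learned to maximise the
    # now-revealed downstream reward.
    returns = []
    for _ in range(eval_episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            action = agent.exploit_action(obs, task_reward)
            next_obs, _, done, _ = env.step(action)
            total += task_reward(obs, action, next_obs)
            obs = next_obs
        returns.append(total)
    return sum(returns) / len(returns)
```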
Navigate an ant through a U-shaped maze. Exploration performance is measured as the fraction of states visited (see the sketch below). Currently only implemented in MuJoCo.
Used in:
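A common way to compute the fraction of states visited is to discretise the maze's (x, y) plane and count occupied bins; the bounds and resolution below are placeholders for the actual U-maze geometry:

```python
import numpy as np

def visitation_fraction(xy_positions, low=(-4.0, -4.0), high=(12.0, 12.0), bins=20):
    """Fraction of (x, y) bins visited at least once (bounds are hypothetical)."""
    grid = np.zeros((bins, bins), dtype=bool)
    for x, y in xy_positions:
        i = int((x - low[0]) / (high[0] - low[0]) * bins)
        j = int((y - low[1]) / (high[1] - low[1]) * bins)
        if 0 <= i < bins and 0 <= j < bins:
            grid[i, j] = True
    return grid.mean()
```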
Requires discrete actions
Used in:
Currently only implemented in MuJoCo.
- VIME
- Parameter Space Noise for Exploration
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- EMI: Exploration with Mutual Information
Used in:
- Receding Horizon Curiosity
- VIME: Variational Information Maximizing Exploration
- Model-based active exploration
- Curiosity-Driven Exploration by Self-Supervised Prediction
- Count-Based Exploration with Neural Density Models
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- Diversity is All You Need
- Large-Scale Study of Curiosity-Driven Learning
- Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control
- Decomposition of Uncertainty for Active Learning and Reliable Reinforcement Learning in Stochastic Systems
- Self-Supervised Exploration via Disagreement
- SMiRL: Surprise Minimizing RL in Dynamic Environments
- A survey on intrinsic motivation in reinforcement learning
- InfoBot: Transfer and Exploration via the Information Bottleneck
- Approximate Bayesian inference in spatial environments
- EMI: Exploration with Mutual Information
- Learning latent state representation for speeding up exploration
- Exploration by uncertainty in reward space
- Unsupervised Exploration with Deep Model-Based Reinforcement Learning
- Information Maximizing Exploration with a Latent Dynamics Model
- Bayesian Curiosity for Efficient Exploration in Reinforcement Learning
- Surprise-based intrinsic motivation for deep RL
- Implicit generative modelling for efficient exploration
- rllab
In Parameter Space Noise for Exploration, the authors demonstrate that Hopper and Walker2d do not require exploration because their rewards are well shaped, but that HalfCheetah does: it tends to converge to a local minimum of flipping onto its back and wiggling.
In Large-Scale Study of Curiosity-Driven Learning, the authors show that naive exploration can solve MountainCar, CartPole, LunarLander and Acrobot.