mrl: modular RL

This is a modular RL code base for research. The intent is to enable surgical modifications by designing the base agent as a list of modules that all live inside the agent's global namespace (so they can all access each other directly by name). This means we can change the algorithm of a complex hierarchical, multi-goal, intrinsically motivated, etc. agent from DDPG to SAC by simply changing the algorithm module (and adding the additional critic network). Similarly, to add something like a forward model, intrinsic motivation, landmark generation, a new HER strategy, etc., you only need to create/modify the relevant module(s).

The agent has life-cycle hooks that the modules "hook" into. The important ones are: _setup (called after all modules are set but before any environment interactions), _process_experience (called with each new experience), _optimize (called at each optimization step), save/load (called upon saving / loading the agent).
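The hook pattern described above can be sketched with a toy agent. This is an illustration only: ToyAgent, ToyModule, ToyBuffer, and ToyAlgorithm are hypothetical names and not mrl's classes; see mrl/agent_base.py for the real implementation. The sketch assumes each module has a name, is attached to the agent under that name, and receives lifecycle calls from the agent.

```python
# Toy illustration of lifecycle hooks; these classes are NOT mrl's real API.

class ToyModule:
    """Base class: every lifecycle hook is a no-op unless a subclass overrides it."""

    def __init__(self, name):
        self.name = name
        self.agent = None  # set by the agent when the module is registered

    def _setup(self):
        pass

    def _process_experience(self, experience):
        pass

    def _optimize(self):
        pass


class ToyAgent:
    """Registers modules under their names and forwards lifecycle calls to them."""

    def __init__(self, modules):
        self.modules = list(modules)
        for module in self.modules:
            module.agent = self
            setattr(self, module.name, module)  # modules can reach each other by name
        for module in self.modules:
            module._setup()  # runs after all modules are set

    def process_experience(self, experience):
        for module in self.modules:
            module._process_experience(experience)

    def optimize(self):
        for module in self.modules:
            module._optimize()


class ToyBuffer(ToyModule):
    def __init__(self):
        super().__init__("replay_buffer")
        self.items = []

    def _process_experience(self, experience):
        self.items.append(experience)


class ToyAlgorithm(ToyModule):
    def __init__(self):
        super().__init__("algorithm")
        self.updates = 0

    def _optimize(self):
        # Access the sibling module by name through the shared agent.
        if self.agent.replay_buffer.items:
            self.updates += 1


agent = ToyAgent([ToyBuffer(), ToyAlgorithm()])
agent.optimize()                            # buffer is empty, so no update happens
agent.process_experience({"reward": 1.0})
agent.optimize()                            # buffer has data, so one update happens
print(len(agent.replay_buffer.items), agent.algorithm.updates)  # prints "1 1"
```

In this toy version, swapping the algorithm means passing a different module named "algorithm"; the buffer and the agent stay untouched.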

See comments in mrl/agent_base.py, brief test scripts in tests, and example TD3/SAC Mujoco agents in experiments/benchmarks/train_online_agent.py.

The modular structure is technically framework agnostic, so it could be used with either Pytorch or TF-based modules, or even a mix, but right now all modules that need a framework use Pytorch.

The train loop is easily customized, so you can do, e.g., BatchRL, transfer, or meta RL with minimal modifications.

Environment parallelization is done via VecEnv, and we rely on GPU for optimization parallelization. Future work should consider how they can be done asynchronously; e.g., using Ray.

Performance Benchmarks

mrl provides state of the art implementations of SAC, TD3, and DDPG+HER. See the Mujoco and Multi-goal benchmarks.

Installation

There is a requirements.txt that works with venv:

python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Then pip install the appropriate version of Pytorch by following the instructions here: https://pytorch.org/get-started/locally/.

To run Mujoco environments you need to have the Mujoco binaries and a license key. Follow the instructions here.

To test run:

pytest tests
PYTHONPATH=./ python experiments/mega/train_mega.py --env FetchReach-v1 --layers 256 256 --max_steps 5000

The first command should pass 3/3 tests. The second command should solve the environment in under 1 minute (average test reward better than -5).

Usage

To understand how the code works, read mrl/agent_base.py.

See tests/test_agent_sac.py and experiments/benchmarks for example usage. The basic outline is as follows:

  1. Construct a config object that contains all the agent hyperparameters and modules. There are some existing base configs / convenience methods for creating default SAC/TD3/DDPG agents (see, e.g., the benchmarks code). If you use argparse, you can use a config object to automatically populate the parser using parser = add_config_args(parser, config).
  2. Call mrl.config_to_agent on the config to get back an agent.
  3. Use the agent however you want; e.g., call its train/eval methods, save/load, module methods, and so on.
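The idea behind populating a parser from a config in step 1 can be sketched as follows. This is not mrl's add_config_args: the helper add_config_args_sketch and the plain-dict config are hypothetical stand-ins, and the sketch assumes every config entry has a non-None default whose type can be used to parse the command-line value. The keys reuse the flags from the test command above.

```python
import argparse


def add_config_args_sketch(parser, config):
    """Hypothetical stand-in: add one --flag per config entry, typed by its default."""
    for key, default in config.items():
        if isinstance(default, (list, tuple)):
            # Lists become multi-value flags; element type comes from the first item.
            elem_type = type(default[0]) if default else str
            parser.add_argument(f"--{key}", nargs="+", type=elem_type,
                                default=list(default))
        else:
            parser.add_argument(f"--{key}", type=type(default), default=default)
    return parser


# A plain dict stands in for the config object (an assumption of this sketch).
config = {"env": "FetchReach-v1", "layers": [256, 256], "max_steps": 5000}

parser = add_config_args_sketch(argparse.ArgumentParser(), config)
args = parser.parse_args(["--layers", "512", "512", "512", "--max_steps", "1000"])

# Command-line values override the defaults; untouched keys keep their defaults.
config.update(vars(args))
print(config)
```

With the real library, the resulting config would then be passed to mrl.config_to_agent as in step 2.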

To add functionality or a new algorithm, you generally just need to define one or more modules and add them to the config. They automatically hook into the agent's lifecycle methods, so the rest of the code can stay the same.
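One way such automatic hooking could work is to call a hook on every module that defines it and skip the rest. The sketch below assumes that dispatch scheme; RewardLogger, OptimizeCounter, and call_hook are hypothetical names, and mrl's own mechanism may differ.

```python
class RewardLogger:
    """Hypothetical module: records rewards via the _process_experience hook."""

    def __init__(self):
        self.rewards = []

    def _process_experience(self, experience):
        self.rewards.append(experience["reward"])


class OptimizeCounter:
    """Hypothetical module: counts optimization steps via the _optimize hook."""

    def __init__(self):
        self.calls = 0

    def _optimize(self):
        self.calls += 1


def call_hook(modules, hook_name, *args):
    """Call hook_name on each module that defines it; other modules are skipped."""
    for module in modules:
        hook = getattr(module, hook_name, None)
        if callable(hook):
            hook(*args)


modules = [RewardLogger()]
modules.append(OptimizeCounter())  # the only change needed to add the new behavior

# The driving loop is unchanged no matter which modules are in the list.
for reward in (0.0, 1.0):
    call_hook(modules, "_process_experience", {"reward": reward})
    call_hook(modules, "_optimize")

logger, counter = modules
print(logger.rewards, counter.calls)  # prints "[0.0, 1.0] 2"
```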

Implemented / Outstanding

Implemented:

Some todos:

  • Distributional predictions
  • Uncertainty predictions
  • Hierarchical RL
  • Support for goal-based intrinsic motivation in general environments
  • DQN variants

Papers using this Repository

Below is a list of papers that use mrl. If you use mrl in one of your papers please let us know and we can add you to the list. If you build on the experiments related to the below papers, please cite the original papers:

  • Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning (ICML 2020 (15 minute presentation), Arxiv, ALA 2020 Best Paper (25 minute presentation))
  • ProtoGE: Prototype Goal Encodings for Multi-goal Reinforcement Learning (RLDM 2019, pdf) [As of July 2020, this is still far and away the state-of-the-art on Gym's Fetch environments]
  • Counterfactual Data Augmentation using Locally Factored Dynamics (Preprint, Arxiv)

Citing this Repository

If you use or extend this codebase in your work, please consider citing:

@misc{mrl,
  author = {Pitis, Silviu and Chan, Harris and Zhao, Stephen},
  title = {mrl: modular RL},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/spitis/mrl}},
}

References

This code has used parts of the following repositories:

Contributors

Silviu Pitis (spitis), Harris Chan (takonan), Stephen Zhao (Silent-Zebra)
