
MARLlib

Multi-Agent RLlib (MARLlib) is a MARL benchmark based on Ray and one of its toolkits, RLlib. It provides the MARL research community with a unified platform for developing and evaluating new ideas in various multi-agent environments. MARLlib has four core features.

  • It collects most of the existing MARL algorithms that are widely acknowledged by the community and unifies them under one framework.
  • It offers a single interface through which agents can interact with many different multi-agent environments.
  • It guarantees high efficiency in both the training and sampling processes.
  • It provides trained results, including learning curves and pretrained models, for each task and algorithm combination, with fine-tuned hyper-parameters to guarantee credibility.

Project Website: https://sites.google.com/view/marllib/home


The README is organized as follows:

Part I. Overview

We have collected most of the existing multi-agent environments and multi-agent reinforcement learning algorithms and unified them under one framework based on Ray's RLlib to boost MARL research.

The common MARL baselines are all implemented, including independent learning (IQL, A2C, DDPG, TRPO, PPO), centralized critic learning (COMA, MADDPG, MAPPO, HATRPO), and value decomposition (QMIX, VDN, FACMAC, VDA2C).

Popular MARL environments such as SMAC, MaMujoco, and Google Research Football are all provided with a unified interface.

The algorithm code and environment code are fully separated: changing the environment requires no modification on the algorithm side, and vice versa.

Here we provide a table comparing MARLlib with previous benchmarks.

| Benchmark | Learning Mode | Available Env | Algorithm Type | Algorithm Number | Continuous Control | Asynchronous Interact | Distributed Training | Framework |
|---|---|---|---|---|---|---|---|---|
| PyMARL | CP | 1 | VD | 5 | | | | * |
| PyMARL2 | CP | 1 | VD | 12 | | | | PyMARL |
| MARL-Algorithms | CP | 1 | VD+Comm | 9 | | | | * |
| EPyMARL | CP | 4 | IL+VD+CC | 10 | | | | PyMARL |
| Marlbenchmark | CP+CL | 4 | VD+CC | 5 | ✔️ | | | pytorch-a2c-ppo-acktr-gail |
| MARLlib | CP+CL+CM+MI | 10 | IL+VD+CC | 18 | ✔️ | ✔️ | ✔️ | Ray/RLlib |

CP, CL, CM, and MI stand for cooperative, collaborative, competitive, and mixed task learning modes, respectively. IL, VD, and CC stand for independent learning, value decomposition, and centralized critic categories; Comm stands for communication-based learning. An asterisk denotes that the benchmark uses its own framework.

An RLlib tutorial and quick examples can be found in the Ray/RLlib documentation. These will help you easily dive into RLlib.

We hope everyone interested in MARL can benefit from MARLlib.

Part II. Environment

Supported Multi-agent Environments / Tasks

Most of the popular environments in MARL research have been incorporated into this benchmark:

| Env Name | Learning Mode | Observability | Action Space | Observations |
|---|---|---|---|---|
| LBF | Mixed | Both | Discrete | Discrete |
| RWARE | Collaborative | Partial | Discrete | Discrete |
| MPE | Mixed | Both | Both | Continuous |
| SMAC | Cooperative | Partial | Discrete | Continuous |
| MetaDrive | Collaborative | Partial | Continuous | Continuous |
| MAgent | Mixed | Partial | Discrete | Discrete |
| Pommerman | Mixed | Both | Discrete | Discrete |
| MaMujoco | Cooperative | Partial | Continuous | Continuous |
| GRF | Collaborative | Full | Discrete | Continuous |
| Hanabi | Cooperative | Partial | Discrete | Discrete |

Each environment has a README file that serves as the instruction for the task, covering environment settings, installation, and important notes.

Part III. Algorithm

We provide three types of MARL algorithms as our baselines:

Independent Learning: IQL, DDPG, PG, A2C, TRPO, PPO

Centralized Critic: COMA, MADDPG, MAA2C, MAPPO, MATRPO, HATRPO, HAPPO

Value Decomposition: VDN, QMIX, FACMAC, VDAC, VDPPO

Here is a chart describing the characteristics of each algorithm:

| Algorithm | Support Task Mode | Need Global State | Action | Learning Mode | Type |
|---|---|---|---|---|---|
| IQL | Mixed | No | Discrete | Independent Learning | Off Policy |
| PG | Mixed | No | Both | Independent Learning | On Policy |
| A2C | Mixed | No | Both | Independent Learning | On Policy |
| DDPG | Mixed | No | Continuous | Independent Learning | Off Policy |
| TRPO | Mixed | No | Both | Independent Learning | On Policy |
| PPO | Mixed | No | Both | Independent Learning | On Policy |
| COMA | Mixed | Yes | Both | Centralized Critic | On Policy |
| MADDPG | Mixed | Yes | Continuous | Centralized Critic | Off Policy |
| MAA2C | Mixed | Yes | Both | Centralized Critic | On Policy |
| MATRPO | Mixed | Yes | Both | Centralized Critic | On Policy |
| MAPPO | Mixed | Yes | Both | Centralized Critic | On Policy |
| HATRPO | Cooperative | Yes | Both | Centralized Critic | On Policy |
| HAPPO | Cooperative | Yes | Both | Centralized Critic | On Policy |
| VDN | Cooperative | No | Discrete | Value Decomposition | Off Policy |
| QMIX | Cooperative | Yes | Discrete | Value Decomposition | Off Policy |
| FACMAC | Cooperative | Yes | Continuous | Value Decomposition | Off Policy |
| VDAC | Cooperative | Yes | Both | Value Decomposition | On Policy |
| VDPPO | Cooperative | Yes | Both | Value Decomposition | On Policy |

Current task and available algorithm mapping: Y for available, N for not suitable, P for partially available in some scenarios. (Note: in our code, independent algorithms may not have the I prefix; for instance, PPO = IPPO.)

| Env \ Algorithm | IQL | PG | A2C | DDPG | TRPO | PPO | COMA | MADDPG | MAA2C | MATRPO | MAPPO | HATRPO | HAPPO | VDN | QMIX | FACMAC | VDAC | VDPPO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LBF | Y | Y | Y | N | Y | Y | Y | N | Y | Y | Y | Y | Y | P | P | P | P | P |
| RWARE | Y | Y | Y | N | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| MPE | P | Y | Y | P | Y | Y | P | P | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| SMAC | Y | Y | Y | N | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| MetaDrive | N | Y | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N |
| MAgent | Y | Y | Y | N | Y | Y | Y | N | Y | Y | Y | Y | Y | N | N | N | N | N |
| Pommerman | Y | Y | Y | N | Y | Y | P | N | Y | Y | Y | Y | Y | P | P | P | P | P |
| MaMujoco | N | Y | Y | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | N | N | Y | Y | Y |
| GRF | Y | Y | Y | N | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| Hanabi | Y | Y | Y | N | Y | Y | Y | N | Y | Y | Y | Y | Y | N | N | N | N | N |

Part IV. Getting started

Install Ray

```bash
pip install ray==1.8.0  # version sensitive
```
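
As a quick sanity check (not part of the official instructions), you can confirm that the pinned version is the one actually importable:

```bash
python -c "import ray; print(ray.__version__)"  # expect 1.8.0
```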

Add the MARLlib patch

```bash
cd patch
python add_patch.py
```

Enter Y when prompted to replace the source-package code.

Attention: the above is the common installation. Each environment needs extra dependencies; please read the installation instructions in envs/base_env/install.

Examples

```bash
python marl/main.py --algo_config=MAPPO [--finetuned] --env-config=smac with env_args.map_name=3m
```

--finetuned is optional; it forces the use of the fine-tuned hyper-parameters.
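
The same pattern should apply to other algorithm and environment combinations; for example, a value-decomposition run on the same SMAC map (assuming QMIX's config file follows the same naming scheme):

```bash
python marl/main.py --algo_config=QMIX --env-config=smac with env_args.map_name=3m
```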

We provide an introduction to the code directory to help you get familiar with the codebase:

  • top-level directory structure: see image/code-MARLlib.png
  • MARL directory structure: see image/code-MARL.png.png
  • ENVS directory structure: see image/code-ENVS.png.png

Part V. Contribute New Environment

MARLlib is designed to make incorporating new environments easy. Besides the ten we have already implemented, it supports nearly all kinds of MARL environments. Before contributing, you need to know:

Things you have to do:

  • provide a new environment interface following the style of an existing one under envs/base_env
  • provide the corresponding installation instructions (see envs/base_env/install)

Things you do not have to do:

  • modify the MARLlib data processing pipeline
  • provide a unique runner or controller
  • worry about the data logging

The ten environments already included cover great diversity in action space, observation space, agent-environment interaction style, task mode, and additional information such as action masks. The best practice for incorporating your own environment is therefore to find a similar existing one and provide the same interface, as in the sketch below.
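
Below is a minimal sketch of such an interface, assuming RLlib's MultiAgentEnv base class from Ray 1.8. MyMAEnv, the agent count, and all space bounds are hypothetical placeholders, not MARLlib's actual wrapper code:

```python
# A minimal multi-agent environment sketch for RLlib 1.8 (hypothetical).
import numpy as np
from gym.spaces import Box, Discrete
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MyMAEnv(MultiAgentEnv):
    def __init__(self, env_config):
        self.agents = [f"agent_{i}" for i in range(env_config.get("num_agents", 2))]
        # Spaces must match the dtype and range of the data the env returns.
        self.observation_space = Box(-1.0, 1.0, shape=(8,), dtype=np.float32)
        self.action_space = Discrete(5)
        self._t = 0

    def reset(self):
        self._t = 0
        # One observation entry per agent id.
        return {a: self.observation_space.sample() for a in self.agents}

    def step(self, action_dict):
        self._t += 1
        obs = {a: self.observation_space.sample() for a in self.agents}
        rewards = {a: 0.0 for a in self.agents}
        # "__all__" tells RLlib when the whole episode is over.
        dones = {"__all__": self._t >= 25}
        infos = {a: {} for a in self.agents}
        return obs, rewards, dones, infos
```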

Part VI. Bug Shooting

Most RLlib-related errors in MARL are fixed by our patch file.

Here we only list the common bugs that are not RLlib-related (mostly caused by user mistakes).

  • observation/action out of space bug (see the sketch after this list):

    • make sure the observation/action space defined in the env's init function
      • has the same data type as the data the env returns (e.g., float32/64)
      • covers the range of the data the env returns (e.g., Box(-2, 2))
    • make sure the returned env observation contains the required keys (e.g., action_mask/state)
  • "Action NaN is invalid" bug:

    • this is a common bug, especially in continuous control problems; carefully fine-tune the algorithm's hyper-parameters:
      • use a smaller learning rate
      • bound the action values
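
Below is a minimal sketch of the dtype/range check from the first bullet above; the space bounds and sample values are hypothetical:

```python
# Verify that returned data conforms to the declared space (hypothetical bounds).
import numpy as np
from gym.spaces import Box

# Space as declared in the env's init function.
obs_space = Box(low=-2.0, high=2.0, shape=(4,), dtype=np.float32)

def check_obs(obs):
    # Check dtype and range explicitly before handing data to RLlib.
    assert obs.dtype == obs_space.dtype, f"dtype mismatch: {obs.dtype}"
    assert np.all(obs >= obs_space.low) and np.all(obs <= obs_space.high), \
        "observation outside the declared Box bounds"

# A raw float64 observation with one out-of-range value: cast and clip
# it so it conforms to the declared space.
raw = np.array([0.5, -1.9, 2.1, 0.0])
obs = np.clip(raw, obs_space.low, obs_space.high).astype(np.float32)
check_obs(obs)
```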

License

The MIT License

