Gym-μRTS with Stable-Baselines3/PyTorch

This repo contains an attempt to reproduce the Gridnet PPO with invalid action masking algorithm for playing μRTS, using the Stable-Baselines3 library. Apart from reproducibility, this might open access to a diverse set of well-tested algorithms and to tooling for training, evaluation, and more.

Original paper: Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games.

Original code: gym-microrts-paper.

demo.gif

Install

Prerequisites:

  • Python 3.7.1+
  • Java 8.0+
  • FFmpeg (for video capturing)

$ git clone https://github.com/kachayev/gym-microrts-paper-sb3
$ cd gym-microrts-paper-sb3
$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
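
Before installing, it can help to confirm that the prerequisites are reachable. The following is a minimal sketch, not part of the repo: the helper name `missing_prerequisites` is hypothetical, and it assumes the Java and FFmpeg executables are named `java` and `ffmpeg` on your PATH. It only checks that the tools exist; it does not verify that Java is version 8 or newer.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 7, 1), tools=("java", "ffmpeg")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare (major, minor, micro) of the running interpreter with the minimum.
    if sys.version_info[:3] < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required")
    # shutil.which returns None when the executable is not found on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

An empty output means the interpreter is recent enough and both executables were found.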

Note that I use a newer version of gym-microrts than the one originally used for the paper.

Training

To train an agent:

$ python ppo_gridnet_diverse_encode_decode_sb3.py

If everything is set up correctly, you'll see typical SB3 verbose logging:

Using cuda device
---------------------------------
| microrts/          |          |
|    avg_exec_time   | 0.00409  |
|    num_calls       | 256      |
|    total_exec_time | 1.05     |
| time/              |          |
|    fps             | 560      |
|    iterations      | 1        |
|    time_elapsed    | 10       |
|    total_timesteps | 6144     |
---------------------------------
-----------------------------------------
| microrts/               |             |
|    avg_exec_time        | 0.00321     |
|    num_calls            | 512         |
|    total_exec_time      | 1.64        |
| time/                   |             |
|    fps                  | 164         |
|    iterations           | 2           |
|    time_elapsed         | 74          |
|    total_timesteps      | 12288       |
| train/                  |             |
|    approx_kl            | 0.001475019 |
|    clip_fraction        | 0.0575      |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.46       |
|    explained_variance   | 0.00712     |
|    learning_rate        | 0.00025     |
|    loss                 | 0.0579      |
|    n_updates            | 4           |
|    policy_gradient_loss | -0.0032     |
|    value_loss           | 0.261       |
-----------------------------------------
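
The numbers in this log can be cross-checked with a little arithmetic. The sketch below assumes that `num_calls` counts vectorized environment step calls and that `time_elapsed` is a truncated integer while `fps` is computed from the untruncated time. Both are my reading of the log, not something the source states, and the helper names are hypothetical.

```python
def implied_envs(total_timesteps, num_calls):
    """Timesteps gathered per step call, i.e. the implied number of parallel environments."""
    return total_timesteps // num_calls


def implied_elapsed(total_timesteps, fps):
    """Approximate wall-clock seconds implied by the reported fps."""
    return total_timesteps / fps


# Values copied from the two log tables above.
print(implied_envs(6144, 256))     # 24
print(implied_envs(12288, 512))    # 24
print(implied_elapsed(6144, 560))  # just under 11 seconds, logged as 10
print(implied_elapsed(12288, 164)) # just under 75 seconds, logged as 74
```

Under these assumptions, both iterations are consistent with 24 parallel environments stepped 256 times per rollout. The drop in fps from the first to the second iteration is expected, since the second figure also includes the time spent on the first policy update.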

By default, all settings are as close as possible to the original implementation from the paper. The script also accepts a flexible set of parameters:

$ python ppo_gridnet_diverse_encode_decode_sb3.py \
  --total-timesteps 10_000 \
  --bot-envs coacAI=8 randomBiasedAI=8 \
  --num-selfplay-envs 12 \
  --batch-size 2048 \
  --n-epochs 10
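
To make the shape of these arguments concrete, here is a hypothetical sketch of how `name=count` pairs such as `coacAI=8` could be parsed with `argparse`. It is not the script's actual parser: the function names and the defaults are assumptions, and only three of the flags shown above are modelled. Note that Python's `int` accepts underscore separators, so `10_000` parses as 10000.

```python
import argparse


def parse_bot_spec(spec):
    """Turn a 'name=count' string such as 'coacAI=8' into a (name, count) pair."""
    name, sep, count = spec.partition("=")
    if not sep or not name:
        raise argparse.ArgumentTypeError(f"expected NAME=COUNT, got {spec!r}")
    try:
        return name, int(count)
    except ValueError:
        raise argparse.ArgumentTypeError(f"count must be an integer in {spec!r}")


def build_parser():
    # Defaults below are placeholders for illustration, not the script's defaults.
    parser = argparse.ArgumentParser(description="hypothetical CLI sketch")
    parser.add_argument("--total-timesteps", type=int, default=10_000)
    parser.add_argument("--bot-envs", nargs="*", type=parse_bot_spec, default=[])
    parser.add_argument("--num-selfplay-envs", type=int, default=0)
    return parser


def total_envs(args):
    """Bot environments plus self-play environments."""
    return sum(count for _, count in args.bot_envs) + args.num_selfplay_envs


args = build_parser().parse_args(
    ["--total-timesteps", "10_000",
     "--bot-envs", "coacAI=8", "randomBiasedAI=8",
     "--num-selfplay-envs", "12"]
)
print(args.bot_envs)     # [('coacAI', 8), ('randomBiasedAI', 8)]
print(total_envs(args))  # 28
```

With the values from the command above, this reading gives 8 + 8 + 12 = 28 environments in total.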

A trained agent is automatically saved to the agents/ folder (or to any other folder passed via the --exp-folder parameter). You can then use enjoy.py to watch it in action:

$ python enjoy.py \
  --agent-file agents/ppo_gridnet_diverse_encode_decode_sb3__1__1640241051.zip \
  --max-steps 1_000 \
  --bot-envs randomBiasedAI=1
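
Since agent file names include what looks like a seed and a timestamp, typing them by hand is tedious. The sketch below is a hypothetical helper (not part of the repo) that picks the most recently modified .zip file from a folder, assuming agents are stored as .zip files directly inside it.

```python
from pathlib import Path


def latest_agent(folder="agents"):
    """Return the most recently modified .zip file in folder, or None if there is none."""
    candidates = list(Path(folder).glob("*.zip"))
    if not candidates:
        return None
    # Pick the file with the largest modification time.
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    agent = latest_agent()
    print(agent if agent is not None else "no agent files found")
```

The printed path can then be passed to enjoy.py as the --agent-file value.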

Once the correctness of the implementation is verified, I will provide details on how to use RL Baselines3 Zoo for training and evaluation.

Implementation Caveats

A few notes / pain points regarding the implementation of the algorithm and the process of integrating it with stable-baselines3:

  • Gym does not ship a space for the "array of MultiDiscrete" use case (to be honest, it's not very common), but it lets you define your own space when necessary. A custom space, however, is not easy to integrate into SB3: in several places SB3 raises NotImplementedError when it encounters an unknown space (example 1, example 2).
  • Switching to a fully rolled-out MultiDiscrete space definition seems to carry a significant performance penalty. I am still investigating whether this can be improved.
  • Invalid action masking is implemented by passing masks into observations from the wrapper (the observation space is replaced with gym.spaces.Dict to hold both observations and masks). This way, masks are available to the policy and fit the rollout buffer layout. Masking itself is done by setting the logits of invalid actions to -inf (or to a very large negative number).
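
The last two points can be illustrated with a small, dependency-free sketch. It is not the repo's code: the function names are hypothetical, the per-cell component sizes in `CELL_NVEC` are assumed values chosen for illustration, and plain Python lists stand in for the tensors a real policy would use. It shows a per-cell action description rolled out into one flat MultiDiscrete-style size vector, and a softmax in which masked entries get a logit of -inf and therefore zero probability.

```python
import math

# Assumed per-cell action component sizes (illustrative, not read from the repo).
CELL_NVEC = [6, 4, 4, 4, 4, 7, 49]


def rolled_out_nvec(height, width, cell_nvec=CELL_NVEC):
    """Repeat the per-cell component sizes once for every map cell."""
    return list(cell_nvec) * (height * width)


def split_by_component(flat, nvec):
    """Cut a flat vector into one slice per component of size nvec[i]."""
    slices, start = [], 0
    for size in nvec:
        slices.append(flat[start:start + size])
        start += size
    return slices


def masked_softmax(logits, mask):
    """Softmax over one component; entries with mask == 0 end up with probability 0."""
    if not any(mask):
        raise ValueError("mask must allow at least one action")
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]  # exp(-inf) is exactly 0.0
    total = sum(exps)
    return [e / total for e in exps]


nvec = rolled_out_nvec(2, 2)   # 4 cells * 7 components = 28 entries
logits = [0.0] * sum(nvec)     # uniform logits for the demo
mask = [1] * sum(nvec)
mask[1] = 0                    # forbid the second choice of the first component
probs = [
    masked_softmax(l, m)
    for l, m in zip(split_by_component(logits, nvec), split_by_component(mask, nvec))
]
print(probs[0])  # [0.2, 0.0, 0.2, 0.2, 0.2, 0.2]
```

Under these assumed sizes, a 16x16 map rolls out to 256 * 7 = 1792 components, which gives a sense of why the fully rolled-out definition may be costly.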

Look for xxx(hack) comments in the code for more details.
