
MultiAgent Traffic Control Using Transformers

Recent advances in multi-agent reinforcement learning (MARL) have largely been limited to training one model from scratch for every new task. This limitation stems from model architectures with fixed input and output dimensions, which hinders the accumulation and transfer of learned experience across tasks of varying difficulty. Multi-agent traffic control is an active research topic affected by this limitation, so in this project I have implemented Universal Multi-Agent Reinforcement Learning via Policy Decoupling with Transformers (UPDeT) on a custom-built multi-agent traffic environment.

Environment Used

As mentioned above, the environment is a custom-built traffic-control environment with configurable difficulty levels and numbers of agents. A demo of the environment is shown below.

In this project the number of agents is set to 10 and the difficulty to medium. Both settings can be changed here (add link to file here).
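For orientation, here is a hypothetical sketch of how these two settings might be passed to the environment; the module name, class name and argument names are assumptions for illustration only and should be checked against the environment file linked above.

```python
# Hypothetical usage sketch -- the import path, class name and keyword
# arguments below are illustrative assumptions, not the repo's exact API.
from traffic_env import TrafficEnv  # assumed environment module

env = TrafficEnv(
    n_agents=10,          # number of agents used in this project
    difficulty="medium",  # difficulty level used in this project
)

obs = env.reset()  # per-agent observations at the start of an episode
```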

Model Architecture


Diagram of the full model pipeline used for training


The right part of the figure is the UPDeT agent, which replaces the traditional RNN/LSTM-based base model. The main reason for this replacement is the set of advantages that transformer-based models, and UPDeT in particular, have over RNN/LSTM-based models, as explained in paper [1].
The main advantages are:
  1. The UPDeT-based MARL framework outperforms RNN-based frameworks by a large margin in final performance when paired with state-of-the-art centralized value functions.
  2. The model has strong transfer capability and can handle a number of different tasks at a time.
  3. The model accelerates transfer learning, costing roughly 10 times fewer total steps than RNN-based models in most scenarios.
The left side of the figure is the QMIX mixer architecture from paper [2]. QMIX can be swapped for other centralized training methods such as COMA, QTRAN, VDN and IQL, but since several studies have found QMIX to outperform these alternatives, I used QMIX.
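To make the pipeline concrete, below is a minimal PyTorch sketch of the two halves: a transformer-based agent in the spirit of UPDeT that turns per-entity observation tokens into per-agent Q-values, and a QMIX-style mixer whose hypernetwork weights are forced non-negative so the joint value is monotonic in the individual Q-values. Layer sizes, names and the simplified policy-decoupling head are assumptions, not the exact code used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransformerAgent(nn.Module):
    """Simplified UPDeT-style agent: the observation is split into per-entity
    tokens, encoded with self-attention, and each token's output embedding is
    mapped to the Q-values of the actions associated with that entity."""

    def __init__(self, token_dim, emb_dim=32, n_heads=4, acts_per_entity=1):
        super().__init__()
        self.embed = nn.Linear(token_dim, emb_dim)
        layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(emb_dim, acts_per_entity)

    def forward(self, entity_tokens):
        # entity_tokens: (batch, n_entities, token_dim)
        h = self.encoder(self.embed(entity_tokens))   # (batch, n_entities, emb_dim)
        return self.q_head(h).flatten(1)              # per-agent action values


class QMixer(nn.Module):
    """Minimal QMIX mixer: per-agent Q-values are mixed into Q_tot with weights
    produced by hypernetworks conditioned on the global state; taking the
    absolute value of those weights enforces monotonicity."""

    def __init__(self, n_agents, state_dim, mix_dim=32):
        super().__init__()
        self.n_agents, self.mix_dim = n_agents, mix_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * mix_dim)
        self.hyper_b1 = nn.Linear(state_dim, mix_dim)
        self.hyper_w2 = nn.Linear(state_dim, mix_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, mix_dim), nn.ReLU(),
                                      nn.Linear(mix_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.mix_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.mix_dim)
        hidden = F.elu(torch.bmm(agent_qs.view(b, 1, -1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.mix_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # Q_tot
```

During training, each agent's chosen Q-value is passed to the mixer together with the global state, and the whole network is trained end-to-end on the TD loss of Q_tot.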

Results and Inference:

Baseline:

I have compared the results of UPDeT with a well-known multi-agent communication baseline: Graph Convolutional Reinforcement Learning (DGN), which is instantiated on top of a deep Q network and trained end-to-end. The main reason for choosing DGN is that it is built on graph neural networks (GNNs) and uses multi-head attention, as sketched below.
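Here is a rough PyTorch sketch of a DGN-style network (not the code in the DGN folder): per-agent observations are encoded, passed through graph convolution layers that use multi-head attention over each agent's neighbourhood, and the concatenated layer outputs are mapped to per-agent Q-values. Dimensions and the neighbourhood-mask handling are simplified assumptions.

```python
import torch
import torch.nn as nn


class AttentionConv(nn.Module):
    """One DGN-style graph convolution: each agent attends over its neighbours
    with multi-head dot-product attention; non-neighbours are masked out.
    Each agent is assumed to be included in its own neighbourhood."""

    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, feats, block_mask):
        # feats: (batch, n_agents, dim)
        # block_mask: (batch, n_agents, n_agents) bool, True where agent j is
        # NOT in agent i's neighbourhood and must be ignored.
        mask = block_mask.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(feats, feats, feats, attn_mask=mask)
        return out


class DGN(nn.Module):
    """Deep Q network built from an observation encoder, two attention-based
    graph convolutions, and a Q head over the concatenated layer outputs."""

    def __init__(self, obs_dim, n_actions, dim=128, n_heads=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, dim), nn.ReLU())
        self.conv1 = AttentionConv(dim, n_heads)
        self.conv2 = AttentionConv(dim, n_heads)
        self.q_head = nn.Linear(3 * dim, n_actions)

    def forward(self, obs, block_mask):
        h0 = self.encoder(obs)                 # (batch, n_agents, dim)
        h1 = self.conv1(h0, block_mask)
        h2 = self.conv2(h1, block_mask)
        return self.q_head(torch.cat([h0, h1, h2], dim=-1))  # per-agent Q-values
```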

For the DGN code, see the DGN folder.

UPDeT (left and right panels): Episode Mean Rewards, Episode Mean Win Rates, Episode Mean No. of Collisions

The UPDeT plots on the left and right use the hidden states as the mask in the transformers' self-attention. (Click on the images to zoom in.)


UPDeT: Episode Mean Rewards, Episode Mean Win Rates, Episode Mean No. of Collisions

The UPDeT plots on the left and right use self-masking as the mask in the transformers' self-attention. (Click on the images to zoom in.)
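The two masking variants compared above only change how the boolean mask fed into the transformers' self-attention is built; the sketch below shows how such a mask is applied in scaled dot-product attention. The mask construction itself is repo-specific and not reproduced here, so treat this as a generic illustration rather than the exact scheme used in the experiments.

```python
import torch
import torch.nn.functional as F


def masked_self_attention(x, block_mask):
    """Scaled dot-product self-attention with a boolean mask.
    x: (batch, n_tokens, dim); block_mask: (batch, n_tokens, n_tokens),
    True where attention from token i to token j should be blocked.
    The hidden-state and self-masking variants differ only in how block_mask
    is built; each token is assumed able to attend to itself, so no row is
    fully masked."""
    d = x.size(-1)
    scores = (x @ x.transpose(-2, -1)) / d ** 0.5          # (batch, n, n)
    scores = scores.masked_fill(block_mask, float("-inf"))  # drop blocked pairs
    weights = F.softmax(scores, dim=-1)                     # attention weights
    return weights @ x                                      # mixed token features
```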

Comparison with DGN:

DGN: Episode Mean Rewards, Episode Mean Win Rates, Episode Mean No. of Collisions

References:

[1] Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers. Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang.
[2] QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson.
[3] Counterfactual Multi-Agent Policy Gradients. Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson.
[4] QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning. Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Earl Hostallero, Yung Yi.
[5] Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control. Tianshu Chu, Jie Wang, Lara Codecà, Zhaojian Li.
[6] Value-Decomposition Networks For Cooperative Multi-Agent Learning. Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel.
[7] Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin.
[8] Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks. Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu.
[9] Graph Convolutional Reinforcement Learning. Jiechuan Jiang, Chen Dun, Tiejun Huang, Zongqing Lu.
