lizhi-sjtu / marl-code-pytorch

Concise PyTorch implementations of MARL algorithms, including MAPPO, MADDPG, MATD3, QMIX and VDN.

License: MIT License

Python 100.00%

mappo mpe qmix reinforcement-learning smac vdn maddpg matd3

marl-code-pytorch's Introduction

MARL-code-pytorch

Concise PyTorch implementations of MARL algorithms, including MAPPO, MADDPG, MATD3, QMIX and VDN.

Requirements

python==3.7.9
numpy==1.19.4
pytorch==1.5.0
tensorboard==0.6.0
gym==0.10.5
Multi-Agent Particle-World Environment (MPE)
SMAC (StarCraft Multi-Agent Challenge)

Training results

1. MAPPO in MPE (discrete action space)

2. MAPPO in StarCraft II (SMAC)

3. QMIX and VDN in StarCraft II (SMAC)

4. MADDPG and MATD3 in MPE (continuous action space)

Some Details

To make it easy to switch between discrete and continuous action spaces in MPE environments, we made some small modifications to the MPE source code.

1. make_env.py

We added a bool argument named 'discrete' to 'make_env.py'.

2. environment.py

We also added an argument named 'discrete' to 'environment.py'.

3. How to create an MPE environment?

If you want to use the discrete action space mode, use 'env=make_env(scenario_name, discrete=True)'
If you want to use the continuous action space mode, use 'env=make_env(scenario_name, discrete=False)'
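
The two modes produce different kinds of action spaces: a gym Discrete space exposes its size as '.n', while a Box space exposes it through '.shape'. The sketch below shows one way to read the action dimension in either mode. The helper names action_dim and action_dims are hypothetical and not part of this repository, and SimpleNamespace objects stand in for real gym spaces so the example runs without MPE installed.

```python
from types import SimpleNamespace


def action_dim(space):
    """Return the action dimension of a Discrete-like or Box-like space."""
    if hasattr(space, "n"):  # Discrete spaces expose .n
        return int(space.n)
    shape = getattr(space, "shape", None)
    if shape:  # Box spaces expose a non-empty .shape
        return int(shape[0])
    raise TypeError(f"Unsupported action space: {space!r}")


def action_dims(action_spaces):
    """Action dimensions of all agents, one entry per agent."""
    return [action_dim(space) for space in action_spaces]


# Stand-ins for gym spaces (assumption: real spaces offer the same attributes)
discrete_like = SimpleNamespace(n=5, shape=())  # like env made with discrete=True
box_like = SimpleNamespace(shape=(2,))          # like env made with discrete=False

print(action_dims([discrete_like, discrete_like]))
print(action_dims([box_like, box_like]))
```

With a real environment, the same helper would be called as action_dims(env.action_space), assuming env.action_space is a per-agent list as in the tracebacks reported in the issues.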

marl-code-pytorch's Issues

Are custom environments supported?

How can MAPPO be changed to use continuous actions?

Discrete action

(MARL) root@5e4fd369caa6:~/code/MARL-code-pytorch/4.MADDPG_MATD3_MPE# python3 MADDPG_MATD3_main.py --algorithm MADDPG
Traceback (most recent call last):
  File "MADDPG_MATD3_main.py", line 142, in <module>
    runner = Runner(args, env_name=env_names[env_index], number=1, seed=0)
  File "MADDPG_MATD3_main.py", line 19, in __init__
    self.env = make_env(env_name, discrete=False)  # Continuous action space
TypeError: make_env() got an unexpected keyword argument 'discrete'

Then I tried removing the 'discrete' argument, i.e. self.env = make_env(env_name), and this problem appeared:

(MARL) root@5e4fd369caa6:~/code/MARL-code-pytorch/4.MADDPG_MATD3_MPE# python3 MADDPG_MATD3_main.py --algorithm MADDPG
Traceback (most recent call last):
  File "MADDPG_MATD3_main.py", line 142, in <module>
    runner = Runner(args, env_name=env_names[env_index], number=1, seed=0)
  File "MADDPG_MATD3_main.py", line 23, in __init__
    self.args.action_dim_n = [self.env.action_space[i].shape[0] for i in range(self.args.N)]  # actions dimensions of N agents
  File "MADDPG_MATD3_main.py", line 23, in <listcomp>
    self.args.action_dim_n = [self.env.action_space[i].shape[0] for i in range(self.args.N)]  # actions dimensions of N agents
IndexError: tuple index out of range

Could anyone help me?

Runtime error

a_n = [agent.choose_action(obs, noise_std=0) for agent, obs in zip(self.agent_n, obs_n)] # We do not add noise when evaluating
TypeError: 'NoneType' object is not iterable

obs_n = None

How can this be solved?

Does anyone know how to run trained models?

Hello, I got this code to work and trained some models, but I can't find a way to see how well they perform. Could anyone point me to how to run the trained models?

Thank you

An error occurred after the environment name was changed to simple_adversary

I changed env_name to simple_adversary and got the error below:
Traceback (most recent call last):
  File "D:/WorkSpace/PycharmWorkSpaces/MARL-code-pytorch-main/MARL-code-pytorch-main/1.MAPPO_MPE/MAPPO_MPE_main.py", line 149, in <module>
    runner.run()
  File "D:/WorkSpace/PycharmWorkSpaces/MARL-code-pytorch-main/MARL-code-pytorch-main/1.MAPPO_MPE/MAPPO_MPE_main.py", line 54, in run
    self.evaluate_policy()  # Evaluate the policy every 'evaluate_freq' steps
  File "D:/WorkSpace/PycharmWorkSpaces/MARL-code-pytorch-main/MARL-code-pytorch-main/1.MAPPO_MPE/MAPPO_MPE_main.py", line 70, in evaluate_policy
    episode_reward, _ = self.run_episode_mpe(evaluate=True)
  File "D:/WorkSpace/PycharmWorkSpaces/MARL-code-pytorch-main/MARL-code-pytorch-main/1.MAPPO_MPE/MAPPO_MPE_main.py", line 90, in run_episode_mpe
    a_n, a_logprob_n = self.agent_n.choose_action(obs_n, evaluate=evaluate)  # Get actions and the corresponding log probabilities of N agents
  File "D:\WorkSpace\PycharmWorkSpaces\MARL-code-pytorch-main\MARL-code-pytorch-main\1.MAPPO_MPE\mappo_mpe.py", line 163, in choose_action
    obs_n = torch.tensor(obs_n, dtype=torch.float32)  # obs_n.shape=(N, obs_dim)
ValueError: expected sequence of length 8 at dim 1 (got 10)
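
The error message suggests that the agents in simple_adversary return observation vectors of different lengths (8 and 10 here), so they cannot be stacked into one rectangular (N, obs_dim) tensor. One possible workaround, which is an assumption and not something taken from this repository, is to zero-pad every observation to the longest length before building the tensor. The helper name pad_observations is hypothetical.

```python
def pad_observations(obs_n, fill=0.0):
    """Zero-pad each agent's observation to the longest length in obs_n."""
    width = max(len(obs) for obs in obs_n)
    return [list(obs) + [fill] * (width - len(obs)) for obs in obs_n]


# Lengths 8, 10, 10 mirror the lengths named in the error message
obs_n = [[0.1] * 8, [0.2] * 10, [0.3] * 10]
padded = pad_observations(obs_n)
print([len(obs) for obs in padded])
```

If this route is taken, the obs_dim used to build the networks would also have to be the padded width, and padding changes what the policy sees, so results may differ from a setup with truly homogeneous agents.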

MADDPG does not converge in MPE simple_spread

I did not change any parameters. What could be the reason?

About the MADDPG critic

tensorboard==0.6.0

Is this version number correct?
ERROR: Could not find a version that satisfies the requirement tensorboard==0.6.0 (from versions: 1.6.0rc0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0, 1.12.1, 1.12.2, 1.13.0, 1.13.1, 1.14.0, 1.15.0, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.4.0, 2.4.1, 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.9.1, 2.10.0)
ERROR: No matching distribution found for tensorboard==0.6.0

About episodes in MAPPO

Is it not possible to set the number of episodes per epoch in MAPPO?

The way the actor and critic are updated is quite different from the original MAPPO. I wonder whether this difference has been tested yet.

self.args.action_dim_n = [self.env.action_space[i].n for i in range(self.args.N)]  # actions dimensions of N agents

AttributeError: 'Box' object has no attribute 'n'

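Large gap between MAPPO evaluation and training

After porting this repository's code, I ran into a problem: the reward does increase during training, but during evaluation the reward is very low and shows almost no upward trend. Has anyone else run into this?

How does MAPPO do decentralized execution?

Hello, and thank you very much for this MAPPO code.

But I have one question: in MAPPO, the Actor takes the observations of all agents as input and outputs the action probabilities of all agents, so how does it execute in a decentralized way?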
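
One way to read this, assuming the actor is a single network shared by all agents and applied row by row to an input of shape (N, obs_dim), is that stacking the observations is only batching: each agent's action probabilities depend on its own row alone. The toy linear policy below illustrates that idea under this assumption. It is not the repository's actor, and the names shared_actor and WEIGHTS are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shared_actor(obs, weights):
    """Toy linear policy: one weight row per action, shared by all agents."""
    logits = [sum(w * o for w, o in zip(row, obs)) for row in weights]
    return softmax(logits)


# Hypothetical shared parameters: 3 actions, obs_dim = 2
WEIGHTS = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]

obs_n = [[1.0, 0.0], [0.0, 1.0]]  # N = 2 agents, shape (N, obs_dim)

# Batched call: every row is processed independently of the others
batched = [shared_actor(obs, WEIGHTS) for obs in obs_n]

# Decentralized execution: agent 1 runs the same network on its own observation only
alone = shared_actor(obs_n[1], WEIGHTS)
print(alone == batched[1])
```

Under that assumption, each agent could hold a copy of the shared actor at execution time and act from its local observation, while only the critic needs global information during training.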