cyanrain7 / trpo-in-marl
License: MIT License
I notice that in your code the multi-agent MuJoCo environment is an MDP setting, so the critic inputs of IPPO and MAPPO are the same. I would expect their performance to be similar, but the results in the figure are not. Are there other factors I'm ignoring? I am looking forward to your reply. Thank you!
When using DummyVecEnv, the env class has no '_get_obs' attribute
Traceback (most recent call last):
  File "train/train_mujoco.py", line 163, in <module>
    main(sys.argv[1:])
  File "train/train_mujoco.py", line 136, in main
    envs = make_train_env(all_args)
  File "train/train_mujoco.py", line 35, in make_train_env
    return ShareDummyVecEnv([get_env_fn(0)])
  File "../envs/env_wrappers.py", line 712, in __init__
    self.envs = [fn() for fn in env_fns]
  File "../envs/env_wrappers.py", line 712, in <listcomp>
    self.envs = [fn() for fn in env_fns]
  File "train/train_mujoco.py", line 25, in init_env
    env = MujocoMulti(env_args=env_args)
  File "../envs/ma_mujoco/multiagent_mujoco/mujoco_multi.py", line 104, in __init__
    self.share_obs_size = self.get_state_size()
  File "../envs/ma_mujoco/multiagent_mujoco/mujoco_multi.py", line 204, in get_state_size
    return len(self.get_state()[0])
  File "../envs/ma_mujoco/multiagent_mujoco/mujoco_multi.py", line 191, in get_state
    state = self.env._get_obs()
  File "/home/spaci/anaconda3/envs/test/lib/python3.7/site-packages/gym/core.py", line 228, in __getattr__
    raise AttributeError(f"attempted to get missing private attribute '{name}'")
AttributeError: attempted to get missing private attribute '_get_obs'
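For what it's worth, this error typically comes from newer Gym releases, whose wrapper __getattr__ refuses to forward private attributes. A minimal sketch of a workaround in get_state() of mujoco_multi.py, assuming self.env is a Gym wrapper around the raw MuJoCo environment:

    # a sketch, not the repo's code: reach the raw environment via .unwrapped
    # to bypass the wrapper's private-attribute guard
    state = self.env.unwrapped._get_obs()

Alternatively, pinning gym to a release that predates this guard should restore the old behaviour.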
Hello, I would like to ask: when an agent dies but the environment does not end, data for the dead agent continues to be collected. How is that data processed during training? And how do a dead agent's decisions affect the updates of the subsequent agents? Does this affect the Multi-Agent Advantage Decomposition Lemma?
This is very helpful work, but I have a question about the code: HAPPO_Policy seems to build a critic network for each agent, whereas the paper seems to use only one shared critic network. Does this affect the experimental results?
self.actor = Actor(args, self.obs_space, self.act_space, self.device)
self.critic = Critic(args, self.share_obs_space, self.device)
Looking forward to your reply, thank you.
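For reference, a minimal sketch of the two designs this question contrasts (the list-building loops are an illustration, not the repo's exact code):

    # (a) what HAPPO_Policy appears to do: each agent's policy owns its own critic
    per_agent_policies = [
        (Actor(args, obs_space[i], act_space[i], device),
         Critic(args, share_obs_space, device))            # one critic per agent
        for i in range(num_agents)]

    # (b) the paper's presentation: one centralised critic shared by all agents
    shared_critic = Critic(args, share_obs_space, device)
    shared_policies = [
        (Actor(args, obs_space[i], act_space[i], device), shared_critic)
        for i in range(num_agents)]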
Hi,
I wonder if you have tried to visualize the StarCraft game with your trained model. I tried setting the parameter '--user-render', but it didn't work. How should I visualize it?
Looking forward to your reply.
Equation (10) in the paper preprocesses the advantages, but I can't find it in the current implementation.
In the file happo_trainer.py, I cannot find any preprocessing of the advantages like equation (10) in your paper.
I would appreciate knowing how the iterative updates in Algorithm 1 are represented in the code.
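As a pointer for anyone reading along: in sequential-update implementations of HAPPO/HATRPO, the correction in equation (10) is often realised not as a one-off preprocessing of the advantages but as a running importance factor multiplied into each agent's surrogate loss. A minimal sketch with hypothetical helper names (update_factor, evaluate_logprob, and train_agent are placeholders, not the repo's exact API):

    import numpy as np

    factor = np.ones((episode_length, n_rollout_threads, 1), dtype=np.float32)
    for agent_id in np.random.permutation(num_agents):       # random update order, as in Algorithm 1
        buffer[agent_id].update_factor(factor)               # hand the current factor to this agent's loss
        old_logprob = evaluate_logprob(agent_id)             # log pi_old(a^i|s) before the update
        train_agent(agent_id)                                # one HAPPO/HATRPO step for this agent
        new_logprob = evaluate_logprob(agent_id)             # log pi_new(a^i|s) after the update
        factor = factor * np.exp(new_logprob - old_logprob)  # accumulate this agent's ratio

Each agent thus sees the advantages scaled by the product of the ratios of all previously updated agents, which corresponds to the sequential correction of Algorithm 1.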
Hi,
I am trying to install the packages in the requirements file, but I get errors: the pinned matplotlib and tensorflow versions are not found, and there are conflicting dependencies. Any clue how to fix this?
Thank you
When I run train_mujoco.sh, the following error is generated:
NotImplementedError
Traceback (most recent call last):
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 163, in <module>
    main(sys.argv[1:])
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 136, in main
    envs = make_train_env(all_args)
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 37, in make_train_env
    return ShareSubprocVecEnv([get_env_fn(i) for i in range(all_args.n_rollout_threads)])
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/../envs/env_wrappers.py", line 360, in __init__
    self.n_agents = self.remotes[0].recv()
  File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Can anyone help me?
Thanks
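One observation that may help: the ConnectionResetError is only a symptom; the subprocess worker died while constructing the environment, and the stray NotImplementedError above the traceback is the real failure. A minimal way to surface the underlying exception is to build one environment directly in the main process (the env_args values here are assumptions; substitute the ones from your training script):

    from envs.ma_mujoco.multiagent_mujoco.mujoco_multi import MujocoMulti

    env_args = {"scenario": "Ant-v2", "agent_conf": "2x4",
                "agent_obsk": 0, "episode_limit": 1000}   # assumed arguments
    env = MujocoMulti(env_args=env_args)                  # the real exception surfaces here
    obs = env.reset()

Alternatively, running with --n_rollout_threads 1 takes the single-process ShareDummyVecEnv path seen in the earlier traceback (judging by the two tracebacks in this thread), which raises in-process instead of inside a worker.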
HATRPO works well in my environment, but why does the critic loss increase? Additionally, I think "kl_threshold" is an important parameter. Could you please tell me how to tune it? My parameter settings and experiment results are as follows.
Looking forward to your reply. Thank you.
critic_lr: 5e-3
opti_eps: 1e-5
kl_threshold: 0.0001
gamma: 0.99
use_linear_lr_decay: True
Hello, I tried to run your code with the HATRPO algorithm and an RNN network. Specifically, I added "--use_recurrent_policy" to both scripts, train_smac.sh and train_mujoco.sh, and set algo='hatrpo'. However, both scripts fail and return the errors below:
RuntimeError: the derivative for '_cudnn_rnn_backward' is not implemented. Double backwards is not supported for CuDNN RNNs due to limitations in the CuDNN API. To run double backwards, please disable the CuDNN backend temporarily while running the forward pass of your RNN. For example:
    with torch.backends.cudnn.flags(enabled=False):
        output = model(inputs)
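As the error message says, HATRPO's trust-region step differentiates through gradients (double backward), which the CuDNN RNN kernels do not support. A minimal sketch of the workaround, assuming the offending forward pass is the policy evaluation inside the trainer (evaluate_actions is a placeholder name, not necessarily the repo's exact call):

    import torch

    # fall back to the non-CuDNN RNN implementation only for the forward pass
    # whose graph is differentiated twice
    with torch.backends.cudnn.flags(enabled=False):
        values, log_probs, entropy = policy.evaluate_actions(
            obs, rnn_states, actions, masks)

The non-CuDNN path is slower but supports double backwards.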
Hi,
I tried the repo's recommended method of installing the dependencies from requirements.txt, but Python 3.9 is not compatible with TensorFlow 2.0.0 or with several of the other related packages. I am not sure the environment described in requirements.txt is correct. Will the code run under newer versions of TensorFlow and the other packages? Could you please update the package dependencies?
Hi, I found that many MARL methods are implemented with PyMARL. Do you have a PyMARL implementation of this repo?
When I ran HAPPO in StarCraft II, I found that HAPPO's performance is poor on the 3s5z_vs_3s6z map, far from MAPPO's. The parameters I use are:
--n_training_threads 32 --n_rollout_threads 8 --num_mini_batch 1 --episode_length 400
--num_env_steps 10000000 --ppo_epoch 5 --use_value_active_masks --use_eval --eval_episodes 32 --use_recurrent_policy
and the default parameters are used for the rest.
The results are averaged over 5 runs, and the shaded region represents the 95% confidence interval.
We look forward to your reply.
I have a question on multi-agent mujoco tasks.
For an agent, the global state and its observation seem to be the same in mujoco_multi.py.
Have I misunderstood the code?
I ran the default experiment "ant-v2, 2x4" with the default parameters and got the results in the first picture. Later, I modified the parameters (n_rollout_threads: 24, num_mini_batch: 4, ppo_epoch: 40) and got the results in the second picture.
I have also tried other modifications to the experimental parameters, but I have not reproduced the performance reported in the article for the "ant-v2, 2x4" experiment provided with the code.
So I would like to ask whether there are rules or tricks for tuning the parameters of the HAPPO/HATRPO algorithms.
I notice that in your multi-agent MuJoCo environment code:
def get_obs(self):
    """Returns all agent observations in a list."""
    state = self.env._get_obs()
    obs_n = []
    for a in range(self.n_agents):
        agent_id_feats = np.zeros(self.n_agents, dtype=np.float32)
        agent_id_feats[a] = 1.0
        # obs_n.append(self.get_obs_agent(a))
        # obs_n.append(np.concatenate([state, self.get_obs_agent(a), agent_id_feats]))
        # obs_n.append(np.concatenate([self.get_obs_agent(a), agent_id_feats]))
        obs_i = np.concatenate([state, agent_id_feats])
        obs_i = (obs_i - np.mean(obs_i)) / np.std(obs_i)
        obs_n.append(obs_i)
    return obs_n

def get_state(self, team=None):
    # TODO: May want global states for different teams (so cannot see what the other team is communicating e.g.)
    state = self.env._get_obs()
    share_obs = []
    for a in range(self.n_agents):
        agent_id_feats = np.zeros(self.n_agents, dtype=np.float32)
        agent_id_feats[a] = 1.0
        # share_obs.append(np.concatenate([state, self.get_obs_agent(a), agent_id_feats]))
        state_i = np.concatenate([state, agent_id_feats])
        state_i = (state_i - np.mean(state_i)) / np.std(state_i)
        share_obs.append(state_i)
    return share_obs
Both use self.env._get_obs() and return the same observation information. So, in your code, what is the difference between get_obs() and get_state(), and how do you use global information and local information in your algorithm?
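For context on how the two outputs are typically consumed in a CTDE algorithm like HAPPO (a sketch with assumed names, not the repo's training loop): the per-agent observations feed the decentralised actors, while the shared state feeds the centralised critic(s).

    obs_n = env.get_obs()        # local inputs: (state + one-hot agent id), one per actor
    share_obs = env.get_state()  # global inputs: fed to the centralised critic(s)

    actions = [actor[i](obs_n[i]) for i in range(env.n_agents)]   # decentralised execution
    values = [critic(share_obs[i]) for i in range(env.n_agents)]  # centralised value estimates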