cyanrain7 / trpo-in-marl
License: MIT License
I notice that in your code the multi-agent MuJoCo environment is an MDP setting, so the critic inputs of IPPO and MAPPO are the same. I would expect their performance to be similar, but the results in the figure are not. Are there other factors I'm ignoring? I am looking forward to your reply. Thank you!
When using DummyVecEnv, the env class has no '_get_obs' attribute
Traceback (most recent call last):
  File "train/train_mujoco.py", line 163, in <module>
    main(sys.argv[1:])
  File "train/train_mujoco.py", line 136, in main
    envs = make_train_env(all_args)
  File "train/train_mujoco.py", line 35, in make_train_env
    return ShareDummyVecEnv([get_env_fn(0)])
  File "../envs/env_wrappers.py", line 712, in __init__
    self.envs = [fn() for fn in env_fns]
  File "../envs/env_wrappers.py", line 712, in <listcomp>
    self.envs = [fn() for fn in env_fns]
  File "train/train_mujoco.py", line 25, in init_env
    env = MujocoMulti(env_args=env_args)
  File "../envs/ma_mujoco/multiagent_mujoco/mujoco_multi.py", line 104, in __init__
    self.share_obs_size = self.get_state_size()
  File "../envs/ma_mujoco/multiagent_mujoco/mujoco_multi.py", line 204, in get_state_size
    return len(self.get_state()[0])
  File "../envs/ma_mujoco/multiagent_mujoco/mujoco_multi.py", line 191, in get_state
    state = self.env._get_obs()
  File "/home/spaci/anaconda3/envs/test/lib/python3.7/site-packages/gym/core.py", line 228, in __getattr__
    raise AttributeError(f"attempted to get missing private attribute '{name}'")
AttributeError: attempted to get missing private attribute '_get_obs'
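For what it's worth, this error typically comes from newer Gym releases, whose wrapper __getattr__ refuses to forward private attributes. A minimal sketch of a workaround in get_state() of mujoco_multi.py, assuming self.env is a Gym wrapper around the raw MuJoCo environment:

    # a sketch, not the repo's code: reach the raw environment via .unwrapped
    # to bypass the wrapper's private-attribute guard
    state = self.env.unwrapped._get_obs()

Alternatively, pinning gym to a release that predates this guard should restore the old behaviour.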
Hello, I would like to ask: when an agent dies but the environment does not end, data for the dead agent continues to be collected. How is that data processed during training? And how do a dead agent's decisions affect the updates of the subsequent agents? Does this affect the Multi-Agent Advantage Decomposition Lemma?
This is very helpful work, but I have a question about the code: HAPPO_Policy seems to build a critic network for each agent, whereas the paper seems to use only one shared critic network. Does this affect the experimental results?
self.actor = Actor(args, self.obs_space, self.act_space, self.device)
self.critic = Critic(args, self.share_obs_space, self.device)
Looking forward to your reply, thank you.
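For reference, a minimal sketch of the two designs this question contrasts (the list-building loops are an illustration, not the repo's exact code):

    # (a) what HAPPO_Policy appears to do: each agent's policy owns its own critic
    per_agent_policies = [
        (Actor(args, obs_space[i], act_space[i], device),
         Critic(args, share_obs_space, device))            # one critic per agent
        for i in range(num_agents)]

    # (b) the paper's presentation: one centralised critic shared by all agents
    shared_critic = Critic(args, share_obs_space, device)
    shared_policies = [
        (Actor(args, obs_space[i], act_space[i], device), shared_critic)
        for i in range(num_agents)]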
Hi,
I wonder if you have tried to visualize the StarCraft game with your trained model. I tried setting the parameter '--user-render', but it didn't work. How should I visualize it?
Looking forward to your reply.
Equation (10) in the paper preprocesses the advantages, but I can't find it in the current implementation.
In the file happo_trainer.py, I cannot find any preprocessing of the advantages like equation (10) in your paper.
I would appreciate knowing how the iterative updates in Algorithm 1 are represented in the code.
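As a pointer for anyone reading along: in sequential-update implementations of HAPPO/HATRPO, the correction in equation (10) is often realised not as a one-off preprocessing of the advantages but as a running importance factor multiplied into each agent's surrogate loss. A minimal sketch with hypothetical helper names (update_factor, evaluate_logprob, and train_agent are placeholders, not the repo's exact API):

    import numpy as np

    factor = np.ones((episode_length, n_rollout_threads, 1), dtype=np.float32)
    for agent_id in np.random.permutation(num_agents):       # random update order, as in Algorithm 1
        buffer[agent_id].update_factor(factor)               # hand the current factor to this agent's loss
        old_logprob = evaluate_logprob(agent_id)             # log pi_old(a^i|s) before the update
        train_agent(agent_id)                                # one HAPPO/HATRPO step for this agent
        new_logprob = evaluate_logprob(agent_id)             # log pi_new(a^i|s) after the update
        factor = factor * np.exp(new_logprob - old_logprob)  # accumulate this agent's ratio

Each agent thus sees the advantages scaled by the product of the ratios of all previously updated agents, which corresponds to the sequential correction of Algorithm 1.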
Hi,
I am trying to install the packages in the requirements file, but I get errors: the pinned matplotlib and tensorflow versions are not found, and there are conflicting dependencies. Any clue how to fix this?
Thank you
When I run train_mujoco.sh, the following error is generated:
NotImplementedError
Traceback (most recent call last):
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 163, in <module>
    main(sys.argv[1:])
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 136, in main
    envs = make_train_env(all_args)
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 37, in make_train_env
    return ShareSubprocVecEnv([get_env_fn(i) for i in range(all_args.n_rollout_threads)])
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/../envs/env_wrappers.py", line 360, in __init__
    self.n_agents = self.remotes[0].recv()
  File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Can anyone help me?
Thanks
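One observation that may help: the ConnectionResetError is only a symptom; the subprocess worker died while constructing the environment, and the stray NotImplementedError above the traceback is the real failure. A minimal way to surface the underlying exception is to build one environment directly in the main process (the env_args values here are assumptions; substitute the ones from your training script):

    from envs.ma_mujoco.multiagent_mujoco.mujoco_multi import MujocoMulti

    env_args = {"scenario": "Ant-v2", "agent_conf": "2x4",
                "agent_obsk": 0, "episode_limit": 1000}   # assumed arguments
    env = MujocoMulti(env_args=env_args)                  # the real exception surfaces here
    obs = env.reset()

Alternatively, running with --n_rollout_threads 1 takes the single-process ShareDummyVecEnv path seen in the earlier traceback (judging by the two tracebacks in this thread), which raises in-process instead of inside a worker.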
HATRPO works well in my environment, but why does the critic loss increase? Additionally, I think "kl_threshold" is an important parameter. Could you please tell me how to tune it? My parameter settings and experiment results are as follows.
Looking forward to your reply. Thank you.
critic_lr: 5e-3
opti_eps: 1e-5
kl_threshold: 0.0001
gamma: 0.99
use_linear_lr_decay: True
Hello, I tried to run your code with the HATRPO algorithm and an RNN network. Specifically, I added "--use_recurrent_policy" to both scripts, train_smac.sh and train_mujoco.sh, and set algo='hatrpo'. However, both scripts fail and return the errors below:
RuntimeError: the derivative for '_cudnn_rnn_backward' is not implemented. Double backwards is not supported for CuDNN RNNs due to limitations in the CuDNN API. To run double backwards, please disable the CuDNN backend temporarily while running the forward pass of your RNN. For example:
    with torch.backends.cudnn.flags(enabled=False):
        output = model(inputs)
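As the error message says, HATRPO's trust-region step differentiates through gradients (double backward), which the CuDNN RNN kernels do not support. A minimal sketch of the workaround, assuming the offending forward pass is the policy evaluation inside the trainer (evaluate_actions is a placeholder name, not necessarily the repo's exact call):

    import torch

    # fall back to the non-CuDNN RNN implementation only for the forward pass
    # whose graph is differentiated twice
    with torch.backends.cudnn.flags(enabled=False):
        values, log_probs, entropy = policy.evaluate_actions(
            obs, rnn_states, actions, masks)

The non-CuDNN path is slower but supports double backwards.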
Hi,
I tried the repo's recommended method of installing the dependencies from requirements.txt, but Python 3.9 is not compatible with TensorFlow 2.0.0 or with several of the other related packages. I am not sure the environment described in requirements.txt is correct. Will the code run under newer versions of TensorFlow and the other packages? Could you please update the package dependencies?
Hi, I found that many MARL methods are implemented with PyMARL. Do you have a PyMARL implementation of this repo?
When I ran HAPPO in StarCraft II, I found that HAPPO's performance is poor on the 3s5z_vs_3s6z map, far from MAPPO's. The parameters I use are:
--n_training_threads 32 --n_rollout_threads 8 --num_mini_batch 1 --episode_length 400
--num_env_steps 10000000 --ppo_epoch 5 --use_value_active_masks --use_eval --eval_episodes 32 --use_recurrent_policy
and the default parameters are used for the rest.
The results are averaged over 5 runs, and the shaded region represents the 95% confidence interval.
We look forward to your reply.
I have a question on multi-agent mujoco tasks.
For an agent, the global state and its observation seem to be the same in mujoco_multi.py.
Have I misunderstood the code?
I ran the default experiment "ant-v2, 2x4" with the default parameters and got the results in the first picture. Later, I modified the parameters (n_rollout_threads: 24, num_mini_batch: 4, ppo_epoch: 40) and got the results in the second picture.
I have also tried other modifications to the experimental parameters, but I have not reproduced the performance reported in the article for the "ant-v2, 2x4" experiment provided with the code.
So I would like to ask whether there are rules or tricks for tuning the parameters of the HAPPO/HATRPO algorithms.
I notice that in your multi-agent MuJoCo environment code:
def get_obs(self):
    """Returns all agent observations in a list."""
    state = self.env._get_obs()
    obs_n = []
    for a in range(self.n_agents):
        agent_id_feats = np.zeros(self.n_agents, dtype=np.float32)
        agent_id_feats[a] = 1.0
        # obs_n.append(self.get_obs_agent(a))
        # obs_n.append(np.concatenate([state, self.get_obs_agent(a), agent_id_feats]))
        # obs_n.append(np.concatenate([self.get_obs_agent(a), agent_id_feats]))
        obs_i = np.concatenate([state, agent_id_feats])
        obs_i = (obs_i - np.mean(obs_i)) / np.std(obs_i)
        obs_n.append(obs_i)
    return obs_n

def get_state(self, team=None):
    # TODO: May want global states for different teams (so cannot see what the other team is communicating e.g.)
    state = self.env._get_obs()
    share_obs = []
    for a in range(self.n_agents):
        agent_id_feats = np.zeros(self.n_agents, dtype=np.float32)
        agent_id_feats[a] = 1.0
        # share_obs.append(np.concatenate([state, self.get_obs_agent(a), agent_id_feats]))
        state_i = np.concatenate([state, agent_id_feats])
        state_i = (state_i - np.mean(state_i)) / np.std(state_i)
        share_obs.append(state_i)
    return share_obs
Both use self.env._get_obs() and return the same observation information. So, in your code, what is the difference between get_obs() and get_state(), and how do you use global information and local information in your algorithm?
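For context on how the two outputs are typically consumed in a CTDE algorithm like HAPPO (a sketch with assumed names, not the repo's training loop): the per-agent observations feed the decentralised actors, while the shared state feeds the centralised critic(s).

    obs_n = env.get_obs()        # local inputs: (state + one-hot agent id), one per actor
    share_obs = env.get_state()  # global inputs: fed to the centralised critic(s)

    actions = [actor[i](obs_n[i]) for i in range(env.n_agents)]   # decentralised execution
    values = [critic(share_obs[i]) for i in range(env.n_agents)]  # centralised value estimates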