
trpo-in-marl's People

Contributors

cyanrain7, znowu


trpo-in-marl's Issues

About the number of Critic Networks

This is very helpful work, but I have a question about the code: HAPPO_Policy seems to build a separate Critic network for each agent, whereas the paper appears to use a single shared Critic network. Does this affect the experimental results?

    self.actor = Actor(args, self.obs_space, self.act_space, self.device)
    self.critic = Critic(args, self.share_obs_space, self.device)

Looking forward to your reply, thank you.
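For reference, a minimal sketch contrasting the two constructions being asked about. The class names follow the snippet above, while num_agents, the indexed spaces, and the shared-critic variant are assumptions for illustration, not the repository's code:

    # Per-agent critics, as in the snippet above: each agent's policy owns its own Critic.
    policies = []
    for agent_id in range(num_agents):
        actor = Actor(args, obs_space[agent_id], act_space[agent_id], device)
        critic = Critic(args, share_obs_space[agent_id], device)  # one critic per agent
        policies.append((actor, critic))

    # A single shared critic instance (what the paper's description of one joint value function suggests):
    shared_critic = Critic(args, share_obs_space[0], device)
    policies_shared = [(Actor(args, obs_space[i], act_space[i], device), shared_critic)
                       for i in range(num_agents)]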

gym error

When using DummyVecEnv, the env class has no '_get_obs' attribute:
    Traceback (most recent call last):
      File "train/train_mujoco.py", line 163, in <module>
        main(sys.argv[1:])
      File "train/train_mujoco.py", line 136, in main
        envs = make_train_env(all_args)
      File "train/train_mujoco.py", line 35, in make_train_env
        return ShareDummyVecEnv([get_env_fn(0)])
      File "../envs/env_wrappers.py", line 712, in __init__
        self.envs = [fn() for fn in env_fns]
      File "../envs/env_wrappers.py", line 712, in <listcomp>
        self.envs = [fn() for fn in env_fns]
      File "train/train_mujoco.py", line 25, in init_env
        env = MujocoMulti(env_args=env_args)
      File "../envs/ma_mujoco/multiagent_mujoco/mujoco_multi.py", line 104, in __init__
        self.share_obs_size = self.get_state_size()
      File "../envs/ma_mujoco/multiagent_mujoco/mujoco_multi.py", line 204, in get_state_size
        return len(self.get_state()[0])
      File "../envs/ma_mujoco/multiagent_mujoco/mujoco_multi.py", line 191, in get_state
        state = self.env._get_obs()
      File "/home/spaci/anaconda3/envs/test/lib/python3.7/site-packages/gym/core.py", line 228, in __getattr__
        raise AttributeError(f"attempted to get missing private attribute '{name}'")
    AttributeError: attempted to get missing private attribute '_get_obs'
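For context, this AttributeError comes from newer gym versions, whose wrappers refuse to forward private attributes such as _get_obs. Two commonly used workarounds, offered as assumptions rather than a confirmed fix for this repository, are to pin gym to the version listed in requirements.txt, or to reach the underlying environment through .unwrapped inside mujoco_multi.py, e.g.:

    # Hedged sketch: bypass the wrapper's __getattr__ guard by going through
    # the base environment, e.g. in MujocoMulti.get_state / get_obs:
    state = self.env.unwrapped._get_obs()   # instead of state = self.env._get_obs()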

Confused about the results of IPPO and MAPPO.

I notice that in your code the multi-agent MuJoCo environment is set up as an MDP, so the critic inputs for IPPO and MAPPO are identical. I would expect their performance to be similar, but the results in the figure are not. Are there other factors I am overlooking? I am looking forward to your reply. Thank you!

multi_env_error

When I run train_mujoco.sh, the following error is generated:
    NotImplementedError
    Traceback (most recent call last):
      File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 163, in <module>
        main(sys.argv[1:])
      File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 136, in main
        envs = make_train_env(all_args)
      File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 37, in make_train_env
        return ShareSubprocVecEnv([get_env_fn(i) for i in range(all_args.n_rollout_threads)])
      File "/home/spaci/RL/TRPO-in-MARL-master/scripts/../envs/env_wrappers.py", line 360, in __init__
        self.n_agents = self.remotes[0].recv()
      File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 255, in recv
        buf = self._recv_bytes()
      File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
        buf = self._recv(4)
      File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 384, in _recv
        chunk = read(handle, remaining)
    ConnectionResetError: [Errno 104] Connection reset by peer

Can anyone help me?
Thanks.

what to do with a dead agent

Hello, I would like to ask: when an agent dies but the environment does not end, data for the dead agent continues to be collected. How is this data handled during training? How do you deal with the effect a dead agent's decisions have on the updates of the subsequent agents, and does this affect the Multi-Agent Advantage Decomposition Lemma?
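For context, one common pattern for this (suggested by flags like --use_value_active_masks elsewhere in these issues) is to keep storing dead-agent transitions but weight the losses by a per-step active mask. A minimal sketch under that assumption, not a statement of how this repository actually handles it:

    import torch

    def masked_ppo_losses(surr1, surr2, values, returns, active_masks):
        """Weight PPO-style losses by an active mask (1 while the agent is alive,
        0 after it dies) so dead-agent steps do not drive the update.
        Illustrative sketch only."""
        policy_loss = -(torch.min(surr1, surr2) * active_masks).sum() / active_masks.sum()
        value_loss = (((values - returns) ** 2) * active_masks).sum() / active_masks.sum()
        return policy_loss, value_loss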

Questions about visualization

Hi,
I wonder if you have tried to visualize the StarCraft game with your trained model. I tried setting the parameter '--user-render', but it didn't work. How should I visualize it?

Looking forward to your reply.

The script fails when applying the HATRPO algorithm with an RNN network.

Hello, I tried to run your code with the HATRPO algorithm and an RNN network. Specifically, I added "--use_recurrent_policy" to both scripts (train_smac.sh and train_mujoco.sh) and set algo='hatrpo'. However, both scripts fail and return the error below:

    RuntimeError: the derivative for '_cudnn_rnn_backward' is not implemented. Double backwards is not supported for CuDNN RNNs due to limitations in the CuDNN API. To run double backwards, please disable the CuDNN backend temporarily while running the forward pass of your RNN. For example:
        with torch.backends.cudnn.flags(enabled=False):
            output = model(inputs)
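For reference, a minimal sketch of the workaround the error message itself suggests: disable the CuDNN RNN backend only around the forward pass whose graph needs double backwards (in HATRPO, the pass used for the second-order Fisher-vector products). Exactly where this wrapper belongs in the repository, and the policy.evaluate_actions signature, are assumptions:

    import torch

    # Hedged sketch: run only the double-backward forward pass with CuDNN disabled;
    # policy.evaluate_actions is a placeholder for the repo's recurrent policy call.
    with torch.backends.cudnn.flags(enabled=False):
        action_log_probs, dist_entropy = policy.evaluate_actions(obs, rnn_states, actions, masks)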

How do you use global information and local information in multi-agent mujoco?

I notice that in your multi-agent MuJoCo environment code,

def get_obs(self):
    """ Returns all agent observat3ions in a list """
    state = self.env._get_obs()
    obs_n = []
    for a in range(self.n_agents):
        agent_id_feats = np.zeros(self.n_agents, dtype=np.float32)
        agent_id_feats[a] = 1.0
        # obs_n.append(self.get_obs_agent(a))
        # obs_n.append(np.concatenate([state, self.get_obs_agent(a), agent_id_feats]))
        # obs_n.append(np.concatenate([self.get_obs_agent(a), agent_id_feats]))
        obs_i = np.concatenate([state, agent_id_feats])
        obs_i = (obs_i - np.mean(obs_i)) / np.std(obs_i)
        obs_n.append(obs_i)
    return obs_n

def get_state(self, team=None):
    # TODO: May want global states for different teams (so cannot see what the other team is communicating e.g.)
    state = self.env._get_obs()
    share_obs = []
    for a in range(self.n_agents):
        agent_id_feats = np.zeros(self.n_agents, dtype=np.float32)
        agent_id_feats[a] = 1.0
        # share_obs.append(np.concatenate([state, self.get_obs_agent(a), agent_id_feats]))
        state_i = np.concatenate([state, agent_id_feats])
        state_i = (state_i - np.mean(state_i)) / np.std(state_i)
        share_obs.append(state_i)
    return share_obs

Both use self.env._get_obs() and return the same observation information. So in your code, what is the difference between get_obs() and get_state(), and how do you use global versus local information in your algorithm?
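For context, in a centralized-training / decentralized-execution setup the two methods usually serve different consumers: get_obs() feeds each agent's actor and get_state() feeds the centralized critic. A minimal sketch of that usage pattern, which is an assumption about intent rather than code taken from the repository:

    # Decentralised actors consume local per-agent observations,
    # while critics consume the shared/global state.
    obs_n = env.get_obs()          # list with one entry per agent -> actor inputs
    share_obs_n = env.get_state()  # global state (one copy per agent) -> critic inputs

    actions = [actors[i](obs_n[i]) for i in range(env.n_agents)]
    values = [critics[i](share_obs_n[i]) for i in range(env.n_agents)]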

The question about critic loss

HATRPO works well in my environment, but why does the critic loss increase? Additionally, I think "kl_threshold" is an important parameter. Could you please tell me how to tune it? My parameter settings and experiment results are as follows.
Looking forward to your reply. Thank you.
    critic_lr: 5e-3
    opti_eps: 1e-5
    kl_threshold: 0.0001
    gamma: 0.99
    use_linear_lr_decay: True
[result figures attached to the issue]
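For reference, kl_threshold plays the role of the trust-region size in TRPO-style updates: candidate steps are only accepted while the KL divergence to the old policy stays below it, so smaller values mean more conservative (and usually slower) policy changes. A schematic of where such a threshold typically enters a backtracking line search; apply_update and the other names are hypothetical, and this is a generic sketch rather than this repository's code:

    import copy

    def backtracking_line_search(policy, full_step, expected_improve,
                                 surrogate_fn, kl_fn, kl_threshold,
                                 max_backtracks=10, accept_ratio=0.1):
        """Shrink the natural-gradient step until the KL constraint and an
        improvement test both hold (illustrative sketch only)."""
        old_params = copy.deepcopy(policy.state_dict())
        for k in range(max_backtracks):
            step_frac = 0.5 ** k
            apply_update(policy, old_params, step_frac * full_step)  # hypothetical helper
            improve, kl = surrogate_fn(policy), kl_fn(policy)
            # Accept only if the new policy stays inside the trust region
            # and the surrogate objective improved enough.
            if kl <= kl_threshold and improve >= accept_ratio * step_frac * expected_improve:
                return True
        policy.load_state_dict(old_params)  # otherwise revert to the old policy
        return False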

I have some questions about the adjustment of experiment parameters.

I ran the default experiment "ant-v2, 2x4" and used the default parameters to get the results in the first picture. Later, I modified the parameters (n_rollout_threads: 24, num_mini_batch: 4, ppo_epoch: 40) and got the results in the second picture.
[result figures attached to the issue]
I have also made other modifications to the experimental parameters, but I have not been able to reach the performance reported in the paper on the "ant-v2, 2x4" experiment provided by the code.
So I would like to ask whether there are any rules or tips for tuning the parameters of the HAPPO / HATRPO algorithms.

Question about HAPPO performance in StarCraftII

When I ran HAPPO in StarCraft II, I found that its performance is poor on the 3s5z_vs_3s6z map, far below MAPPO. The parameters I used are:

    --n_training_threads 32 --n_rollout_threads 8 --num_mini_batch 1 --episode_length 400
    --num_env_steps 10000000 --ppo_epoch 5 --use_value_active_masks --use_eval --eval_episodes 32 --use_recurrent_policy

and the default parameters are used for the rest.

The result is an average over 5 runs, and the shaded region represents the 95% confidence interval.
[Figure: win-rate comparison on 3s5z_vs_3s6z]

We look forward to your reply.

Some questions about HAPPO implementation

[screenshot of equation (10) from the paper attached to the issue]
The above is equation (10) in the paper, but I cannot find it in the current implementation.

In the file happo_trainer.py:
[screenshot of the trainer code attached to the issue]

And in separated_buffer.py:
[screenshot of the buffer code attached to the issue]

I cannot find any preprocessing of the advantages corresponding to equation (10) in the paper.

I would appreciate knowing how the iterative updates of Algorithm 1 are represented in the code.
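For what it's worth, a minimal sketch of one way equation (10) and the sequential updates of Algorithm 1 can be realised: keep a running importance factor over the agents already updated in this iteration and multiply it into the advantages seen by the next agent. The function and attribute names here are assumptions for illustration, not the repository's actual interfaces:

    import torch

    def sequential_happo_update(agents, buffers, agent_order):
        """agents: per-agent trainers; buffers: per-agent rollout buffers with
        .obs, .actions and .advantages tensors (all names assumed)."""
        # No agents updated yet, so the compound ratio in equation (10) starts at 1.
        factor = torch.ones_like(buffers[agent_order[0]].advantages)

        for i in agent_order:
            buf = buffers[i]
            # Log-probs of the stored actions under the current (pre-update) policy.
            old_logp = agents[i].evaluate_actions(buf.obs, buf.actions)

            # Equation (10): weight this agent's advantages by the product of
            # probability ratios of the agents updated earlier in the order.
            weighted_adv = factor * buf.advantages
            agents[i].update(buf, weighted_adv)   # PPO / TRPO step for agent i

            # Log-probs under the freshly updated policy; accumulate the ratio
            # pi_new / pi_old for the next agent in the sequence.
            new_logp = agents[i].evaluate_actions(buf.obs, buf.actions)
            factor = factor * torch.exp(new_logp - old_logp).detach()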

dependency issue

Hi,
I tried to install the dependencies as recommended in requirements.txt, but Python 3.9 is not compatible with TensorFlow 2.0.0 or with several of the other pinned packages. I am not sure whether the environment described in requirements.txt is correct. Will the code run under newer versions of TensorFlow and the other packages? Could you please update the package dependencies?
