yalidu / liir
Learning Individual Intrinsic Reward in MARL
In the LIIR experiments, env steps are set to about 10 million.
In the QMIX experiments, env steps are set to about 1 million.
Why are the env steps so different? I am not sure how to interpret the gap.
Could you please explain?
I have been able to get SMAC installed and working to train COMA/QMIX agents.
When I tried to run your code the first time I received this error:
File "src/main.py", line 33, in my_main
env_args['seed'] = _config["seed"]
sacred.utils.SacredError: The configuration is read-only in a captured function!
To fix this I added the following code to the main.py file:
from sacred import SETTINGS
SETTINGS['CONFIG']['READ_ONLY_CONFIG'] = False
Now I am receiving:
ValueError: Unknown game version: 4.1.4. Known versions: ['latest'].
This is the same error I receive when I try to run the Variance Based Control code.
Any ideas?
Thank you for your time.
Edit: I am using version 4.11.3
[INFO 15:17:39] absl Launching SC2: /home/gezhixin/pymarl-master/3rdparty/StarCraftII/Versions/Base55958/SC2_x64 -listen 127.0.0.1 -port 17708 -dataDir /home/gezhixin/pymarl-master/3rdparty/StarCraftII/ -tempDir /tmp/sc-yw5hsyts/ -eglpath libEGL.so
Version: B55958 (SC2.3.16)
Build: Jul 31 2017 13:19:41
Command Line: '"/home/gezhixin/pymarl-master/3rdparty/StarCraftII/Versions/Base55958/SC2_x64" -listen 127.0.0.1 -port 17708 -dataDir /home/gezhixin/pymarl-master/3rdparty/StarCraftII/ -tempDir /tmp/sc-yw5hsyts/ -eglpath libEGL.so'
[INFO 15:17:39] absl Connecting to: ws://127.0.0.1:17708/sc2api, attempt: 0, running: True
Starting up...
Startup Phase 1 complete
(the same Launching / Version / Command Line / "Starting up..." / "Startup Phase 1 complete" lines repeat for each of the other parallel SC2 instances, ports 15751-24580)
[INFO 15:17:40] absl Connecting to: ws://127.0.0.1:24288/sc2api, attempt: 1, running: True
(followed by the same "attempt: 1" connection retries for the remaining ports)
File "C:\Users\xxx\Downloads\liir-master\liir-master\src\controllers\basic_controller.py", line 27, in forward
agent_inputs = self._build_inputs(ep_batch, t)
File "C:\Users\xxx\Downloads\liir-master\liir-master\src\controllers\basic_controller.py", line 92, in _build_inputs
inputs = th.cat([x.reshape(bs*self.n_agents, -1) for x in inputs], dim=1)
RuntimeError: error in LoadLibraryA
Thanks for any help!
Hi,
Thanks for your awesome work on MARL.
I still have some questions after reading your paper.
Regarding equation 6, I am confused about why the advantage function there uses a decentralized critic for each agent rather than the centralized critic defined in equation 1, which follows the CLDE (centralized learning with decentralized execution) rule. I also guess that in equation 6 the 'u' and 's' should be bold.
Besides, in Algorithm 1 line 5, I believe the term substituted from equation 8 is missing the log of the policy.
I wonder if I have misunderstood something; could you please check for me?
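For concreteness, this is the form I would expect in Algorithm 1 line 5 with the log restored (my own notation, not the paper's, so please correct me if the indices differ):

```latex
\nabla_{\theta_i} J
  \;=\;
\mathbb{E}\Big[\,
  \nabla_{\theta_i} \log \pi_{\theta_i}\big(u^i \mid \tau^i\big)\;
  A^i(\mathbf{s}, \mathbf{u})
\,\Big]
```

i.e. the advantage should multiply the gradient of the log-policy, not the policy itself.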
Best.
C
First of all, thank you for interesting paper and published code!
I have a question regarding the training of the critics. As far as I understand, you use state-value functions and train them off-policy through temporal-difference learning (https://github.com/yalidu/liir/blob/master/src/learners/liir_learner.py#L224). Could you please clarify how the expectation over the reward in the temporal-difference target for the state-value function is handled when the states/actions come from a different policy? It seems to me that you would need to sample from the current policy, take the corresponding action, and feed it to the unknown reward function. Am I missing something?
Hi,
Thanks for your awesome work on MARL.
Where is the code for visualizing the learned intrinsic reward? Did you visualize it from replays? How is the replay connected with the intrinsic reward?
First of all, thank you for your very interesting paper/method.
We tried to run an ablation comparing the default value of the meta-reward weight λ against λ = 0. Unfortunately, we managed neither to match the results in the paper (we see bad performance and high variance for nonzero λ) nor to get any improvement for the default λ over λ = 0. Are there any specific tips or config discrepancies that could be responsible for this?
Plots are attached; the last value in the legend labels is the number of seeds.
Hi,
I ran LIIR in the Capture Target domain, where two agents have to capture a moving target simultaneously in a grid world with only a +1 terminal reward. I performed decent hyper-parameter tuning; however, it doesn't learn anything.
I found that "mask_alive" (line 68 of liir_learner.py) made all available actions 0, which causes log_pi_taken (line 99 of liir_learner.py) to be 0 in the end. So there was no gradient at all. Is this a bug, or do you have any other suggestions?
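To illustrate what I mean, here is a toy sketch with made-up numbers (my own simplification, following the usual pymarl-style convention where masked-out entries of pi_taken are replaced by 1 so that their log is 0):

```python
import math

def log_pi_taken(pi_taken, mask_alive):
    # Where mask_alive is 0 the probability is forced to exactly 1,
    # so log(1) = 0 and that entry contributes no gradient. If the
    # mask is 0 for every agent at every timestep, the whole policy
    # loss is identically 0 -- which matches what I observe.
    out = []
    for p, m in zip(pi_taken, mask_alive):
        p = p * m + (1.0 - m)
        out.append(math.log(p))
    return out

print(log_pi_taken([0.5, 0.8], [1.0, 0.0]))  # second entry is 0.0
```
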
Thanks!