huangjialian / airl_mountaincar
Adversarial Inverse Reinforcement Learning implementation for Mountain Car
Hello, could you list the versions of the packages the code depends on (e.g. the TensorFlow version)? Thanks!
Hi Jia Lian,
I replicated your code in PyTorch, using the same number of hidden units and the position-only observation. My reward is likewise a function of the state only, not the state-action pair, and the reward I use to update the generator is also the discriminator's logistic ratio.
The only differences between our setups should be the number of trajectories collected per iteration, the learning rate, and the regularization. For the discriminator I use your random mini-batch update, but for the generator I sweep through all the (state, action) pairs a few times. There could be minor bugs in my code, but I trust my implementation.
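The "discriminator logistic ratio" mentioned above can be made concrete: since the discriminator is a sigmoid of a learned logit, log D − log(1 − D) algebraically reduces to the logit itself. A minimal NumPy sketch (the function name is mine, not from either implementation):

```python
import numpy as np

def logistic_ratio_reward(logits):
    # With D = sigmoid(logits), log D - log(1 - D) simplifies to the logits.
    d = 1.0 / (1.0 + np.exp(-logits))
    return np.log(d) - np.log(1.0 - d)

logits = np.array([-2.0, 0.0, 2.0])
print(logistic_ratio_reward(logits))  # numerically equal to the logits
```

In practice this means the generator's reward can be read straight off the discriminator's pre-sigmoid output, which is also numerically safer than computing the two logarithms separately.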
But I got quite a weird behavior for the reward function.
[Attached: plots of the learned reward function at iterations 25, 75, 175, 275, 475, and 775.]
You see the trend here. What could be the possible reason that this is happening?
Thanks,
Ran
Hello,
Line 217 of /2_airl.py raises an error: v_preds_next is never assigned. Was something left out earlier?
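For reference, in many TensorFlow PPO implementations of this style, v_preds_next is simply the episode's value predictions shifted one step, with 0 appended for the terminal state. This is a hedged guess at the missing assignment, not the author's confirmed fix:

```python
# Hypothetical reconstruction (not the repo's confirmed code): v_preds_next
# holds V(s') for each transition; the last next-state value is 0 because
# the episode terminates there.
v_preds = [0.5, 0.7, 0.9]           # V(s_t) along one episode
v_preds_next = v_preds[1:] + [0.0]  # V(s_{t+1}), terminal value = 0
print(v_preds_next)  # [0.7, 0.9, 0.0]
```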
xxx:
Hello, I am very sorry it has taken me so long to reply, and thank you for your interest. Your confusion is caused by my not writing this up clearly.
The logic is as follows:
Two policies are needed because in the PPO algorithm we want to turn on-policy training into off-policy training, which speeds up the training process. The two policy networks differ only in their parameters (weights and biases): Old_Policy's parameters lag behind Policy's for a while.
Old_Policy should be the one interacting with the environment (I had written it the wrong way round before; it is now fixed), and the data it generates is used to train Policy. After Policy has been trained several times (the if episode % 4 == 0 check in 2_airl.py), PPO.assign_policy_parameters() is called on this line:
Line 224 in ec707ef
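The update scheme described above can be sketched as a toy loop. This is an illustrative stand-in, not the repo's TensorFlow code: Old_Policy collects data and lags behind Policy, and its parameters are overwritten every 4 episodes:

```python
import numpy as np

class TinyPolicy:
    """Stand-in for a policy network: just a parameter vector."""
    def __init__(self, dim):
        self.params = np.zeros(dim)

def assign_policy_parameters(policy, old_policy):
    # Copy the up-to-date Policy weights into Old_Policy.
    old_policy.params = policy.params.copy()

policy, old_policy = TinyPolicy(3), TinyPolicy(3)
for episode in range(1, 9):
    # Old_Policy interacts with the environment; its data trains Policy.
    policy.params += 0.1              # pretend gradient update to Policy
    if episode % 4 == 0:              # sync every 4 episodes, as in 2_airl.py
        assign_policy_parameters(policy, old_policy)

print(old_policy.params)  # synced copy of Policy's weights at episode 8
```

Between syncs, Old_Policy keeps the stale weights, which is exactly what makes the PPO importance-ratio correction meaningful.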
For the PPO algorithm itself, I mainly followed the lecture videos of Prof. Hung-yi Lee (NTU).
Thank you very much for your question; it has helped me a lot.
A-Liang
2021-09-22
------------------ Original message ------------------
From: "" <******>;
Sent: Monday, 30 August 2021, 17:54
To: "Jack Huang" [email protected];
Subject: Re: Question about the GitHub AIRL_MountainCar repo
Hello, is there any progress on this? The first question I raised earlier was my own mistake, sorry! But I still don't quite understand the second question, about the old_policy update.
xxx
--- Original message ---
From: "Jack Huang" [email protected]
Sent: Monday, 23 August 2021, 18:56
To: "xxx";
Subject: Re: Question about the GitHub AIRL_MountainCar repo
Thank you for your email.
I will take a look at the project soon and confirm the issue; it may take a day or two. The code was written quite a while ago and may need updating. I will get back to you then.
------------------ Original message ------------------
From: "xxx" [email protected];
Sent: Monday, 23 August 2021, 13:13
To: "Jack Huang" [email protected];
Subject: Question about the GitHub AIRL_MountainCar repo
Jack Huang,
Hello!
I came across your AIRL_MountainCar code on GitHub and would like to ask whether the repo is complete, particularly the generator part; some details of the policy update are unclear to me, for example:
If I have misunderstood anything, please point it out; I would very much appreciate your answer.
23 August 2021