airl_mountaincar's People

Forkers

violasox

airl_mountaincar's Issues

About dependency versions

Hello, could you list the dependency versions the code requires, for example the TensorFlow version? Thank you!

Weird reward behavior

Hi Jia Lian,

I replicated your code in PyTorch and used the same number of hidden units and position only. My reward is also a function of state only, not of state-action. The reward I use to update the generator is likewise the discriminator logistic ratio.
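For readers following the thread, the "discriminator logistic ratio" can be sketched as below. This is a minimal illustration, not code from either implementation: it assumes the usual AIRL discriminator form D = exp(f) / (exp(f) + pi(a|s)) and takes the reward to be log D - log(1 - D). The function name `airl_reward` and its arguments are hypothetical.

```python
import math


def airl_reward(f_value, log_pi):
    """Logistic-ratio reward log D - log(1 - D) for a single sample.

    Assumption: D = exp(f) / (exp(f) + pi(a|s)), i.e. D = sigmoid(f - log pi).
    f_value is the learned reward/energy term, log_pi is log pi(a|s).
    """
    logit = f_value - log_pi
    d = 1.0 / (1.0 + math.exp(-logit))  # discriminator output in (0, 1)
    return math.log(d) - math.log(1.0 - d)


# Under this assumption the reward reduces algebraically to f - log pi.
example = airl_reward(0.5, math.log(0.25))
```

Because the ratio collapses to f - log pi under this form, the policy's own log-probability enters the generator reward directly, which is worth keeping in mind when comparing two implementations.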

The only difference between us might just be the number of trajectories I collect every iteration, learning rate, and regularization. For discriminator I do your random batch update. But for the generator I sweep through all the (state, action) pairs a few times. There could be minor bugs in my code but I trust my implementation.

But I got quite weird behavior for the reward function:

itr 25: [figure: 24_learned_rewards]
itr 75: [figure: 74_learned_rewards]
itr 175: [figure: 174_learned_rewards]
itr 275: [figure: 274_learned_rewards]
itr 475: [figure: 474_learned_rewards]
itr 775: [figure: 774_learned_rewards]

You see the trend here. What could be the possible reason that this is happening?

Thanks,
Ran

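How does the Old_Policy update work?

xxx:
Hello, I am very sorry for taking so long to reply, and thank you for your interest. Your confusion comes from my not having written things clearly.

The logic is as follows:

Two policies are needed because PPO turns on-policy training into off-policy training, which speeds up training. The two policy networks differ only in their parameters (weights and biases): the parameters of Old_Policy lag behind those of Policy for a while.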
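As a reminder of why the lagging copy matters, the standard clipped PPO surrogate is written in terms of both parameter sets. The block below assumes the repo follows that standard form; the thread itself does not confirm which PPO variant is used, and the symbols (the clip range \epsilon, the advantage estimate \hat{A}_t) are the conventional ones rather than names from the code.

```latex
r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[
  \min\!\left( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right)
\right]
```

Here \pi_{\theta_{\mathrm{old}}} plays the role of Old_Policy: it generated the data, and the ratio r_t(\theta) corrects for the fact that Policy has since moved away from it.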

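Old_Policy is the one that should interact with the environment (I had this wrong before; it is now fixed), and the data it produces is used to train Policy. After Policy has been trained several times (this is the if episode % 4 == 0 check in 2_airl.py), the line

PPO.assign_policy_parameters()
stores the updated Policy parameters in the Old_Policy network.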
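The schedule described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the code in 2_airl.py: parameters are plain dictionaries, a dummy increment stands in for a gradient step, episodes are counted from 1, and the names `train`, `sync_every`, and `sync_episodes` are hypothetical.

```python
def train(num_episodes, sync_every=4):
    """Toy loop: Policy is updated every episode, Old_Policy is synced periodically."""
    policy = {"w": 0.0}         # parameters being optimized
    old_policy = dict(policy)   # lagging copy that would interact with the environment
    sync_episodes = []

    for episode in range(1, num_episodes + 1):
        # 1. Old_Policy would collect trajectories here (omitted in this sketch).
        # 2. Policy is trained on that data; a dummy increment stands in for the update.
        policy["w"] += 0.1
        # 3. Every `sync_every` episodes, copy the updated parameters into Old_Policy,
        #    mirroring the role of PPO.assign_policy_parameters().
        if episode % sync_every == 0:
            old_policy = dict(policy)
            sync_episodes.append(episode)

    return policy, old_policy, sync_episodes


policy, old_policy, sync_episodes = train(10)
```

After ten episodes in this sketch, Old_Policy holds the parameters from episode 8 while Policy has moved on by two more updates, which is the lag the reply describes.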

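For the PPO algorithm, I mainly followed the video lectures by Professor Hung-yi Lee (李宏毅) of National Taiwan University.

Thank you very much for your question; it helped me a lot.

A Liang (阿梁)
2021.9.22

------------------ Original message ------------------
From: "" <******>;
Sent: Monday, August 30, 2021, 5:54 PM
To: "Jack Huang"[email protected];
Subject: Re: Question about the GitHub AIRL_MountainCar repo

Hello, is there any progress? The first issue I mentioned earlier was my mistake, sorry! But I still do not quite understand the second one, the old_policy update.

xxx

--- Original message ---
From: "Jack Huang"[email protected]
Sent: Monday, August 23, 2021, 6:56 PM
To: "xxx";
Subject: Re: Question about the GitHub AIRL_MountainCar repo

Thank you for your email.

I will take a look at this project soon and check this issue. It may take a couple of days. I wrote this program a long time ago, and it may need some updating. I will reply to you then.

------------------ Original message ------------------
From: "xxx" [email protected];
Sent: Monday, August 23, 2021, 1:13 PM
To: "Jack Huang"[email protected];
Subject: Question about the GitHub AIRL_MountainCar repo

Jack Huang,

Hello!

I came across your AIRL_MountainCar code on GitHub and would like to ask whether this repo is complete. In particular, some details of the policy update in the generator part are unclear to me, for example:

  1. In 2_airl.py, the inputs passed to PPO.train do not match its definition; self.Old_Policy.obs is missing.
  2. In 2_airl.py, old_policy is only defined initially; shouldn't it also be updated later?

If I have misunderstood anything, please point it out. I would greatly appreciate your answer.

August 23, 2021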
