Comments (1)
【Why is the expression for the value function in compute_policy_v (line 52) the same as the one for the state-action function in compute_policy_v (line 37)?】
The action is different in each case, so even though the two lines use the same equation, they mean different things.
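As a sketch of that distinction in standard notation (assuming a deterministic policy π, a discount factor γ, and that one line evaluates an arbitrary action a while the other plugs in the policy's own action; the symbols r and γ are assumptions, not taken from the issue):

```latex
\begin{aligned}
Q^{\pi}(s, a) &= \sum_{s'} P(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma V^{\pi}(s') \bigr] \\
V^{\pi}(s) &= Q^{\pi}\bigl(s, \pi(s)\bigr)
            = \sum_{s'} P(s' \mid s, \pi(s))\,\bigl[ r(s, \pi(s), s') + \gamma V^{\pi}(s') \bigr]
\end{aligned}
```

The right-hand sides have the same form; only the action differs, being free in the first line and fixed to π(s) in the second.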
【It seems that the expression in the code ignores the transition probability P(s'|s,a)?】
In the line
for p, s_, r, _ in env.env.P[s][policy_a]
the variable p is exactly the transition probability P(s'|s,a), so it is not ignored.
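To make this concrete, here is a minimal sketch on a made-up two-state MDP. It assumes the transition table follows the layout used by Gym's toy-text environments, where P[s][a] is a list of (prob, next_state, reward, done) tuples. The table, the value vector, the policy, and the helper name backup are all hypothetical and are not taken from the repository.

```python
# Hypothetical 2-state, 2-action MDP (not from the repository), laid out like
# Gym's toy-text environments: P[s][a] -> list of (prob, next_state, reward, done).
P = {
    0: {
        0: [(1.0, 0, 0.0, False)],
        1: [(0.8, 1, 1.0, True), (0.2, 0, 0.0, False)],
    },
    1: {
        0: [(1.0, 1, 0.0, True)],
        1: [(1.0, 1, 0.0, True)],
    },
}


def backup(P, v, s, a, gamma=1.0):
    """One-step expected return: sum over s' of p * (r + gamma * v[s'])."""
    # p is the transition probability P(s'|s,a); it weights every outcome.
    return sum(p * (r + gamma * v[s_]) for p, s_, r, _ in P[s][a])


v = [0.5, 2.0]   # illustrative value estimates for states 0 and 1
policy = [1, 0]  # illustrative deterministic policy: one action per state

# State-action values: the same expression, evaluated for every action.
q_s0 = [backup(P, v, 0, a) for a in sorted(P[0])]

# State value under the policy: the same expression, with the action fixed.
v_s0 = backup(P, v, 0, policy[0])

print(q_s0, v_s0)
```

Here v_s0 coincides with the entry of q_s0 for the action the policy picks, which is why the two lines in the code can share one expression while meaning different things.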
Feel free to reopen this issue if you have any further questions.
from rlexample.