Comments (6)
When calculating `value_estimation_loss`, the returns are orders of magnitude larger than the value predictions:
# With RND: [7101.44531 7166.82178 7235.9126 ... 2192.41016 1505.95093 776.063354]
# Without RND: [-8.60589886 -8.08707809 -9.66196251 ... -2.82508373 -1.94054389 -1.00004435]
tf.print(returns)
# With RND: [1.00574807e-05 1.06366251e-05 1.37969992e-05 ... 1.24972794e-05 9.22966865e-06 1.47654901e-05]
# Without RND: [-3.9881561e-05 -4.0003295e-05 -3.92220318e-05 ... -3.9767292e-05 -4.233031e-05 -4.445585e-05]
tf.print(value_preds)
from agents.
In `compute_return_and_advantage`, the normalized intrinsic rewards are far too large compared to the normalized extrinsic rewards:
# Unnormalized extrinsic reward
[[2.4655962 1.55493808 -0.12013232 ... -0.0169174224 0.165178448 -0.241332144]
[-2.2789948 -1.25498295 0.27360487 ... -1.38796663 -1.61660671 4.28619528]
[-3.87851095 -0.429568112 0.417723715 ... 0.347376496 4.02928114 0.372619241]
...
[1.25972 1.44503868 -1.37598336 ... 0.0951891318 0.978246629 1.19230056]
[0.293478042 -0.386281192 -0.469306529 ... -2.24820375 -0.359491259 -1.34746885]
[1.23334634 -2.23048186 -1.25190639 ... -3.44087744 -1.34803867 -2.59871984]]
# Unnormalized intrinsic reward
[[6.36824608 5.42133617 5.31663656 ... 17.2304916 16.5008812 16.64258]
[14.4909668 15.7531471 17.2095242 ... 12.7710819 13.4693031 14.7130861]
[19.154789 19.1849155 19.2228661 ... 16.8683529 17.8014469 17.604372]
...
[12.4912157 12.3132076 12.2927904 ... 19.5716267 19.7414684 18.6168365]
[14.0771189 14.4516859 12.9586821 ... 24.6045246 24.6045246 24.6045246]
[13.0537968 10.8395157 10.9614 ... 26.7444553 26.7444553 26.7444553]]
# Normalized extrinsic reward
[[1 1 -1 ... -0.534975827 1 -1]
[-1 -1 1 ... -1 -1 1]
[-1 -1 1 ... 1 1 1]
...
[1 1 -1 ... 1 1 1]
[1 -1 -1 ... -1 -1 -1]
[1 -1 -1 ... -1 -1 -1]]
# Normalized intrinsic rewards
[[201.381607 171.437683 168.126801 ... 544.875916 521.80365 526.284546]
[458.244568 498.158203 544.212891 ... 403.857025 425.936737 465.268585]
[605.727539 606.680237 607.880371 ... 533.424133 562.931152 556.699097]
...
[395.006897 389.377777 388.732147 ... 618.909119 624.279968 588.716]
[445.157562 457.002411 409.78949 ... 778.063354 778.063354 778.063354]
[412.797272 342.775543 346.629883 ... 845.733887 845.733887 845.733887]]
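A quick sanity check on the printed values, as a minimal sketch: dividing a few normalized entries by their unnormalized counterparts shows a nearly constant factor of about 31.62, which is roughly `1 / sqrt(0.001)`. The one extrinsic entry that was not clipped to ±1 has the same factor. This would be consistent with both normalizers dividing by a standard deviation of about 0.0316, with only the extrinsic rewards being clipped afterwards. That reading is an assumption drawn from the numbers alone; the thread does not confirm it. The variable names below are illustrative.

```python
import math

# (unnormalized, normalized) intrinsic reward pairs copied from the output above
intrinsic_pairs = [
    (6.36824608, 201.381607),
    (5.42133617, 171.437683),
    (14.4909668, 458.244568),
    (26.7444553, 845.733887),
]
# The only printed extrinsic entry that was not clipped to +/-1
extrinsic_pair = (-0.0169174224, -0.534975827)

# Scale factor applied by normalization, entry by entry
intrinsic_ratios = [norm / raw for raw, norm in intrinsic_pairs]
extrinsic_ratio = extrinsic_pair[1] / extrinsic_pair[0]

# Standard deviation the normalizer would have divided by (assumed reading)
implied_std = 1.0 / intrinsic_ratios[0]

print("intrinsic ratios:", intrinsic_ratios)
print("extrinsic ratio:", extrinsic_ratio)
print("implied std:", implied_std, "implied variance:", implied_std ** 2)
print("sqrt(1000):", math.sqrt(1000.0))
```

If the scale factor really is a constant rather than a statistic tracked from data, the intrinsic rewards would keep their raw spread while being inflated roughly 30x, which matches the returns in the hundreds to thousands reported earlier.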
This might be the reason.
Implemented `_init_rnd_normalizer`, but it does not seem to fix the issue.
| | Average Return | Value Estimation Loss |
|---|---|---|
| RND | | |
With `use_td_lambda_return==False`:

| | Average Return | Value Estimation Loss |
|---|---|---|
| RND | | |

| | Average Returns | Average Value Prediction |
|---|---|---|
| RND | | |

Similar results when `use_gae==False`.
Observation normalization was the issue. I reverted to using the streaming normalizer for both PPO and RND for now, but the situation is more complex than I had expected.
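For readers unfamiliar with the idea, here is a minimal sketch of a streaming normalizer: it keeps a running mean and variance (Welford's algorithm) and clips the normalized value. `RunningNormalizer`, its `clip=5.0` default, and `epsilon` are hypothetical names and values chosen for illustration; this is not the normalizer class used in the repository, and it handles scalars only.

```python
import math


class RunningNormalizer:
    """Hypothetical scalar streaming normalizer based on Welford's algorithm."""

    def __init__(self, clip=5.0, epsilon=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.clip = clip  # assumed clipping range for normalized values
        self.epsilon = epsilon  # keeps the division finite when variance is ~0

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of everything seen so far (1.0 before any data)."""
        return self.m2 / self.count if self.count > 0 else 1.0

    def normalize(self, x):
        """Standardize x with the running statistics, then clip."""
        z = (x - self.mean) / math.sqrt(self.variance + self.epsilon)
        return max(-self.clip, min(self.clip, z))


normalizer = RunningNormalizer()
for obs in [1.0, 2.0, 3.0, 4.0]:
    normalizer.update(obs)

print(normalizer.mean, normalizer.variance)
print(normalizer.normalize(2.5), normalizer.normalize(1000.0))
```

Because the statistics are updated from the data stream, the divisor tracks the actual spread of the inputs instead of staying at whatever value it was initialized to.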
Related Issues (10)
- RNDPPO: MontezumaRevenge-v0
- tf.clip_by_value results in an error in _init_rnd_normalizer
- How to use `replay_buffer.as_dataset()` for minibatches
- Value loss explodes in Atari Venture for both PPO and RND
- Action out of bound when running PPO on Atari
- PPO train_eval_atari throws error
- RNDDQN Benchmark: LunarLander-v2
- RNDPPO Benchmark: LunarLander-v2
- RNDPPO Unnormalized vs Normalized Benchmark: LunarLander-v2