The problem is gone when I don't normalize observation by dividing by 255. The high va

RND fails on LunarLander-v2 about agents HOT 6 CLOSED

seungjaeryanlee commented on June 19, 2024

RND fails on LunarLander-v2

from agents.

Comments (6)

seungjaeryanlee commented on June 19, 2024

When calculating value_estimation_loss, the returns are way too large:

# With RND: [7101.44531 7166.82178 7235.9126 ... 2192.41016 1505.95093 776.063354]
# Without RND: [-8.60589886 -8.08707809 -9.66196251 ... -2.82508373 -1.94054389 -1.00004435]
tf.print(returns)
# With RND: [1.00574807e-05 1.06366251e-05 1.37969992e-05 ... 1.24972794e-05 9.22966865e-06 1.47654901e-05]
# Without RND: [-3.9881561e-05 -4.0003295e-05 -3.92220318e-05 ... -3.9767292e-05 -4.233031e-05 -4.445585e-05]
tf.print(value_preds)
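To see why per-step rewards in the hundreds push returns into the thousands, here is a minimal sketch of a plain discounted return. The discount factor (0.99), the episode length (20), and the constant rewards (-1 and 600) are illustrative assumptions, not values from this run, and the function is a hypothetical helper, not the one used in `agents`.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1}, working backwards from the last step."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Assumed reward scales: about -1 per step without RND, several hundred with RND.
small = discounted_returns(np.full(20, -1.0), gamma=0.99)
large = discounted_returns(np.full(20, 600.0), gamma=0.99)

print("small-reward returns:", small[:3], "...", small[-3:])
print("large-reward returns:", large[:3], "...", large[-3:])
```

Because the return is linear in the rewards, scaling every reward by some factor scales every return by the same factor, so a value network whose predictions sit near zero would face a very large squared error against such targets.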


seungjaeryanlee commented on June 19, 2024

In compute_return_and_advantage, the normalized intrinsic rewards are too large compared to extrinsic rewards:

# Unnormalized extrinsic reward
[[2.4655962 1.55493808 -0.12013232 ... -0.0169174224 0.165178448 -0.241332144]
 [-2.2789948 -1.25498295 0.27360487 ... -1.38796663 -1.61660671 4.28619528]
 [-3.87851095 -0.429568112 0.417723715 ... 0.347376496 4.02928114 0.372619241]
 ...
 [1.25972 1.44503868 -1.37598336 ... 0.0951891318 0.978246629 1.19230056]
 [0.293478042 -0.386281192 -0.469306529 ... -2.24820375 -0.359491259 -1.34746885]
 [1.23334634 -2.23048186 -1.25190639 ... -3.44087744 -1.34803867 -2.59871984]]

# Unnormalized intrinsic reward
[[6.36824608 5.42133617 5.31663656 ... 17.2304916 16.5008812 16.64258]
 [14.4909668 15.7531471 17.2095242 ... 12.7710819 13.4693031 14.7130861]
 [19.154789 19.1849155 19.2228661 ... 16.8683529 17.8014469 17.604372]
 ...
 [12.4912157 12.3132076 12.2927904 ... 19.5716267 19.7414684 18.6168365]
 [14.0771189 14.4516859 12.9586821 ... 24.6045246 24.6045246 24.6045246]
 [13.0537968 10.8395157 10.9614 ... 26.7444553 26.7444553 26.7444553]]

# Normalized extrinsic reward
[[1 1 -1 ... -0.534975827 1 -1]
 [-1 -1 1 ... -1 -1 1]
 [-1 -1 1 ... 1 1 1]
 ...
 [1 1 -1 ... 1 1 1]
 [1 -1 -1 ... -1 -1 -1]
 [1 -1 -1 ... -1 -1 -1]]

# Normalized intrinsic rewards
[[201.381607 171.437683 168.126801 ... 544.875916 521.80365 526.284546]
 [458.244568 498.158203 544.212891 ... 403.857025 425.936737 465.268585]
 [605.727539 606.680237 607.880371 ... 533.424133 562.931152 556.699097]
 ...
 [395.006897 389.377777 388.732147 ... 618.909119 624.279968 588.716]
 [445.157562 457.002411 409.78949 ... 778.063354 778.063354 778.063354]
 [412.797272 342.775543 346.629883 ... 845.733887 845.733887 845.733887]]
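A quick check on the printed values above: dividing the normalized intrinsic rewards by the unnormalized ones gives a constant factor of about 31.62, which is 1/sqrt(0.001). The same factor followed by clipping to [-1, 1] reproduces the normalized extrinsic row. One reading, which is an assumption and not something confirmed in this thread, is that the normalizer's variance estimate is still close to zero, so the division is effectively by the square root of an epsilon of 1e-3. The sketch below only redoes that arithmetic on the first printed rows.

```python
import numpy as np

# First row of each printed array (the visible entries only).
unnorm_intrinsic = np.array([6.36824608, 5.42133617, 5.31663656,
                             17.2304916, 16.5008812, 16.64258])
norm_intrinsic = np.array([201.381607, 171.437683, 168.126801,
                           544.875916, 521.80365, 526.284546])
unnorm_extrinsic = np.array([2.4655962, 1.55493808, -0.12013232,
                             -0.0169174224, 0.165178448, -0.241332144])
norm_extrinsic = np.array([1.0, 1.0, -1.0, -0.534975827, 1.0, -1.0])

# Ratio between normalized and unnormalized intrinsic rewards.
ratio = norm_intrinsic / unnorm_intrinsic
print("ratio:", ratio)

# Assumed explanation: variance estimate ~ 0, epsilon = 1e-3.
assumed_scale = 1.0 / np.sqrt(0.0 + 1e-3)
print("1/sqrt(1e-3):", assumed_scale)

# The same scale plus clipping to [-1, 1] matches the extrinsic row.
reconstructed_extrinsic = np.clip(unnorm_extrinsic * assumed_scale, -1.0, 1.0)
print("reconstructed extrinsic:", reconstructed_extrinsic)
```

If this reading is right, the extrinsic rewards are saved by the clip while the intrinsic rewards are scaled up without any bound, which would account for the mismatch in magnitude.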


seungjaeryanlee commented on June 19, 2024

This might be the reason.


seungjaeryanlee commented on June 19, 2024

Implemented _init_rnd_normalizer, but it does not seem to fix the issue.

[Plots not preserved: Average Return and Value Estimation Loss for RND]
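The body of `_init_rnd_normalizer` is not shown in this thread. As a rough illustration of the general idea of warming up a running normalizer before training, here is a minimal sketch. The class `RunningNormalizer`, its `epsilon` default, and the synthetic warm-up data are all hypothetical and are not the TF-Agents implementation.

```python
import numpy as np


class RunningNormalizer:
    """Hypothetical running mean/variance tracker using Welford's algorithm."""

    def __init__(self, epsilon=1e-3):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.epsilon = epsilon

    def update(self, batch):
        for x in np.asarray(batch, dtype=np.float64).ravel():
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.count if self.count > 0 else 0.0

    def normalize(self, batch, center=True):
        batch = np.asarray(batch, dtype=np.float64)
        shifted = batch - self.mean if center else batch
        return shifted / np.sqrt(self.variance + self.epsilon)


# Synthetic stand-in for values collected during a warm-up phase (assumed scale).
rng = np.random.default_rng(0)
warmup = rng.normal(loc=15.0, scale=5.0, size=1000)

cold = RunningNormalizer()  # never updated: variance estimate is 0
warm = RunningNormalizer()
warm.update(warmup)         # statistics filled in before first use

cold_out = cold.normalize([15.0], center=False)
warm_out = warm.normalize(warmup)
print("cold normalizer output:", cold_out)
print("warm normalizer output std:", warm_out.std())
```

A normalizer that has seen no data divides by sqrt(epsilon) alone and inflates its input, while the warmed-up one returns values at roughly unit scale. Whether this matches what happens in the run above is not established here.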


seungjaeryanlee commented on June 19, 2024

With use_td_lambda_return==False,

[Plots not preserved: Average Return and Value Estimation Loss for RND; Average Returns and Average Value Prediction for RND]
