
Comments (6)

seungjaeryanlee commented on June 19, 2024

When calculating value_estimation_loss, the returns are orders of magnitude larger than the value predictions, and far larger still when RND is enabled:

# With RND: [7101.44531 7166.82178 7235.9126 ... 2192.41016 1505.95093 776.063354]
# Without RND: [-8.60589886 -8.08707809 -9.66196251 ... -2.82508373 -1.94054389 -1.00004435]
tf.print(returns)
# With RND: [1.00574807e-05 1.06366251e-05 1.37969992e-05 ... 1.24972794e-05 9.22966865e-06 1.47654901e-05]
# Without RND: [-3.9881561e-05 -4.0003295e-05 -3.92220318e-05 ... -3.9767292e-05 -4.233031e-05 -4.445585e-05]
tf.print(value_preds)
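
To illustrate how per-step reward magnitude carries over into the returns, here is a minimal sketch using plain discounted returns. This is an assumption for illustration only: the agent's actual return computation (GAE or TD-lambda) and its discount factor are not shown in this thread, and the reward streams below are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Backward recursion G_t = r_t + gamma * G_{t+1} over one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical reward streams (not taken from the logs above):
# one around -1 per step, one in the hundreds per step.
small = -np.ones(20)
large = 700.0 * np.ones(20)

print(discounted_returns(small)[0], discounted_returns(large)[0])
```

Because the recursion is linear in the rewards, scaling every reward by a constant scales every return by the same constant, so rewards in the hundreds give returns in the thousands.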


seungjaeryanlee commented on June 19, 2024

In compute_return_and_advantage, the normalized intrinsic rewards are hundreds of times larger than the normalized extrinsic rewards, which stay within [-1, 1]:

# Unnormalized extrinsic reward
[[2.4655962 1.55493808 -0.12013232 ... -0.0169174224 0.165178448 -0.241332144]
 [-2.2789948 -1.25498295 0.27360487 ... -1.38796663 -1.61660671 4.28619528]
 [-3.87851095 -0.429568112 0.417723715 ... 0.347376496 4.02928114 0.372619241]
 ...
 [1.25972 1.44503868 -1.37598336 ... 0.0951891318 0.978246629 1.19230056]
 [0.293478042 -0.386281192 -0.469306529 ... -2.24820375 -0.359491259 -1.34746885]
 [1.23334634 -2.23048186 -1.25190639 ... -3.44087744 -1.34803867 -2.59871984]]

# Unnormalized intrinsic reward
[[6.36824608 5.42133617 5.31663656 ... 17.2304916 16.5008812 16.64258]
 [14.4909668 15.7531471 17.2095242 ... 12.7710819 13.4693031 14.7130861]
 [19.154789 19.1849155 19.2228661 ... 16.8683529 17.8014469 17.604372]
 ...
 [12.4912157 12.3132076 12.2927904 ... 19.5716267 19.7414684 18.6168365]
 [14.0771189 14.4516859 12.9586821 ... 24.6045246 24.6045246 24.6045246]
 [13.0537968 10.8395157 10.9614 ... 26.7444553 26.7444553 26.7444553]]

# Normalized extrinsic reward
[[1 1 -1 ... -0.534975827 1 -1]
 [-1 -1 1 ... -1 -1 1]
 [-1 -1 1 ... 1 1 1]
 ...
 [1 1 -1 ... 1 1 1]
 [1 -1 -1 ... -1 -1 -1]
 [1 -1 -1 ... -1 -1 -1]]

# Normalized intrinsic rewards
[[201.381607 171.437683 168.126801 ... 544.875916 521.80365 526.284546]
 [458.244568 498.158203 544.212891 ... 403.857025 425.936737 465.268585]
 [605.727539 606.680237 607.880371 ... 533.424133 562.931152 556.699097]
 ...
 [395.006897 389.377777 388.732147 ... 618.909119 624.279968 588.716]
 [445.157562 457.002411 409.78949 ... 778.063354 778.063354 778.063354]
 [412.797272 342.775543 346.629883 ... 845.733887 845.733887 845.733887]]
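
A quick check on the printed numbers: every normalized intrinsic value is the raw value multiplied by roughly 31.62, which is 1/sqrt(1e-3). The sketch below verifies that ratio. The reading that the normalizer's variance estimate is close to zero, with a variance epsilon of 1e-3 and no mean subtraction, is an inference from the numbers and not confirmed by the code; the clip range of [-1, 1] for extrinsic rewards is likewise read off the printed values.

```python
import numpy as np

# First three entries of the first two rows printed above.
raw_intrinsic = np.array([6.36824608, 5.42133617, 5.31663656,
                          14.4909668, 15.7531471, 17.2095242])
normalized_intrinsic = np.array([201.381607, 171.437683, 168.126801,
                                 458.244568, 498.158203, 544.212891])

# Assumption: variance estimate is ~0, so the divisor is sqrt(0 + epsilon).
assumed_epsilon = 1e-3
scale_if_variance_zero = 1.0 / np.sqrt(assumed_epsilon)

ratios = normalized_intrinsic / raw_intrinsic
print(ratios)
print(scale_if_variance_zero)

# The same scale followed by clipping to [-1, 1] (assumed range) reproduces
# the one unclipped extrinsic entry, -0.0169174224 -> -0.534975827.
raw_extrinsic = np.array([2.4655962, -0.0169174224])
clipped_extrinsic = np.clip(raw_extrinsic * scale_if_variance_zero, -1.0, 1.0)
print(clipped_extrinsic)
```

If this reading is right, the extrinsic rewards are scaled the same way but then clipped, while the intrinsic rewards are left unclipped, which would account for the gap in magnitude.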


seungjaeryanlee commented on June 19, 2024

This might be the reason.

[image]


seungjaeryanlee commented on June 19, 2024

Implemented _init_rnd_normalizer, but it does not seem to fix the issue.

RND: [image: Average Return] [image: Value Estimation Loss]


seungjaeryanlee commented on June 19, 2024

With use_td_lambda_return==False,

RND: [image: Average Return] [image: Value Estimation Loss]
RND: [image: Average Returns] [image: Average Value Prediction]

Similar results when use_gae==False.


seungjaeryanlee commented on June 19, 2024

Observation normalization was the issue. Reverted to using the streaming normalizer for both PPO and RND for now, but the situation is more complex than I had expected.
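
For readers unfamiliar with the idea, a streaming normalizer keeps running estimates of mean and variance and updates them batch by batch. The sketch below is a generic NumPy version using the standard parallel mean/variance merge. It is not the TF-Agents implementation; the class name, the epsilon, and the clip value are all hypothetical.

```python
import numpy as np


class RunningNormalizer:
    """Hypothetical streaming normalizer: running mean/variance over batches."""

    def __init__(self, shape, epsilon=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.zeros(shape, dtype=np.float64)
        self.count = 0
        self.epsilon = epsilon  # assumed value, guards against division by zero

    def update(self, batch):
        """Merge the statistics of a [batch, ...] array into the running ones."""
        batch = np.asarray(batch, dtype=np.float64)
        n = batch.shape[0]
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        delta = batch_mean - self.mean
        total = self.count + n
        # Combine sums of squared deviations of the old data and the new batch.
        m2 = (self.var * self.count + batch_var * n
              + delta ** 2 * self.count * n / total)
        self.mean = self.mean + delta * n / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x, clip=5.0):
        """Center, scale by the running std, and clip to [-clip, clip]."""
        z = (np.asarray(x, dtype=np.float64) - self.mean) / np.sqrt(self.var + self.epsilon)
        return np.clip(z, -clip, clip)


rng = np.random.default_rng(0)
first = rng.normal(3.0, 2.0, size=(100, 4))
second = rng.normal(-1.0, 0.5, size=(50, 4))

normalizer = RunningNormalizer(shape=(4,))
normalizer.update(first)
normalizer.update(second)
print(normalizer.mean, normalizer.var)
```

Note that a normalizer like this starts with a variance of zero, so values normalized before the first update are divided by sqrt(epsilon) alone and can come out very large unless they are clipped.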

