Comments (8)
A bit hard to figure out based on just this. What did you set as the target entropy? An alpha of 5.56742e+08 seems rather large.
from rlkit.
You are right, that is not an underflow... that is divergence (it's a +)...
I didn't set it explicitly, so it would be the default: self.target_entropy = -np.prod((1,)).item()
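For reference, the quoted default reduces to minus the number of action dimensions. A minimal sketch, assuming a one-dimensional continuous action space (the `action_shape` name is illustrative and not taken from rlkit):

```python
import numpy as np

# Hypothetical action-space shape: a single continuous action dimension.
action_shape = (1,)

# Same expression as quoted above: minus the product of the shape entries.
target_entropy = -np.prod(action_shape).item()

print(target_entropy)
```

So with one action dimension the default target entropy is negative, which is the value the replies below question for a discrete-like setup.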
Are you using discrete actions? That heuristic wouldn't work in that case.
It behaves like discrete, yes. I didn't change the policy to optimize for a softmax because I was not able to figure out how to derive the temperature-based sampling/exploration from the equations. Everybody says it is easy, but no one shows how to do so :D (e.g. openai/spinningup#22)
So I hacked it instead, making the environment interpret the continuous actions as discrete signals. It was bound to have some 'side effect'. And now that you noticed it is actually diverging, it makes sense. For a typical case with 3 states (softmax(3)), what target entropy would you suggest I try?
For discrete actions, you should choose a positive number that's less than log(# of actions).
To compute the entropy, you should look up the definition of entropy. For discrete actions it's the negative sum of p log(p), i.e. -sum(p * log(p)).
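A minimal sketch of that definition, assuming natural logarithms (entropy in nats) and three discrete actions; the function name and the example distributions are illustrative only:

```python
import math

def discrete_entropy(probs):
    """Shannon entropy in nats: H(p) = -sum_i p_i * log(p_i)."""
    # Terms with p_i == 0 are skipped (p * log(p) tends to 0 as p -> 0).
    return sum(-p * math.log(p) for p in probs if p > 0.0)

n_actions = 3
uniform = [1.0 / n_actions] * n_actions   # maximally random policy
peaked = [0.98, 0.01, 0.01]               # nearly deterministic policy
one_hot = [1.0, 0.0, 0.0]                 # fully deterministic policy

# Upper bound on the entropy, reached only by the uniform distribution.
max_entropy = math.log(n_actions)

print(discrete_entropy(uniform), max_entropy)
print(discrete_entropy(peaked))
print(discrete_entropy(one_hot))
```

The entropy of a discrete distribution always lies between 0 (one-hot) and log(# of actions) (uniform), which is why a target entropy for discrete actions should fall strictly inside that range.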
First time with an entropy-based algorithm, so clueless on that... Any accessible writeup that you know would be great to read.
Would you go with something close to log(3) (~0.477 in base 10, ~1.099 with the natural log) or closer to zero instead?
OK, now that I changed the target entropy to 0.35 I don't see the divergent behavior, but what I do see is a collapse in the deterministic policy results. The strange thing is that if I restart the process and load the last policy, I get behaviors similar to those found in the exploration phase of the epoch. Sounds like a bug in the evaluation part, could that be?
The evaluation code and exploration code are the same. They just use the DataCollector. Note that if you're loading up the policy, you might be loading the evaluation policy, which is deterministic. It sounds like the SAC loss issue has been resolved, so I'm closing this issue.
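To picture the difference between a stochastic (exploration) policy and a deterministic (evaluation) one, here is a rough sketch in plain NumPy. It is not rlkit code: the 3-way categorical policy, the probabilities, and the function names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action probabilities from a 3-way softmax policy.
probs = np.array([0.40, 0.35, 0.25])

def stochastic_action(probs, rng):
    # Exploration-style behaviour: sample an action from the distribution.
    return int(rng.choice(len(probs), p=probs))

def deterministic_action(probs):
    # Evaluation-style behaviour: always take the most likely action.
    return int(np.argmax(probs))

sampled = [stochastic_action(probs, rng) for _ in range(1000)]
greedy = [deterministic_action(probs) for _ in range(1000)]

print(np.bincount(sampled, minlength=3))  # counts spread over the actions
print(np.bincount(greedy, minlength=3))   # every count falls on one action
```

Even when the probabilities are close to each other, the deterministic version never picks the second or third action, so its rollouts can look very different from the exploration rollouts of the same policy.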