Comments (9)
@araffin Thank you for opening this issue! I'm considering adding more algorithms based on Advantage-Weighted Critic, which is already implemented in d3rlpy (though not thoroughly tested yet). I'll update this issue when they are ready!
from d3rlpy.
@araffin I'm implementing the AWAC algorithm on the awac branch.
https://github.com/takuseno/d3rlpy/blob/awac/d3rlpy/algos/awac.py
When performance reports are available, I'll share them here.
I've merged the awac branch into master. I've confirmed that it basically works; however, I have not done a thorough evaluation yet because I'm a bit busy right now.
I was implementing AWAC based on the paper, which says the authors built AWAC on top of TD3. However, I realized that they actually built it on SAC instead. For now, TD3-based AWAC performs well enough at offline training, but not well enough at fine-tuning. I suspect the cause is an exploration issue.
Thanks for the update =)
> However, I realized that they actually built it on SAC instead. For now, TD3-based AWAC performs well enough at offline training, but not well enough at fine-tuning. I suspect the cause is an exploration issue.
You mean that what is written in the paper is different from what is implemented?
I also noticed some differences in your implementation. You use random actions to estimate the value, whereas they use the deterministic output of the policy for it.
EDIT: it seems that you updated that part, but you are still sampling instead of using the mean/deterministic output.
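To illustrate why the two estimates can differ, here is a minimal sketch in plain Python. The critic `q_value` and the numbers are hypothetical and chosen only for illustration; nothing here is taken from d3rlpy or from the paper's code. It compares averaging Q over actions sampled from a Gaussian policy with a single evaluation of Q at the policy mean.

```python
import random


def q_value(state, action):
    # Hypothetical toy critic: highest when the action equals the state.
    return -(action - state) ** 2


def value_from_samples(state, mean, std, n_samples=10000, seed=0):
    # V(s) estimated as the average of Q(s, a) over sampled actions a ~ N(mean, std).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += q_value(state, rng.gauss(mean, std))
    return total / n_samples


def value_from_mean(state, mean):
    # V(s) estimated with one evaluation at the deterministic (mean) action.
    return q_value(state, mean)


state, mean, std = 0.5, 0.5, 0.3
v_sampled = value_from_samples(state, mean, std)
v_mean = value_from_mean(state, mean)
print(f"sampled estimate: {v_sampled:.3f}, mean-action estimate: {v_mean:.3f}")
```

For this quadratic toy critic the sampled estimate is lower than the mean-action estimate by roughly the policy variance (`std ** 2`), so the advantage computed from each would differ as well.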
I also contacted the author recently and he told me that "the only notable thing is that we do not output the standard deviation (actually the log variance) of the policy with neural network layers, they are just learned parameters themselves (1 per action dimension). This prevents the variances from overfitting." (so like for PPO vs SAC)
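A minimal sketch of that parameterization, assuming a diagonal Gaussian policy and using plain Python lists in place of a neural network. The class name, the linear mean function, and the use of log-std rather than log-variance are illustrative assumptions, not the authors' code. In a real implementation `log_std` would be a trainable parameter (for example a `torch.nn.Parameter`) rather than a plain list.

```python
import math
import random


class StateIndependentStdPolicy:
    """Gaussian policy whose mean depends on the state but whose log-std does not."""

    def __init__(self, obs_dim, action_dim, init_log_std=0.0, seed=0):
        rng = random.Random(seed)
        # Hypothetical linear stand-in for the mean network: one weight row per action dim.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
            for _ in range(action_dim)
        ]
        # One free parameter per action dimension, not produced by any layer.
        self.log_std = [init_log_std] * action_dim

    def mean(self, obs):
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]

    def std(self, obs):
        # obs is accepted only to mirror a state-dependent head; it is ignored here.
        return [math.exp(ls) for ls in self.log_std]

    def log_prob(self, obs, action):
        # Log-density of a diagonal Gaussian, summed over action dimensions.
        total = 0.0
        for a, mu, ls in zip(action, self.mean(obs), self.log_std):
            z = (a - mu) / math.exp(ls)
            total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
        return total


policy = StateIndependentStdPolicy(obs_dim=3, action_dim=2)
obs_a, obs_b = [1.0, 0.0, 0.0], [0.0, 5.0, -2.0]
print(policy.std(obs_a), policy.std(obs_b))  # identical for every observation
```

Because the spread is shared across all states, only the mean can adapt to individual observations, which is the property the quoted remark credits with preventing the variances from overfitting.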
@araffin Thank you for checking the update! I noticed the logstd parameter a few days ago. With the latest version, the performance is much better. I'll merge this in this week.
> @araffin Thank you for checking the update! I noticed the logstd parameter a few days ago. With the latest version, the performance is much better. I'll merge this in this week.
Good to hear =) (I've been trying it too and it looks good)
Minor remark: you may want to consider Python raw strings for the docstrings:
r""":math:`\alpha/(\sqrt{v} + \epsilon)`"""
this avoids the use of too many backslashes ;)
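A small sketch of the difference, using two hypothetical placeholder functions (the names are illustrative and not part of d3rlpy). Both docstrings end up with the same content, but the raw string does not need doubled backslashes.

```python
def step_size_plain():
    """:math:`\\alpha/(\\sqrt{v} + \\epsilon)`"""


def step_size_raw():
    r""":math:`\alpha/(\sqrt{v} + \epsilon)`"""


# Both docstrings hold the same characters, single backslashes included.
same_docstring = step_size_plain.__doc__ == step_size_raw.__doc__

# Without the r prefix or doubling, "\a" is read as the BEL control
# character, so the LaTeX command would be silently corrupted.
broken = "\alpha"
print(same_docstring, repr(broken))
```

Sequences such as `\s` and `\e` are not valid escapes, so leaving them undoubled in a normal string triggers a warning in recent Python versions. That is one more reason to prefer the `r` prefix for docstrings containing LaTeX.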
Oh, I did not know that! Thank you! I'll use this to remove the doubled backslashes.
@araffin Thanks for your advice!