
Comments (9)

takuseno commented on May 18, 2024

@araffin Thank you for opening this issue! I'm considering adding more algorithms based on Advantage-Weighted Critic, which is already implemented in d3rlpy (though not thoroughly tested yet). I'll update this issue when they are ready!

from d3rlpy.

takuseno commented on May 18, 2024

@araffin I'm implementing the AWAC algorithm on the awac branch:
https://github.com/takuseno/d3rlpy/blob/awac/d3rlpy/algos/awac.py

When performance reports are available, I'll share them here.


takuseno commented on May 18, 2024

I've merged the awac branch into master. I've confirmed that it basically works; however, I haven't done a thorough evaluation yet because I'm a bit busy right now.


takuseno commented on May 18, 2024

I was implementing AWAC based on the paper, which says the authors built AWAC on TD3. However, I realized that they actually built it on SAC instead. For now, TD3-based AWAC performs well enough in offline training, but not well enough in fine-tuning. I suspect the cause is an exploration issue.


araffin commented on May 18, 2024

Thanks for the update =)

> However, I realized that they actually built it on SAC instead. For now, TD3-based AWAC performs well enough in offline training, but not well enough in fine-tuning. I suspect the cause is an exploration issue.

You mean that what is written in the paper is different from what is implemented?

I also noticed some differences in your implementation. You use random actions to estimate the value, whereas they use the deterministic output of the policy for it.

EDIT: it seems that you updated that part, but you are still sampling instead of using the mean/deterministic output.
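
To make the difference concrete, here is a toy sketch in plain Python. It is not the d3rlpy code nor the authors' code: `q_value`, `policy_mean`, and the Gaussian policy are made-up stand-ins, chosen only to show how the two value estimates (and therefore the advantages) can differ.

```python
import random


def q_value(state, action):
    # Hypothetical toy critic: highest when the action equals the state.
    return -(action - state) ** 2


def policy_mean(state):
    # Hypothetical deterministic policy output (the Gaussian mean).
    return state


def value_from_mean(state):
    # V(s) estimated as Q(s, mu(s)), i.e. using the deterministic output.
    return q_value(state, policy_mean(state))


def value_from_samples(state, std, n_samples, rng):
    # V(s) estimated by averaging Q(s, a) over actions sampled from the policy.
    total = 0.0
    for _ in range(n_samples):
        action = rng.gauss(policy_mean(state), std)
        total += q_value(state, action)
    return total / n_samples


rng = random.Random(0)
state = 1.0
dataset_action = 1.2  # an action taken from the (imaginary) offline dataset

v_mean = value_from_mean(state)
v_sampled = value_from_samples(state, std=0.5, n_samples=1000, rng=rng)

# Advantage A(s, a) = Q(s, a) - V(s) under each value estimate.
advantage_mean = q_value(state, dataset_action) - v_mean
advantage_sampled = q_value(state, dataset_action) - v_sampled

print(v_mean, v_sampled, advantage_mean, advantage_sampled)
```

In this toy setup the sampled estimate of V is lower than the one computed at the mean, so the same dataset action gets a larger advantage, which would change the weights in an advantage-weighted update.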

I also contacted the author recently and he told me that "the only notable thing is that we do not output the standard deviation (actually the log variance) of the policy with neural network layers, they are just learned parameters themselves (1 per action dimension). This prevents the variances from overfitting." (so it works like PPO's policy rather than SAC's)
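
A minimal PyTorch sketch of that idea, as I understand it from the quote. This is an assumption-laden illustration, not the authors' or d3rlpy's implementation: the class name, layer sizes, and the choice to store a log standard deviation (rather than a log variance) are mine.

```python
import torch
import torch.nn as nn


class LearnedParamStdPolicy(nn.Module):
    """Gaussian policy whose log-std is a free parameter, not a network output."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, act_dim)
        # One learned value per action dimension, independent of the observation.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = self.mean_head(self.body(obs))
        # Broadcast the same log-std to every observation in the batch.
        return mean, self.log_std.expand_as(mean)


policy = LearnedParamStdPolicy(obs_dim=3, act_dim=2)
obs = torch.randn(5, 3)
mean, log_std = policy(obs)
dist = torch.distributions.Normal(mean, log_std.exp())
action = dist.sample()
```

A SAC-style policy would instead add a second linear head on top of `body` and return its output as the log-std, so the spread could vary with the observation.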


takuseno commented on May 18, 2024

@araffin Thank you for checking the update! I noticed the logstd parameter a few days ago. With the latest version, the performance is much better. I'll merge it this week.


araffin commented on May 18, 2024

> @araffin Thank you for checking the update! I noticed the logstd parameter a few days ago. With the latest version, the performance is much better. I'll merge it this week.

Good to hear =) (I've been trying it too and it looks good)

Minor remark: you may consider Python raw strings for the docstrings:

r""":math:`\alpha/(\sqrt{v} + \epsilon)`"""

This avoids having to double every backslash ;)
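
A small self-contained check of what the raw-string prefix changes. The function names are made up for the illustration and are not from d3rlpy.

```python
def step_size_escaped():
    """:math:`\\alpha/(\\sqrt{v} + \\epsilon)`"""


def step_size_raw():
    r""":math:`\alpha/(\sqrt{v} + \epsilon)`"""


# Both docstrings contain the same text, with single backslashes.
same_docstring = step_size_escaped.__doc__ == step_size_raw.__doc__

# Without the r prefix, "\a" is interpreted as the BEL control character,
# so the LaTeX command is silently corrupted.
plain_length = len("\alpha")
raw_length = len(r"\alpha")

print(same_docstring, plain_length, raw_length)
```

The raw docstring reads exactly like the LaTeX that Sphinx will render, which is the main benefit.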


takuseno commented on May 18, 2024

Oh, I did not know that! Thank you! I'll use this to remove the doubled backslashes.


takuseno commented on May 18, 2024

@araffin Thanks for your advice!

