
implicit-q-learning's Introduction

Implicit-Q-Learning (IQL)

PyTorch implementation of the Implicit Q-Learning (IQL) algorithm (Paper)

Currently only implemented for online learning. An offline RL version with D4RL support will be added soon.
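For context, the core of IQL is expectile regression: the value network is trained to regress toward an upper expectile of Q(s, a) rather than its mean. A minimal sketch in PyTorch (the tensor values below are made-up illustrations, not taken from this repo):

```python
import torch

def expectile_loss(diff: torch.Tensor, expectile: float = 0.7) -> torch.Tensor:
    # Asymmetric L2 loss from the IQL paper: errors where Q - V > 0 are
    # weighted by `expectile`, the rest by (1 - expectile), so V regresses
    # toward an upper expectile of Q instead of its mean.
    weight = torch.abs(expectile - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

# Illustrative values only: V(s) predictions vs. Q(s, a) targets.
q_values = torch.tensor([1.0, 2.0, 3.0])
v_values = torch.tensor([2.0, 2.0, 2.0])
loss = expectile_loss(q_values - v_values, expectile=0.7)
```

With expectile = 0.5 this reduces to ordinary mean-squared error; values closer to 1 push V toward the best actions supported by the data.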

Run

python train.py

Results

Continuous IQL

Pendulum

[training curve plot]

Discrete IQL

CartPole

[training curve plot]

Reference

Original JAX implementation: IQL

Help and issues:

I'm open to feedback, bug reports, improvements, or anything else. Just leave me a message or contact me.

Author

  • Sebastian Dittert

Feel free to use this code for your own projects or research.

@misc{IQL,
  author = {Dittert, Sebastian},
  title = {PyTorch Implementation of Implicit-Q-Learning (IQL)},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/Implicit-Q-Learning}},
}


implicit-q-learning's Issues

Question for runtime

Hi,
Thanks for sharing the implementation code.

I have a question about IQL's experimental runtime in PyTorch. I tried to re-implement it with tensorflow-keras, but the runtime is quite slow (on HalfCheetah-medium-v2 with a GTX 1080 Ti).

If you don't mind, could you share the overall runtime on that environment, or the computing resources you use?
Thanks in advance.

scalar observation

Sebastian, thank you for this great code. I am trying to run some examples (starting from offline training on antmaze); however, I receive an error, "assert np.isscalar(low) and np.isscalar(high)", from the Box space returned on line 18 of single_precision.py. Is there something I may have missed?
Thank you

Bad results on the AntMaze environment

It works well on MuJoCo environments, but not on the AntMaze environment. It did not work even after I changed the parameters according to the paper (expectile=0.9, temperature=10). Can you help me, please?
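One thing worth double-checking on AntMaze, besides expectile=0.9 and temperature=10, is the policy-extraction step: the IQL paper extracts the policy with advantage-weighted regression, and the reference implementation clips the exponentiated weights to keep exp(10 · A) from exploding. A hedged sketch with hypothetical tensor names (not necessarily how this repo structures it):

```python
import torch

def awr_policy_loss(log_probs, q_values, v_values, temperature=10.0):
    # Advantage-weighted regression objective for IQL policy extraction.
    # exp(beta * A) can blow up for beta = 10, so the weights are clipped
    # (the official JAX code caps them at 100) and detached from the graph.
    advantage = q_values - v_values
    weights = torch.clamp(torch.exp(temperature * advantage), max=100.0)
    return -(weights.detach() * log_probs).mean()

# Illustrative check: with zero advantage every weight is 1, so the loss
# reduces to plain negative log-likelihood on the dataset actions.
log_probs = torch.log(torch.tensor([0.5, 0.25]))
loss = awr_policy_loss(log_probs, torch.zeros(2), torch.zeros(2))
```

Missing the weight clipping (or backpropagating through the weights) is a common cause of divergence at high temperatures.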

offline training

Hi! Is offline training now fully supported? I am confused because I see the train_offline script, but the README says that offline training is not implemented. Maybe not with the D4RL dataset, but should it work for any dataset of experience tuples (s, a, r, s', d)?

Thank you!
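Independent of D4RL, offline training in principle only needs a fixed set of (s, a, r, s', d) transitions and a way to sample mini-batches from them. A minimal, hypothetical sketch of such a sampler (not this repo's actual buffer class):

```python
import numpy as np

def make_offline_sampler(states, actions, rewards, next_states, dones, seed=0):
    # Pack an arbitrary (s, a, r, s', d) dataset into flat arrays and
    # return a uniform mini-batch sampler; no environment interaction
    # is needed, which is all offline training strictly requires.
    data = {
        "s": np.asarray(states, dtype=np.float32),
        "a": np.asarray(actions, dtype=np.float32),
        "r": np.asarray(rewards, dtype=np.float32),
        "s_next": np.asarray(next_states, dtype=np.float32),
        "done": np.asarray(dones, dtype=np.float32),
    }
    rng = np.random.default_rng(seed)

    def sample(batch_size):
        idx = rng.integers(0, len(data["s"]), size=batch_size)
        return {k: v[idx] for k, v in data.items()}

    return sample

# Example with a toy 4-transition dataset (1-D states and actions).
sampler = make_offline_sampler(
    states=[[0.0], [1.0], [2.0], [3.0]],
    actions=[[0.1], [0.2], [0.3], [0.4]],
    rewards=[0.0, 0.0, 0.0, 1.0],
    next_states=[[1.0], [2.0], [3.0], [3.0]],
    dones=[0.0, 0.0, 0.0, 1.0],
)
batch = sampler(batch_size=2)
```

Any dataset that can be coerced into these five arrays should, in principle, be usable with an offline training loop.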
