
implicit-q-learning's Introduction

Implicit-Q-Learning (IQL)

PyTorch implementation of the Implicit Q-Learning (IQL) algorithm (Paper)

Currently only implemented for online learning. An offline RL version with D4RL support will be added soon.
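For context, the core of IQL is expectile regression: the value network is trained to regress toward an upper expectile of Q(s, a) rather than its mean. A minimal sketch in PyTorch (the tensor values below are made-up illustrations, not taken from this repo):

```python
import torch

def expectile_loss(diff: torch.Tensor, expectile: float = 0.7) -> torch.Tensor:
    # Asymmetric L2 loss from the IQL paper: errors where Q - V > 0 are
    # weighted by `expectile`, the rest by (1 - expectile), so V regresses
    # toward an upper expectile of Q instead of its mean.
    weight = torch.abs(expectile - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

# Illustrative values only: V(s) predictions vs. Q(s, a) targets.
q_values = torch.tensor([1.0, 2.0, 3.0])
v_values = torch.tensor([2.0, 2.0, 2.0])
loss = expectile_loss(q_values - v_values, expectile=0.7)
```

With expectile = 0.5 this reduces to ordinary mean-squared error; values closer to 1 push V toward the best actions supported by the data.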

Run

python train.py

Results

Continuous IQL

Pendulum

[training curve plot]

Discrete IQL

CartPole

[training curve plot]

Reference

Original JAX implementation: IQL

Help and issues:

I'm open to feedback, bug reports, improvements, or anything else. Just leave me a message or contact me.

Author

  • Sebastian Dittert

Feel free to use this code for your own projects or research.

@misc{IQL,
  author = {Dittert, Sebastian},
  title = {PyTorch Implementation of Implicit-Q-Learning (IQL)},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/Implicit-Q-Learning}},
}


implicit-q-learning's Issues

Question for runtime

Hi,
Thanks for sharing the implementation code.

I have a question about IQL's experimental runtime in PyTorch. I tried to re-implement it with tensorflow-keras, but the runtime is quite slow (on HalfCheetah-medium-v2 with a GTX 1080 Ti).

If you don't mind, could you share the overall runtime on that environment, or the computing resources you use?
Thanks in advance.

scalar observation

Sebastian, thank you for this great code. I am trying to run some examples (starting from offline training on antmaze); however, I receive an error, "assert np.isscalar(low) and np.isscalar(high)", from the Box space returned on line 18 of single_precision.py. Is there something I may have missed?
Thank you

Bad results on the AntMaze environment

It works well on MuJoCo environments, but not on the AntMaze environment. It did not work even after I changed the parameters according to the paper (expectile=0.9, temperature=10). Can you help me, please?
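One thing worth double-checking on AntMaze, besides expectile=0.9 and temperature=10, is the policy-extraction step: the IQL paper extracts the policy with advantage-weighted regression, and the reference implementation clips the exponentiated weights to keep exp(10 · A) from exploding. A hedged sketch with hypothetical tensor names (not necessarily how this repo structures it):

```python
import torch

def awr_policy_loss(log_probs, q_values, v_values, temperature=10.0):
    # Advantage-weighted regression objective for IQL policy extraction.
    # exp(beta * A) can blow up for beta = 10, so the weights are clipped
    # (the official JAX code caps them at 100) and detached from the graph.
    advantage = q_values - v_values
    weights = torch.clamp(torch.exp(temperature * advantage), max=100.0)
    return -(weights.detach() * log_probs).mean()

# Illustrative check: with zero advantage every weight is 1, so the loss
# reduces to plain negative log-likelihood on the dataset actions.
log_probs = torch.log(torch.tensor([0.5, 0.25]))
loss = awr_policy_loss(log_probs, torch.zeros(2), torch.zeros(2))
```

Missing the weight clipping (or backpropagating through the weights) is a common cause of divergence at high temperatures.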

offline training

Hi! Is offline training now fully supported? I am confused because I see the train_offline script, but the README says that offline training is not implemented. Maybe not with the D4RL dataset, but should it work for any dataset of experience tuples (s, a, r, s', d)?

Thank you!
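Independent of D4RL, offline training in principle only needs a fixed set of (s, a, r, s', d) transitions and a way to sample mini-batches from them. A minimal, hypothetical sketch of such a sampler (not this repo's actual buffer class):

```python
import numpy as np

def make_offline_sampler(states, actions, rewards, next_states, dones, seed=0):
    # Pack an arbitrary (s, a, r, s', d) dataset into flat arrays and
    # return a uniform mini-batch sampler; no environment interaction
    # is needed, which is all offline training strictly requires.
    data = {
        "s": np.asarray(states, dtype=np.float32),
        "a": np.asarray(actions, dtype=np.float32),
        "r": np.asarray(rewards, dtype=np.float32),
        "s_next": np.asarray(next_states, dtype=np.float32),
        "done": np.asarray(dones, dtype=np.float32),
    }
    rng = np.random.default_rng(seed)

    def sample(batch_size):
        idx = rng.integers(0, len(data["s"]), size=batch_size)
        return {k: v[idx] for k, v in data.items()}

    return sample

# Example with a toy 4-transition dataset (1-D states and actions).
sampler = make_offline_sampler(
    states=[[0.0], [1.0], [2.0], [3.0]],
    actions=[[0.1], [0.2], [0.3], [0.4]],
    rewards=[0.0, 0.0, 0.0, 1.0],
    next_states=[[1.0], [2.0], [3.0], [3.0]],
    dones=[0.0, 0.0, 0.0, 1.0],
)
batch = sampler(batch_size=2)
```

Any dataset that can be coerced into these five arrays should, in principle, be usable with an offline training loop.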
