offlinerl-kit's Issues

COMBO code

Hello! Why isn't the conservative loss computed on real_batch at line 163 of policy/model-based/combo.py?

[Question] About weight decay in MOPO implementation

Hi @yihaosun1124,
Thank you for your amazing work.

I'm reproducing the MOPO results on D4RL tasks using your code. I see that you use weight decay and L2 regularization when learning the ensemble transition functions. What is the reason for this choice? Can the decay loss (L2 regularization) be removed, or is it mandatory? Could you please help me with this question?

Again, thank you so much for your repository.
Best,
Linh
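
For context, the decay loss in question can be sketched as a per-layer L2 penalty added to the supervised model loss. This is a minimal pure-Python sketch, not the repository's code: the names `l2_decay_loss` and `total_model_loss`, the coefficient values, and the toy weights are all hypothetical. Setting every coefficient to zero removes the term, which is one way to check whether it matters for a given task.

```python
from typing import Sequence


def l2_decay_loss(layer_weights: Sequence[Sequence[float]],
                  decay_coefs: Sequence[float]) -> float:
    """Sum over layers of coef * (sum of squared weights). Hypothetical helper."""
    if len(layer_weights) != len(decay_coefs):
        raise ValueError("need one decay coefficient per layer")
    return sum(
        coef * sum(w * w for w in weights)
        for weights, coef in zip(layer_weights, decay_coefs)
    )


def total_model_loss(supervised_loss: float,
                     layer_weights: Sequence[Sequence[float]],
                     decay_coefs: Sequence[float]) -> float:
    """Supervised loss plus the L2 penalty; zero coefficients disable the penalty."""
    return supervised_loss + l2_decay_loss(layer_weights, decay_coefs)


# Toy values, chosen only for illustration.
weights = [[1.0, -2.0], [0.5]]
with_decay = total_model_loss(1.0, weights, [0.1, 0.2])
without_decay = total_model_loss(1.0, weights, [0.0, 0.0])
```

With the penalty enabled, larger weights raise the loss, which biases the ensemble members toward smaller weights; with zero coefficients the loss reduces to the supervised term alone.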

Conservative Loss Calculations at combo.py Line 181 and cql.py Line 150

During conservative loss calculations, a potential error has been identified in the following code snippets:

  • combo.py at Line 181
  • cql.py at Line 150

Current code:

    next_obs_pi_value1, next_obs_pi_value2 = self.calc_pi_values(tmp_next_obss, tmp_obss)

Proposed Correction:

    next_obs_pi_value1, next_obs_pi_value2 = self.calc_pi_values(tmp_next_obss, tmp_next_obss)

I might be wrong or misinterpreting this. Could someone confirm whether this change is necessary for an accurate implementation?
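
To make the difference between the two calls concrete, here is a toy sketch. It assumes, without confirmation from the source, that `calc_pi_values(obs_pi, obs_to_pred)` draws actions from the policy at `obs_pi` and evaluates the critic at `obs_to_pred` with those actions. The scalar `policy` and `q_value` functions are hypothetical stand-ins for the actor and critic networks.

```python
def policy(obs: float) -> float:
    """Hypothetical deterministic policy: the action depends only on the observation."""
    return -0.5 * obs


def q_value(obs: float, action: float) -> float:
    """Hypothetical critic."""
    return obs + 2.0 * action


def calc_pi_values(obs_pi: float, obs_to_pred: float) -> float:
    """Assumed semantics: choose the action at obs_pi, score it at obs_to_pred."""
    action = policy(obs_pi)
    return q_value(obs_to_pred, action)


obs, next_obs = 1.0, 3.0

# Current code: action from next_obs, critic evaluated at obs, i.e. Q(s, a') with a' ~ pi(.|s').
current = calc_pi_values(next_obs, obs)
# Proposed change: action from next_obs, critic evaluated at next_obs, i.e. Q(s', a').
proposed = calc_pi_values(next_obs, next_obs)
```

Under this reading the two variants compute different quantities, Q(s, a') versus Q(s', a'), so which one is intended depends on how the conservative term is defined in the reference implementation being followed.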

MOReL reproduction

Thanks for your elegant code. Do you plan to add a MOReL reproduction?

Failed to run plotter.py

Hi @yihaosun1124
Thanks for sharing your great work!
After following your setup instructions, I failed to run this command:

    python run_example/plotter.py --algos "rambo" --task "hopper-medium-replay-v2"

The output is:

Traceback (most recent call last):
  File "run_example/plotter.py", line 173, in <module>
    csv_file = merge_csv(path, args.query_file, args.query_x, args.query_y)
  File "run_example/plotter.py", line 42, in merge_csv
    assert len(results) > 0
AssertionError

After reading plotter.py, I'm not sure what this assert is for. Could you check on your side?
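
One plausible reading of the assertion, sketched under assumptions: a helper walks a log directory, collects every file with the queried name, and asserts that at least one was found. The function `find_result_files`, the file name `progress.csv`, and the directory layout below are hypothetical and not taken from plotter.py. The point is only that such a check fails when no matching log files exist under the given path, for example when no training run for that algorithm and task has been logged yet.

```python
import os
import tempfile
from typing import List


def find_result_files(root: str, query_file: str) -> List[str]:
    """Collect paths of files named `query_file` anywhere under `root`."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if query_file in filenames:
            results.append(os.path.join(dirpath, query_file))
    # Raises AssertionError when nothing matched.
    assert len(results) > 0, f"no '{query_file}' found under '{root}'"
    return sorted(results)


with tempfile.TemporaryDirectory() as root:
    # An empty log directory reproduces the failure mode.
    try:
        find_result_files(root, "progress.csv")
        empty_dir_failed = False
    except AssertionError:
        empty_dir_failed = True

    # After one (fake) run writes its CSV, the same call succeeds.
    run_dir = os.path.join(root, "seed_0")
    os.makedirs(run_dir)
    with open(os.path.join(run_dir, "progress.csv"), "w") as f:
        f.write("timestep,reward\n0,0.0\n")
    found = find_result_files(root, "progress.csv")
```

If the real check works along these lines, running a training script for the same algorithm and task first, or pointing the plotter at the directory where the logs were written, would be the first things to try.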

Best,
Levi

Log sharing

Hi,

Thanks for contributing this great library!

Would you mind sharing some of the logs, so that users can get a sense of how large the different loss values are supposed to be and compare them with their reproduced results? In particular, could you share some log files for MOPO and RAMBO, especially for the Hopper environment?

Thanks again!

Dependency version control in setup.py

An elegant library! Currently setup.py lists all the dependencies without version limits, and some of them (e.g. ray) are updated frequently. It would be great if the tested versions (including the Python version) could be added.
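
One way to record the tested versions is to read them from a working environment and emit pinned requirement lines. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `pinned_requirements` and the package names passed to it are illustrative; apart from ray, they are assumptions about what the library depends on. Packages that are not installed are skipped rather than guessed.

```python
import sys
from importlib import metadata
from typing import Iterable, List


def pinned_requirements(packages: Iterable[str]) -> List[str]:
    """Return 'name==version' for each package installed in this environment."""
    pins = []
    for name in packages:
        try:
            pins.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here: leave it out instead of inventing a version.
            continue
    return pins


python_version = "{}.{}.{}".format(*sys.version_info[:3])
# Illustrative package list; replace with the entries from setup.py.
pins = pinned_requirements(["numpy", "torch", "gym", "ray"])
print(f"# tested with Python {python_version}")
print("\n".join(pins))
```

The printed lines can be pasted into a requirements.txt or used as version specifiers in `install_requires`.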

Best

Environment requirements

I can't seem to get the environment for the code working despite running 'python setup.py install'. Which Python version are you using? Are you using mujoco or mujoco_py? Could you provide a full requirements.txt?

Thank you

some questions about rambo.py

Thanks guys for providing such an elegant and well-performing codebase. I have one question about lines 196-199 of rambo.py.

    all_loss = self._adv_weight * adv_loss + sl_loss
    self._dynmics_adv_optim.zero_grad()
    all_loss.backward()
    self._dynmics_adv_optim.step()

Judging from how it is computed, I guess adv_loss is the Model Gradient (as defined in the paper), but it is then added to the overall loss as an adversarial loss. Why is that?
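
A possible reading, stated as an assumption rather than as the authors' answer: if adv_loss is computed as an advantage-weighted log-likelihood with the advantage treated as a constant, then it is a surrogate objective whose gradient has the form of the model gradient, so adding it to the loss and calling backward() applies that gradient. The symbols below ($A_i$, $T_\phi$, $\lambda$, $\mathcal{L}_{\mathrm{sl}}$) are notation introduced here for illustration.

```latex
\begin{aligned}
\hat{\mathcal{L}}_{\mathrm{adv}}(\phi)
  &= \frac{1}{N}\sum_{i=1}^{N} A_i \,\log T_\phi(s'_i, r_i \mid s_i, a_i),
  \qquad A_i \text{ held constant w.r.t. } \phi, \\
\nabla_\phi \hat{\mathcal{L}}_{\mathrm{adv}}(\phi)
  &= \frac{1}{N}\sum_{i=1}^{N} A_i \,\nabla_\phi \log T_\phi(s'_i, r_i \mid s_i, a_i), \\
\mathcal{L}_{\mathrm{all}}(\phi)
  &= \lambda\,\hat{\mathcal{L}}_{\mathrm{adv}}(\phi) + \mathcal{L}_{\mathrm{sl}}(\phi).
\end{aligned}
```

Here $\lambda$ plays the role of self._adv_weight and $\mathcal{L}_{\mathrm{sl}}$ the role of sl_loss in the snippet above.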

I'm looking forward to your replies. Wish you guys all the best.

Reproducing COMBO

Hi,
I noticed that your logs and tables are missing the COMBO results. Were you able to reproduce the results in the paper?

Thank you very much for your helpful codebase,
Kaustubh
