
jax-rl's People

Contributors

quangr


jax-rl's Issues

Add recurrent neural network

I guess there is no standard implementation of an LSTM version of PPO. First we should focus on the training implementation.
CleanRL's implementation: just save initial_lstm_state, and burn in with the prefix data in the buffer.
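A minimal JAX sketch of the CleanRL-style approach, assuming a hand-written LSTM cell (`init_lstm_params`, `lstm_cell` and `replay_rollout` are hypothetical names, not part of jax-rl or CleanRL): keep the LSTM state saved at the start of the rollout, then during the update replay the stored observations from that state, zeroing the state wherever a new episode begins. Here `done_seq[t]` is assumed to mark that the observation at step `t` starts a new episode.

```python
import jax
import jax.numpy as jnp


def init_lstm_params(key, in_dim, hidden):
    # Fused weights for the four gates (input, forget, cell, output).
    k1, k2 = jax.random.split(key)
    return {
        "w_x": jax.random.normal(k1, (in_dim, 4 * hidden)) * 0.1,
        "w_h": jax.random.normal(k2, (hidden, 4 * hidden)) * 0.1,
        "b": jnp.zeros((4 * hidden,)),
    }


def lstm_cell(params, carry, x):
    # One LSTM step for a batch of envs: x is [num_envs, in_dim].
    h, c = carry
    z = x @ params["w_x"] + h @ params["w_h"] + params["b"]
    i, f, g, o = jnp.split(z, 4, axis=-1)
    c = jax.nn.sigmoid(f) * c + jax.nn.sigmoid(i) * jnp.tanh(g)
    h = jax.nn.sigmoid(o) * jnp.tanh(c)
    return (h, c), h


def replay_rollout(params, initial_state, obs_seq, done_seq):
    """Recompute hidden states over a stored rollout.

    initial_state: (h, c), each [num_envs, hidden], saved before the rollout.
    obs_seq: [T, num_envs, in_dim]; done_seq: [T, num_envs], 1.0 where the
    observation at that step starts a new episode.
    """
    def step(carry, inp):
        x, done = inp
        keep = (1.0 - done)[:, None]
        # Zero the recurrent state for envs whose episode just (re)started.
        carry = (carry[0] * keep, carry[1] * keep)
        return lstm_cell(params, carry, x)

    # Returns (final_state, hidden_states) with hidden_states [T, num_envs, hidden].
    return jax.lax.scan(step, initial_state, (obs_seq, done_seq))
```

Because hidden states are recomputed with the current parameters, minibatches have to keep whole sequences together, i.e. be split along the env axis rather than shuffled per time step, which is what CleanRL's LSTM variant does.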

The impact of random shuffle

We should write a report on whether randomly shuffling the buffer improves performance. Some researchers believe that shuffling the buffer data reduces the correlation between samples in a minibatch, which leads to a better gradient estimate (and may help avoid catastrophic forgetting?).

If we comment out jax.random.permutation(subkey, x), in the HalfCheetah-v3 env we get:
Nan, Inf or huge value in CTRL at ACTUATOR 0. The simulation is unstable. Time = 1.1500
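One way to run the shuffle ablation is to make the permutation a flag. This is a sketch under assumptions: the rollout has already been flattened time-major to `[T * num_envs, ...]`, and `make_minibatches` is a hypothetical helper, not the repo's function. With `shuffle=False` each minibatch is a contiguous slice, so all of its samples come from a few consecutive time steps.

```python
import jax
import jax.numpy as jnp


def make_minibatches(key, batch, num_minibatches, shuffle=True):
    """Split a flattened rollout (a pytree of arrays sharing a leading
    [T * num_envs] axis) into [num_minibatches, minibatch_size, ...]."""
    n = jax.tree_util.tree_leaves(batch)[0].shape[0]
    assert n % num_minibatches == 0, "batch size must divide evenly"
    # The same index order for every leaf keeps obs/actions/advantages aligned.
    idx = jax.random.permutation(key, n) if shuffle else jnp.arange(n)

    def split(x):
        x = x[idx]
        return x.reshape((num_minibatches, n // num_minibatches) + x.shape[1:])

    return jax.tree_util.tree_map(split, batch)
```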

reproduce ppo benchmark

I can't find a way to make PPO comparable to the Tianshou benchmark, especially in the HalfCheetah env, where we can't achieve even half of the score.

Benchmark:

Tianshou: Hopper-v3: 2609.3 ± 700.8, HalfCheetah-v3: 5783.9 ± 1244.0

Mine: Hopper-v3: 1683 ± 307, HalfCheetah-v3: 1926 ± 254

Where does it go wrong?

So far I have tested the following hypotheses:

  • Are done steps handled correctly?

Result: adding masking in the PPO step and using value bootstrapping did not improve much.
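For the done-handling question, a GAE sketch that separates the two roles of `done`: `terminated` stops bootstrapping from the next state's value, while `episode_end` (terminated or truncated) stops the advantage trace from leaking across episodes. The argument names, and the assumption that `next_values[t]` holds V(s_{t+1}) for the real next observation of step `t` (before any auto-reset), are illustrative rather than taken from jax-rl or Tianshou.

```python
import jax
import jax.numpy as jnp


def compute_gae(rewards, values, next_values, terminated, episode_end,
                gamma=0.99, lam=0.95):
    """All inputs are [T, num_envs] float arrays.

    next_values[t]: V(s_{t+1}) for the real next observation of step t.
    terminated[t]: 1.0 if the episode truly ended (no bootstrap).
    episode_end[t]: 1.0 if terminated or truncated (cut the GAE trace).
    """
    # TD error; the bootstrap term is masked only on true terminations.
    deltas = rewards + gamma * next_values * (1.0 - terminated) - values

    def step(gae, inp):
        delta, end = inp
        # Do not carry the next episode's advantage back into this one.
        gae = delta + gamma * lam * (1.0 - end) * gae
        return gae, gae

    # reverse=True scans from the last step backwards; outputs keep time order.
    _, advantages = jax.lax.scan(
        step, jnp.zeros_like(values[0]), (deltas, episode_end), reverse=True
    )
    returns = advantages + values
    return advantages, returns
```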

  • Does the envpool version matter?

Result: switching to a different version doesn't help.

  • Is the learning-rate decay working correctly? Since Tianshou trains for 3M steps, its learning rate only decays to 2/3 of the initial value within the first 1M steps.

Result: using a constant learning rate, or setting the total steps to 3M, did not improve much.
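The 2/3 figure follows from linear annealing: if the learning rate is scaled by 1 - step / total_steps, a 3M-step schedule has only decayed to 1 - 1/3 = 2/3 of its initial value at step 1M, while a 1M-step schedule reaches 0 there. A plain-Python sketch; `linear_lr` and the 3e-4 initial rate are assumptions, and real schedulers may step per update rather than per env step.

```python
def linear_lr(step, total_steps, init_lr=3e-4):
    """Learning rate annealed linearly from init_lr at step 0 to 0 at total_steps."""
    frac = 1.0 - min(step, total_steps) / total_steps
    return init_lr * frac


# Learning rate reached after 1M environment steps under each schedule.
print(linear_lr(1_000_000, 3_000_000))  # 3M budget: still about 2/3 of init_lr
print(linear_lr(1_000_000, 1_000_000))  # 1M budget: fully decayed
```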

  • Is the action remapping correct?

Result: copying the remap method from Tianshou still doesn't help.
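For reference, a common way to map a policy output onto a Box action space is to bound it to [-1, 1] (by clipping or tanh) and then rescale linearly to [low, high]. This is a sketch, not a copy of Tianshou's code; `remap_action` and `bound_method` are hypothetical names. The remapped action is only what gets sent to the environment; the log-probability in the PPO ratio should still be computed on the raw sampled action.

```python
import jax.numpy as jnp


def remap_action(raw_action, low, high, bound_method="clip"):
    """Map a raw policy sample onto the env's [low, high] action box."""
    if bound_method == "clip":
        a = jnp.clip(raw_action, -1.0, 1.0)
    elif bound_method == "tanh":
        a = jnp.tanh(raw_action)
    else:
        a = raw_action  # assume the policy already outputs values in [-1, 1]
    # Linear rescale from [-1, 1] to [low, high].
    return low + (high - low) * (a + 1.0) / 2.0
```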

  • Is the gradient step correct?

Result: when using exactly the same data as Tianshou, both implementations produce the same loss.
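When checking the gradient step against Tianshou on identical data, it helps to compare the loss terms separately. A sketch of the standard clipped PPO objective with optional advantage normalization; `ppo_loss` and its coefficients are placeholders, not the values jax-rl or Tianshou use. Returning `(total, aux)` fits `jax.value_and_grad(..., has_aux=True)`.

```python
import jax.numpy as jnp


def ppo_loss(new_logp, old_logp, adv, new_values, returns,
             clip_eps=0.2, vf_coef=0.5, normalize_adv=True):
    """Clipped PPO surrogate plus value MSE; returns (total, (pi_loss, v_loss))."""
    if normalize_adv:
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)
    # Probability ratio between the current and the rollout policy.
    ratio = jnp.exp(new_logp - old_logp)
    surr1 = ratio * adv
    surr2 = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Pessimistic (minimum) surrogate, negated because we minimize.
    pi_loss = -jnp.minimum(surr1, surr2).mean()
    v_loss = ((returns - new_values) ** 2).mean()
    return pi_loss + vf_coef * v_loss, (pi_loss, v_loss)
```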

  • Is the rollout phase the problem? Is the random sampling random enough?

Result: I don't know how to test this.

It turns out that we need an observation normalizer.
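A minimal running mean/variance observation normalizer, assuming the parallel variance update used by common RunningMeanStd implementations; the names, the 1e-4 initial count and the clip range of 10 are assumptions. The statistics are merged from each batch of observations and then used to standardize observations before they reach the policy.

```python
from typing import NamedTuple

import jax.numpy as jnp


class RunningMeanStd(NamedTuple):
    mean: jnp.ndarray
    var: jnp.ndarray
    count: jnp.ndarray  # number of samples seen (float, starts near 0)


def rms_init(shape):
    return RunningMeanStd(jnp.zeros(shape), jnp.ones(shape), jnp.asarray(1e-4))


def rms_update(rms, batch):
    """Merge a batch [B, *shape] into the running statistics."""
    b_mean = batch.mean(axis=0)
    b_var = batch.var(axis=0)
    b_count = batch.shape[0]
    delta = b_mean - rms.mean
    total = rms.count + b_count
    new_mean = rms.mean + delta * b_count / total
    # Combine the sums of squared deviations of both parts (parallel variance).
    m2 = rms.var * rms.count + b_var * b_count + delta**2 * rms.count * b_count / total
    return RunningMeanStd(new_mean, m2 / total, total)


def normalize(rms, obs, clip=10.0, eps=1e-8):
    """Standardize observations with the running stats, then clip outliers."""
    return jnp.clip((obs - rms.mean) / jnp.sqrt(rms.var + eps), -clip, clip)
```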

Add env wrapper

The problems for now:

  • Maybe we should make the wrappers' state frozen.
  • envpool does not seem to account for the handle changing after reset, so we now have two handles: one returned by xla() and one by reset().
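One reading of "frozen" wrapper state is to keep it in an immutable pytree that is passed in and returned explicitly, so it works under `jax.jit` and cannot be mutated in place. A sketch using an episode-statistics wrapper as the example; `EpisodeStatsState`, `stats_init` and `stats_step` are hypothetical and not part of envpool or jax-rl.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class EpisodeStatsState(NamedTuple):
    episode_return: jnp.ndarray  # return of the episode in progress, per env
    episode_length: jnp.ndarray
    last_return: jnp.ndarray     # return of the most recently finished episode
    last_length: jnp.ndarray


def stats_init(num_envs):
    zf = jnp.zeros(num_envs, dtype=jnp.float32)
    zi = jnp.zeros(num_envs, dtype=jnp.int32)
    return EpisodeStatsState(zf, zi, zf, zi)


@jax.jit
def stats_step(state, reward, done):
    """Pure update: takes the old state and returns a new one, never mutating it."""
    ret = state.episode_return + reward
    length = state.episode_length + 1
    return EpisodeStatsState(
        episode_return=jnp.where(done, 0.0, ret),
        episode_length=jnp.where(done, 0, length),
        last_return=jnp.where(done, ret, state.last_return),
        last_length=jnp.where(done, length, state.last_length),
    )
```

Since a NamedTuple is a pytree, the same state can also be threaded through `jax.lax.scan` over a rollout.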
