quangr / jax-rl
JAX version of the PPO algorithm in MuJoCo environments, reaching the SOTA scores of the Tianshou benchmark.
I guess there is no standard implementation of an LSTM version of PPO. First we should focus on the training implementation.
CleanRL's implementation:
just save initial_lstm_state, and burn in with the prefix data in the buffer.
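The burn-in idea above can be sketched roughly as follows. This is only a minimal illustration under assumptions, not the code of this repo or of CleanRL: `lstm_step`, `unroll`, `burn_in_then_train`, the parameter layout (`W`, `b`) and the convention that `done[t]` marks the first step of a new episode are all hypothetical. The saved initial state is replayed over the prefix of the stored sequence without gradient, and only the remaining steps produce outputs for the loss.

```python
import jax
import jax.numpy as jnp


def lstm_step(params, carry, x):
    """One step of a plain LSTM cell (hand-written for illustration, not a library API)."""
    h, c = carry
    z = jnp.concatenate([x, h], axis=-1) @ params["W"] + params["b"]
    i, f, g, o = jnp.split(z, 4, axis=-1)
    c = jax.nn.sigmoid(f) * c + jax.nn.sigmoid(i) * jnp.tanh(g)
    h = jax.nn.sigmoid(o) * jnp.tanh(c)
    return (h, c), h


def unroll(params, init_carry, obs_seq, done_seq):
    """Replay a stored sequence from a saved state.

    obs_seq: (T, B, obs_dim), done_seq: (T, B).
    Assumption: done_seq[t] == 1 means step t starts a new episode,
    so the recurrent state is zeroed before that step.
    """
    def step(carry, inp):
        x, done = inp
        h, c = carry
        mask = 1.0 - done[..., None]
        carry, out = lstm_step(params, (h * mask, c * mask), x)
        return carry, out

    return jax.lax.scan(step, init_carry, (obs_seq, done_seq))


def burn_in_then_train(params, init_carry, obs_seq, done_seq, burn_in):
    """Warm up the state on the first `burn_in` steps, then unroll the rest."""
    carry, _ = unroll(params, init_carry, obs_seq[:burn_in], done_seq[:burn_in])
    # The prefix only refreshes the state; no gradient flows through it.
    carry = jax.lax.stop_gradient(carry)
    _, outs = unroll(params, carry, obs_seq[burn_in:], done_seq[burn_in:])
    return outs


# Tiny usage example with random data.
T, B, D, H = 6, 3, 4, 5
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = {"W": 0.1 * jax.random.normal(k1, (D + H, 4 * H)), "b": jnp.zeros(4 * H)}
obs_seq = jax.random.normal(k2, (T, B, D))
done_seq = jnp.zeros((T, B)).at[3, 0].set(1.0)
initial_lstm_state = (jnp.zeros((B, H)), jnp.zeros((B, H)))
outs = burn_in_then_train(params, initial_lstm_state, obs_seq, done_seq, burn_in=2)
```

The forward values after the burn-in are the same as in a full unroll from the saved state; the only difference is that the prefix does not contribute to the gradient.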
I should write a report on whether random shuffling helps improve performance. Some researchers believe that shuffling the buffer data leads to less covariance between samples, which leads to a better gradient approximation (and helps avoid catastrophic forgetting?).
If jax.random.permutation(subkey, x) is commented out,
in the HalfCheetah-v3 env we get
Nan, Inf or huge value in CTRL at ACTUATOR 0. The simulation is unstable. Time = 1.1500
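For reference, the shuffle step mentioned above might look something like the sketch below. The function name `make_minibatches`, the `shuffle` flag and the dictionary layout of the batch are assumptions for illustration, not the repo's actual code; only `jax.random.permutation` comes from the note above. The flag makes it easy to compare shuffled and unshuffled updates.

```python
import jax
import jax.numpy as jnp


def make_minibatches(key, batch, num_minibatches, shuffle=True):
    """Split a flat rollout batch (pytree of arrays with leading axis N) into minibatches.

    Assumes N is divisible by num_minibatches.
    """
    n = jax.tree_util.tree_leaves(batch)[0].shape[0]
    idx = jnp.arange(n)
    if shuffle:
        # Same permutation for every field, so samples stay aligned.
        idx = jax.random.permutation(key, idx)

    def split(x):
        x = x[idx]
        return x.reshape((num_minibatches, -1) + x.shape[1:])

    return jax.tree_util.tree_map(split, batch)


# Usage: 8 samples split into 4 minibatches of 2.
batch = {"obs": jnp.arange(8.0).reshape(8, 1), "adv": jnp.arange(8.0)}
key, subkey = jax.random.split(jax.random.PRNGKey(0))
shuffled = make_minibatches(subkey, batch, num_minibatches=4)
ordered = make_minibatches(subkey, batch, num_minibatches=4, shuffle=False)
```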
I can't find a way to make PPO comparable to the Tianshou benchmark, especially in the HalfCheetah env, where we can't achieve even half of the score.
Benchmark:
Tianshou: Hopper-v3: 2609.3 ± 700.8, HalfCheetah-v3: 5783.9 ± 1244.0
Mine: Hopper-v3: 1683 ± 307, HalfCheetah-v3: 1926 ± 254
What goes wrong?
So far I have tested the following assumptions:
Result: adding masking in the PPO step and using the value bootstrap does not improve much.
Result: changing to a different version doesn't help.
Result: setting the learning rate to a constant or setting the total steps to 3M does not improve much.
Result: copying the remap method from Tianshou still does not work.
Result: when using the exact data from Tianshou, the losses produced by both implementations are the same.
Result: Don't know how to test this.
It turns out that we need an observation normalizer.
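A minimal sketch of such a normalizer, assuming the common running mean/variance scheme (batch statistics merged into running statistics, then observations standardized and clipped). The names `RunningStats`, `init_stats`, `update_stats` and `normalize`, and the values `clip=10.0` and `eps=1e-8`, are illustrative choices, not taken from this repo or from Tianshou.

```python
from typing import NamedTuple

import jax.numpy as jnp


class RunningStats(NamedTuple):
    mean: jnp.ndarray
    var: jnp.ndarray
    count: jnp.ndarray


def init_stats(obs_dim):
    # A small initial count avoids division by zero on the first update.
    return RunningStats(jnp.zeros(obs_dim), jnp.ones(obs_dim), jnp.asarray(1e-4))


def update_stats(stats, batch):
    """Merge the mean/variance of a batch of shape (N, obs_dim) into the running stats."""
    b_mean = batch.mean(axis=0)
    b_var = batch.var(axis=0)
    b_count = batch.shape[0]
    delta = b_mean - stats.mean
    total = stats.count + b_count
    mean = stats.mean + delta * b_count / total
    m2 = (stats.var * stats.count + b_var * b_count
          + delta ** 2 * stats.count * b_count / total)
    return RunningStats(mean, m2 / total, total)


def normalize(stats, obs, clip=10.0, eps=1e-8):
    """Standardize observations with the running stats and clip outliers."""
    return jnp.clip((obs - stats.mean) / jnp.sqrt(stats.var + eps), -clip, clip)


# Usage: feed observations in two batches, then normalize.
data = jnp.arange(40.0).reshape(20, 2)
stats = init_stats(2)
stats = update_stats(stats, data[:8])
stats = update_stats(stats, data[8:])
normed = normalize(stats, data)
```

In a training loop the statistics would typically be updated only on data collected during training and kept frozen during evaluation.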
The problem for now: