ikostrikov / walk_in_the_park

License: MIT License

Python 100.00%

walk_in_the_park's Introduction

A Walk in the Park

Code to replicate A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning, including code for training a simulated or real A1 quadrupedal robot to walk. Project page: https://sites.google.com/berkeley.edu/walk-in-the-park

Installation

Install dependencies:

pip install -r requirements.txt

To install the robot SDK, first install the dependencies listed in the SDK's README.md (in real/third_party/unitree_legged_sdk).

To build, run:

cd real/third_party/unitree_legged_sdk
mkdir build
cd build
cmake ..
make

Finally, copy the built robot_interface.XXX.so file to this directory.

Training

Example command to run simulated training:

MUJOCO_GL=egl XLA_PYTHON_CLIENT_PREALLOCATE=false python train_online.py --env_name=A1Run-v0 \
                --utd_ratio=20 \
                --start_training=1000 \
                --max_steps=100000 \
                --config=configs/droq_config.py

To run training on the real robot, add --real_robot=True to the command above.
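--utd_ratio sets the update-to-data (UTD) ratio, i.e. how many gradient updates are made per collected environment step. The sketch below assumes a REDQ/DroQ-style scheme in which one batch of batch_size * utd_ratio transitions is sampled per environment step, split into utd_ratio minibatches for the critic, and followed by a single actor update. The helper names, the minibatch size of 256, and the exact split are illustrative assumptions, not code taken from this repository.

```python
import numpy as np


def split_into_minibatches(batch, utd_ratio):
    """Split a dict of arrays whose leading dimension is batch_size * utd_ratio
    into utd_ratio equally sized minibatches (hypothetical helper)."""
    n = next(iter(batch.values())).shape[0]
    assert n % utd_ratio == 0, "sampled batch size must be divisible by utd_ratio"
    size = n // utd_ratio
    return [
        {key: value[i * size:(i + 1) * size] for key, value in batch.items()}
        for i in range(utd_ratio)
    ]


def utd_update(batch, utd_ratio, update_critic, update_actor):
    """One training iteration, run once per environment step (assumed scheme).

    The critic gets utd_ratio gradient steps (one per minibatch); the actor gets
    a single step on the last minibatch, so it is updated less often than the
    critic. update_critic / update_actor are hypothetical callbacks standing in
    for the real gradient steps.
    """
    minibatches = split_into_minibatches(batch, utd_ratio)
    for minibatch in minibatches:
        update_critic(minibatch)
    update_actor(minibatches[-1])


# Usage with a hypothetical minibatch size of 256 and --utd_ratio=20.
utd_ratio, minibatch_size = 20, 256
batch = {
    "observations": np.zeros((minibatch_size * utd_ratio, 3)),
    "rewards": np.zeros(minibatch_size * utd_ratio),
}
critic_sizes, actor_sizes = [], []
utd_update(
    batch,
    utd_ratio,
    update_critic=lambda mb: critic_sizes.append(len(mb["rewards"])),
    update_actor=lambda mb: actor_sizes.append(len(mb["rewards"])),
)
print(len(critic_sizes), len(actor_sizes))  # 20 1
```

Under this assumption, a higher UTD ratio multiplies the number of critic gradient steps per environment step, which is why wall-clock time per step grows with --utd_ratio.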

walk_in_the_park's Issues

[Question] Actor updates Q function mean vs min

Hi,
in walk_in_the_park/rl/agents/sac/sac_learner.py (line 127 in commit 40321ec),

q = qs.mean(axis=0)

you update the actor based on the mean over all Q functions.
In the SB3 implementation of SAC, the minimum over all Q functions is used: https://github.com/DLR-RM/stable-baselines3/blob/5ef10c8e69b52e1376e6c2c636737d6dd528dda1/stable_baselines3/sac/sac.py#L265
Was this a design decision, or are both methods viable?
Thanks,
Jakob
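To make the question concrete, here is a minimal NumPy sketch of a SAC-style actor loss with either aggregation over the critic ensemble. The repository itself uses JAX, whose jax.numpy mean/min behave the same way here; the function name and argument layout are hypothetical.

```python
import numpy as np


def sac_actor_loss(qs, log_probs, temperature, aggregate="mean"):
    """SAC-style actor loss computed from an ensemble of critics (sketch).

    qs:          shape (num_qs, batch_size), Q(s, a ~ pi) from each critic.
    log_probs:   shape (batch_size,), log pi(a|s) of the sampled actions.
    temperature: entropy coefficient alpha.
    aggregate:   "mean" (like q = qs.mean(axis=0)) or "min" (like SB3's SAC).
    """
    if aggregate == "mean":
        q = qs.mean(axis=0)
    elif aggregate == "min":
        q = qs.min(axis=0)
    else:
        raise ValueError(f"unknown aggregate: {aggregate!r}")
    # Minimizing alpha * log pi - Q maximizes the aggregated Q plus entropy.
    return float((temperature * log_probs - q).mean())


qs = np.array([[1.0, 2.0],
               [3.0, 0.0]])  # 2 critics, batch of 2
log_probs = np.zeros(2)
print(sac_actor_loss(qs, log_probs, 0.1, "mean"))  # -1.5
print(sac_actor_loss(qs, log_probs, 0.1, "min"))   # -0.5
```

Taking the min gives a more pessimistic Q estimate for the policy to maximize; the mean is less conservative. For reference, REDQ averages over the whole ensemble in the policy objective and applies the min only over a random subset of critics when forming the critic target.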

Trying to reproduce the results but failing, unfortunately.

I have been working for a few weeks on adapting your technique to an RL algorithm I've been developing. Nothing fancy: I was already testing techniques on a simple algorithm to do exactly what your paper claims, namely to speed up training. I've examined the paper and the code inside and out, as well as REDQ (paper and code) and the DroQ paper. So I believe I understand what's going on, but I must be doing something wrong, because I see only a marginal increase in performance with my algorithm.
The trouble is that I see a huge increase in update time, which makes me wonder whether I've implemented this correctly. In fact, I had to move the model updates outside of step to make training feasible. The model learns to play the old Nintendo Entertainment System (NES) game Bubble Bobble. It takes about half an hour to get marginally better, and from my experiments I have no idea how long it takes to become adept at the game.
I don't know if I should have emailed this to you instead, but I have a GitHub repository set up for this, and if you could look it over and give me some pointers, I'd really appreciate it. It also explains in far more detail what I've done to get it working at its current level, which is the best I've managed. If not, that's alright; thank you for taking the time to read this. The repo is Shikamaru5/LNDQ-bubble_bot, and I have tried to credit your work (and others') in it so that people understand it is present.

Trying to reproduce the results but failing: flax.errors.ScopeParamShapeError

I'm trying to execute the code of this project and I get the following error:

kernel = self.param('kernel',
flax.errors.ScopeParamShapeError: Inconsistent shapes between value and initializer for parameter "kernel"

Does anyone know how to solve this problem?

Thank you very much.

Question about the paper/implementation

Hello,
thanks for sharing and open sourcing the work.
After a quick read of the paper, I had several questions:

Did you do an ablation of the UTD ratio? In my experiments, UTD=10 may already be enough (at least with TQC, see below), and one major detail is the policy delay (as done in REDQ or DroQ).
Did you consider using TQC? (SAC + distributional critic; it may also reduce the number of critics needed and usually yields better results than SAC, see https://sb3-contrib.readthedocs.io/en/master/modules/tqc.html#results and https://github.com/SamsungLabs/tqc_pytorch)
Are you using a low-pass filter on the real robot? Have you considered not using one, as in https://proceedings.mlr.press/v164/raffin22a.html? (Also learning directly on the real robot: https://www.youtube.com/watch?v=f_FmDFrYkPM) Or, if you don't use one, how do you ensure you are not breaking the robot by sending high-frequency commands (with a larger motor damping value?)
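The low-pass filter mentioned in the third question can be illustrated with a generic first-order (exponential moving average) filter on the policy's actions, assuming the actions are joint position targets. This is a textbook filter, not necessarily the one used in this repository or in the linked paper; ActionLowPassFilter and alpha are hypothetical names.

```python
import numpy as np


class ActionLowPassFilter:
    """First-order low-pass filter: y_t = alpha * a_t + (1 - alpha) * y_{t-1}.

    alpha in (0, 1] is the weight on the newest action; alpha = 1 disables
    filtering, smaller values smooth more but add lag.
    """

    def __init__(self, alpha, initial_action):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.prev = np.asarray(initial_action, dtype=np.float64)

    def __call__(self, action):
        action = np.asarray(action, dtype=np.float64)
        self.prev = self.alpha * action + (1.0 - self.alpha) * self.prev
        return self.prev


# A step change in the commanded action is approached gradually.
lpf = ActionLowPassFilter(alpha=0.5, initial_action=[0.0])
print([float(lpf([1.0])[0]) for _ in range(3)])  # [0.5, 0.75, 0.875]
```

The trade-off raised in the question is visible here: filtering suppresses high-frequency commands that could stress the motors, at the cost of delaying the policy's response.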

I have a working implementation of TQC + DroQ using Stable-Baselines3 that I can also share ;) (I can open a PR on request, and it will probably be part of SB3 soon)
SB3 branch: https://github.com/DLR-RM/stable-baselines3/tree/feat/dropq
SB3 contrib branch: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/tree/feat/dropq
Training script: https://github.com/araffin/walk_in_the_park/blob/feat/sb3/train_sb3.py

EDIT: SBX = SB3 + Jax is available here: https://github.com/araffin/sbx (with TQC, DroQ and SAC-N)

W&B example run: https://wandb.ai/araffin/a1/runs/2ln32rqx?workspace=user-araffin
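For readers unfamiliar with TQC (Truncated Quantile Critics, Kuznetsov et al., 2020), the step that distinguishes it from SAC is the truncation of the critic target: the quantiles predicted by all critics are pooled, sorted, and the largest ones are discarded before forming the target. The NumPy sketch below shows only that step plus a SAC-style soft Bellman target; the function names are hypothetical, and drop_per_net plays the role of the top_quantiles_to_drop_per_net parameter of the SB3-contrib TQC implementation.

```python
import numpy as np


def truncated_target_quantiles(next_quantiles, drop_per_net):
    """Pool the quantiles of all critics, sort them, and drop the largest ones.

    next_quantiles: shape (batch_size, n_critics, n_quantiles).
    Returns shape (batch_size, n_critics * (n_quantiles - drop_per_net)).
    """
    batch_size, n_critics, n_quantiles = next_quantiles.shape
    pooled = np.sort(next_quantiles.reshape(batch_size, -1), axis=1)
    n_keep = n_critics * (n_quantiles - drop_per_net)
    return pooled[:, :n_keep]


def tqc_target(rewards, dones, next_quantiles, next_log_probs, gamma, ent_coef, drop_per_net):
    """Target atoms r + gamma * (1 - done) * (z - ent_coef * log pi), per kept quantile z."""
    kept = truncated_target_quantiles(next_quantiles, drop_per_net)
    kept = kept - ent_coef * next_log_probs[:, None]  # entropy term, as in SAC
    return rewards[:, None] + gamma * (1.0 - dones[:, None]) * kept


# 1 transition, 2 critics with 3 quantiles each; drop the top quantile per critic.
next_quantiles = np.array([[[1.0, 5.0, 3.0],
                            [2.0, 6.0, 4.0]]])
print(truncated_target_quantiles(next_quantiles, 1))  # [[1. 2. 3. 4.]]
```

Dropping the largest atoms counteracts overestimation bias directly in the target, which is one reason TQC can work with fewer critics than a min-over-ensemble approach.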

Unable to initialize backend

Hello,
The environment setup completed without problems.
But when I run the program, I get the following warnings:

I0919 21:58:29.074738 139995721590592 xla_bridge.py:350] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0919 21:58:29.120949 139995721590592 xla_bridge.py:350] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter CUDA Host
I0919 21:58:29.121373 139995721590592 xla_bridge.py:350] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'

I found that the program cannot use the GPU, although TensorFlow can.

Could you give me some advice?
Thank you!

Ubuntu: 22.04 LTS
NVIDIA Driver Version: 515.65.01
CUDA Version: 11.7
Python: 3.8.13
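A quick way to see which backend JAX actually selected is sketched below. jax.default_backend() and jax.devices() are standard JAX functions; the wrapper backend_summary is hypothetical. The INFO-level lines above about 'tpu_driver', 'rocm' and 'tpu' are typically logged when JAX probes for backends that are not present, so on their own they do not mean the GPU is unusable. If the result reports "cpu" on a machine with a CUDA GPU, a common cause is a CPU-only jaxlib build; treat that as an assumption to check against the JAX installation guide.

```python
def backend_summary():
    """Report which backend and devices JAX selected, or None if JAX is missing."""
    try:
        import jax
    except ImportError:
        return None
    return {
        "jax_version": jax.__version__,
        "default_backend": jax.default_backend(),  # "gpu" if a CUDA backend loaded
        "devices": [str(d) for d in jax.devices()],
    }


print(backend_summary())
```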
