thu-ml / srpo

Code accompanying the paper "Score Regularized Policy Optimization through Diffusion Behavior" (ICLR 2024).

License: MIT License

Python 100.00%

diffusion generative offline reinforcement-learning score-based-models behavior-regularization srpo rl d4rl

srpo's Introduction

Score Regularized Policy Optimization through Diffusion Behavior

Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu

D4RL experiments

Requirements

PyTorch, MuJoCo, and D4RL must be installed.

Running

Download the pretrained behavior and critic checkpoints from here and store them under ./SRPO_model_factory/.

Alternatively, pretrain the behavior and critic models yourself by running the following two commands, respectively:

TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_behavior.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}

TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_critic.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}

Finally, run

TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_policy.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed} --actor_load_path ./SRPO_model_factory/${TASK}-baseline-seed${seed}/behavior_ckpt200.pth --critic_load_path ./SRPO_model_factory/${TASK}-baseline-seed${seed}/critic_ckpt150.pth
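
The final command depends on two checkpoint files under ./SRPO_model_factory/. As a minimal sketch, assuming the directory layout and file names shown in the command above (behavior_ckpt200.pth and critic_ckpt150.pth), the following Python helper builds the expected paths and reports any that are missing before train_policy.py is launched. The function names checkpoint_paths and missing_checkpoints are hypothetical and not part of the repository.

```python
from pathlib import Path


def checkpoint_paths(task, seed, root="./SRPO_model_factory"):
    """Return the behavior and critic checkpoint paths used in the README command."""
    # Experiment id follows the README pattern: <task>-baseline-seed<seed>
    expid = f"{task}-baseline-seed{seed}"
    run_dir = Path(root) / expid
    return {
        "actor_load_path": run_dir / "behavior_ckpt200.pth",
        "critic_load_path": run_dir / "critic_ckpt150.pth",
    }


def missing_checkpoints(task, seed, root="./SRPO_model_factory"):
    """List the expected checkpoint files that do not exist on disk."""
    paths = checkpoint_paths(task, seed, root)
    return [str(p) for p in paths.values() if not p.exists()]


if __name__ == "__main__":
    for path in missing_checkpoints("halfcheetah-medium-v2", 0):
        print(f"missing checkpoint: {path}")
```

If the list is non-empty, either download the pretrained checkpoints or run the behavior and critic pretraining commands first.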

License

MIT

srpo's Issues

A suggestion about README.md and a strange phenomenon

Hi!

I ran this project successfully on Ubuntu 20.04 using

TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_behavior.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}

and

TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_critic.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}

so I assume you developed it on Ubuntu.

However, when I ran the same commands on Windows 10, they failed, so I have a suggestion for README.md.

Anyone who wants to run this project on Windows can use

$env:TASK="halfcheetah-medium-v2"; $env:seed=0; python -u train_behavior.py --expid $env:TASK-baseline-seed$env:seed --env $env:TASK --seed $env:seed

and

$env:TASK="halfcheetah-medium-v2"; $env:seed=0; python -u train_critic.py --expid $env:TASK-baseline-seed$env:seed --env $env:TASK --seed $env:seed

Similarly, to train the policy, rewrite the command in the same way.
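
The only shell-specific part of these commands is the variable substitution. One way to sidestep the difference between the two shells is a small Python launcher. The sketch below is an illustration under assumptions, not part of the repository: the helper names build_command and run_stage are hypothetical, while the script names and flags are the ones quoted above.

```python
import subprocess
import sys


def build_command(script, task, seed):
    """Build the argument list for train_behavior.py or train_critic.py."""
    # Experiment id follows the pattern used in the README commands.
    expid = f"{task}-baseline-seed{seed}"
    return [
        sys.executable, "-u", script,
        "--expid", expid,
        "--env", task,
        "--seed", str(seed),
    ]


def run_stage(script, task="halfcheetah-medium-v2", seed=0):
    """Run one training stage; raises CalledProcessError on a non-zero exit code."""
    # Passing a list (no shell=True) avoids any shell-specific variable syntax.
    subprocess.run(build_command(script, task, seed), check=True)
```

Calling run_stage("train_behavior.py") and then run_stage("train_critic.py") from the repository root would then behave the same way on Ubuntu and Windows, since no shell variables are involved.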

Finally, I have a question. When the code runs successfully, the terminal shows this:

[terminal screenshot]

and WandB shows this figure:

[WandB figure]

The progress bar for behavior training stays at 0% throughout, which suggests that the behavior model is not being trained. Is this normal?

An error possibly caused by the operating system

Hi,

I'm a beginner and have a question. Could you tell me how to fix this?

When I run

$ TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_behavior.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}

the terminal reports:

$ : The term '$' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
At line:1 char:33

$ TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_behavior.py
- CategoryInfo : ObjectNotFound: (seed=0:String) [], CommandNotFoundException
- FullyQualifiedErrorId : CommandNotFoundException

I think this error may be caused by my operating system: I am on Windows 10 with Python 3.7. Were you using Ubuntu 20.04?

Question about policy output

Hi!

I noticed that the Dirac policy only returns actions in the range [-1, 1], because the activation function in its last layer is tanh. For environments whose action space extends beyond that range, I think your method might underperform. Or am I missing something?

It might be convenient to scale the policy output to the action space of the environment.
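
The rescaling suggested here amounts to an affine map from the tanh range [-1, 1] onto the environment's action bounds. A minimal sketch, assuming per-dimension bounds low and high (for example, the low and high arrays of a Gym-style Box action space). The function names are hypothetical and this is not code from the repository:

```python
def scale_action(squashed, low, high):
    """Map a tanh-squashed action in [-1, 1] affinely onto [low, high], per dimension."""
    return [lo + 0.5 * (a + 1.0) * (hi - lo) for a, lo, hi in zip(squashed, low, high)]


def unscale_action(action, low, high):
    """Inverse map: bring an environment action in [low, high] back to [-1, 1]."""
    return [2.0 * (a - lo) / (hi - lo) - 1.0 for a, lo, hi in zip(action, low, high)]


if __name__ == "__main__":
    low, high = [-2.0, 0.0], [2.0, 10.0]
    env_action = scale_action([0.5, -1.0], low, high)
    print(env_action)                             # action in environment units
    print(unscale_action(env_action, low, high))  # recovers the squashed action
```

With offline data, the same inverse map would presumably also have to be applied to the dataset actions, so that the behavior model and the policy operate in the same normalized range.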
