
srpo's Introduction

Score Regularized Policy Optimization through Diffusion Behavior

Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu


D4RL experiments

Requirements

PyTorch, MuJoCo, and D4RL must be installed.

Running

Download the pretrained behavior and critic checkpoints from here and store them under ./SRPO_model_factory/.

Alternatively, you can pretrain the behavior and critic models yourself by running the following two commands, respectively:

TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_behavior.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}
TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_critic.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}

Finally, run

TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_policy.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed} --actor_load_path ./SRPO_model_factory/${TASK}-baseline-seed${seed}/behavior_ckpt200.pth --critic_load_path ./SRPO_model_factory/${TASK}-baseline-seed${seed}/critic_ckpt150.pth
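
Before launching the final command, it can help to confirm that both checkpoints are where `--actor_load_path` and `--critic_load_path` expect them. The following is a minimal sketch, not part of the repository: the function names and the `behavior_ckpt_id` / `critic_ckpt_id` parameters are hypothetical, and the directory layout is only inferred from the paths in the command above.

```python
from pathlib import Path


def checkpoint_paths(task, seed, root="./SRPO_model_factory",
                     behavior_ckpt_id=200, critic_ckpt_id=150):
    """Return the checkpoint paths referenced by the train_policy.py command.

    Assumption: the layout <root>/<task>-baseline-seed<seed>/ is taken from
    the command line shown above; the numeric suffixes default to the ones
    used there.
    """
    run_dir = Path(root) / f"{task}-baseline-seed{seed}"
    return {
        "behavior": run_dir / f"behavior_ckpt{behavior_ckpt_id}.pth",
        "critic": run_dir / f"critic_ckpt{critic_ckpt_id}.pth",
    }


def missing_checkpoints(paths):
    """List the names of checkpoints that are not present on disk."""
    return [name for name, path in paths.items() if not path.is_file()]
```

Calling `missing_checkpoints(checkpoint_paths("halfcheetah-medium-v2", 0))` from the repository root returns an empty list when both files are in place.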

License

MIT


srpo's Issues

A suggestion about README.md and a strange phenomenon

Hi!

I ran this project successfully on Ubuntu 20.04 using

TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_behavior.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}
TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_critic.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}

so I assume your system is Ubuntu.

When I ran the same commands on Windows 10, however, they failed with errors, so I have a suggestion for README.md.

Anyone who wants to run this project on Windows can use the following PowerShell commands instead:

$env:TASK="halfcheetah-medium-v2"; $env:seed=0; python -u train_behavior.py --expid $env:TASK-baseline-seed$env:seed --env $env:TASK --seed $env:seed
$env:TASK="halfcheetah-medium-v2"; $env:seed=0; python -u train_critic.py --expid $env:TASK-baseline-seed$env:seed --env $env:TASK --seed $env:seed

Similarly, to train the policy, rewrite the train_policy.py command in the same way.
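
One way to sidestep the difference between bash and PowerShell variable syntax altogether is to launch the scripts from a small Python wrapper. This is only a sketch under assumptions: `build_command` and `run_pretraining` are hypothetical helpers that do not exist in the repository, and the flags are copied from the README commands.

```python
import subprocess
import sys


def build_command(script, task, seed, extra_args=()):
    """Build the argument list for one of the training scripts.

    Passing a list to subprocess avoids shell-specific variable syntax
    (${TASK} in bash versus $env:TASK in PowerShell).
    """
    expid = f"{task}-baseline-seed{seed}"
    return [sys.executable, "-u", script,
            "--expid", expid, "--env", task, "--seed", str(seed),
            *extra_args]


def run_pretraining(task, seed):
    """Run behavior and critic pretraining in sequence; stop on the first failure."""
    for script in ("train_behavior.py", "train_critic.py"):
        subprocess.run(build_command(script, task, seed), check=True)
```

Calling `run_pretraining("halfcheetah-medium-v2", 0)` from the repository root should behave the same on Ubuntu and Windows, because no shell expands the arguments. Policy training could reuse `build_command` by passing `--actor_load_path` and `--critic_load_path` through `extra_args`.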

Finally, I have a question. When the code runs successfully, the terminal shows

[screenshot: terminal output]

and WandB shows this figure:

[screenshot: WandB figure]

The progress bar for behavior training stays at 0% throughout, which suggests that the behavior model is not being trained. Is this normal?

An error possibly related to the operating system

Hi!
I am a beginner and would like to ask about a problem. Could you tell me how to fix it?
When I run this command

$ TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_behavior.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}

the terminal reports:

`$ : The term '$' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
At line:1 char:33

+ $ TASK="halfcheetah-medium-v2"; seed=0; python3 -u train_behavior.py
    + CategoryInfo : ObjectNotFound: (seed=0:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException`

I think this error may be caused by my operating system. I am on Windows 10 with Python 3.7.
Are you perhaps using Ubuntu 20.04?

Question about policy output

Hi!

I was wondering about the following: the Dirac policy only returns actions in the range [-1, 1], since the activation function in the last layer is tanh. For environments whose action space extends outside that range, I think your method might underperform. Or am I missing something?

It might be convenient to scale the policy output to the environment's action space.
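
To make the suggestion concrete, here is a minimal sketch of the affine rescaling I have in mind. It is not code from this repository: `scale_action` and `unscale_action` are hypothetical names, and `low` / `high` stand for the per-dimension bounds of the environment's action space. Whether the D4RL tasks used here need it at all is something I have not checked.

```python
def scale_action(squashed, low, high):
    """Map each tanh-squashed component from [-1, 1] to [low_i, high_i]."""
    return [lo + 0.5 * (a + 1.0) * (hi - lo)
            for a, lo, hi in zip(squashed, low, high)]


def unscale_action(action, low, high):
    """Inverse map from [low_i, high_i] back to [-1, 1].

    This could be used to normalize dataset actions before training.
    """
    return [2.0 * (a - lo) / (hi - lo) - 1.0
            for a, lo, hi in zip(action, low, high)]
```

With this, the network can keep its tanh output layer, and only the actions passed to the environment (or read from the dataset) are rescaled.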
