Revisiting Action-Constrained RL via Frank-Wolfe

This repo contains code accompanying the paper "Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Optimization" (UAI 2021). It includes code for running the NFWPO algorithm presented in the paper, as well as the baseline methods DDPG+OptLayer, DDPG+Projection, DDPG+Reward Shaping, SAC+Projection, PPO+Projection, TRPO+Projection, and FOCOPS.
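For orientation, a generic Frank-Wolfe update over a simple box constraint can be sketched as follows. This illustrates the general technique only and is not the NFWPO implementation in this repo: the function name frank_wolfe_step, the box-shaped feasible set, and the fixed step size are all assumptions made for the example.

```python
def frank_wolfe_step(action, grad, low, high, step_size):
    """One generic Frank-Wolfe ascent step over the box low <= a <= high.

    Hypothetical helper for illustration; not taken from this repo.
    """
    if not 0.0 <= step_size <= 1.0:
        raise ValueError("step_size must lie in [0, 1]")
    # Linear maximization oracle: over a box, the point maximizing <c, grad>
    # takes the upper bound where the gradient is positive, else the lower bound.
    target = [hi if g > 0 else lo for g, lo, hi in zip(grad, low, high)]
    # Move toward that point. A convex combination of two feasible points
    # stays feasible, so no projection step is needed.
    return [a + step_size * (c - a) for a, c in zip(action, target)]


updated = frank_wolfe_step(
    action=[0.0, 0.0],
    grad=[1.0, -2.0],
    low=[-1.0, -1.0],
    high=[1.0, 1.0],
    step_size=0.5,
)
print(updated)  # [0.5, -0.5]
```

With more general constraint sets, the linear maximization step would typically be solved with an LP or QP solver rather than the closed form used here.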

Dependencies

This code requires the following:

Usage

To run the code, enter the directory of the corresponding environment and run the following command, replacing [ALGORITHM_NAME] with one of NFWPO, DDPG_Projection, DDPG_RewardShaping, DDPG_OptLayer, or SAC_Projection:

python3 [ALGORITHM_NAME].py

(To run other baselines such as PPO+Projection, TRPO+Projection, and FOCOPS, please refer to the description below.)
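When several algorithms are launched from a script, the command pattern above can be wrapped in a small helper. The following is a minimal sketch, assuming each listed algorithm name maps to a file named [ALGORITHM_NAME].py in the current environment directory; build_command is a hypothetical helper, not part of this repo.

```python
# Algorithm names listed above; each is assumed to correspond to <name>.py.
ALGORITHMS = (
    "NFWPO",
    "DDPG_Projection",
    "DDPG_RewardShaping",
    "DDPG_OptLayer",
    "SAC_Projection",
)


def build_command(algorithm):
    """Return the argv list for running one of the listed algorithms."""
    if algorithm not in ALGORITHMS:
        raise ValueError(
            f"unknown algorithm {algorithm!r}; expected one of {ALGORITHMS}"
        )
    return ["python3", f"{algorithm}.py"]


command = build_command("NFWPO")
print(" ".join(command))  # python3 NFWPO.py
```

The returned list can be passed to subprocess.run(command, cwd=env_dir, check=True), where env_dir is the environment directory you want to run in.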

The following examples show how to run the experiments on Ubuntu.

BIKE SHARING SYSTEMS

To run the experiments mentioned in Section 4.1, please follow the instructions below:

A. Evaluating FWPO with tabular parameterization:

  1. Enter the directory BSS-3:
cd BSS-3
  2. Set the random seed arg_seed to a value between 0 and 4 in Line 25 of NFWPO.py.
  3. Use the following command to train NFWPO:
python3 NFWPO.py
  4. To run the other baseline methods, set the random seed arg_seed to a value between 0 and 4 in DDPG_Projection.py and DDPG_RewardShaping.py, then run the corresponding command:
python3 DDPG_Projection.py
python3 DDPG_RewardShaping.py
  5. The results are shown in Figure 1.
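The seeds 0-4 give five independent but repeatable runs per method. A minimal sketch of why fixing the seed matters, assuming arg_seed is used to seed the random number generators inside each script (the scripts may also seed NumPy or a deep learning framework, which is not shown here); seeded_rng is a hypothetical helper:

```python
import random

VALID_SEEDS = range(0, 5)  # seeds 0-4, as in the instructions above


def seeded_rng(arg_seed):
    """Return a generator seeded with arg_seed, rejecting seeds outside 0-4."""
    if arg_seed not in VALID_SEEDS:
        raise ValueError(f"arg_seed must be between 0 and 4, got {arg_seed}")
    return random.Random(arg_seed)


# Two runs with the same seed draw exactly the same random numbers.
run_a = seeded_rng(3)
run_b = seeded_rng(3)
draws_a = [run_a.random() for _ in range(3)]
draws_b = [run_b.random() for _ in range(3)]
print(draws_a == draws_b)  # True
```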

B. Evaluating NFWPO:

  1. Enter the directory BSS-5:
cd BSS-5
  2. Set the random seed arg_seed to a value between 0 and 4 in NFWPO.py.
  3. Use the following command to train NFWPO:
python3 NFWPO.py
  4. To run the other baseline methods, set the random seed arg_seed to a value between 0 and 4 in DDPG_Projection.py and DDPG_RewardShaping.py, and set random_seed in DDPG_OptLayer.py. Then run the corresponding command:
python3 DDPG_Projection.py
python3 DDPG_RewardShaping.py
python3 DDPG_OptLayer.py
  5. The results are shown in Figure 2.

UTILITY MAXIMIZATION OF COMMUNICATION NETWORKS

  1. Enter the directory NSFnet/src/gym:
cd NSFnet/src/gym
  2. Set the random seed arg_seed to a value between 0 and 4 in NFWPO.py.
  3. Use the following command to train NFWPO:
python3 NFWPO.py
  4. To run the other baseline methods, set the random seed arg_seed to a value between 0 and 4 in DDPG_Projection.py and DDPG_RewardShaping.py, and set random_seed in DDPG_OptLayer.py. Then run the corresponding command:
python3 DDPG_Projection.py
python3 DDPG_RewardShaping.py
python3 DDPG_OptLayer.py
  5. The results are shown in Figure 3.

MUJOCO CONTINUOUS CONTROL TASKS

To run the experiments mentioned in Section 4.3, first enter the directory for the task you want to run: Reacher for Reacher with nonlinear constraints, or Halfcheetah-State for Halfcheetah with state-dependent constraints.

cd Reacher
cd Halfcheetah-State

To run NFWPO, DDPG+Projection, DDPG+Reward Shaping, DDPG+OptLayer, please refer to the description in the previous experiment.

To run SAC+Projection, use the following command:

python3 SAC_Projection.py

To run TRPO+Projection, PPO+Projection:

# For Halfcheetah-state task
python3 PPO_TRPO_Projection/PPO_Projection_Halfcheetah_State_Relate_gym.py 
python3 PPO_TRPO_Projection/TRPO_Projection_Halfcheetah_State_Relate_gym.py

# For Reacher task
python3 PPO_TRPO_Projection/PPO_Projection_Reacher_State_Relate_gym.py 
python3 PPO_TRPO_Projection/TRPO_Projection_Reahcer_State_Relate_gym.py

To run FOCOPS:

# For Halfcheetah-state task
python3 FOCOPS/focops_main_cheetah.py

# For Reacher task
python3 FOCOPS/focops_main_reacher.py

The results are shown in Figures 4-5.

ADDITIONAL EXPERIMENT

To run the experiments mentioned in Appendix D.3, first enter the directory Halfcheetah-CAPG.

cd Halfcheetah-CAPG

Then run the following commands for corresponding baselines:

# CAPG+PPO
python3 CAPG_PPO_Halfcheetah_bound_constraints.py

# CAPG+TRPO
python3 CAPG_TRPO_Halfcheetah_bound_constraints.py

The result is shown in Figure 6.
