
Critic As a Lyapunov Function: A Reinforcement Learning approach with Guaranteed Environment Stability

This repository contains the code for reproducing the experiments for the paper "Critic As a Lyapunov Function: A Reinforcement Learning approach with Guaranteed Environment Stability".

Setting the environment

The code was developed and tested with Python 3.9.16 on Ubuntu 20/22. We strongly advise running all experiments with this Python version.
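A quick way to confirm that the active interpreter is the tested one is sketched below. It is only an illustration: the helper name `matches_tested_version` is hypothetical and not part of this repository.

```python
import sys

# Version this README states the code was developed and tested with.
TESTED_VERSION = (3, 9, 16)


def matches_tested_version(version_info=sys.version_info, tested=TESTED_VERSION):
    """Return True when major.minor.micro equals the tested version."""
    return tuple(version_info[:3]) == tuple(tested)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if matches_tested_version():
        print(f"Python {current}: matches the tested version.")
    else:
        print(f"Python {current}: differs from the tested 3.9.16.")
```

Running it inside the environment you plan to use for the experiments shows whether the interpreter differs from the tested one.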

We recommend running the experiments in a virtual environment. Our core team uses pyenv to manage virtual environments, and a brief installation guide follows; we strongly recommend referring to the original pyenv README for details. Alternatively, you can create a virtual environment manually, as described in the section on manual creation. Use whichever you prefer.

Installing pyenv

This tutorial summarizes the original pyenv README and works on Ubuntu.

  1. Install dependencies

sudo apt install build-essential libssl-dev zlib1g-dev \
	libbz2-dev libreadline-dev libsqlite3-dev curl \
	libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev

  2. Run curl https://pyenv.run | bash

  3. Execute the following commands if you use the zsh shell.

  echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
  echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
  echo 'eval "$(pyenv init -)"' >> ~/.zshrc
  exec zsh

Or, if you use the bash shell, execute

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc

We strongly recommend referring to the pyenv README for details.

  4. Restart your terminal

  5. cd to the root of the repo

  6. Run pyenv install 3.9.16 to install the required Python version.

  7. Run pyenv virtualenv 3.9.16 env-name-you-want

  8. Run pyenv local env-name-you-want

Manual creation of virtual environments

If you don't have python3.9-venv please install it via

    sudo apt install python3.9-venv

Then create the environment via

    python3.9 -m venv env
    source env/bin/activate
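Whichever route you choose, it can help to confirm that the interpreter is really running inside a virtual environment before installing the requirements. The sketch below compares `sys.prefix` with `sys.base_prefix`, which differ inside environments created by venv. The function name `in_virtual_environment` is hypothetical, and the check is assumed, not verified, to behave the same way for environments created through pyenv.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base prefix."""
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        # base_prefix points at the interpreter the environment was created from.
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate one first.")
```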

Run experiments

Before running experiments, install the requirements from the root of the repo and change to the playground directory

pip install -r requirements.txt --no-cache-dir 
cd playground

Below we show how to run the algorithms from the manuscript on all the environments. Every run generates its own folder, where all the run artifacts are stored (observations, total objectives, etc.). This folder is created in playground/multirun.
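To find the artifacts of the latest run, one option is to pick the most recently modified folder under playground/multirun. The sketch below does only that. The internal layout of the run folders is not described here, so the helper (a hypothetical `latest_run_dir`) looks at direct subdirectories and makes no assumption about the files inside them.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(multirun_root: Path) -> Optional[Path]:
    """Return the most recently modified directory directly under multirun_root."""
    if not multirun_root.is_dir():
        return None
    candidates = [path for path in multirun_root.iterdir() if path.is_dir()]
    if not candidates:
        return None
    # Newest modification time wins.
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    # Path is relative to the root of the repo.
    print(latest_run_dir(Path("playground/multirun")))
```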

SARSA

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=2tank +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=3wrobot_ni +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=inv_pendulum +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=kin_point +seed=6

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=cartpole +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=lunar_lander +seed=5

DQN

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=2tank +seed=4

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=3wrobot_ni +seed=11

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=inv_pendulum +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=kin_point +seed=6

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=cartpole +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=lunar_lander +seed=3

SDPG

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=acpg system=2tank +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=acpg system=3wrobot_ni scenario=episodic_reinforce +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=acpg system=inv_pendulum scenario=episodic_reinforce +seed=4

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py scenario=episodic_reinforce controller/actor/model=acpg_kin_point_elem_wise controller=acpg system=kin_point +seed=2

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py scenario=episodic_reinforce controller=acpg system=cartpole +seed=4

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=acpg system=lunar_lander scenario=episodic_reinforce +seed=1

DDPG

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=2tank scenario=episodic_reinforce +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py scenario=episodic_reinforce controller=ddpg system=3wrobot_ni +seed=18

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=inv_pendulum scenario=episodic_reinforce +seed=3

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=kin_point scenario=episodic_reinforce +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=cartpole scenario=episodic_reinforce +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=lunar_lander scenario=episodic_reinforce +seed=1

RPO

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=2tank +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=3wrobot_ni +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=kin_point +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=lunar_lander +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=cartpole +seed=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=inv_pendulum +seed=1

CALF Stabilizing Policy

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=calf_ex_post initial_conditions=ic_kin_point_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=calf_ex_post initial_conditions=ic_inv_pendulum_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=calf_ex_post initial_conditions=ic_3wrobot_ni_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=calf_ex_post initial_conditions=ic_2tank_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-1 python preset_cartpole.py controller=calf_ex_post system=cartpole +controller.safe_only=True +seed=1 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=calf_ex_post initial_conditions=ic_lunar_lander_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1

CALF

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=calf_ex_post initial_conditions=ic_2tank_stochastic +seed=14 controller.actor.predictor.prediction_horizon=0

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=calf_predictive initial_conditions=ic_3wrobot_ni_stochastic +seed=1 controller.actor.predictor.prediction_horizon=0

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=calf_ex_post initial_conditions=ic_inv_pendulum_stochastic +seed=2 controller.actor.predictor.prediction_horizon=0

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=calf_predictive initial_conditions=ic_kin_point_stochastic +seed=1 controller.actor.predictor.prediction_horizon=0 

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=calf_ex_post initial_conditions=ic_lunar_lander_stochastic +seed=1 controller.actor.predictor.prediction_horizon=0

PYTHONPATH=$(pwd)/src-1 python preset_cartpole.py system=cartpole controller=calf_ex_post 

MPC with horizon 2

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=cartpole controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1

MPC with horizon 5

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=cartpole controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1

MPC with horizon 8

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1

PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=cartpole controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1
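The MPC commands above differ only in the system name and the prediction horizon, so the full sweep can be generated, not typed. The sketch below builds the same 18 command strings and prints them; it does not execute anything. The function names `mpc_command` and `mpc_commands` are hypothetical helpers, not part of the repository.

```python
from itertools import product

# Systems and horizons exactly as they appear in the MPC commands above.
SYSTEMS = ["2tank", "3wrobot_ni", "inv_pendulum", "kin_point", "lunar_lander", "cartpole"]
HORIZONS = [2, 5, 8]


def mpc_command(system, horizon, seed=1):
    """Build one MPC command line for the given system and horizon."""
    return (
        "PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py "
        f"system={system} controller=mpc +seed={seed} "
        f"controller.actor.predictor.prediction_horizon={horizon} "
        "scenario.N_episodes=1"
    )


def mpc_commands(systems=SYSTEMS, horizons=HORIZONS):
    """Return commands grouped by horizon, then by system."""
    return [mpc_command(system, horizon) for horizon, system in product(horizons, systems)]


if __name__ == "__main__":
    for command in mpc_commands():
        print(command)
```

The printed lines can be pasted into a shell from the playground directory, or redirected to a file and run as a script.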
