Critic As a Lyapunov Function: A Reinforcement Learning approach with Guaranteed Environment Stability
This repository contains the code for reproducing the experiments for the paper "Critic As a Lyapunov Function: A Reinforcement Learning approach with Guaranteed Environment Stability".
The code was developed and tested with Python 3.9.16 on Ubuntu 20 and 22. We strongly advise running all experiments with this Python version.
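As a quick sanity check before running anything, the interpreter version can be compared against the tested one. The snippet below is a minimal sketch and not part of the repository; the names `TESTED_VERSION` and `matches_tested_version` are hypothetical and introduced only for illustration.

```python
import sys

# Version this README states the code was developed and tested with.
TESTED_VERSION = (3, 9, 16)


def matches_tested_version(version_info=None):
    """Return True if major.minor.micro equals the tested version.

    `version_info` defaults to the running interpreter's sys.version_info;
    passing a tuple makes the check easy to try out by hand.
    """
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:3]) == TESTED_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if matches_tested_version():
        print(f"Python {current}: matches the tested version.")
    else:
        print(f"Python {current}: differs from the tested 3.9.16.")
```

Running it inside the activated environment shows whether the environment actually picked up Python 3.9.16.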
We recommend running the experiments in a virtual environment. Our core team uses pyenv to manage virtual environments, and a brief installation guide is provided here; for details, we strongly recommend referring to the original pyenv README. Alternatively, you can create the virtual environment with venv, as described after the pyenv steps. Use whichever you prefer.
This guide is a summary of the original pyenv README and works on Ubuntu.
- Install dependencies
sudo apt install build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
- Run
curl https://pyenv.run | bash
- If you use zsh, execute the following commands:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init -)"' >> ~/.zshrc
exec zsh
Or, if you use bash, execute:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
We strongly recommend referring to the pyenv README for details.
- Restart your terminal
- cd to the root of the repo
- Run
pyenv install 3.9.16
to install the specific Python version.
- Run
pyenv virtualenv 3.9.16 env-name-you-want
- Run
pyenv local env-name-you-want
Alternatively, you can create the environment with venv. If you don't have python3.9-venv, install it via
sudo apt install python3.9-venv
Then create and activate the environment via
python3.9 -m venv env
source env/bin/activate
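Before running experiments, install the requirements from the root of the repo and change to the playground directory. The PYTHONPATH environment variable is set inline in each command below.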
pip install -r requirements.txt --no-cache-dir
cd playground
Below we show how to run the algorithms from the manuscript on all the environments.
Each run generates a dedicated folder where all the run artifacts (observations, total objectives, etc.) are stored.
This folder is created in playground/multirun.
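To find the most recent run artifacts, the output folder can be listed by modification time. The helper below is a minimal sketch, not part of the repository: `newest_subdirs` is a hypothetical name, and the internal layout of playground/multirun is not assumed, so only its top-level entries are listed.

```python
from pathlib import Path


def newest_subdirs(root="multirun", limit=5):
    """Return up to `limit` immediate subdirectories of `root`, newest first.

    Returns an empty list if `root` does not exist, for example before
    the first experiment has been run.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    subdirs = [entry for entry in root_path.iterdir() if entry.is_dir()]
    # Sort by last modification time, most recently modified first.
    subdirs.sort(key=lambda entry: entry.stat().st_mtime, reverse=True)
    return subdirs[:limit]


if __name__ == "__main__":
    # Assumes the script is started from the playground directory.
    for run_dir in newest_subdirs():
        print(run_dir)
```

The nesting below the top level may differ between setups, so inspect the printed folders by hand to locate the files for a particular run.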
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=2tank +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=3wrobot_ni +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=inv_pendulum +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=kin_point +seed=6
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=cartpole +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=sarsa system=lunar_lander +seed=5
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=2tank +seed=4
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=3wrobot_ni +seed=11
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=inv_pendulum +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=kin_point +seed=6
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=cartpole +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=dqn system=lunar_lander +seed=3
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=acpg system=2tank +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=acpg system=3wrobot_ni scenario=episodic_reinforce +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=acpg system=inv_pendulum scenario=episodic_reinforce +seed=4
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py scenario=episodic_reinforce controller/actor/model=acpg_kin_point_elem_wise controller=acpg system=kin_point +seed=2
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py scenario=episodic_reinforce controller=acpg system=cartpole +seed=4
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=acpg system=lunar_lander scenario=episodic_reinforce +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=2tank scenario=episodic_reinforce +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py scenario=episodic_reinforce controller=ddpg system=3wrobot_ni +seed=18
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=inv_pendulum scenario=episodic_reinforce +seed=3
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=kin_point scenario=episodic_reinforce +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=cartpole scenario=episodic_reinforce +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=ddpg system=lunar_lander scenario=episodic_reinforce +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=2tank +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=3wrobot_ni +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=kin_point +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=lunar_lander +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=cartpole +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py controller=rpo system=inv_pendulum +seed=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=calf_ex_post initial_conditions=ic_kin_point_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=calf_ex_post initial_conditions=ic_inv_pendulum_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=calf_ex_post initial_conditions=ic_3wrobot_ni_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=calf_ex_post initial_conditions=ic_2tank_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-1 python preset_cartpole.py controller=calf_ex_post system=cartpole +controller.safe_only=True +seed=1 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=calf_ex_post initial_conditions=ic_lunar_lander_stochastic +controller.safe_only=True +seed=1 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=calf_ex_post initial_conditions=ic_2tank_stochastic +seed=14 controller.actor.predictor.prediction_horizon=0
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=calf_predictive initial_conditions=ic_3wrobot_ni_stochastic +seed=1 controller.actor.predictor.prediction_horizon=0
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=calf_ex_post initial_conditions=ic_inv_pendulum_stochastic +seed=2 controller.actor.predictor.prediction_horizon=0
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=calf_predictive initial_conditions=ic_kin_point_stochastic +seed=1 controller.actor.predictor.prediction_horizon=0
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=calf_ex_post initial_conditions=ic_lunar_lander_stochastic +seed=1 controller.actor.predictor.prediction_horizon=0
PYTHONPATH=$(pwd)/src-1 python preset_cartpole.py system=cartpole controller=calf_ex_post
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=cartpole controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=2 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=cartpole controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=5 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=2tank controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=3wrobot_ni controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=inv_pendulum controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=kin_point controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=lunar_lander controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1
PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py system=cartpole controller=mpc +seed=1 controller.actor.predictor.prediction_horizon=8 scenario.N_episodes=1
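The MPC commands above follow a regular pattern: six systems, each run with prediction horizons 2, 5 and 8. The following sketch generates the same command lines programmatically. It is a convenience illustration, not a script shipped with the repository; `mpc_commands`, `SYSTEMS` and `HORIZONS` are hypothetical names.

```python
# Systems and horizons exactly as they appear in the MPC commands above.
SYSTEMS = ("2tank", "3wrobot_ni", "inv_pendulum", "kin_point", "lunar_lander", "cartpole")
HORIZONS = (2, 5, 8)


def mpc_commands(systems=SYSTEMS, horizons=HORIZONS, seed=1):
    """Build one shell command string per (horizon, system) pair."""
    commands = []
    for horizon in horizons:
        for system in systems:
            commands.append(
                "PYTHONPATH=$(pwd)/src-2 python preset_endpoint.py "
                f"system={system} controller=mpc +seed={seed} "
                f"controller.actor.predictor.prediction_horizon={horizon} "
                "scenario.N_episodes=1"
            )
    return commands


if __name__ == "__main__":
    # Print the commands; they are meant to be run from the playground directory.
    print("\n".join(mpc_commands()))
```

The output can be redirected to a file and executed with a shell, which avoids copying the eighteen lines by hand.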