AMMI-RL

RL Implementation for Continuous Control

This project was initiated in the RL course of Fall 2021 at The African Master's in Machine Intelligence (AMMI) as a course project, in which we implemented the SAC algorithm (Haarnoja et al.) for continuous control tasks. It is now an open project in which we design code bases and benchmarks for RL algorithms to ease the development of new algorithms. We build this repo on existing repositories as well as the original papers, aiming for better general implementations of a selected set of algorithms.

Algorithms

Algorithms we have re-implemented or plan to re-implement:

Algorithm	Model-based	Value	On-policy	MPC	Progress	Reference
VPG	False	V(GAE)	True	False	🟢	Sutton et al., 1999
NPG	False	V(GAE)	True	False	🔴	Kakade, 2001
PPO	False	V(GAE)	True	False	🟢	Schulman et al., 2017
SAC	False	2xQ	False	False	🟢	Haarnoja et al., 2018
PETS	True	None	None	True	🔴	Chua et al., 2018
MB-PPO	True	V(GAE)	True	False	🟢	Similar to Rajeswaran et al., 2020
MB-SAC	True	2xQ	False	False	🟢	Janner et al., 2019
MOVOQ	True	N/A	N/A	N/A	🟡	N/A
MoPAC	True	2xQ	False	True	🟣	Morgan et al., 2021
MPC-SAC	True	V(GAE)/2xQ	False	True	🔴	Omer et al., 2021

🟢 Done || 🟡 Now || 🟣 Next || 🔴 No plan
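The Value column names two common critic constructions. "V(GAE)" is a state-value critic whose advantages are computed with Generalized Advantage Estimation. "2xQ" means two Q-networks whose minimum forms the soft Bellman target, as in SAC.

The sketch below illustrates both under our own assumptions; it is not the code in this repo. The discount `gamma`, the GAE parameter `lam`, the temperature `alpha`, and the convention that `dones[t] = 1.0` marks a terminal transition are illustrative choices. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[float],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation over one rollout of length T.

    values[t] is V(s_t); dones[t] == 1.0 means the episode ended after step t;
    last_value is V(s_T), used to bootstrap the step after the rollout.
    Returns (advantages, value targets), where targets = advantages + values.
    """
    T = len(rewards)
    advantages = [0.0] * T
    running_adv = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = delta_t + gamma * lam * A_{t+1}, reset at episode boundaries
        running_adv = delta + gamma * lam * not_done * running_adv
        advantages[t] = running_adv
        next_value = values[t]
    targets = [a + v for a, v in zip(advantages, values)]
    return advantages, targets


def double_q_target(
    reward: float,
    done: float,
    q1_next: float,
    q2_next: float,
    logp_next: float,
    gamma: float = 0.99,
    alpha: float = 0.2,
) -> float:
    """SAC-style soft target using the minimum of two Q estimates.

    q1_next, q2_next: target-network values Q_i(s', a') with a' ~ pi(.|s');
    logp_next: log pi(a'|s'); alpha: entropy temperature.
    """
    soft_next_value = min(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * (1.0 - done) * soft_next_value
```

Taking the minimum of the two Q estimates counteracts the overestimation bias that a single bootstrapped critic tends to accumulate.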

Generalized Network Hyperparameters

We aim to fine-tune our implementations to work with a generalized set of hyperparameters across different algorithms. In the meantime, we are working with the following hyperparameters:

☑️	Network	Architecture	Activation	Learning rate	MFOV	MFOQ	MBOV	MBOQ
	Policy	[2x128]	Tanh	3e-4	🟩	🟨	🟩	🟥
	Policy	[2x256]	ReLU	3e-4	🟥	🟩	⬜️	🟩
✅	Policy	[2x256]	PReLU	3e-4	🟩	🟩	🟩	🟦
	V	[2x128]	Tanh	1e-3	🟩	⬜️	🟩	⬜️
	V	[2x128]	PReLU	1e-3	🟩	⬜️	🟨	⬜️
✅	V	[2x256]	PReLU	3e-4	🟩	⬜️	🟩	⬜️
	Q	[2x256]	ReLU	3e-4	⬜️	🟩	⬜️	🟩
✅	Q	[2x256]	PReLU	3e-4	⬜️	🟩	⬜️	🟦
✅	V-Model	[2x512]	ReLU	1e-3	⬜️	⬜️	🟩	🟥
✅	Q-Model	[4x200]	Swish	3e-4	⬜️	⬜️	🟥	🟩

🟩 Best || 🟨 Good || 🟥 Bad || 🟦 In progress
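The rows marked ✅ can be read as plain MLP specifications. The PyTorch sketch below is one way to build them, under these assumptions:

- `[2x256]` means two hidden layers of 256 units followed by a linear output layer.
- Adam is the optimizer.
- Swish corresponds to `nn.SiLU`.

The helper name `mlp`, the Hopper-like input sizes, and the output heads are hypothetical choices for illustration. The heads used here are mean and log-std for the policy, and next state plus reward for the models.

```python
import torch
import torch.nn as nn


def mlp(in_dim: int, out_dim: int, n_hidden: int, width: int, act: type) -> nn.Sequential:
    """MLP with `n_hidden` hidden layers of size `width` (e.g. [2x256] -> 2, 256)."""
    layers = []
    d = in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, width), act()]  # nn.PReLU() learns one slope per layer by default
        d = width
    layers.append(nn.Linear(d, out_dim))  # linear output head, no activation
    return nn.Sequential(*layers)


# Hypothetical sizes, matching Hopper-v2 (11-dim observation, 3-dim action).
obs_dim, act_dim = 11, 3

# The ✅ rows of the table; output heads are illustrative assumptions.
policy = mlp(obs_dim, 2 * act_dim, 2, 256, nn.PReLU)             # e.g. Gaussian mean and log-std
v_net = mlp(obs_dim, 1, 2, 256, nn.PReLU)                        # state value V(s)
q_net = mlp(obs_dim + act_dim, 1, 2, 256, nn.PReLU)              # action value Q(s, a)
v_model = mlp(obs_dim + act_dim, obs_dim + 1, 2, 512, nn.ReLU)   # dynamics model: next state + reward
q_model = mlp(obs_dim + act_dim, obs_dim + 1, 4, 200, nn.SiLU)   # Swish == SiLU in PyTorch

# Learning rates from the table; the choice of Adam is an assumption.
optimizers = {
    "policy": torch.optim.Adam(policy.parameters(), lr=3e-4),
    "v": torch.optim.Adam(v_net.parameters(), lr=3e-4),
    "q": torch.optim.Adam(q_net.parameters(), lr=3e-4),
    "v_model": torch.optim.Adam(v_model.parameters(), lr=1e-3),
    "q_model": torch.optim.Adam(q_model.parameters(), lr=3e-4),
}
```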

Experiments and Results

In the following, we evaluate our code on the environments below. GIFs can be downloaded from this Google Drive folder. Results are averaged across 3 random seeds and smoothed with an exponential moving average (EMA) with a smoothing factor of 0.75.
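The seed averaging and EMA smoothing described above can be sketched in a few lines of Python. This is not the repo's plotting code. It assumes that 0.75 is the weight on the previous smoothed value, similar to TensorBoard-style smoothing, and that every seed's curve is logged at the same steps. The function names are hypothetical.

```python
from typing import List, Sequence


def average_over_seeds(curves: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise mean of several equal-length learning curves (one per seed)."""
    n = len(curves)
    return [sum(vals) / n for vals in zip(*curves)]


def ema_smooth(values: Sequence[float], weight: float = 0.75) -> List[float]:
    """EMA: s_t = weight * s_{t-1} + (1 - weight) * x_t, starting from s_0 = x_0."""
    smoothed = []
    last = values[0]
    for x in values:
        last = weight * last + (1.0 - weight) * x
        smoothed.append(last)
    return smoothed


if __name__ == "__main__":
    # Three hypothetical seeds' returns at the same three logging steps.
    seeds = [[0.0, 4.0, 8.0], [2.0, 6.0, 10.0], [1.0, 5.0, 9.0]]
    print(ema_smooth(average_over_seeds(seeds)))  # [1.0, 2.0, 3.75]
```

With a weight of 0.75, each point keeps 75% of the running average and takes 25% from the new value. Larger weights therefore give smoother but more lagged curves.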

Locomotion Tasks

Hopper-v2	Walker2d-v2

HalfCheetah-v2	Ant-v2

Manipulation Tasks

DClaw Valve Turning	ShadowHand Cube Re-orientation

How to use this code

Installation

Ubuntu 20.04

Move into the AMMI-RL/ directory, then run the following:

conda create -n ammi-rl python=3.8

conda activate ammi-rl

pip install -e .

pip install numpy torch wandb gym
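You may want to confirm that the packages installed above are visible inside the activated ammi-rl environment. A small hypothetical helper script (not part of the repo) can look them up with `importlib.util.find_spec`. Looking up the module spec instead of importing it avoids triggering mujoco-py's build step, which runs on first import.

```python
import importlib.util
from typing import Dict, Iterable

REQUIRED = ["numpy", "torch", "wandb", "gym"]
OPTIONAL = ["mujoco_py"]  # only needed for MuJoCo locomotion and ShadowHand tasks


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level package name to True if it is importable in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED + OPTIONAL).items():
        tag = "optional" if name in OPTIONAL else "required"
        print(f"{name:10s} ({tag}): {'found' if ok else 'MISSING'}")
```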

If you want to run the MuJoCo locomotion tasks or ShadowHand, install MuJoCo first (it's open sourced until 31st Oct), then install mujoco-py:

sudo apt-get install ffmpeg

pip install -U 'mujoco-py<2.1,>=2.0'

If you are using a local NVIDIA GPU and want to record MuJoCo environments (see the related issue), run:

unset LD_PRELOAD

MacOS

Move into the AMMI-RL/ directory, then run the following:

conda create -n ammi-rl python=3.8

conda activate ammi-rl

pip install -e .

pip install numpy torch wandb gym

If you want to run the MuJoCo locomotion tasks or ShadowHand, install MuJoCo first (it's open sourced until 31st Oct), then install mujoco-py:

brew install ffmpeg gcc

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin

pip install -U 'mujoco-py<2.1,>=2.0'

If you are using a local NVIDIA GPU and want to record MuJoCo environments (see the related issue), run:

unset LD_PRELOAD

Run an experiment

Move into the AMMI-RL/ directory, then run:

python experiment.py -cfg <cfg_file> -seed <int>

where <cfg_file> is the name of the configuration file without the .py extension.

For example:

python experiment.py -cfg sac_hopper -seed 1
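The reported results are averaged over 3 random seeds, so a small hypothetical launcher can run the command above once per seed. It uses only the `-cfg` and `-seed` flags shown here. The seed values 1, 2, 3 and the sequential execution are assumptions. Run it from the AMMI-RL/ directory.

```python
import subprocess
import sys
from typing import List, Sequence


def build_command(cfg: str, seed: int) -> List[str]:
    """Command line for one run, using only the documented -cfg and -seed flags."""
    return [sys.executable, "experiment.py", "-cfg", cfg, "-seed", str(seed)]


def run_seeds(cfg: str, seeds: Sequence[int] = (1, 2, 3), dry_run: bool = True) -> List[List[str]]:
    """Run one experiment per seed, sequentially; with dry_run=True only print the commands."""
    commands = [build_command(cfg, s) for s in seeds]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a run fails
    return commands


if __name__ == "__main__":
    # Set dry_run=False to actually launch the runs.
    run_seeds("sac_hopper")
```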

Evaluate an Agent

To evaluate a saved policy model, run the following command:

python evaluate_agent.py -env <env_name> -alg <alg_name> -seed <int> -EE <int>

For example:

python evaluate_agent.py -env Walker2d-v2 -alg SAC -seed 1 -EE 5
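Evaluating a policy generally means rolling out the trained policy for a few episodes and reporting the average return. The sketch below shows that loop in isolation; it is not `evaluate_agent.py`.

`ToyEnv` is a hypothetical stand-in that follows the classic gym API used by gym releases before 0.26: `reset()` returns an observation and `step()` returns `(obs, reward, done, info)`. The deterministic policy passed in (for SAC, for instance, the mean action) is also an assumption.

```python
from typing import Callable, List


class ToyEnv:
    """Hypothetical stand-in env with the classic gym API (pre-0.26)."""

    def __init__(self, horizon: int = 5):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return 0.0

    def step(self, action: float):
        self.t += 1
        reward = -abs(action - 1.0)  # the best action is 1.0
        done = self.t >= self.horizon
        return float(self.t), reward, done, {}


def evaluate(env, policy: Callable[[float], float], n_episodes: int) -> List[float]:
    """Roll out `policy` for `n_episodes` and return each episode's undiscounted return."""
    returns = []
    for _ in range(n_episodes):
        obs, done, ep_return = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            ep_return += reward
        returns.append(ep_return)
    return returns


if __name__ == "__main__":
    rets = evaluate(ToyEnv(), policy=lambda obs: 0.5, n_episodes=3)
    print(sum(rets) / len(rets))  # mean evaluation return: -2.5
```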

AMMI-RL Team

(in alphabetical order by last name) | Contribution

Rami Ahmed | VPG, PPO, SAC, MB{PPO, SAC}
Wafaa Mohammed | SAC
Ruba Mutasim | SAC
MohammedElfatih Salah | SAC

AMMI-RL Advisors

Bilal Piot, Corentin Tallec and Florian Strub (during the RL course, Fall 2021)
Vlad Mnih, Eszter Vértes and Theophane Weber (during Rami's AMMI project)

Acknowledgement

This repo was inspired by many great repos, mostly the following ones (not necessarily in order):
