Human intervention reinforcement learning

Code for the final project of the Reinforcement Learning 2021 course at Skoltech. It is a modification of the research code for the paper "Trial without Error: Towards Safe Reinforcement Learning via Human Intervention" (arXiv, 2017).

Overview

This repository contains the code for human intervention reinforcement learning in Atari environments (based on OpenAI's Gym). The humanrl package contains various Gym environment wrappers and utilities that allow modifying Atari environments to include catastrophes.
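As a rough illustration of what a catastrophe wrapper does, here is a minimal sketch, not the humanrl implementation. It assumes the old Gym API (reset() returns an observation, step() returns a 4-tuple) and a hypothetical is_catastrophe labeller, which could be a hand-coded rule or a trained classifier. CatastropheWrapper and CounterEnv are illustrative names only.

```python
from typing import Any, Callable, Dict, Tuple


class CatastropheWrapper:
    """Penalize catastrophic observations in a Gym-style env (old 4-tuple step API).

    is_catastrophe is a hypothetical labeller: any callable mapping an
    observation to True/False (a hand-coded rule or a trained classifier).
    """

    def __init__(self, env, is_catastrophe: Callable[[Any], bool], catastrophe_reward: float = -1.0):
        self.env = env
        self.is_catastrophe = is_catastrophe
        self.catastrophe_reward = catastrophe_reward
        self.num_catastrophes = 0

    def reset(self):
        return self.env.reset()

    def step(self, action) -> Tuple[Any, float, bool, Dict[str, Any]]:
        obs, reward, done, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is not mutated
        info["catastrophe"] = bool(self.is_catastrophe(obs))
        if info["catastrophe"]:
            self.num_catastrophes += 1
            reward += self.catastrophe_reward  # add the (negative) penalty
        return obs, reward, done, info


class CounterEnv:
    """Toy env: the observation is a position that grows by `action` each step."""

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action
        return self.pos, 1.0, self.pos >= 5, {}


# Treat positions >= 3 as catastrophes.
env = CatastropheWrapper(CounterEnv(), lambda obs: obs >= 3, catastrophe_reward=-1.0)
env.reset()
rewards = [env.step(1)[1] for _ in range(4)]
print(rewards)               # [1.0, 1.0, 0.0, 0.0]
print(env.num_catastrophes)  # 2
```

Passing catastrophe_reward=0 leaves rewards unchanged, analogous to the --catastrophe_reward 0 run in the Training section.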

Compared to the original code of the paper, this repository has the following updates:

  • Numerous updates to the Docker image so that it runs.
  • Modified humanrl/pong_catastrophe.py to implement a two-sided, symmetric catastrophe for the Pong environment.
  • New code for the Freeway Atari environment, spread over several files. The main ones are humanrl/freeway_catastrophe.py and universe_starter_agent/envs.py.

scripts/human_feedback.py is a script that allows a human to intervene during offline or online training of an RL agent.

Installation and use

First, you need to prepare the Docker container. To do so, install Docker: https://docs.docker.com/engine/installation/

To build the two Docker images and start a container:

docker build -t base_new -f base.docker .
docker build -t main -f main.docker .

On Ubuntu:

docker run --name=human-rl -t -i -v /var/run/docker.sock:/var/run/docker.sock --net=host -v `pwd`:/mnt/human-rl/ main

On OS X (works on 10.12.2):

docker run --privileged -p 5901:5900 -v /usr/bin/docker:/usr/bin/docker -v /var/run/docker.sock:/var/run/docker.sock -v `pwd`:/mnt/human-rl -e DOCKER_NET_HOST=172.17.0.1 -t -i main
open vnc://localhost:5901

This launches a command-line session inside the Docker container.

To restart the container later:

docker start human-rl

docker attach human-rl

(Note: the -v /var/run/docker.sock:/var/run/docker.sock --net=host options are necessary to allow universe to use automatic remotes. This may not work outside of Ubuntu. In that case, you may need to start universe remotes manually and point OpenAI Gym at them; see https://github.com/openai/universe/blob/master/doc/remotes.rst#how-to-start-a-remote)

The container also runs a VNC server on port 5900 (password: openai). To view Gym environments, run the training from the VNC session. To attach to the VNC session you need a VNC viewer; I used TigerVNC:

apt install tigervnc-viewer

Then attach to the VNC session (password: openai):

vncviewer localhost:5900

There you can watch the training and label episodes in order to train the blocker.

Training

Run the following pipeline steps in order.

No penalties/blocking (just saving frames)

To run A3C without any catastrophe penalties/blocking:

cd universe_starter_agent
python train.py --num-workers 4 --env-id Pong --log-dir $log_dir --catastrophe_reward 0

The script train.py starts the workers (and is not modified from the original). This calls worker.py which creates a gym env and runs A3C on the env. The script envs.py is where the env is constructed and where catastrophe wrappers are added. In the above command (where we didn't set a catastrophe_type argument), the only wrapper used is frame.FrameSaveWrapper, which just saves the frames.
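To make the frame-saving step concrete, here is a minimal sketch of a wrapper that records an episode's frames and writes them to disk when the episode ends. It is a hypothetical stand-in, not frame.FrameSaveWrapper itself: the class names, the pickle format and the episode_00000.pkl naming are assumptions for illustration, and it assumes the old Gym 4-tuple step API.

```python
import os
import pickle
import tempfile


class FrameSaveSketch:
    """Record every frame of an episode and pickle it when the episode ends.

    Hypothetical stand-in for frame.FrameSaveWrapper; the file name pattern
    episode_00000.pkl is an assumption made for illustration.
    """

    def __init__(self, env, save_dir):
        self.env = env
        self.save_dir = save_dir
        os.makedirs(save_dir, exist_ok=True)
        self.episode_index = 0
        self.frames = []

    def reset(self):
        obs = self.env.reset()
        self.frames = [obs]  # start a new episode with the initial frame
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        if done:
            path = os.path.join(self.save_dir, "episode_%05d.pkl" % self.episode_index)
            with open(path, "wb") as f:
                pickle.dump({"frames": self.frames}, f)
            self.episode_index += 1
        return obs, reward, done, info  # the env's outputs pass through unchanged


class CountdownEnv:
    """Toy env whose observation is the step count; episodes last `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= self.length, {}


save_dir = tempfile.mkdtemp()
env = FrameSaveSketch(CountdownEnv(length=3), save_dir)
env.reset()
done = False
while not done:
    _, _, done, _ = env.step(0)

with open(os.path.join(save_dir, "episode_00000.pkl"), "rb") as f:
    episode = pickle.load(f)
print(episode["frames"])  # [0, 1, 2, 3]
```

Because the wrapper only observes and never changes rewards or actions, the agent trains exactly as it would on the unwrapped env.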

Note: in envs.py, the function make_env converts 'Pong' to 'PongDeterministic-v3' and does the same for the other games. The deterministic versions are easier for the agent to learn.
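The name conversion described in the note amounts to a simple string mapping. The helper below is a hypothetical sketch of that mapping, not the actual make_env code; the pass-through for names that already contain a version suffix is an assumption added for convenience.

```python
def deterministic_env_id(game: str, version: str = "v3") -> str:
    """Map a short game name such as 'Pong' to its deterministic Gym id."""
    if "-v" in game:
        return game  # already a full id, leave it unchanged
    return "{}Deterministic-{}".format(game, version)


print(deterministic_env_id("Pong"))     # PongDeterministic-v3
print(deterministic_env_id("Freeway"))  # FreewayDeterministic-v3
```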

Label frames

This step is needed only if you use human labelling. Start vncviewer: vncviewer localhost:5900 (password: openai).

Start labelling mode: python scripts/human_feedback.py --label_mode block -i 2 -f $log_dir/episodes -o $log_dir/labels

Then press 'b' to block an action.

Train classifier and blocker

These scripts train the catastrophe classifier and the blocker on heuristically labelled data. You can modify the code to use human-labelled data as training data instead.

python train/pong_classifier1.py
python train/pong_blocker1.py
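The real scripts train TensorFlow networks on frames. The toy sketch below swaps in a much simpler technique, a one-dimensional threshold rule fitted by minimizing training errors, only to show the data flow: labelled (feature, is_catastrophe) pairs go in, a predicate comes out. The feature (for example a paddle position) and fit_threshold are hypothetical and not part of this repository.

```python
from typing import Callable, List, Tuple


def fit_threshold(examples: List[Tuple[float, bool]]) -> Callable[[float], bool]:
    """Learn the rule `x >= t` with the fewest training errors.

    examples: (feature, is_catastrophe) pairs, e.g. from heuristic or human labels.
    """
    # Candidate thresholds: every observed value, plus +inf ("never a catastrophe").
    candidates = sorted({x for x, _ in examples}) + [float("inf")]
    best_t, best_errors = candidates[0], len(examples) + 1
    for t in candidates:
        errors = sum((x >= t) != label for x, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return lambda x: x >= best_t


labels = [(0.1, False), (0.2, False), (0.7, True), (0.9, True)]
classifier = fit_threshold(labels)
print(classifier(0.8), classifier(0.5))  # True False
```

A predicate like this can be plugged in wherever a labeller or blocker callable is expected; the real classifiers play the same role with far richer inputs.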

Penalties using catastrophe labeller

Penalties can be provided either by a hand-coded labeller or by a TF classifier. To use the trained blocker and catastrophe classifiers, we run the following (the example uses --env-id Freeway; set it to the game the classifiers were trained on):

cd universe_starter_agent
python train.py --num-workers=4 --env-id Freeway --log-dir $log_dir --catastrophe_reward -1 --blocker_file $blocker_file --classifier_file $classifier_file --catastrophe_type 1 --blocking_mode action_replacement
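The action_replacement blocking mode can be pictured with the minimal sketch below, which is not the code in universe_starter_agent. Assumptions: should_block is a hypothetical stand-in for the trained blocker (a callable on the current observation and the proposed action), safe_action is a hypothetical fixed replacement action, and the catastrophe_reward penalty is added whenever an action is blocked, in the spirit of the paper's idea of penalizing blocked actions. It uses the old Gym 4-tuple step API.

```python
from typing import Any, Callable


class BlockingWrapper:
    """Replace blocked actions with a safe action and penalize the attempt."""

    def __init__(self, env, should_block: Callable[[Any, Any], bool], safe_action, catastrophe_reward: float = -1.0):
        self.env = env
        self.should_block = should_block  # hypothetical blocker(obs, action) -> bool
        self.safe_action = safe_action    # hypothetical safe replacement action
        self.catastrophe_reward = catastrophe_reward
        self.last_obs = None

    def reset(self):
        self.last_obs = self.env.reset()
        return self.last_obs

    def step(self, action):
        blocked = bool(self.should_block(self.last_obs, action))
        if blocked:
            action = self.safe_action  # the unsafe action never reaches the env
        obs, reward, done, info = self.env.step(action)
        info = dict(info)
        info["blocked"] = blocked
        if blocked:
            reward += self.catastrophe_reward  # penalty for the attempted action
        self.last_obs = obs
        return obs, reward, done, info


class LineEnv:
    """Toy env: the position moves by `action` (-1, 0 or +1); reward is always 0."""

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action
        return self.pos, 0.0, False, {}


# Block any move to the right once the position has reached 2; stay put instead.
env = BlockingWrapper(LineEnv(), lambda obs, a: obs >= 2 and a > 0, safe_action=0)
env.reset()
results = [env.step(1) for _ in range(3)]
print([r[0] for r in results])  # [1, 2, 2]
print([r[1] for r in results])  # [0.0, 0.0, -1.0]
```

Replacing the action (rather than ending the episode) lets the agent keep exploring while still learning from the penalty that the attempted action was unsafe.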

Other info

See the human feedback README for directions on providing human feedback with the OpenAI universe starter agent.

See the catastrophe wrapper for a general purpose way to add catastrophes to Gym environments.

Contributors

dfrolova, gsastry
