DeepRL-Pong

A Deep Reinforcement Learning bot that plays Atari Pong, based on: https://arxiv.org/abs/1312.5602


(our agent is green)

Table of contents

  • Requirements
  • Quick start
  • Config
  • Running the code
  • Architecture
  • Training statistics
  • Saved models and Results
  • Status
  • Credits

Requirements

  • Python - version 3.8.10
  • gym[atari]
  • PyTorch - version 1.8.1
  • numpy
  • comet-ml

Quick start

Using Docker

You can run the environment using the provided Dockerfile:

docker compose up

After that, Jupyter Notebook will be available at localhost:7777.

Building the environment locally

You can also build the environment locally. All packages needed to run our code are listed in the requirements.txt file. It's convenient to use a virtual environment; the commands below use virtualenvwrapper.

which python3.8
mkvirtualenv -p <path to python3> <name>
workon <name>
pip install --upgrade pip
pip install -r requirements.txt

Download the Atari ROMs (https://github.com/openai/atari-py#roms):

unrar x Roms.rar
unzip ROMS.zip
python -m atari_py.import_roms ROMS
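
To verify that the ROMs were imported, you can check that Pong is now visible to atari-py. This is only a quick sanity check, assuming the atari-py package from the requirements above:

import atari_py

# After the ROM import, 'pong' should appear in the list of installed games.
print('pong' in atari_py.list_games())
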
CometML API key

To log training parameters to CometML you should also run:

export COMET_ML_API_KEY=your-comet-api-key

Config

We use the YAML config file config.yml. There you can set both network parameters and environment settings, such as the device on which you want to run the code or the CometML settings.

If you want to use a pre-trained model, set the LOAD_MODEL parameter to the name of a model dump (.pth) file in the models/saved_models directory, for example:

LOAD_MODEL: model_episode_5700.pth
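
A minimal sketch of how such a config might be consumed, assuming PyYAML and illustrative key names (the authoritative keys live in config.yml):

import yaml
import torch

# Key names below (DEVICE, LOAD_MODEL) are illustrative; check config.yml for the real ones.
with open("config.yml") as f:
    config = yaml.safe_load(f)

device = torch.device(config.get("DEVICE", "cpu"))

# If LOAD_MODEL is set, restore the saved weights from models/saved_models.
if config.get("LOAD_MODEL"):
    checkpoint = torch.load(f"models/saved_models/{config['LOAD_MODEL']}", map_location=device)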

Running the code

To test whether your environment is set up correctly, you can run a simple demo:

python gym_demo.py
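
The repository's gym_demo.py is not reproduced here; a minimal equivalent check, assuming the classic Gym API and the standard Pong environment id, looks roughly like this:

import gym

# Play one episode with random actions just to confirm that gym[atari] and the ROMs work.
env = gym.make("Pong-v0")
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
print("Pong episode finished, last reward:", reward)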

To run the training, modify the config.yml file and then run:

python main.py --mode train

To test your model, specify the LOAD_MODEL parameter as described above, and then run:

python main.py --mode test

You can also observe a single game played by your model by running:

python main.py --mode demo

Architecture

(architecture diagram)
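
The network follows the DQN from the paper linked above. A sketch of that architecture in PyTorch, with layer sizes taken from the paper (the model in this repo may differ in details):

import torch
import torch.nn as nn

class DQN(nn.Module):
    """Q-network from Mnih et al. 2013 (arXiv:1312.5602): 4 stacked 84x84 frames in, one Q-value per action out."""

    def __init__(self, n_actions: int = 6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 4x84x84 -> 16x20x20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 16x20x20 -> 32x9x9
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.head(self.conv(x))

# Greedy action selection, as used at test/demo time.
def act(model: DQN, state: torch.Tensor) -> int:
    with torch.no_grad():
        return int(model(state.unsqueeze(0)).argmax(dim=1).item())
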

Training statistics

During training, the model logs all useful statistics to CometML. You should set up the workspace, project name, tag, and experiment name in the config file. An example of the training statistics looks like this:

(CometML training statistics screenshot)

All metrics are visible here: https://www.comet.ml/thefebrin/deep-rl-pong/view/new
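
For reference, CometML logging typically looks like the following minimal sketch; the workspace, project, tag, and metric names below are placeholders, not necessarily the ones this repo uses:

import os
from comet_ml import Experiment

# The API key is read from the environment variable exported earlier.
experiment = Experiment(
    api_key=os.environ["COMET_ML_API_KEY"],
    workspace="your-workspace",     # placeholder
    project_name="deep-rl-pong",    # placeholder
)
experiment.add_tag("dqn")           # placeholder tag

# During training, per-episode metrics can be logged like this:
experiment.log_metric("score", -12.0, step=5700)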

Saved models and Results

We keep our trained models in models/saved_models.

  1. Our first model was trained with Adam.
(rl) febrin@laptop:~/Desktop/DeepRL-Pong(master)$ python eval_saved_model.py --model-name model_episode_5700.pth --n-games 100 --frame-skipping 4
Evaluating model: model_episode_5700.pth on 100 games.
Validating model...: 100%|████████████████████████████████████████████████████████████████████████| 100/100 [08:11<00:00,  4.92s/it]
Model: model_episode_5700.pth | n_games: 100 | Average score: -12.13 | Min: -19.0 | Max: -3.0
  2. Then we trained it for more games using SGD and gained slightly better results.
(rl) febrin@laptop:~/Desktop/DeepRL-Pong(master)$ python eval_saved_model.py --model-name model_episode_6350_sgd.pth --n-games 100 --frame-skipping 4
Evaluating model: model_episode_6350_sgd.pth on 100 games.
Validating model...: 100%|████████████████████████████████████████████████████████████████████████| 100/100 [07:39<00:00,  4.59s/it]
Model: model_episode_6350_sgd.pth | n_games: 100 | Average score: -11.83 | Min: -20.0 | Max: 2.0

We then changed the architecture: the DQN agent now consists of two networks, a standard online network and a target network (see the sketch at the end of this section). More info here: https://towardsdatascience.com/getting-an-ai-to-play-atari-pong-with-deep-reinforcement-learning-47b0c56e78ae

  3. Dual model after 4500 games.
(rl) febrin@laptop:~/Desktop/DeepRL-Pong(master)$ python eval_saved_model.py --model-name model_dual_4500.pth --n-games 100 --frame-skipping 3
Evaluating model: model_dual_4500.pth on 100 games.
Validating model...: 100%|████████████████████████████████████████████████████████████████████████| 100/100 [12:00<00:00,  7.21s/it]
Model: model_dual_4500.pth | n_games: 100 | Average score: -8.15 | Min: -19.0 | Max: 9.0
  4. The final dual model, after 6200 games, performed worse than the previous one. Although the win ratio is not the best, the model was playing about 1300-2200 steps per game.
(rl) febrin@laptop:~/Desktop/DeepRL-Pong(master)$ python eval_saved_model.py --model-name model_dual_6200.pth --n-games 100 --frame-skipping 3
Evaluating model: model_dual_6200.pth on 100 games.
Validating model...: 100%|████████████████████████████████████████████████████████████████████████| 100/100 [08:33<00:00,  5.13s/it]
Model: model_dual_6200.pth | n_games: 100 | Average score: -11.27 | Min: -17.0 | Max: -2.0
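
As mentioned above, the later (dual) models use a separate target network. A minimal sketch of the idea, with a tiny stand-in Q-network and an assumed discount factor (the repo's implementation may differ):

import copy
import torch
import torch.nn as nn

# Any Q-network works here; a tiny stand-in keeps the sketch short.
online_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 6))

# The target network starts as a frozen copy of the online network.
target_net = copy.deepcopy(online_net)
target_net.eval()

gamma = 0.99  # discount factor (assumed value)

def td_target(reward, next_state, done):
    # The bootstrapped target uses the *target* network, which is updated only periodically.
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * next_q * (1.0 - done)

def sync_target():
    # Every N training steps, copy the online weights into the target network.
    target_net.load_state_dict(online_net.state_dict())

Keeping the bootstrap targets fixed between synchronizations stabilizes training compared to the single-network variant.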

Status

Project: finished

Credits

Contributors: matmarkiewicz, thefebrin
