
BombermanRL

DTU course 2456 Deep Learning project. We chose the Pommerman reinforcement learning project. The environment is the Pommerman playground environment used for the NIPS 2018 Pommerman competition (https://www.pommerman.com).

Reproduce results

To reproduce the results presented in the final hand-in, run the notebook evolutionarystrategies/ReproduceResults.ipynb.

Motivation

Reinforcement learning is still a rapidly developing field. A competition currently running at NIPS 2018 explores multi-agent reinforcement learning. We want to take part in this ongoing exploration and study how different methods perform on this problem.

Further motivation for why multi-agent learning is interesting is given on the competition's explainer page: 'Accomplishing tasks with infinitely meaningful variation is common in the real world and difficult to simulate. Competitive multi-agent learning enables this.'

Background

We'll use Ross Wightman's PyTorch model as a starting point. Using policy gradient learning, his model beats three SimpleAgents in 95% of games.
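Policy gradient methods weight the log-probability of each action by the return that follows it. The sketch below shows only that return computation, in generic REINFORCE style; it is not taken from Ross Wightman's code. It assumes a reward that arrives only at the final step of an episode (as with Pommerman's end-of-game outcome), and the discount factor `gamma` and the function name `discounted_returns` are illustrative choices.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A hypothetical 4-step episode that is won on the last step (+1, zero before).
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5))  # → [0.125, 0.25, 0.5, 1.0]
```

With a sparse terminal reward, every earlier action is credited only through a discounted share of that single end-of-game signal.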

We believe that evolutionary learning is a good approach to the Pommerman problem. Policy gradient methods usually outperform evolutionary learning when an actual reward is received after every action, because evolutionary learning only scores whole episodes and therefore gives a high-variance evaluation of each individual action. In this environment, however, the actual reward is only given when the game ends; during the game, only an expected reward can be estimated for each action. The per-action variance of evolutionary learning is therefore less of a disadvantage here. In addition, an evaluation in evolutionary learning takes much less time than a policy gradient update, so more training evaluations can be run, and we expect evolutionary learning to reach a higher performance in the same amount of training time.
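Evolutionary learning needs only one scalar score per evaluated set of weights, so an end-of-game reward is no obstacle for it. Below is a minimal sketch of one basic evolution-strategies update (Gaussian parameter perturbations with mirrored, or antithetic, sampling) in plain Python. The names `es_step` and `toy_fitness`, the hyperparameters, and the quadratic toy fitness standing in for "play a game and return the final reward" are all hypothetical and not taken from this project's code.

```python
import random
from typing import Callable, List, Optional


def es_step(
    theta: List[float],
    fitness: Callable[[List[float]], float],
    sigma: float = 0.1,
    lr: float = 0.05,
    pairs: int = 25,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """One evolution-strategies update with antithetic (mirrored) sampling.

    `fitness` returns ONE scalar per parameter vector (in Pommerman this would
    be the end-of-game reward); no per-action reward is used.
    """
    if rng is None:
        rng = random.Random()
    grad = [0.0] * len(theta)
    for _ in range(pairs):
        # Sample a Gaussian perturbation and evaluate it in both directions.
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        diff = fitness(plus) - fitness(minus)
        # Accumulate the finite-difference estimate of the fitness gradient.
        for i, e in enumerate(eps):
            grad[i] += diff * e
    # Gradient estimate is sum(diff * eps) / (2 * pairs * sigma); take one ascent step.
    scale = lr / (2 * pairs * sigma)
    return [t + scale * g for t, g in zip(theta, grad)]


if __name__ == "__main__":
    # Hypothetical stand-in for "play a game with these weights and return the reward".
    target = [1.0, -1.0]

    def toy_fitness(p: List[float]) -> float:
        return -sum((a - b) ** 2 for a, b in zip(p, target))

    rng = random.Random(0)
    theta = [0.0, 0.0]
    for _ in range(200):
        theta = es_step(theta, toy_fitness, rng=rng)
    print(theta)
```

In a real Pommerman setting, `fitness` would run one or more games with the perturbed network weights and return the final reward. Each evaluation is independent of the others, so the pairs can be evaluated in parallel.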

To test this hypothesis, we'll train an agent using evolutionary learning and compare its results with those obtained by Ross Wightman.

Milestones

The overall goal of the project is to make a submission to the NIPS 2018 competition (deadline November 21st).

The subgoals for our agents are:

  • Train a consistent FFA agent that beats three RandomAgents in more than 50% of games on average
  • Train a consistent FFA agent that beats one RandomAgent and two SimpleAgents in more than 50% of games on average
  • Train a consistent FFA agent that beats three SimpleAgents in more than 50% of games on average
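The subgoals above are stated as win rates over many games. One way to decide whether an agent "consistently" clears the 50% mark is to require that a confidence interval for its win rate lies entirely above 0.5. The sketch below uses the Wilson score interval for that check; the helper names `wilson_interval` and `meets_subgoal`, the 95% level (z = 1.96), and counting ties as non-wins are assumptions for illustration, not the project's evaluation protocol.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 gives ~95% coverage)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


def meets_subgoal(wins: int, games: int, threshold: float = 0.5) -> bool:
    """True if the whole interval lies above `threshold`.

    Ties are assumed to be counted as non-wins by the caller.
    """
    lower, _ = wilson_interval(wins, games)
    return lower > threshold


print(meets_subgoal(70, 100), meets_subgoal(52, 100))  # → True False
```

For example, 70 wins in 100 games gives an interval of roughly 0.60 to 0.78 and passes. A 52% win rate over 100 games is not enough evidence that the agent wins more than half of its games.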

Here, RandomAgents take completely random actions, while SimpleAgents are benchmark agents provided by the Pommerman community to indicate how strong an agent should be before it is submitted.

Once we have a successful agent for the FFA environment, we'll extend it to the Team environment, which is the official NIPS 2018 competition environment.

Furthermore, if we reach these subgoals, it would be interesting to add imitation learning so that two of our agents collaborate while still following the rules of the NIPS 2018 competition.

Contributors

fbohu, s134265, christianingwersen, cdglissov