jubra97 / speedos Goto Github PK

View Code? Open in Web Editor NEW

1.0 1.0 0.0 3.1 MB

Approach to solve the task of the InformatiCup2021.

License: MIT License

Python 99.62% Dockerfile 0.38%

python minimax-algorithm lightcycle informaticup informaticup2021 augsburg-university docker mesa spe-ed team-speedos

speedos's Introduction

Speedos

Welcome to the world of Spe_ed, where Team Speedos rules

This repository contains our approach to the InformatiCup 2021 competition. Here we will give you a brief introduction to the underlying algorithm as well as the project structure and architecture. Furthermore, we will guide you through the installation process and provide a code example to get you started.

Algorithm

We are using a multi-player extension of the well known Minimax-Algorithm - called Multi-Minimax. The evaluation function for non-final game states is mainly based on the so called Voronoi-Heuristic. In a nutshell, the Voronoi-Heuristic calculates how many cells each player could potentially reach ahead of every other player. The returned value is then calculated as the difference betweeen the amount of cells that the maximizing and minimizing players can reach first. This encourages our player to control as much space as possible and to corral opponents. In order to deal with the variable time limit for action responses we implemented Depth-First Iterative Deepening. Furthermore, we enhanced our approach with extensions, such as Alpha-Beta-Pruning or Wall-Hugging.

Project Structure & Architecture

The project is split into the following packages: core, evaluation and scripts. The core package is the functional core of our project - it contains a model of Spe_ed and all of the different player algorithms (agents); the evaluation package contains everything that can be used to evaluate agents; the scripts package contains application oriented scripts, such as a script for the online execution of our agents. The figure below shows the dependency hirarchy between the upper mentioned packages. The basic idea behind the structure is that a user can simply extend or use the scripts and evaluation tools without having to worry about the concrete core-implementation.

The model can be viewed as a black-box replica of the game Spe_ed. It provides the exact same interfaces as the original so that algorithms can use both game instances without any adjustments. The model is created within the agent-based modelling framework Mesa which provides additional tools for visualization and data science. A simplified class diagram of the model architecture can be seen in the figure below. Most importantly it contains the SpeedModel and an abstract SpeedAgent. The SpeedModel implements game rules and is used to control the execution of a game instance (e.g. create, run or step forward). An instance of the SpeedModel class is mainly used by Muli-Minimax-agents to simulate future actions and scenarios. The SpeedAgent class provides the abstract method act(state). Every functional agent is a subclass of SpeedAgent that implements this function. act receives a game state and returns an action. A detailed description of the game state format and possible actions can be found in the InformatiCup repository. In case you want to implement an agent yourself, you could also take a look at agents we already implemented.

The res folder contains evaluation results and recorded games from the original game. These records are mainly used to test the model and make sure that it functions exactly like the original game. All other core software parts are also tested with unit tests. Tests are placed in a subfolder (called tests) within the respectively tested source folder - as its standard for python unit tests.

Getting Started

Installing Requirements

Execute the following comands in the project directory:

pip install -r requirements.txt
python setup.py build_ext --inplace

How to use Docker

We use docker to deploy our software. If you want to deploy a specific version you can bould a docker image with the following commands (if you already have a local repository you can skip the git clone command):

git clone https://github.com/jubra97/speedos.git
cd speedos
docker build -t speedosagent .

If you just want to use the latest version you can simply use the pre-built docker image that we provide:

docker pull docker.pkg.github.com/jubra97/speedos/speedos-agent:latest
docker tag docker.pkg.github.com/jubra97/speedos/speedos-agent:latest speedosagent

To start the docker container execute the following commands:

docker run -e URL="wss://msoll.de/spe_ed" -e KEY="IXT57ZEJMO6VFKF3KBZFB4LSEXBMWJ72VEYO2B6WT25UOXEIEAEN25XO" -e TIME_URL="https://msoll.de/spe_ed_time" speedosagent

Start Coding

The following code snippet shows how easy it is to create and run a fully functional game with different agents:

model = SpeedModel(60, 60, 2, agent_classes=[RandomAgent, SlidingWindowVoronoiMultiMiniMaxAgent], verbose=True)
model.run_model()

You can also have a look at the scripts folder to see how we used the projects core to deploy, test and evaluate our software.

Contact, Contribution & Further Use

We welcome everyone to contribute to our project and will gladly receive and answer any suggestions or questions that you might have. We encourage you to create GitHub-Issues in case of bug encounters or feature suggestions. In other cases the best way to contact us is via e-mail.

Our code is free to use for everybody under the conditions stated in our license. However, we would like to kindly ask you to acknowledge our work if you wish to use it for research, educational or commercial purposes.

speedos's People

Contributors

Stargazers

Watchers

speedos's Issues

Additional Gym Wrappers

We need Gym wrappers for:

Self-Play
1vHeuristics

The basic composition should be similar to the existing wrapper. Only the amount of agents and the handling of actions and observations has to altered.

Logging

depth reached

Unit Tests

Especially for the utils methods.

It would be nice if the model validation would be a unit test as well.

Json export in own model

Fix/Complete NStepSurvivalAgent

Write an interface to play real games with our agents

automatically record games (this is also useful to enhance the model validation)

Implement Trace-Aware Voronoi

Implement Tree of Chambers

Write test for compare_grid_with_cells

Time Analysis

optimize send time for DeadlineAware Agent
optimize multiminimax algorithm

Investigating Chess algorithms

Investigating the methods behind algorithms for Chess (e.g. Stockfish or AlphaZero) or other games.

Dockerizing

Refactor SpeedAgent.move()

The function ist kinda ugly. The loop (going through every speed step) can be eliminated since the new changes and some other valiables/calls/junctions are probably unnecessary.

Validate changes with the unit test from src/model/tests/test_model.py !!

Alpha-Beta-Search

Implementing a minimax algorithm with alpha-beta pruning.

DRL Adjustments

It's not possible to apply vanilla DRL algorithms to the original problem state. Following lists name some adjustments that could make DRL (more) suitable:

Approaches to deal with time-dependent and non-visual information:

recurrent Network
additional input neurons for player speeds, directions and the 6-round timer

Approaches to deal with the large and variant state space:

Cell-Normalization to obstacle or player (represent every static obstacle as number x and every active player as number y)
Multiple feature planes instead of cell-normalization (this is what AlphaZero hat built in)
CNN (It is probably impossible to use a DNN)
Resizing the input "image" to the CNN to a fixed size
Only observing a local region
Local region + a mini-map (resized view of the whole map to provide information about the general position of obstacles and other players)

Spezielle Spielstellung

Was passiert, wenn folgende Situation auftritt:

Der rote Spieler crasht ja in einen alten Trace des blauen Spielers und wird somit definitiv inaktiv. Beide Spieler crashen aber theoretisch auch auf dem Feld links neben dem roten Spieler.

Crashen beide oder nur rot?

In unserem Modell crashen Beide, weil wir davon ausgehen, dass eine Aktion immer komplett durchgeführt wird und bei einem Kollisionspunkt nicht abgeschnitten wird.

Optimizations

Stop Evaluation if its known that the result will be under/over max_move/min_move. Evaluation can be stopped because it would not change the result of max(max_move, result) or min(min_move, result) respectively.

Serverzeitabgleich

Bug: Agent sometimes speeds up in endgame

Mulit-MiniMax variations and improvements

Implement Multi-Minimax

Research on Self-Play

There are probably many necessary tweaks to DRL algorithms for self-play.

depth-first iterative deepening

https://cse.sc.edu/~mgv/csce580f11/gradPres/korf_IDAStar_1985.pdf

https://www.sciencedirect.com/science/article/abs/pii/0004370285900840

Q-Learning implementation

Is there a decent library that offers a Q-Learning implementation (applicable to Gym environments)?
If not: Implement Q-Learning for Gym environments.

Q-Learning will not work on full scale approaches but could maybe be used for an algorithm that only observes a local sector. Furthermore, Q-Learning could be used for testing/debugging and should perform better on small instances of the SingleAgentSpeedEnv than DRL-algos.

Resolve TODO in SpeedOneVoronoi

Parallelisatsasisiertation

Heuristics

A list of heuristic ideas that could be implemented:

OneStepSurvivalAgent (simulates every possible next state for every possible action combination and selects the action with the highest possibility of survival for this step)
NStepSurvivalAgent (should replace the OneStepSurvivalAgent; when N > 2, it should only consider routes that the agent is still alive in)
SelectiveNStepAgent (has a rating measurement for states and only continues with the M best paths per step N; this might correlate with Mini-Max Searches and Alpha-Beta Pruning)
VotingAgent (an agent that takes different agents as initial args and calls their action()-methods every step and then uses a (weighted) voting mechanism to decide on an action to execute)

Additional ideas for improvements of upper mentioned heuristics:

An option for a time limit NStep-based agents (returns the best solution found within a given time limit)
Winning move detection (every agent that uses the simulation to evaluate game paths should check whether a considered path is winning - this is very cheap)
Veto Mechanism for non-simulation-based agents and the VotingAgent (simulate the chosen action and use a veto on the chosen action if it inevitably leads to elimination)
Rating Measurement for states (necessary for the SelectiveNStepAgent and probably usefull for a lot of approaches; basic rating parameter ideas: opponents alive, self alive; advanced rating ideas: using artificial potential fields or a DQN)

Model Efficiency Evaluation

Use an analysis tool to evaluate the performance and potential bottlenecks of the model.

(low priority)

Voronoi - Special Cases

Case 1:
The algorithm does not find a way out of the enclosed space because it disregards particles that reach the exit point with a lower speed, since they are slower. The particles that reaches the exit point first (in 3 steps) is trapped.

Solutions:

Use the non-approximating method (keep every particle)
Keep slower particles with a timestamp-difference-threshold (still leads to non-optimal solutions)
Also execute a reachable-cells-algorithm that only moves particles with speed 1 and merge results (this would produce a worst/slowest case approximation that finds cases such as the one in the picture above; however, it does not regard wall jumps)

Case 2:
The algorithms would consider the cell that both can reach in 4 steps to be a battlefield. However, red can cut blue off. Thus, this cell should be part of reds region.

Notes:

This problem is probably more general and does not only apply to battlefields but also fields that are calculated to be in blues region.
This problem does not occur in the game without different speed levels because the cut-off-point would also be reached simultaneously.

Solutions:

First generate a region map that calculates the time step that an agent can occupy a cell with a trace (not only the position). Then generate the normal region map (that only regards positions) but particles collide with traces (from the trace region map) that have a lower timestamp.

Final jsons to images

Send Approach(es) to InfoCup

Time Management

Incremental deepening + execute MultiMiniMax in a task and stop when time is over

Refactoring

code quality

Occupation-Based Algos

Basic Idea:

Evaluate a position by assigning an occupier to each cell. An occupier is a player that can reach the cell fastest out of all players. A cell can be occupied by multiple players, meaning they could reach the cell within the same amount of steps.
The resulting occupation map would show regions that a player "occupies". This could be used for 1) state evaluation (e.g. a larger occupied region is preferable) and 2) decision making based on the occupation map (e.g. choose actions along occupation borders to preserve occupied regions).

I made this up but I guess there is already smth like that out there.

Returning a scalar value between -1 and 1
Agent not active -> -1
Taking several evaluation parameters/methods that all map a position to a score in [-1, 1] and calculating a weighted average for the total score
Finding out whether the agent or other agents are trapped
Creating an occupation map -> Every player receives a normalized score that is proportional to its 1) reachable cells and 2) occupied cells. Normalization could look the following way: Calculate the reachability score of player p as (2 * (x_p - x_worst))/x_best -1 with the highest amount of reachable cells out of all players x_best, the lowest amount of reachable cells out of all players x_worst and the reachable cells x_p of player p. Output is in [-1, 1]. The same can be applied to occupied cells. Note, that it is important to choose N as mentioned above instead of the total amount of cells because this would falsely award small scores in the end-game (where a smaller amount of cells is reachable)
Performing an Artificial Potential Field evaluation (this would probably consider factors that are similar to the ones that the occupation map approach considers)