am-vrp's Introduction

Attention Model for Vehicle Routing Problems

TensorFlow 2.0 implementation of the article "Attention, Learn to Solve Routing Problems!".

Dmitry Eremeev, Alexey Pustynnikov

This work was done as the final project for the DeepPavlov course Advanced Topics in Deep Reinforcement Learning.

The code of the full project (dynamic version) is located at https://github.com/d-eremeev/ADM-VRP

Environment:

The current environment implementation is the AgentVRP class in the Enviroment.py file.

The class stores the current state and the actions the agent has taken so far.

Main methods:

step(action): transitions to a new state according to the action.
get_costs(dataset, pi): returns the cost of each graph in the batch for the paths pi taken in action-state space.
get_mask(): returns a mask of available actions (allowed nodes).
all_finished(): checks whether all games in the batch are finished (all graphs are solved).
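
To make the roles of these methods concrete, here is a minimal single-instance sketch in plain Python. It is not the repository's AgentVRP class: the name ToyVRPEnv, the capacity handling, and the convention that True in the mask means "blocked" are assumptions made for illustration only.

```python
class ToyVRPEnv:
    """Hypothetical single-instance VRP state; node 0 is the depot."""

    def __init__(self, demands, capacity):
        self.demands = demands            # demands[0] is 0 for the depot
        self.capacity = capacity
        self.remaining = capacity         # load left in the vehicle
        self.visited = [False] * len(demands)
        self.visited[0] = True
        self.current = 0                  # the agent starts at the depot
        self.route = []                   # actions taken so far

    def all_finished(self):
        """Done once every customer is served and the agent is back at the depot."""
        return all(self.visited[1:]) and self.current == 0

    def get_mask(self):
        """True marks a node that is NOT allowed at this step (assumed convention)."""
        mask = [self.visited[i] or self.demands[i] > self.remaining
                for i in range(len(self.demands))]
        # The depot is blocked only while the agent sits there with work left.
        mask[0] = self.current == 0 and not self.all_finished()
        return mask

    def step(self, action):
        """Move to `action`, updating the load and the visit flags."""
        if self.get_mask()[action]:
            raise ValueError(f"node {action} is masked")
        if action == 0:
            self.remaining = self.capacity    # refill at the depot
        else:
            self.visited[action] = True
            self.remaining -= self.demands[action]
        self.current = action
        self.route.append(action)


env = ToyVRPEnv(demands=[0, 3, 4, 2], capacity=5)
while not env.all_finished():
    allowed = [i for i, blocked in enumerate(env.get_mask()) if not blocked]
    env.step(max(allowed))                # stand-in for a learned policy
print(env.route)
```

The loop picks the highest-numbered allowed node only as a placeholder; in the actual model the decoder would choose among the unmasked nodes.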

Let's map these terms to RL language (a small dictionary):

State: $X$ - the graph instance (coordinates, demands, etc.) together with the node where the agent is currently located.
Action: $\pi_t$ - the choice of the node the agent visits next.
Reward: the (negative) tour length.
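
As a small worked example of the reward, the sketch below computes the Euclidean length of a route that starts and ends at the depot and negates it. The function name tour_length and the sample coordinates are hypothetical, not taken from the repository.

```python
import math


def tour_length(coords, route, depot=0):
    """Euclidean length of the path depot -> route -> depot."""
    path = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


# Depot at the origin and two customers forming a 3-4-5 triangle.
coords = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
cost = tour_length(coords, [1, 2])   # 5 + 4 + 3
reward = -cost                       # shorter tours give higher reward
```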

Model Training:

The Attention Model (AM) is trained by policy gradient using the REINFORCE algorithm with a baseline.
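
A rough sketch of the quantity being minimised: for each sampled tour, the advantage (tour cost minus baseline cost) is multiplied by the log-likelihood of that tour, and the products are averaged over the batch. The function name and the numbers below are illustrative assumptions; in a real framework the advantage would be treated as a constant when differentiating.

```python
def reinforce_loss(costs, baseline_costs, log_likelihoods):
    """Surrogate loss: mean over the batch of (cost - baseline) * log p(tour)."""
    advantages = [c - b for c, b in zip(costs, baseline_costs)]
    terms = [a * ll for a, ll in zip(advantages, log_likelihoods)]
    return sum(terms) / len(terms)


# Two sampled tours: the first is worse than the baseline, the second is better.
loss = reinforce_loss(
    costs=[10.0, 8.0],
    baseline_costs=[9.0, 9.0],
    log_likelihoods=[-2.0, -1.0],
)
```

Minimising this value lowers the probability of tours that cost more than the baseline and raises the probability of tours that cost less.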

Baseline

The baseline is a copy of the model with weights fixed from one of the preceding epochs.
During early epochs a warm-up is used: an exponential moving average of the model cost over past epochs is mixed with the baseline model.
The baseline is updated at the end of an epoch if the difference in costs between the candidate model and the baseline is statistically significant (t-test).
The baseline uses a separate dataset for this validation. This dataset is updated after each baseline renewal.
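
The two rules above can be sketched as follows. This is a simplified illustration, not the code in reinforce_baseline.py: the function names, the linear mixing weight alpha, the 5% threshold, and the use of a normal approximation to the paired one-sided t-test (reasonable only for large validation sets) are all assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def warmup_baseline(rollout_cost, ema_cost, alpha):
    """Mix the baseline model's cost with the moving average; alpha grows from 0 to 1."""
    return alpha * rollout_cost + (1.0 - alpha) * ema_cost


def should_update_baseline(candidate_costs, baseline_costs, p_threshold=0.05):
    """Paired one-sided test: is the candidate significantly cheaper than the baseline?"""
    diffs = [c - b for c, b in zip(candidate_costs, baseline_costs)]
    if mean(diffs) >= 0:
        return False                      # the candidate is not better on average
    spread = stdev(diffs)
    if spread == 0:
        return True                       # identical improvement on every instance
    t_stat = mean(diffs) / (spread / math.sqrt(len(diffs)))
    p_value = NormalDist().cdf(t_stat)    # normal approximation to the t distribution
    return p_value < p_threshold


candidate = [9.0, 9.5, 8.8, 9.1, 9.2, 8.9, 9.0, 9.3]
baseline = [10.0, 10.1, 9.9, 10.2, 10.0, 9.8, 10.1, 10.0]
update = should_update_baseline(candidate, baseline)
```

With a library such as SciPy available, an exact paired t-test could replace the normal approximation used here.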

Files Description:

Enviroment.py - environment for the VRP RL agent
layers.py - MHA (multi-head attention) layers for the encoder
attention_graph_encoder.py - Graph Attention Encoder
attention_graph_decoder.py - Graph Attention Decoder
attention_model.py - Attention Model
reinforce_baseline.py - class for the REINFORCE baseline
train.py - defines the training loop used in train_with_checkpoint.ipynb
train_with_checkpoint.ipynb - start training or continue training from a checkpoint from this notebook
generate_data.py - auxiliary functions for data creation, saving and visualisation
results folder: each folder is named ADM_VRP_{graph_size}_{batch_size} and contains training logs, learning curves and saved models

Training procedure:

Open train_with_checkpoint.ipynb and choose the training parameters.
All outputs are saved in the current directory.

am-vrp's Issues

tf.where

In the function decoder_mha in the file attention_graph_decoder.py, the code performs the following operation:
mask = mask[:, tf.newaxis, :, :]
The purpose of this operation is to expand the mask's dimensions so that the following code can work:
compatibility = tf.where(mask,
                         tf.ones_like(compatibility) * (-np.inf),
                         compatibility
                         )
However, I ran into a problem: the shape of mask differs from the shape of compatibility, so the call does not work. The shape of mask is [batch_size, 1, seq_len_q, seq_len_k], while the shape of compatibility is [batch_size, num_heads, seq_len_q, seq_len_k].
How can I solve this problem?
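
One possible way to look at this, sketched with NumPy rather than TensorFlow: a mask with a singleton heads axis broadcasts against the full compatibility tensor, so the same mask is applied to every head. As far as the TensorFlow 2.x documentation describes it, tf.where broadcasts its arguments in the same way; on versions where it does not, expanding the mask explicitly (for example with tf.broadcast_to) is one workaround. The sizes below are arbitrary illustration values.

```python
import numpy as np

batch_size, num_heads, seq_len_q, seq_len_k = 2, 8, 1, 5

compatibility = np.zeros((batch_size, num_heads, seq_len_q, seq_len_k), dtype=np.float32)
mask = np.zeros((batch_size, seq_len_q, seq_len_k), dtype=bool)
mask[:, :, 0] = True                        # block node 0 in every instance

mask = mask[:, np.newaxis, :, :]            # shape (2, 1, 1, 5)

# Implicit broadcasting: the singleton axis is stretched over all heads.
masked = np.where(mask, -np.inf, compatibility)

# Explicit alternative: expand the mask to the full shape first.
mask_full = np.broadcast_to(mask, compatibility.shape)
masked_explicit = np.where(mask_full, -np.inf, compatibility)
```

Both variants yield a tensor of shape [batch_size, num_heads, seq_len_q, seq_len_k] in which the blocked entries are -inf for every head.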
