How to extend this to multi depot problem? about adm-vrp HOT 6 CLOSED

d-eremeev commented on July 27, 2024

How can this be extended to the multi-depot problem?

from adm-vrp.

Comments (6)

JaswanthBadvelu commented on July 27, 2024

Did you figure anything out? I am also trying to do the same thing, along with time windows.

Monikasinghjmi commented on July 27, 2024

@JaswanthBadvelu No, not yet

JaswanthBadvelu commented on July 27, 2024

I am using this model in my project and have a few questions. Would you mind connecting with me on LinkedIn so we can discuss further?
https://www.linkedin.com/in/jaswanth-badvelu/
Thanks

d-eremeev commented on July 27, 2024

@Monikasinghjmi If you mean the multi-depot problem for a single agent, you could consider changing the masking procedure and adding embeddings for all depots in the graph attention encoder.

The function get_mask in environment.py masks all visited nodes and all nodes whose demand exceeds the agent's current available capacity (along with some additional logic for the dynamic version of the model presented in the paper), and decides whether the agent is allowed to go to the depot. This mask is used in the MHA decoding process in attention_dynamic_model.py. There is also a mask defined in get_att_mask in environment.py; it is used for graph encoding (before the decoding steps) in the same file.

Make sure these functions keep all the depots unmasked at the appropriate times, so that the attention mechanism can properly encode all depots and then decide which one to choose.
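The masking idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from environment.py: the function name multi_depot_mask, the node layout (depots first, then customers), and the rule that forbids depot-to-depot moves while a customer is still reachable are all hypothetical choices made for the example.

```python
from typing import List


def multi_depot_mask(
    visited: List[bool],
    demands: List[float],
    remaining_capacity: float,
    num_depots: int,
    current_node: int,
) -> List[bool]:
    """Return a per-node mask where True means "not selectable".

    Assumed layout (hypothetical): indices [0, num_depots) are depots,
    all remaining indices are customers.
    """
    n = len(demands)
    mask = [False] * n

    # Customers: mask the ones already visited or too large for the
    # capacity the agent has left.
    for i in range(num_depots, n):
        mask[i] = visited[i] or demands[i] > remaining_capacity

    # Depots: keep every depot open, except when the agent is already
    # standing at a depot and could serve a customer instead. This
    # prevents idle depot-to-depot hops (an assumption of this sketch).
    at_depot = current_node < num_depots
    customer_open = any(not mask[i] for i in range(num_depots, n))
    block_depots = at_depot and customer_open
    for d in range(num_depots):
        mask[d] = block_depots

    return mask


# Two depots (nodes 0 and 1) and three customers (nodes 2, 3, 4).
demands = [0.0, 0.0, 3.0, 5.0, 2.0]
print(multi_depot_mask([False] * 5, demands, 4.0, 2, current_node=0))
```

In the actual model the same logic would operate on batched tensors, and the encoder-side mask from get_att_mask would need a matching change so that all depot embeddings stay visible to attention.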

Monikasinghjmi commented on July 27, 2024

@d-eremeev Thanks for the explanation. My understanding is that the REINFORCE algorithm used for calculating the cost function has to be updated for the MDVRP. Is this correct?

d-eremeev commented on July 27, 2024

@Monikasinghjmi I'm not sure what you mean by "has to be updated".
REINFORCE is a classical policy gradient algorithm. Basically, we want to extremize the expected return of the whole episode: the sum of the rewards (~ length) over the whole trajectory, weighted by the corresponding probabilities. REINFORCE is a Monte Carlo method that "tells" us a "convenient" form of the gradient of the expected return. In this sense, it should not be changed.
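For reference, the "convenient" form mentioned above is the standard score-function identity. The notation here is an assumption introduced for illustration: s is a problem instance, π is the sampled sequence of nodes, L(π) is its length, and p_θ is the policy given by the decoder.

```latex
\nabla_\theta J(\theta)
  = \nabla_\theta \, \mathbb{E}_{\pi \sim p_\theta(\cdot \mid s)} \bigl[ L(\pi) \bigr]
  = \mathbb{E}_{\pi \sim p_\theta(\cdot \mid s)}
    \bigl[ L(\pi) \, \nabla_\theta \log p_\theta(\pi \mid s) \bigr],
\qquad
\log p_\theta(\pi \mid s) = \sum_{t} \log p_\theta(\pi_t \mid s, \pi_{<t})
```

The expectation is estimated by sampling trajectories, which is what makes the method Monte Carlo. Nothing in this identity depends on how many depots the problem has.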

Of course, there are several components in the formula: the length of your path and the probabilities of the nodes returned by the decoder. There is also a "baseline" term, which involves a copy of the model from one of the preceding epochs. If you change these components, then in that sense REINFORCE with baseline would be "updated".
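The components listed above can be put together in a small plain-Python sketch. It only evaluates the surrogate loss value for a batch; the function names are hypothetical and this is not the repository's training code. In an autodiff framework the same expression would be built from tensors, with the advantage treated as a constant (for example via tf.stop_gradient in TensorFlow).

```python
import math
from typing import List


def sequence_log_prob(step_probs: List[float]) -> float:
    """Log-probability of one trajectory: the sum of the log-probabilities
    of the nodes chosen at each decoding step."""
    return sum(math.log(p) for p in step_probs)


def reinforce_with_baseline_loss(
    costs: List[float],
    baseline_costs: List[float],
    log_probs: List[float],
) -> float:
    """Batch mean of (cost - baseline cost) * log-probability.

    costs: path length of each sampled trajectory.
    baseline_costs: path length obtained by the baseline model (a frozen
        copy of the model from an earlier epoch) on the same instances.
    log_probs: trajectory log-probabilities under the current model.
    """
    if not (len(costs) == len(baseline_costs) == len(log_probs)):
        raise ValueError("all inputs must have the same batch size")
    advantages = [c - b for c, b in zip(costs, baseline_costs)]
    return sum(a * lp for a, lp in zip(advantages, log_probs)) / len(costs)


# Toy batch of two instances.
loss = reinforce_with_baseline_loss(
    costs=[10.0, 8.0],
    baseline_costs=[9.0, 9.0],
    log_probs=[-2.0, -1.0],
)
print(loss)
```

Under these assumptions, a multi-depot variant would change how costs are computed (and which nodes the decoder may pick), while the form of the loss stays the same.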

For educational purposes, you might want to check, for example, the following link:
RL — Policy Gradient Explained by Jonathan Hui.
