Comments (6)
Did you figure anything out? I am trying to do the same thing, with time windows as well.
from adm-vrp.
@JaswanthBadvelu No, not yet
from adm-vrp.
I am using this model in my project and have a few questions. Would you mind connecting with me on LinkedIn so we can discuss further?
https://www.linkedin.com/in/jaswanth-badvelu/
Thanks
from adm-vrp.
@Monikasinghjmi If you mean the multi-depot problem for a single agent, you could consider changing the masking procedure and adding embeddings for all depots in the graph attention encoder.
The function get_mask in environment.py masks all visited nodes and all nodes whose demand exceeds the agent's currently available capacity (plus some additional logic for the dynamic version of the model presented in the paper), and decides whether the agent is allowed to go to the depot. This mask is used in the MHA decoding process in attention_dynamic_model.py. A second mask is defined in get_att_mask in environment.py; it is used for graph encoding (before the decoding steps) in the same file.
Make sure these functions keep all depots unmasked at the appropriate times, so that the attention mechanism can properly encode every depot and then decide which one to choose.
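To make the masking idea concrete, here is a minimal plain-Python sketch. It is not the repository's get_mask: the function name multi_depot_mask, its arguments, and the rule for when depots are allowed are all assumptions chosen for illustration. The assumed rule is that depots are blocked only while the agent is already at a depot and some customer is still reachable.

```python
def multi_depot_mask(visited, demands, capacity, depot_ids, current_node):
    """Return a list of booleans where True means "masked" (not selectable).

    Hypothetical helper for illustration; not the repository's get_mask.
    visited[i]  : True if customer i has already been served
    demands[i]  : demand of node i (depots are expected to have demand 0)
    capacity    : remaining capacity of the agent
    depot_ids   : indices of all depot nodes
    current_node: index of the node the agent is currently at
    """
    n = len(visited)
    depot_set = set(depot_ids)
    mask = [False] * n

    # Customers: mask if already visited or if demand exceeds remaining capacity.
    for node in range(n):
        if node not in depot_set:
            mask[node] = visited[node] or demands[node] > capacity

    # Is at least one customer still selectable?
    customers_left = any(not mask[i] for i in range(n) if i not in depot_set)

    # Assumed rule: forbid depot-to-depot moves while a customer is still feasible.
    # In every other situation all depots stay unmasked.
    at_depot = current_node in depot_set
    for d in depot_set:
        mask[d] = at_depot and customers_left

    return mask


# Nodes 0 and 1 are depots, nodes 2-4 are customers; customer 2 is already served.
visited = [False, False, True, False, False]
demands = [0.0, 0.0, 0.3, 0.5, 0.9]
mask_at_customer = multi_depot_mask(visited, demands, 0.6, [0, 1], current_node=2)
```

With these inputs the agent stands at a customer, so both depots remain selectable, and only the served customer and the over-capacity customer are masked. The real environment works on batched tensors, so the same logic would need to be vectorized there.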
from adm-vrp.
@d-eremeev Thanks for the explanation. My understanding is that the REINFORCE algorithm used for calculating the cost function has to be updated for the MDVRP. Is this correct?
from adm-vrp.
@Monikasinghjmi I'm not sure what you mean by "has to be updated".
REINFORCE is a classical policy gradient algorithm. Basically, we want to optimize the expected return of the whole episode: the sum of the rewards (~ route length) over the whole trajectory, multiplied by the corresponding probabilities. REINFORCE is a Monte Carlo method that gives us a convenient form of the gradient of the expected return. In this sense, it should not be changed.
Of course, there are several components in the formula: the length of your path and the probabilities of the nodes returned by the decoder. There is also a "baseline" term, which involves a copy of the model from one of the preceding epochs. If you change these components, then in that sense REINFORCE with baseline would be "updated".
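As a rough illustration of how those components fit together, the surrogate loss for REINFORCE with a baseline can be sketched in plain Python as below. The function name reinforce_loss and its arguments are hypothetical and do not come from the repository; the sketch assumes the baseline costs were already computed by a rollout of the older model copy.

```python
def reinforce_loss(costs, baseline_costs, log_probs):
    """Surrogate loss for REINFORCE with a baseline, averaged over a batch.

    Hypothetical helper for illustration only.
    costs[i]          : route length of the sampled solution for instance i
    baseline_costs[i] : route length produced by the baseline model for instance i
    log_probs[i]      : log-probabilities of the nodes chosen by the decoder
    """
    total = 0.0
    for cost, base, step_log_probs in zip(costs, baseline_costs, log_probs):
        # Advantage: how much worse (positive) or better (negative) than the baseline.
        # It is treated as a constant when differentiating the loss.
        advantage = cost - base
        # Log-probability of the whole trajectory is the sum over decoding steps.
        total += advantage * sum(step_log_probs)
    return total / len(costs)


loss = reinforce_loss(
    costs=[10.0, 8.0],
    baseline_costs=[9.0, 9.0],
    log_probs=[[-0.5, -0.5], [-1.0, -1.0]],
)
```

In an autodiff framework the advantage would typically be excluded from the gradient (for example with tf.stop_gradient in TensorFlow), so that minimizing this loss only changes the log-probabilities. Moving to a multi-depot setting changes how the costs and node probabilities are produced, not the form of this expression.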
For background reading, you might want to check, for example, the following article:
RL — Policy Gradient Explained by Jonathan Hui.
from adm-vrp.