Comments (2)
Hi, thanks for your interest.
For (1) and (2), please refer to section 4.1.
In short, we use learnable weight coefficients lambda to decide whether to add an adapter and where to add it. c is the initialization value of lambda and is a hyperparameter to tune.
lambda is learned during training, and yes, it is the weight of its corresponding module. After training, we select the module with the largest weight in each transformer layer. (i) If the selected module is an old one, the new task shares it with the corresponding old tasks. (ii) If the selected module is a new one, we add a new module in this layer for the new task.
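The per-layer decision described above can be sketched roughly as follows. This is only an illustration, not the repository's code: the function name `decide_layer`, the list layout (old modules first, a single new candidate last), and the tie-breaking behaviour are all assumptions made for the example.

```python
from typing import Dict, List


def decide_layer(weights: List[float], num_old_modules: int) -> Dict[str, object]:
    """Choose what one transformer layer does for the new task.

    `weights` holds the learned lambda values after training.
    Assumed layout (hypothetical): the first `num_old_modules` entries
    belong to existing modules, the last entry to the new candidate.
    """
    if len(weights) != num_old_modules + 1:
        raise ValueError("expected one weight per old module plus one new candidate")

    # Select the module with the largest weight.
    # On ties, max() keeps the lowest index, so an old module wins.
    best = max(range(len(weights)), key=weights.__getitem__)

    if best < num_old_modules:
        # (i) An old module was selected: share it with the old tasks.
        return {"action": "reuse", "module": best}
    # (ii) The new candidate was selected: add a new module in this layer.
    return {"action": "add", "module": best}


if __name__ == "__main__":
    print(decide_layer([0.2, 0.9, 0.4], num_old_modules=2))
    print(decide_layer([0.2, 0.3, 0.8], num_old_modules=2))
```

Running the decision once per layer gives the new task a mix of shared and newly added modules.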
For (3), adapters for the old task are fixed during the decision stage.
During the training stage, if some of the old tasks are replayed (see the last paragraph of section 4.2 for details): (i) in the similar-task setting, we update all adapter modules of the replayed tasks; (ii) in the dissimilar-task setting, we update only the adapter modules of the replayed tasks that are shared with the current task.
(In the code, this is related to the option "--partial transfer")
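As a rough sketch of the replay rule above: the function below returns which old-task adapter modules would be updated during training. The names `adapters_to_update` and `similar_tasks`, and the representation of a module as a `(layer, index)` tuple, are hypothetical and chosen only for this example; they are not the repository's API, and the mapping to the command-line option is not shown.

```python
from typing import Dict, Set, Tuple

ModuleId = Tuple[int, int]  # (layer index, module index within the layer); illustrative only


def adapters_to_update(
    replayed_task_modules: Dict[str, Set[ModuleId]],
    current_task_modules: Set[ModuleId],
    similar_tasks: bool,
) -> Set[ModuleId]:
    """Return the adapter modules of replayed old tasks that get updated."""
    selected: Set[ModuleId] = set()
    for modules in replayed_task_modules.values():
        if similar_tasks:
            # (i) Similar tasks: update every module of the replayed task.
            selected |= modules
        else:
            # (ii) Dissimilar tasks: update only modules shared with the current task.
            selected |= modules & current_task_modules
    return selected


if __name__ == "__main__":
    replayed = {"task1": {(0, 0), (1, 0)}, "task2": {(0, 1), (1, 0)}}
    current = {(0, 1), (1, 2)}
    print(sorted(adapters_to_update(replayed, current, similar_tasks=True)))
    print(sorted(adapters_to_update(replayed, current, similar_tasks=False)))
```

Adapters of old tasks that are not replayed never appear in the result, so they stay frozen in both settings.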
For (4), yes. We also assume that the identifiers are known for all adapter baselines.
from adaptive-compositional-modules.
Thank you for your clear reply. I understand now!