[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang
Hi, I'm curious about the configuration of the BERT model used in the paper. Out of the 12 BERT layers, which ones use the MoE FFN?
Also, are you planning to share the training scripts and configs for BERT/RoBERTa?
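To make the question concrete, this is the kind of setting I'm asking about; every name and value below is a hypothetical placeholder I made up, not something taken from the repo or the paper.

```python
# Hypothetical sketch of the config I'm asking about -- not from the repository.
bert_moe_config = {
    "num_hidden_layers": 12,
    "moe_layers": [1, 3, 5, 7, 9, 11],  # which of the 12 FFN blocks are converted to MoE?
    "num_experts": 16,                  # how many experts per MoE FFN?
    "top_k": 2,                         # how many experts are activated per token?
}
```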
Hi @Kyriection, thanks for the exciting work.
I noticed that you split the model's MLP into several smaller MLPs to serve as the MoE experts, but I couldn't find any code for this stage in the repository. Did I miss something? Could you share some details about this step?
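For reference, this is roughly what I imagine the splitting step looks like: partitioning the FFN's intermediate dimension into equal chunks, one per expert. The function name, the number of experts, and the bias handling below are all my own guesses, not code from this repo.

```python
import torch
import torch.nn as nn

# My guess at how a dense FFN (fc1: d -> 4d, fc2: 4d -> d) could be split into
# smaller expert MLPs along its intermediate dimension. Placeholder code only.
def split_ffn_into_experts(fc1: nn.Linear, fc2: nn.Linear, num_experts: int = 16) -> nn.ModuleList:
    d_inter = fc1.out_features
    assert d_inter % num_experts == 0, "intermediate size must divide evenly"
    chunk = d_inter // num_experts

    experts = []
    for i in range(num_experts):
        sl = slice(i * chunk, (i + 1) * chunk)
        e_fc1 = nn.Linear(fc1.in_features, chunk)
        e_fc2 = nn.Linear(chunk, fc2.out_features)
        with torch.no_grad():
            e_fc1.weight.copy_(fc1.weight[sl, :])     # rows of the first projection
            e_fc1.bias.copy_(fc1.bias[sl])
            e_fc2.weight.copy_(fc2.weight[:, sl])     # columns of the second projection
            e_fc2.bias.copy_(fc2.bias / num_experts)  # split the output bias evenly (one possible choice)
        experts.append(nn.Sequential(e_fc1, nn.GELU(), e_fc2))
    return nn.ModuleList(experts)
```

Is this roughly the idea, or is the split done differently (e.g., applied to pretrained weights rather than at initialization)?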
Thanks a lot.
What is the random routing policy of SMoE-Dropout?
I read the paper but could not find a detailed description of it.
Are you using standard dropout as the routing strategy during training?
If not, why is the method called SMoE-Dropout?
I would appreciate your answer.
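To make the question concrete, here is what I imagined "random routing" might mean: sampling k of the N experts uniformly at random for every token at each forward pass. This is purely my guess, not code from the repo, and I'd like to know whether the actual policy is different (e.g., a fixed, randomly initialized router).

```python
import torch

# My guess at a "random routing" policy: k distinct experts sampled uniformly
# at random for each token, with equal gate weights. Hypothetical sketch only.
def random_topk_routing(num_tokens: int, num_experts: int, k: int):
    scores = torch.rand(num_tokens, num_experts)   # i.i.d. uniform random scores
    topk_idx = scores.topk(k, dim=-1).indices      # k random experts per token
    gates = torch.full((num_tokens, k), 1.0 / k)   # equal weight per selected expert
    return topk_idx, gates
```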