vita-group / random-moe-as-dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang
License: Apache License 2.0