
Awesome-Mixture-of-Experts-Papers

Awesome-Mixture-of-Experts-Papers is a curated list of Mixture-of-Experts (MoE) papers from recent years. Star this repository to keep abreast of the latest developments in this fast-growing research field.

Thanks to all the people who have contributed to this project. We strongly encourage researchers to open pull requests (e.g., to add missing papers or fix errors) and to help others in this community!
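
For readers new to the area, the sketch below illustrates the core idea behind most of the papers listed here: a gating network routes each token to a small subset (top-k) of expert feed-forward networks, so only a fraction of the model's parameters is active per token, in the spirit of the sparsely-gated MoE layer of Shazeer et al. (2017), listed under Algorithm. This is a minimal, self-contained PyTorch sketch; the class name, hyperparameters, and loop-based dispatch are illustrative choices of ours, not the reference implementation of any listed paper.

```python
# Minimal, illustrative sketch of a sparsely-gated MoE layer with top-k routing.
# Names and hyperparameters are our own; real systems add load balancing,
# capacity limits, and distributed expert parallelism (see the papers below).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gate scores every token against every expert.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.gate(x)                                # (tokens, experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # keep only k experts per token
        weights = F.softmax(topk_scores, dim=-1)             # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_pos, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_pos.numel() == 0:
                continue  # no tokens routed to this expert
            out[token_pos] += weights[token_pos, slot].unsqueeze(-1) * expert(x[token_pos])
        return out


if __name__ == "__main__":
    layer = TopKMoE(d_model=16, d_hidden=32)
    tokens = torch.randn(10, 16)
    print(layer(tokens).shape)  # torch.Size([10, 16])
```

Many of the papers below refine exactly the pieces in this sketch: the routing function (e.g., Expert Choice routing, Hash Layers, BASE Layers), load balancing of tokens across experts, and the systems support for distributing experts across devices (e.g., FastMoE, DeepSpeed-MoE).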

Algorithm

2022

  • Unified Scaling Laws for Routed Language Models [pdf] arXiv 2022

    Aidan Clark, Diego de las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan

  • Designing Effective Sparse Expert Models [pdf] arXiv 2022

    Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus

  • Mixture-of-Experts with Expert Choice Routing [pdf] arXiv 2022

    Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew Dai, Zhifeng Chen, Quoc Le, James Laudon

  • Taming Sparsely Activated Transformer with Stochastic Experts [pdf] ICLR 2022

    Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao

2021

  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [pdf] arXiv 2021

    William Fedus, Barret Zoph, Noam Shazeer

  • GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding [pdf] ICLR 2021

    Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [pdf] arXiv 2021

    Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathy Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, Claire Cui

  • Scaling Vision with Sparse Mixture of Experts [pdf] NeurIPS 2021

    Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby

  • Scalable Transfer Learning with Expert Models [pdf] ICLR 2021

    Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby

  • Efficient Large Scale Language Modeling with Mixtures of Experts [pdf] arXiv 2021

    Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

  • DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning [pdf] NeurIPS 2021

    Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi

  • Hash Layers For Large Sparse Models [pdf] NeurIPS 2021

    Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

  • BASE Layers: Simplifying Training of Large, Sparse Models [pdf] ICML 2021

    Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer

  • M6-T: Exploring Sparse Expert Models and Beyond [pdf] arXiv 2021

    An Yang, Junyang Lin, Rui Men, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Jiamang Wang, Yong Li, Di Zhang, Wei Lin, Lin Qu, Jingren Zhou, Hongxia Yang

  • Dense-to-Sparse Gate for Mixture-of-Experts [pdf] arXiv 2021

    Xiaonan Nie, Shijie Cao, Xupeng Miao, Lingxiao Ma, Jilong Xue, Youshan Miao, Zichao Yang, Zhi Yang, Bin Cui

  • Sparse MoEs meet Efficient Ensembles [pdf] arXiv 2021

    James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton

2019

  • Mixture Models for Diverse Machine Translation: Tricks of the Trade [pdf] ICML 2019

    Tianxiao Shen, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

2017

  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer [pdf] ICLR 2017

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean

System

2022

  • DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale [pdf] arXiv 2022

    Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He

2021

  • FastMoE: A Fast Mixture-of-Expert Training System [pdf] arXiv 2021

    Jiaao He, Jiezhong Qiu, Aohan Zeng, Zhilin Yang, Jidong Zhai, Jie Tang

Application

2021

  • Video Recommendation with Multi-gate Mixture of Experts Soft Actor Critic [pdf] SIGIR 2021

    Dingcheng Li, Xu Li, Jun Wang, Ping Li

  • Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference [pdf] EMNLP 2021

    Sneha Kudugunta, Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat

  • MSSM: A Multiple-level Sparse Sharing Model for Efficient Multi-Task Learning [pdf] SIGIR 2021

    Ke Ding, Xin Dong, Yong He, Lei Cheng, Chilin Fu, Zhaoxin Huan, Hai Li, Tan Yan, Liang Zhang, Xiaolu Zhang, Linjian Mo

  • DEMix Layers: Disentangling Domains for Modular Language Modeling [pdf] arXiv 2021

    Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer

2020

  • Multitask Mixture of Sequential Experts for User Activity Streams [pdf] SIGKDD 2020

    Zhen Qin, Yicheng Cheng, Zhe Zhao, Zhe Chen, Donald Metzler, Jingzheng Qin

2018

  • Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts [pdf] SIGKDD 2018

    Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, Ed Chi

Contributed by Xiaonan Nie, Xupeng Miao, Qibin Liu and Hetu team members.
