# Efficient MoE

|Title & Authors|Links|
|:--|:--|
|**SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models**<br>Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai "Helen" Li, Yiran Chen|Paper|
|**Fast Inference of Mixture-of-Experts Language Models with Offloading**<br>Artyom Eliseev, Denis Mazur|Github<br>Paper|
|**SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention**<br>Róbert Csordás, Piotr Piękos, Kazuki Irie, Jürgen Schmidhuber|Github<br>Paper|
|**Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference**<br>Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. (DK) Panda|Github<br>Paper|
|**MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving**<br>Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina|Github<br>Paper|
|**Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models**<br>Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci|Github<br>Paper|
|**Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models**<br>Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, Hongsheng Li|Github<br>Paper|
|**Enhancing Efficiency in Sparse Models with Sparser Selection**<br>Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao, Zenglin Xu|Github<br>Paper|
|**Prompt-prompted Mixture of Experts for Efficient LLM Generation**<br>Harry Dong, Beidi Chen, Yuejie Chi|Github<br>Paper|
|**Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts**<br>Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang|Paper|
|**SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts**<br>Alexandre Muzio, Alex Sun, Churan He|Paper|
|**Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models**<br>Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda|Paper|
|**Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping**<br>Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang|Paper|
|**A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts**<br>Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers|Paper|
|**Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models**<br>Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin|Github<br>Paper|
|**MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models**<br>Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyuk-Jae Lee, Jaewoong Sim|Paper|
|**Demystifying the Compression of Mixture-of-Experts Through a Unified Framework**<br>Shwai He, Daize Dong, Liang Ding, Ang Li|Github<br>Paper|
|**ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models**<br>Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang|Paper|
|**Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark**<br>Pingzhi Li, Xiaolong Jin, Yu Cheng, Tianlong Chen|Github<br>Paper|
|**Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs**<br>Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang|Github<br>Paper|
|**Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts**<br>Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao|Paper|
|**MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More**<br>Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi|Github<br>Paper|
|**EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference**<br>Yulei Qian, Fengcun Li, Xiangyang Ji, Xiaoyu Zhao, Jianchao Tan, Kefeng Zhang, Xunliang Cai|Paper|
|**ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference**<br>Xin He, Shunkang Zhang, Yuxin Wang, Haiyan Yin, Zihao Zeng, Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Ivor Tsang, Ong Yew Soon|Paper|
|**ProMoE: Fast MoE-based LLM Serving using Proactive Caching**<br>Xiaoniu Song, Zihang Zhong, Rong Chen|Paper|
|**HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference**<br>Peng Tang, Jiacheng Liu, Xiaofeng Hou, Yifei Pu, Jing Wang, Pheng-Ann Heng, Chao Li, Minyi Guo|Paper|
|**MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization**<br>Jingming Guo, Yan Liu, Yu Meng, Zhiwei Tao, Banglan Liu, Gang Chen, Xiang Li|Github<br>Paper|
|**MoE-I2: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition**<br>Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Yuanlin Duan, Wenqi Jia, Miao Yin, Yu Cheng, Bo Yuan|Github<br>Paper|
|**Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference**<br>Andrii Skliar, Ties van Rozendaal, Romain Lepert, Todor Boinovski, Mart van Baalen, Markus Nagel, Paul Whatmough, Babak Ehteshami Bejnordi|Paper|
|**Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning**<br>Mingyu Cao, Gen Li, Jie Ji, Jiaqi Zhang, Xiaolong Ma, Shiwei Liu, Lu Yin|Github<br>Paper|