# Network Pruning

- **SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot** (Elias Frantar, Dan Alistarh) [Github] [Paper]
- **LLM-Pruner: On the Structural Pruning of Large Language Models** (Xinyin Ma, Gongfan Fang, Xinchao Wang) [Github] [Paper]
- **The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter** (Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang) [Github] [Paper]
- **Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity** (Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song) [Github] [Paper]
- **A Simple and Effective Pruning Approach for Large Language Models** (Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter) [Github] [Paper]
- **Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning** (Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen) [Github] [Paper]
- **Plug-and-Play: An Efficient Post-training Pruning Method for Large Language Models** (Yingtao Zhang, Haoli Bai, Haokun Lin, Jialin Zhao, Lu Hou, Carlo Vittorio Cannistraci) [Github] [Paper]
- **Fluctuation-based Adaptive Structured Pruning for Large Language Models** (Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang) [Github] [Paper]
- **NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models** (Jongwoo Ko, Seungjoon Park, Yujin Kim, Sumyeong Ahn, Du-Seong Chang, Euijai Ahn, Se-Young Yun) [Github] [Paper]
- **LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning** (Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang) [Github] [Paper]
- **Pruning Large Language Models via Accuracy Predictor** (Yupeng Ji, Yibo Cao, Jiucai Liu) [Paper]
- **Compressing LLMs: The Truth is Rarely Pure and Never Simple** (Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang) [Paper]
- **Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity** (Lu Yin, Shiwei Liu, Ajay Jaiswal, Souvik Kundu, Zhangyang Wang) [Github] [Paper]
- **Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity** (Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Mykola Pechenizkiy, Yi Liang, Zhangyang Wang, Shiwei Liu) [Github] [Paper]
- **Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models** (Song Guo, Jiahang Xu, Li Lyna Zhang, Mao Yang) [Github] [Paper]
- **Sparse Finetuning for Inference Acceleration of Large Language Models** (Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh) [Github] [Paper]
- **ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models** (Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar) [Paper]
- **The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning** (Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite) [Paper]
- **One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models** (Hang Shao, Bei Liu, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian) [Github] [Paper]
- **LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery** (Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, Luming Liang) [Github] [Paper]
- **Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization** (Björn Deiseroth, Max Meuer, Nikolas Gritsch, Constantin Eichenberg, Patrick Schramowski, Matthias Aßenmacher, Kristian Kersting) [Github] [Paper]
- **Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models** (Rocktim Jyoti Das, Liqun Ma, Zhiqiang Shen) [Github] [Paper]
- **Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs** (Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, Rongrong Ji) [Github] [Paper]
- **E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity** (Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang) [Paper]
- **PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs** (Max Zimmer, Megi Andoni, Christoph Spiegel, Sebastian Pokutta) [Github] [Paper]
- **Fast and Optimal Weight Update for Pruned Large Language Models** (Vladimír Boža) [Github] [Paper]
- **Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning** (Adib Hasan, Ileana Rugina, Alex Wang) [Github] [Paper]
- **SliceGPT: Compress Large Language Models by Deleting Rows and Columns** (Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman) [Github] [Paper]
- **APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference** (Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao) [Paper]
- **ReLU2 Wins: Discovering Efficient Activation Functions for Sparse LLMs** (Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun) [Paper]
- **Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes** (Lucio Dery, Steven Kolawole, Jean-Francois Kagey, Virginia Smith, Graham Neubig, Ameet Talwalkar) [Github] [Paper]
- **Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications** (Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia et al.) [Github] [Paper] [Project]
- **NutePrune: Efficient Progressive Pruning with Numerous Teachers for Large Language Models** (Shengrui Li, Xueting Han, Jing Bai) [Paper]
- **Learn To be Efficient: Build Structured Sparsity in Large Language Models** (Haizhong Zheng, Xiaoyan Bai, Beidi Chen, Fan Lai, Atul Prakash) [Paper]
- **Shortened LLaMA: A Simple Depth Pruning for Large Language Models** (Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song) [Github] [Paper]
- **SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks** (Jiwon Song, Kyungseok Oh, Taesu Kim, Hyungjun Kim, Yulhwa Kim, Jae-Joon Kim) [Github] [Paper]
- **HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference** (Yashas Samaga B L, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli) [Paper]
- **LaCo: Large Language Model Pruning via Layer Collapse** (Yifei Yang, Zouying Cao, Hai Zhao) [Paper]
- **ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models** (Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, Kuai Li et al.) [Github] [Paper] [Model-7B] [Model-13B]
- **EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs** (Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao, Yiyu Shi, Rongrong Ji) [Github] [Paper]
- **BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation** (Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo) [Github] [Paper]
- **ShortGPT: Layers in Large Language Models are More Redundant Than You Expect** (Xin Men, Mingyu Xu, Qingyu Zhang, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, Weipeng Chen) [Paper]
- **Efficient Pruning of Large Language Model with Adaptive Estimation Fusion** (Jun Liu, Chao Wu, Changdi Yang, Hao Tang, Haoye Dong, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang, Yanzhi Wang) [Paper]
- **Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression** (Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie et al.) [Github] [Paper] [Project]
- **Compressing Large Language Models by Streamlining the Unimportant Layer** (Xiaodong Chen, Yuxuan Hu, Jing Zhang) [Paper]
- **Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind** (Hongchuan Zeng, Hongshen Xu, Lu Chen, Kai Yu) [Github] [Paper]
- **Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy** (Yijin Liu, Fandong Meng, Jie Zhou) [Github] [Paper]
- **LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models** (Guangyan Li, Yongqiang Tang, Wensheng Zhang) [Github] [Paper]
- **CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models** (Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini) [Paper]
- **Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding** (Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer et al.) [Paper]
- **Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment** (Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, Eldar Kurtic, Kevin Leong, Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz) [Paper]
- **Dependency-Aware Semi-Structured Sparsity of GLU Variants in Large Language Models** (Zhiyu Guo, Hidetaka Kamigaito, Taro Watanabe) [Paper]
- **Mixture-of-Depths: Dynamically allocating compute in transformer-based language models** (David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro) [Paper]
- **LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models** (Juan Pablo Munoz, Jinjie Yuan, Yi Zheng, Nilesh Jain) [Github] [Paper]
- **Shears: Unstructured Sparsity with Neural Low-rank Adapter Search** (Juan Pablo Munoz, Jinjie Yuan, Nilesh Jain) [Github] [Paper]
- **Pruning as a Domain-specific LLM Extractor** (Nan Zhang, Yanchi Liu, Xujiang Zhao, Wei Cheng, Runxue Bao, Rui Zhang, Prasenjit Mitra, Haifeng Chen) [Github] [Paper]
- **Language-Specific Pruning for Efficient Reduction of Large Language Models** (Maksym Shamrai) [Github] [Paper]
- **OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning** (Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie et al.) [Github] [Paper]
- **FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models** (Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi) [Paper]
- **SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs** (Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang, Maryam Mehri Dehnavi) [Github] [Paper]
- **SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models** (Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li) [Github] [Paper]
- **Large Language Model Pruning** (Hanjuan Huang, Hao-Jia Song, Hsing-Kuo Pao) [Paper]
- **Effective Interplay between Sparsity and Quantization: From Theory to Practice** (Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, Babak Falsafi, Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh) [Paper]
- **VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning** (Oshin Dutta, Ritvik Gupta, Sumeet Agarwal) [Paper]
- **Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters** (Yixin Song, Haotong Xie, Zhengyan Zhang, Bo Wen, Li Ma, Zeyu Mi, Haibo Chen) [Paper] [Model]
- **Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models** (Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu) [Github] [Paper]
- **MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations** (Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu) [Github] [Paper]
- **ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models** (Xiang Meng, Kayhan Behdin, Haoyue Wang, Rahul Mazumder) [Paper]
- **Optimization-based Structural Pruning for Large Language Models without Back-Propagation** (Yuan Gao, Zujing Liu, Weizhong Zhang, Bo Du, Gui-Song Xia) [Paper]
- **ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models** (Yash Akhauri, Ahmed F AbouElhamayed, Jordan Dotzel, Zhiru Zhang, Alexander M Rush, Safeen Huda, Mohamed S Abdelfattah) [Github] [Paper]
- **Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization** (Sungbin Shin, Wonpyo Park, Jaeho Lee, Namhoon Lee) [Paper]
- **Learning Neural Networks with Sparse Activations** (Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath, Raghu Meka) [Paper]
- **FoldGPT: Simple and Effective Large Language Model Compression Scheme** (Songwei Liu, Chao Zeng, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen) [Paper]
- **Structured Pruning for Large Language Models Using Coupled Components Elimination and Minor Fine-tuning** (Honghe Zhang, Xiaolong Shi, Jingwei Sun, Guangzhong Sun) [Paper]
- **BlockPruner: Fine-grained Pruning for Large Language Models** (Longguang Zhong, Fanqi Wan, Ruijun Chen, Xiaojun Quan, Liangzhi Li) [Github] [Paper]
- **Flextron: Many-in-One Flexible Large Language Model** (Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, Pavlo Molchanov) [Paper]
- **Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations** (Bowen Shen, Zheng Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang) [Github] [Paper]
- **Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression** (Zhichao Xu, Ashim Gupta, Tao Li, Oliver Bentham, Vivek Srikumar) [Github] [Paper]
- **Q-Sparse: All Large Language Models can be Fully Sparsely-Activated** (Hongyu Wang, Shuming Ma, Ruiping Wang, Furu Wei) [Paper]
- **Reconstruct the Pruned Model without Any Retraining** (Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang) [Paper]
- **MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models** (Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi) [Paper]
- **Compact Language Models via Pruning and Knowledge Distillation** (Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov) [Github] [Paper]
- **Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining** (Jianwei Li, Yijun Dong, Qi Lei) [Paper]
- **Pruning Large Language Models with Semi-Structural Adaptive Sparse Training** (Weiyu Huang, Guohao Jian, Yuezhou Hu, Jun Zhu, Jianfei Chen) [Paper]
- **A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models** (Pengxiang Zhao, Hanyu Hu, Ping Li, Yi Zheng, Zhefeng Wang, Xiaoming Yuan) [Paper]
- **Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism** (Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum) [Paper]
- **LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models** (Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu) [Github] [Paper]
- **LLM Pruning and Distillation in Practice: The Minitron Approach** (Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov) [Paper]
- **Language-specific Calibration for Pruning Multilingual Language Models** (Simon Kurz, Zhixue Zhao, Jian-Jia Chen, Lucie Flek) [Paper]
- **PAT: Pruning-Aware Tuning for Large Language Models** (Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du) [Github] [Paper]
- **STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning** (Jaeseong Lee, seung-won hwang, Aurick Qiao, Daniel F Campos, Zhewei Yao, Yuxiong He) [Paper]
- **Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models** (Bishwash Khanal, Jeffery M. Capone) [Paper]
- **KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models** (Bo Lv, Quan Zhou, Xuanang Ding, Yan Wang, Zeming Ma) [Paper]
- **OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition** (Stephen Zhang, Vardan Papyan) [Paper]
- **CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information** (Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin) [Github] [Paper]
- **Search for Efficient Large Language Models** (Xuan Shen, Pu Zhao, Yifan Gong, Zhenglun Kong, Zheng Zhan, Yushu Wu, Ming Lin, Chao Wu, Xue Lin, Yanzhi Wang) [Paper]
- **MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models** (Gongfan Fang, Hongxu Yin, Saurav Muralidharan, Greg Heinrich, Jeff Pool, Jan Kautz, Pavlo Molchanov, Xinchao Wang) [Github] [Paper]
- **SQFT: Low-cost Model Adaptation in Low-precision Sparse Foundation Models** (Juan Pablo Munoz, Jinjie Yuan, Nilesh Jain) [Github] [Paper]
- **Mitigating Copy Bias in In-Context Learning through Neuron Pruning** (Ameen Ali, Lior Wolf, Ivan Titov) [Paper]
- **Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning** (Abhinav Bandari, Lu Yin, Cheng-Yu Hsieh, Ajay Kumar Jaiswal, Tianlong Chen, Li Shen, Ranjay Krishna, Shiwei Liu) [Github] [Paper]
- **LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models** (David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner) [Paper]
- **DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models** (Shangqian Gao, Chi-Heng Lin, Ting Hua, Tang Zheng, Yilin Shen, Hongxia Jin, Yen-Chang Hsu) [Paper]
- **Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix** (Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou) [Paper]
- **AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models** (Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang) [Github] [Paper]
- **Self-Data Distillation for Recovering Quality in Pruned Large Language Models** (Vithursan Thangarasa, Ganesh Venkatesh, Nish Sinnadurai, Sean Lie) [Paper]
- **Self-calibration for Language Model Quantization and Pruning** (Miles Williams, George Chrysostomou, Nikolaos Aletras) [Paper]
- **Beware of Calibration Data for Pruning Large Language Models** (Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang) [Paper]
- **Pruning Foundation Models for High Accuracy without Retraining** (Pu Zhao, Fei Sun, Xuan Shen, Pinrui Yu, Zhenglun Kong, Yanzhi Wang, Xue Lin) [Github] [Paper]
- **FedSpaLLM: Federated Pruning of Large Language Models** (Guangji Bai, Yijiang Li, Zilinghan Li, Liang Zhao, Kibaek Kim) [Paper]
- **EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search** (Oliver Sieberling, Denis Kuznedelev, Eldar Kurtic, Dan Alistarh) [Github] [Paper]
- **Beyond 2:4: exploring V:N:M sparsity for efficient transformer inference on GPUs** (Kang Zhao, Tao Yuan, Han Bao, Zhenfeng Su, Chang Gao, Zhaofeng Sun, Zichen Liang, Liping Jing, Jianfei Chen) [Paper]
- **LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment** (Ge Yang, Changyi He, Jinyang Guo, Jianyu Wu, Yifu Ding, Aishan Liu, Haotong Qin, Pengliang Ji, Xianglong Liu) [Github] [Paper]
- **Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts** (Danyal Aftab, Steven Davy) [Paper]
- **Sparsing Law: Towards Large Language Models with Greater Activation Sparsity** (Yuqi Luo, Chenyang Song, Xu Han, Yingfa Chen, Chaojun Xiao, Zhiyuan Liu, Maosong Sun) [Github] [Paper]
- **AVSS: Layer Importance Evaluation in Large Language Models via Activation Variance-Sparsity Analysis** (Zichen Song, Yuxin Wu, Sitan Huang, Zhongfeng Kang) [Paper]
- **DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization** (Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu) [Github] [Paper]
- **AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment** (Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, Yingyan Celine Lin) [Github] [Paper]
- **Scaling Law for Post-training after Model Pruning** (Xiaodong Chen, Yuxuan Hu, Jing Zhang, Xiaokang Zhang, Cuiping Li, Hong Chen) [Paper]
- **Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity** (Zichen Song, Sitan Huang, Yuxin Wu, Zhongfeng Kang) [Paper]
- **Reassessing Layer Pruning in LLMs: New Insights and Methods** (Yao Lu, Hao Cheng, Yujie Fang, Zeyu Wang, Jiaheng Wei, Dongwei Xu, Qi Xuan, Xiaoniu Yang, Zhaowei Zhu) [Github] [Paper]
- **Puzzle: Distillation-Based NAS for Inference-Optimized LLMs** (Akhiad Bercovich, Tomer Ronen, Talor Abramovich, Nir Ailon, Nave Assaf, Mohammad Dabbah et al.) [Paper]
- **Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking** (Marco Federici, Davide Belli, Mart van Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul Whatmough) [Paper]
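Many of the one-shot criteria collected above score each weight by combining its magnitude with calibration-set activation statistics; for example, "A Simple and Effective Pruning Approach for Large Language Models" (Wanda) ranks weights by |W| times the L2 norm of the corresponding input feature, pruned per output row. A minimal NumPy sketch of that criterion on synthetic data (the function name `wanda_mask` and the random inputs are illustrative, not from any of the listed repositories):

```python
import numpy as np

def wanda_mask(W, X, sparsity=0.5):
    """Per-output-row pruning mask using the Wanda-style score |W| * ||x||_2.

    W: (out_features, in_features) weight matrix.
    X: (n_samples, in_features) calibration activations.
    Drops the lowest-scoring `sparsity` fraction of weights in each row.
    """
    # Score each weight by its magnitude times the L2 norm of the
    # matching input feature over the calibration samples.
    feat_norm = np.linalg.norm(X, axis=0)            # (in_features,)
    score = np.abs(W) * feat_norm                    # (out_features, in_features)
    k = int(W.shape[1] * sparsity)                   # weights to drop per row
    # Per-row threshold: the k-th smallest score in that row.
    cut = np.partition(score, k, axis=1)[:, k:k + 1]  # (out_features, 1)
    return score >= cut

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))      # toy weight matrix
X = rng.normal(size=(32, 16))     # toy calibration activations
mask = wanda_mask(W, X, sparsity=0.5)
W_pruned = W * mask               # zero out the pruned weights
```

Real implementations apply this layer by layer to a model's linear projections, with activations collected from a small calibration corpus rather than random data.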