    Repositories list

    • 1 star · Updated Jul 28, 2025
    • Python · 1 star · Updated Jul 25, 2025
    • HTML · 0 stars · Updated Jul 11, 2025
    • vllm (Public)
      A high-throughput and memory-efficient inference and serving engine for LLMs (usage sketch below)
      Python · 9.6k stars · Updated Oct 26, 2024
    • Python · 1 star · Updated Jul 18, 2024
    • qlora (Public)
      QLoRA: Efficient Finetuning of Quantized LLMs (4-bit loading sketch below)
      Jupyter Notebook · 858 stars · Updated Nov 20, 2023
    • llm-awq (Public)
      AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (scaling sketch below)
      Python · 269 stars · Updated Nov 20, 2023
    • OmniQuant (Public)
      OmniQuant is a simple and powerful quantization technique for LLMs (learnable-clipping sketch below)
      Python · 67 stars · Updated Nov 8, 2023
    • rulm (Public)
      Language modeling and instruction tuning for Russian
      Jupyter Notebook · 49 stars · Updated Oct 18, 2023
    • AutoAWQ (Public)
      AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference (quantization-flow sketch below)
      C++ · 285 stars · Updated Oct 16, 2023
    • [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (smoothing sketch below)
      Python · 182 stars · Updated Oct 13, 2023
    • peft (Public)
      🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning (LoRA sketch below)
      Python · 2k stars · Updated Sep 25, 2023
    • Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) provides unified APIs for network compression techniques such as low-precision quantization, sparsity, pruning, and knowledge distillation across different deep learning frameworks, in pursuit of optimal inference performance (PTQ sketch below)
      Python · 280 stars · Updated Aug 16, 2023
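
vllm usage sketch: vLLM exposes a simple offline-generation API in addition to its OpenAI-compatible server. A minimal sketch along the lines of the project's quickstart; the model id is only an example.

    from vllm import LLM, SamplingParams

    # Load the model once; vLLM manages KV-cache memory with PagedAttention.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Prompts are batched with continuous batching for high throughput.
    outputs = llm.generate(["The capital of France is"], params)
    for out in outputs:
        print(out.outputs[0].text)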
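
qlora 4-bit loading sketch: the QLoRA recipe keeps the base model frozen in 4-bit NF4 precision with double quantization and trains small LoRA adapters on top. A minimal sketch of the loading step via the transformers/bitsandbytes integration; the model id is only an example.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # QLoRA-style base model: 4-bit NF4 storage, double-quantized constants,
    # bfloat16 compute for the matrix multiplies.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "huggyllama/llama-7b", quantization_config=bnb_config, device_map="auto"
    )
    # LoRA adapters (see the peft sketch below) are then attached and trained
    # while the 4-bit base stays frozen.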
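
llm-awq scaling sketch: AWQ protects the small fraction of salient weight channels by scaling them up according to an activation statistic before round-to-nearest quantization; the inverse scale is folded into the activations so the layer output is unchanged. The sketch below is illustrative only, assuming an abs-mean statistic, a fixed exponent, and group size 128; the actual repo searches the scaling per layer.

    import torch

    def awq_style_quantize(w, act_absmean, alpha=0.5, n_bits=4, group=128):
        # w: [out_features, in_features]; act_absmean: per-input-channel mean |X|.
        s = act_absmean.clamp(min=1e-5) ** alpha        # protect salient channels
        ws = w * s                                      # scale weights up ...
        qmax = 2 ** n_bits - 1
        wq = torch.empty_like(ws)
        for i in range(0, ws.shape[1], group):          # group-wise asymmetric RTN
            blk = ws[:, i:i + group]
            lo = blk.min(dim=1, keepdim=True).values
            hi = blk.max(dim=1, keepdim=True).values
            step = (hi - lo).clamp(min=1e-8) / qmax
            wq[:, i:i + group] = torch.round((blk - lo) / step) * step + lo
        # ... and fold the scale back out (equivalently, keep wq and divide the
        # incoming activations by s).
        return wq / s

    wq = awq_style_quantize(torch.randn(32, 64), torch.randn(64).abs())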
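
OmniQuant learnable-clipping sketch: instead of hand-tuned clipping, OmniQuant learns per-block parameters that shrink the weight quantization range (plus equivalent transformations for activations) by minimizing each block's output error. The function below only illustrates the learnable-weight-clipping idea; the names, scalar shapes, and missing training loop are my own simplifications, not the repo's API.

    import torch

    def lwc_fake_quant(w, gamma, beta, n_bits=4):
        # gamma/beta are learnable logits; sigmoid keeps the clipping factors in
        # (0, 1), so the range is a learned fraction of the full min/max range.
        qmax = 2 ** n_bits - 1
        hi = torch.sigmoid(gamma) * w.max()
        lo = torch.sigmoid(beta) * w.min()
        step = (hi - lo).clamp(min=1e-8) / qmax
        zero = torch.round(-lo / step)
        wq = torch.clamp(torch.round(w / step) + zero, 0, qmax)
        return (wq - zero) * step                       # fake-quantized weight

    wq = lwc_fake_quant(torch.randn(32, 16), torch.tensor(4.0), torch.tensor(4.0))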
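
AutoAWQ quantization-flow sketch: the library wraps AWQ calibration and 4-bit weight packing behind a small API. The flow below is reproduced from memory of its documentation, with an example model id and typical hyperparameters; verify the exact argument names against the README.

    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_path = "facebook/opt-125m"                    # example id only
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    model.quantize(tokenizer, quant_config=quant_config)   # AWQ calibration + packing
    model.save_quantized("opt-125m-awq")
    tokenizer.save_pretrained("opt-125m-awq")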
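
SmoothQuant smoothing sketch: the method migrates activation outliers into the weights with a per-channel factor s_j = max|X_j|^alpha / max|W_j|^(1-alpha); dividing activations and multiplying weights by s is mathematically a no-op, but both tensors become easier to quantize to INT8. A small self-contained check of that identity (alpha = 0.5 is the paper's default; the random tensors are illustrative).

    import torch

    def smooth_scales(act_absmax, weight_absmax, alpha=0.5):
        # s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), per input channel j.
        return (act_absmax.clamp(min=1e-5) ** alpha) / (weight_absmax.clamp(min=1e-5) ** (1 - alpha))

    X = torch.randn(8, 16)                              # activations: [tokens, in]
    W = torch.randn(16, 4)                              # weights: [in, out]
    s = smooth_scales(X.abs().amax(dim=0), W.abs().amax(dim=1))
    # The transform is exact: (X / s) @ (diag(s) W) == X @ W.
    assert torch.allclose(X @ W, (X / s) @ (W * s.unsqueeze(1)), atol=1e-5)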
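
peft LoRA sketch: the library wraps a base model so that only small adapter weights are trainable. A minimal LoRA example against an example OPT checkpoint; target module names differ per architecture.

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")   # example id
    lora = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],            # attention projections in OPT
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()                  # typically well under 1% of all weights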
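
Intel Neural Compressor PTQ sketch: a rough sketch of the 2.x post-training-quantization entry points as I recall them; the toy model, random calibration data, and exact argument names are assumptions to check against the project documentation.

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from neural_compressor import PostTrainingQuantConfig, quantization

    # Toy FP32 network and (input, label) calibration batches, purely illustrative.
    fp32_model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
    calib_loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.zeros(64)), batch_size=8)

    conf = PostTrainingQuantConfig(approach="static")   # static INT8 PTQ
    q_model = quantization.fit(model=fp32_model, conf=conf, calib_dataloader=calib_loader)
    q_model.save("./int8-model")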