    Repositories list

    • 1 star · Updated Jul 28, 2025
    • Python · 1 star · Updated Jul 25, 2025
    • HTML · 0 stars · Updated Jul 11, 2025
    • vllm (Public)
      A high-throughput and memory-efficient inference and serving engine for LLMs (usage sketch below)
      Python · 9.6k stars · Updated Oct 26, 2024
    • Python · 1 star · Updated Jul 18, 2024
    • qlora (Public)
      QLoRA: Efficient Finetuning of Quantized LLMs (4-bit loading sketch below)
      Jupyter Notebook · 858 stars · Updated Nov 20, 2023
    • llm-awq (Public)
      AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (scaling sketch below)
      Python · 269 stars · Updated Nov 20, 2023
    • OmniQuant (Public)
      OmniQuant is a simple and powerful quantization technique for LLMs (learnable-clipping sketch below)
      Python · 67 stars · Updated Nov 8, 2023
    • rulm (Public)
      Language modeling and instruction tuning for Russian
      Jupyter Notebook · 49 stars · Updated Oct 18, 2023
    • AutoAWQ (Public)
      AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference (quantization-flow sketch below)
      C++ · 285 stars · Updated Oct 16, 2023
    • [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (smoothing sketch below)
      Python · 182 stars · Updated Oct 13, 2023
    • peft (Public)
      🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning (LoRA sketch below)
      Python · 2k stars · Updated Sep 25, 2023
    • Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) provides unified APIs for network compression techniques such as low-precision quantization, sparsity, pruning, and knowledge distillation across different deep learning frameworks, in pursuit of optimal inference performance (PTQ sketch below)
      Python · 280 stars · Updated Aug 16, 2023
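
vllm usage sketch: vLLM exposes a simple offline-generation API in addition to its OpenAI-compatible server. A minimal sketch along the lines of the project's quickstart; the model id is only an example.

    from vllm import LLM, SamplingParams

    # Load the model once; vLLM manages KV-cache memory with PagedAttention.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Prompts are batched with continuous batching for high throughput.
    outputs = llm.generate(["The capital of France is"], params)
    for out in outputs:
        print(out.outputs[0].text)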
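
qlora 4-bit loading sketch: the QLoRA recipe keeps the base model frozen in 4-bit NF4 precision with double quantization and trains small LoRA adapters on top. A minimal sketch of the loading step via the transformers/bitsandbytes integration; the model id is only an example.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # QLoRA-style base model: 4-bit NF4 storage, double-quantized constants,
    # bfloat16 compute for the matrix multiplies.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "huggyllama/llama-7b", quantization_config=bnb_config, device_map="auto"
    )
    # LoRA adapters (see the peft sketch below) are then attached and trained
    # while the 4-bit base stays frozen.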
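
llm-awq scaling sketch: AWQ protects the small fraction of salient weight channels by scaling them up according to an activation statistic before round-to-nearest quantization; the inverse scale is folded into the activations so the layer output is unchanged. The sketch below is illustrative only, assuming an abs-mean statistic, a fixed exponent, and group size 128; the actual repo searches the scaling per layer.

    import torch

    def awq_style_quantize(w, act_absmean, alpha=0.5, n_bits=4, group=128):
        # w: [out_features, in_features]; act_absmean: per-input-channel mean |X|.
        s = act_absmean.clamp(min=1e-5) ** alpha        # protect salient channels
        ws = w * s                                      # scale weights up ...
        qmax = 2 ** n_bits - 1
        wq = torch.empty_like(ws)
        for i in range(0, ws.shape[1], group):          # group-wise asymmetric RTN
            blk = ws[:, i:i + group]
            lo = blk.min(dim=1, keepdim=True).values
            hi = blk.max(dim=1, keepdim=True).values
            step = (hi - lo).clamp(min=1e-8) / qmax
            wq[:, i:i + group] = torch.round((blk - lo) / step) * step + lo
        # ... and fold the scale back out (equivalently, keep wq and divide the
        # incoming activations by s).
        return wq / s

    wq = awq_style_quantize(torch.randn(32, 64), torch.randn(64).abs())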
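
OmniQuant learnable-clipping sketch: instead of hand-tuned clipping, OmniQuant learns per-block parameters that shrink the weight quantization range (plus equivalent transformations for activations) by minimizing each block's output error. The function below only illustrates the learnable-weight-clipping idea; the names, scalar shapes, and missing training loop are my own simplifications, not the repo's API.

    import torch

    def lwc_fake_quant(w, gamma, beta, n_bits=4):
        # gamma/beta are learnable logits; sigmoid keeps the clipping factors in
        # (0, 1), so the range is a learned fraction of the full min/max range.
        qmax = 2 ** n_bits - 1
        hi = torch.sigmoid(gamma) * w.max()
        lo = torch.sigmoid(beta) * w.min()
        step = (hi - lo).clamp(min=1e-8) / qmax
        zero = torch.round(-lo / step)
        wq = torch.clamp(torch.round(w / step) + zero, 0, qmax)
        return (wq - zero) * step                       # fake-quantized weight

    wq = lwc_fake_quant(torch.randn(32, 16), torch.tensor(4.0), torch.tensor(4.0))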
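
AutoAWQ quantization-flow sketch: the library wraps AWQ calibration and 4-bit weight packing behind a small API. The flow below is reproduced from memory of its documentation, with an example model id and typical hyperparameters; verify the exact argument names against the README.

    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_path = "facebook/opt-125m"                    # example id only
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    model.quantize(tokenizer, quant_config=quant_config)   # AWQ calibration + packing
    model.save_quantized("opt-125m-awq")
    tokenizer.save_pretrained("opt-125m-awq")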
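
SmoothQuant smoothing sketch: the method migrates activation outliers into the weights with a per-channel factor s_j = max|X_j|^alpha / max|W_j|^(1-alpha); dividing activations and multiplying weights by s is mathematically a no-op, but both tensors become easier to quantize to INT8. A small self-contained check of that identity (alpha = 0.5 is the paper's default; the random tensors are illustrative).

    import torch

    def smooth_scales(act_absmax, weight_absmax, alpha=0.5):
        # s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), per input channel j.
        return (act_absmax.clamp(min=1e-5) ** alpha) / (weight_absmax.clamp(min=1e-5) ** (1 - alpha))

    X = torch.randn(8, 16)                              # activations: [tokens, in]
    W = torch.randn(16, 4)                              # weights: [in, out]
    s = smooth_scales(X.abs().amax(dim=0), W.abs().amax(dim=1))
    # The transform is exact: (X / s) @ (diag(s) W) == X @ W.
    assert torch.allclose(X @ W, (X / s) @ (W * s.unsqueeze(1)), atol=1e-5)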
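
peft LoRA sketch: the library wraps a base model so that only small adapter weights are trainable. A minimal LoRA example against an example OPT checkpoint; target module names differ per architecture.

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")   # example id
    lora = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],            # attention projections in OPT
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()                  # typically well under 1% of all weights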
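
Intel Neural Compressor PTQ sketch: a rough sketch of the 2.x post-training-quantization entry points as I recall them; the toy model, random calibration data, and exact argument names are assumptions to check against the project documentation.

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from neural_compressor import PostTrainingQuantConfig, quantization

    # Toy FP32 network and (input, label) calibration batches, purely illustrative.
    fp32_model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
    calib_loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.zeros(64)), batch_size=8)

    conf = PostTrainingQuantConfig(approach="static")   # static INT8 PTQ
    q_model = quantization.fit(model=fp32_model, conf=conf, calib_dataloader=calib_loader)
    q_model.save("./int8-model")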