Project for Efficient LLM

Tools

  • vllm: A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list). [link][paper]
  • bitsandbytes: 8-bit CUDA functions for PyTorch. [link]
  • GPTQ-for-LLaMa: 4-bit quantization of LLaMA using GPTQ. [link]
  • TinyChatEngine: On-Device LLM Inference Library. [link]
  • LMOps: General technology for enabling AI capabilities with LLMs and MLLMs. [link]
  • lit-gpt: Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed. [link]
  • fastllm: A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices. [link]
  • llmtools: 4-Bit Finetuning of Large Language Models on One Consumer GPU. [link]
  • torchdistill: A coding-free framework built on PyTorch for reproducible deep learning studies. 🏆 20 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc. are implemented so far. 🎁 Trained models, training logs, and configurations are available to ensure reproducibility and benchmarking. [link][paper]
  • gpt4all: Open-source LLM chatbots that you can run anywhere. [link][paper]
  • low_bit_llama: Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs. [link]
  • exllama: A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. [link]
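
As a concrete example of the first entry above, here is a minimal sketch of offline batched generation with vLLM, following its public quickstart-style API (`LLM`, `SamplingParams`, `generate`). The prompts and the `facebook/opt-125m` model id are placeholder assumptions, not something this list prescribes.

```python
# Minimal vLLM offline-inference sketch (assumes `pip install vllm` and a GPU).
from vllm import LLM, SamplingParams

prompts = [
    "Explain 8-bit quantization in one sentence:",
    "The main benefit of PagedAttention is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# The engine batches requests and manages KV-cache memory internally.
llm = LLM(model="facebook/opt-125m")  # placeholder model id
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

vLLM also provides an OpenAI-compatible HTTP server for online serving, which is the usual path for deployment rather than the offline API shown here.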

Open-source Lightweight LLM

  • TinyLlama: The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens (a minimal loading sketch follows). [link]
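
To give a sense of how lightweight such models are to run, below is a minimal sketch of loading TinyLlama with Hugging Face transformers. The checkpoint name `TinyLlama/TinyLlama-1.1B-Chat-v1.0`, the fp16 dtype, and `device_map="auto"` (which requires `accelerate`) are illustrative assumptions, not details specified by this list.

```python
# Minimal sketch: run a TinyLlama checkpoint with transformers.
# Assumes `pip install transformers accelerate torch` and that the checkpoint name exists.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision keeps the 1.1B model around 2.2 GB
    device_map="auto",           # place weights on GPU if available, otherwise CPU
)

inputs = tokenizer("Small language models can still", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```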