This repository collects papers on all kinds of large language models (LLMs).
We hope it encourages researchers and helps advance their excellent work.
Theme | Source | Link | Other |
---|---|---|---|
…… | …… | …… | …… |
Descriptions | …… |
Paper | Source | Link | Other |
---|---|---|---|
Retrieval-Augmented Generation for Large Language Models: A Survey | Arxiv2023'Tongji University | …… | …… |
Descriptions | This paper provides a comprehensive overview of the integration of retrieval mechanisms with generative processes within large language models to enhance their performance and knowledge capabilities. | ||
A Survey on Multimodal Large Language Models for Autonomous Driving | WACV2023'Purdue University | Bilibili: MLM for Autonomous Driving Survey | Github: MLM for Autonomous Driving Survey |
Descriptions | This survey reviews multimodal large language models (MLLMs) for autonomous driving, covering relevant background, existing models, datasets, and tools, and discussing the open challenges and future directions of integrating MLLMs into driving systems. |
Paper | Source | Link | Other |
---|---|---|---|
…… | …… | …… | …… |
Descriptions | …… |
Paper | Source | Link | Other |
---|---|---|---|
Improving Text Embeddings with Large Language Models | Arxiv2024'Microsoft | …… | Hugging Face: e5-mistral-7b-instruct |
Descriptions | The paper generates synthetic training data for text embeddings through a two-stage prompting process: an LLM first brainstorms a pool of candidate tasks, then generates (query, document) training pairs conditioned on a sampled task (a minimal sketch of this two-stage prompting appears after this table). The authors categorize tasks into two major types: asymmetric tasks, where the retrieved pair consists of a query and a document of different forms (further split by length into short-long, long-short, long-long, and short-short matches, with web search as the classic example), and symmetric tasks, where the query and document are semantically related texts of similar form. The paper indicates that LLMs can significantly improve the quality of text embeddings, partly due to the synthetic data and partly due to the models' autoregressive pretraining, and that the traditional multi-stage embedding pipeline can be streamlined into a single fine-tuning (SFT) stage, simplifying training. |
||
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems | NAACL 2024 | bilibili | Code: stanford-futuredata/ARES |
Descriptions | ARES, an Automated RAG Evaluation System, efficiently evaluates retrieval-augmented generation systems across multiple tasks using synthetic data and minimal human annotations, maintaining accuracy even with domain shifts. |
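Below is a minimal sketch of the two-stage prompting idea from "Improving Text Embeddings with Large Language Models". It only illustrates the flow; `call_llm` is a hypothetical stand-in for any instruction-following LLM API (the paper uses carefully engineered prompts and GPT-4), and here it returns canned output so the sketch runs end to end.

```python
# Minimal sketch of the two-stage prompting flow for synthetic embedding data.
# Assumption: `call_llm` is a hypothetical stand-in for an LLM API; it returns
# canned output here so the example runs without any external service.
import json
import random


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    if "Brainstorm" in prompt:
        return json.dumps(["Retrieve documentation that answers a programming question"])
    return json.dumps({
        "user_query": "how to parse json in python",
        "positive_document": "The json module provides loads() and dumps() for ...",
        "hard_negative_document": "The csv module reads comma-separated files ...",
    })


def brainstorm_tasks(n: int = 20) -> list:
    # Stage 1: ask the LLM to brainstorm a pool of candidate retrieval tasks.
    prompt = (
        "Brainstorm a list of potentially useful text retrieval tasks. "
        f"Return a JSON list of {n} short task descriptions."
    )
    return json.loads(call_llm(prompt))


def generate_example(task: str, query_length: str, doc_length: str) -> dict:
    # Stage 2: condition on one sampled task (plus a length combination for
    # asymmetric tasks) and ask for a (query, positive, hard negative) triple.
    prompt = (
        f"You have been assigned a retrieval task: {task}\n"
        f"Write one {query_length} user query, one {doc_length} relevant document, "
        "and one hard-negative document. Return JSON with keys "
        "'user_query', 'positive_document', 'hard_negative_document'."
    )
    return json.loads(call_llm(prompt))


if __name__ == "__main__":
    tasks = brainstorm_tasks()
    example = generate_example(random.choice(tasks), "short", "long")
    print(example["user_query"])
```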
Paper | Source | Link | Other |
---|---|---|---|
C-Pack: Packaged Resources To Advance General Chinese Embedding | Arxiv2023'BAAI | Bilibili: C-Pack | Github: C-Pack |
Descriptions | BAAI and Hugging Face introduce C-Pack, a package of resources for general Chinese text embeddings that bundles a comprehensive benchmark, a massive training dataset, and a family of embedding models that significantly outperform existing Chinese embedding models. |
Paper | Source | Link | Other |
---|---|---|---|
Llama 2: Open Foundation and Fine-Tuned Chat Models | Arxiv2023'Meta | bilibili | Github: Llama |
Descriptions | The technical report of Llama 2 from Meta, one of the leading contributors to the open-source LLM community. The main contribution of Llama 2 is a range of pretrained and fine-tuned large language models (LLMs) that not only outperform existing open-source chat models on various benchmarks but are also optimized for dialogue scenarios. These models also perform well in human evaluations of helpfulness and safety, potentially serving as effective substitutes for closed-source models. The report further provides a detailed description of the fine-tuning process and safety enhancements, aiming to foster further development by the community and contribute to the responsible development of large language models. |
||
Higher Layers Need More LoRA Experts | Arxiv2024'Northwestern University | …… | …… |
Descriptions | This paper studies how to allocate LoRA (Low-Rank Adaptation) experts across Transformer layers when combining LoRA with mixture-of-experts fine-tuning, and finds that assigning more LoRA experts to higher layers yields better expressive power and adaptability than a uniform allocation. | ||
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Arxiv2023'Microsoft | …… | …… |
Descriptions | LongLLMLingua compresses prompts in long-context scenarios (using question-aware coarse-to-fine compression and document reordering) to accelerate inference and improve the LLM's perception of key information, reducing cost and latency while mitigating performance degradation on long inputs. | ||
Can AI Assistants Know What They Don't Know? | Arxiv2024'Fudan University | …… | Code: Say-I-Dont-Know |
Descriptions | The paper explores whether AI assistants can recognize what they do not know, constructing an "I don't know" (Idk) dataset to teach this ability; the aligned assistants decline more unanswerable questions, give fewer false answers, and achieve higher accuracy. | ||
Code Llama: Open Foundation Models for Code | Arxiv2023'Meta AI | bilibili | codellama |
Descriptions | The article introduces Code Llama, a family of large language models for code developed by Meta AI and based on Llama 2, designed to offer state-of-the-art performance among open models, support large input contexts, and follow instructions for programming tasks zero-shot. | ||
Are Emergent Abilities of Large Language Models a Mirage? | NeurIPS2023'Stanford University | bilibili | …… |
Descriptions | The article challenges the notion that large language models (LLMs) exhibit "emergent abilities," suggesting that these abilities may be an artifact of the metrics chosen by researchers rather than inherent properties of the models themselves. Through mathematical modeling, empirical testing, and meta-analysis, the authors demonstrate that alternative metrics or improved statistical methods can eliminate the perception of emergent abilities, casting doubt on their existence as a fundamental aspect of scaling AI models. | ||
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models | Arxiv2023'MEGVII Technology | …… | VaryBase |
Descriptions | The article introduces Vary, a method for expanding the visual vocabulary of Large Vision-Language Models (LVLMs) to enhance dense and fine-grained visual perception capabilities for specific visual tasks, such as document-level OCR or chart understanding. | ||
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | Arxiv2019'UKP Lab | bilibili | sentence-transformers |
Descriptions | The paper introduces Sentence-BERT (SBERT), a modification of BERT that uses siamese and triplet network structures to produce semantically meaningful sentence embeddings that can be compared with cosine similarity, greatly improving the efficiency of sentence-similarity search and clustering. A brief usage sketch appears after this table. | ||
Mixtral of Experts | Arxiv2024'Mistral AI | …… | Mixtral of Experts |
Descriptions | Mixtral is a sparse mixture-of-experts model based on the Mistral 7B architecture, with the key difference that each layer contains eight feedforward blocks ("experts"). For every token, a router network at each layer selects two experts to process it and combines their outputs; because different experts can be chosen at each timestep, each token has access to 47B parameters but only 13B active parameters are used during inference (a minimal sketch of this top-2 routing appears after this table). Mixtral was trained with a 32k-token context window and matches or outperforms Llama 2 70B and GPT-3.5 across benchmarks, excelling in mathematics, code generation, and multilingual tasks. An instruction-tuned variant, Mixtral 8x7B – Instruct, surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and the Llama 2 70B chat model on human evaluation benchmarks. |
||
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | …… | …… | Github: https://github.com/Chinese-Tiny-LLM/Chinese-Tiny-LLM |
Descriptions | This work introduces CT-LLM, a large language model that prioritizes Chinese, pretrained on a 1,200-billion-token corpus of which 800 billion tokens are Chinese. By training from scratch primarily on Chinese data, CT-LLM shows excellent ability to understand and process Chinese, further improved through alignment techniques. The model performs strongly on CHC-Bench, excels on Chinese tasks, and also demonstrates proficiency in English. The study challenges the prevailing practice of training LLMs mainly on English corpora and opens new horizons for training methodology. By open-sourcing the complete training process and related resources (such as MAP-CC and CHC-Bench), the authors hope to foster further exploration and innovation in academia and industry and to promote more inclusive and versatile language models. | ||
Confident Adaptive Language Modeling | NeurIPS2022'Google Research | …… | …… |
Descriptions | In recent years, Transformer-based large language models (LLMs) have achieved significant performance gains on many tasks, but these gains come with growing model size and inference cost. In practice, the sequences generated by LLMs contain predictions of varying difficulty: some require the model's full capacity, while others can be made with far less computation. This work proposes Confident Adaptive Language Modeling (CALM), a framework that dynamically allocates compute per input and per generation timestep. It addresses the challenges of early-exit decoding, including the choice of confidence measure, connecting sequence-level quality constraints to per-token exit decisions, and handling the hidden representations that are missing because of early exits. Theoretical analysis and experiments show that the framework reduces computation by up to 3x while maintaining high performance. A toy early-exit sketch appears after this table. |
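Below is a brief usage sketch for the Sentence-BERT entry above: sentences are encoded independently by a shared (siamese) encoder and compared with cosine similarity. It assumes the sentence-transformers package is installed; the model name is one publicly available checkpoint chosen for illustration, not necessarily the one from the paper.

```python
# SBERT-style sentence embeddings compared with cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint

sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
]

# Each sentence is encoded independently by the shared encoder.
embeddings = model.encode(sentences, convert_to_tensor=True)

# Semantic similarity is then just cosine similarity between embeddings.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```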
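The following is a simplified, self-contained sketch of Mixtral-style top-2 expert routing for a single MoE layer, written in plain PyTorch. It is a toy illustration of the mechanism described in the Mixtral entry above, not Mistral AI's implementation; the dimensions and the loop-based dispatch are chosen for readability.

```python
# Toy top-2 mixture-of-experts layer: a router scores 8 experts per token,
# keeps the best two, and combines their outputs with softmax weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Score every expert per token, keep the top-2,
        # and renormalize their weights with a softmax over the kept logits.
        logits = self.router(x)                                  # (T, E)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                     # (T, k)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                     # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(16, 64)                 # 16 tokens with hidden size 64
layer = Top2MoELayer(d_model=64, d_ff=256)
print(layer(tokens).shape)                   # torch.Size([16, 64])
```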
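And here is a toy sketch of confidence-based early-exit decoding in the spirit of the CALM entry above: a shared output head is applied after each layer, and computation for the current token stops once the top-1 probability crosses a threshold. The model uses random weights purely to illustrate the control flow; the real method also calibrates the exit threshold against sequence-level quality constraints.

```python
# Confidence-based early exit: stop stacking layers once the shared LM head
# is confident enough about the next-token prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab, num_layers = 64, 100, 6
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    for _ in range(num_layers)
)
lm_head = nn.Linear(d_model, vocab)      # shared output head used at every layer
embed = nn.Embedding(vocab, d_model)


def early_exit_step(token_ids: torch.Tensor, threshold: float = 0.9):
    """Run layers one by one; exit as soon as the top-1 probability is confident."""
    h = embed(token_ids)
    for depth, layer in enumerate(layers, start=1):
        h = layer(h)
        probs = F.softmax(lm_head(h[:, -1]), dim=-1)   # prediction for the next token
        confidence, next_token = probs.max(dim=-1)
        if confidence.item() >= threshold:
            return next_token, depth                   # early exit
    return next_token, num_layers                      # used all layers


tokens = torch.randint(0, vocab, (1, 5))
next_token, layers_used = early_exit_step(tokens)
print(f"predicted token {next_token.item()} after {layers_used} layers")
```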
Towards a Unified View of Parameter-Efficient Transfer Learning | ICLR2022'Carnegie Mellon University | …… | unify-parameter-efficient-tuning |
---|---|---|---|
Descriptions | This paper presents a unified framework for understanding and improving various parameter-efficient transfer learning methods by viewing them all as modifications of specific hidden states in pretrained models. It defines a set of design dimensions that differentiate the methods, experimentally identifies the important design choices in previous approaches, and uses the framework to instantiate new parameter-efficient tuning methods that are more effective while tuning fewer parameters. | ||
QLoRA: Efficient Finetuning of Quantized LLMs | NeurIPS2023'University of Washington | bilibili | Github: QLoRA |
Descriptions | This paper introduces QLoRA, a method for fine-tuning LLMs that significantly reduces memory usage. QLoRA achieves this by quantizing the frozen pretrained model to 4-bit NormalFloat (NF4) and backpropagating through it into Low-Rank Adapters (LoRA), applying double quantization to compress the quantization constants themselves, and using paged optimizers to manage memory spikes. These innovations allow QLoRA to fine-tune large models (e.g., 65B parameters) on a single GPU with limited memory (48GB). The resulting models achieve state-of-the-art performance on chatbot benchmarks, in some cases even surpassing earlier models such as ChatGPT. A hedged configuration sketch appears after this table. |
||
Prefix-Tuning: Optimizing Continuous Prompts for Generation | Arxiv2021'Stanford University | bilibili | ... |
Descriptions | This paper introduces prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks. Unlike fine-tuning, which modifies all language model parameters, prefix-tuning keeps them frozen and optimizes a small continuous task-specific vector (the prefix). This makes prefix-tuning much cheaper than full fine-tuning and particularly effective in low-data settings. A brief sketch using the peft library appears after this table. |
||
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks | ACL2022'Tsinghua University | …… | Github: |
Descriptions | The authors introduce P-Tuning v2, which applies deep prompt tuning (in the spirit of Prefix-Tuning) to make prompt tuning a universal solution across model scales and NLU tasks, improving on Prompt Tuning and P-Tuning. Compared to P-Tuning, it inserts continuous prompt tokens at every layer rather than only at the input layer, which brings two main benefits: it increases the number of tunable task-specific parameters (from roughly 0.01% to 0.1%-3% of model parameters), providing more per-task capacity while remaining parameter-efficient, and prompts added to deeper layers have a more direct influence on the model's predictions. |
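Below is a hedged configuration sketch for the QLoRA entry above, using the Hugging Face transformers, peft, and bitsandbytes libraries. The base model name and LoRA hyperparameters are placeholders for illustration, not the paper's exact recipe.

```python
# QLoRA-style setup: 4-bit NF4 base model + trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NormalFloat (NF4) quantization of the frozen base model,
# with double quantization of the quantization constants.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)

# Low-Rank Adapters are the only trainable parameters; gradients flow through
# the frozen 4-bit weights into these adapters.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Paged optimizers (e.g. optim="paged_adamw_8bit" in TrainingArguments) handle
# memory spikes during training; the training loop itself is omitted here.
```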
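And a brief sketch of prefix-tuning with the peft library, for the Prefix-Tuning entry above: the base language model stays frozen and only a small set of continuous prefix vectors is trained. The GPT-2 checkpoint and the number of virtual tokens are illustrative choices, not the paper's exact setup. (P-Tuning v2 applies the same deep-prompt idea to NLU tasks.)

```python
# Prefix-tuning: train only continuous prefix vectors, keep the LM frozen.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PrefixTuningConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

# 20 virtual prefix tokens whose key/value activations are prepended at every layer.
peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()   # only the prefix parameters are trainable

# The wrapped model is used like a normal causal LM during training and generation.
inputs = tokenizer("Harry Potter is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```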
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning | EMNLP2023'Peking University | bilibili | Github: ICL |
---|---|---|---|
Descriptions | This paper sheds light on the inner workings of in-context learning (ICL) in LLMs. While ICL has shown promise in enabling LLMs to perform various tasks through demonstrations, the mechanism behind this learning has been unclear. The authors investigate this mechanism through the lens of information flow and discover that labels in the demonstrations act as anchors. These labels serve two key functions: 1) During initial processing, semantic information accumulates within the representations of these label words. 2) This consolidated information acts as a reference point for the LLMs' final predictions. Based on these findings, the paper introduces three novel contributions: 1) An anchor re-weighting method to enhance ICL performance, 2) A demonstration compression technique to improve efficiency, and 3) An analysis framework to diagnose ICL errors in GPT2-XL. The effectiveness of these contributions validates the proposed mechanism and paves the way for future research in ICL. |
ChatDev: Communicative Agents for Software Development | Arxiv2023'Tsinghua University | https://github.com/OpenBMB/ChatDev | …… |
---|---|---|---|
Descriptions | Software development is a complex task that requires the coordination of many different skills. Traditional deep-learning approaches suffer from technical inconsistencies across the phases of the waterfall model (such as design, coding, and testing), which makes the development process inefficient. This paper proposes ChatDev, a chat-based software development framework driven by large language models, which uses a unified communication medium of natural language and programming language to let multiple agents collaborate across the design, coding, and testing phases and improve development efficiency. ChatDev produces solutions through multi-turn dialogue, demonstrating the potential of language as a bridge for multi-agent collaboration. |
Paper | Source | Link | Other |
---|---|---|---|
…… | …… | …… | …… |
Descriptions | …… |
Paper | Source | Link | Other |
---|---|---|---|
…… | …… | …… | …… |
Descriptions | …… |
Paper | Source | Link | Other |
---|---|---|---|
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning | ICLR2023 | …… | …… |
Descriptions | This paper uses diffusion models as a highly expressive policy class for offline reinforcement learning, combining a diffusion-based behavior-cloning objective with Q-learning guidance to improve learning efficiency and decision-making performance on offline RL benchmarks. |
Paper | Source | Link | Other |
---|---|---|---|
Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling | ICLR2023 | …… | …… |
Descriptions | Reinforcement learning (RL) agents typically have no prior knowledge and learn from scratch. The authors propose using few-shot prompting of large language models (LLMs) to hypothesize an Abstract World Model (AWM), which is then verified through experience, in order to improve the sample efficiency of RL agents. The DECKARD agent crafts items in Minecraft in two phases: in the Dream phase, the agent uses the LLM to decompose the task into subgoals, forming the AWM; in the Wake phase, the agent learns a policy for each subgoal and verifies the AWM. This approach not only significantly improves sample efficiency but also corrects errors in the LLM's knowledge, successfully combining the LLM's noisy prior information with knowledge grounded in environment dynamics. |