From fb473ee694fba60a4990b73d48081159b07fba4c Mon Sep 17 00:00:00 2001
From: github-actions
Date: Tue, 7 Jan 2025 00:57:37 +0000
Subject: [PATCH] chore: update confs

---
 arxiv.json | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/arxiv.json b/arxiv.json
index 95af2c78..f6bd7763 100644
--- a/arxiv.json
+++ b/arxiv.json
@@ -37994,5 +37994,75 @@
     "pub_date": "2025-01-03",
     "summary": "The cold-start problem is one of the long-standing challenges in recommender systems: accurately modeling new or interaction-limited users or items to provide better recommendations. Due to the diversification of internet platforms and the exponential growth of users and items, the importance of cold-start recommendation (CSR) is becoming increasingly evident. At the same time, large language models (LLMs) have achieved tremendous success and possess strong capabilities for modeling user and item information, offering new potential for cold-start recommendation. However, the CSR research community still lacks a comprehensive review and reflection on this field. In this paper, we therefore situate CSR in the era of large language models and provide a comprehensive review and discussion of its roadmap, related literature, and future directions. Specifically, we explore how existing CSR methods have evolved in the information they exploit, from content features, graph relations, and domain information to the world knowledge possessed by large language models, aiming to provide new insights for both the research and industrial communities working on CSR. Related resources for cold-start recommendation are collected and continuously updated for the community at https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation.",
     "translated": "冷启动问题是推荐系统中长期存在的挑战之一,其核心在于如何准确建模新用户或交互有限的用户及项目,以提供更优质的推荐服务。随着互联网平台的多样化以及用户和项目数量的指数级增长,冷启动推荐(CSR)的重要性日益凸显。与此同时,大型语言模型(LLMs)取得了巨大成功,并在用户与项目信息建模方面展现出强大的能力,为冷启动推荐提供了新的潜力。然而,冷启动推荐研究领域目前仍缺乏对这一领域的全面回顾与反思。基于此,本文站在大型语言模型时代的背景下,对冷启动推荐的发展路线、相关文献以及未来方向进行了全面的回顾与探讨。具体而言,我们探索了现有冷启动推荐如何利用信息的发展路径,从内容特征、图关系、领域信息,到大型语言模型所具备的世界知识,旨在为冷启动推荐的研究界与工业界提供新的见解。冷启动推荐的相关资源已收集并在以下链接持续更新:https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation。"
+  },
+  {
+    "title": "Metadata Conditioning Accelerates Language Model Pre-training",
+    "url": "http://arxiv.org/abs/2501.01956v1",
+    "pub_date": "2025-01-03",
+    "summary": "The vast diversity of styles, domains, and quality levels present in language model pre-training corpora is essential for developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified in each of these heterogeneous data sources is challenging. To address this, we propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate additional learning cues during pre-training. MeCo first provides metadata (e.g., URLs like en.wikipedia.org) alongside the text during training and later uses a cooldown phase with only the standard text, thereby enabling the model to function normally even without metadata. MeCo significantly accelerates pre-training across different model scales (600M to 8B parameters) and training sources (C4, RefinedWeb, and DCLM). For instance, a 1.6B language model trained with MeCo matches the downstream task performance of standard pre-training while using 33% less data. Additionally, MeCo enables us to steer language models by conditioning the inference prompt on either real or fabricated metadata that encodes the desired properties of the output: for example, prepending wikipedia.org to reduce harmful generations or factquizmaster.com (fabricated) to improve common knowledge task performance. We also demonstrate that MeCo is compatible with different types of metadata, such as model-generated topics. MeCo is remarkably simple, adds no computational overhead, and demonstrates promise in producing more capable and steerable language models.",
+    "translated": "语言模型预训练语料库中风格、领域和质量水平的巨大多样性对于开发通用模型能力至关重要,但高效学习和部署这些异构数据源中所体现的正确行为具有挑战性。为此,我们提出了一种新方法,称为元数据调节后冷却(Metadata Conditioning then Cooldown, MeCo),在预训练过程中引入额外的学习线索。MeCo首先在训练期间将元数据(例如,像en.wikipedia.org这样的URL)与文本一起提供,然后在冷却阶段仅使用标准文本,从而使模型即使在没有元数据的情况下也能正常运行。MeCo显著加速了不同模型规模(6亿到80亿参数)和训练数据源(C4、RefinedWeb和DCLM)的预训练。例如,使用MeCo训练的16亿参数语言模型在使用33%更少数据的情况下,达到了标准预训练的下游任务性能。此外,MeCo使我们能够通过在推理提示中加入真实或虚构的元数据来引导语言模型,这些元数据编码了输出的期望属性:例如,添加wikipedia.org以减少有害生成,或添加factquizmaster.com(虚构)以提高常识任务性能。我们还展示了MeCo与不同类型元数据(如模型生成的主题)的兼容性。MeCo非常简单,不增加计算开销,并展示了在生成更强大和可引导的语言模型方面的潜力。"
+  },
+  {
+    "title": "Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues\n and Challenges",
+    "url": "http://arxiv.org/abs/2501.01933v1",
+    "pub_date": "2025-01-03",
+    "summary": "This thesis presents Abstractive Text Summarization models for contemporary Sanskrit prose. The first chapter, titled Introduction, presents the motivation behind this work, the research questions, and the conceptual framework. Sanskrit is a low-resource inflectional language. The key research question that this thesis investigates is what the challenges are in developing an abstractive TS system for Sanskrit. To answer the key research question, sub-questions based on four different themes have been posed in this work. The second chapter, Literature Review, surveys the previous work done. The third chapter, Data Preparation, answers the remaining three questions from the third theme. It reports the data collection and preprocessing challenges for both language model and summarization model training. The fourth chapter reports the training and inference of the models and the results obtained. This research has initiated a pipeline for Sanskrit abstractive text summarization and has reported the challenges faced at every stage of development. The research questions based on every theme have been answered in order to address the key research question.",
+    "translated": "本论文提出了针对当代梵文散文的抽象文本摘要模型。第一章题为引言,介绍了本研究的动机、研究问题和概念框架。梵文是一种资源稀缺的屈折语。本论文探讨的关键研究问题是:开发梵文抽象文本摘要模型面临哪些挑战?为了回答这一关键问题,本研究基于四个不同的主题提出了子问题。第二章为文献综述,回顾了以往的相关研究。第三章为数据准备,回答了第三个主题中剩余的三个问题。该章节报告了在语言模型和摘要模型训练过程中数据收集和预处理所面临的挑战。第四章报告了模型的训练和推理过程以及所获得的结果。本研究为梵文抽象文本摘要建立了一个初步的流程,并报告了在开发过程中每个阶段所面临的挑战。基于每个主题的研究问题均已得到解答,从而回答了关键研究问题。"
+  },
+  {
+    "title": "Turning Logic Against Itself: Probing Model Defenses Through\n Contrastive Questions",
+    "url": "http://arxiv.org/abs/2501.01872v1",
+    "pub_date": "2025-01-03",
+    "summary": "Despite significant efforts to align large language models with human values and ethical guidelines, these models remain susceptible to sophisticated jailbreak attacks that exploit their reasoning capabilities. Traditional safety mechanisms often focus on detecting explicit malicious intent, leaving deeper vulnerabilities unaddressed. In this work, we introduce a jailbreak technique, POATE (Polar Opposite query generation, Adversarial Template construction, and Elaboration), which leverages contrastive reasoning to elicit unethical responses. POATE generates prompts with semantically opposite intents and combines them with adversarial templates to subtly direct models toward producing harmful responses. We conduct extensive evaluations across six diverse language model families of varying parameter sizes, including LLaMA3, Gemma2, Phi3, and GPT-4, to demonstrate the robustness of the attack, achieving significantly higher attack success rates (~44%) compared to existing methods. We evaluate our proposed attack against seven safety defenses, revealing their limitations in addressing reasoning-based vulnerabilities. To counteract this, we propose a defense strategy that improves reasoning robustness through chain-of-thought prompting and reverse thinking, mitigating reasoning-driven adversarial exploits.",
+    "translated": "尽管在使大型语言模型与人类价值观和伦理准则对齐方面付出了巨大努力,这些模型仍然容易受到利用其推理能力的复杂越狱攻击。传统的安全机制通常侧重于检测明确的恶意意图,而忽视了更深层次的漏洞。在本研究中,我们引入了一种越狱技术——POATE(极性相反查询生成、对抗性模板构建和详细阐述),该技术利用对比推理来引发不道德的响应。POATE通过生成语义相反意图的提示,并将其与对抗性模板结合,巧妙地引导模型生成有害响应。我们对包括LLaMA3、Gemma2、Phi3和GPT-4在内的六种不同参数规模的语言模型家族进行了广泛评估,以证明该攻击的鲁棒性,与现有方法相比,攻击成功率显著提高(约44%)。我们针对七种安全防御措施评估了所提出的攻击,揭示了它们在应对基于推理的漏洞方面的局限性。为了应对这一问题,我们提出了一种防御策略,通过思维链提示和逆向思维来提高推理的鲁棒性,从而减轻推理驱动的对抗性利用。"
+  },
+  {
+    "title": "Time Series Language Model for Descriptive Caption Generation",
+    "url": "http://arxiv.org/abs/2501.01832v1",
+    "pub_date": "2025-01-03",
+    "summary": "The automatic generation of representative natural language descriptions for observable patterns in time series data enhances interpretability, simplifies analysis, and increases the cross-domain utility of temporal data. While pre-trained foundation models have made considerable progress in natural language processing (NLP) and computer vision (CV), their application to time series analysis has been hindered by data scarcity. Although several large language model (LLM)-based methods have been proposed for time series forecasting, time series captioning is under-explored in the context of LLMs. In this paper, we introduce TSLM, a novel time series language model designed specifically for time series captioning. TSLM operates as an encoder-decoder model, leveraging both text prompts and time series data representations to capture subtle temporal patterns across multiple phases and generate precise textual descriptions of time series inputs. TSLM addresses the data scarcity problem in time series captioning by first leveraging in-context prompting for synthetic data generation, and second by denoising the generated data via a novel cross-modal dense retrieval scoring applied to time series-caption pairs. Experimental findings on various time series captioning datasets demonstrate that TSLM outperforms existing state-of-the-art approaches from multiple data modalities by a significant margin.",
+    "translated": "为时间序列数据中的可观测模式自动生成具有代表性的自然语言描述,能够增强可解释性、简化分析过程,并提升时间数据在跨领域的实用性。尽管预训练的基础模型在自然语言处理(NLP)和计算机视觉(CV)领域取得了显著进展,但由于数据稀缺,它们在时间序列分析中的应用受到了限制。虽然已有几种基于大型语言模型(LLM)的方法被提出用于时间序列预测,但在LLM的背景下,时间序列描述生成的研究仍处于探索阶段。本文提出了一种新颖的时间序列语言模型TSLM,专门设计用于时间序列描述生成。TSLM采用编码器-解码器架构,利用文本提示和时间序列数据表示来捕捉多阶段的细微时间模式,并生成对时间序列输入的精确文本描述。TSLM通过两种方式解决了时间序列描述生成中的数据稀缺问题:首先,利用上下文提示合成数据生成;其次,通过对时间序列-描述对应用一种新颖的跨模态密集检索评分来去噪生成的数据。在多个时间序列描述生成数据集上的实验结果表明,TSLM显著优于来自多种数据模态的最先进方法。"
+  },
+  {
+    "title": "Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large\n Language Models",
+    "url": "http://arxiv.org/abs/2501.01830v1",
+    "pub_date": "2025-01-03",
+    "summary": "Automated red-teaming has become a crucial approach for uncovering vulnerabilities in large language models (LLMs). However, most existing methods focus on isolated safety flaws, limiting their ability to adapt to dynamic defenses and uncover complex vulnerabilities efficiently. To address this challenge, we propose Auto-RT, a reinforcement learning framework that automatically explores and optimizes complex attack strategies to effectively uncover security vulnerabilities through malicious queries. Specifically, we introduce two key mechanisms to reduce exploration complexity and improve strategy optimization: 1) Early-terminated Exploration, which accelerates exploration by focusing on high-potential attack strategies; and 2) a Progressive Reward Tracking algorithm with intermediate downgrade models, which dynamically refines the search trajectory toward successful vulnerability exploitation. Extensive experiments across diverse LLMs demonstrate that, by significantly improving exploration efficiency and automatically optimizing attack strategies, Auto-RT detects a broader range of vulnerabilities, achieving faster detection and a 16.63% higher success rate compared to existing methods.",
+    "translated": "自动化红队测试已成为揭示大型语言模型(LLMs)漏洞的关键方法。然而,现有的大多数方法主要关注孤立的安全缺陷,限制了其适应动态防御和高效发现复杂漏洞的能力。为解决这一挑战,我们提出了Auto-RT,一种强化学习框架,能够自动探索和优化复杂攻击策略,通过恶意查询有效揭示安全漏洞。具体而言,我们引入了两种关键机制来降低探索复杂性并改进策略优化:1)**早期终止探索**,通过专注于高潜力攻击策略来加速探索;2)**渐进式奖励跟踪算法**,结合中间降级模型,动态优化搜索轨迹以成功利用漏洞。在多种LLMs上的广泛实验表明,通过显著提高探索效率和自动优化攻击策略,Auto-RT能够检测到更广泛的漏洞,相比现有方法,检测速度更快且成功率提高了16.63%。"
+  },
+  {
+    "title": "The Proof is in the Almond Cookies",
+    "url": "http://arxiv.org/abs/2501.01827v1",
+    "pub_date": "2025-01-03",
+    "summary": "This paper presents a case study on how to process cooking recipes (and more generally, how-to instructions) in a way that makes it possible for a robot or artificial cooking assistant to support human chefs in the kitchen. Such AI assistants would be of great benefit to society, as they can help to sustain the autonomy of aging adults or people with a physical impairment, or they may reduce the stress in a professional kitchen. We propose a novel approach to computational recipe understanding that mimics the human sense-making process, which is narrative-based. Using an English recipe for almond crescent cookies as an illustration, we show how recipes can be modelled as rich narrative structures by integrating various knowledge sources such as language processing, ontologies, and mental simulation. We show how such narrative structures can be used for (a) dealing with the challenges of recipe language, such as zero anaphora, (b) optimizing a robot's planning process, (c) measuring how well an AI system understands its current tasks, and (d) allowing recipe annotations to become language-independent.",
+    "translated": "本文通过案例研究探讨了如何处理烹饪食谱(以及更一般的操作说明),以使机器人或人工智能烹饪助手能够在厨房中为人类厨师提供支持。这类人工智能助手对社会大有裨益,因为它们可以帮助维持老年人或身体障碍人士的自主生活能力,或者减轻专业厨房中的压力。我们提出了一种新颖的计算食谱理解方法,该方法模仿了人类基于叙事的理解过程。通过以杏仁新月曲奇饼的英文食谱为例,我们展示了如何通过整合语言处理、本体论和心理模拟等多种知识源,将食谱建模为丰富的叙事结构。我们进一步展示了这种叙事结构如何用于:(a)应对食谱语言中的挑战,如零回指;(b)优化机器人的规划过程;(c)衡量人工智能系统对其当前任务的理解程度;(d)使食谱注释实现语言无关性。"
+  },
+  {
+    "title": "SDPO: Segment-Level Direct Preference Optimization for Social Agents",
+    "url": "http://arxiv.org/abs/2501.01821v1",
+    "pub_date": "2025-01-03",
+    "summary": "Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex goal-oriented social dialogues. Direct Preference Optimization (DPO) has proven effective in aligning LLM behavior with human preferences across a variety of agent tasks. Existing DPO-based approaches for multi-turn interactions are divided into turn-level and session-level methods. The turn-level method is overly fine-grained, focusing exclusively on individual turns, while session-level methods are too coarse-grained, often introducing training noise. To address these limitations, we propose Segment-Level Direct Preference Optimization (SDPO), which focuses on specific key segments within interactions to optimize multi-turn agent behavior while minimizing training noise. Evaluations on the SOTOPIA benchmark demonstrate that SDPO-tuned agents consistently outperform both existing DPO-based methods and proprietary LLMs like GPT-4o, underscoring SDPO's potential to advance the social intelligence of LLM-based agents. We release our code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/SDPO.",
+    "translated": "由大语言模型(LLMs)驱动的社交代理能够模拟人类的社交行为,但在处理复杂的面向目标的社交对话时表现不足。直接偏好优化(Direct Preference Optimization, DPO)已被证明在将LLM行为与人类偏好对齐方面对各种代理任务有效。现有的基于DPO的多轮交互方法分为轮次级(turn-level)和会话级(session-level)方法。轮次级方法过于细粒度,仅关注单个轮次,而会话级方法则过于粗粒度,常常引入训练噪声。为了解决这些局限性,我们提出了**分段级直接偏好优化(Segment-Level Direct Preference Optimization, SDPO)**,该方法专注于交互中的特定关键片段,以优化多轮代理行为,同时最小化训练噪声。在SOTOPIA基准测试中的评估表明,经过SDPO调优的代理在性能上持续优于现有的基于DPO的方法以及像GPT-4o这样的专有LLMs,突显了SDPO在提升基于LLM的代理社交智能方面的潜力。我们在https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/SDPO 上发布了代码和数据。"
+  },
+  {
+    "title": "End-to-End Long Document Summarization using Gradient Caching",
+    "url": "http://arxiv.org/abs/2501.01805v1",
+    "pub_date": "2025-01-03",
+    "summary": "Training transformer-based encoder-decoder models for long document summarization poses a significant challenge due to the quadratic memory consumption during training. Several approaches have been proposed to extend the input length at test time, but training with these approaches is still difficult, requiring truncation of input documents and causing a mismatch between training and test conditions. In this work, we propose CachED (Gradient $\\textbf{Cach}$ing for $\\textbf{E}$ncoder-$\\textbf{D}$ecoder models), an approach that enables end-to-end training of existing transformer-based encoder-decoder models, using the entire document without truncation. Specifically, we apply non-overlapping sliding windows to input documents, followed by fusion in the decoder. During backpropagation, the gradients are cached at the decoder and are passed through the encoder in chunks by re-computing the hidden vectors, similar to gradient checkpointing. In experiments on long document summarization, we extend BART to CachED BART, processing more than 500K tokens during training and achieving superior performance without using any additional parameters.",
+    "translated": "训练基于Transformer的编码器-解码器模型进行长文档摘要任务面临着一个重大挑战,即训练过程中二次方的内存消耗。尽管已有几种方法在测试时扩展了输入长度,但使用这些方法进行训练仍然困难,需要对输入文档进行截断,导致训练和测试条件不匹配。在本研究中,我们提出了CachED(编码器-解码器模型的梯度缓存),这种方法能够对现有的基于Transformer的编码器-解码器模型进行端到端训练,且无需截断整个文档。具体而言,我们对输入文档应用非重叠的滑动窗口,随后在解码器中进行融合。在反向传播过程中,梯度在解码器处被缓存,并通过重新计算隐藏向量分块传递回编码器,类似于梯度检查点技术。在长文档摘要的实验上,我们将BART扩展为CachED BART,在训练过程中处理超过50万个标记,且在不使用任何额外参数的情况下实现了卓越的性能。"
+  },
+  {
+    "title": "Reading Between the Lines: A dataset and a study on why some texts are\n tougher than others",
+    "url": "http://arxiv.org/abs/2501.01796v1",
+    "pub_date": "2025-01-03",
+    "summary": "Our research aims at better understanding what makes a text difficult to read for specific audiences with intellectual disabilities, more specifically, people who have limitations in cognitive functioning, such as reading and understanding skills, an IQ below 70, and challenges in conceptual domains. We introduce a scheme for the annotation of difficulties that is based on empirical research in psychology as well as on research in translation studies. The paper describes the annotated dataset, primarily derived from parallel texts (standard English and Easy to Read English translations) made available online. We fine-tuned four different pre-trained transformer models to perform multiclass classification, predicting the strategies required for simplification. We also investigate the possibility of interpreting the decisions of this language model when it is aimed at predicting the difficulty of sentences. The resources are available at https://github.com/Nouran-Khallaf/why-tough",
+    "translated": "我们的研究旨在更好地理解是什么使得特定智力障碍受众难以阅读文本,特别是那些在认知功能(如阅读和理解能力)方面存在限制、智商低于70并在概念领域面临挑战的人群。我们引入了一种基于心理学实证研究和翻译学研究的难度标注方案。本文描述了标注的数据集,该数据集主要来源于在线提供的平行文本(标准英语和易读英语翻译)。我们微调了四种不同的预训练Transformer模型,以执行多类别分类任务,预测简化所需的策略。我们还探讨了当该语言模型旨在预测句子难度时解释其决策的可能性。相关资源可从https://github.com/Nouran-Khallaf/why-tough获取。"
+  },
+  {
+    "title": "Automating Legal Concept Interpretation with LLMs: Retrieval,\n Generation, and Evaluation",
+    "url": "http://arxiv.org/abs/2501.01743v1",
+    "pub_date": "2025-01-03",
+    "summary": "Legal articles often include vague concepts to adapt to an ever-changing society. Providing detailed interpretations of these concepts is a critical task for legal practitioners; it requires meticulous and professional annotations by legal experts, which are admittedly time-consuming and expensive to collect at scale. In this paper, we introduce a novel retrieval-augmented generation framework, ATRI, for AuTomatically Retrieving relevant information from past judicial precedents and Interpreting vague legal concepts. We further propose a new benchmark, Legal Concept Entailment, to automate the evaluation of generated concept interpretations without expert involvement. Automatic evaluations indicate that our generated interpretations can effectively assist large language models (LLMs) in understanding vague legal concepts. Multi-faceted evaluations by legal experts indicate that the quality of our concept interpretations is comparable to those written by human experts. Our work has strong implications for leveraging LLMs to support legal practitioners in interpreting vague legal concepts and beyond.",
+    "translated": "法律条文常常包含模糊概念以适应不断变化的社会。对这些概念进行详细解释是法律从业者的一项关键任务,这需要法律专家进行细致且专业的注释,但大规模收集这些注释无疑耗时且昂贵。本文提出了一种新颖的检索增强生成框架——ATRI,用于从过去的司法判例中自动检索相关信息并解释模糊的法律概念。我们进一步提出了一个新的基准——法律概念蕴含(Legal Concept Entailment),以自动化地评估生成的概念解释,而无需专家参与。自动评估结果表明,我们生成的概念解释能有效帮助大语言模型(LLMs)理解模糊的法律概念。法律专家的多方面评估表明,我们生成的概念解释质量与人类专家撰写的解释相当。我们的工作对利用LLMs支持法律从业者解释模糊法律概念及其他领域具有重要的启示意义。"
   }
 ]
\ No newline at end of file