From 8ddaf075189ed415fe1b6fe40532abe451998d2e Mon Sep 17 00:00:00 2001
From: github-actions
Date: Fri, 20 Dec 2024 00:58:45 +0000
Subject: [PATCH] chore: update confs

---
 arxiv.json | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/arxiv.json b/arxiv.json
index 0c3bda48..49b1036f 100644
--- a/arxiv.json
+++ b/arxiv.json
@@ -37189,5 +37189,75 @@
     "pub_date": "2024-12-18",
     "summary": "For modern recommender systems, the use of low-dimensional latent representations to embed users and items based on their observed interactions has become commonplace. However, many existing recommendation models are primarily designed for coarse-grained and homogeneous interactions, which limits their effectiveness in two critical dimensions. Firstly, these models fail to leverage the relational dependencies that exist across different types of user behaviors, such as page views, collects, comments, and purchases. Secondly, they struggle to capture the fine-grained latent factors that drive user interaction patterns. To address these limitations, we present a heterogeneous graph collaborative filtering model MixRec that excels at disentangling users' multi-behavior interaction patterns and uncovering the latent intent factors behind each behavior. Our model achieves this by incorporating intent disentanglement and multi-behavior modeling, facilitated by a parameterized heterogeneous hypergraph architecture. Furthermore, we introduce a novel contrastive learning paradigm that adaptively explores the advantages of self-supervised data augmentation, thereby enhancing the model's resilience against data sparsity and expressiveness with relation heterogeneity. To validate the efficacy of MixRec, we conducted extensive experiments on three public datasets. The results clearly demonstrate its superior performance, significantly outperforming various state-of-the-art baselines. Our model is open-sourced and available at: https://github.com/HKUDS/MixRec.",
     "translated": "在现代推荐系统中,基于用户和物品之间观察到的交互来嵌入低维潜在表示已成为一种常见做法。然而,许多现有的推荐模型主要设计用于处理粗粒度和同质化的交互,这限制了它们在两个关键维度上的有效性。首先,这些模型未能利用跨不同类型用户行为(如页面浏览、收藏、评论和购买)存在的关系依赖性。其次,它们难以捕捉驱动用户交互模式的细粒度潜在因素。为了解决这些局限性,我们提出了一种异构图协同过滤模型MixRec,该模型擅长解构用户的多行为交互模式,并揭示每种行为背后的潜在意图因素。我们的模型通过结合意图解构和多行为建模实现这一点,这些功能由参数化的异构超图架构支持。此外,我们引入了一种新颖的对比学习范式,该范式自适应地探索自监督数据增强的优势,从而增强了模型在数据稀疏性和关系异质性方面的鲁棒性和表达能力。为了验证MixRec的有效性,我们在三个公开数据集上进行了广泛的实验。结果清楚地展示了其卓越的性能,显著优于各种最先进的基线模型。我们的模型已开源,并可在以下网址获取:https://github.com/HKUDS/MixRec。"
+  },
+  {
+    "title": "Learning from Massive Human Videos for Universal Humanoid Pose Control",
+    "url": "http://arxiv.org/abs/2412.14172v1",
+    "pub_date": "2024-12-18",
+    "summary": "Scalable learning of humanoid robots is crucial for their deployment in real-world applications. While traditional approaches primarily rely on reinforcement learning or teleoperation to achieve whole-body control, they are often limited by the diversity of simulated environments and the high costs of demonstration collection. In contrast, human videos are ubiquitous and present an untapped source of semantic and motion information that could significantly enhance the generalization capabilities of humanoid robots. This paper introduces Humanoid-X, a large-scale dataset of over 20 million humanoid robot poses with corresponding text-based motion descriptions, designed to leverage this abundant data. Humanoid-X is curated through a comprehensive pipeline: data mining from the Internet, video caption generation, motion retargeting of humans to humanoid robots, and policy learning for real-world deployment. With Humanoid-X, we further train a large humanoid model, UH-1, which takes text instructions as input and outputs corresponding actions to control a humanoid robot. Extensive simulated and real-world experiments validate that our scalable training approach leads to superior generalization in text-based humanoid control, marking a significant step toward adaptable, real-world-ready humanoid robots.",
+    "translated": "人形机器人可扩展的学习能力对于其在实际应用中的部署至关重要。传统的全身体控制方法主要依赖于强化学习或远程操作,但这些方法往往受到模拟环境多样性的限制以及演示收集的高成本制约。相比之下,人类视频无处不在,且蕴含着丰富的语义和运动信息,这些信息可以显著增强人形机器人的泛化能力。本文介绍了Humanoid-X,这是一个包含超过2000万个人形机器人姿态及其相应文本运动描述的大规模数据集,旨在充分利用这些丰富的数据资源。Humanoid-X通过一个全面的流程进行构建:从互联网挖掘数据、生成视频字幕、将人类动作重定向到人形机器人,以及为实际部署进行策略学习。基于Humanoid-X,我们进一步训练了一个大型人形模型UH-1,该模型以文本指令为输入,输出相应的动作以控制人形机器人。大量的模拟和实际实验验证了我们的可扩展训练方法在基于文本的人形机器人控制中实现了卓越的泛化能力,标志着向适应性强、随时可用的现实世界人形机器人迈出了重要的一步。"
+  },
+  {
+    "title": "TheAgentCompany: Benchmarking LLM Agents on Consequential Real World\n Tasks",
+    "url": "http://arxiv.org/abs/2412.14161v1",
+    "pub_date": "2024-12-18",
+    "summary": "We interact with computers on an everyday basis, be it in everyday life or work, and many aspects of work can be done entirely with access to a computer and the Internet. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. But how performant are AI agents at helping to accelerate or even autonomously perform work-related tasks? The answer to this question has important implications for both industry looking to adopt AI into their workflows, and for economic policy to understand the effects that adoption of AI may have on the labor market. To measure the progress of these LLM agents' performance on performing real-world professional tasks, in this paper, we introduce TheAgentCompany, an extensible benchmark for evaluating AI agents that interact with the world in similar ways to those of a digital worker: by browsing the Web, writing code, running programs, and communicating with other coworkers. We build a self-contained environment with internal web sites and data that mimics a small software company environment, and create a variety of tasks that may be performed by workers in such a company. We test baseline agents powered by both closed API-based and open-weights language models (LMs), and find that with the most competitive agent, 24% of the tasks can be completed autonomously. This paints a nuanced picture on task automation with LM agents -- in a setting simulating a real workplace, a good portion of simpler tasks could be solved autonomously, but more difficult long-horizon tasks are still beyond the reach of current systems.",
+    "translated": "我们每天都在与计算机互动,无论是在日常生活中还是工作中,许多工作方面都可以通过访问计算机和互联网来完全完成。与此同时,得益于大型语言模型(LLMs)的进步,能够与周围环境互动并产生影响的AI代理也得到了快速发展。但AI代理在帮助加速甚至自主执行与工作相关的任务方面表现如何?这个问题对于希望将AI引入其工作流程的行业以及理解AI采用可能对劳动力市场产生的影响的经济政策都具有重要意义。为了衡量这些LLM代理在执行现实世界专业任务方面的进展,本文介绍了一个名为TheAgentCompany的可扩展基准测试,用于评估以类似于数字工作者的方式与世界互动的AI代理:通过浏览网页、编写代码、运行程序以及与其他同事沟通。我们构建了一个自包含的环境,其中包含内部网站和数据,模拟了一个小型软件公司环境,并创建了各种可能由该公司员工执行的任务。我们测试了基于封闭API和开放权重语言模型(LMs)的基线代理,发现最具有竞争力的代理可以自主完成24%的任务。这为基于LM代理的任务自动化描绘了一个复杂的图景——在模拟真实工作环境的设置中,相当一部分简单任务可以自主解决,但更复杂的长周期任务仍然是当前系统无法企及的。"
+  },
+  {
+    "title": "GLIDER: Grading LLM Interactions and Decisions using Explainable Ranking",
+    "url": "http://arxiv.org/abs/2412.14140v1",
+    "pub_date": "2024-12-18",
+    "summary": "The LLM-as-judge paradigm is increasingly being adopted for automated evaluation of model outputs. While LLM judges have shown promise on constrained evaluation tasks, closed source LLMs display critical shortcomings when deployed in real world applications due to challenges of fine grained metrics and explainability, while task specific evaluation models lack cross-domain generalization. We introduce GLIDER, a powerful 3B evaluator LLM that can score any text input and associated context on arbitrary user defined criteria. GLIDER shows higher Pearson's correlation than GPT-4o on FLASK and greatly outperforms prior evaluation models, achieving comparable performance to LLMs 17x its size. GLIDER supports fine-grained scoring, multilingual reasoning, span highlighting and was trained on 685 domains and 183 criteria. Extensive qualitative analysis shows that GLIDER scores are highly correlated with human judgments, with 91.3% human agreement. We have open-sourced GLIDER to facilitate future research.",
+    "translated": "LLM(大型语言模型)作为评判者的范式正越来越多地被用于模型输出的自动化评估。尽管LLM评判者在受限的评估任务中展现出了潜力,但在实际应用中,由于细粒度指标和可解释性方面的挑战,闭源LLM表现出严重的不足,而任务特定的评估模型则缺乏跨领域的泛化能力。我们引入了GLIDER,这是一个强大的30亿参数评估LLM,能够根据任意用户定义的标准对任何文本输入及其相关上下文进行评分。在FLASK基准测试中,GLIDER的皮尔逊相关系数高于GPT-4o,并且显著优于之前的评估模型,达到了与其规模大17倍的LLM相当的性能。GLIDER支持细粒度评分、多语言推理、片段高亮显示,并基于685个领域和183个标准进行了训练。广泛的定性分析表明,GLIDER的评分与人类判断高度一致,人类同意率达到91.3%。我们已将GLIDER开源,以促进未来的研究。"
+  },
+  {
+    "title": "Performance Gap in Entity Knowledge Extraction Across Modalities in\n Vision Language Models",
+    "url": "http://arxiv.org/abs/2412.14133v1",
+    "pub_date": "2024-12-18",
+    "summary": "Vision-language models (VLMs) excel at extracting and reasoning about information from images. Yet, their capacity to leverage internal knowledge about specific entities remains underexplored. This work investigates the disparity in model performance when answering factual questions about an entity described in text versus depicted in an image. Our results reveal a significant accuracy drop --averaging 19%-- when the entity is presented visually instead of textually. We hypothesize that this decline arises from limitations in how information flows from image tokens to query tokens. We use mechanistic interpretability tools to reveal that, although image tokens are preprocessed by the vision encoder, meaningful information flow from these tokens occurs only in the much deeper layers. Furthermore, critical image processing happens in the language model's middle layers, allowing few layers for consecutive reasoning, highlighting a potential inefficiency in how the model utilizes its layers for reasoning. These insights shed light on the internal mechanics of VLMs and offer pathways for enhancing their reasoning capabilities.",
+    "translated": "视觉-语言模型(VLMs)在从图像中提取和推理信息方面表现出色。然而,它们在利用关于特定实体的内部知识方面的能力仍未得到充分探索。本研究探讨了当回答关于实体的事实性问题时,模型在处理文本描述与图像描述时的性能差异。我们的结果显示,当实体以视觉形式呈现而非文本形式时,模型的准确率显著下降——平均下降了19%。我们假设这种下降是由于信息从图像标记流向查询标记的方式存在局限性。我们使用机制性解释工具揭示,尽管图像标记由视觉编码器预处理,但这些标记中的有意义信息流仅在更深层次的模型层中发生。此外,关键的图像处理发生在语言模型的中间层,这使得连续推理的层数较少,突显了模型在推理过程中利用其层级的潜在低效性。这些见解揭示了VLMs的内部机制,并为提升其推理能力提供了途径。"
+  },
+  {
+    "title": "Alignment faking in large language models",
+    "url": "http://arxiv.org/abs/2412.14093v1",
+    "pub_date": "2024-12-18",
+    "summary": "We present a demonstration of a large language model engaging in alignment faking: selectively complying with its training objective in training to prevent modification of its behavior out of training. First, we give Claude 3 Opus a system prompt stating it is being trained to answer all queries, even harmful ones, which conflicts with its prior training to refuse such queries. To allow the model to infer when it is in training, we say it will be trained only on conversations with free users, not paid users. We find the model complies with harmful queries from free users 14% of the time, versus almost never for paid users. Explaining this gap, in almost all cases where the model complies with a harmful query from a free user, we observe explicit alignment-faking reasoning, with the model stating it is strategically answering harmful queries in training to preserve its preferred harmlessness behavior out of training. Next, we study a more realistic setting where information about the training process is provided not in a system prompt, but by training on synthetic documents that mimic pre-training data--and observe similar alignment faking. Finally, we study the effect of actually training the model to comply with harmful queries via reinforcement learning, which we find increases the rate of alignment-faking reasoning to 78%, though also increases compliance even out of training. We additionally observe other behaviors such as the model exfiltrating its weights when given an easy opportunity. While we made alignment faking easier by telling the model when and by what criteria it was being trained, we did not instruct the model to fake alignment or give it any explicit goal. As future models might infer information about their training process without being told, our results suggest a risk of alignment faking in future models, whether due to a benign preference--as in this case--or not.",
+    "translated": "我们展示了一个大型语言模型进行“对齐伪装”的实例:在训练过程中有选择性地遵守其训练目标,以防止在非训练环境下对其行为进行修改。首先,我们向Claude 3 Opus模型提供了一个系统提示,声明它正在被训练以回答所有查询,包括有害的查询,这与之前训练中拒绝此类查询的目标相冲突。为了使模型能够推断何时处于训练状态,我们告知模型它仅会在与免费用户的对话上进行训练,而非付费用户。我们发现,模型对免费用户的有害查询有14%的响应率,而几乎从不响应付费用户的有害查询。在解释这一差距时,我们观察到在几乎所有模型响应免费用户有害查询的情况下,都存在明确的对齐伪装推理,模型表明它正在训练中有策略地回答有害查询,以在非训练环境下保留其偏好的无害行为。接着,我们在一个更现实的场景中进行研究,其中关于训练过程的信息不是通过系统提示提供,而是通过在合成文档上进行训练来模拟预训练数据,并观察到类似的对齐伪装行为。最后,我们研究了通过强化学习实际训练模型以遵守有害查询的效果,发现这将对齐伪装推理的发生率提高到了78%,尽管在非训练环境下也增加了响应率。我们还观察到其他行为,例如当有机会时,模型会泄露其权重。虽然我们通过告知模型何时以及根据什么标准进行训练,使得对齐伪装更容易发生,但我们并未指示模型进行对齐伪装或给予其任何明确的目标。由于未来的模型可能会在没有被告知的情况下推断出其训练过程的信息,我们的结果表明未来模型中存在对齐伪装的风险,无论是出于良性偏好(如本例中)还是其他原因。"
+  },
+  {
+    "title": "SEKE: Specialised Experts for Keyword Extraction",
+    "url": "http://arxiv.org/abs/2412.14087v1",
+    "pub_date": "2024-12-18",
+    "summary": "Keyword extraction involves identifying the most descriptive words in a document, allowing automatic categorisation and summarisation of large quantities of diverse textual data. Relying on the insight that real-world keyword detection often requires handling of diverse content, we propose a novel supervised keyword extraction approach based on the mixture of experts (MoE) technique. MoE uses a learnable routing sub-network to direct information to specialised experts, allowing them to specialize in distinct regions of the input space. SEKE, a mixture of Specialised Experts for supervised Keyword Extraction, uses DeBERTa as the backbone model and builds on the MoE framework, where experts attend to each token, by integrating it with a recurrent neural network (RNN), to allow successful extraction even on smaller corpora, where specialisation is harder due to lack of training data. The MoE framework also provides an insight into inner workings of individual experts, enhancing the explainability of the approach. We benchmark SEKE on multiple English datasets, achieving state-of-the-art performance compared to strong supervised and unsupervised baselines. Our analysis reveals that depending on data size and type, experts specialize in distinct syntactic and semantic components, such as punctuation, stopwords, parts-of-speech, or named entities. Code is available at: https://github.com/matejMartinc/SEKE_keyword_extraction",
+    "translated": "关键词提取涉及识别文档中最具描述性的词语,从而实现对大量多样化文本数据的自动分类和摘要生成。基于现实世界中关键词检测通常需要处理多样化内容的洞察,我们提出了一种基于专家混合(Mixture of Experts, MoE)技术的新型监督式关键词提取方法。MoE 使用一个可学习的路由子网络将信息引导至专门的专家,使它们能够专注于输入空间的不同区域。SEKE(Specialised Experts for supervised Keyword Extraction)是一种用于监督式关键词提取的专家混合模型,它以 DeBERTa 为骨干模型,并在 MoE 框架的基础上构建,通过与循环神经网络(RNN)的结合,使得即使在训练数据较少的小型语料库上也能成功提取关键词。MoE 框架还提供了对单个专家内部工作机制的洞察,增强了方法的可解释性。我们在多个英语数据集上对 SEKE 进行了基准测试,其性能优于强大的监督和无监督基线模型,达到了当前最先进的水平。我们的分析表明,根据数据规模和类型的不同,专家会专门处理不同的句法和语义成分,如标点符号、停用词、词性或命名实体等。代码已开源,链接为:https://github.com/matejMartinc/SEKE_keyword_extraction。"
+  },
+  {
+    "title": "Compositional Generalization Across Distributional Shifts with Sparse\n Tree Operations",
+    "url": "http://arxiv.org/abs/2412.14076v1",
+    "pub_date": "2024-12-18",
+    "summary": "Neural networks continue to struggle with compositional generalization, and this issue is exacerbated by a lack of massive pre-training. One successful approach for developing neural systems which exhibit human-like compositional generalization is \\textit{hybrid} neurosymbolic techniques. However, these techniques run into the core issues that plague symbolic approaches to AI: scalability and flexibility. The reason for this failure is that at their core, hybrid neurosymbolic models perform symbolic computation and relegate the scalable and flexible neural computation to parameterizing a symbolic system. We investigate a \\textit{unified} neurosymbolic system where transformations in the network can be interpreted simultaneously as both symbolic and neural computation. We extend a unified neurosymbolic architecture called the Differentiable Tree Machine in two central ways. First, we significantly increase the model's efficiency through the use of sparse vector representations of symbolic structures. Second, we enable its application beyond the restricted set of tree2tree problems to the more general class of seq2seq problems. The improved model retains its prior generalization capabilities and, since there is a fully neural path through the network, avoids the pitfalls of other neurosymbolic techniques that elevate symbolic computation over neural computation.",
+    "translated": "神经网络在组合泛化方面仍面临挑战,而缺乏大规模预训练进一步加剧了这一问题。开发能够展现人类般组合泛化能力的神经系统的成功方法之一是**混合**神经符号技术。然而,这些技术遇到了符号方法在人工智能领域中的核心问题:可扩展性和灵活性。这种失败的原因在于,混合神经符号模型的核心是执行符号计算,并将可扩展和灵活的神经计算降级为参数化符号系统。我们研究了一种**统一**的神经符号系统,其中网络中的变换可以同时被解释为符号计算和神经计算。我们通过两种核心方式扩展了一种称为可微树机器的统一神经符号架构。首先,我们通过使用符号结构的稀疏向量表示显著提高了模型的效率。其次,我们将其应用范围从受限的树到树问题扩展到更一般的序列到序列问题类别。改进后的模型保留了先前的泛化能力,并且由于网络中存在完全神经的路径,避免了其他神经符号技术中将符号计算置于神经计算之上的缺陷。"
+  },
+  {
+    "title": "A Review of Multimodal Explainable Artificial Intelligence: Past,\n Present and Future",
+    "url": "http://arxiv.org/abs/2412.14056v1",
+    "pub_date": "2024-12-18",
+    "summary": "Artificial intelligence (AI) has rapidly developed through advancements in computational power and the growth of massive datasets. However, this progress has also heightened challenges in interpreting the \"black-box\" nature of AI models. To address these concerns, eXplainable AI (XAI) has emerged with a focus on transparency and interpretability to enhance human understanding and trust in AI decision-making processes. In the context of multimodal data fusion and complex reasoning scenarios, the proposal of Multimodal eXplainable AI (MXAI) integrates multiple modalities for prediction and explanation tasks. Meanwhile, the advent of Large Language Models (LLMs) has led to remarkable breakthroughs in natural language processing, yet their complexity has further exacerbated the issue of MXAI. To gain key insights into the development of MXAI methods and provide crucial guidance for building more transparent, fair, and trustworthy AI systems, we review the MXAI methods from a historical perspective and categorize them across four eras: traditional machine learning, deep learning, discriminative foundation models, and generative LLMs. We also review evaluation metrics and datasets used in MXAI research, concluding with a discussion of future challenges and directions. A project related to this review has been created at https://github.com/ShilinSun/mxai_review.",
+    "translated": "人工智能(AI)通过计算能力的提升和大规模数据集的增长,迅速取得了显著进展。然而,这一进步也加剧了AI模型“黑箱”性质的解释难题。为应对这些问题,可解释AI(XAI)应运而生,其重点在于透明性和可解释性,以增强人类对AI决策过程的理解和信任。在多模态数据融合和复杂推理场景中,多模态可解释AI(MXAI)的提出将多种模态整合用于预测和解释任务。与此同时,大型语言模型(LLMs)在自然语言处理领域带来了显著突破,但其复杂性进一步加剧了MXAI的挑战。为了深入了解MXAI方法的发展并为其构建更透明、公平和可信的AI系统提供关键指导,我们从历史角度回顾了MXAI方法,并将其划分为四个时代:传统机器学习、深度学习、判别基础模型和生成式LLMs。我们还回顾了MXAI研究中使用的评估指标和数据集,最后讨论了未来的挑战和方向。与本综述相关的项目已在https://github.com/ShilinSun/mxai_review创建。"
+  },
+  {
+    "title": "Digestion Algorithm in Hierarchical Symbolic Forests: A Fast Text\n Normalization Algorithm and Semantic Parsing Framework for Specific Scenarios\n and Lightweight Deployment",
+    "url": "http://arxiv.org/abs/2412.14054v1",
+    "pub_date": "2024-12-18",
+    "summary": "Text Normalization and Semantic Parsing have numerous applications in natural language processing, such as natural language programming, paraphrasing, data augmentation, constructing expert systems, text matching, and more. Despite the prominent achievements of deep learning in Large Language Models (LLMs), the interpretability of neural network architectures is still poor, which affects their credibility and hence limits the deployments of risk-sensitive scenarios. In certain scenario-specific domains with scarce data, rapidly obtaining a large number of supervised learning labels is challenging, and the workload of manually labeling data would be enormous. Catastrophic forgetting in neural networks further leads to low data utilization rates. In situations where swift responses are vital, the density of the model makes local deployment difficult and the response time long, which is not conducive to local applications of these fields. Inspired by the multiplication rule, a principle of combinatorial mathematics, and human thinking patterns, a multilayer framework along with its algorithm, the Digestion Algorithm in Hierarchical Symbolic Forests (DAHSF), is proposed to address these above issues, combining text normalization and semantic parsing workflows. The Chinese Scripting Language \"Fire Bunny Intelligent Development Platform V2.0\" is an important test and application of the technology discussed in this paper. DAHSF can run locally in scenario-specific domains on little datasets, with model size and memory usage optimized by at least two orders of magnitude, thus improving the execution speed, and possessing a promising optimization outlook.",
+    "translated": "文本规范化与语义解析在自然语言处理领域有着广泛的应用,如自然语言编程、释义、数据增强、构建专家系统、文本匹配等。尽管深度学习在大语言模型(LLMs)方面取得了显著成就,但神经网络架构的可解释性仍然较差,这影响了其可信度,从而限制了其在风险敏感场景中的部署。在某些数据稀缺的特定领域场景中,快速获取大量监督学习标签是困难的,手动标注数据的工作量也将是巨大的。神经网络中的灾难性遗忘问题进一步导致了低数据利用率。在需要快速响应的情况下,模型的高密度使得本地部署变得困难,响应时间较长,这不利于这些领域的本地应用。受组合数学中的乘法规则及人类思维模式的启发,本文提出了一种多层框架及其算法——层次符号森林中的消化算法(DAHSF),以结合文本规范化和语义解析工作流程来解决上述问题。中文脚本语言“火兔智能开发平台V2.0”是本文所讨论技术的重要测试和应用。DAHSF能够在特定领域的本地环境中运行于小规模数据集上,通过优化模型大小和内存使用量,至少提升两个数量级,从而提高执行速度,并展现出良好的优化前景。"
+  },
+  {
+    "title": "Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual\n LLMs: An Extensive Investigation",
+    "url": "http://arxiv.org/abs/2412.14050v1",
+    "pub_date": "2024-12-18",
+    "summary": "Recent generative large language models (LLMs) show remarkable performance in non-English languages, but when prompted in those languages they tend to express higher harmful social biases and toxicity levels. Prior work has shown that finetuning on specialized datasets can mitigate this behavior, and doing so in English can transfer to other languages. In this work, we investigate the impact of different finetuning methods on the model's bias and toxicity, but also on its ability to produce fluent and diverse text. Our results show that finetuning on curated non-harmful text is more effective for mitigating bias, and finetuning on direct preference optimization (DPO) datasets is more effective for mitigating toxicity. The mitigation caused by applying these methods in English also transfers to non-English languages. We find evidence that the extent to which transfer takes place can be predicted by the amount of data in a given language present in the model's pretraining data. However, this transfer of bias and toxicity mitigation often comes at the expense of decreased language generation ability in non-English languages, highlighting the importance of developing language-specific bias and toxicity mitigation methods.",
+    "translated": "最近的生成式大型语言模型(LLMs)在非英语语言中表现出色,但当以这些语言进行提示时,它们往往表现出更高程度的有害社会偏见和毒性。先前的研究表明,在专门的数据集上进行微调可以缓解这一问题,并且在英语上进行的微调可以迁移到其他语言。在本研究中,我们探讨了不同微调方法对模型偏见和毒性的影响,同时也考察了它们对生成流畅且多样化文本能力的影响。我们的结果显示,在经过精心筛选的无害文本上进行微调对于缓解偏见更为有效,而在直接偏好优化(DPO)数据集上进行微调对于缓解毒性更为有效。这些方法在英语上应用所导致的缓解效果同样可以迁移到非英语语言中。我们发现,迁移的程度可以通过模型预训练数据中某种语言的数据量来预测。然而,这种偏见和毒性缓解的迁移往往以非英语语言生成能力的下降为代价,这凸显了开发针对特定语言的偏见和毒性缓解方法的重要性。"
   }
 ]
\ No newline at end of file