chore: update confs

Doragd · Jan 9, 2025 · 48aaa90
1 parent e908fc0
Showing 1 changed file with 35 additions and 0 deletions.
diff --git a/arxiv.json b/arxiv.json
@@ -38274,5 +38274,40 @@
         "pub_date": "2025-01-07",
         "summary": "Slot and intent detection (SID) is a classic natural language understanding task. Despite this, research has only more recently begun focusing on SID for dialectal and colloquial varieties. Many approaches for low-resource scenarios have not yet been applied to dialectal SID data, or compared to each other on the same datasets. We participate in the VarDial 2025 shared task on slot and intent detection in Norwegian varieties, and compare multiple set-ups: varying the training data (English, Norwegian, or dialectal Norwegian), injecting character-level noise, training on auxiliary tasks, and applying Layer Swapping, a technique in which layers of models fine-tuned on different datasets are assembled into a model. We find noise injection to be beneficial while the effects of auxiliary tasks are mixed. Though some experimentation was required to successfully assemble a model from layers, it worked surprisingly well; a combination of models trained on English and small amounts of dialectal data produced the most robust slot predictions. Our best models achieve 97.6% intent accuracy and 85.6% slot F1 in the shared task.",
         "translated": "槽位和意图检测（Slot and Intent Detection, SID）是一项经典的自然语言理解任务。尽管如此，研究直到最近才开始关注方言和口语变体中的SID。许多针对低资源场景的方法尚未应用于方言SID数据，或尚未在同一数据集上进行相互比较。我们参与了VarDial 2025关于挪威语变体中槽位和意图检测的共享任务，并比较了多种设置：改变训练数据（英语、挪威语或挪威方言）、注入字符级噪声、训练辅助任务，以及应用层交换技术（Layer Swapping），该技术将微调于不同数据集的模型层组合成一个模型。我们发现噪声注入是有益的，而辅助任务的效果则参差不齐。尽管需要一些实验才能成功从层中组装模型，但其效果出奇地好；结合使用英语和少量方言数据训练的模型产生了最稳健的槽位预测。在共享任务中，我们最好的模型实现了97.6%的意图准确率和85.6%的槽位F1得分。"
+    },
+    {
+        "title": "Re-ranking the Context for Multimodal Retrieval Augmented Generation",
+        "url": "http://arxiv.org/abs/2501.04695v1",
+        "pub_date": "2025-01-08",
+        "summary": "Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge to generate a response within a context with improved accuracy and reduced hallucinations. However, multi-modal RAG systems face unique challenges: (i) the retrieval process may select irrelevant entries to user query (e.g., images, documents), and (ii) vision-language models or multi-modal language models like GPT-4o may hallucinate when processing these entries to generate RAG output. In this paper, we aim to address the first challenge, i.e, improving the selection of relevant context from the knowledge-base in retrieval phase of the multi-modal RAG. Specifically, we leverage the relevancy score (RS) measure designed in our previous work for evaluating the RAG performance to select more relevant entries in retrieval process. The retrieval based on embeddings, say CLIP-based embedding, and cosine similarity usually perform poorly particularly for multi-modal data. We show that by using a more advanced relevancy measure, one can enhance the retrieval process by selecting more relevant pieces from the knowledge-base and eliminate the irrelevant pieces from the context by adaptively selecting up-to-$k$ entries instead of fixed number of entries. Our evaluation using COCO dataset demonstrates significant enhancement in selecting relevant context and accuracy of the generated response.",
+        "translated": "检索增强生成（RAG）通过引入外部知识来增强大型语言模型（LLMs），从而在特定上下文中生成更准确且减少幻觉的响应。然而，多模态RAG系统面临独特的挑战：（i）检索过程可能会选择与用户查询无关的条目（如图像、文档），（ii）视觉语言模型或如GPT-4o的多模态语言模型在处理这些条目以生成RAG输出时可能会产生幻觉。本文旨在解决第一个挑战，即在多模态RAG的检索阶段改进从知识库中选择相关上下文的能力。具体而言，我们利用我们之前工作中设计的相关性评分（RS）指标来评估RAG性能，从而在检索过程中选择更相关的条目。基于嵌入（如CLIP嵌入）和余弦相似度的检索通常表现不佳，尤其是对于多模态数据。我们展示了通过使用更高级的相关性度量，可以在检索过程中从知识库中选择更多相关的内容，并通过自适应选择最多$k$个条目（而非固定数量的条目）来消除上下文中的无关内容。我们在COCO数据集上的评估表明，在相关上下文的选择和生成响应的准确性方面取得了显著的提升。"
+    },
+    {
+        "title": "Multi-task retriever fine-tuning for domain-specific and efficient RAG",
+        "url": "http://arxiv.org/abs/2501.04652v1",
+        "pub_date": "2025-01-08",
+        "summary": "Retrieval-Augmented Generation (RAG) has become ubiquitous when deploying Large Language Models (LLMs), as it can address typical limitations such as generating hallucinated or outdated information. However, when building real-world RAG applications, practical issues arise. First, the retrieved information is generally domain-specific. Since it is computationally expensive to fine-tune LLMs, it is more feasible to fine-tune the retriever to improve the quality of the data included in the LLM input. Second, as more applications are deployed in the same real-world system, one cannot afford to deploy separate retrievers. Moreover, these RAG applications normally retrieve different kinds of data. Our solution is to instruction fine-tune a small retriever encoder on a variety of domain-specific tasks to allow us to deploy one encoder that can serve many use cases, thereby achieving low-cost, scalability, and speed. We show how this encoder generalizes to out-of-domain settings as well as to an unseen retrieval task on real-world enterprise use cases.",
+        "translated": "检索增强生成（Retrieval-Augmented Generation, RAG）在部署大型语言模型（Large Language Models, LLMs）时已变得无处不在，因为它能够解决生成虚假或过时信息等典型限制。然而，在构建实际应用的RAG系统时，一些实际问题会浮现。首先，检索到的信息通常是领域特定的。由于对LLMs进行微调的计算成本较高，更可行的做法是对检索器进行微调，以提高输入到LLM的数据质量。其次，随着越来越多的应用部署在同一个实际系统中，无法负担为每个应用部署单独的检索器。此外，这些RAG应用通常检索不同类型的数据。我们的解决方案是，通过在各种领域特定任务上对一个小型检索器编码器进行指令微调，从而使得一个编码器能够服务于多种用例，进而实现低成本、可扩展性和速度。我们展示了该编码器如何在领域外设置以及在实际企业用例中未见过的检索任务上实现泛化。"
+    },
+    {
+        "title": "Knowledge Retrieval Based on Generative AI",
+        "url": "http://arxiv.org/abs/2501.04635v1",
+        "pub_date": "2025-01-08",
+        "summary": "This study develops a question-answering system based on Retrieval-Augmented Generation (RAG) using Chinese Wikipedia and Lawbank as retrieval sources. Using TTQA and TMMLU+ as evaluation datasets, the system employs BGE-M3 for dense vector retrieval to obtain highly relevant search results and BGE-reranker to reorder these results based on query relevance. The most pertinent retrieval outcomes serve as reference knowledge for a Large Language Model (LLM), enhancing its ability to answer questions and establishing a knowledge retrieval system grounded in generative AI.   The system's effectiveness is assessed through a two-stage evaluation: automatic and assisted performance evaluations. The automatic evaluation calculates accuracy by comparing the model's auto-generated labels with ground truth answers, measuring performance under standardized conditions without human intervention. The assisted performance evaluation involves 20 finance-related multiple-choice questions answered by 20 participants without financial backgrounds. Initially, participants answer independently. Later, they receive system-generated reference information to assist in answering, examining whether the system improves accuracy when assistance is provided.   The main contributions of this research are: (1) Enhanced LLM Capability: By integrating BGE-M3 and BGE-reranker, the system retrieves and reorders highly relevant results, reduces hallucinations, and dynamically accesses authorized or public knowledge sources. (2) Improved Data Privacy: A customized RAG architecture enables local operation of the LLM, eliminating the need to send private data to external servers. This approach enhances data security, reduces reliance on commercial services, lowers operational costs, and mitigates privacy risks.",
+        "translated": "本研究开发了一种基于检索增强生成（Retrieval-Augmented Generation, RAG）的问答系统，该系统以中文维基百科和法律数据库（Lawbank）作为检索源。使用TTQA和TMMLU+作为评估数据集，系统采用BGE-M3进行稠密向量检索以获取高度相关的搜索结果，并使用BGE-reranker根据查询相关性对结果进行重排序。最相关的检索结果作为大语言模型（Large Language Model, LLM）的参考知识，增强了其回答问题的能力，并建立了一个基于生成式人工智能的知识检索系统。\n\n系统的有效性通过两阶段评估进行验证：自动评估和辅助性能评估。自动评估通过比较模型自动生成的标签与标准答案，计算准确率，衡量在无人工干预的标准化条件下的性能表现。辅助性能评估则涉及20名无金融背景的参与者回答20道金融相关的选择题。参与者首先独立作答，随后在系统生成的参考信息的帮助下重新作答，以检验系统在提供辅助时是否提高了准确率。\n\n本研究的主要贡献包括：（1）增强大语言模型能力：通过整合BGE-M3和BGE-reranker，系统能够检索并重排序高度相关的结果，减少幻觉现象，并动态访问授权或公开的知识源。（2）提升数据隐私保护：通过定制化的RAG架构，系统实现了大语言模型的本地化运行，避免了将私有数据发送至外部服务器的需求。这种方法增强了数据安全性，减少了对商业服务的依赖，降低了运营成本，并缓解了隐私风险。"
+    },
+    {
+        "title": "Evaluating Interval-based Tokenization for Pitch Representation in\n  Symbolic Music Analysis",
+        "url": "http://arxiv.org/abs/2501.04630v1",
+        "pub_date": "2025-01-08",
+        "summary": "Symbolic music analysis tasks are often performed by models originally developed for Natural Language Processing, such as Transformers. Such models require the input data to be represented as sequences, which is achieved through a process of tokenization. Tokenization strategies for symbolic music often rely on absolute MIDI values to represent pitch information. However, music research largely promotes the benefit of higher-level representations such as melodic contour and harmonic relations for which pitch intervals turn out to be more expressive than absolute pitches. In this work, we introduce a general framework for building interval-based tokenizations. By evaluating these tokenizations on three music analysis tasks, we show that such interval-based tokenizations improve model performances and facilitate their explainability.",
+        "translated": "符号音乐分析任务通常由最初为自然语言处理（NLP）开发的模型（如Transformer）执行。这些模型要求输入数据以序列形式表示，这一过程通过**分词**（tokenization）实现。符号音乐的分词策略通常依赖于绝对MIDI值来表示音高信息。然而，音乐研究在很大程度上强调了更高层次表示的优势，例如**旋律轮廓**和**和声关系**，其中**音程**（pitch intervals）比绝对音高更具表现力。在本研究中，我们引入了一个通用的框架，用于构建基于音程的分词方法。通过在三个音乐分析任务上评估这些分词方法，我们证明了基于音程的分词不仅提升了模型性能，还增强了模型的可解释性。"
+    },
+    {
+        "title": "A Closer Look on Gender Stereotypes in Movie Recommender Systems and\n  Their Implications with Privacy",
+        "url": "http://arxiv.org/abs/2501.04420v1",
+        "pub_date": "2025-01-08",
+        "summary": "The movie recommender system typically leverages user feedback to provide personalized recommendations that align with user preferences and increase business revenue. This study investigates the impact of gender stereotypes on such systems through a specific attack scenario. In this scenario, an attacker determines users' gender, a private attribute, by exploiting gender stereotypes about movie preferences and analyzing users' feedback data, which is either publicly available or observed within the system. The study consists of two phases. In the first phase, a user study involving 630 participants identified gender stereotypes associated with movie genres, which often influence viewing choices. In the second phase, four inference algorithms were applied to detect gender stereotypes by combining the findings from the first phase with users' feedback data. Results showed that these algorithms performed more effectively than relying solely on feedback data for gender inference. Additionally, we quantified the extent of gender stereotypes to evaluate their broader impact on digital computational science. The latter part of the study utilized two major movie recommender datasets: MovieLens 1M and Yahoo!Movie. Detailed experimental information is available on our GitHub repository: https://github.com/fr-iit/GSMRS",
+        "translated": "电影推荐系统通常利用用户反馈来提供与用户偏好相符的个性化推荐，以增加商业收入。本研究通过一个特定的攻击场景，探讨了性别刻板印象对此类系统的影响。在该场景中，攻击者通过利用关于电影偏好的性别刻板印象并分析用户的反馈数据（这些数据要么是公开的，要么是在系统内观察到的），来确定用户的性别（一种私人属性）。研究分为两个阶段。在第一阶段，一项涉及630名参与者的用户研究确定了与电影类型相关的性别刻板印象，这些刻板印象往往会影响观影选择。在第二阶段，应用了四种推断算法，通过将第一阶段的发现与用户的反馈数据相结合来检测性别刻板印象。结果显示，这些算法在性别推断方面比仅依赖反馈数据更为有效。此外，我们还量化了性别刻板印象的程度，以评估其对数字计算科学的更广泛影响。研究的后半部分使用了两个主要的电影推荐数据集：MovieLens 1M和Yahoo!Movie。详细的实验信息可在我们的GitHub仓库中找到：https://github.com/fr-iit/GSMRS"
     }
 ]
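This commit appends five entries to `arxiv.json`. Each one uses the keys `title`, `url`, `pub_date`, `summary`, and `translated`. The sketch below is for illustration only: a minimal way to sanity-check entries of this shape before committing. It assumes the file is a top-level JSON array, which the closing `]` above suggests. `check_entries` and `check_file` are hypothetical helpers and are not part of this repository.

```python
import json
import re
from datetime import date

# Keys observed on every entry in this diff; an assumption, not a documented schema.
REQUIRED_KEYS = {"title", "url", "pub_date", "summary", "translated"}

# arXiv abstract URLs in the form used above, e.g. http://arxiv.org/abs/2501.04695v1
ARXIV_URL = re.compile(r"^https?://arxiv\.org/abs/(\d{4}\.\d{4,5})v\d+$")


def check_entries(entries):
    """Return a list of problems found in a list of entry dicts (empty if none).

    Hypothetical helper: checks that each entry has the expected keys,
    an arXiv abstract URL, an ISO-8601 pub_date, and an arXiv ID that
    has not appeared earlier in the list (version suffix ignored).
    """
    problems = []
    seen_ids = set()
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
            continue  # skip further checks on incomplete entries
        match = ARXIV_URL.match(entry["url"])
        if match is None:
            problems.append(f"entry {i}: unexpected url {entry['url']!r}")
        else:
            arxiv_id = match.group(1)
            if arxiv_id in seen_ids:
                problems.append(f"entry {i}: duplicate arXiv id {arxiv_id}")
            seen_ids.add(arxiv_id)
        try:
            date.fromisoformat(entry["pub_date"])
        except (TypeError, ValueError):
            problems.append(f"entry {i}: bad pub_date {entry['pub_date']!r}")
    return problems


def check_file(path="arxiv.json"):
    """Load a top-level JSON array of entries from `path` and check it."""
    with open(path, encoding="utf-8") as fh:
        return check_entries(json.load(fh))
```

To check the whole file, call `check_file("arxiv.json")`. An empty list means every entry passed these checks. The duplicate-ID check ignores the `vN` suffix, so a later revision of a paper that is already listed gets flagged instead of being re-added silently. This is a design choice for the sketch; nothing here shows that the repository enforces it.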