chore: update confs

Vic-GoodLuck · Dec 13, 2024 · 4903a2d
1 parent 502af48
commit 4903a2d
Showing 1 changed file with 35 additions and 0 deletions.
diff --git a/arxiv.json b/arxiv.json
@@ -36783,5 +36783,40 @@
         "pub_date": "2024-12-11",
         "summary": "The objective of multimodal intent recognition (MIR) is to leverage various modalities-such as text, video, and audio-to detect user intentions, which is crucial for understanding human language and context in dialogue systems. Despite advances in this field, two main challenges persist: (1) effectively extracting and utilizing semantic information from robust textual features; (2) aligning and fusing non-verbal modalities with verbal ones effectively. This paper proposes a Text Enhancement with CommOnsense Knowledge Extractor (TECO) to address these challenges. We begin by extracting relations from both generated and retrieved knowledge to enrich the contextual information in the text modality. Subsequently, we align and integrate visual and acoustic representations with these enhanced text features to form a cohesive multimodal representation. Our experimental results show substantial improvements over existing baseline methods.",
         "translated": "多模态意图识别（MIR）的目标是利用文本、视频和音频等多种模态来检测用户意图，这对于理解对话系统中的人类语言和上下文至关重要。尽管该领域取得了进展，但仍存在两个主要挑战：（1）有效提取和利用来自鲁棒文本特征的语义信息；（2）有效地将非语言模态与语言模态对齐和融合。本文提出了一种结合常识知识提取器（TECO）的文本增强方法来应对这些挑战。我们首先从生成和检索的知识中提取关系，以丰富文本模态中的上下文信息。接着，我们将视觉和听觉表示与这些增强后的文本特征对齐并整合，形成一个连贯的多模态表示。我们的实验结果显示，与现有的基线方法相比，取得了显著的改进。"
+    },
+    {
+        "title": "Foundational Large Language Models for Materials Research",
+        "url": "http://arxiv.org/abs/2412.09560v1",
+        "pub_date": "2024-12-12",
+        "summary": "Materials discovery and development are critical for addressing global challenges. Yet, the exponential growth in materials science literature comprising vast amounts of textual data has created significant bottlenecks in knowledge extraction, synthesis, and scientific reasoning. Large Language Models (LLMs) offer unprecedented opportunities to accelerate materials research through automated analysis and prediction. Still, their effective deployment requires domain-specific adaptation for understanding and solving domain-relevant tasks. Here, we present LLaMat, a family of foundational models for materials science developed through continued pretraining of LLaMA models on an extensive corpus of materials literature and crystallographic data. Through systematic evaluation, we demonstrate that LLaMat excels in materials-specific NLP and structured information extraction while maintaining general linguistic capabilities. The specialized LLaMat-CIF variant demonstrates unprecedented capabilities in crystal structure generation, predicting stable crystals with high coverage across the periodic table. Intriguingly, despite LLaMA-3's superior performance in comparison to LLaMA-2, we observe that LLaMat-2 demonstrates unexpectedly enhanced domain-specific performance across diverse materials science tasks, including structured information extraction from text and tables, more particularly in crystal structure generation, a potential adaptation rigidity in overtrained LLMs. Altogether, the present work demonstrates the effectiveness of domain adaptation towards developing practically deployable LLM copilots for materials research. Beyond materials science, our findings reveal important considerations for domain adaptation of LLMs, such as model selection, training methodology, and domain-specific performance, which may influence the development of specialized scientific AI systems.",
+        "translated": "材料发现与开发对于应对全球性挑战至关重要。然而，材料科学文献的指数级增长，其中包含大量文本数据，已导致知识提取、综合和科学推理方面出现显著瓶颈。大型语言模型（LLMs）通过自动化分析和预测，为加速材料研究提供了前所未有的机遇。尽管如此，要有效部署这些模型，仍需进行领域特定的适应，以理解和解决领域相关任务。在此，我们介绍了LLaMat，这是一系列针对材料科学的基础模型，通过在广泛的材料文献和晶体学数据语料库上对LLaMA模型进行持续预训练而开发。通过系统评估，我们展示了LLaMat在材料特定自然语言处理（NLP）和结构化信息提取方面的卓越表现，同时保持了通用语言能力。专门的LLaMat-CIF变体在晶体结构生成方面展现出前所未有的能力，能够预测周期表上高覆盖率的稳定晶体。有趣的是，尽管LLaMA-3在性能上优于LLaMA-2，我们观察到LLaMat-2在多种材料科学任务中表现出意外增强的领域特定性能，包括从文本和表格中提取结构化信息，特别是在晶体结构生成方面，这可能揭示了过度训练的LLMs中存在的适应刚性。总体而言，本研究展示了领域适应在开发可实际部署的材料研究LLM助手方面的有效性。除材料科学外，我们的发现还揭示了LLMs领域适应的重要考虑因素，如模型选择、训练方法和领域特定性能，这些因素可能影响专门科学AI系统的发展。"
+    },
+    {
+        "title": "SPRec: Leveraging Self-Play to Debias Preference Alignment for Large\n  Language Model-based Recommendations",
+        "url": "http://arxiv.org/abs/2412.09243v1",
+        "pub_date": "2024-12-12",
+        "summary": "Large language models (LLMs) have attracted significant attention in recommendation systems. Current LLM-based recommender systems primarily rely on supervised fine-tuning (SFT) to train the model for recommendation tasks. However, relying solely on positive samples limits the model's ability to align with user satisfaction and expectations. To address this, researchers have introduced Direct Preference Optimization (DPO), which explicitly aligns recommendations with user preferences using offline preference ranking data. Despite its advantages, our theoretical analysis reveals that DPO inherently biases the model towards a few items, exacerbating the filter bubble issue and ultimately degrading user experience. In this paper, we propose SPRec, a novel self-play recommendation framework designed to mitigate over-recommendation and improve fairness without requiring additional data or manual intervention. In each self-play iteration, the model undergoes an SFT step followed by a DPO step, treating offline interaction data as positive samples and the predicted outputs from the previous iteration as negative samples. This effectively re-weights the DPO loss function using the model's logits, adaptively suppressing biased items. Extensive experiments on multiple real-world datasets demonstrate SPRec's effectiveness in enhancing recommendation accuracy and addressing fairness concerns.",
+        "translated": "大型语言模型（LLMs）在推荐系统中引起了广泛关注。当前基于LLM的推荐系统主要依赖于监督微调（SFT）来训练模型以完成推荐任务。然而，仅仅依赖正样本限制了模型与用户满意度和期望对齐的能力。为了解决这一问题，研究者引入了直接偏好优化（DPO），该方法利用离线偏好排序数据显式地将推荐与用户偏好对齐。尽管DPO具有优势，但我们的理论分析表明，DPO本质上会使模型偏向于少数几个项目，加剧了过滤气泡问题，并最终降低了用户体验。在本文中，我们提出了SPRec，这是一种新颖的自博弈推荐框架，旨在减少过度推荐并提高公平性，且无需额外数据或人工干预。在每次自博弈迭代中，模型先进行SFT步骤，随后进行DPO步骤，将离线交互数据视为正样本，并将前一次迭代的预测输出视为负样本。这有效地利用模型的logits对DPO损失函数进行了重新加权，动态抑制了偏差项目。在多个真实世界数据集上的广泛实验表明，SPRec在提高推荐准确性和解决公平性问题方面具有显著效果。"
+    },
+    {
+        "title": "When Text Embedding Meets Large Language Model: A Comprehensive Survey",
+        "url": "http://arxiv.org/abs/2412.09165v1",
+        "pub_date": "2024-12-12",
+        "summary": "Text embedding has become a foundational technology in natural language processing (NLP) during the deep learning era, driving advancements across a wide array of downstream tasks. While many natural language understanding challenges can now be modeled using generative paradigms and leverage the robust generative and comprehension capabilities of large language models (LLMs), numerous practical applications, such as semantic matching, clustering, and information retrieval, continue to rely on text embeddings for their efficiency and effectiveness. In this survey, we categorize the interplay between LLMs and text embeddings into three overarching themes: (1) LLM-augmented text embedding, enhancing traditional embedding methods with LLMs; (2) LLMs as text embedders, utilizing their innate capabilities for embedding generation; and (3) Text embedding understanding with LLMs, leveraging LLMs to analyze and interpret embeddings. By organizing these efforts based on interaction patterns rather than specific downstream applications, we offer a novel and systematic overview of contributions from various research and application domains in the era of LLMs. Furthermore, we highlight the unresolved challenges that persisted in the pre-LLM era with pre-trained language models (PLMs) and explore the emerging obstacles brought forth by LLMs. Building on this analysis, we outline prospective directions for the evolution of text embedding, addressing both theoretical and practical opportunities in the rapidly advancing landscape of NLP.",
+        "translated": "在深度学习时代，文本嵌入已成为自然语言处理（NLP）领域的基础技术，推动了众多下游任务的发展。尽管许多自然语言理解挑战现在可以通过生成范式建模，并利用大型语言模型（LLMs）强大的生成和理解能力，但诸如语义匹配、聚类和信息检索等许多实际应用仍然依赖于文本嵌入，以实现其高效性和有效性。在本综述中，我们将LLMs与文本嵌入之间的相互作用分为三大主题：（1）LLM增强的文本嵌入，利用LLMs提升传统嵌入方法；（2）LLMs作为文本嵌入器，利用其固有能力生成嵌入；（3）基于LLMs的文本嵌入理解，利用LLMs分析和解释嵌入。通过根据交互模式而非特定下游应用来组织这些努力，我们提供了一个新颖且系统的概览，涵盖了LLMs时代各个研究与应用领域的贡献。此外，我们强调了在预训练语言模型（PLMs）时代未解决的持续挑战，并探讨了LLMs带来的新兴障碍。基于这一分析，我们概述了文本嵌入未来发展的潜在方向，涵盖了NLP快速演进领域中的理论与实践机遇。"
+    },
+    {
+        "title": "Predicting Quality of Video Gaming Experience Using Global-Scale\n  Telemetry Data and Federated Learning",
+        "url": "http://arxiv.org/abs/2412.08950v1",
+        "pub_date": "2024-12-12",
+        "summary": "Frames Per Second (FPS) significantly affects the gaming experience. Providing players with accurate FPS estimates prior to purchase benefits both players and game developers. However, we have a limited understanding of how to predict a game's FPS performance on a specific device. In this paper, we first conduct a comprehensive analysis of a wide range of factors that may affect game FPS on a global-scale dataset to identify the determinants of FPS. This includes player-side and game-side characteristics, as well as country-level socio-economic statistics. Furthermore, recognizing that accurate FPS predictions require extensive user data, which raises privacy concerns, we propose a federated learning-based model to ensure user privacy. Each player and game is assigned a unique learnable knowledge kernel that gradually extracts latent features for improved accuracy. We also introduce a novel training and prediction scheme that allows these kernels to be dynamically plug-and-play, effectively addressing cold start issues. To train this model with minimal bias, we collected a large telemetry dataset from 224 countries and regions, 100,000 users, and 835 games. Our model achieved a mean Wasserstein distance of 0.469 between predicted and ground truth FPS distributions, outperforming all baseline methods.",
+        "translated": "帧率（FPS）显著影响游戏体验。在购买前为玩家提供准确的FPS估算，不仅有利于玩家，也惠及游戏开发者。然而，我们对于如何预测特定设备上游戏的FPS表现知之甚少。本文中，我们首先对可能影响全球范围内游戏FPS的广泛因素进行了全面分析，以识别FPS的决定因素。这包括玩家端和游戏端特征，以及国家层面的社会经济统计数据。此外，鉴于准确预测FPS需要大量用户数据，这引发了隐私问题，我们提出了一种基于联邦学习模型的解决方案，以确保用户隐私。每个玩家和游戏都被赋予一个独特的可学习知识核，逐渐提取潜在特征以提高准确性。我们还引入了一种新颖的训练和预测方案，使得这些知识核能够动态即插即用，有效解决冷启动问题。为了以最小偏差训练该模型，我们收集了来自224个国家和地区、100,000名用户和835款游戏的庞大遥测数据集。我们的模型在预测与真实FPS分布之间的平均Wasserstein距离达到了0.469，优于所有基线方法。"
+    },
+    {
+        "title": "A Flexible Plug-and-Play Module for Generating Variable-Length",
+        "url": "http://arxiv.org/abs/2412.08922v1",
+        "pub_date": "2024-12-12",
+        "summary": "Deep supervised hashing has become a pivotal technique in large-scale image retrieval, offering significant benefits in terms of storage and search efficiency. However, existing deep supervised hashing models predominantly focus on generating fixed-length hash codes. This approach fails to address the inherent trade-off between efficiency and effectiveness when using hash codes of varying lengths. To determine the optimal hash code length for a specific task, multiple models must be trained for different lengths, leading to increased training time and computational overhead. Furthermore, the current paradigm overlooks the potential relationships between hash codes of different lengths, limiting the overall effectiveness of the models. To address these challenges, we propose the Nested Hash Layer (NHL), a plug-and-play module designed for existing deep supervised hashing models. The NHL framework introduces a novel mechanism to simultaneously generate hash codes of varying lengths in a nested manner. To tackle the optimization conflicts arising from the multiple learning objectives associated with different code lengths, we further propose an adaptive weights strategy that dynamically monitors and adjusts gradients during training. Additionally, recognizing that the structural information in longer hash codes can provide valuable guidance for shorter hash codes, we develop a long-short cascade self-distillation method within the NHL to enhance the overall quality of the generated hash codes. Extensive experiments demonstrate that NHL not only accelerates the training process but also achieves superior retrieval performance across various deep hashing models. Our code is publicly available at https://github.com/hly1998/NHL.",
+        "translated": "深度监督哈希已成为大规模图像检索中的关键技术，在存储和搜索效率方面提供了显著优势。然而，现有的深度监督哈希模型主要集中在生成固定长度的哈希码。这种方法未能解决在使用不同长度的哈希码时效率与效果之间的固有权衡问题。为了确定特定任务的最佳哈希码长度，必须为不同长度训练多个模型，这导致了训练时间和计算开销的增加。此外，当前的方法忽视了不同长度哈希码之间潜在的关联，限制了模型的整体效果。为应对这些挑战，我们提出了嵌套哈希层（Nested Hash Layer，NHL），这是一个为现有深度监督哈希模型设计的即插即用模块。NHL框架引入了一种新颖的机制，能够以嵌套方式同时生成不同长度的哈希码。为了解决与不同码长相关的多个学习目标之间的优化冲突，我们进一步提出了一种自适应权重策略，该策略在训练过程中动态监控和调整梯度。此外，考虑到较长哈希码中的结构信息可以为较短哈希码提供有价值的指导，我们在NHL中开发了一种长短码级联自蒸馏方法，以提高生成哈希码的整体质量。大量实验表明，NHL不仅加速了训练过程，而且在各种深度哈希模型中实现了卓越的检索性能。我们的代码已在 https://github.com/hly1998/NHL 公开发布。"
     }
 ]
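
Review note: the entries added above all share the same five keys (`title`, `url`, `pub_date`, `summary`, `translated`), and two of the titles carry a literal `\n  ` line wrap. Below is a minimal sketch of a sanity check that could be run over such entries before committing. It assumes `arxiv.json` is a single top-level JSON array of objects, which the closing `]` suggests but this diff does not confirm. The names `REQUIRED_KEYS`, `ARXIV_URL`, `normalize_title`, `check_entry`, and `check_file` are hypothetical and not part of the repository.

```python
import json
import re
from datetime import date

# Keys observed on every entry in this diff (assumed to be the full schema).
REQUIRED_KEYS = ("title", "url", "pub_date", "summary", "translated")

# Matches URLs of the form seen above, e.g. http://arxiv.org/abs/2412.09560v1
ARXIV_URL = re.compile(r"^https?://arxiv\.org/abs/\d{4}\.\d{4,5}(v\d+)?$")


def normalize_title(title):
    """Collapse wrapped whitespace (such as "\n  ") into single spaces."""
    return " ".join(title.split())


def check_entry(entry):
    """Return a list of problems found in one entry (empty if none)."""
    problems = ["missing key: " + k for k in REQUIRED_KEYS if k not in entry]
    if problems:
        return problems
    if not ARXIV_URL.match(entry["url"]):
        problems.append("unexpected url: " + entry["url"])
    try:
        date.fromisoformat(entry["pub_date"])  # expects YYYY-MM-DD
    except ValueError:
        problems.append("bad pub_date: " + entry["pub_date"])
    return problems


def check_file(path):
    """Load a JSON array of entries and map entry index -> problems."""
    with open(path, encoding="utf-8") as fh:
        entries = json.load(fh)
    report = {}
    for index, entry in enumerate(entries):
        problems = check_entry(entry)
        if problems:
            report[index] = problems
    return report
```

Calling `check_file("arxiv.json")` would return an empty dict when every entry passes. `normalize_title` is kept separate so that titles can be cleaned at display time without rewriting the stored data.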