WSDM2024 Paper List

Paper Authors Organization Abstract Translation Code Citations
RecJPQ: Training Large-Catalogue Sequential Recommenders Aleksandr V. Petrov, Craig Macdonald Sequential recommender systems rank items based on the likelihood of their next appearance in user-item interactions. Current models such as BERT4Rec and SASRec generate sequence embeddings and compute scores for catalogue items, but the increasing catalogue size makes training these models costly. The Joint Product Quantisation method, originally proposed for passage retrieval, markedly reduces the size of the retrieval index with minimal effect on model effectiveness by replacing passage embeddings with a limited number of shared centroid embeddings. This paper introduces RecJPQ, a novel adaptation of JPQ for sequential recommendations. We apply RecJPQ to SASRec, BERT4Rec, and GRU4rec models on three large-scale sequential datasets. Our results showed that RecJPQ could notably reduce the model size (e.g., 48x reduction for the Gowalla dataset with no effectiveness degradation). RecJPQ can also improve model performance through a regularisation effect (e.g. +0.96% NDCG@10 improvement on the Booking.com dataset). 连续推荐系统根据项目在用户-项目交互中下一次出现的可能性对项目进行排序。当前的模型如 BERT4Rec 和 SASRec 生成序列嵌入并计算目录项的分数,但目录大小的增加使得这些模型的训练成本较高。联合乘积量化方法最初用于文本检索,通过用有限数量的共享质心嵌入代替文本嵌入,在对模型有效性影响最小的情况下显著减少了检索索引的大小。本文介绍了 RecJPQ,它是 JPQ 对顺序推荐的一个新的改进。我们将 RecJPQ 应用于三个大规模连续数据集上的 SASRec、 BERT4Rec 和 GRU4rec 模型。我们的研究结果表明,RecJPQ 可以显著减少模型的大小(例如,Gowalla 数据集减少了48倍,而且没有效率降低)。RecJPQ 还可以通过正则化效应提高模型性能(例如,Booking.com 数据集上的 NDCG 提高了0.96%)。 code 2
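The idea in the RecJPQ abstract, replacing per-item embeddings with a small number of shared centroid embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random code assignment, and the toy sizes are all assumptions made for illustration, and the paper's actual code-assignment strategy is not described in the abstract.

```python
import random


def build_centroids(num_splits, num_centroids, sub_dim, rng):
    """Create one small table of centroid sub-embeddings per split."""
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(sub_dim)] for _ in range(num_centroids)]
        for _ in range(num_splits)
    ]


def assign_codes(num_items, num_splits, num_centroids, rng):
    """Give every item one centroid index per split (random here, an assumption)."""
    return [
        [rng.randrange(num_centroids) for _ in range(num_splits)]
        for _ in range(num_items)
    ]


def item_embedding(item_id, codes, centroids):
    """Rebuild an item embedding by concatenating its shared centroids."""
    vector = []
    for split, centroid_id in enumerate(codes[item_id]):
        vector.extend(centroids[split][centroid_id])
    return vector


def float_parameter_counts(num_items, dim, num_splits, num_centroids):
    """Compare trainable floats: full embedding table vs. shared centroids."""
    full = num_items * dim
    quantised = num_splits * num_centroids * (dim // num_splits)
    return full, quantised


rng = random.Random(0)
centroids = build_centroids(num_splits=4, num_centroids=16, sub_dim=8, rng=rng)
codes = assign_codes(num_items=1000, num_splits=4, num_centroids=16, rng=rng)
print(len(item_embedding(42, codes, centroids)))   # 32
print(float_parameter_counts(1000, 32, 4, 16))     # (32000, 512)
```

In this toy setting the trainable floats shrink from 32,000 to 512; the integer codes still have to be stored per item, but they are small and not trained.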
Understanding User Behavior in Carousel Recommendation Systems for Click Modeling and Learning to Rank Santiago de LeonMartinez Carousels (also known as multilists) have become the standard user interface for e-commerce platforms, replacing the ranked list, the previous standard for recommender systems. While the research community has begun to focus on carousels, there are many unanswered questions and undeveloped areas when compared to the literature for ranked lists, which includes information retrieval research on the presentation of web search results. This work is an extended abstract for the RecSys 2023 Doctoral Symposium outlining a PhD project, with the main contribution of addressing the undeveloped areas in carousel recommenders: 1) the formulation of new click models and 2) learning to rank with click data. We present two significant barriers for this contribution and the field: lack of public datasets and lack of eye tracking user studies of browsing behavior. Clicks, the standard feedback collected by recommender systems, are insufficient to understand the whole interaction process of a user with a recommender, requiring system designers to make assumptions, especially on browsing behavior. Eye tracking provides a means to elucidate the process and test these assumptions. Thus, to address these barriers and encourage future work, we will conduct an eye tracking user study within a carousel movie recommendation setting and make the dataset publicly available. Moreover, the insights learned on browsing behavior will help motivate the formulation of new click models and learning to rank.
旋转木马(也称为多列表)已经成为电子商务平台的标准用户界面,取代了以前推荐系统的标准排名列表。虽然研究团体已经开始关注旋转木马,但与排名列表的文献相比,还有许多未解答的问题和未开发的领域,其中包括对网络搜索结果呈现的信息检索研究。本文是 RecSys 2023年博士研讨会的扩展摘要,概述了一个博士项目,主要贡献在于解决传送带推荐中的不发达领域: 1)制定新的点击模型,2)学习根据点击数据进行排名。我们提出了两个重要的障碍,这个贡献和领域: 缺乏公共数据集和缺乏眼球跟踪用户研究的浏览行为。点击,推荐系统收集的标准反馈,不足以理解用户与推荐系统的整个交互过程,需要系统设计者做出假设,尤其是在浏览行为方面。眼球追踪提供了一种方法来阐明这一过程,并检验这些假设。因此,为了解决这些障碍并鼓励未来的工作,我们将在旋转木马电影推荐设置中进行眼动跟踪用户研究,并使数据集公开可用。此外,学到的浏览行为的洞察力将有助于激励制定新的点击模型和学习排名。 code 1
Vector Search with OpenAI Embeddings: Lucene Is All You Need Jasper Xian, Tommaso Teofili, Ronak Pradeep, Jimmy Lin We provide a reproducible, end-to-end demonstration of vector search with OpenAI embeddings using Lucene on the popular MS MARCO passage ranking test collection. The main goal of our work is to challenge the prevailing narrative that a dedicated vector store is necessary to take advantage of recent advances in deep neural networks as applied to search. Quite the contrary, we show that hierarchical navigable small-world network (HNSW) indexes in Lucene are adequate to provide vector search capabilities in a standard bi-encoder architecture. This suggests that, from a simple cost-benefit analysis, there does not appear to be a compelling reason to introduce a dedicated vector store into a modern "AI stack" for search, since such applications have already received substantial investments in existing, widely deployed infrastructure. 我们提供了一个可重复的,端到端的向量搜索演示与 OpenAI 嵌入使用 Lucene 在流行的 MS MARCO 通道排序测试集合。我们工作的主要目标是挑战流行的说法,即专用矢量存储是必要的,以利用深层神经网络的最新进展,适用于搜索。恰恰相反,我们证明 Lucene 的分层导航小世界网络(HNSW)索引足以在标准的双编码器架构中提供向量搜索功能。这表明,从一个简单的成本-收益分析来看,似乎没有令人信服的理由将一个专门的矢量存储引入现代“人工智能堆栈”中进行搜索,因为此类应用已经在现有的、广泛部署的基础设施中获得了大量投资。 code 1
Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions Zahra Abbasiantaeb, Yifei Yuan, Evangelos Kanoulas, Mohammad Aliannejadi Conversational question-answering (CQA) systems aim to create interactive search systems that effectively retrieve information by interacting with users. To replicate human-to-human conversations, existing work uses human annotators to play the roles of the questioner (student) and the answerer (teacher). Despite its effectiveness, challenges exist as human annotation is time-consuming, inconsistent, and not scalable. To address this issue and investigate the applicability of large language models (LLMs) in CQA simulation, we propose a simulation framework that employs zero-shot learner LLMs for simulating teacher-student interactions. Our framework involves two LLMs interacting on a specific topic, with the first LLM acting as a student, generating questions to explore a given search topic. The second LLM plays the role of a teacher by answering questions and is equipped with additional information, including a text on the given topic. We implement both the student and teacher by zero-shot prompting the GPT-4 model. To assess the effectiveness of LLMs in simulating CQA interactions and understand the disparities between LLM- and human-generated conversations, we evaluate the simulated data from various perspectives. We begin by evaluating the teacher's performance through both automatic and human assessment. Next, we evaluate the performance of the student, analyzing and comparing the disparities between questions generated by the LLM and those generated by humans. Furthermore, we conduct extensive analyses to thoroughly examine the LLM performance by benchmarking state-of-the-art reading comprehension models on both datasets. Our results reveal that the teacher LLM generates lengthier answers that tend to be more accurate and complete. 
The student LLM generates more diverse questions, covering more aspects of a given topic. 会话问答(CQA)系统旨在创建交互式搜索系统,通过与用户交互有效地检索信息。为了复制人与人之间的对话,现有的工作使用人工标注者来扮演提问者(学生)和回答者(老师)的角色。尽管它很有效,但由于人工注释耗时、不一致且不可伸缩,因此存在挑战。为了解决这一问题,并研究大语言模型(LLM)在 CQA 模拟中的适用性,我们提出了一个模拟框架,使用零样本学习的 LLM 来模拟师生交互。我们的框架涉及两个 LLM 在特定主题上的交互,第一个 LLM 作为学生,生成问题来探索给定的搜索主题。第二个 LLM 通过回答问题来扮演教师的角色,并配备了额外的信息,包括关于给定主题的课文。我们通过零样本提示 GPT-4模型来实现学生和老师。为了评估 LLM 在模拟 CQA 交互中的有效性,并了解 LLM 和人类生成的会话之间的差异,我们从不同的角度评估了模拟数据。我们首先通过自动评估和人工评估来评估教师的表现。接下来,我们评估学生的表现,分析和比较由 LLM 产生的问题和由人类产生的问题之间的差异。此外,我们进行了广泛的分析,通过对两个数据集的最先进的阅读理解模型进行基准测试,彻底检查 LLM 的性能。我们的研究结果表明,教师 LLM 生成更长的答案,往往是更准确和完整的。学生 LLM 生成更多不同的问题,涵盖给定主题的更多方面。 code 1
MADM: A Model-agnostic Denoising Module for Graph-based Social Recommendation Wenze Ma, Yuexian Wang, Yanmin Zhu, Zhaobo Wang, Mengyuan Jing, Xuhao Zhao, Jiadi Yu, Feilong Tang Shanghai Jiao Tong Univ, Shanghai, Peoples R China Graph-based social recommendation improves the prediction accuracy of recommendation by leveraging high-order neighboring information contained in social relations. However, most of them ignore the problem that social relations can be noisy for recommendation. Several studies attempt to tackle this problem by performing social graph denoising, but they suffer from 1) adaptability issues for other graph-based social recommendation models and 2) insufficiency issues for user social representation learning. To address the limitations, we propose a model-agnostic graph denoising module (denoted as MADM) which works as a plug-and-play module to provide refined social structure for base models. Meanwhile, to propel user social representations to be minimal and sufficient for recommendation, MADM further employs mutual information maximization (MIM) between user social representations and the interaction graph and realizes two ways of MIM: contrastive learning and forward predictive learning. We provide theoretical insights and guarantees from the perspectives of Information Theory and Multi-view Learning to explain its rationality. Extensive experiments on three real-world datasets demonstrate the effectiveness of MADM. The codes are available here. 基于图的社会推荐通过利用社会关系中包含的高阶相邻信息来提高推荐的预测精度。然而,他们中的大多数忽略了社会关系可能会因为推荐而变得嘈杂的问题。一些研究试图通过实施社会图去噪来解决这个问题,但是他们面临着: 1)其他基于图的社会推荐模型的适应性问题和2)用户社会表征学习的不足问题。针对这些局限性,本文提出了一种模型无关图去噪模块(MADM) ,它作为一个即插即用模块为基本模型提供精确的社会结构。同时,为了推动用户社交表征的最小化和推荐的充分性,MADM 进一步在用户社交表征和交互图之间引入了互信息最大化(MIM) ,实现了 MIM 的两种方式: 对比学习和前向预测学习。我们从信息论和多视角学习的角度提供理论的见解和保证,以解释其合理性。在三个实际数据集上的大量实验证明了 MADM 的有效性。密码在这里。 code 1
A Multi-Granularity-Aware Aspect Learning Model for Multi-Aspect Dense Retrieval Xiaojie Sun, Keping Bi, Jiafeng Guo, Sihui Yang, Qishen Zhang, Zhongyi Liu, Guannan Zhang, Xueqi Cheng Dense retrieval methods have been mostly focused on unstructured text and less attention has been drawn to structured data with various aspects, e.g., products with aspects such as category and brand. Recent work has proposed two approaches to incorporate the aspect information into item representations for effective retrieval by predicting the values associated with the item aspects. Despite their efficacy, they treat the values as isolated classes (e.g., "Smart Homes", "Home, Garden Tools", and "Beauty Health") and ignore their fine-grained semantic relation. Furthermore, they either enforce the learning of aspects into the CLS token, which could confuse it from its designated use for representing the entire content semantics, or learn extra aspect embeddings only with the value prediction objective, which could be insufficient especially when there are no annotated values for an item aspect. Aware of these limitations, we propose a MUlti-granulaRity-aware Aspect Learning model (MURAL) for multi-aspect dense retrieval. It leverages aspect information across various granularities to capture both coarse and fine-grained semantic relations between values. Moreover, MURAL incorporates separate aspect embeddings as input to transformer encoders so that the masked language model objective can assist implicit aspect learning even without aspect-value annotations. Extensive experiments on two real-world datasets of products and mini-programs show that MURAL outperforms state-of-the-art baselines significantly. 
密集型检索方法主要集中在非结构化文本上,而对具有各种方面的结构化数据的关注较少,例如,具有类别和品牌等方面的产品。最近的工作提出了两种方法,通过预测与项目方面相关的值,将方面信息合并到项目表示中,以实现有效的检索。尽管它们有效,但它们把这些价值观当作孤立的类(例如,“智能家居”、“家庭、园艺工具”和“美容健康”) ,而忽视了它们细粒度的语义关系。此外,它们要么将方面的学习强制到 CLS 标记中,这可能会混淆它表示整个内容语义的指定用途,要么只学习价值预测目标的额外方面嵌入,这可能是不够的,尤其是当项目方面没有注释值时。考虑到这些局限性,我们提出了一种多粒度感知方面学习模型(MURAL)用于多方面密集检索。它利用跨不同粒度的方面信息来捕获值之间的粗粒度和细粒度语义关系。此外,MURAL 将独立的方面嵌入作为输入到变压器编码器中,这样即使没有方面值注释,蒙版语言模型目标也可以帮助隐式方面学习。在两个产品和小程序的真实世界数据集上的大量实验表明,MURAL 的性能明显优于最先进的基线。 code 1
LLMRec: Large Language Models with Graph Augmentation for Recommendation Wei Wei, Xubin Ren, Jiabin Tang, Qinyong Wang, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang The problem of data sparsity has long been a challenge in recommendation systems, and previous studies have attempted to address this issue by incorporating side information. However, this approach often introduces side effects such as noise, availability issues, and low data quality, which in turn hinder the accurate modeling of user preferences and adversely impact recommendation performance. In light of the recent advancements in large language models (LLMs), which possess extensive knowledge bases and strong reasoning capabilities, we propose a novel framework called LLMRec that enhances recommender systems by employing three simple yet effective LLM-based graph augmentation strategies. Our approach leverages the rich content available within online platforms (e.g., Netflix, MovieLens) to augment the interaction graph in three ways: (i) reinforcing user-item interaction edges, (ii) enhancing the understanding of item node attributes, and (iii) conducting user node profiling, intuitively from the natural language perspective. By employing these strategies, we address the challenges posed by sparse implicit feedback and low-quality side information in recommenders. Besides, to ensure the quality of the augmentation, we develop a denoised data robustification mechanism that includes techniques of noisy implicit feedback pruning and MAE-based feature enhancement that help refine the augmented data and improve its reliability. Furthermore, we provide theoretical analysis to support the effectiveness of LLMRec and clarify the benefits of our method in facilitating model optimization. Experimental results on benchmark datasets demonstrate the superiority of our LLM-based augmentation approach over state-of-the-art techniques.
To ensure reproducibility, we have made our code and augmented data publicly available at: https://github.com/HKUDS/LLMRec.git 数据稀少的问题长期以来一直是推荐系统中的一个挑战,以前的研究试图通过纳入辅助信息来解决这一问题。然而,这种方法通常会引入诸如噪音、可用性问题和低数据质量等副作用,这反过来又会阻碍对用户偏好的精确建模,并对推荐性能产生不利影响。鉴于大语言模型(LLM)具有广泛的知识库和强大的推理能力,我们提出了一种新的框架 LLMRec,通过采用三种简单而有效的基于 LLM 的图增强策略来增强推荐系统。我们的方法利用在线平台(例如 Netflix,MovieLens)中可用的丰富内容以三种方式增强交互图: (i)加强用户-项目交互层面,(ii)增强对项目节点属性的理解,以及(iii)从自然语言的角度直观地进行用户节点剖析。通过使用这些策略,我们解决了推荐者中稀疏的隐式反馈和低质量的副信息所带来的挑战。此外,为了保证增强数据的质量,本文提出了一种去噪数据鲁棒化机制,该机制包括有噪隐式反馈修剪技术和基于 MAE 的特征增强技术,这些技术有助于细化增强数据,提高增强数据的可靠性。此外,我们还提供了理论分析来支持 LLMRec 的有效性,并阐明了我们的方法在促进模型优化方面的好处。在基准数据集上的实验结果表明了我们基于 LLM 的增强方法相对于最新技术的优越性。为确保可重复性,我们已将我们的代码和增强数据公开于以下 https://github.com/hkuds/llmrec.git code 1
Likelihood-Based Methods Improve Parameter Estimation in Opinion Dynamics Models Jacopo Lenti, Corrado Monti, Gianmarco De Francisci Morales We show that a maximum likelihood approach for parameter estimation in agent-based models (ABMs) of opinion dynamics outperforms the typical simulation-based approach. Simulation-based approaches simulate the model repeatedly in search of a set of parameters that generates data similar enough to the observed one. In contrast, likelihood-based approaches derive a likelihood function that connects the unknown parameters to the observed data in a statistically principled way. We compare these two approaches on the well-known bounded-confidence model of opinion dynamics. We do so on three realistic scenarios of increasing complexity depending on data availability: (i) fully observed opinions and interactions, (ii) partially observed interactions, (iii) observed interactions with noisy proxies of the opinions. We highlight how identifying observed and latent variables is fundamental for connecting the model to the data. To realize the likelihood-based approach, we first cast the model into a probabilistic generative guise that supports a proper data likelihood. Then, we describe the three scenarios via probabilistic graphical models and show the nuances that go into translating the model. Finally, we implement the resulting probabilistic models in an automatic differentiation framework (PyTorch). This step enables easy and efficient maximum likelihood estimation via gradient descent. Our experimental results show that the maximum likelihood estimates are up to 4x more accurate and require up to 200x less computational time. 
结果表明,基于主体的观点动力学模型参数估计的最大似然方法优于典型的基于仿真的方法。基于仿真的方法反复模拟模型,以寻找一组参数,生成与观测数据足够相似的数据。相比之下,基于似然的方法推导出一个似然函数,用统计学原理将未知参数与观测数据联系起来。我们比较了这两种方法在众所周知的有界置信模型的意见动态。我们这样做是基于三个现实的场景: (i)完全观察到的观点和相互作用,(ii)部分观察到的相互作用,(iii)观察到的与观点的嘈杂代理的相互作用。我们强调识别观察变量和潜在变量是如何将模型与数据联系起来的基础。为了实现基于似然的方法,我们首先将模型转换成一个支持适当数据似然的概率生成伪装。然后,我们通过概率图形模型来描述这三个场景,并展示模型转换过程中的细微差别。最后,我们在一个自动微分框架(PyTorch)中实现最终的概率模型。这个步骤可以简单有效地利用梯度下降法进行最大似然估计。我们的实验结果表明,最大似然估计是高达4倍以上的准确性,需要高达200倍以下的计算时间。 code 1
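The likelihood-based estimation described in the abstract above can be sketched on a toy version of the bounded-confidence model. Everything here is an assumption for illustration rather than the authors' formulation: the hard threshold is smoothed into a logistic interaction probability with steepness `rho`, opinions are drawn uniformly, the gradient is derived by hand instead of using PyTorch's automatic differentiation, and only the confidence bound `epsilon` is estimated.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_interactions(num_pairs, epsilon, rho, rng):
    """Sample (opinion distance, interacted?) pairs from the smoothed model."""
    data = []
    for _ in range(num_pairs):
        distance = abs(rng.random() - rng.random())
        p_interact = sigmoid(rho * (epsilon - distance))
        data.append((distance, 1 if rng.random() < p_interact else 0))
    return data


def fit_epsilon(data, rho, init=0.5, lr=0.05, steps=300):
    """Maximise the Bernoulli log-likelihood over epsilon by gradient ascent."""
    epsilon = init
    for _ in range(steps):
        # d/d(epsilon) of the mean log-likelihood is rho * mean(s - p).
        grad = sum(rho * (s - sigmoid(rho * (epsilon - d))) for d, s in data) / len(data)
        epsilon += lr * grad
    return epsilon


rng = random.Random(0)
observations = simulate_interactions(num_pairs=2000, epsilon=0.3, rho=10.0, rng=rng)
estimate = fit_epsilon(observations, rho=10.0)
print(round(estimate, 2))  # should land close to the true value 0.3
```

Because the log-likelihood is concave in `epsilon` under this smoothing, plain gradient ascent is enough; no repeated simulation of the model is needed to score a candidate parameter.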
K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization Cheng Deng, Tianhang Zhang, Zhongmou He, Qiyuan Chen, Yuanyuan Shi, Yi Xu, Luoyi Fu, Weinan Zhang, Xinbing Wang, Chenghu Zhou, Zhouhan Lin, Junxian He Large language models (LLMs) have achieved great success in general domains of natural language processing. In this paper, we bring LLMs to the realm of geoscience with the objective of advancing research and applications in this field. To this end, we present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience. For instance, we have curated the first geoscience instruction tuning dataset, GeoSignal, which aims to align LLM responses to geoscience-related user queries. Additionally, we have established the first geoscience benchmark, GeoBench, to evaluate LLMs in the context of geoscience. In this work, we experiment with a complete recipe to adapt a pre-trained general-domain LLM to the geoscience domain. Specifically, we further train the LLaMA-7B model on 5.5B tokens of geoscience text corpus, including over 1 million pieces of geoscience literature, and utilize GeoSignal's supervised data to fine-tune the model. Moreover, we share a protocol that can efficiently gather domain-specific data and construct domain-supervised data, even in situations where manpower is scarce. Meanwhile, we equip K2 with the abilities of using tools to be a naive geoscience aide. Experiments conducted on the GeoBench demonstrate the effectiveness of our approach and datasets on geoscience knowledge understanding and utilization. We open-source all the training data and K2 model checkpoints at https://github.com/davendw49/k2.
大语言模型(LLM)在自然语言处理的一般领域取得了巨大的成功。本文将 LLM 引入地球科学领域,旨在促进该领域的研究和应用。为此,我们提出了地球科学有史以来第一个 LLM,K2,以及一套资源的开发,以进一步促进地球科学中的 LLM 研究。例如,我们策划了第一个地球科学指令调整数据集 GeoSignal,其目的是使 LLM 响应与地球科学相关的用户查询保持一致。此外,我们还建立了第一个地球科学基准,GeoBench,用于评估地球科学背景下的 LLM。在这项工作中,我们试验了一个完整的配方,以适应预先训练的一般领域 LLM 的地球科学领域。具体而言,我们进一步训练 LLaMA-7B 模型在地学文本语料库的5.5 B 标记上,包括超过100万篇地学文献,并利用 GeoSignal 的监督数据来微调模型。此外,我们共享一个协议,可以有效地收集特定领域的数据和构造领域监督的数据,即使在人力资源紧缺的情况下。同时,我们使 K2具备使用工具的能力,成为一个天真的地球科学助手。在 GeoBench 上进行的实验证明了我们的方法和数据集对地球科学知识的理解和利用的有效性。我们开源了所有的训练数据和 https://github.com/davendw49/k2的 k2模型检查点。 code 1
Collaboration and Transition: Distilling Item Transitions into Multi-Query Self-Attention for Sequential Recommendation Tianyu Zhu, Yansong Shi, Yuan Zhang, Yihong Wu, Fengran Mo, JianYun Nie Modern recommender systems employ various sequential modules such as self-attention to learn dynamic user interests. However, these methods are less effective in capturing collaborative and transitional signals within user interaction sequences. First, the self-attention architecture uses the embedding of a single item as the attention query, which is inherently challenging to capture collaborative signals. Second, these methods typically follow an auto-regressive framework, which is unable to learn global item transition patterns. To overcome these limitations, we propose a new method called Multi-Query Self-Attention with Transition-Aware Embedding Distillation (MQSA-TED). First, we propose an $L$-query self-attention module that employs flexible window sizes for attention queries to capture collaborative signals. In addition, we introduce a multi-query self-attention method that balances the bias-variance trade-off in modeling user preferences by combining long and short-query self-attentions. Second, we develop a transition-aware embedding distillation module that distills global item-to-item transition patterns into item embeddings, which enables the model to memorize and leverage transitional signals and serves as a calibrator for collaborative signals. Experimental results on four real-world datasets show the superiority of our proposed method over state-of-the-art sequential recommendation methods. 
现代推荐系统采用自我关注等多种顺序模块来学习动态用户兴趣。然而,这些方法在捕获用户交互序列中的协作和过渡信号方面效率较低。首先,自我注意体系结构使用嵌入单个条目作为注意查询,这对于捕获协作信号具有内在的挑战性。其次,这些方法通常遵循一个自动回归框架,该框架不能学习全局项目转换模式。为了克服这些局限性,本文提出了一种新的基于过渡意识的多查询自注意(MQSA-TED)方法。首先,我们提出了一个 $L $- query 自我注意模块,该模块使用灵活的窗口大小来进行注意查询以捕获协作信号。此外,本文还提出了一种结合长查询和短查询自注意的多查询自注意方法,平衡了偏差-方差权衡。其次,我们开发了一个具有过渡意识的嵌入蒸馏模块,该模块将全局项目到项目的过渡模式提取为项目嵌入,使模型能够记忆和利用过渡信号,并作为协作信号的校准器。在四个实际数据集上的实验结果表明,本文提出的方法优于目前最先进的顺序推荐方法。 code 0
Contextual MAB Oriented Embedding Denoising for Sequential Recommendation Zhichao Feng, Pengfei Wang, Kaiyuan Li, Chenliang Li, Shangguang Wang Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China; Beijing Univ Posts & Telecommun, Sch Comp Sci, Beijing, Peoples R China Deep neural networks now have become the de-facto standard for sequential recommendation. In the existing techniques, an embedding vector is assigned for each item, encoding all the characteristics of the latter in latent space. Then, the recommendation is transferred to devising a similarity metric to recommend user's next behavior. Here, we consider each dimension of an embedding vector as a (latent) feature. Though effective, it is unknown which feature carries what semantics toward the item. Actually, in reality, this merit is highly preferable since a specific group of features could induce a particular relation among the items while the others are in vain. Unfortunately, the previous treatment overlooks the feature semantic learning at such a fine-grained level. When each item contains multiple latent aspects, which however is prevalent in real-world, the relations between items are very complex. The existing solutions are easy to fail on better recommendation performance. It is necessary to disentangle the item embeddings and extract credible features in a context-aware manner. To address this issue, in this work, we present a novel Contextual MAB based Embedding Denoising model (Comed for short) to adaptively identify relevant dimension-level features for a better recommendation. Specifically, Comed formulates the embedding denoising task as a Contextual Multi-armed Bandit problem. For each feature of the item embedding, we assign a two-armed neural bandit to determine whether the constituent semantics should be preserved, rendering the whole process as embedding denoising. 
By aggregating the denoised embeddings as contextual information, a reward function deduced by the similarity between the historical interaction sequence and the target item is further designed to approximate the maximum expected payoffs of bandits for efficient learning. Considering the possible inefficiency of training the serial operating mechanism, we also design a swift learning strategy to accelerate the co-guidance between the renovated sequential embedding and the parallel actions of neural bandits for a better recommendation. Comprehensive trials conducted on four widely recognized benchmarks substantiate the efficiency and efficacy of our framework. 深层神经网络现在已经成为事实上的顺序推荐标准。在现有的技术中,为每个项目指定一个嵌入向量,将后者的所有特征编码到潜在空间中。然后,将推荐转换为设计一个相似性度量来推荐用户的下一个行为。在这里,我们将嵌入向量的每个维视为一个(潜在的)特征。虽然有效,但是不知道哪个特性对项目承载了什么语义。事实上,在现实中,这种优点是非常可取的,因为一组特定的特征可以在项目之间产生一种特定的关系,而其他的都是徒劳的。不幸的是,以前的处理忽略了特征语义学习在这样一个细粒度的水平。当每个项目包含多个潜在方面时,项目之间的关系是非常复杂的,而这在现实世界中却是普遍存在的。现有的解决方案很容易因为更好的推荐性能而失败。有必要对项目嵌入进行解密,并以上下文感知的方式提取可信特征。为了解决这一问题,本文提出了一种新的基于上下文 MAB 的嵌入式去噪模型(简称 Comed) ,用于自适应地识别相关的维级特征,以获得更好的推荐。具体来说,Comed 将嵌入去噪任务描述为一个上下文多臂老虎机问题。对于项目嵌入的每一个特征,我们分配一个两臂神经元来确定是否应该保留组成语义,将整个过程描述为嵌入去噪。通过将去噪嵌入作为上下文信息进行聚合,进一步设计了一个由历史交互序列与目标项之间的相似性推导出的奖励函数,以逼近土匪有效学习的最大期望收益。考虑到训练串行操作机制可能效率低下,我们还设计了一种快速学习策略,以加速改进的顺序嵌入与神经网络并行机制之间的协同引导,从而得到更好的推荐。根据四项得到广泛承认的基准进行的全面试验证实了我们框架的效率和效力。 code 0
User Behavior Enriched Temporal Knowledge Graphs for Sequential Recommendation Hengchang Hu, Wei Guo, Xu Liu, Yong Liu, Ruiming Tang, Rui Zhang, MinYen Kan Natl Univ Singapore, Singapore, Singapore; Huawei Noahs Ark Lab, Singapore, Singapore; Ruizhang Info, Nanjing, Peoples R China Knowledge Graphs (KGs) enhance recommendations by providing external connectivity between items. However, there is limited research on distilling relevant knowledge in sequential recommendation, where item connections can change over time. To address this, we introduce the Temporal Knowledge Graph (TKG), which incorporates such dynamic features of user behaviors into the original KG while emphasizing sequential relationships. The TKG captures both patterns of entity dynamics (nodes) and structural dynamics (edges). Considering real-world applications with large-scale and rapidly evolving user behavior patterns, we propose an efficient two-phase framework called TKG-SRec, which strengthens Sequential Recommendation with Temporal KGs. In the first phase, we learn dynamic entity embeddings using our novel Knowledge Evolution Network (KEN) that brings together pretrained static knowledge with evolving temporal knowledge. In the second stage, downstream sequential recommender models utilize these time-specific dynamic entity embeddings with compatible neural backbones like GRUs, Transformers, and MLPs. From our extensive experiments over four datasets, TKG-SRec outperforms the current state-of-the-art by a statistically significant 5% on average. Detailed analysis validates that such filtered temporal knowledge better adapts entity embedding for sequential recommendation. In summary, TKG-SRec provides an effective and efficient approach.
知识图表(KGs)通过提供项目之间的外部连接来增强建议。然而,在序贯推荐中提取相关知识的研究很有限,因为项目间的联系会随着时间的推移而改变。为了解决这个问题,我们引入了时态知识图(TKG) ,它在强调顺序关系的同时,将用户行为的动态特征整合到原始知识图中。TKG 捕获了实体动力学(节点)和结构动力学(边)两种模式。考虑到现实世界中具有大规模和快速发展的用户行为模式的应用程序,我们提出了一个有效的两阶段框架 TKG-SRec,该框架使用时态 KG 强化了顺序推荐。在第一阶段,我们使用新的知识进化网络(KEN)学习动态实体嵌入,该网络将预先训练的静态知识与进化的时态知识结合在一起。在第二阶段,下游顺序推荐模型利用这些时间特定的动态实体嵌入与兼容的神经骨干,如 GRU,变压器和 MLP。从我们对四个数据集的广泛实验来看,TKG-SRec 的性能平均比目前的最先进水平高出5% 。详细的分析验证了这种过滤后的时态知识更适合于实体嵌入的顺序推荐。总之,TKG-SRec 提供了一种有效的方法。 code 0
Global Heterogeneous Graph and Target Interest Denoising for Multi-behavior Sequential Recommendation Xuewei Li, Hongwei Chen, Jian Yu, Mankun Zhao, Tianyi Xu, Wenbin Zhang, Mei Yu Tianjin Univ, Informat & Network Ctr, Tianjin, Peoples R China; Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China Multi-behavior sequential recommendation (MBSR) predicts a user's next item of interest based on their interaction history across different behavior types. Although existing studies have proposed capturing the correlation between different types of behavior, two important challenges have not been explored: i) Dealing with heterogeneous item transitions (both global and local perspectives). ii) Mitigating the issue of noise that arises from the incorporation of auxiliary behaviors. To address these issues, we propose a novel solution, Global Heterogeneous Graph and Target Interest Denoising for Multi-behavior Sequential Recommendation (GHTID). In particular, we view the transitions between behavior types of items as different relationships and propose two heterogeneous graphs. By considering the relationship between items under different behavioral types of transformations, we propose two heterogeneous graph convolution modules and explicitly learn heterogeneous item transitions. Moreover, we utilize two attention networks to integrate long-term and short-term interests associated with the target behavior to alleviate the noisy interference of auxiliary behaviors. Extensive experiments on four real-world datasets demonstrate that our method outperforms other state-of-the-art methods. code 0
Attribute Simulation for Item Embedding Enhancement in Multi-interest Recommendation Yaokun Liu, Xiaowang Zhang, Minghui Zou, Zhiyong Feng Although multi-interest recommenders have achieved significant progress in the matching stage, our research reveals that existing models tend to exhibit an under-clustered item embedding space, which leads to a low discernibility between items and hampers item retrieval. This highlights the necessity for item embedding enhancement. However, item attributes, which serve as effective and straightforward side information for enhancement, are either unavailable or incomplete in many public datasets due to the labor-intensive nature of manual annotation tasks. This dilemma raises two meaningful questions: 1. Can we bypass manual annotation and directly simulate complete attribute information from the interaction data? And 2. If feasible, how to simulate attributes with high accuracy and low complexity in the matching stage? In this paper, we first establish an inspiring theoretical feasibility that the item-attribute correlation matrix can be approximated through elementary transformations on the item co-occurrence matrix. Then based on formula derivation, we propose a simple yet effective module, SimEmb (Item Embedding Enhancement via Simulated Attribute), in the multi-interest recommendation of the matching stage to implement our findings. By simulating attributes with the co-occurrence matrix, SimEmb discards the item ID-based embedding and employs the attribute-weighted summation for item embedding enhancement. Comprehensive experiments on four benchmark datasets demonstrate that our approach notably enhances the clustering of item embedding and significantly outperforms SOTA models with an average improvement of 25.59% on Recall@20. 
虽然多兴趣推荐系统在匹配阶段已经取得了显著的进展,但是我们的研究发现现有的模型倾向于表现出一种欠聚类的项目嵌入空间,导致项目之间的差异性较低,从而阻碍了项目的检索。这突出了项目嵌入增强的必要性。然而,由于人工注释任务的劳动密集性,在许多公共数据集中,作为有效和直接的增强辅助信息的项属性要么不可用,要么不完整。这个困境提出了两个有意义的问题: 1。我们是否可以绕过手动注释,直接从交互数据模拟完整的属性信息?二。如果可行,如何在匹配阶段模拟高精度、低复杂度的属性?在本文中,我们首先建立了一个鼓舞人心的理论可行性,即项目-属性相关矩阵可以通过项目共现矩阵的初等变换来近似。然后在公式推导的基础上,在匹配阶段的多兴趣推荐中,提出了一个简单而有效的模块——模拟属性项嵌入增强模块(SimEmb) ,以实现我们的研究结果。通过使用共生矩阵模拟属性,SimEmb 放弃了基于项目 ID 的嵌入,采用属性加权和的方法对项目进行嵌入增强。通过对四个基准数据集的综合实验表明,该方法显著提高了项目嵌入的聚类性能,并明显优于 SOTA 模型,在 Recall@20上平均提高了25.59% 。 code 0
Deep Evolutional Instant Interest Network for CTR Prediction in Trigger-Induced Recommendation Zhibo Xiao, Luwei Yang, Tao Zhang, Wen Jiang, Wei Ning, Yujiu Yang The recommendation has been playing a key role in many industries, e.g., e-commerce, streaming media, social media, etc. Recently, a new recommendation scenario, called Trigger-Induced Recommendation (TIR), where users are able to explicitly express their instant interests via trigger items, is emerging as an essential role in many e-commerce platforms, e.g., Alibaba.com and Amazon. Without explicitly modeling the user's instant interest, traditional recommendation methods usually obtain sub-optimal results in TIR. Even though there are a few methods considering the trigger and target items simultaneously to solve this problem, they still haven't taken into account temporal information of user behaviors, the dynamic change of user instant interest when the user scrolls down and the interactions between the trigger and target items. To tackle these problems, we propose a novel method – Deep Evolutional Instant Interest Network (DEI2N), for click-through rate prediction in TIR scenarios. Specifically, we design a User Instant Interest Modeling Layer to predict the dynamic change of the intensity of instant interest when the user scrolls down. Temporal information is utilized in user behavior modeling. Moreover, an Interaction Layer is introduced to learn better interactions between the trigger and target items. We evaluate our method on several offline and real-world industrial datasets. Experimental results show that our proposed DEI2N outperforms state-of-the-art baselines. In addition, online A/B testing demonstrates the superiority over the existing baseline in real-world production environments. 
推荐在许多行业都发挥了重要作用,例如电子商务、流媒体、社交媒体等。最近,一个新的推荐场景,称为触发诱导推荐(TIR) ,用户可以通过触发条目明确表达他们的即时兴趣,正在成为许多电子商务平台,如阿里巴巴和亚马逊的重要角色。传统的推荐方法在没有明确建立用户即时兴趣模型的情况下,往往在 TIR 中得到次优的推荐结果。针对这一问题,目前虽然有很多方法同时考虑了触发条目和目标条目,但都没有考虑到用户行为的时间信息、用户向下滚动时瞬间兴趣的动态变化以及触发条目和目标条目之间的交互作用。为了解决这些问题,我们提出了一种新的方法-深度进化即时兴趣网络(DEI2N) ,用于 TIR 场景中的点进率预测。具体来说,我们设计了一个用户即时兴趣建模层来预测用户向下滚动时即时兴趣强度的动态变化。时态信息用于用户行为建模。此外,还引入了交互层来学习触发器和目标项之间更好的交互。我们评估我们的方法在几个离线和现实世界的工业数据集。实验结果表明,我们提出的 DEI2N 性能优于最先进的基线。此外,在线 A/B 测试证明了在现实生产环境中优于现有基线的优越性。 code 0
PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models WeiCheng Chang, JyunYu Jiang, Jiong Zhang, Mutasem AlDarabsah, Choon Hui Teo, ChoJui Hsieh, HsiangFu Yu, S. V. N. Vishwanathan Embedding-based Retrieval Models (ERMs) have emerged as a promising framework for large-scale text retrieval problems due to powerful large language models. Nevertheless, fine-tuning ERMs to reach state-of-the-art results can be expensive due to the extreme scale of data as well as the complexity of multi-stages pipelines (e.g., pre-training, fine-tuning, distillation). In this work, we propose the PEFA framework, namely ParamEter-Free Adapters, for fast tuning of ERMs without any backward pass in the optimization. At index building stage, PEFA equips the ERM with a non-parametric k-nearest neighbor (kNN) component. At inference stage, PEFA performs a convex combination of two scoring functions, one from the ERM and the other from the kNN. Based on the neighborhood definition, PEFA framework induces two realizations, namely PEFA-XL (i.e., extra large) using double ANN indices and PEFA-XS (i.e., extra small) using a single ANN index. Empirically, PEFA achieves significant improvement on two retrieval applications. For document retrieval, regarding Recall@100 metric, PEFA improves not only pre-trained ERMs on Trivia-QA by an average of 13.2%, but also fine-tuned ERMs on NQ-320K by an average of 5.5%, respectively. For product search, PEFA improves the Recall@100 of the fine-tuned ERMs by an average of 5.3% and 14.5%, for PEFA-XS and PEFA-XL, respectively. 
Our code is available at https://github.com/amzn/pecos/tree/mainline/examples/pefa-wsdm24 基于嵌入的检索模型(ERM)由于强大的大型语言模型而成为解决大规模文本检索问题的有力工具。然而,由于数据的极端规模以及多阶段管道的复杂性(例如,预训练、微调、蒸馏) ,为达到最先进的结果而对 ERM 进行微调可能是昂贵的。在这项工作中,我们提出了 PEFA 框架,即无参数适配器,用于快速调优 ERM,而不需要在优化过程中进行任何反向传递。在索引建立阶段,PEFA 为 ERM 配备了一个非参数 k- 最近邻(kNN)分量。在推理阶段,PEFA 执行两个评分函数的凸组合,一个来自 ERM,另一个来自 kNN。在邻域定义的基础上,PEFA 框架引入了两种实现方法,即使用双 ANN 索引的 PEFA-XL (即超大)和使用单 ANN 索引的 PEFA-XS (即超小)。经验表明,PEFA 在两个检索应用程序上取得了显著的改进。就文献检索而言,在 Recall@100 指标方面,PEFA 不仅使 Trivia-QA 上预先训练的 ERM 平均提高了13.2% ,而且使 NQ-320K 上经过微调的 ERM 平均提高了5.5% 。对于产品搜索,对于 PEFA-XS 和 PEFA-XL,PEFA 分别使经过微调的 ERM 的 Recall@100 平均提高5.3% 和14.5% 。我们的代码可以在 https://github.com/amzn/pecos/tree/mainline/examples/pefa-wsdm24 找到 code 0
User Consented Federated Recommender System Against Personalized Attribute Inference Attack Qi Hu, Yangqiu Song Recommender systems can be privacy-sensitive. To protect users' private historical interactions, federated learning has been proposed in distributed learning for user representations. Using federated recommender (FedRec) systems, users can train a shared recommendation model on local devices and prevent raw data transmissions and collections. However, the recommendation model learned by a common FedRec may still be vulnerable to private information leakage risks, particularly attribute inference attacks, which means that the attacker can easily infer users' personal attributes from the learned model. Additionally, traditional FedRecs seldom consider the diverse privacy preferences of users, leading to difficulties in balancing the recommendation utility and privacy preservation. Consequently, FedRecs may simultaneously suffer from unnecessary recommendation performance loss due to over-protection and from private information leakage. In this work, we propose a novel user-consented federated recommendation system (UC-FedRec) to flexibly satisfy the different privacy needs of users by paying a minimum recommendation accuracy price. UC-FedRec allows users to self-define their privacy preferences to meet various demands and makes recommendations with user consent. Experiments conducted on different real-world datasets demonstrate that our framework is more efficient and flexible compared to baselines. 推荐系统可能对隐私敏感。为了保护用户的私有历史交互,联邦学习被提出用于用户表示的分布式学习。使用联邦推荐(FedRec)系统,用户可以在本地设备上训练共享推荐模型,并防止原始数据传输和收集。然而,普通 FedRec 学习到的推荐模型可能仍然容易受到私人信息泄露的攻击,特别是属性推理攻击,这意味着攻击者可以很容易地从学习到的模型中推断出用户的个人属性。此外,传统的 FedRecs 很少考虑用户的不同隐私偏好,导致难以平衡推荐效用和隐私保护。因此,FedRecs 可能同时遭受因过度保护而导致的不必要的推荐性能损失和私人信息泄露。在本研究中,我们提出一个新的用户同意的联邦推荐系统(UC-FedRec) ,以支付最低的推荐准确度代价,灵活地满足用户不同的隐私需求。UC-FedRec 允许用户自定义他们的隐私偏好,以满足不同的需求,并在用户同意的情况下提出推荐。在不同的实际数据集上进行的实验表明,我们的框架比基线更有效和灵活。 code 0
Multi-Intent Attribute-Aware Text Matching in Searching Mingzhe Li, Xiuying Chen, Jing Xiang, Qishen Zhang, Changsheng Ma, Chenchen Dai, Jinxiong Chang, Zhongyi Liu, Guannan Zhang Text matching systems have become a fundamental service in most searching platforms. For instance, they are responsible for matching user queries to relevant candidate items, or rewriting the user-input query to a pre-selected high-performing one for a better search experience. In practice, both the queries and items often contain multiple attributes, such as the category of the item and the location mentioned in the query, which represent condensed key information that is helpful for matching. However, most of the existing works downplay the effectiveness of attributes by integrating them into text representations as supplementary information. Hence, in this work, we focus on exploring the relationship between the attributes from two sides. Since attributes from two ends are often not aligned in terms of number and type, we propose to exploit the benefit of attributes by multiple-intent modeling. The intents extracted from attributes summarize the diverse needs of queries and provide rich content of items, which are more refined and abstract, and can be aligned for paired inputs. Concretely, we propose a multi-intent attribute-aware matching model (MIM), which consists of three main components: attribute-aware encoder, multi-intent modeling, and intent-aware matching. In the attribute-aware encoder, the text and attributes are weighted and processed through a scaled attention mechanism with regard to the attributes' importance. Afterward, the multi-intent modeling extracts intents from two ends and aligns them. Here, we introduce a distribution loss to ensure the learned intents are diverse but concentrated, and a Kullback-Leibler divergence loss that aligns the learned intents. 
Finally, in the intent-aware matching, the intents are evaluated by a self-supervised masking task, and then incorporated to output the final matching result. 文本匹配系统已经成为大多数搜索平台的基础服务。例如,它们负责将用户查询与相关候选项匹配,或者将用户输入查询重写为预先选定的高性能查询,以获得更好的搜索体验。实际上,查询和项通常都包含多个属性,例如项的类别和查询中提到的位置,这些属性表示有助于匹配的压缩键信息。然而,现有的大多数作品通过将属性作为补充信息集成到文本表示中来淡化属性的有效性。因此,本文着重从两个方面探讨属性之间的关系。由于来自两端的属性通常在数量和类型方面不一致,我们建议通过多意图建模来利用属性的优势。从属性中提取的意图总结了查询的不同需求,并提供了丰富的项目内容,这些内容更加精炼和抽象,并且可以对成对的输入进行对齐。具体地说,我们提出了一个多意图属性感知匹配模型(MIM) ,它由三个主要部分组成: 属性感知编码器、多意图建模和意图感知匹配。在属性感知编码器中,文本和属性根据属性的重要性通过缩放注意机制进行加权和处理。然后,多意图建模从两端提取意图并对齐它们。在这里,我们提出了一个分布损失,以确保学习意图的多样性,但集中,和一个 kullback-leibler 散度损失,调整了学习意图。最后,在意图感知匹配中,通过自监督掩蔽任务对意图进行评估,然后合并到一起输出最终的匹配结果。 code 0
Mixed Attention Network for Cross-domain Sequential Recommendation Guanyu Lin, Chen Gao, Yu Zheng, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li, Meng Wang In modern recommender systems, sequential recommendation leverages chronological user behaviors to make effective next-item suggestions, but it suffers from data sparsity issues, especially for new users. One promising line of work is cross-domain recommendation, which trains models with data across multiple domains to improve the performance in data-scarce domains. Recently proposed cross-domain sequential recommendation models such as PiNet and DASL share a common drawback: they rely heavily on overlapping users in different domains, which limits their usage in practical recommender systems. In this paper, we propose a Mixed Attention Network (MAN) with local and global attention modules to extract the domain-specific and cross-domain information. Firstly, we propose a local/global encoding layer to capture the domain-specific/cross-domain sequential pattern. Then we propose a mixed attention layer with item similarity attention, sequence-fusion attention, and group-prototype attention to capture the local/global item similarity, fuse the local/global item sequence, and extract the user groups across different domains, respectively. Finally, we propose a local/global prediction layer to further evolve and combine the domain-specific and cross-domain interests. Experimental results on two real-world datasets (each with two domains) demonstrate the superiority of our proposed model. Further study also illustrates that our proposed method and components are model-agnostic and effective, respectively. The code and data are available at https://github.com/Guanyu-Lin/MAN. 
在现代推荐系统中,顺序推荐利用按时间顺序排列的用户行为来提出有效的下一项推荐,这种推荐存在数据稀疏问题,尤其是对于新用户。一个很有前途的工作领域是跨域推荐,它用跨多个域的数据来训练模型,以提高数据稀缺域的性能。最近提出的跨域顺序推荐模型,如 PiNet 和 DASL,有一个共同的缺点,即严重依赖于不同领域的重叠用户,这限制了它们在实际推荐系统中的使用。本文提出了一种基于局部和全局注意模块的混合注意网络(MAN)来提取特定领域和跨领域的信息。首先,我们提出了一个局部/全局编码层来捕获域特定/跨域的序列模式。然后提出了一个具有项目相似性注意、序列融合注意和群体原型注意的混合注意层,分别捕获局部/全局项目相似性,融合局部/全局项目序列,提取不同领域的用户群。最后,我们提出了一个局部/全局预测层,以进一步发展和结合特定领域和跨领域的利益。在两个实际数据集上的实验结果表明了该模型的优越性。进一步的研究还表明,我们提出的方法和组件是模型无关的和有效的,分别。代码和数据可在 https://github.com/guanyu-lin/man 查阅。 code 0
Multi-Sequence Attentive User Representation Learning for Side-information Integrated Sequential Recommendation Xiaolin Lin, Jinwei Luo, Junwei Pan, Weike Pan, Zhong Ming, Xun Liu, Shudong Huang, Jie Jiang Shenzhen Technol Univ, Shenzhen Univ, Guangdong Lab Artificial Intelligence & Digital E, Shenzhen, Peoples R China; Tencent, Shenzhen, Peoples R China; Shenzhen Univ, Shenzhen, Peoples R China; Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China; Tencent Mus Entertainment, Shenzhen, Peoples R China Side-information integrated sequential recommendation incorporates supplementary information to alleviate the issue of data sparsity. The state-of-the-art works mainly leverage some side information to improve the attention calculation to learn user representation more accurately. However, there are still some limitations to be addressed in this topic. Most of them merely learn the user representation at the item level and overlook the association of the item sequence and the side-information sequences when calculating the attentions, which results in the incomprehensive learning of user representation. Some of them learn the user representations at both the item and side-information levels, but they still face the problem of insufficient optimization of multiple user representations. To address these limitations, we propose a novel model, i.e., Multi-Sequence Sequential Recommender (MSSR), which learns the user's multiple representations from diverse sequences. Specifically, we design a multi-sequence integrated attention layer to learn more attentive pairs than the existing works and adaptively fuse these pairs to learn user representation. Moreover, our user representation alignment module constructs the self-supervised signals to optimize the representations. Subsequently, they are further refined by our side information predictor during training. 
For item prediction, our MSSR additionally considers the side information of the candidate item, enabling a comprehensive measurement of the user's preferences. Extensive experiments on four public datasets show that our MSSR outperforms eleven state-of-the-art baselines. Visualization and case study also demonstrate the rationality and interpretability of our MSSR. 侧信息集成顺序推荐采用补充信息的方式来缓解数据稀疏性问题。最先进的工作主要是利用一些侧信息来改进注意力计算,从而更准确地学习用户表征。但是,在这个主题中仍然有一些限制需要解决。它们大多只是在项目层次上学习用户表征,在计算注意力时忽略了项目序列与侧信息序列之间的关联,导致用户表征学习的不全面。其中一些在项目和侧信息级别学习用户表示,但是它们仍然面临多个用户表示不够优化的问题。为了解决这些局限性,我们提出了一种新的模型,即多序列序列推荐器(MSSR) ,它从不同的序列中学习用户的多重表示。具体来说,我们设计了一个多序列的集成注意层来学习比现有工作更多的注意对,并自适应地融合这些注意对来学习用户表征。此外,我们的用户表示对齐模块构造自我监督信号来优化表示。随后,在训练过程中通过我们的侧信息预测器对其进行了进一步细化。对于项目预测,我们的 MSSR 额外考虑了候选项目的侧信息,从而能够全面测量用户的偏好。对四个公共数据集的大量实验表明,我们的 MSSR 优于十一个最先进的基线。可视化和案例分析也证明了我们的 MSSR 的合理性和可解释性。 code 0
Intent Contrastive Learning with Cross Subsequences for Sequential Recommendation Xiuyuan Qin, Huanhuan Yuan, Pengpeng Zhao, Guanfeng Liu, Fuzhen Zhuang, Victor S. Sheng The user purchase behaviors are mainly influenced by their intentions (e.g., buying clothes for decoration, buying brushes for painting, etc.). Modeling a user's latent intention can significantly improve the performance of recommendations. Previous works model users' intentions by considering the predefined label in auxiliary information or introducing stochastic data augmentation to learn purposes in the latent space. However, the auxiliary information is sparse and not always available for recommender systems, and introducing stochastic data augmentation may introduce noise and thus change the intentions hidden in the sequence. Therefore, leveraging user intentions for sequential recommendation (SR) can be challenging because they are frequently varied and unobserved. In this paper, Intent contrastive learning with Cross Subsequences for sequential Recommendation (ICSRec) is proposed to model users' latent intentions. Specifically, ICSRec first segments a user's sequential behaviors into multiple subsequences by using a dynamic sliding operation and takes these subsequences into the encoder to generate the representations for the user's intentions. To tackle the problem of no explicit labels for purposes, ICSRec assumes different subsequences with the same target item may represent the same intention and proposes a coarse-grain intent contrastive learning to push these subsequences closer. Then, fine-grain intent contrastive learning is mentioned to capture the fine-grain intentions of subsequences in sequential behaviors. Extensive experiments conducted on four real-world datasets demonstrate the superior performance of the proposed ICSRec model compared with baseline methods. 
用户的购买行为主要受其购买意图的影响(如购买衣服装饰、购买画笔等)。对用户的潜在意图进行建模可以显著提高推荐的性能。以往的研究通过考虑辅助信息中的预定义标签或引入随机数据增量在潜在空间中学习目的来模拟用户的意图。然而,辅助信息是稀疏的,并不总是可用于推荐系统,引入随机数据增强可能会引入噪声,从而改变意图隐藏在序列。因此,利用用户意图进行顺序推荐(SR)可能是具有挑战性的,因为它们经常变化和未被观察到。本文提出了一种基于交叉子序列的序列推荐意图对比学习(ICSRec)方法来模拟用户的潜在意图。具体来说,ICSRec 首先使用动态滑动操作将用户的连续行为分割成多个子序列,并将这些子序列带入编码器以生成用户意图的表示。为了解决目的不明确标签的问题,ICSRec 假设具有相同目标项的不同子序列可以表示相同的意图,并提出了一种粗粒度意图对比学习方法来使这些子序列更加接近。然后,提出细粒度意图对比学习来捕捉序列行为中子序列的细粒度意图。在四个实际数据集上进行的大量实验表明,与基线方法相比,所提出的 ICSRec 模型具有更好的性能。 code 0
Debiasing Sequential Recommenders through Distributionally Robust Optimization over System Exposure Jiyuan Yang, Yue Ding, Yidan Wang, Pengjie Ren, Zhumin Chen, Fei Cai, Jun Ma, Rui Zhang, Zhaochun Ren, Xin Xin Sequential recommendation (SR) models are typically trained on user-item interactions which are affected by the system exposure bias, leading to the user preference learned from the biased SR model not being fully consistent with the true user preference. Exposure bias refers to the fact that user interactions are dependent upon the partial items exposed to the user. Existing debiasing methods do not make full use of the system exposure data and suffer from sub-optimal recommendation performance and high variance. In this paper, we propose to debias sequential recommenders through Distributionally Robust Optimization (DRO) over system exposure data. The key idea is to utilize DRO to optimize the worst-case error over an uncertainty set to safeguard the model against distributional discrepancy caused by the exposure bias. The main challenge to apply DRO for exposure debiasing in SR lies in how to construct the uncertainty set and avoid the overestimation of user preference on biased samples. Moreover, how to evaluate the debiasing effect on biased test set is also an open question. To this end, we first introduce an exposure simulator trained upon the system exposure data to calculate the exposure distribution, which is then regarded as the nominal distribution to construct the uncertainty set of DRO. Then, we introduce a penalty to items with high exposure probability to avoid the overestimation of user preference for biased samples. Finally, we design a debiased self-normalized inverse propensity score (SNIPS) evaluator for evaluating the debiasing effect on the biased offline test set. We conduct extensive experiments on two real-world datasets to verify the effectiveness of the proposed methods. 
Experimental results demonstrate the superior exposure debiasing performance of the proposed methods. Code and data are available at https://github.com/nancheng58/DebiasedSR_DRO. 序贯推荐(SR)模型通常针对受系统曝光偏差影响的用户-项目交互进行训练,导致从有偏 SR 模型中学到的用户偏好与真实用户偏好不完全一致。曝光偏差是指用户交互依赖于暴露给用户的部分项目这一事实。现有的去偏方法没有充分利用系统曝光数据,推荐性能不理想,方差较大。本文提出了一种基于系统曝光数据的分布鲁棒优化(DRO)方法来降低序列推荐器的偏差。其核心思想是利用 DRO 对不确定集上的最坏情况误差进行优化,以保护模型不受曝光偏差引起的分布差异的影响。如何构造不确定性集合,避免对有偏样本的用户偏好的过高估计,是应用 DRO 进行 SR 曝光去偏的主要挑战。此外,如何在有偏测试集上评价去偏效果也是一个悬而未决的问题。为此,我们首先介绍了一个基于系统曝光数据训练的曝光模拟器来计算曝光分布,然后将其视为标称分布来构造 DRO 的不确定性集合。然后,我们对高曝光概率的项目引入惩罚,以避免过高估计用户对有偏样本的偏好。最后,我们设计了一个去偏的自归一化逆倾向得分(SNIPS)评估器来评估有偏离线测试集上的去偏效果。为了验证该方法的有效性,我们在两个实际数据集上进行了大量的实验。实验结果表明,该方法具有较好的曝光去偏性能。代码和数据可在 https://github.com/nancheng58/DebiasedSR_DRO 获得。 code 0
Applications of LLMs in E-Commerce Search and Product Knowledge Graph: The DoorDash Case Study Sudeep Das, Raghav Saboo, Chaitanya S. K. Vadrevu, Bruce Wang, Steven Xu DoorDash Inc, Seattle, WA USA; DoorDash Inc, New York, NY USA; DoorDash Inc, San Francisco, CA 94107 USA Extracting knowledge from unstructured or semi-structured textual information is essential for the machine learning applications that power DoorDash's search experience, and the development and maintenance of its product knowledge graph. Large language models (LLMs) have opened up new possibilities for utilizing their power in these areas, replacing or complementing traditional natural language processing methods. LLMs are also proving to be useful in the label and annotation generation process, which is critical for these use cases. In this talk, we will provide a high-level overview of how we incorporated LLMs for search relevance and product understanding use cases, as well as the key lessons learned and challenges faced during their practical implementation. code 0
AAGenRec: A Novel Approach for Mitigating Inter-task Interference in Multi-task Optimization of Sequential Behavior Modeling Jiawei Zhang, Shimin Yang, Liang Shen Meituan, Beijing, Peoples R China Multi-task optimization is an emerging research field in recommender systems that aims to enhance the recommendation performance across multiple tasks. Various methodologies have been introduced to address aspects like balancing task weights, handling gradient conflicts, and achieving Pareto optimality. These approaches have shown promise in specific contexts, but are not well-suited for real-world scenarios that involve user sequential behaviors. To address this gap, we present AAGenRec, a novel and effective solution for sequential behavior modeling within multi-task recommender systems, inspired by concepts from acoustic attenuation and genetic differences. Specifically, AAGenRec leverages an established genetic distance method to quantify the dissimilarity between tasks, then introduces an impact attenuation mechanism to mitigate the uncertain task interference in multi-task optimization. Extensive experiments conducted on public e-commerce datasets demonstrate the effectiveness of AAGenRec. 多任务优化是推荐系统中一个新兴的研究领域,旨在提高多任务推荐的性能。我们引入了各种方法来解决诸如平衡任务权重、处理梯度冲突和实现帕累托最优等问题。这些方法已经在特定的上下文中显示了希望,但是不太适合涉及用户顺序行为的真实场景。为了解决这一差距,我们提出了 AAGenRec,一个新颖而有效的解决方案,用于多任务推荐系统中的顺序行为建模,灵感来自声衰减和遗传差异的概念。具体来说,AAGenRec 利用已建立的遗传距离方法来量化任务间的差异,然后引入冲击衰减机制来缓解多任务优化中的不确定性任务干扰。在公共电子商务数据集上进行的大量实验证明了 AAGenRec 的有效性。 code 0
To Copy, or not to Copy; That is a Critical Issue of the Output Softmax Layer in Neural Sequential Recommenders HawShiuan Chang, Nikhil Agarwal, Andrew McCallum Recent studies suggest that the existing neural models have difficulty handling repeated items in sequential recommendation tasks. However, our understanding of this difficulty is still limited. In this study, we substantially advance this field by identifying a major source of the problem: the single hidden state embedding and static item embeddings in the output softmax layer. Specifically, the similarity structure of the global item embeddings in the softmax layer sometimes forces the single hidden state embedding to be close to new items when copying is a better choice, while sometimes forcing the hidden state to be close to the items from the input inappropriately. To alleviate the problem, we adapt the recently-proposed softmax alternatives such as softmax-CPR to sequential recommendation tasks and demonstrate that the new softmax architectures unleash the capability of the neural encoder on learning when to copy and when to exclude the items from the input sequence. By only making some simple modifications on the output softmax layer for SASRec and GRU4Rec, softmax-CPR achieves consistent improvement in 12 datasets. With almost the same model size, our best method not only improves the average NDCG@10 of GRU4Rec in 5 datasets with duplicated items by 10% (4%-17% individually) but also improves 7 datasets without duplicated items by 24% (8%-39%)! 
最近的研究表明,现有的神经模型难以处理重复项目的顺序推荐任务。然而,我们对这个困难的理解仍然是有限的。在这项研究中,我们通过识别问题的一个主要来源: 单一的隐藏状态嵌入和静态项嵌入在输出 softmax 层大大推进了这个领域。具体来说,Softmax 层中全局项嵌入的相似性结构有时迫使单个隐藏状态嵌入接近新项,而复制是更好的选择,有时迫使隐藏状态不适当地接近来自输入的项。为了缓解这一问题,我们将最近提出的 softmax 替代方案(如 softmax-CPR)适用于顺序推荐任务,并证明新的 softmax 架构释放了神经编码器学习何时复制和何时从输入序列中排除项目的能力。通过对 SASRec 和 GRU4Rec 的输出 softmax 层进行一些简单的修改,softmax-CPR 在12个数据集中实现了一致的改进。在几乎相同的模型大小下,我们的最佳方法不仅将5个重复项目数据集中 GRU4Rec 的平均 NDCG@10提高了10% (单独4%-17%) ,而且将7个没有重复项目的数据集提高了24% (8%-39%) ! code 0
Budgeted Embedding Table For Recommender Systems Yunke Qu, Tong Chen, Quoc Viet Hung Nguyen, Hongzhi Yin At the heart of contemporary recommender systems (RSs) are latent factor models that provide quality recommendation experience to users. These models use embedding vectors, which are typically of a uniform and fixed size, to represent users and items. As the number of users and items continues to grow, this design becomes inefficient and hard to scale. Recent lightweight embedding methods have enabled different users and items to have diverse embedding sizes, but are commonly subject to two major drawbacks. Firstly, they limit the embedding size search to optimizing a heuristic balancing the recommendation quality and the memory complexity, where the trade-off coefficient needs to be manually tuned for every memory budget requested. The implicitly enforced memory complexity term can even fail to cap the parameter usage, making the resultant embedding table fail to meet the memory budget strictly. Secondly, most solutions, especially reinforcement-learning-based ones, derive and optimize the embedding size for each user/item on an instance-by-instance basis, which impedes the search efficiency. In this paper, we propose Budgeted Embedding Table (BET), a novel method that generates table-level actions (i.e., embedding sizes for all users and items) that are guaranteed to meet pre-specified memory budgets. Furthermore, by leveraging a set-based action formulation and engaging set representation learning, we present an innovative action search strategy powered by an action fitness predictor that efficiently evaluates each table-level action. Experiments have shown state-of-the-art performance on two real-world datasets when BET is paired with three popular recommender models under different memory budgets. 
当代推荐系统(RS)的核心是为用户提供高质量推荐体验的潜在因素模型。这些模型使用嵌入向量来表示用户和项目,嵌入向量通常具有统一和固定的大小。随着用户和项目数量的持续增长,这种设计变得效率低下且难以扩展。最近的轻量级嵌入方法允许不同的用户和项具有不同的嵌入大小,但是通常有两个主要缺点。首先,它们将嵌入大小搜索限制为优化一种启发式算法,以平衡推荐质量和内存复杂度,其中需要针对每个请求的内存预算手动调整折衷系数。隐式强制的内存复杂度项甚至可能无法限制参数的使用,使得结果嵌入表无法严格满足内存预算。其次,大多数解决方案,尤其是基于强化学习的解决方案,会逐个实例地推导和优化每个用户/条目的嵌入大小,这会影响搜索效率。在本文中,我们提出了预算嵌入表(BET) ,一种新的方法,生成表级的行为(即,嵌入大小的所有用户和项目) ,是保证满足预先指定的内存预算。此外,通过利用基于集合的动作制定和参与集合表示学习,我们提出了一个创新的动作搜索策略,由动作适应性预测器驱动,有效地评估每个表级动作。实验表明,当 BET 与三个流行的推荐模型在不同的内存预算下配对时,在两个真实世界的数据集上有最先进的性能。 code 0
AutoPooling: Automated Pooling Search for Multi-valued Features in Recommendations He Wei, Yuekui Yang, Shaoping Ma, Haiyang Wu, Yangyang Tang, Meixi Liu, Yang Zhang Tencent TEG, Machine Learning Platform Dept, Beijing, Peoples R China; Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China Large-scale recommender systems usually contain hundreds of multi-valued feature fields, each of which has a different number of values. For easier computation in traditional fixed-shape neural networks, pooling operators are widely used to compress the multi-valued feature into a fixed-dimension vector. Most existing works set a single pooling method for all fields, but this leads to sub-optimal results, because different feature fields have different information distributions and thus require different pooling methods. In this work, we propose an AutoML-based framework, called AutoPooling, which can automatically and efficiently search for the optimal pooling operator for each multi-valued feature. Specifically, learnable weights are assigned to all candidate pooling operators in each feature field. Then an AutoML-based algorithm is used to learn both model parameters and the field-aware weights. Finally, the optimal pooling operator can be acquired based on the associated weights. We evaluate the proposed framework on both public and industrial datasets. The results show that AutoPooling significantly outperforms the benchmarks. Further experimental results show that our method is robust across various deep recommendation models and different search spaces. code 0
Linear Recurrent Units for Sequential Recommendation Zhenrui Yue, Yueqi Wang, Zhankui He, Huimin Zeng, Julian J. McAuley, Dong Wang State-of-the-art sequential recommendation relies heavily on self-attention-based recommender models. Yet such models are computationally expensive and often too slow for real-time recommendation. Furthermore, the self-attention operation is performed at a sequence-level, thereby making low-cost incremental inference challenging. Inspired by recent advances in efficient language modeling, we propose linear recurrent units for sequential recommendation (LRURec). Similar to recurrent neural networks, LRURec offers rapid inference and can achieve incremental inference on sequential inputs. By decomposing the linear recurrence operation and designing recursive parallelization in our framework, LRURec provides the additional benefits of reduced model size and parallelizable training. Moreover, we optimize the architecture of LRURec by implementing a series of modifications to address the lack of non-linearity and improve training dynamics. To validate the effectiveness of our proposed LRURec, we conduct extensive experiments on multiple real-world datasets and compare its performance against state-of-the-art sequential recommenders. Experimental results demonstrate the effectiveness of LRURec, which consistently outperforms baselines by a significant margin. Results also highlight the efficiency of LRURec with our parallelized training paradigm and fast inference on long sequences, showing its potential to further enhance user experience in sequential recommendation. 
最先进的顺序推荐在很大程度上依赖于基于自我注意的推荐模型。然而,这样的模型在计算上是昂贵的,而且对于实时推荐来说往往太慢了。此外,自我注意操作是在序列级上执行的,因此使得低成本的增量推理具有挑战性。受到最近在有效语言建模方面的进展的启发,我们提出了用于顺序推荐的线性递归单元(LRURec)。类似于递归神经网络,LRURec 提供了快速推理,并可以实现对顺序输入的增量推理。通过在我们的框架中分解线性递归操作和设计递归并行,LRURec 提供了减少模型大小和可并行训练的额外好处。此外,我们通过实现一系列的修改来优化 LRURec 的体系结构,以解决缺乏非线性和改善训练动态性的问题。为了验证我们提出的 LRURec 的有效性,我们在多个真实世界的数据集上进行了广泛的实验,并将其性能与最先进的顺序推荐进行了比较。实验结果证明了 LRURec 算法的有效性,该算法的性能始终优于基线算法。结果还突出了 LRURec 的效率与我们的并行训练范式和快速推断的长序列,显示了它的潜力,进一步提高用户体验的顺序推荐。 code 0
CharmBana: Progressive Responses with Real-Time Internet Search for Knowledge-Powered Conversations Revanth Gangi Reddy, Sharath Chandra Etagi Suresh, Hao Bai, Wentao Yao, Mankeerat Sidhu, Karan Aggarwal, Prathamesh Sonawane, ChengXiang Zhai UIUC, Champaign, IL 61820 USA Chatbots are often hindered by the latency associated with integrating real-time web search results, compromising user experience. To overcome this, we present CharmBana, an innovative social chatbot that introduces the use of progressive response generation to effortlessly blend search results into the bot's responses, while ensuring low response latency. The use of progressive responses is especially beneficial for voice-based chatbots, where the preliminary response buys time for a detailed follow-up, ensuring a smooth user interaction. As a result, our method not only cuts down user waiting times by 50% but also generates more relevant, precise, and engaging search inquiries. When tested in the Alexa Prize Socialbot Grand Challenge 5, our chatbot employing progressive responses consistently received higher user ratings. 聊天机器人经常受到与整合实时网络搜索结果相关的延迟的阻碍,从而影响用户体验。为了克服这个问题,我们介绍了 CharmBana,一个创新的社交聊天机器人,它引入了渐进式响应生成的使用,可以毫不费力地将搜索结果融入机器人的响应中,同时确保低响应延迟。使用渐进式响应对基于语音的聊天机器人尤其有益,初步响应可以为详细的后续行动争取时间,确保用户交互的顺利进行。因此,我们的方法不仅减少了50% 的用户等待时间,而且还产生了更多的相关性,准确性和参与搜索查询。当测试在 Alexa 奖社交机器人大挑战5,我们的聊天机器人采用进步的反应一贯收到较高的用户评分。 code 0
SIRUP: Search-based Book Recommendation Playground Ghazaleh Haratinezhad Torbati, Anna Tigunova, Gerhard Weikum Max Planck Inst Informat, Saarbrucken, Germany This work presents a playground platform to demonstrate and interactively explore a suite of methods for utilizing user review texts to generate book recommendations. The focus is on search-based settings where the user provides situative context by focusing on a genre, a given item, her full user profile, or a newly formulated query. The platform allows exploration over two large datasets with various methods for creating concise user profiles. 这项工作提出了一个操场平台,演示和互动探索一套方法,利用用户评论文本生成图书推荐。重点是基于搜索的设置,其中用户通过关注一种类型、一个给定的项目、她的完整用户配置文件或一个新的查询提供情景上下文。该平台允许探索两个大型数据集与各种方法创建简明的用户配置文件。 code 0
Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs Behnam Rahdari, Hao Ding, Ziwei Fan, Yifei Ma, Zhuotong Chen, Anoop Deoras, Branislav Kveton The unique capabilities of Large Language Models (LLMs), such as the natural language text generation ability, position them as strong candidates for providing explanation for recommendations. However, despite the size of the LLM, most existing models struggle to produce zero-shot explanations reliably. To address this issue, we propose a framework called Logic-Scaffolding, that combines the ideas of aspect-based explanation and chain-of-thought prompting to generate explanations through intermediate reasoning steps. In this paper, we share our experience in building the framework and present an interactive demonstration for exploring our results. 大型语言模型(LLM)的独特功能,例如自然语言文本生成能力,将它们定位为为建议提供解释的强有力候选者。然而,尽管 LLM 的规模很大,大多数现有的模型都难以可靠地提供零射击的解释。为了解决这个问题,我们提出了一个名为逻辑支架的框架,它结合了基于方面的解释和思想链的提示,通过中间的推理步骤来生成解释。在本文中,我们分享了我们在构建框架方面的经验,并为探索我们的结果提供了一个交互式示范。 code 0
Effective and Efficient Transformer Models for Sequential Recommendation Aleksandr V. Petrov Univ Glasgow, Glasgow, Lanark, Scotland Sequential Recommender Systems use the order of user-item interactions to predict the next item in the sequence. This task is similar to Language Modelling, where the goal is to predict the next token based on the sequence of past tokens. Therefore, adaptations of language models, and, in particular, Transformer-based models, achieved state-of-the-art results for a sequential recommendation. However, despite similarities, the sequential recommendation problem poses a number of specific challenges not present in Language Modelling. These challenges include the large catalogue size of real-world recommender systems, which increases GPU memory requirements and makes the training and the inference of recommender models slow. Another challenge is that a good recommender system should focus not only on the accuracy of recommendation but also on additional metrics, such as diversity and novelty, which makes the direct adaptation of language model training strategies problematic. Our research focuses on solving these challenges. In this doctoral consortium abstract, we briefly describe the motivation and background for our work and then pose research questions and discuss current progress towards solving the described problems. 序列推荐系统使用用户-项目交互的顺序来预测序列中的下一个项目。此任务类似于语言建模,其目标是根据过去的令牌序列预测下一个令牌。因此,对语言模型的修改,特别是基于 Transformer 的模型,为顺序推荐获得了最先进的结果。然而,尽管有相似之处,顺序推荐问题提出了许多语言建模中没有出现的具体挑战。这些挑战包括现实世界中推荐系统的大量目录,这增加了 GPU 的内存需求,使得推荐模型的培训和推断变得缓慢。另一个挑战是,一个好的推荐系统不仅应该关注推荐的准确性,还应该关注其他指标,如多样性和新颖性,这使得直接调整语言模型培训策略成为一个问题。我们的研究侧重于解决这些挑战。在这篇博士论文摘要中,我们简要描述了我们工作的动机和背景,然后提出了研究问题,并讨论了目前在解决所描述的问题方面的进展。 code 0
CDRNP: Cross-Domain Recommendation to Cold-Start Users via Neural Process Xiaodong Li, Jiawei Sheng, Jiangxia Cao, Wenyuan Zhang, Quangang Li, Tingwen Liu Cross-domain recommendation (CDR) has proven to be a promising way to tackle the user cold-start problem, which aims to make recommendations for users in the target domain by transferring the user preference derived from the source domain. Traditional CDR studies follow the embedding and mapping (EMCDR) paradigm, which transfers user representations from the source to the target domain by learning a user-shared mapping function, neglecting the user-specific preference. Recent CDR studies attempt to learn user-specific mapping functions in a meta-learning paradigm, which regards each user's CDR as an individual task, but neglects the preference correlations among users, limiting the beneficial information for user representations. Moreover, both paradigms neglect the explicit user-item interactions from both domains during the mapping process. To address the above issues, this paper proposes a novel CDR framework with neural process (NP), termed CDRNP. In particular, it develops the meta-learning paradigm to leverage user-specific preference, and further introduces a stochastic process by NP to capture the preference correlations among the overlapping and cold-start users, thus generating more powerful mapping functions by mapping the user-specific preference and common preference correlations to a predictive probability distribution. In addition, we introduce a preference remainer to enhance the common preference from the overlapping users, and finally devise an adaptive conditional decoder with preference modulation to make predictions for cold-start users with items in the target domain. Experimental results demonstrate that CDRNP outperforms previous SOTA methods in three real-world CDR scenarios. 
跨域推荐已被证明是一种解决用户冷启动问题的有效方法,其目的是通过传递源域中的用户偏好来为目标域中的用户提供推荐。传统的 CDR 研究遵循嵌入与映射(EMCDR)范式,通过学习用户共享映射函数,将用户表征从源域转移到目标域,忽略了用户特定的偏好。最近的 CDR 研究试图在元学习范式中学习用户特定的映射函数,它将每个用户的 CDR 视为一个单独的任务,但忽略了用户之间的偏好相关性,限制了用户表征的有益信息。此外,这两个范例在映射过程中都忽略了来自两个领域的显式用户-项目交互。为了解决上述问题,本文提出了一种新的带有神经过程的 CDR 框架,称为 CDRNP。特别是,它开发了元学习范式来利用用户特定的偏好,并进一步引入了 NP 的随机过程来捕捉重叠和冷启动用户之间的偏好相关性,从而通过将用户特定的偏好和共同的偏好相关性映射到预测概率分布来产生更强大的映射功能。此外,我们还引入了偏好余数来增强重叠用户的共同偏好,最后设计了一个具有偏好调制的自适应条件解码器来对目标域内的冷启动用户进行预测。实验结果表明,在三种实际的 CDR 场景下,CDRNP 方法的性能优于以往的 SOTA 方法。 code 0
Calibration-compatible Listwise Distillation of Privileged Features for CTR Prediction Xiaoqiang Gui, Yueyao Cheng, XiangRong Sheng, Yunfeng Zhao, Guoxian Yu, Shuguang Han, Yuning Jiang, Jian Xu, Bo Zheng In machine learning systems, privileged features refer to the features that are available during offline training but inaccessible for online serving. Previous studies have recognized the importance of privileged features and explored ways to tackle online-offline discrepancies. A typical practice is privileged features distillation (PFD): train a teacher model using all features (including privileged ones) and then distill the knowledge from the teacher model using a student model (excluding the privileged features), which is then employed for online serving. In practice, the pointwise cross-entropy loss is often adopted for PFD. However, this loss is insufficient to distill the ranking ability for CTR prediction. First, it does not consider the non-i.i.d. characteristic of the data distribution, i.e., other items on the same page significantly impact the click probability of the candidate item. Second, it fails to consider the relative item order ranked by the teacher model's predictions, which is essential to distill the ranking ability. To address these issues, we first extend the pointwise-based PFD to the listwise-based PFD. We then define the calibration-compatible property of distillation loss and show that commonly used listwise losses do not satisfy this property when employed as distillation loss, thus compromising the model's calibration ability, which is another important measure for CTR prediction. To tackle this dilemma, we propose Calibration-compatible LIstwise Distillation (CLID), which employs carefully-designed listwise distillation loss to achieve better ranking ability than the pointwise-based PFD while preserving the model's calibration ability. We theoretically prove it is calibration-compatible. 
Extensive experiments on public datasets and a production dataset collected from the display advertising system of Alibaba further demonstrate the effectiveness of CLID. 在机器学习系统中,特权特性是指在离线培训期间可用但在线服务无法访问的特性。以前的研究已经认识到特权功能的重要性,并探索了解决在线和离线差异的方法。一个典型的实践是特权特征提取(PFD) : 使用所有特征(包括特权特征)训练一个教师模型,然后使用学生模型(不包括特权特征)从教师模型中提取知识,然后用于在线服务。在实际应用中,PFD 通常采用逐点交叉熵损失。然而,这种损失不足以提取 CTR 预测的排名能力。首先,它没有考虑数据分布的非 ID 特征,也就是说,同一页面上的其他条目显著影响候选条目的点击概率。其次,没有考虑教师模型预测排名的相对项目顺序,这对提取排名能力至关重要。为了解决这些问题,我们首先将基于点的 PFD 扩展到基于列表的 PFD。然后定义了蒸馏损失的标定相容性,指出常用的列表损失作为蒸馏损失时不能满足这一性质,从而影响了模型的标定能力,这是 CTR 预测的另一个重要措施。为了解决这一难题,我们提出了与标定兼容的列表蒸馏(CLID)模型,该模型采用精心设计的列表蒸馏损耗来获得比基于点的 PFD 更好的排序能力,同时保留了模型的标定能力。我们从理论上证明了它是校准兼容的。从阿里巴巴的显示广告系统收集的公共数据集和生产数据集进行了广泛的实验,进一步证明了 CLID 的有效性。 code 0
Ranking with Long-Term Constraints Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms. However, myopically training systems based on choice data may only improve short-term engagement, but not the long-term sustainability of the platform and the long-term benefits to its users, content providers, and other stakeholders. In this paper, we thus develop a new framework in which decision makers (e.g., platform operators, regulators, users) can express long-term goals for the behavior of the platform (e.g., fairness, revenue distribution, legal requirements). These goals take the form of exposure or impact targets that go well beyond individual sessions, and we provide new control-based algorithms to achieve these goals. In particular, the controllers are designed to achieve the stated long-term goals with minimum impact on short-term engagement. Beyond the principled theoretical derivation of the controllers, we evaluate the algorithms on both synthetic and real-world data. While all controllers perform well, we find that they provide interesting trade-offs in efficiency, robustness, and the ability to plan ahead. 用户通过他们的选择(例如点击,购买)提供的反馈是最常见的数据类型之一,可用于培训搜索和推荐算法。然而,基于选择数据的近视培训系统只能改善短期参与,而不能改善平台的长期可持续性及其用户、内容提供者和其他利益相关者的长期利益。在本文中,我们提出了一个新的框架,在这个框架中决策者(例如,平台运营商,监管者,用户)可以表达平台行为的长期目标(例如,公平性,收入分配,法律要求)。这些目标采取暴露或影响目标的形式,远远超出了单个会话的范围,我们提供了新的基于控制的算法来实现这些目标。特别是,控制器的设计是为了实现既定的长期目标,对短期参与的影响最小。除了对控制器进行原理性的理论推导外,我们还对这些算法进行了合成数据和实际数据的评估。虽然所有控制器都表现良好,但我们发现它们在效率、健壮性和提前计划能力方面提供了有趣的权衡。 code 0
Exploring Adapter-based Transfer Learning for Recommender Systems: Empirical Studies and Practical Insights Junchen Fu, Fajie Yuan, Yu Song, Zheng Yuan, Mingyue Cheng, Shenghui Cheng, Jiaqi Zhang, Jie Wang, Yunzhu Pan Adapters, plug-in neural network modules with some tunable parameters, have emerged as a parameter-efficient transfer learning technique for adapting pre-trained models to downstream tasks, especially in the natural language processing (NLP) and computer vision (CV) fields. Meanwhile, learning recommendation models directly from raw item modality features -- e.g., texts of NLP and images of CV -- can enable effective and transferable recommender systems (called TransRec). In view of this, a natural question arises: can adapter-based learning techniques achieve parameter-efficient TransRec with good performance? To this end, we perform empirical studies to address several key sub-questions. First, we ask whether adapter-based TransRec performs comparably to TransRec based on standard full-parameter fine-tuning, and whether this holds for recommendation with different item modalities, e.g., textual RS and visual RS. Second, if so, we benchmark these existing adapters, which have been shown to be effective in NLP and CV tasks, in the item recommendation settings. Third, we carefully study several key factors for adapter-based TransRec, namely where and how to insert these adapters. Finally, we look at the effects of adapter-based TransRec by either scaling up its source training data or scaling down its target training data. Our paper provides key insights and practical guidance on unified & transferable recommendation -- a less studied recommendation scenario. We promise to release all code & datasets for future research. 
适配器是一种具有可调参数的插入式神经网络模块,已经成为一种参数高效的传递学习技术,用于将预先训练好的模型适应下游任务,特别是在自然语言处理(NLP)和计算机视觉(CV)领域。同时,直接从原始项目模式特征(如 NLP 文本和简历图像)学习推荐模型,可以实现有效和可转移的推荐系统(称为 TransRec)。有鉴于此,一个自然而然的问题出现了: 基于适配器的学习技术能否以良好的性能实现参数有效的 TransRec?为此,我们进行了实证研究,以解决几个关键的子问题。首先,我们询问基于适配器的 TransRec 是否与基于标准全参数微调的 TransRec 性能相当?它是否适用于不同项目模式的推荐,例如,文本 RS 和视觉 RS。如果是,我们基准测试这些现有的适配器,已被证明是有效的自然语言处理和简历任务,在项目推荐设置。第三,我们仔细研究了基于适配器的 TransRec 在何处以及如何插入这些适配器方面的几个关键因素?最后,我们通过扩展源训练数据或缩小目标训练数据来研究基于适配器的 TransRec 的效果。我们的论文提供了关于统一和可转移推荐的关键见解和实践指导——一个研究较少的推荐场景。我们承诺为未来的研究发布所有的代码和数据集。 code 0
PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation Chunjing Gan, Bo Huang, Binbin Hu, Jian Ma, Zhiqiang Zhang, Jun Zhou, Guannan Zhang, Wenliang Zhong To help merchants/customers to provide/access a variety of services through miniapps, online service platforms have occupied a critical position in effective content delivery, where recommending items in a new domain launched by a service provider has become increasingly urgent. However, the non-negligible gap between the source and diversified target domains poses a considerable challenge to cross-domain recommendation systems, which often leads to performance bottlenecks in industrial settings. While entity graphs have the potential to serve as a bridge between domains, rudimentary utilization still fails to distill useful knowledge and can even induce negative transfer. To this end, we propose PEACE, a Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation. For domain gap bridging, PEACE is built upon a multi-interest and entity-oriented pre-training architecture which could not only benefit the learning of generalized knowledge in a multi-granularity manner, but also help leverage more structural information in the entity graph. Then, we bring prototype learning into pre-training over source domains, so that representations of users and items are greatly improved by the contrastive prototype learning module and the prototype enhanced attention mechanism for adaptive knowledge utilization. To ease the pressure of online serving, PEACE is carefully deployed in a lightweight manner, and significant performance improvements are observed in both online and offline environments. 
为协助商户/客户透过迷你应用程式提供/使用多元化的服务,网上服务平台在有效提供内容方面占有重要地位,因此如何向客户推荐服务供应商推出的新界别项目,已变得更为迫切。然而,来源域和多样化目标域之间不可忽视的差距对跨域推荐系统提出了相当大的挑战,这往往导致行业环境中的性能瓶颈。虽然实体图有可能作为领域之间的桥梁,但基本的利用仍然不能提取有用的知识,甚至引起负迁移问题。为此,我们提出了一个面向跨域推荐的原型学习增强可转移框架 PEACE。为了缩小领域间的差距,PEACE 建立在一个多兴趣和面向实体的预培训架构之上,它不仅有利于以多粒度方式学习广义知识,而且有助于利用实体图中更多的结构信息。然后,将原型学习引入到源域上的预训练中,通过对比原型学习模块和原型增强注意机制对用户和项目进行自适应知识利用,从而大大改善了用户和项目的表示。为了减轻在线服务的压力,PEACE 以一种轻量级的方式精心部署,并且在两种在线和离线环境中都观察到了显著的性能改善。 code 0
Motif-based Prompt Learning for Universal Cross-domain Recommendation Bowen Hao, Chaoqun Yang, Lei Guo, Junliang Yu, Hongzhi Yin Cross-Domain Recommendation (CDR) stands as a pivotal technology addressing issues of data sparsity and cold start by transferring general knowledge from the source to the target domain. However, existing CDR models suffer limitations in adaptability across various scenarios due to their inherent complexity. To tackle this challenge, recent advancements introduce universal CDR models that leverage shared embeddings to capture general knowledge across domains and transfer it through "Multi-task Learning" or "Pre-train, Fine-tune" paradigms. However, these models often overlook the broader structural topology that spans domains and fail to align training objectives, potentially leading to negative transfer. To address these issues, we propose a motif-based prompt learning framework, MOP, which introduces motif-based shared embeddings to encapsulate generalized domain knowledge, catering to both intra-domain and inter-domain CDR tasks. Specifically, we devise three typical motifs: butterfly, triangle, and random walk, and encode them through a Motif-based Encoder to obtain motif-based shared embeddings. Moreover, we train MOP under the "Pre-training & Prompt Tuning" paradigm. By unifying pre-training and recommendation tasks as a common motif-based similarity learning task and integrating adaptable prompt parameters to guide the model in downstream recommendation tasks, MOP excels in transferring domain knowledge effectively. Experimental results on four distinct CDR tasks demonstrate that MOP is more effective than state-of-the-art models. 
跨域推荐(CDR)是解决数据稀疏性和冷启动问题的关键技术,它通过将一般知识从数据源转移到目标域来实现。然而,现有的 CDR 模型由于其固有的复杂性,在不同场景的适应性方面存在局限性。为了应对这一挑战,最近的进展引入了通用的 CDR 模型,利用共享嵌入来获取跨领域的一般知识,并通过“多任务学习”或“预训练,微调”范例进行转移。然而,这些模型往往忽视了更广泛的结构拓扑,跨越领域和未能调整培训目标,潜在地导致负迁移。为了解决这些问题,我们提出了一个基于主题的快速学习框架 MOP,它引入了基于主题的共享嵌入来封装广义领域知识,同时满足领域内和领域间的 CDR 任务。具体来说,我们设计了三种典型的主题: 蝴蝶、三角形和随机游走,并通过基于主题的编码器对它们进行编码,以获得基于主题的共享嵌入。此外,我们训练 MOP 在“预先训练和即时调整”的范式。通过将预训练任务和推荐任务统一为基于主题的相似性学习任务,并结合自适应提示参数对下游推荐任务的模型进行指导,MOP 在领域知识的有效传递方面表现突出。在四个不同的 CDR 任务上的实验结果证明了 MOP 的有效性,而不是最先进的模型。 code 0
C²DR: Robust Cross-Domain Recommendation based on Causal Disentanglement Menglin Kong, Jia Wang, Yushan Pan, Haiyang Zhang, Muzhou Hou code 0
Inverse Learning with Extremely Sparse Feedback for Recommendation Guanyu Lin, Chen Gao, Yu Zheng, Yinfeng Li, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li Modern personalized recommendation services often rely on user feedback, either explicit or implicit, to improve the quality of services. Explicit feedback refers to behaviors like ratings, while implicit feedback refers to behaviors like user clicks. However, in the scenario of full-screen video viewing experiences like Tiktok and Reels, the click action is absent, resulting in unclear feedback from users, hence introducing noises in modeling training. Existing approaches on de-noising recommendation mainly focus on positive instances while ignoring the noise in a large amount of sampled negative feedback. In this paper, we propose a meta-learning method to annotate the unlabeled data from loss and gradient perspectives, which considers the noises in both positive and negative instances. Specifically, we first propose an Inverse Dual Loss (IDL) to boost the true label learning and prevent the false label learning. Then we further propose an Inverse Gradient (IG) method to explore the correct updating gradient and adjust the updating based on meta-learning. Finally, we conduct extensive experiments on both benchmark and industrial datasets where our proposed method can significantly improve AUC by 9.25% against state-of-the-art methods. Further analysis verifies the proposed inverse learning framework is model-agnostic and can improve a variety of recommendation backbones. The source code, along with the best hyper-parameter settings, is available at this link: https://github.com/Guanyu-Lin/InverseLearning. 
现代个性化推荐服务通常依赖于用户的反馈,无论是显性的还是隐性的,以提高服务质量。显式反馈指的是评分等行为,而隐式反馈指的是用户点击等行为。然而,在像 Tiktok 和 Reels 这样的全屏视频观看体验的场景中,没有点击动作,导致用户的反馈不清晰,因此在建模训练中引入了噪音。现有的去噪推荐方法主要集中在正向实例上,而忽略了大量采样负反馈中的噪声。本文提出了一种从损失和梯度角度对未标记数据进行标注的元学习方法,该方法同时考虑了正负两种情况下的噪声。具体地说,我们首先提出了一种逆双损耗(IDL)算法来提高真标记学习的能力,防止虚标记学习。然后进一步提出了一种基于元学习的逆梯度(IG)方法来探索正确的更新梯度,并对更新进行调整。最后,我们在基准和工业数据集上进行了广泛的实验,其中我们提出的方法与最先进的方法相比,AUC 可以显著提高9.25% 。进一步的分析验证了所提出的逆向学习框架是模型不可知的,可以改善各种推荐骨干。源代码,以及最好的超参数设置,可在以下连结找到: https://github.com/guanyu-lin/inverselearning。 code 0
Pre-trained Recommender Systems: A Causal Debiasing Perspective Ziqian Lin, Hao Ding, Trong Nghia Hoang, Branislav Kveton, Anoop Deoras, Hao Wang Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such progress, we investigate in this paper the possibilities and challenges of adapting such a paradigm to the context of recommender systems, which is less investigated from the perspective of pre-trained model. In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, which can then be fast adapted to improve few-shot learning performance in unseen new domains (with limited data). However, unlike vision/language data which share strong conformity in the semantic space, universal patterns underlying recommendation data collected across different domains (e.g., different countries or different E-commerce platforms) are often occluded by both in-domain and cross-domain biases implicitly imposed by the cultural differences in their user and item bases, as well as their uses of different e-commerce platforms. As shown in our experiments, such heterogeneous biases in the data tend to hinder the effectiveness of the pre-trained model. To address this challenge, we further introduce and formalize a causal debiasing perspective, which is substantiated via a hierarchical Bayesian deep learning model, named PreRec. 
Our empirical studies on real-world data show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios. 最近关于预先训练的视觉/语言模型的研究已经证明了人工智能中新的,有希望的解决方案构建范式的实际益处,其中模型可以在描述通用任务空间的广泛数据上预先训练,然后成功地适应于解决广泛的下游任务,即使训练数据受到严重限制(例如,在零或少数镜头学习场景中)。受到这些进展的启发,本文研究了将这种模式应用到推荐系统中的可能性和挑战,而从预训练模型的角度对这种可能性和挑战的研究较少。具体而言,我们建议开发一个通用推荐器,通过对从不同领域提取的通用用户项交互数据进行训练来捕获通用交互模式,然后可以快速适应以改善在未见的新领域(具有有限的数据)中的少镜头学习性能。然而,与在语义空间中共享强烈一致性的视觉/语言数据不同,跨不同领域(例如,不同国家或不同电子商务平台)收集的推荐数据的普遍模式通常被隐含地由其用户和项目基础的文化差异所施加的领域内和跨领域偏见所遮蔽,以及他们对不同电子商务平台的使用。正如我们的实验所显示的,这种异质偏差的数据往往会阻碍预训练模型的有效性。为了应对这一挑战,我们进一步引入并形式化了一个因果消偏的观点,这是通过一个分层贝叶斯深度学习模型,命名为 PreRec。我们对实际数据的实证研究表明,在跨市场和跨平台的情况下,该模型可以显著提高零镜头和少镜头学习环境下的推荐性能。 code 0
Interact with the Explanations: Causal Debiased Explainable Recommendation System Xu Liu, Tong Yu, Kaige Xie, Junda Wu, Shuai Li NYU, New York, NY USA; Adobe Res, San Jose, CA USA; Georgia Inst Technol, Atlanta, GA 30332 USA; Shanghai Jiao Tong Univ, Shanghai, Peoples R China In recent years, the field of recommendation systems has witnessed significant advancements, with explainable recommendation systems gaining prominence as a crucial area of research. These systems aim to enhance user experience by providing transparent and compelling recommendations, accompanied by explanations. However, a persistent challenge lies in addressing biases that can influence the recommendations and explanations offered by these systems. Such biases often stem from a tendency to favor popular items and generate explanations that highlight their common attributes, thereby deviating from the objective of delivering personalized recommendations and explanations. While existing debiasing methods have been applied in explainable recommendation systems, they often overlook the model-generated explanations in tackling biases. Consequently, biases in model-generated explanations may persist, potentially compromising system performance and user satisfaction. To address biases in both model-generated explanations and recommended items, we discern the impact of model-generated explanations in recommendation through a formulated causal graph. Inspired by this causal perspective, we propose a novel approach termed Causal Explainable Recommendation System (CERS), which incorporates model-generated explanations into the debiasing process and enacts causal interventions based on user feedback on the explanations. By utilizing model-generated explanations as intermediaries between user-item interactions and recommendation results, we adeptly mitigate the biases via targeted causal interventions. 
Experimental results demonstrate the efficacy of CERS in reducing popularity bias while simultaneously improving recommendation performance, leading to more personalized and tailored recommendations. Human evaluation further affirms that CERS 近年来,推荐系统领域取得了重大进展,可解释推荐系统作为一个关键的研究领域日益受到重视。这些系统旨在通过提供透明和令人信服的建议以及解释来提高用户体验。然而,一个持续的挑战在于解决可能影响这些系统提供的建议和解释的偏见。这种偏见往往源于一种倾向,即偏爱流行项目,并产生突出其共同属性的解释,从而偏离提供个性化建议和解释的目标。虽然现有的去偏方法已经应用于可解释的推荐系统,但它们在处理偏差时往往忽略了模型产生的解释。因此,模型生成的解释中的偏差可能会持续存在,潜在地损害系统性能和用户满意度。为了解决模型生成的解释和推荐项目中的偏差,我们通过一个公式化的因果图来识别模型生成的解释在推荐中的影响。受到这种因果观点的启发,我们提出了一种称为因果可解释推荐系统(CERS)的新方法,其将模型生成的解释纳入去偏过程,并根据用户对解释的反馈制定因果干预措施。通过利用模型生成的解释作为用户项目交互和推荐结果之间的中介,我们通过有针对性的因果干预来巧妙地减轻偏差。实验结果表明,CERS 在减少流行偏差的同时,提高推荐性能,导致更个性化和量身定制的推荐的功效。人体评估进一步证实 CERS code 0
Proxy-based Item Representation for Attribute and Context-aware Recommendation Jinseok Seol, Minseok Gang, Sanggoo Lee, Jaehui Park Neural network approaches in recommender systems have shown remarkable success by representing a large set of items as a learnable vector embedding table. However, infrequent items may suffer from inadequate training opportunities, making it difficult to learn meaningful representations. We examine that in attribute and context-aware settings, the poorly learned embeddings of infrequent items impair the recommendation accuracy. To address such an issue, we propose a proxy-based item representation that allows each item to be expressed as a weighted sum of learnable proxy embeddings. Here, the proxy weight is determined by the attributes and context of each item and may incorporate bias terms in case of frequent items to further reflect collaborative signals. The proxy-based method calculates the item representations compositionally, ensuring each representation resides inside a well-trained simplex and, thus, acquires guaranteed quality. Additionally, that the proxy embeddings are shared across all items allows the infrequent items to borrow training signals of frequent items in a unified model structure and end-to-end manner. Our proposed method is a plug-and-play model that can replace the item encoding layer of any neural network-based recommendation model, while consistently improving the recommendation performance with much smaller parameter usage. Experiments conducted on real-world recommendation benchmark datasets demonstrate that our proposed model outperforms state-of-the-art models in terms of recommendation accuracy by up to 17% while using only 10% of the parameters. 
推荐系统中的神经网络方法通过将大量的项目表示为一个可学习的向量嵌入表,取得了显著的成功。然而,不经常学习的项目可能会受到培训机会不足的影响,从而难以学习有意义的表征。我们研究了在属性和上下文感知的设置中,不常见项目的嵌入会影响推荐的准确性。为了解决这个问题,我们提出了一种基于代理的项表示方法,该方法允许每个项表示为可学习代理嵌入的加权和。在这里,代理权重是由每个项目的属性和上下文决定的,并且在频繁项目的情况下可以加入偏倚项,以进一步反映协作信号。基于代理的方法组合计算项表示,确保每个表示驻留在一个训练有素的单纯形内,从而获得有保证的质量。此外,代理嵌入在所有项目之间共享,允许频繁项目以统一的模型结构和端到端的方式借用频繁项目的训练信号。我们提出的方法是即插即用模型,可以取代任何基于神经网络的推荐模型的项目编码层,同时以更小的参数使用率持续改善推荐性能。在真实世界的推荐基准数据集上进行的实验表明,我们提出的模型在推荐准确率方面比最先进的模型高出17% ,而只使用了10% 的参数。 code 0
LEAD: Liberal Feature-based Distillation for Dense Retrieval Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jingwen Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model. Traditional knowledge distillation methods include response-based methods and feature-based methods. Response-based methods are used the most widely but suffer from lower upper limit of model performance, while feature-based methods have constraints on the vocabularies and tokenizers. In this paper, we propose a tokenizer-free method liberal feature-based distillation (LEAD). LEAD aligns the distribution between teacher model and student model, which is effective, extendable, portable and has no requirements on vocabularies, tokenizer, or model architecture. Extensive experiments show the effectiveness of LEAD on several widely-used benchmarks, including MS MARCO Passage, TREC Passage 19, TREC Passage 20, MS MARCO Document, TREC Document 19 and TREC Document 20. 知识提取经常被用来将知识从一个强教师模型转移到一个相对弱的学生模型。传统的知识提取方法包括基于响应的方法和基于特征的方法。基于响应的方法应用最广泛,但模型性能的上限较低,而基于特征的方法对词汇和标记有限制。本文提出了一种基于特征的自由精馏算法(LEAD)。LEAD 调整了教师模型和学生模型之间的分布,它是有效的、可扩展的、可移植的,并且对词汇表、标记器或模型架构没有要求。广泛的实验显示了 LEAD 在几个广泛使用的基准上的有效性,包括 MS MARCO Passage,TREC Passage 19,TREC Passage 20,MS MARCO Document,TREC Document 19和 TREC Document 20。 code 0
Not All Negatives Are Worth Attending to: Meta-Bootstrapping Negative Sampling Framework for Link Prediction Yakun Wang, Binbin Hu, Shuo Yang, Meiqi Zhu, Zhiqiang Zhang, Qiyang Zhang, Jun Zhou, Guo Ye, Huimei He Ant Grp, Hangzhou, Peoples R China The rapid development of graph neural networks (GNNs) encourages the rising of link prediction, achieving promising performance with various applications. Unfortunately, through a comprehensive analysis, we surprisingly find that current link predictors with dynamic negative samplers (DNSs) suffer from the migration phenomenon between "easy" and "hard" samples, which goes against the preference of DNS of choosing "hard" negatives, thus severely hindering capability. Towards this end, we propose the MeBNS framework, serving as a general plugin that can potentially improve current negative sampling based link predictors. In particular, we elaborately devise a Meta-learning Supported Teacher-student GNN (MST-GNN) that is not only built upon teacher-student architecture for alleviating the migration between "easy" and "hard" samples but also equipped with a meta learning based sample reweighting module for helping the student GNN distinguish "hard" samples in a fine-grained manner. To effectively guide the learning of MST-GNN, we prepare a Structure enhanced Training Data Generator (STD-Generator) and an Uncertainty based Meta Data Collector (UMD-Collector) for supporting the teacher and student GNN, respectively. Extensive experiments show that the MeBNS achieves remarkable performance across six link prediction benchmark datasets. code 0
Diff-MSR: A Diffusion Model Enhanced Paradigm for Cold-Start Multi-Scenario Recommendation Yuhao Wang, Ziru Liu, Yichao Wang, Xiangyu Zhao, Bo Chen, Huifeng Guo, Ruiming Tang Huawei Noahs Ark Lab, Hong Kong, Peoples R China; City Univ Hong Kong, Hong Kong, Peoples R China With the explosive growth of various commercial scenarios, there is an increasing number of studies on multi-scenario recommendation (MSR) which trains the recommender system with the data from multiple scenarios, aiming to improve the recommendation performance on all these scenarios synchronously. However, due to the large discrepancy in the number of interactions among domains, multi-scenario recommendation models usually suffer from insufficient learning and negative transfer especially on the cold-start scenarios, thus exacerbating the data sparsity issue. To fill this gap, in this work we propose a novel diffusion model enhanced paradigm tailored for the cold-start problem in multi-scenario recommendation in a data-driven generative manner. Specifically, based on all-domain data, we leverage the diffusion model with our newly designed variance schedule and the proposed classifier, which explicitly boosts the recommendation performance on the cold-start scenarios by exploiting the generated high-quality and informative embedding, leveraging the abundance of rich scenarios. Our experiments on Douban and Amazon datasets demonstrate two strengths of the proposed paradigm: (i) its effectiveness with a significant increase of 8.5% and 1% in accuracy on the two datasets, and (ii) its compatibility with various multi-scenario backbone models. The implementation code is available for easy reproduction. 
随着各种商业场景的爆炸性增长,越来越多的研究开始关注多场景推荐(MSR) ,它利用来自多个场景的数据来训练推荐系统,旨在同步提高所有这些场景的推荐性能。然而,由于域间交互数量的巨大差异,多场景推荐模型通常存在学习不足和负迁移的问题,特别是在冷启动情景下,从而加剧了数据稀疏问题。为了填补这一空白,本文提出了一种新的扩散模型增强范式,以数据驱动的生成方式为多场景推荐中的冷启动问题量身定制。具体而言,基于全域数据,我们利用扩散模型和我们新设计的方差计划和提议的分类器,通过利用生成的高质量和信息嵌入,利用丰富的场景,显式提高冷启动场景的推荐性能。我们在豆瓣和亚马逊数据集上的实验证明了所提出的范式的两个优势: (i)其有效性,在两个数据集上的准确性显着提高8.5% 和1% ,以及(ii)其与各种多场景骨干模型的兼容性。该实现代码可用于方便复制。 code 0
On the Effectiveness of Unlearning in Session-Based Recommendation Xin Xin, Liu Yang, Ziqi Zhao, Pengjie Ren, Zhumin Chen, Jun Ma, Zhaochun Ren Session-based recommendation predicts users' future interests from previous interactions in a session. Besides memorizing historical samples, requests for unlearning, i.e., removing the effect of certain training samples, also arise for reasons such as user privacy or model fidelity. However, existing studies on unlearning are not tailored for session-based recommendation. On the one hand, these approaches cannot achieve satisfactory unlearning effects due to the collaborative correlations and sequential connections between the unlearning item and the remaining items in the session. On the other hand, few works have verified unlearning effectiveness in the session-based recommendation scenario. In this paper, we propose SRU, a session-based recommendation unlearning framework, which enables high unlearning efficiency, accurate recommendation performance, and improved unlearning effectiveness in session-based recommendation. Specifically, we first partition the training sessions into separate sub-models according to the similarity across the sessions, then we utilize an attention-based aggregation layer to fuse the hidden states according to the correlations between the session and the centroid of the data in the sub-model. To improve the unlearning effectiveness, we further propose three extra data deletion strategies, including collaborative extra deletion (CED), neighbor extra deletion (NED), and random extra deletion (RED). Besides, we propose an evaluation metric that measures whether the unlearning sample can be inferred after the data deletion to verify the unlearning effectiveness. We implement SRU with three representative session-based recommendation models and conduct experiments on three benchmark datasets. 
Experimental results demonstrate the effectiveness of our methods. 基于会话的推荐可以根据会话中以前的交互预测用户的未来兴趣。除了对历史样本的记忆,忘记的要求,即去除某些训练样本的影响,也出于用户隐私或模型保真等原因。然而,现有的关于忘却的研究并不适合基于会话的建议。一方面,这些方法不能达到令人满意的忘却效果,因为在忘却项目和会话中剩余项目之间存在协作关联和顺序关联。另一方面,在基于会话的推荐场景中,很少有研究验证学习效果。本文提出了一种基于会话的推荐去学习框架 SRU,该框架具有较高的去学习效率、较准确的推荐性能和较高的去学习效率。具体来说,我们首先根据训练会话间的相似性将训练会话划分为不同的子模型,然后利用基于注意力的聚合层根据会话与子模型中数据质心的相关性对隐藏状态进行融合。为了提高学习效率,我们进一步提出了三种额外的数据删除策略,包括协作额外删除(CED)、邻居额外删除(NED)和随机额外删除(RED)。此外,我们提出一个评估指标,测量在删除数据后是否可以推断出忘却样本,以验证忘却的有效性。我们使用三个具有代表性的基于会话的推荐模型实现 SRU,并在三个基准数据集上进行实验。实验结果证明了该方法的有效性。 code 0
IAI MovieBot 2.0: An Enhanced Research Platform with Trainable Neural Components and Transparent User Modeling Nolwenn Bernard, Ivica Kostric, Krisztian Balog While interest in conversational recommender systems has been on the rise, operational systems suitable for serving as research platforms for comprehensive studies are currently lacking. This paper introduces an enhanced version of the IAI MovieBot conversational movie recommender system, aiming to evolve it into a robust and adaptable platform for conducting user-facing experiments. The key highlights of this enhancement include the addition of trainable neural components for natural language understanding and dialogue policy, transparent and explainable modeling of user preferences, along with improvements in the user interface and research infrastructure. 虽然对会话推荐系统的兴趣一直在上升,但目前还缺乏适合作为综合研究的研究平台的操作系统。本文介绍了 IAI MovieBot 会话电影推荐系统的一个增强版本,旨在将其发展成一个健壮的、可适应的平台,用于进行面向用户的实验。这一增强的主要亮点包括增加了可训练的神经组件,用于自然语言理解和对话政策,对用户偏好进行透明和可解释的建模,以及改进用户界面和研究基础设施。 code 0
Domain Level Interpretability: Interpreting Black-box Model with Domain-specific Embedding YaLin Zhang, Caizhi Tang, Lu Yu, Jun Zhou, Longfei Li, Qing Cui, Fangfang Fan, Linbo Jiang, Xiaosong Zhao Ant Grp, Hangzhou, Peoples R China The importance of incorporating interpretability into machine learning models has been increasingly emphasized. While previous literature has typically focused on feature level interpretability, such as analyzing which features are important and how they influence the final decision, real-world applications often require domain level interpretability, which relates to a group of features. Domain-level interpretability holds the potential for enhanced informativeness and comprehensibility. Unfortunately, there has been limited research in this direction. In this paper, we address this issue and introduce our proposed method DIDE, which obtains domain-level interpretability from domain-specific latent embeddings. To enhance the effectiveness of the framework, we draw inspiration from the gradient smooth philosophy and propose noisy injection in the embedding space, resulting in smoothed interpretability. We conduct extensive experiments to validate the effectiveness of DIDE, and demonstrate its applications in assisting daily business tasks in Alipay. 将可解释性纳入机器学习模型的重要性日益受到强调。虽然以前的文献主要集中在特征级别的可解释性上,比如分析哪些特征是重要的,以及它们如何影响最终决策,但是现实世界中的应用通常需要领域级别的可解释性,这涉及到一组特征。领域级的可解释性具有增强信息性和可理解性的潜力。不幸的是,这方面的研究有限。在本文中,我们解决了这个问题,并介绍了我们提出的方法 DIDE,它从领域特定的潜在嵌入获得领域级的可解释性。为了提高该框架的有效性,我们从梯度平滑理论中获得灵感,并在嵌入空间中提出噪声注入,从而实现平滑的可解释性。我们进行了广泛的实验,以验证 DIDE 的有效性,并演示其在支付宝中协助日常业务任务的应用。 code 0
Unbiased Learning to Rank: On Recent Advances and Practical Applications Shashank Gupta, Philipp Hager, Jin Huang, Ali Vardasbi, Harrie Oosterhuis Univ Amsterdam, Amsterdam, Netherlands; Radboud Univ Nijmegen, Nijmegen, Netherlands Since its inception, the field of unbiased learning to rank (ULTR) has remained very active and has seen several impactful advancements in recent years. This tutorial provides both an introduction to the core concepts of the field and an overview of recent advancements in its foundations, along with several applications of its methods. The tutorial is divided into four parts: Firstly, we give an overview of the different forms of bias that can be addressed with ULTR methods. Secondly, we present a comprehensive discussion of the latest estimation techniques in the ULTR field. Thirdly, we survey published results of ULTR in real-world applications. Fourthly, we discuss the connection between ULTR and fairness in ranking. We end by briefly reflecting on the future of ULTR research and its applications. This tutorial is intended to benefit both researchers and industry practitioners interested in developing new ULTR solutions or utilizing them in real-world applications. code 0
Leveraging User Simulation to Develop and Evaluate Conversational Information Access Agents Nolwenn Bernard We observe a change in the way users access information, that is, the rise of conversational information access (CIA) agents. However, the automatic evaluation of these agents remains an open challenge. Moreover, the training of CIA agents is cumbersome as it mostly relies on conversational corpora, expert knowledge, and reinforcement learning. User simulation has been identified as a promising solution to tackle automatic evaluation and has been previously used in reinforcement learning. In this research, we investigate how user simulation can be leveraged in the context of CIA. We organize the work in three parts. We begin with the identification of requirements for user simulators for training and evaluating CIA agents and compare existing types of simulator regarding these. Then, we plan to combine these different types of simulators into a new hybrid simulator. Finally, we aim to extend simulators to handle more complex information seeking scenarios. 我们观察到用户访问信息的方式发生了变化,即会话信息访问(CIA)代理的兴起。然而,这些代理的自动评估仍然是一个开放的挑战。此外,中情局特工的培训非常繁琐,因为它主要依赖于会话语料库、专业知识和强化学习。用户模拟已被确定为解决自动评估的一个有前途的解决方案,并且以前在强化学习中使用过。在这项研究中,我们调查如何用户模拟可以在中央情报局的背景下利用。我们把工作分为三部分。我们首先确定用户模拟器的培训和评估 CIA 代理的需求,并比较现有的模拟器类型。然后,我们计划将这些不同类型的模拟器组合成一个新的混合模拟器。最后,我们的目标是扩展模拟器以处理更复杂的信息搜索场景。 code 0
Delphic Costs and Benefits in Web Search: A Utilitarian and Historical Analysis Andrei Z. Broder We present a new framework to conceptualize and operationalize the total user experience of search, by studying the entirety of a search journey from a utilitarian point of view. Web search engines are widely perceived as "free". But search requires time and effort: in reality there are many intermingled non-monetary costs (e.g., time costs, cognitive costs, interactivity costs) and the benefits may be marred by various impairments, such as misunderstanding and misinformation. This characterization of costs and benefits appears to be inherent to the human search for information within the pursuit of some larger task: most of the costs and impairments can be identified in interactions with any web search engine, interactions with public libraries, and even in interactions with ancient oracles. To emphasize this innate connection, we call these costs and benefits Delphic, in contrast to explicitly financial costs and benefits. Our main thesis is that users' satisfaction with a search engine mostly depends on their experience of Delphic costs and benefits, in other words on their utility. The consumer utility is correlated with classic measures of search engine quality, such as ranking, precision, recall, etc., but is not completely determined by them. To argue our thesis, we catalog the Delphic costs and benefits and show how the development of search engines over the last quarter century, from classic Information Retrieval roots to the integration of Large Language Models, was driven to a great extent by the quest to decrease Delphic costs and increase Delphic benefits. We hope that the Delphic costs framework will engender new ideas and new research for evaluating and improving the web experience for everyone. 
我们提出了一个新的框架来概念化和操作的总体用户体验的搜索,通过研究整个搜索旅程从功利的角度来看。网络搜索引擎被广泛认为是“免费的”。但是搜索需要时间和精力: 在现实中有许多混合的非货币成本(例如时间成本、认知成本、互动成本) ,好处可能被各种损害所破坏,例如误解和错误信息。这种成本和收益的角色塑造似乎是人类在追求某种更大的任务时所固有的信息搜索: 大多数成本和损害可以通过与任何网络搜索引擎的互动、与公共图书馆的互动、甚至与古代神谕的互动来识别。为了强调这种内在的联系,我们把这些成本和收益称为德尔菲,与明确的财务成本和收益形成对比。我们的主要论点是,用户对搜索引擎的满意度主要取决于他们对德尔菲成本和收益的体验,换句话说,取决于他们的实用性。消费者效用与搜索引擎质量的经典指标相关,如排名、精确度、召回等,但并不完全由它们决定。为了证明我们的论点,我们列出了德尔菲的成本和收益,并展示了在过去25年里搜索引擎的发展,从传统的信息检索到大型语言模型的整合,在很大程度上是由降低德尔菲成本和增加德尔菲收益的追求所驱动的。我们希望德尔菲成本框架将产生新的想法和新的研究,以评估和改善每个人的网络体验。 code 0
The Journey to A Knowledgeable Assistant with Retrieval-Augmented Generation (RAG) Xin Luna Dong code 0
LabelCraft: Empowering Short Video Recommendations with Automated Label Crafting Yimeng Bai, Yang Zhang, Jing Lu, Jianxin Chang, Xiaoxue Zang, Yanan Niu, Yang Song, Fuli Feng Short video recommendations often face limitations due to the quality of user feedback, which may not accurately depict user interests. To tackle this challenge, a new task has emerged: generating more dependable labels from original feedback. Existing label generation methods rely on manual rules, demanding substantial human effort and potentially misaligning with the desired objectives of the platform. To transcend these constraints, we introduce LabelCraft, a novel automated label generation method explicitly optimizing pivotal operational metrics for platform success. By formulating label generation as a higher-level optimization problem above recommender model optimization, LabelCraft introduces a trainable labeling model for automatic label mechanism modeling. Through meta-learning techniques, LabelCraft effectively addresses the bi-level optimization hurdle posed by the recommender and labeling models, enabling the automatic acquisition of intricate label generation mechanisms. Extensive experiments on real-world datasets corroborate LabelCraft's excellence across varied operational metrics, encompassing usage time, user engagement, and retention. Code is available at https://github.com/baiyimeng/LabelCraft. 由于用户反馈的质量问题,短视频推荐往往面临着限制,这可能不能准确地描述用户的兴趣。为了应对这一挑战,一项新的任务出现了: 从原始反馈中生成更可靠的标签。现有的标签生成方法依赖于手工规则,需要大量的人工努力,并且可能与平台的期望目标不一致。为了超越这些约束,我们引入了 LabelCraft,一种新的自动标签生成方法,它显式地优化了平台成功的关键操作指标。通过将标签生成作为推荐模型优化之上的一个更高层次的最佳化问题,LabelCraft 为自动标签机制建模引入了一个可训练的标签模型。通过元学习技术,LabelCraft 有效地解决了推荐和标签模型造成的双层优化障碍,使得复杂的标签生成机制的自动获取成为可能。在真实世界数据集上的大量实验证实了 LabelCraft 在各种操作指标上的卓越性,包括使用时间、用户参与度和保持性。密码可在 https://github.com/baiyimeng/labelcraft 索取。 code 0
Towards Mitigating Dimensional Collapse of Representations in Collaborative Filtering Huiyuan Chen, Vivian Lai, Hongye Jin, Zhimeng Jiang, Mahashweta Das, Xia Hu Contrastive Learning (CL) has shown promising performance in collaborative filtering. The key idea is to generate augmentation-invariant embeddings by maximizing the Mutual Information between different augmented views of the same instance. However, we empirically observe that existing CL models suffer from the dimensional collapse issue, where user/item embeddings only span a low-dimensional subspace of the entire feature space. This suppresses other dimensional information and weakens the distinguishability of embeddings. Here we propose a non-contrastive learning objective, named nCL, which explicitly mitigates dimensional collapse of representations in collaborative filtering. Our nCL aims to achieve geometric properties of Alignment and Compactness on the embedding space. In particular, the alignment tries to push together representations of positive-related user-item pairs, while compactness tends to find the optimal coding length of user/item embeddings, subject to a given distortion. More importantly, our nCL requires neither data augmentation nor negative sampling during training, making it scalable to large datasets. Experimental results demonstrate the superiority of our nCL. 对比学习(CL)在协同过滤方面表现出色。其核心思想是通过最大化同一实例中不同增强视图之间的互信息来产生增强不变嵌入。然而,我们经验地观察到现有的 CL 模型遭受维度折叠问题,其中用户/项目嵌入只跨越整个特征空间的一个低维子空间。这抑制了其他维度信息,削弱了嵌入的可区分性。在这里,我们提出了一个非对比学习目标,命名为 nCL,它明确地减轻了协同过滤表示的维度崩溃。我们的 nCL 旨在实现嵌入空间上的对齐性和紧性的几何性质。特别是,对齐试图将正相关的用户项对的表示推到一起,而紧凑性倾向于找到用户/项嵌入的最佳编码长度,受到给定的失真。更重要的是,我们的 nCL 在训练期间不需要数据增强或负采样,使其可扩展到大型数据集。实验结果证明了该方法的优越性。 code 0
CL4DIV: A Contrastive Learning Framework for Search Result Diversification Zhirui Deng, Zhicheng Dou, Yutao Zhu, JiRong Wen Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China Search result diversification aims to provide a diversified document ranking list so as to cover as many intents as possible and satisfy the various information needs of different users. Existing approaches usually represent documents with pretrained embeddings (such as doc2vec and GloVe). These representations cannot adequately capture a document's content and struggle to capture how well it covers the user intents behind the given query. Moreover, the limited amount of labeled data for search result diversification exacerbates the difficulty of obtaining more efficient document representations. To alleviate these problems and learn more effective document representations, we propose a Contrastive Learning framework for search result DIVersification (CL4DIV). Specifically, we design three contrastive learning tasks from the perspective of subtopics, documents, and candidate document sequences, which correspond to three essential elements in search result diversification. These training tasks are employed to pretrain the document encoder and the document sequence encoder, which are used in the diversified ranking model. Experimental results show that CL4DIV significantly outperforms all existing diversification models. Further analysis demonstrates that our method has wide applicability and can also be used to improve several existing methods. code 0
From Second to First: Mixed Censored Multi-Task Learning for Winning Price Prediction Jiani Huang, Zhenzhe Zheng, Yanrong Kang, Zixiao Wang Shanghai Jiao Tong Univ, Shanghai, Peoples R China; Tencent, Advertising & Mkt Serv, Shenzhen, Peoples R China A transformation from second-price auctions (SPA) to first-price auctions (FPA) has been observed in online advertising. The consequential coexistence of mixed FPA and SPA auction types has further led to the problem of mixed censorship, making bid landscape forecasting, the prerequisite for bid shading, more difficult. Our key insight is that the winning price under SPA can be effectively transferred to FPA scenarios if they share similar user groups, advertisers, and bidding environments. The full utilization of winning price under mixed censorship can effectively alleviate the FPA censorship problem and improve the performance of winning price prediction (also called bid landscape forecasting). In this work, we propose a Multi-task Mixed Censorship Predictor (MMCP) that utilizes multi-task learning to leverage the winning price under SPA as supervised information for FPA. A Double-gate Mixture-of-Experts architecture has been proposed to alleviate the negative transfer problem of multi-task learning in our context. Furthermore, several auxiliary modules including the first-second mapping module and adaptive censorship loss function have been introduced to integrate multi-task learning and winning price prediction. Extensive experiments on two real-world datasets demonstrate the superior performance of the proposed MMCP compared with other state-of-the-art FPA models under various performance metrics. The code is available on GitHub. code 0
DiffKG: Knowledge Graph Diffusion Model for Recommendation Yangqin Jiang, Yuhao Yang, Lianghao Xia, Chao Huang Knowledge Graphs (KGs) have emerged as invaluable resources for enriching recommendation systems by providing a wealth of factual information and capturing semantic relationships among items. Leveraging KGs can significantly enhance recommendation performance. However, not all relations within a KG are equally relevant or beneficial for the target recommendation task. In fact, certain item-entity connections may introduce noise or lack informative value, thus potentially misleading our understanding of user preferences. To bridge this research gap, we propose a novel knowledge graph diffusion model for recommendation, referred to as DiffKG. Our framework integrates a generative diffusion model with a data augmentation paradigm, enabling robust knowledge graph representation learning. This integration facilitates a better alignment between knowledge-aware item semantics and collaborative relation modeling. Moreover, we introduce a collaborative knowledge graph convolution mechanism that incorporates collaborative signals reflecting user-item interaction patterns, guiding the knowledge graph diffusion process. We conduct extensive experiments on three publicly available datasets, consistently demonstrating the superiority of our DiffKG compared to various competitive baselines. We provide the source code repository of our proposed DiffKG model at the following link: https://github.com/HKUDS/DiffKG. 知识图(KGs)已经成为丰富推荐系统的宝贵资源,它提供了大量的事实信息,捕获了项目之间的语义关系。利用幼稚园可显著提升推荐表现。然而,并非所有幼儿园内部的关系对于目标推荐任务都同样相关或有益。事实上,某些项目-实体连接可能会引入噪音或缺乏信息价值,从而可能误导我们对用户偏好的理解。为了弥补这一研究差距,我们提出了一种新的知识图扩散推荐模型,称为迪夫 KG。我们的框架集成了一个生成扩散模型和一个数据增强范例,支持健壮的知识图表示学习。这种集成促进了知识感知项语义和协作关系建模之间的更好结合。此外,本文还引入了一种协同知识图卷积机制,该机制融合了反映用户-项目交互模式的协同信号,引导知识图的扩散过程。我们在三个公开可用的数据集上进行了广泛的实验,一致地证明了我们的 DiffKG 相对于各种竞争基线的优越性。我们在以下连结提供有关「区分幼稚园」模式的原始码储存库: https://github.com/hkuds/DiffKG。 code 0
Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation Magdalena Kaiser, Rishiraj Saha Roy, Gerhard Weikum Models for conversational question answering (ConvQA) over knowledge graphs (KGs) are usually trained and tested on benchmarks of gold QA pairs. This implies that training is limited to surface forms seen in the respective datasets, and evaluation is on a small set of held-out questions. Through our proposed framework REIGN, we take several steps to remedy this restricted learning setup. First, we systematically generate reformulations of training questions to increase robustness of models to surface form variations. This is a particularly challenging problem, given the incomplete nature of such questions. Second, we guide ConvQA models towards higher performance by feeding them only those reformulations that help improve their answering quality, using deep reinforcement learning. Third, we demonstrate the viability of training major model components on one benchmark and applying them zero-shot to another. Finally, for a rigorous evaluation of robustness for trained models, we use and release large numbers of diverse reformulations generated by prompting GPT for benchmark test sets (resulting in a 20x increase in sizes). Our findings show that ConvQA models with robust training via reformulations significantly outperform those with standard training from gold QA pairs only. 基于知识图的会话问题回答模型通常在黄金问题回答对的基准上进行训练和测试。这意味着训练仅限于在各自的数据集中看到的表面形式,并且评估是针对一小组被拒绝的问题。通过我们提出的框架 REIGN,我们采取了几个步骤来补救这种受限制的学习设置。首先,我们系统地生成训练问题的重新编排,以增加模型对表面形式变化的鲁棒性。鉴于这些问题的不完整性,这是一个特别具有挑战性的问题。其次,我们使用深度强化学习,通过只提供那些有助于提高回答质量的重新编排来引导 convQA 模型获得更高的性能。第三,我们证明了在一个基准上训练主要模型组件并将它们应用到另一个基准上的可行性。最后,为了严格评估训练过的模型的鲁棒性,我们使用并发布了大量不同的重新编排,这些重新编排是通过提示 GPT 测试基准测试集(导致大小增加20倍)而产生的。我们的研究结果表明,使用强大的训练通过重新制定,严格质量保证模型,明显优于那些标准的训练,从黄金质量保证对。 code 0
MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation Yungi Kim, Taeri Kim, WonYong Shin, SangWook Kim In this paper, we focus on multimedia recommender systems using graph convolutional networks (GCNs) where the multimodal features as well as user-item interactions are employed together. Our study aims to exploit multimodal features more effectively in order to accurately capture users' preferences for items. To this end, we point out the following two limitations of existing GCN-based multimedia recommender systems: (L1) although the multimodal features of the items a user has interacted with can reveal her preferences on items, existing methods utilize GCNs designed to focus only on capturing collaborative signals, resulting in insufficient reflection of the multimodal features in the final user/item embeddings; (L2) although a user decides whether to prefer the target item by considering its multimodal features, existing methods represent her as only a single embedding regardless of the target item's multimodal features and then utilize her embedding to predict her preference for the target item. To address the above issues, we propose a novel multimedia recommender system, named MONET, built on the following two core ideas: modality-embracing GCN (MeGCN) and target-aware attention. Through extensive experiments using four real-world datasets, we demonstrate i) the significant superiority of MONET over seven state-of-the-art competitors (up to 30.32% higher accuracy in terms of recall@20, compared to the best competitor) and ii) the effectiveness of the two core ideas in MONET. All MONET codes are available at https://github.com/Kimyungi/MONET. 
本文主要研究基于图卷积网络(GCNs)的多媒体推荐系统。我们的研究旨在更有效地利用多模态特征,以便准确地捕捉用户对项目的偏好。为此,我们指出了现有的基于 GCN 的多媒体推荐系统的两个局限性: (L1)尽管用户交互项目的多模态特征可以揭示她对项目的偏好,但是现有的方法利用 GCN 设计只关注于捕获协作信号,导致在最终用户/项目嵌入中对多模态特征的反映不足; (L2)尽管用户决定是否通过考虑其多模态特征来选择目标项目,但是现有的方法表示她只是一个单一的嵌入,而不管目标项目的多模态特征如何,然后利用她的嵌入来预测她对目标。为了解决上述问题,我们提出了一个新的多媒体推荐系统,命名为 MONET,它由以下两个核心思想组成: 包含模式的广域网(MeGCN)和目标感知注意。通过使用四个真实世界数据集的广泛实验,我们证明了 i) MONET 相对于七个最先进的竞争对手的显着优势(与最好的竞争对手相比,在召回@20方面高达30.32% 的准确性)和 ii) MONET 中两个核心思想的有效性。所有 MONET 代码均可在 https://github.com/kimyungi/MONET 下载。 code 0
Text-Video Retrieval via Multi-Modal Hypergraph Networks Qian Li, Lixin Su, Jiashu Zhao, Long Xia, Hengyi Cai, Suqi Cheng, Hengzhu Tang, Junfeng Wang, Dawei Yin Text-video retrieval is a challenging task that aims to identify relevant videos given textual queries. Compared to conventional textual retrieval, the main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content. Previous works primarily focus on aligning the query and the video by finely aggregating word-frame matching signals. Humans judge the relevance between text and video in a modular fashion, and such judgment needs high-order matching signals because of the consecutive and complex nature of video content. In this paper, we propose chunk-level text-video matching, where query chunks are extracted to describe a specific retrieval unit, and video chunks are distinct clips segmented from videos. We formulate the chunk-level matching as n-ary correlation modeling between words of the query and frames of the video and introduce a multi-modal hypergraph for n-ary correlation modeling. By representing textual units and video frames as nodes and using hyperedges to depict their relationships, a multi-modal hypergraph is constructed. In this way, the query and the video can be aligned in a high-order semantic space. In addition, to enhance the model's generalization ability, the extracted features are fed into a variational inference component for computation, obtaining the variational representation under the Gaussian distribution. The incorporation of hypergraphs and variational inference allows our model to capture complex, n-ary interactions among textual and visual contents. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on the text-video retrieval task. 
文本视频检索是一项具有挑战性的任务,其目的是识别给定文本查询的相关视频。与传统的文本检索相比,文本-视频检索的主要障碍是查询的文本性质与视频内容的视觉丰富性之间的语义差距。以前的工作主要集中在对齐查询和视频通过精细聚合字帧匹配信号。由于视频内容的连续性和复杂性,受人类对文本与视频相关性进行模块化判断的认知过程的启发,需要高阶匹配信号来进行判断。本文提出了块级文本-视频匹配算法,该算法提取查询块来描述特定的检索单元,并将视频块从视频中分割成不同的片段。将块级匹配表述为查询词与视频帧之间的 n 元相关建模,并引入多模态超图进行 n 元相关建模。通过将文本单元和视频帧表示为节点,利用超边界描述它们之间的关系,构造了一个多模态超图。通过这种方式,查询和视频可以在高阶语义空间中对齐。此外,为了提高模型的泛化能力,提取的特征被输入到一个变分推理组件中进行计算,从而获得正态分布下的变分表示。超图和变分推理的结合使我们的模型能够捕获文本和视觉内容之间复杂的 n 元交互。实验结果表明,该方法在文本视频检索任务中取得了较好的性能。 code 0
MultiFS: Automated Multi-Scenario Feature Selection in Deep Recommender Systems Dugang Liu, Chaohua Yang, Xing Tang, Yejing Wang, Fuyuan Lyu, Weihong Luo, Xiuqiang He, Zhong Ming, Xiangyu Zhao Tencent, Shenzhen, Peoples R China; City Univ Hong Kong, Hong Kong, Peoples R China; Shenzhen Univ, Shenzhen, Peoples R China; Shenzhen Univ, Guangdong Lab Artificial Intelligence & Digital E, Shenzhen, Peoples R China; Shenzhen Technol Univ, Shenzhen, Peoples R China; McGill Univ, Montreal, PQ, Canada Multi-scenario recommender systems (MSRSs) have been increasingly used in real-world industrial platforms for their excellent advantages in mitigating data sparsity and reducing maintenance costs. However, conventional MSRSs usually use all relevant features indiscriminately and ignore that different kinds of features have varying importance under different scenarios, which may cause confusion and performance degradation. In addition, existing feature selection methods for deep recommender systems may lack the exploration of scenario relations. In this paper, we propose a novel automated multi-scenario feature selection (MultiFS) framework to bridge this gap, which is able to consider scenario relations and utilize a hierarchical gating mechanism to select features for each scenario. Specifically, MultiFS first efficiently obtains feature importance across all the scenarios through a scenario-shared gate. Then, a scenario-specific gate aims to identify feature importance for individual scenarios from a subset of the former with lower importance. Subsequently, MultiFS imposes constraints on the two gates to make the learning mechanism more feasible and combines the two to select exclusive features for different scenarios. We evaluate MultiFS and demonstrate its ability to enhance the multi-scenario model performance through experiments over two public multi-scenario benchmarks. 
多场景推荐系统(MSRS)以其在减少数据稀疏性和降低维护成本方面的优越性越来越多地应用于现实世界的工业平台。然而,传统的 MSRS 通常不加区分地使用所有相关特征,而忽视了不同特征在不同场景下具有不同的重要性,这可能导致混淆和性能下降。此外,现有的深度推荐系统的特征选择方法可能缺乏对场景关系的探索。本文提出了一种新的自动多场景特征选择(MultiFS)框架来弥补这一缺陷,该框架能够考虑场景之间的关系,并利用层次化的门限机制为每个场景选择特征。具体来说,MultiFS 首先通过一个场景共享门有效地获得所有场景的特性重要性。然后,一些场景特定门的目的是识别特征重要性的个别场景从前者的子集较低的重要性。随后,MultiFS 对这两个门进行约束,使学习机制更加可行,并结合这两个门来选择不同场景的专有特征。我们通过两个公开的多场景基准测试,评估了 MultiFS 并证明了其增强多场景模型性能的能力。 code 0
ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models Qijiong Liu, Nuo Chen, Tetsuya Sakai, XiaoMing Wu Personalized content-based recommender systems have become indispensable tools for users to navigate through the vast amount of content available on platforms like daily news websites and book recommendation services. However, existing recommenders face significant challenges in understanding the content of items. Large language models (LLMs), which possess deep semantic comprehension and extensive knowledge from pretraining, have proven to be effective in various natural language processing tasks. In this study, we explore the potential of leveraging both open- and closed-source LLMs to enhance content-based recommendation. With open-source LLMs, we utilize their deep layers as content encoders, enriching the representation of content at the embedding level. For closed-source LLMs, we employ prompting techniques to enrich the training data at the token level. Through comprehensive experiments, we demonstrate the high effectiveness of both types of LLMs and show the synergistic relationship between them. Notably, we observed a significant relative improvement of up to 19.32% compared to existing state-of-the-art recommendation models. These findings highlight the immense potential of both open- and closed-source LLMs in enhancing content-based recommendation systems. We will make our code and LLM-generated data available for other researchers to reproduce our results. 基于内容的个性化推荐系统已经成为用户浏览日常新闻网站和图书推荐服务等平台上大量内容不可或缺的工具。但是,现有的推荐程序在理解项目内容方面面临重大挑战。大语言模型(LLM)具有深刻的语义理解能力和广泛的预训知识,已被证明能够有效地处理各种自然语言处理任务。在这项研究中,我们探索了利用开源和闭源 LLM 来增强基于内容的推荐的潜力。使用开源 LLM,我们利用它们的深层作为内容编码器,丰富了内容在嵌入级别的表示。对于闭源 LLM,我们使用提示技术在令牌级别上丰富训练数据。通过综合实验,我们证明了这两类 LLM 的高效性,并显示了它们之间的协同关系。值得注意的是,与现有最先进的推荐模型相比,我们观察到了高达19.32% 的显著相对改善。这些发现突出了开放和封闭的 LLM 来源在加强基于内容的推荐系统方面的巨大潜力。我们将使我们的代码和 LLM 生成的数据可用于其他研究人员重现我们的结果。 code 0
Knowledge Graph Context-Enhanced Diversified Recommendation Xiaolong Liu, Liangwei Yang, Zhiwei Liu, Mingdai Yang, Chen Wang, Hao Peng, Philip S. Yu The field of Recommender Systems (RecSys) has been extensively studied to enhance accuracy by leveraging users' historical interactions. Nonetheless, this persistent pursuit of accuracy frequently engenders diminished diversity, culminating in the well-recognized "echo chamber" phenomenon. Diversified RecSys has emerged as a countermeasure, placing diversity on par with accuracy and garnering noteworthy attention from academic circles and industry practitioners. This research explores the realm of diversified RecSys within the intricate context of knowledge graphs (KG). These KGs act as repositories of interconnected information concerning entities and items, offering a propitious avenue to amplify recommendation diversity through the incorporation of insightful contextual information. Our contributions include introducing two innovative metrics, Entity Coverage and Relation Coverage, which effectively quantify diversity within the KG domain. Additionally, we introduce the Diversified Embedding Learning (DEL) module, meticulously designed to formulate user representations that possess an innate awareness of diversity. In tandem with this, we introduce a novel technique named Conditional Alignment and Uniformity (CAU). It adeptly encodes KG item embeddings while preserving contextual integrity. Collectively, our contributions signify a substantial stride towards augmenting the panorama of recommendation diversity within the realm of KG-informed RecSys paradigms. 
推荐系统(RecSys)领域已经被广泛研究,通过利用用户的历史交互来提高准确性。尽管如此,这种对准确性的持续追求常常导致多样性的减少,最终导致公认的“回声室”现象。多样化 RecSys 已经成为一种对策,它将多样性与准确性放在同等重要的位置,并引起了学术界和业内人士的关注。本研究探讨复杂的知识图表(KG)背景下多元化的 RecSys 领域。这些幼儿园充当有关实体和项目的相互关联信息的储存库,通过纳入有见地的背景信息,为扩大建议的多样性提供了一个有利的途径。我们的贡献包括引入一个创新的度量标准,实体覆盖率和关系覆盖率,它有效地量化了 KG 领域内的多样性。此外,我们介绍了多样化嵌入学习(DEL)模块,精心设计的用户表示,具有天生的多样性意识。与此同时,我们介绍了一种新的技术,称为条件对齐和一致性(CAU)。该算法在保持上下文完整性的前提下,对 KG 项嵌入进行编码。总的来说,我们的贡献意味着在 KG 知情的 RecSys 范式领域内,在增强推荐多样性的全景方面取得了实质性的进展。 code 0
SSLRec: A Self-Supervised Learning Framework for Recommendation Xubin Ren, Lianghao Xia, Yuhao Yang, Wei Wei, Tianle Wang, Xuheng Cai, Chao Huang Self-supervised learning (SSL) has gained significant interest in recent years as a solution to address the challenges posed by sparse and noisy data in recommender systems. Despite the growing number of SSL algorithms designed to provide state-of-the-art performance in various recommendation scenarios (e.g., graph collaborative filtering, sequential recommendation, social recommendation, KG-enhanced recommendation), there is still a lack of unified frameworks that integrate recommendation algorithms across different domains. Such a framework could serve as the cornerstone for self-supervised recommendation algorithms, unifying the validation of existing methods and driving the design of new ones. To address this gap, we introduce SSLRec, a novel benchmark platform that provides a standardized, flexible, and comprehensive framework for evaluating various SSL-enhanced recommenders. The SSLRec framework features a modular architecture that allows users to easily evaluate state-of-the-art models and a complete set of data augmentation and self-supervised toolkits to help create SSL recommendation models with specific needs. Furthermore, SSLRec simplifies the process of training and evaluating different recommendation models with consistent and fair settings. Our SSLRec platform covers a comprehensive set of state-of-the-art SSL-enhanced recommendation models across different scenarios, enabling researchers to evaluate these cutting-edge models and drive further innovation in the field. Our implemented SSLRec framework is available at the source code repository https://github.com/HKUDS/SSLRec. 
自监督学习(SSL)作为一种解决推荐系统中数据稀疏和噪声问题的方法,近年来引起了人们的极大兴趣。尽管有越来越多的 SSL 算法被设计用来在各种推荐场景中提供最先进的性能(例如,图形协同过滤、顺序推荐、社交推荐、 KG 增强推荐) ,但是仍然缺乏统一的框架来整合不同领域的推荐算法。这样一个框架可以作为自监督推荐算法的基石,统一现有方法的验证,并推动新方法的设计。为了弥补这一差距,我们引入了 SSLRec,这是一个新颖的基准平台,它为评估各种 SSL 增强的推荐程序提供了一个标准化、灵活和全面的框架。SSLRec 框架采用模块化架构,允许用户方便地评估最先进的模型,并提供一套完整的数据增强和自我监督工具包,以帮助创建具有特定需求的 SSL 推荐模型。此外,SSLRec 通过一致和公平的设置简化了不同推荐模型的培训和评估过程。我们的 SSLRec 平台涵盖了不同场景下一整套最先进的 SSL 增强推荐模型,使研究人员能够评估这些尖端模型,并推动该领域的进一步创新。我们已实施的 SSlrec 架构可在原始码储存库 https://github.com/hkuds/SSLRec 找到。 code 0
Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation Shuyao Wang, Yongduo Sui, Jiancan Wu, Zhi Zheng, Hui Xiong In the realm of deep learning-based recommendation systems, the increasing computational demands, driven by the growing number of users and items, pose a significant challenge to practical deployment. This challenge is primarily twofold: reducing the model size while effectively learning user and item representations for efficient recommendations. Despite considerable advancements in model compression and architecture search, prevalent approaches face notable constraints. These include substantial additional computational costs from pre-training/re-training in model compression and an extensive search space in architecture design. Additionally, managing complexity and adhering to memory constraints is problematic, especially in scenarios with strict time or space limitations. Addressing these issues, this paper introduces a novel learning paradigm, Dynamic Sparse Learning (DSL), tailored for recommendation models. DSL innovatively trains a lightweight sparse model from scratch, periodically evaluating and dynamically adjusting each weight's significance and the model's sparsity distribution during the training. This approach ensures a consistent and minimal parameter budget throughout the full learning lifecycle, paving the way for "end-to-end" efficiency from training to inference. Our extensive experimental results underline DSL's effectiveness, significantly reducing training and inference costs while delivering comparable recommendation performance. 
在基于深度学习的推荐系统领域,由于用户和项目数量的不断增加,计算需求的不断增加对实际部署提出了严峻的挑战。这个挑战主要有两个方面: 减少模型的大小,同时有效地学习用户和项目表示以获得有效的推荐。尽管在模型压缩和体系结构搜索方面取得了相当大的进步,流行的方法仍然面临着显著的限制。其中包括模型压缩中的预训练/再训练带来的大量额外计算成本,以及体系结构设计中的大量搜索空间。此外,管理复杂性和遵守内存约束是有问题的,特别是在有严格时间或空间限制的场景中。针对这些问题,本文介绍了一种新的学习范式,动态稀疏学习(DSL) ,专为推荐模型。DSL 创新地从头开始训练一个轻量级稀疏模型,在训练过程中定期评估和动态调整每个权重的重要性和模型的稀疏分布。这种方法确保了整个学习生命周期中参数预算的一致性和最小化,为从训练到推理的“端到端”效率铺平了道路。我们广泛的实验结果强调了 DSL 的有效性,显著降低了培训和推理成本,同时提供了可比较的推荐性能。 code 0
Towards Better Chinese Spelling Check for Search Engines: A New Dataset and Strong Baseline Yue Wang, Zilong Zheng, Zecheng Tang, Juntao Li, Zhihui Liu, Kunlong Chen, Jinxiong Chang, Qishen Zhang, Zhongyi Liu, Min Zhang Ant Grp, Hangzhou, Peoples R China; Soochow Univ, Suzhou, Peoples R China; Ant Grp, Beijing, Peoples R China Misspellings in search engine queries may prevent search engines from returning accurate results. For Chinese mobile search engines, due to the different input methods (e.g., hand-written and T9 input methods), more types of misspellings exist, making this problem more challenging. As an essential module of search engines, Chinese Spelling Check (CSC) models aim to detect and correct misspelled Chinese characters from user-issued queries. Despite the great value of CSC to the search engine, there is no CSC benchmark collected from real-world search engine queries. To fill this blank, we construct and release the Alipay Search Engine Query (AlipaySEQ) spelling check dataset. To the best of our knowledge, AlipaySEQ is the first Chinese Spelling Check dataset collected from the realworld scenario of Chinese mobile search engines. It consists of 15,522 high-quality human annotated and 1,175,151 automatically generated samples. To demonstrate the unique challenges of AlipaySEQ in the era of Large Language Models (LLMs), we conduct a thorough study to analyze the difference between AlipaySEQ and existing SIGHAN benchmarks and compare the performance of various baselines, including existing task-specific methods and LLMs. We observe that all baselines fail to perform satisfactorily due to the over-correction problem. Especially, LLMs exhibit below-par performance on AlipaySEQ, which is rather surprising. Therefore, to alleviate the over-correction problem, we introduce a modelagnostic CSC Self-Refine Framework (SRF) to construct a strong baseline. 
Comprehensive experiments demonstrate that our proposed SRF, though more effective than existing models on both AlipaySEQ and SIGHAN15, is still far from achieving satisfactory performance on our real-world dataset. With the newly collected real-world dataset and strong baseline, we hope more progress can be achieved on such a challenging and valuable task. code 0
Neural Kalman Filtering for Robust Temporal Recommendation Jiafeng Xia, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, Li Shang, Ning Gu Microsoft Res Asia, Shanghai, Peoples R China; Fudan Univ, Shanghai, Peoples R China Temporal recommendation methods can achieve superior accuracy by continuously updating user/item embeddings as new interactions arrive. However, the randomness of user behaviors introduces noise into user interactions and biases the modeling of user preferences, resulting in sub-optimal performance. To this end, we propose NeuFilter, a robust temporal recommendation algorithm based on neural Kalman Filtering, to learn more accurate user and item embeddings from noisy interactions. Classic Kalman Filtering is time-consuming when applied to recommendation because of its covariance matrices. Thus, we propose a neural network solution to Kalman Filtering, so as to realize higher efficiency and stronger expressivity. Specifically, NeuFilter consists of three alternating units: 1) a prediction unit, which predicts user and item embeddings based on their historical embeddings; 2) an estimation unit, which updates user and item embeddings in a manner similar to Kalman Filtering; 3) a correction unit, which corrects the updated user and item embeddings from the estimation unit to ensure reliable estimation and accurate updates. Experiments on two recommendation tasks show that NeuFilter achieves higher accuracy than state-of-the-art methods while remaining highly robust. Moreover, our empirical studies on a node classification task further confirm the importance of handling noise in tasks on temporal graphs, shedding new light on temporal graph modeling. 
时态推荐方法在获得新的交互信息后,通过不断更新用户/项目的嵌入信息,可以获得更高的推荐准确率。然而,用户行为的随机性会在用户交互中引入噪声,导致用户偏好建模的偏差,从而导致系统性能的次优。为此,我们提出了一种基于神经卡尔曼滤波的鲁棒时态推荐算法 NeuFilter,以便在有噪声的交互环境下学习更精确的用户和项目嵌入。经典卡尔曼滤波由于其协方差矩阵的特点,在推荐应用中非常耗时。为此,我们提出了一种卡尔曼滤波的神经网络解决方案,以实现更高的效率和更强的表达能力。具体来说,NeuFilter 由三个交替单元组成: 1)预测单元,根据用户和项目的历史嵌入来预测用户和项目的嵌入; 2)估计单元,以类似于卡尔曼过滤的方式更新用户和项目的嵌入; 3)校正单元,从估计单元纠正更新的用户和项目的嵌入,以确保可靠的估计和准确的更新。在两个推荐任务上的实验表明,NeuFilter 算法在获得较高的鲁棒性的同时,能够获得较高的推荐精度。此外,我们对一个节点分类任务的实证研究进一步证实了时间图任务中处理噪声的重要性,为时间图建模提供了新的视角。 code 0
Unified Pretraining for Recommendation via Task Hypergraphs Mingdai Yang, Zhiwei Liu, Liangwei Yang, Xiaolong Liu, Chen Wang, Hao Peng, Philip S. Yu Although pretraining has garnered significant attention and popularity in recent years, its application in graph-based recommender systems is relatively limited. It is challenging to exploit prior knowledge by pretraining in widely used ID-dependent datasets. On one hand, user-item interaction history in one dataset can hardly be transferred to other datasets through pretraining, where IDs are different. On the other hand, pretraining and finetuning on the same dataset leads to a high risk of overfitting. In this paper, we propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs. For a unified learning pattern to handle diverse requirements and nuances of various pretext tasks, we design task hypergraphs to generalize pretext tasks to hyperedge prediction. A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation. Experimental results on three benchmark datasets verify the superiority of UPRTH. Additional detailed investigations are conducted to demonstrate the effectiveness of the proposed framework. 尽管预训练近年来得到了广泛的关注和普及,但其在基于图形的推荐系统中的应用相对有限。在广泛使用的依赖于身份的数据集中,通过预训练来利用先验知识是一个挑战。一方面,一个数据集中的用户项交互历史很难通过预训练传递给其他数据集,因为 ID 是不同的。另一方面,对同一数据集进行预训练和微调会导致过度拟合的高风险。本文提出了一种新的多任务预训练框架——基于任务超图的推荐统一预训练。为了统一学习模式来处理不同的需求和各种借口任务的细微差别,我们设计了任务超图将借口任务推广到超边缘预测。设计了一个新颖的过渡注意层,用于区分性地学习每个借口任务和推荐之间的相关性。在三个基准数据集上的实验结果验证了 UPRTH 算法的优越性。还进行了更多的详细调查,以证明拟议框架的有效性。 code 0
COTER: Conditional Optimal Transport meets Table Retrieval Xun Yao, Zhixin Zhang, Xinrong Hu, Jie (Jack) Yang, Yi Guo, Daniel (Dianliang) Zhu Western Sydney Univ, Sch Comp Data & Math Sci, Parramatta, NSW, Australia; Wuhan Text Univ, Sch Comp Sci & Artificial Intelligence, Wuhan, Hubei, Peoples R China; Univ Wollongong, Sch Comp & Informat Technol, Wollongong, NSW, Australia; Auxilis Pty Ltd, Wollongong, NSW, Australia Ad hoc table retrieval refers to the task of performing semantic matching between given queries and candidate tables. In recent years, the approach to addressing this retrieval task has undergone significant shifts, transitioning from utilizing hand-crafted features to leveraging the power of Pre-trained Language Models (PLMs). However, key challenges arise when candidate tables contain shared items, and/or queries may refer to only a subset of table items rather than the entire one. Existing models often struggle to distinguish the most informative items and fail to accurately identify the relevant items required to match with the query. To bridge this gap, we propose Conditional Optimal Transport based table retrievER (COTER). The proposed algorithm is characterized by simplifying candidate tables, where the semantic meaning of one or several words (from the original table) is enabled to be effectively "transported" to individual words (from the simplified table), under the prior condition of the query. COTER achieves two essential goals simultaneously: minimizing the semantic loss during the table simplification and ensuring that retained items from simplified tables effectively match the given query. Importantly, the theoretical foundation of COTER empowers it to adapt dynamically to different queries and enhances the overall performance of the table retrieval. Experiments on two popular Web-table retrieval benchmarks show that COTER can effectively identify informative table items without sacrificing retrieval accuracy. 
This leads to the new state-of-the-art with substantial gains of up to 0.48 absolute Mean Average Precision (MAP) points, compared to the previously reported best result. Ad hoc 表检索是指在给定查询和候选表之间执行语义匹配的任务。近年来,解决这一检索任务的方法经历了重大转变,从利用手工制作的特性过渡到利用预训练语言模型(PLM)的力量。但是,当候选表包含共享项时,会出现关键问题,并且/或查询可能只引用表项的一个子集,而不是整个表项。现有的模型常常难以区分信息量最大的项目,并且无法准确地识别与查询匹配所需的相关项目。为了弥补这一差距,我们提出了基于条件最优传输的表检索器(COTER)。提出的算法拥有属性是简化候选表,其中一个或几个单词(来自原始表)的语义能够在查询的前提条件下有效地“传输”到单个单词(来自简化表)。COTER 同时实现两个基本目标: 最小化表简化过程中的语义丢失,以及确保简化表中保留的项有效地匹配给定的查询。重要的是,COTER 的理论基础使其能够动态地适应不同的查询,并提高表检索的整体性能。在两个常用的 Web 表检索基准上的实验表明,COTER 能够在不牺牲检索精度的前提下有效地识别信息表项。这导致了新的国家的最先进的大幅增益高达0.48绝对平均精度(MAP)点,相比之下,以前报告的最佳结果。 code 0
IncMSR: An Incremental Learning Approach for Multi-Scenario Recommendation Kexin Zhang, Yichao Wang, Xiu Li, Ruiming Tang, Rui Zhang Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China; Ruizhang Info, Shenzhen, Peoples R China; Huawei Noahs Ark Lab, Shenzhen, Peoples R China For better performance and lower resource consumption, multi-scenario recommendation (MSR) trains a unified model that serves all scenarios by leveraging data from multiple scenarios. Current work in MSR focuses on designing effective networks for better information transfer among different scenarios. However, it overlooks two important issues that arise when applying MSR models in industrial settings. The first is the efficiency problem brought by mixed data, which delays model updates and in turn leads to performance degradation. The second is that MSR models are insensitive to changes in the data distribution over time, resulting in suboptimal effectiveness on incoming data. In this paper, we propose an incremental learning approach for MSR (IncMSR), which not only improves training efficiency but also perceives changes in distribution over time. Specifically, we first quantify the pair-wise distance between representations along the scenario, time, and time-scenario dimensions respectively. Then, we decompose the MSR model into scenario-shared and scenario-specific parts and apply fine-grained constraints on the distances quantified with respect to the two different parts. Finally, all constraints are fused using a metric learning framework as a supplementary penalty term added to the original MSR loss function. Offline experiments on two real-world datasets demonstrate the superiority and compatibility of our proposed approach. code 0
Defense Against Model Extraction Attacks on Recommender Systems Sixiao Zhang, Hongzhi Yin, Hongxu Chen, Cheng Long The robustness of recommender systems has become a prominent topic within the research community. Numerous adversarial attacks have been proposed, but most of them rely on extensive prior knowledge, such as all the white-box attacks or most of the black-box attacks which assume that certain external knowledge is available. Among these attacks, the model extraction attack stands out as a promising and practical method, involving training a surrogate model by repeatedly querying the target model. However, there is a significant gap in the existing literature when it comes to defending against model extraction attacks on recommender systems. In this paper, we introduce Gradient-based Ranking Optimization (GRO), which is the first defense strategy designed to counter such attacks. We formalize the defense as an optimization problem, aiming to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model. Since top-k ranking lists are non-differentiable, we transform them into swap matrices which are instead differentiable. These swap matrices serve as input to a student model that emulates the surrogate model's behavior. By back-propagating the loss of the student model, we obtain gradients for the swap matrices. These gradients are used to compute a swap loss, which maximizes the loss of the student model. We conducted experiments on three benchmark datasets to evaluate the performance of GRO, and the results demonstrate its superior effectiveness in defending against model extraction attacks. 
推荐系统的健壮性已经成为研究领域的一个重要课题。已经提出了许多对抗性攻击,但大多数都依赖于广泛的先验知识,例如所有的白盒攻击或大多数的黑盒攻击都假设某些外部知识是可用的。在这些攻击中,模型提取攻击是一种很有前途和实用的攻击方法,它通过重复查询目标模型来训练代理模型。然而,现有的文献在防御推荐系统的模型抽取攻击方面还存在很大的差距。本文介绍了基于梯度的排序优化(GRO) ,这是第一种针对此类攻击而设计的防御策略。我们将防御形式化为一个最佳化问题,目的是最小化受保护目标模型的丢失,同时最大化攻击者代理模型的丢失。由于 top-k 排序列表是不可微的,所以我们将它们转换成交换矩阵,这些交换矩阵是可微的。这些交换矩阵作为学生模型的输入,模拟代理模型的行为。通过反向传播学生模型的损失,我们得到了交换矩阵的梯度。这些梯度用于计算交换损失,使学生模型的损失最大化。我们在三个基准数据集上进行了实验,对 GRO 的性能进行了评估,结果表明 GRO 在抵御模型提取攻击方面具有优越的效果。 code 0
GEMRec: Towards Generative Model Recommendation Yuanhe Guo, Haoming Liu, Hongyi Wen Recommender Systems are built to retrieve relevant items to satisfy users' information needs. The candidate corpus usually consists of a finite set of items that are ready to be served, such as videos, products, or articles. With recent advances in Generative AI such as GPT and Diffusion models, a new form of recommendation task is yet to be explored where items are to be created by generative models with personalized prompts. Taking image generation as an example, with a single prompt from the user and access to a generative model, it is possible to generate hundreds of new images in a few minutes. How shall we attain personalization in the presence of "infinite" items? In this preliminary study, we propose a two-stage framework, namely Prompt-Model Retrieval and Generated Item Ranking, to approach this new task formulation. We release GEMRec-18K, a prompt-model interaction dataset with 18K images generated by 200 publicly-available generative models paired with a diverse set of 90 textual prompts. Our findings demonstrate the promise of generative model recommendation as a novel personalization problem and the limitations of existing evaluation metrics. We highlight future directions for the RecSys community to advance towards generative recommender systems. Our code and dataset are available at https://github.com/MAPS-research/GEMRec. 建立推荐系统是为了检索相关项目,以满足用户的信息需求。候选语料库通常由一组有限的可供服务的项目组成,例如视频、产品或文章。随着生成式人工智能(Generative AI)的最新进展,如 GPT 模型和扩散模型,一种新形式的推荐任务尚待探索,其中项目将由具有个性化提示的生成式模型创建。以图像生成为例,用户只需提示一下,就可以访问一个生成模型,在几分钟内就可以生成数百张新图像。我们如何在“无限”的事物面前实现个性化?在这个初步的研究中,我们提出了一个两阶段的框架,即提示模型检索和生成的项目排序,以接近这个新的任务制定。我们发布了 GEMRec-18K,一个提示模型交互数据集,由200个公开可用的生成模型与90个文本提示配对生成18K 图像。我们的发现证明了生成模型推荐作为一个新的个性化问题的前景,以及现有评估指标的局限性。我们强调 RecSys 社区向生成推荐系统发展的未来方向。我们的代码和数据集可在 https://github.com/maps-research/gemrec 下载。 code 0
Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters Yukang Xie, Chengyu Wang, Junbing Yan, Jiyong Zhou, Feiqi Deng, Jun Huang Recently, Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks, especially for text generative tasks. Yet, the large size of LLMs often leads to the high computational cost of model training and online deployment. In our work, we present ALTER, a system that effectively builds the multi-tAsk Learners with mixTure-of-task-adaptERs upon small language models (with <1B parameters) to address multiple NLP tasks simultaneously, capturing the commonalities and differences between tasks, in order to support domain-specific applications. Specifically, in ALTER, we propose the Mixture-of-Task-Adapters (MTA) module as an extension to the transformer architecture for the underlying model to capture the intra-task and inter-task knowledge. A two-stage training method is further proposed to optimize the collaboration between adapters at a small computational cost. Experimental results over a mixture of NLP tasks show that our proposed MTA architecture and the two-stage training method achieve good performance. Based on ALTER, we have also produced MTA-equipped language models for various domains. 近年来,大语言模型(LLM)在自然语言处理(NLP)任务中取得了令人惊讶的零拍学习效果,尤其是在文本生成任务中。然而,大规模的 LLM 往往导致高计算成本的模型训练和在线部署。在我们的工作中,我们提出了 ALTER,一个系统,有效地建立多任务学习者与混合任务适配器的小语言模型(小于1B 参数)同时处理多 NLP 任务,捕捉任务之间的共性和差异,以支持领域特定的应用。具体来说,在 ALTER 中,我们提出将任务混合适配器(MTA)模块作为转换器体系结构的扩展,用于底层模型捕获任务内和任务间的知识。进一步提出了一种两阶段训练方法,以较小的计算代价优化适配器之间的协作。实验结果表明,我们提出的 MTA 体系结构和两阶段训练方法取得了良好的性能。基于 ALTER,我们还为不同领域制作了 MTA 语言模型。 code 0
Grounded and Transparent Response Generation for Conversational Information-Seeking Systems Weronika Lajewska Univ Stavanger, Stavanger, Norway While previous conversational information-seeking (CIS) research has focused on passage retrieval, reranking, and query rewriting, the challenge of synthesizing retrieved information into coherent responses remains. The proposed research delves into the intricacies of response generation in CIS systems. Open-ended information-seeking dialogues introduce multiple challenges that may lead to potential pitfalls in system responses. The study focuses on generating responses grounded in the retrieved passages and being transparent about the system's limitations. Specific research questions revolve around obtaining confidence-enriched information nuggets, automatic detection of incomplete or incorrect responses, generating responses communicating the system's limitations, and evaluating enhanced responses. By addressing these research tasks the study aspires to contribute to the advancement of conversational response generation, fostering more trustworthy interactions in CIS dialogues, and paving the way for grounded and transparent systems to meet users' needs in an information-driven world. 以往的会话信息搜索(CIS)研究主要集中在文章检索、重新排序和查询重写等方面,但如何将检索到的信息综合成连贯的反应仍然是一个挑战。这项研究深入探讨了 CIS 系统中响应生成的复杂性。不限成员名额的信息寻求对话带来了多重挑战,可能导致系统对策中的潜在陷阱。这项研究的重点是根据检索到的段落产生反应,并对系统的局限性保持透明。具体的研究问题围绕获得增强信心的信息金块,自动检测不完整或不正确的反应,产生反应传达系统的局限性,并评估增强的反应。通过解决这些研究任务,本研究渴望有助于推进会话反应的产生,促进独联体对话中更可信赖的互动,并为基础和透明的系统铺平道路,以满足用户在信息驱动的世界中的需求。 code 0
Augmenting Keyword-based Search in Mobile Applications Using LLMs Harikrishnan C, Giridhar Sreenivasa Murthy, Kumar Rangarajan Slang Labs, Engn, Bangalore, Karnataka, India Search in mobile applications has traditionally been keyword driven and limited to simple queries, such as searching for product names, even when the apps support much richer, transactional experiences. On the other hand, search on the web has evolved into queries that are complex, objective-based and most often in natural language. The recent advances in Generative AI make it possible to bring the power of web-like, conversational searches into mobile applications. In this talk, we present the various problems, opportunities and challenges in harnessing LLMs to augment the traditional search experience in mobile applications. code 0
Recent Advances in Refinement Recommendations Akshay Jagatap, Sachin Farfade Amazon, Seattle, WA 98109 USA Navigating vast e-commerce websites with extensive product catalogs can be a daunting challenge for shoppers. To assist customers in finding the products they desire, e-commerce platforms provide product attribute filters, commonly referred to as "refinements." These refinements serve as a vital navigational aid, enabling customers to refine their search results based on specific product attributes such as material, color, size, brand, etc. However, on mobile devices refinements are not easily discoverable due to the lack of screen space. To improve discoverability, refinement recommendation systems suggest contextually relevant refinements in-line on the search page. In this work, we discuss the evolution of refinement recommendation strategies: a) a search query-based classification approach, in which we train a classification model on a given search query with the refinements as labels; b) a session-based classification approach, in which we train a sequence classification model on a given sequence of session interactions with the refinements as labels; and c) a session-based generation approach, which takes the sequence of session interactions as input and outputs the refinement name. 对于购物者来说,浏览带有大量产品目录的大型电子商务网站可能是一个艰巨的挑战。为了帮助客户找到他们想要的产品,电子商务平台提供了产品属性过滤器,通常称为“细化”这些改进是一个重要的航标,使客户能够根据特定的产品属性(如材质、颜色、尺寸、品牌等)来改进他们的搜索结果。然而,由于缺乏屏幕空间,在移动设备上不容易发现细化。为了提高可发现性,通过改进推荐系统,在搜索页面上提出与上下文相关的改进建议。在工作中,我们讨论了细化推荐策略的演化,即: a)基于搜索查询的分类方法,对于给定的搜索查询,我们训练一个分类模型,对于给定的会话交互序列,我们训练一个序列分类模型,对于给定的会话交互序列,我们训练一个基于会话的生成方法,对于会话交互序列作为输入和输出作为细化名称。 code 0
Scaling Up LLM Reviews for Google Ads Content Moderation Wei Qiao, Tushar Dogra, Otilia Stretcu, YuHan Lyu, Tiantian Fang, Dongjin Kwon, ChunTa Lu, Enming Luo, Yuan Wang, ChihChun Chia, Ariel Fuxman, Fangzhou Wang, Ranjay Krishna, Mehmet Tek Large language models (LLMs) are powerful tools for content moderation, but their inference costs and latency make them prohibitive for casual use on large datasets, such as the Google Ads repository. This study proposes a method for scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates via filtering and duplicate removal, and create clusters of ads for which we select one representative ad per cluster. We then use LLMs to review only the representative ads. Finally, we propagate the LLM decisions for the representative ads back to their clusters. This method reduces the number of reviews by more than 3 orders of magnitude while achieving a 2x recall compared to a baseline non-LLM model. The success of this approach is a strong function of the representations used in clustering and label propagation; we found that cross-modal similarity representations yield better results than uni-modal representations. 大型语言模型(LLM)是内容管理的强大工具,但是它们的推理成本和延迟使得它们不适合在大型数据集上随意使用,比如 Google 广告库。这项研究提出了一种方法,扩大 LLM 审查的内容审核在谷歌广告。首先,我们使用启发式方法通过过滤和重复删除来选择候选者,并创建广告集群,我们为每个集群选择一个代表性的广告。然后,我们使用 LLM 只审查代表性的广告。最后,我们将代表性广告的 LLM 决策传播回它们的集群。这种方法与基线非 LLM 模型相比,减少了超过3个数量级的评论数量,同时实现了2倍的召回率。这种方法的成功与聚类和标签传播中使用的表示有很大关系,我们发现跨模态相似表示比单模态表示产生更好的结果。 code 0
Customer Understanding for Recommender Systems Md. Mostafizur Rahman, Yu Hirate Rakuten Grp Inc, Rakuten Inst Technol, Tokyo, Japan Recommender systems are powerful tools for enhancing customer engagement and driving sales for Rakuten businesses. However, to achieve their full potential, these systems must possess a profound understanding of customer behaviors. This understanding can be gained from a variety of sources, including customer purchase history, customer feedback, and customer behavioral patterns. One of the most important aspects of customer understanding is the ability to identify lookalike customers, understand their behavioral patterns, and predict lifestyles, for example, whether a customer is married or unmarried, owns a car, or plays golf. Rakuten provides more than 70 different services and relies heavily on recommendations for many of its products. On our platforms, we observe that groups of customers who share similar interests, needs, or behaviors often end up being attracted to similar products or services. Customer preferences can change over time, so it is important for recommender systems to adapt to those changes. This can be achieved by tracking customer behavior, static or dynamic environment changes around targeted customers, and their feedback. We utilize various graph-based and deep learning-based models to address the customer understanding problem. The objective of this paper is to offer a comprehensive overview of customer understanding and modeling for complex recommender systems, leading to increased customer satisfaction, loyalty, and sales. We also present various empirical results on both Rakuten data and public benchmark datasets, providing evidence of the benefits of customer understanding on various downstream tasks. We also share insights that characterize the circumstances in which the graph-based and deep learning-based models offer the most significant improvements. 
We conclude this study by briefly showcasing real-life applications and discussing potential future developments and improvements. 推荐系统是强大的工具,以提高客户参与和推动销售乐天业务。然而,要充分发挥它们的潜力,这些系统必须对客户行为有深刻的理解。这种理解可以从各种来源获得,包括客户购买历史、客户反馈和客户行为模式。客户理解的最重要方面之一是识别相似客户的能力,了解他们的行为模式,并预测生活方式,例如,客户是否已婚或未婚,拥有一辆汽车或打高尔夫球,等等。乐天提供70多种不同的服务,其许多产品严重依赖推荐。在我们的平台中,我们可以观察到一组具有相似兴趣、需求或行为的客户,他们通常最终会被相似的产品或服务所吸引。客户的偏好可以随着时间的推移而变化,因此推荐系统适应这些变化非常重要。这可以通过跟踪客户行为、目标客户周围的静态或动态环境变化及其反馈来实现。我们利用各种图形和基于深度学习的模型来解决客户理解问题。本文的目的是为复杂的推荐系统提供客户理解和建模的全面概述,从而提高客户满意度、忠诚度和销售额。我们还展示了乐天数据和公共基准数据集的各种实证结果,为客户理解各种下游任务的好处提供了证据。我们还分享了对基于图表和基于深度学习的模型在哪些情况下提供了最重要的改进的见解。我们通过简要展示实际应用并讨论未来的潜在发展和改进来结束这项研究。 code 0
"Maya"- A Conversational Shopping Assistant for Fashion at Myntra Akhil Raj, Hrishikesh Ganu, Saikat Kumar Das, R. Sandeep, Satyajeet Singh, Sreekanth Vempati code 0
Fresh Content Recommendation at Scale: A Multi-funnel Solution and the Potential of LLMs Jianling Wang, Haokai Lu, Minmin Chen Google DeepMind, London, England Recommendation systems serve as a conduit connecting users to an incredibly large, diverse, and ever-growing collection of content. In practice, missing information on fresh content needs to be filled in for it to be exposed to and discovered by its audience. In this context, we share our success stories in building a dedicated fresh content recommendation stack on a large commercial platform and shed light on the use of Large Language Models (LLMs) for fresh content recommendation within an industrial framework. To nominate fresh content, we built a multi-funnel nomination system that combines (i) a two-tower model with strong generalization power for coverage, and (ii) a sequence model with near real-time updates on user feedback for relevance, which effectively balances coverage and relevance. Beyond that, the reasoning and generalization capabilities of LLMs present exciting prospects for enhancing recommendation systems. We share our initial efforts on employing LLMs as data augmenters to bridge the knowledge gap on cold-start items during the training phase. This approach circumvents the costly generation process during inference, presenting a model-agnostic, forward-looking solution for fresh content recommendation. 推荐系统作为一个渠道,连接用户到一个令人难以置信的庞大,多样化和不断增长的内容集合。在实践中,需要填补关于新内容的缺失信息,以便读者能够揭示和发现这些信息。在这种背景下,我们很高兴分享我们在一个大型商业平台上建立一个专门的新内容推荐堆栈的成功故事,同时也为在一个行业框架内利用大型语言模型(LLM)建立新内容推荐提供了一些启示。为了提名新内容,我们建立了一个多漏斗提名系统,该系统结合了(i)具有很强的覆盖泛化能力的双塔模型和(ii)具有近实时更新用户反馈相关性的序列模型,有效地平衡了覆盖和相关性。除此之外,通过利用 LLM 的推理和泛化能力,我们展示了增强推荐系统的令人兴奋的前景。我们分享了我们在使用 LLM 作为数据增强器以弥合培训阶段冷启动项目的知识差距方面的初步努力。这种创新的方法规避了推理过程中代价高昂的生成过程,为新内容推荐提供了一种与模型无关的前瞻性解决方案。 code 0
Lessons Learnt from Building Friend Recommendation Systems Jun Yu Snap Inc, Santa Monica, CA 90405 USA Friend recommendation systems in online social networks such as Snapchat help users find friends and build meaningful connections, leading to heightened user engagement and retention. While friend recommendation systems follow the classical recommendation system paradigm of retrieval followed by ranking, they pose distinctive challenges that differ from those of item recommendation systems (e.g., YouTube videos, Amazon products, Netflix movies) and require special considerations when building one. In this paper, we elucidate the unique challenges encountered and share insights from developing the friend recommendation system for hundreds of millions of users on Snapchat. Snapchat 等在线社交网络中的朋友推荐系统可以帮助用户找到朋友,建立有意义的关系,从而提高用户的参与度和保持率。虽然朋友推荐系统遵循由检索和排名组成的传统推荐系统范式,但它们构成了与项目推荐系统(如 YouTube 视频、亚马逊产品、 Netflix 电影)不同的独特挑战,在构建这样一个系统时需要特别的考虑。在本文中,我们阐述了在 Snapchat 上为数亿用户开发朋友推荐系统所遇到的独特挑战,并分享了宝贵的见解。 code 0
Leveraging Multimodal Features and Item-level User Feedback for Bundle Construction Yunshan Ma, Xiaohao Liu, Yinwei Wei, Zhulin Tao, Xiang Wang, TatSeng Chua Automatic bundle construction is a crucial prerequisite step in various bundle-aware online services. Previous approaches are mostly designed to model the bundling strategy of existing bundles. However, it is hard to acquire a large-scale, well-curated bundle dataset, especially for platforms that have not offered bundle services before. Even on platforms with mature bundle services, many items are included in few or even zero bundles, which gives rise to sparsity and cold-start challenges in bundle construction models. To tackle these issues, we aim to leverage multimodal features, item-level user feedback signals, and bundle composition information to achieve a comprehensive formulation of bundle construction. Nevertheless, such a formulation poses two new technical challenges: 1) how to learn effective representations by optimally unifying multiple features, and 2) how to address the missing-modality, noise, and sparsity problems induced by incomplete query bundles. In this work, to address these technical challenges, we propose a Contrastive Learning-enhanced Hierarchical Encoder method (CLHE). Specifically, we use self-attention modules to combine the multimodal and multi-item features, and then leverage both item- and bundle-level contrastive learning to enhance representation learning, thereby countering the missing-modality, noise, and sparsity problems. Extensive experiments on four datasets in two application domains demonstrate that our method outperforms a range of SOTA methods. The code and dataset are available at https://github.com/Xiaohao-Liu/CLHE. 
在各种捆绑感知在线服务中,自动捆绑包构造是关键的先决条件。以前的方法主要用于对现有的捆绑包的捆绑策略进行建模。然而,很难获得大规模、组织良好的捆绑数据集,特别是对于那些以前没有提供捆绑服务的平台。即使对于具有成熟捆绑服务的平台来说,仍然有很多项目被包含在很少甚至没有捆绑包中,这在捆绑包构建模型中引起了稀疏性和冷启动的挑战。为了解决这些问题,我们的目标是利用多模态特征、项目级用户反馈信号和捆绑包组合信息,以实现捆绑包构造的全面表述。然而,这样的表述提出了两个新的技术挑战: 1)如何通过最优地统一多个特征来学习有效的表示,2)如何解决由不完整查询包引起的模态缺失、噪声和稀疏问题。在这项工作中,为了解决这些技术挑战,我们提出了一个对比学习增强的层次编码器方法(CLHE)。具体来说,我们使用自我注意模块来结合多模态和多项目特征,然后利用项目和捆绑层的对比学习来增强表征学习,从而克服模态缺失、噪声和稀疏问题。在两个应用程序域中对四个数据集进行的大量实验表明,我们的方法优于 SOTA 方法列表。代码和数据集可在 https://github.com/xiaohao-liu/clhe 下载。 code 0
Cost-Effective Active Learning for Bid Exploration in Online Advertising Zixiao Wang, Zhenzhe Zheng, Yanrong Kang, Jiani Huang Shanghai Jiao Tong Univ, Shanghai, Peoples R China; Tencent, Advertising & Mkt Serv, Shenzhen, Peoples R China As a bid optimization algorithm in the first-price auction (FPA), bid shading is used in online advertising to keep advertisers from overpaying. However, we find that the bid shading approach can become trapped in a serious local optimum, which prevents advertisers from maximizing long-term surplus. In this work, we identify the cause of this local optimum: it stems from the lack of winning price information, which creates a conflict between short-term surplus and the training of the winning rate prediction model, and is further propagated through over-exploitation of the model. To rectify this problem, we propose a cost-effective active learning strategy, namely CeBE, for bid exploration. Specifically, we jointly consider the uncertainty and density of samples to calculate exploration utility, and use a (2+ε)-approximation greedy algorithm to control exploration costs. Instead of selecting bid prices that maximize the expected surplus for all bid requests, we employ the bid exploration strategy to determine the bid prices. By trading off a portion of the surplus, we can train the model on higher-quality data to enhance its performance, enabling the system to achieve a higher long-term surplus. Our method is straightforward and applicable to real-world industrial environments: it is effective across various categories of winning rate prediction models. We conducted empirical studies to validate the efficacy of our approach. In comparison to the traditional bid shading system, CeBE yields an average surplus improvement of 8.16% across various models and datasets. 
作为一级价格拍卖(FPA)中的一种投标优化算法,在网络广告中使用了投标遮蔽技术,以避免广告商支付过高的广告费用。然而,我们发现投标阴影的方法会产生严重的局部最优。这种效应阻碍了广告商最大化长期盈余。在本文中,我们找出了这种局部最优的原因-它来自于缺乏中标价格信息,导致短期盈余与中标率预测模型训练之间的冲突,并通过模型的过度开发进一步传播。为了解决这个问题,我们提出了一个具有成本效益的主动学习策略,即 CeBE,用于投标探索。具体来说,我们综合考虑样本的不确定性和密度来计算勘探效用,并使用2 + ε 近似贪婪算法来控制勘探成本。我们不选择使所有投标请求的预期盈余最大化的投标价格,而是采用投标探索策略来确定投标价格。通过权衡一部分盈余,我们可以训练模型使用更高质量的数据,以提高其性能,使系统能够实现长期盈余。我们的方法是直接和适用于现实世界的工业环境: 它是有效的各种类型的中标率预测模型。我们进行了实证研究,以验证我们的方法的有效性。与传统的投标着色系统相比,CeBE 在不同的模型和数据集中可以产生平均8.16% 的剩余改进。 code 0
LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection Zijian Cai, Zhaoxuan Tan, Zhenyu Lei, Zifeng Zhu, Hongrui Wang, Qinghua Zheng, Minnan Luo As malicious actors employ increasingly advanced and widespread bots to disseminate misinformation and manipulate public opinion, the detection of Twitter bots has become a crucial task. Though graph-based Twitter bot detection methods achieve state-of-the-art performance, we find that their inference depends on the neighbor users multi-hop away from the targets, and fetching neighbors is time-consuming and may introduce bias. At the same time, we find that after finetuning on Twitter bot detection, pretrained language models achieve competitive performance and do not require a graph structure during deployment. Inspired by this finding, we propose a novel bot detection framework LMBot that distills the knowledge of graph neural networks (GNNs) into language models (LMs) for graph-less deployment in Twitter bot detection to combat the challenge of data dependency. Moreover, LMBot is compatible with graph-based and graph-less datasets. Specifically, we first represent each user as a textual sequence and feed them into the LM for domain adaptation. For graph-based datasets, the output of LMs provides input features for the GNN, enabling it to optimize for bot detection and distill knowledge back to the LM in an iterative, mutually enhancing process. Armed with the LM, we can perform graph-less inference, which resolves the graph data dependency and sampling bias issues. For datasets without graph structure, we simply replace the GNN with an MLP, which has also shown strong performance. Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks. Extensive studies also show that LMBot is more robust, versatile, and efficient compared to graph-based Twitter bot detection methods. 
随着恶意行为者使用日益先进和广泛的机器人传播错误信息和操纵公众舆论,检测 Twitter 机器人已成为一项至关重要的任务。虽然基于图的 Twitter 机器人检测方法取得了很好的性能,但是我们发现它们的推理依赖于离目标多跳的邻居用户,而且提取邻居非常耗时,并且可能会引入偏差。同时,我们发现在 Twitter 机器人检测上进行微调之后,预先训练的语言模型在部署过程中不需要图形结构就可以获得有竞争力的性能。受到这一发现的启发,我们提出了一种新的机器人检测框架 LMBot,该框架将图神经网络(GNN)的知识提取为语言模型(LMs) ,用于 Twitter 机器人检测中的无图部署,以应对数据依赖的挑战。此外,LMBot 还兼容基于图的和无图的数据集。具体来说,我们首先将每个用户表示为一个文本序列,并将它们提供给 LM 以进行域适配。对于基于图形的数据集,LM 的输出为 GNN 提供了输入特性,使其能够优化机器人检测,并在迭代、相互增强的过程中将知识提取回 LM。在 LM 的支持下,我们可以进行无图推理,解决了图数据的依赖性和抽样偏差问题。对于没有图形结构的数据集,我们简单地用 MLP 替换 GNN,这也显示出很强的性能。我们的实验表明,LMBot 在四个 Twitter 机器人检测基准上实现了最先进的性能。大量的研究还表明,与基于图形的 Twitter 机器人检测方法相比,LMBot 更加健壮、通用和高效。 code 0
Long-Term Value of Exploration: Measurements, Findings and Algorithms Yi Su, Xiangyu Wang, Elaine Ya Le, Liang Liu, Yuening Li, Haokai Lu, Benjamin Lipshitz, Sriraj Badam, Lukasz Heldt, Shuchao Bi, Ed H. Chi, Cristos Goodrow, SuLin Wu, Lexi Baugher, Minmin Chen Effective exploration is believed to positively influence the long-term user experience on recommendation platforms. Determining its exact benefits, however, has been challenging. Regular A/B tests on exploration often measure neutral or even negative engagement metrics while failing to capture its long-term benefits. We here introduce new experiment designs to formally quantify the long-term value of exploration by examining its effects on the content corpus, and connecting content corpus growth to the long-term user experience from real-world experiments. Having established the value of exploration, we investigate the Neural Linear Bandit algorithm as a general framework to introduce exploration into any deep-learning-based ranking system. We conduct live experiments on one of the largest short-form video recommendation platforms, which serves billions of users, to validate the new experiment designs, quantify the long-term value of exploration, and verify the effectiveness of the adopted Neural Linear Bandit algorithm for exploration. 有效的探索被认为会对推荐平台的长期用户体验产生积极的影响。然而,确定它的确切好处却是一个挑战。定期的 A/B 探索测试常常衡量中性甚至负面的参与度量,但未能获得其长期利益。我们在这里引入新的实验设计,通过考察探索对内容语料库的影响,并将内容语料库的增长与现实世界中的长期用户体验联系起来,形式地量化探索的长期价值。一旦确立了探索的价值,我们研究神经线性班迪特算法作为一个一般框架,引入探索任何深度学习的排序系统。我们在一个最大的短格式视频推荐平台上进行现场实验,该平台为数十亿用户提供服务,以验证新的实验设计,量化探索的长期价值,并验证所采用的神经线性土匪算法在探索中的有效性。 code 0
Unified Visual Preference Learning for User Intent Understanding Yihua Wen, Si Chen, Yu Tian, Wanxian Guan, Pengjie Wang, Hongbo Deng, Jian Xu, Bo Zheng, Zihao Li, Lixin Zou, Chenliang Li Wuhan Univ, Minist Educ, Sch Cyber Sci & Engn, Key Lab Aerosp Informat Secur & Trusted Comp, Wuhan, Peoples R China; Alibaba Grp China, Hangzhou, Peoples R China In the world of E-Commerce, the core task is to understand personalized preferences from various kinds of heterogeneous information, such as textual reviews, item images and historical behaviors. In current systems, this heterogeneous information is mainly exploited to generate better item or user representations. For example, in the visual search scenario, the importance of modeling the query image has been widely acknowledged. However, these existing solutions focus on improving the representation quality of the query image, overlooking the personalized visual preference of the user. Note that visual features affect the user's decision significantly, e.g., a user could be more likely to click the items with her preferred design. Hence, it is fruitful to exploit the visual preference to deliver better capacity for personalization. To this end, we propose a simple yet effective target-aware visual preference learning framework (named Tavern) for both item recommendation and search. The proposed Tavern works as an individual and generic model that can be smoothly plugged into different downstream systems. Specifically, for visual preference learning, we utilize the image of the target item to derive the visual preference signals for each historical clicked item. This procedure is modeled as a form of representation disentanglement, where the visual preference signals are extracted by removing the noisy information irrelevant to visual preference from the shared visual information between the target and historical items. 
During this process, a novel selective orthogonality disentanglement is proposed to avoid the significant information loss. Then, a GRU network is utilized to aggregate these signals to form the final visual preference representation. Extensive experiments over three large-scale real-world datasets covering visual search, product search and recommendation well demonstrate the superiority of our proposed Tavern against existing technical alternatives. Further ablation study also confirms the validity of each design choice 电子商务的核心任务是从文本评论、商品图像和历史行为等各种异质信息中理解个性化偏好。在当前的系统中,这些异构信息主要被用来生成更好的条目或用户表示。例如,在可视化搜索场景中,对查询图像进行建模的重要性得到了广泛的认可。但是,现有的解决方案侧重于提高查询图像的表示质量,忽视了用户的个性化视觉偏好。请注意,视觉特性对用户的决定有显著影响,例如,用户可能更有可能按照自己喜欢的设计单击项目。因此,利用视觉偏好来提供更好的个性化能力是有成效的。为此,我们提出了一个简单而有效的目标感知的视觉偏好学习框架(名为 Tavern)项目推荐和搜索。建议的酒馆作为一个独立的和通用的模式,可以顺利地插入到不同的下游系统。具体来说,对于视觉偏好学习,我们利用目标项的图像来获得每个历史单击项的视觉偏好信号。这个过程被建模为一种表示分离的形式,通过从目标和历史项目之间共享的视觉信息中去除与视觉偏好无关的噪声信息来提取视觉偏好信号。在此过程中,提出了一种新的选择性正交解纠缠算法,以避免重大的信息损失。然后,利用 GRU 网络对这些信号进行聚合,形成最终的视觉偏好表示。通过对包括视觉搜索、产品搜索和推荐在内的三个大规模现实世界数据集的大量实验,充分证明了我们提出的 Tavern 相对于现有技术选择的优越性。进一步的消融研究也证实了每个设计选择的有效性 code 0
Framework for Bias Detection in Machine Learning Models: A Fairness Approach Alveiro Alonso Rosado Gomez, Maritza Liliana CalderónBenavides Univ Autonoma Bucaramanga, Bucaramanga, Colombia; Univ Francisco Paula Santander Ocana, Ocana, Colombia The research addresses bias and inequity in binary classification problems in machine learning. Despite existing ethical frameworks for artificial intelligence, detailed guidance on practices and techniques to address these issues is lacking. The main objective is to identify and analyze theoretical and practical components related to the detection and mitigation of biases and inequalities in machine learning. The proposed approach combines best practices, ethics, and technology to promote the responsible use of artificial intelligence in Colombia. The methodology covers the definition of performance and fairness interests, interventions in preprocessing, processing, and post-processing, and the generation of recommendations and explainability of the model. 研究了机器学习中二进制分类问题中的偏差和不公平问题。尽管现有人工智能的伦理框架,但缺乏解决这些问题的实践和技术的详细指导。主要目标是识别和分析与检测和减轻机器学习中的偏差和不平等有关的理论和实践成分。提议的方法结合了最佳实践、伦理和技术,以促进哥伦比亚负责任地使用人工智能。该方法包括性能和公平利益的定义,预处理、处理和后处理的干预,以及模型的建议和可解释性的生成。 code 0
IDoFew: Intermediate Training Using Dual-Clustering in Language Models for Few Labels Text Classification Abdullah Alsuhaibani, Hamad Zogan, Imran Razzak, Shoaib Jameel, Guandong Xu Language models such as Bidirectional Encoder Representations from Transformers (BERT) have been very effective in various Natural Language Processing (NLP) and text mining tasks including text classification. However, some tasks still pose challenges for these models, including text classification with limited labels. This can result in a cold-start problem. Although some approaches have attempted to address this problem through single-stage clustering as an intermediate training step coupled with a pre-trained language model, which generates pseudo-labels to improve classification, these methods are often error-prone due to the limitations of the clustering algorithms. To overcome this, we have developed a novel two-stage intermediate clustering with subsequent fine-tuning that models the pseudo-labels reliably, resulting in reduced prediction errors. The key novelty in our model, IDoFew, is that the two-stage clustering coupled with two different clustering algorithms helps exploit the advantages of the complementary algorithms that reduce the errors in generating reliable pseudo-labels for fine-tuning. Our approach has shown significant improvements compared to strong comparative models. 语言模型,例如变换器的双向编码器表示(BERT) ,在各种自然语言处理(NLP)和文本挖掘任务(包括文本分类)中都是非常有效的。然而,一些任务仍然对这些模型构成挑战,包括使用有限标签的文本分类。这可能导致冷启动问题。虽然一些方法试图通过单阶段聚类作为一个中间训练步骤加上预训练语言模型来解决这个问题,从而产生伪标签来改善分类,但由于聚类算法的局限性,这些方法往往容易出错。为了克服这个问题,我们开发了一个新的两阶段中间聚类和随后的微调,可靠地建模伪标签,从而减少预测误差。在我们的模型中,IDofew 的关键新颖之处在于,两阶段聚类和两种不同的聚类算法相结合,有助于利用互补算法的优势,减少生成可靠的伪标签进行微调时的错误。与强大的比较模型相比,我们的方法已经显示出显著的改进。 code 0
MAD: Multi-Scale Anomaly Detection in Link Streams Esteban Bautista, Laurent Brisson, Cécile Bothorel, Grégory Smits IMT Atlantique, Comp Sci Dept, Lab STICC, UMR CNRS 6285, Brest, France; IMT Atlantique, LUSSI Dept, Lab STICC UMR CNRS 6285, Brest, France Given an arbitrary group of computers, how to identify abnormal changes in their communication pattern? How to assess if the absence of some communications is normal or due to a failure? How to distinguish local from global events when communication data are extremely sparse and volatile? Existing approaches for anomaly detection in interaction streams, focusing on edges, nodes or graphs, lack flexibility to monitor arbitrary communication topologies. Moreover, they rely on structural features that are not adapted to highly sparse settings. In this work, we introduce MAD, a novel Multi-scale Anomaly Detection algorithm that (i) allows querying the normality/abnormality state of an arbitrary group of observed/non-observed communications at a given time; and (ii) handles the highly sparse and uncertain nature of interaction data through a scoring method that is based on a novel probabilistic and multi-scale analysis of sub-graphs. In particular, MAD is (a) flexible: it can assess if any time-stamped subgraph is anomalous, making edge, node and graph anomalies particular instances; (b) interpretable: its multi-scale analysis allows characterizing the scope and nature of the anomalies; (c) efficient: given historical data of length N and M observed/non-observed communications to analyze, MAD produces an anomaly score in O(NM); and (d) effective: it significantly outperforms state-of-the-art alternatives tailored for edge, node or graph anomalies. 
给定一组任意的计算机,如何识别其通信模式中的异常变化?如何评估缺少某些通信是正常的还是由于故障造成的?当通信数据非常稀疏和不稳定时,如何区分本地事件和全局事件?现有的交互流异常检测方法,主要集中在边缘、节点或图形上,缺乏监控任意通信拓扑的灵活性。此外,它们依赖于不适应高度稀疏环境的结构特征。在这项工作中,我们介绍了 MAD,一种新的多尺度异常检测算法,它(i)允许在给定的时间查询任意组观察/未观察通信的正常/异常状态,(ii)通过基于新的概率和子图的多尺度分析的评分方法处理交互数据的高度稀疏和不确定性。具体来说,MAD 是(a)灵活的: 它可以评估是否有任何时间戳子图是异常的,使边缘,节点和图异常的特殊实例; (b)可解释的: 它的多尺度分析允许描述异常的范围和性质; (c)高效的: 给定长度为 N 的历史数据和 M 个待分析的观察/未观察通信,MAD 在 O (NM)中产生异常评分; (d)有效: 它显著优于为边缘、节点或图形异常量身定制的最先进替代方案。 code 0
Customized and Robust Deep Neural Network Watermarking TzuYun Chien, ChihYa Shen Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu, Taiwan As the excellent performance of deep neural networks (DNNs) enhances a wide spectrum of applications, the protection of intellectual property (IP) of DNNs has recently received increasing attention, and DNN watermarking approaches have thus been proposed for ownership verification to avoid potential misuse or theft of DNN models. However, we observe that existing DNN watermarking methods suffer from two major weaknesses: i) Incomplete protection against advanced watermark removal attacks, such as fine-tuning attacks with large learning rates, re-training after pruning, and most importantly, the distillation attack; ii) Limited customization ability, where multiple watermarked models cannot be uniquely identified, especially after removal attacks. To address these critical issues, we propose two new DNN watermarking approaches, Unified Soft-label Perturbation (USP), which provides a robust watermark to detect model theft, and Customized Soft-label Perturbation (CSP), which is able to embed a different watermark in each copy of the DNN model to enable customized watermarking. Experimental results show that our proposed USP and CSP resist all the watermark removal attacks, especially the distillation attack, and the proposed CSP achieves very promising watermark customization ability, significantly outperforming the other state-of-the-art baselines. 由于深度神经网络(DNN)的优异性能使其广泛应用,DNN 的知识产权保护近年来受到越来越多的关注,为了避免 DNN 模型的潜在误用或盗窃,提出了 DNN 水印技术用于所有权验证。然而,我们观察到现有的 DNN 水印方法存在两个主要弱点: 1)对高级水印去除攻击的不完全保护,例如具有较高学习率的微调攻击,修剪后重新训练,最重要的是蒸馏攻击; 2)定制能力有限,其中多个水印模型不能唯一识别,特别是在去除攻击之后。为了解决这些关键问题,我们提出了两种新的 DNN 水印方法,统一软标签扰动(USP)和定制软标签扰动(CSP) ,前者提供了检测模型盗窃的稳健水印,后者能够在 DNN 模型的每个副本中嵌入不同的水印,从而实现定制水印。实验结果表明,我们提出的 USP 和 CSP 能够抵抗所有的水印去除攻击,特别是对于蒸馏攻击,并且该 CSP 能够实现非常有前景的水印定制能力,明显优于其他最先进的基线。 code 0
Incomplete Graph Learning via Attribute-Structure Decoupled Variational Auto-Encoder Xinke Jiang, Zidi Qin, Jiarong Xu, Xiang Ao Inst Intelligent Comp Technol, Suzhou, Peoples R China; Fudan Univ, Shanghai, Peoples R China; Chinese Acad Sci, Univ Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China Graph Neural Networks (GNNs) conventionally operate under the assumption that node attributes are entirely observable. Their performance notably deteriorates when confronted with incomplete graphs due to the inherent message-passing mechanisms. Current solutions either employ classic imputation techniques or adapt GNNs to tolerate missed attributes. However, their ability to generalize is impeded especially when dealing with high rates of missing attributes. To address this, we harness the representations of the essential views on graphs, attributes and structures, into a common shared latent space, ensuring robust tolerance even at high missing rates. Our proposed neural model, named ASD-VAE, parameterizes such space via a coupled-and-decoupled learning procedure, reminiscent of brain cognitive processes and multimodal fusion. Initially, ASD-VAE separately encodes attributes and structures, generating representations for each view. A shared latent space is then learned by maximizing the likelihood of the joint distribution of different view representations through coupling. Then, the shared latent space is decoupled into separate views, and the reconstruction loss of each view is calculated. Finally, the missing values of attributes are imputed from this learned latent space. In this way, the model offers enhanced resilience against skewed and biased distributions typified by missing information and subsequently brings benefits to downstream graph machine-learning tasks. Extensive experiments conducted on four typical real-world incomplete graph datasets demonstrate the superior performance of ASD-VAE against the state-of-the-art. 
图神经网络(GNN)通常在假设节点属性完全可观察的情况下运行。由于固有的消息传递机制,当遇到不完整的图时,它们的性能明显下降。当前的解决方案要么使用经典的插补技术,要么使 GNN 能够容忍缺失的属性。然而,他们的概括能力受到阻碍,特别是当处理高比率的缺失属性。为了解决这个问题,我们利用关于图、属性和结构的基本视图的表示进入一个共享的潜在空间,确保即使在高丢失率下也能保持强大的容忍度。我们提出的神经模型,命名为 ASD-VAE,通过耦合和解耦的学习过程参数化这样的空间,让人想起大脑认知过程和多模态融合。最初,ASD-VAE 分别对属性和结构进行编码,为每个视图生成表示。然后通过耦合最大化不同视图表示的联合分布的可能性来学习共享潜空间。然后,将共享潜空间解耦为独立的视图,并计算每个视图的重构损失。最后,从这个学习的潜在空间推算出缺失的属性值。通过这种方式,该模型提供了增强的弹性对倾斜和有偏见的分布典型缺少信息,并随后带来的好处,下游图形机器学习任务。在四个典型的真实世界不完整图形数据集上进行的大量实验表明,ASD-VAE 具有比最先进的图形数据集更好的性能。 code 0
Source Free Graph Unsupervised Domain Adaptation Haitao Mao, Lun Du, Yujia Zheng, Qiang Fu, Zelin Li, Xu Chen, Shi Han, Dongmei Zhang Graph Neural Networks (GNNs) have achieved great success on a variety of tasks with graph-structural data, among which node classification is an essential one. Unsupervised Graph Domain Adaptation (UGDA) shows its practical value of reducing the labeling cost for node classification. It leverages knowledge from a labeled graph (i.e., source domain) to tackle the same task on another unlabeled graph (i.e., target domain). Most existing UGDA methods heavily rely on the labeled graph in the source domain. They utilize labels from the source domain as the supervision signal and are jointly trained on both the source graph and the target graph. However, in some real-world scenarios, the source graph is inaccessible because of either unavailability or privacy issues. Therefore, we propose a novel scenario named Source Free Unsupervised Graph Domain Adaptation (SFUGDA). In this scenario, the only information we can leverage from the source domain is the well-trained source model, without any exposure to the source graph and its labels. As a result, existing UGDA methods are not feasible anymore. To address the non-trivial adaptation challenges in this practical scenario, we propose a model-agnostic algorithm for domain adaptation to fully exploit the discriminative ability of the source model while preserving the consistency of structural proximity on the target graph. We prove the effectiveness of the proposed algorithm both theoretically and empirically. The experimental results on four cross-domain tasks show consistent improvements of the Macro-F1 score up to 0.17. 
图形神经网络(GNN)在处理图形结构数据的各种任务上取得了巨大的成功,其中节点分类是其中的一个重要内容。无监督图域自适应算法(UGDA)在降低节点分类的标记代价方面具有一定的实用价值。它利用来自标记图(即源域)的知识来处理另一个未标记图(即目标域)上的相同任务。大多数现有的 UGDA 方法严重依赖于源域中的标记图。它们利用来自源域的标签作为监督信号,并在源图和目标图上联合训练。但是,在一些真实场景中,源图是不可访问的,这是由于不可用性或隐私问题。因此,我们提出了一种新的场景称为源自由无监督图域适应(SFUGDA)。在这个场景中,我们可以从源域中利用的唯一信息是经过良好训练的源模型,而不需要暴露于源图及其标签。因此,现有的 UGDA 方法不再可行。为了解决这一实际场景中的非平凡自适应问题,我们提出了一种领域自适应的模型无关算法,以充分利用源模型的鉴别能力,同时保持目标图上结构接近度的一致性。从理论和实验两方面证明了该算法的有效性。在四个跨领域任务上的实验结果表明,宏观 F1得分一致提高到0.17。 code 0
PhoGAD: Graph-based Anomaly Behavior Detection with Persistent Homology Optimization Ziqi Yuan, Haoyi Zhou, Tianyu Chen, Jianxin Li A multitude of toxic online behaviors, ranging from network attacks to anonymous traffic and spam, have severely disrupted the smooth operation of networks. Due to the inherent sender-receiver nature of network behaviors, graph-based frameworks are commonly used for detecting anomalous behaviors. However, in real-world scenarios, the boundary between normal and anomalous behaviors tends to be ambiguous. The local heterophily of graphs interferes with the detection, and existing methods based on nodes or edges introduce unwanted noise into representation results, thereby impacting the effectiveness of detection. To address these issues, we propose PhoGAD, a graph-based anomaly detection framework. PhoGAD leverages persistent homology optimization to clarify behavioral boundaries. Building upon this, the weights of adjacent edges are designed to mitigate the effects of local heterophily. Subsequently, to tackle the noise problem, we conduct a formal analysis and propose a disentangled representation-based explicit embedding method, ultimately achieving anomaly behavior detection. Experiments on intrusion, traffic, and spam datasets verify that PhoGAD has surpassed the performance of state-of-the-art (SOTA) frameworks in detection efficacy. Notably, PhoGAD demonstrates robust detection even with diminished anomaly proportions, highlighting its applicability to real-world scenarios. The analysis of persistent homology demonstrates its effectiveness in capturing the topological structure formed by normal edge features. Additionally, ablation experiments validate the effectiveness of the innovative mechanisms integrated within PhoGAD. 
从网络攻击到匿名流量和垃圾邮件,大量有毒的网络行为严重扰乱了网络的正常运行。由于网络行为固有的发送方-接收方特性,基于图的框架通常用于检测异常行为。然而,在真实的场景中,正常行为和异常行为之间的界限往往是模糊的。图的局部异质性干扰检测,现有的基于节点或边的检测方法在表示结果中引入了不必要的噪声,影响了检测的有效性。为了解决这些问题,我们提出了 PhogAD,一个基于图形的异常检测框架。PhoGAD 利用持久同源优化来澄清行为边界。在此基础上,设计相邻边的权值以减轻局部异质性的影响。随后,针对噪声问题,进行了形式化分析,提出了一种基于分离表示的显式嵌入方法,最终实现了异常行为检测。在入侵、流量和垃圾邮件数据集上的实验证明,PhoGAD 在检测效率方面已经超过了最先进(State-of-art,SOTA)框架的性能。值得注意的是,PhoGAD 即使在异常比例减少的情况下仍然表现出强大的检测能力,突出了它对真实场景的适用性。对持久同调性的分析表明,该方法能够有效地捕获由正常边缘特征构成的拓扑结构。此外,烧蚀实验验证了集成在 PhoGAD 中的创新机制的有效性。 code 0
The Devil is in the Data: Learning Fair Graph Neural Networks via Partial Knowledge Distillation Yuchang Zhu, Jintang Li, Liang Chen, Zibin Zheng Graph neural networks (GNNs) are being increasingly used in many high-stakes tasks, and as a result, their fairness has recently drawn growing attention. GNNs have been shown to be unfair as they tend to make discriminatory decisions toward certain demographic groups, divided by sensitive attributes such as gender and race. While recent works have been devoted to improving their fairness performance, they often require accessible demographic information. This greatly limits their applicability in real-world scenarios due to legal restrictions. To address this problem, we present a demographic-agnostic method to learn fair GNNs via knowledge distillation, namely FairGKD. Our work is motivated by the empirical observation that training GNNs on partial data (i.e., only node attributes or topology data) can improve their fairness, albeit at the cost of utility. To make a balanced trade-off between fairness and utility performance, we employ a set of fairness experts (i.e., GNNs trained on different partial data) to construct the synthetic teacher, which distills fairer and informative knowledge to guide the learning of the GNN student. Experiments on several benchmark datasets demonstrate that FairGKD, which does not require access to demographic information, improves the fairness of GNNs by a large margin while maintaining their utility. 图形神经网络(GNN)在许多高风险任务中的应用越来越广泛,其公平性也越来越受到人们的关注。GNN 被证明是不公平的,因为他们倾向于对某些人口群体作出歧视性的决定,除以性别和种族等敏感特征。虽然最近的工作致力于改善他们的公平性表现,他们往往需要可访问的人口统计信息。由于法律限制,这极大地限制了它们在现实世界场景中的适用性。为了解决这个问题,我们提出了一种通过知识提取来学习公平 GNN 的人口不可知方法,即 FairGKD。我们的工作是基于实证观察,即部分数据(即,只有节点属性或拓扑数据)训练 GNN 可以提高它们的公平性,尽管代价是效用。为了在公平和效用绩效之间做出平衡的权衡,我们雇佣了一组公平专家(即,接受不同部分数据训练的 GNN)来构建合成教师,它提取更公平和信息丰富的知识来指导 GNN 学生的学习。在几个基准数据集上的实验表明,不需要获取人口统计信息的 FairGKD 在保持效用的同时,显著提高了 GNN 的公平性。 code 0
Dance with Labels: Dual-Heterogeneous Label Graph Interaction for Multi-intent Spoken Language Understanding Zhihong Zhu, Xuxin Cheng, Hongxiang Li, Yaowei Li, Yuexian Zou Peking Univ, Sch ECE, Beijing, Peoples R China Multi-intent spoken language understanding (SLU) has garnered increasing attention since it can handle complex utterances expressing multiple intents in real-world scenarios. However, existing joint models are disturbed by label statistical frequencies, or adopt homogeneous graphs to capture interactions between the different types (e.g., intent and slot) of label nodes, thereby limiting the performance. To overcome these limitations, we propose Dual Heterogeneous Graph Label Interaction for multi-intent SLU, named DHLG. Concretely, we propose a global static heterogeneous label graph interaction layer to model both intra- and inter-label statistical dependencies across the entire training corpus. Based on this, we introduce a local dynamic heterogeneous label graph layer to further facilitate adaptive interactions between intents and slots for each utterance. Extensive experiments and analyses on two widely-used benchmark datasets demonstrate the superiority of our proposed DHLG over state-of-the-art methods. 多意图口语理解(SLU)由于能够在现实场景中处理表达多种意图的复杂话语而受到越来越多的关注。然而,现有的联合模型受到标签统计频率的干扰,或采用同质图来捕捉标签节点的不同类型(例如意图和时隙)之间的相互作用,从而限制了性能。为了克服这些限制,我们提出了针对多意图 SLU 的双异构图标交互,命名为 DHLG。具体来说,我们提出了一个全局静态异构标签图交互层来模拟整个训练语料库中标签内和标签间的统计依赖关系。在此基础上,我们引入了一个局部动态异构标签图层,以进一步促进每个语句的意图和时隙之间的自适应交互。对两个广泛使用的基准数据集的大量实验和分析表明,我们提出的 DHLG 方法优于最先进的方法。 code 0
Wildfire: A Twitter Social Sensing Platform for Layperson Zeyu Zhang, Zhengyuan Zhu, Haiqi Zhang, Foram Patel, Josue Caraballo, Patrick Hennecke, Chengkai Li Univ Texas Arlington, Arlington, TX 76019 USA We present Wildfire, an innovative social sensing platform designed for laypersons. The goal is to support users in conducting social sensing tasks using Twitter data without programming and data analytics skills. Existing open-source and commercial social sensing tools only support data collection using simple keyword-based or account-based search. On the contrary, Wildfire employs a heuristic graph exploration method to selectively expand the collected tweet-account graph in order to further retrieve more task-relevant tweets and accounts. This approach allows for the collection of data to support complex social sensing tasks that cannot be met with a simple keyword search. In addition, Wildfire provides a range of analytic tools, such as text classification, topic generation, and entity recognition, which can be crucial for tasks such as trend analysis. The platform also provides a web-based user interface for creating and monitoring tasks, exploring collected data, and performing analytics. 我们介绍 Wildfire,一个为外行人设计的创新的社会感知平台。其目标是支持用户在没有编程和数据分析技能的情况下使用 Twitter 数据执行社会感应任务。现有的开源和商业社会感应工具只支持使用简单的基于关键字或基于帐户的搜索进行数据收集。相反,Wildfire 采用启发式图形搜索方法,有选择地扩展收集到的 tweet 帐户图,以进一步检索更多与任务相关的 tweet 和帐户。这种方法允许收集数据,以支持复杂的社会感应任务,不能满足一个简单的关键字搜索。此外,Wildfire 还提供了一系列分析工具,如文本分类、主题生成和实体识别,这些工具对于趋势分析等任务至关重要。该平台还提供了一个基于网络的用户界面,用于创建和监视任务、探索收集的数据和执行分析。 code 0
Bridging Text Data and Graph Data: Towards Semantics and Structure-aware Knowledge Discovery Bowen Jin, Yu Zhang, Sha Li, Jiawei Han code 0
Practical Bandits: An Industry Perspective Bram van den Akker, Olivier Jeunen, Ying Li, Ben London, Zahra Nazari, Devesh Parekh The bandit paradigm provides a unified modeling framework for problems that require decision-making under uncertainty. Because many business metrics can be viewed as rewards (a.k.a. utilities) that result from actions, bandit algorithms have seen a large and growing interest from industrial applications, such as search, recommendation and advertising. Indeed, with the bandit lens comes the promise of direct optimisation for the metrics we care about. Nevertheless, the road to successfully applying bandits in production is not an easy one. Even when the action space and rewards are well-defined, practitioners still need to make decisions regarding multi-arm or contextual approaches, on- or off-policy setups, delayed or immediate feedback, myopic or long-term optimisation, etc. To make matters worse, industrial platforms typically give rise to large action spaces in which existing approaches tend to break down. The research literature on these topics is broad and vast, but this can overwhelm practitioners, whose primary aim is to solve practical problems, and therefore need to decide on a specific instantiation or approach for each project. This tutorial will take a step towards filling that gap between the theory and practice of bandits. Our goal is to present a unified overview of the field and its existing terminology, concepts and algorithms -- with a focus on problems relevant to industry. We hope our industrial perspective will help future practitioners who wish to leverage the bandit paradigm for their application. 
土匪模型为需要在不确定条件下进行决策的问题提供了统一的建模框架。由于许多业务指标可以被视为行动带来的回报(又称公用事业) ,盗贼算法已经引起了工业应用程序(如搜索、推荐和广告)越来越大的兴趣。事实上,随着“强盗镜头”的出现,我们对所关心的指标进行直接优化的希望也随之而来。然而,成功地把土匪运用到生产中并不是一条容易的道路。即使行动空间和奖励是明确定义的,从业者仍然需要做出决定,关于多臂或上下文方法,在-或非政策设置,延迟或即时反馈,短视或长期优化,等等。更糟糕的是,工业平台通常会产生巨大的行动空间,其中现有的方法往往会崩溃。关于这些主题的研究文献是广泛和浩瀚的,但这可能会压倒从业人员,他们的主要目的是解决实际问题,因此需要决定具体的实例或方法为每个项目。本教程将朝着填补土匪的理论和实践之间的差距迈出一步。我们的目标是提出该领域及其现有术语、概念和算法的统一概述,重点是与行业相关的问题。我们希望我们的行业视角能够帮助那些希望利用强盗范例来应用它们的未来从业者。 code 0
Automated Tailoring of Large Language Models for Industry-Specific Downstream Tasks Shreya Saxena, Siva Prasad, Muneeswaran I, Advaith Shankar, Varun V, Saisubramaniam Gopalakrishnan, Vishal Vaddina Quantiphi Analyt Solut Pvt Ltd, Appl Res, Bengaluru, India Foundational Large Language Models (LLMs) are pre-trained generally on huge corpora encompassing broad subjects to become versatile and generalize to future downstream tasks. However, their effectiveness falls short when dealing with tasks that are highly specialized to a specific use case. Even when adopting current prompt engineering techniques like few-shot or Chain-of-Thought reasoning prompts, the required level of results is not yet achievable directly with foundational models alone. The alternative approach is to fine-tune the LLM, but a common challenge is the limited availability of task-specific training data. In this talk, we will introduce an end-to-end automated framework to tailor a model to specific downstream tasks for an industry where the first step is to generate task-specific custom data from unstructured documents. Next, we will discuss our optimized distributed training pipeline for fine-tuning LLMs on the generated data. Finally, we will provide an overview of the statistical metrics and customized metrics we employ for assessing the performance of the fine-tuned LLM. This automated framework alleviates the burden of manual adjustments and streamlines the process to provide a model that is fully customized to suit the unique requirements of any specific business use case. 基础大型语言模型(LLM)通常是在包含广泛主题的庞大语料库上进行预先培训,以使其变得多才多艺,并推广到未来的下游任务。然而,当处理特定用例的高度专门化的任务时,它们的有效性不足。即使采用了当前的快速工程技术,如少数镜头或思维链推理提示,所需的结果水平还不能直接用基础模型实现。另一种方法是对 LLM 进行微调,但一个常见的挑战是特定于任务的训练数据的可用性有限。在这个演讲中,我们将介绍一个端到端的自动化框架,为行业的特定下游任务定制模型,其中第一步是从非结构化文档生成特定于任务的自定义数据。接下来,我们将讨论优化的分布式训练流水线,以便在生成的数据上微调 LLM。最后,我们将提供用于评估微调 LLM 性能的统计指标和定制指标的概述。这个自动化框架减轻了手工调整的负担,并简化了流程,以提供一个完全定制的模型,以适应任何特定业务用例的独特需求。 code 0
Unlocking Human Curiosity Elizabeth Reid code 0
Unveiling AI-Driven Collective Action for a Worker-Centric Future Saiph Savage code 0
What I Learned from Spending a Dozen Years in the Dark Web Nicolas Christin code 0
Professional Network Matters: Connections Empower Person-Job Fit Hao Chen, Lun Du, Yuxuan Lu, Qiang Fu, Xu Chen, Shi Han, Yanbin Kang, Guangming Lu, Zi Li Online recruitment platforms typically employ Person-Job Fit models in the core service that automatically match suitable job seekers with appropriate job positions. While existing works leverage historical or contextual information, they often disregard a crucial aspect: job seekers' social relationships in professional networks. This paper emphasizes the importance of incorporating professional networks into the Person-Job Fit model. Our innovative approach consists of two stages: (1) defining a Workplace Heterogeneous Information Network (WHIN) to capture heterogeneous knowledge, including professional connections and pre-training representations of various entities using a heterogeneous graph neural network; (2) designing a Contextual Social Attention Graph Neural Network (CSAGNN) that supplements users' missing information with professional connections' contextual information. We introduce a job-specific attention mechanism in CSAGNN to handle noisy professional networks, leveraging pre-trained entity representations from WHIN. We demonstrate the effectiveness of our approach through experimental evaluations conducted across three real-world recruitment datasets from LinkedIn, showing superior performance compared to baseline models. 在线招聘平台通常在核心服务中采用人员-工作匹配模式,自动为合适的求职者匹配合适的工作岗位。虽然现有的工作利用历史或背景信息,他们往往忽视了一个关键的方面: 求职者的社会关系在专业网络。本文强调了将职业网络融入人-工匹配模型的重要性。我们的创新方法包括两个阶段: (1)定义一个工作场所异构信息网络(WHIN)来捕获异构知识,包括使用异构图形神经网络的专业连接和各种实体的预训练表示; (2)设计一个上下文社会注意图形神经网络(CSAGNN) ,用专业连接的上下文信息补充用户缺失的信息。我们在 CSAGNN 中引入了一种针对具体工作的注意机制来处理嘈杂的专业网络,利用来自 WHIN 的预先训练的实体表示。我们通过对来自 LinkedIn 的三个实际招聘数据集进行实验性评估,展示了我们的方法的有效性,与基线模型相比表现出更好的性能。 code 0
Empathetic Response Generation with Relation-aware Commonsense Knowledge Changyu Chen, Yanran Li, Chen Wei, Jianwei Cui, Bin Wang, Rui Yan Renmin Univ China, Gaoling Sch AI GSAI, Beijing, Peoples R China; Xiaomi AI Lab, Beijing, Peoples R China The development of AI in mental health is a growing field with potential global impact. Machine agents need to perceive users' mental states and respond empathically. Since mental states are often latent and implicit, building such chatbots requires both knowledge learning and knowledge utilization. Our work contributes to this by developing a chatbot that aims to recognize and empathetically respond to users' mental states. We introduce a Conditional Variational Autoencoders (CVAE)-based model that utilizes relation-aware commonsense knowledge to generate responses. This model, while not a replacement for professional mental health support, demonstrates promise in offering informative and empathetic interactions in a controlled environment. On the dataset EmpatheticDialogues, we compare with several SOTA methods and empirically validate the effectiveness of our approach on response informativeness and empathy exhibition. Detailed analysis is also given to demonstrate the learning capability as well as model interpretability. Our code is accessible at http://github.com/ChangyuChen347/COMET-VAE. 人工智能在心理健康领域的发展是一个具有潜在全球影响的新兴领域。机器代理人需要感知用户的心理状态,并以移情的方式作出反应。由于心理状态往往是潜在的和隐含的,建立这样的聊天机器人既需要知识学习和知识利用。我们的工作有助于开发一个聊天机器人,旨在识别和同情地回应用户的心理状态。我们介绍了一个基于条件变分自动编码器(CVAE)的模型,该模型利用关系感知的常识知识来产生响应。这种模式,虽然不能取代专业的心理健康支持,表明在控制环境中提供信息和移情互动的前景。在同理心对话的数据集上,我们比较了几种 SOTA 方法,并通过实验验证了该方法在反应信息量和同理心表现上的有效性。文中还对模型的学习能力和模型的可解释性进行了详细的分析。我们的代码可以在 http://github.com/changyuchen347/comet-vae 访问。 code 0
Exploiting Duality in Open Information Extraction with Predicate Prompt Zhen Chen, Jingping Liu, Deqing Yang, Yanghua Xiao, Huimin Xu, Zongyu Wang, Rui Xie, Yunsen Xian Open information extraction (OpenIE) aims to extract the schema-free triplets in the form of (subject, predicate, object) from a given sentence. Compared with general information extraction (IE), OpenIE poses more challenges for the IE models, especially when multiple complicated triplets exist in a sentence. To extract these complicated triplets more effectively, in this paper we propose a novel generative OpenIE model, namely DualOIE, which achieves a dual task at the same time as extracting some triplets from the sentence, i.e., converting the triplets into the sentence. Such dual task encourages the model to correctly recognize the structure of the given sentence and thus is helpful to extract all potential triplets from the sentence. Specifically, DualOIE extracts the triplets in two steps: 1) first extracting a sequence of all potential predicates, 2) then using the predicate sequence as a prompt to induce the generation of triplets. Our experiments on two benchmarks and our dataset constructed from Meituan demonstrate that DualOIE achieves the best performance among the state-of-the-art baselines. Furthermore, the online A/B test on Meituan platform shows that 0.93% improvement of QV-CTR and 0.56% improvement of UV-CTR have been obtained when the triplets extracted by DualOIE were leveraged in Meituan's search system. 开放式信息抽取(OpenIE)旨在从一个给定的句子中以主语、谓语、宾语的形式提取出与图式无关的三联词。与一般信息抽取(IE)相比,OpenIE 对 IE 模型提出了更多的挑战,尤其是当一个句子中存在多个复杂的三联体时。为了更有效地提取这些复杂的三联体,本文提出了一种新的生成式 OpenIE 模型—— DualOIE,该模型在从句子中提取三联体的同时完成双重任务,即将三联体转换为句子。这种双重任务鼓励模型正确识别给定句子的结构,因此有助于从句子中提取所有潜在的三联体。具体来说,DualOIE 通过两个步骤提取三联体: 1)首先提取所有潜在谓词的序列,2)然后使用谓词序列作为提示来诱导三联体的生成。我们在两个基准上的实验和从美团构建的数据集表明,DualoIE 在最先进的基准中取得了最好的性能。此外,在美团平台上进行的在线 A/B 测试显示,在美团搜索系统中利用 DualOIE 提取的三联体,QV-CTR 和 UV-CTR 分别提高了0.93% 和0.56% 。 code 0
Overlapping and Robust Edge-Colored Clustering in Hypergraphs Alex Crane, Brian Lavallee, Blair D. Sullivan, Nate Veldt A recent trend in data mining has explored (hyper)graph clustering algorithms for data with categorical relationship types. Such algorithms have applications in the analysis of social, co-authorship, and protein interaction networks, to name a few. Many such applications naturally have some overlap between clusters, a nuance which is missing from current combinatorial models. Additionally, existing models lack a mechanism for handling noise in datasets. We address these concerns by generalizing Edge-Colored Clustering, a recent framework for categorical clustering of hypergraphs. Our generalizations allow for a budgeted number of either (a) overlapping cluster assignments or (b) node deletions. For each new model we present a greedy algorithm which approximately minimizes an edge mistake objective, as well as bicriteria approximations where the second approximation factor is on the budget. Additionally, we address the parameterized complexity of each problem, providing FPT algorithms and hardness results. 数据挖掘最近的一个趋势是探索具有分类关系类型的数据的(超)图聚类算法。这样的算法在分析社会、合著者和蛋白质相互作用网络等方面都有应用。许多这样的应用程序在集群之间自然有一些重叠,这是当前组合模型所缺少的细微差别。此外,现有的模型缺乏处理数据集中噪声的机制。我们通过推广边彩色聚类来解决这些问题,边彩色聚类是一个最新的超图分类聚类框架。我们的一般化允许预算中的数量(a)重叠的集群分配或(b)节点删除。对于每一个新的模型,我们提出了一个贪婪算法,近似最小化边缘错误的目标,以及双准则近似,其中第二近似因子的预算。此外,我们解决每个问题的参数化复杂度,提供 FPT 算法和硬度结果。 code 0
TemporalMed: Advancing Medical Dialogues with Time-Aware Responses in Large Language Models Yuyan Chen, Jin Zhao, Zhihao Wen, Zhixu Li, Yanghua Xiao Fudan Univ, Sch Comp Sci, Shanghai Key Lab Data Sci, Shanghai, Peoples R China; Singapore Management Univ, Singapore, Singapore Medical dialogue models predominantly emphasize generating coherent and clinically accurate responses. However, in many clinical scenarios, time plays a pivotal role, often dictating subsequent patient management and interventions. Recognizing the latent importance of temporal dynamics, this paper introduces a novel dimension to medical dialogues: timestamps. We advocate that the integration of time-sensitive directives can profoundly impact medical advice, using an illustrative example of post-surgery care with and without timestamps. Our contributions are three-fold: Firstly, we highlight the intrinsic significance of timestamps in medical conversations, marking a paradigm shift in dialogue modeling. Secondly, we present an innovative dataset and framework explicitly tailored for time-stamped medical dialogues, facilitating the model to not only provide medical counsel but also chronologically outline care regimens. Lastly, empirical evaluations indicate our method's proficiency in time-stamped tasks and reveal an uptick in performance in broader medical Q&A domains. Through our endeavors, we aspire to set new benchmarks in patient-centric and time-sensitive medical dialogue systems. 医学对话模型主要强调产生连贯和临床准确的反应。然而,在许多临床情况下,时间起着关键作用,往往决定随后的病人管理和干预。认识到时间动力学的潜在重要性,本文介绍了医学对话的一个新的维度: 时间戳。我们主张,时间敏感指示的整合可以深刻影响医疗建议,使用一个例子说明手术后护理有和没有时间戳。我们的贡献有三个方面: 首先,我们强调了时间戳在医学对话中的内在意义,标志着对话建模的范式转变。其次,我们提出了一个创新的数据集和框架,明确定制的时间戳医疗对话,促进模型不仅提供医疗咨询,而且按时间顺序概述护理方案。最后,实证评估表明,我们的方法在时间戳任务的熟练程度,并揭示了在更广泛的医疗问答领域的表现上升。通过我们的努力,我们渴望在以病人为中心和时间敏感的医疗对话系统中建立新的基准。 code 0
CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking Shohreh Deldari, Dimitris Spathis, Mohammad Malekzadeh, Fahim Kawsar, Flora D. Salim, Akhil Mathur Limited availability of labeled data for machine learning on multimodal time-series extensively hampers progress in the field. Self-supervised learning (SSL) is a promising approach to learning data representations without relying on labels. However, existing SSL methods require expensive computations of negative pairs and are typically designed for single modalities, which limits their versatility. We introduce CroSSL (Cross-modal SSL), which puts forward two novel concepts: masking intermediate embeddings produced by modality-specific encoders, and their aggregation into a global embedding through a cross-modal aggregator that can be fed to down-stream classifiers. CroSSL allows for handling missing modalities and end-to-end cross-modal learning without requiring prior data preprocessing for handling missing inputs or negative-pair sampling for contrastive learning. We evaluate our method on a wide range of data, including motion sensors such as accelerometers or gyroscopes and biosignals (heart rate, electroencephalograms, electromyograms, electrooculograms, and electrodermal) to investigate the impact of masking ratios and masking strategies for various data types and the robustness of the learned representations to missing data. Overall, CroSSL outperforms previous SSL and supervised benchmarks using minimal labeled data, and also sheds light on how latent masking can improve cross-modal learning. 
Our code is open-sourced at https://github.com/dr-bell/CroSSL 多模态时间序列的机器学习标记数据的有限可用性广泛地阻碍了该领域的进展。自监督学习(SSL)是一种很有前途的不依赖于标签的数据表示学习方法。然而,现有的 SSL 方法需要对负数对进行昂贵的计算,并且通常针对单模式设计,这限制了它们的通用性。我们提出了 CroSSL (Cross-modal SSL),其中包含两个新概念: 掩盖特定模态编码器产生的中间嵌入,以及通过一个可以提供给下游分类器的跨模态聚合器将其聚合为一个全局嵌入。CroSSL 允许处理缺失模式和端到端跨模式学习,而不需要事先数据预处理来处理缺失输入或负对抽样来进行对比学习。我们在广泛的数据上评估我们的方法,包括运动传感器如加速度计或陀螺仪和生物信号(心率,脑电图,肌电图,眼电图和皮肤电图) ,以研究掩蔽比率和掩蔽策略对各种数据类型的影响以及学习表示对缺失数据的鲁棒性。总的来说,使用最少的标记数据,CroSSL 的性能优于以前的 SSL 和监督基准测试,并且还揭示了潜在掩蔽如何改善跨模态学习。我们的代码是开源的 https://github.com/dr-bell/crossl code 0
Guardian: Guarding against Gradient Leakage with Provable Defense for Federated Learning Mingyuan Fan, Yang Liu, Cen Chen, Chengyu Wang, Minghui Qiu, Wenmeng Zhou ByteDance, Shanghai, Peoples R China; Alibaba Grp, Hangzhou, Peoples R China; East China Normal Univ, Sch Data Sci & Engn, Shanghai, Peoples R China; Xidian Univ, Xian, Shaanxi, Peoples R China Federated learning is a privacy-focused learning paradigm, which trains a global model with gradients uploaded from multiple participants, circumventing explicit exposure of private data. However, previous research of gradient leakage attacks suggests that gradients alone are sufficient to reconstruct private data, rendering the privacy protection mechanism of federated learning unreliable. Existing defenses commonly craft transformed gradients based on ground-truth gradients to obfuscate the attacks, but often are less capable of maintaining good model performance together with satisfactory privacy protection. In this paper, we propose a novel yet effective defense framework named guarding against gradient leakage (Guardian) that produces transformed gradients by jointly optimizing two theoretically-derived metrics associated with gradients for performance maintenance and privacy protection. In this way, the transformed gradients produced via Guardian can achieve minimal privacy leakage in theory with the given performance maintenance level. Moreover, we design an ingenious initialization strategy for faster generation of transformed gradients to enhance the practicality of Guardian in real-world applications, while demonstrating theoretical convergence of Guardian to the performance of the global model. Extensive experiments on various tasks show that, without sacrificing much accuracy, Guardian can effectively defend state-of-the-art gradient leakage attacks, compared with the slight effects of baseline defense approaches. 
联合学习是一种以隐私为中心的学习范式,它通过从多个参与者上传的梯度来训练一个全球模型,从而避免显性暴露私人数据。然而,以往对梯度泄漏攻击的研究表明,梯度本身就足以重建私有数据,使得联邦学习的隐私保护机制变得不可靠。现有的防御系统通常基于地面真实度梯度进行变换,以模糊攻击,但往往不能保持良好的模型性能和令人满意的隐私保护。在本文中,我们提出了一个新颖而有效的防御框架,即防止梯度泄漏(Guardian) ,该框架通过联合优化两个理论导出的与梯度相关的指标,从而产生转换的梯度,用于性能维护和隐私保护。这样,在给定的性能维护水平下,通过 Guardian 产生的变换梯度可以在理论上实现最小的隐私泄漏。此外,我们设计了一个巧妙的初始化策略,可以更快地生成转换后的梯度,以提高 Guardian 在实际应用中的实用性,同时证明了 Guardian 的理论收敛性与全局模型的性能。在各种任务上的大量实验表明,与基线防御方法的轻微效果相比,Guardian 可以在不牺牲太多精确性的情况下有效防御最先进的梯度泄漏攻击。 code 0
TTC-QuAli: A Text-Table-Chart Dataset for Multimodal Quantity Alignment Haoyu Dong, Haochen Wang, Anda Zhou, Yue Hu Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China; Univ Edinburgh, Beijing, Peoples R China; Peking Univ, Beijing, Peoples R China In modern documents, numerical information is often presented using multimodal formats such as text, tables, and charts. However, the heterogeneity of these sources poses a challenge for machines attempting to jointly read and understand the numerical semantics conveyed through text, tables, and charts. In this paper, we introduce a multimodal dataset called Text-Table-Chart Quantity Alignment (TTC-QuAli). This dataset is designed to facilitate a new task that involves linking related quantities across text, tables, and charts. TTC-QuAli is a comprehensive dataset that contains 4,498 quantities in text, aligned with 1,086 chart images and 1,503 tables from real-world statistical reports. It is the first dataset to provide high-quality annotations for linking quantities across multiple modalities, and it includes challenging composite (aggregated/calculated) quantity linking. To address the challenge of bridging representation gaps between different modalities and capturing their shared contextual semantic meaning, we introduce ConTTC, a novel transformer-based cross-modal contrastive learning architecture. This is the first architecture to jointly model text, tables, and charts, and contrastive learning is employed for multimodal quantity linking towards unified representation learning. Our experiments demonstrate that TTC-QuAli presents a significant challenge for existing baselines and serves as a valuable benchmark for future research. Experiment results show that ConTTC significantly outperforms all baseline methods. 
在现代文档中,数值信息通常使用文本、表格和图表等多模态格式表示。然而,这些来源的异质性给试图共同阅读和理解通过文本、表格和图表传达的数字语义的机器带来了挑战。本文介绍了一个多模态数据集 TTC-QuAli。这个数据集旨在促进一项新任务,该任务涉及跨文本、表格和图表链接相关数量。TTC-QuAli 是一个全面的数据集,包含4,498个数量的文本,与1,086个图表图像和1,503个来自真实世界统计报告的表格对齐。它是第一个提供高质量注释的数据集,用于跨多种模式的连接数量,它包括具有挑战性的组合(聚合/计算)数量连接。为了解决不同模式之间的表征差异以及获取它们共享的语境语义的问题,我们引入了一种新的基于转换器的跨模式对比学习架构 ConTTC。这是第一个联合建模文本、表格和图表的体系结构,对比学习被用于多模态数量连接到统一的表示学习。我们的实验表明,TTC-QuAli 对现有的基线提出了重大的挑战,并为未来的研究提供了有价值的基准。实验结果表明,ConTTC 方法的性能明显优于所有基线方法。 code 0
DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting Tianyu Fu, Chiyue Wei, Yu Wang, Rex Ying Subgraph counting is the problem of counting the occurrences of a given query graph in a large target graph. Large-scale subgraph counting is useful in various domains, such as motif counting for social network analysis and loop counting for money laundering detection on transaction networks. Recently, to address the exponential runtime complexity of scalable subgraph counting, neural methods are proposed. However, existing neural counting approaches fall short in three aspects. Firstly, the counts of the same query can vary from zero to millions on different target graphs, posing a much larger challenge than most graph regression tasks. Secondly, current scalable graph neural networks have limited expressive power and fail to efficiently distinguish graphs in count prediction. Furthermore, existing neural approaches cannot predict the occurrence position of queries in the target graph. Here we design DeSCo, a scalable neural deep subgraph counting pipeline, which aims to accurately predict the query count and occurrence position on any target graph after one-time training. Firstly, DeSCo uses a novel canonical partition and divides the large target graph into small neighborhood graphs. The technique greatly reduces the count variation while guaranteeing no missing or double-counting. Secondly, neighborhood counting uses an expressive subgraph-based heterogeneous graph neural network to accurately perform counting in each neighborhood. Finally, gossip propagation propagates neighborhood counts with learnable gates to harness the inductive biases of motif counts. DeSCo is evaluated on eight real-world datasets from various domains. It outperforms state-of-the-art neural methods with 137x improvement in the mean squared error of count prediction, while maintaining the polynomial runtime complexity. 
子图计数是在大目标图中计算给定查询图的出现次数的问题。大规模子图计数在各个领域都很有用,例如社交网络分析中的主题计数和交易网络中的洗钱检测中的循环计数。近年来,为了解决可伸缩子图计数的指数运行时复杂性问题,提出了神经网络方法。然而,现有的神经计数方法在三个方面存在不足。首先,在不同的目标图上,同一个查询的计数可能从零到数百万不等,这比大多数图形回归任务带来了更大的挑战。其次,现有的可扩展图形神经网络表达能力有限,无法有效地区分计数预测中的图形。此外,现有的神经网络方法不能预测目标图中查询的出现位置。本文设计了一个可扩展的神经深子图计数流水线 DeSCo,其目的是在一次性训练后准确预测任意目标图上的查询计数和出现位置。首先,DeSCo 使用一种新的规范划分方法,将大目标图划分为小邻域图。该技术大大减少了计数的变化,同时保证没有丢失或重复计数。其次,邻域计数采用基于表达子图的异构图神经网络对每个邻域进行精确计数。最后,八卦传播利用可学的门来传播邻域计数,以利用主题计数的归纳偏差。DeSCo 在来自不同领域的八个真实世界数据集上进行评估。它比最先进的神经学方法在计数预测均方差上提高了137倍,同时保持了多项式运行时的复杂性。 code 0
CausalMMM: Learning Causal Structure for Marketing Mix Modeling Chang Gong, Di Yao, Lei Zhang, Sheng Chen, Wenbin Li, Yueyang Su, Jingping Bi ; Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China In online advertising, marketing mix modeling (MMM) is employed to predict the gross merchandise volume (GMV) of brand shops and help decision-makers to adjust the budget allocation of various advertising channels. Traditional MMM methods leveraging regression techniques can fail in handling the complexity of marketing. Although some efforts try to encode the causal structures for better prediction, they have the strict restriction that causal structures are prior-known and unchangeable. In this paper, we define a new causal MMM problem that automatically discovers the interpretable causal structures from data and yields better GMV predictions. To achieve causal MMM, two essential challenges should be addressed: (1) Causal Heterogeneity. The causal structures of different kinds of shops vary a lot. (2) Marketing Response Patterns. Various marketing response patterns, i.e., carryover effect and shape effect, have been validated in practice. We argue that causal MMM needs to dynamically discover specific causal structures for different shops and that the predictions should comply with the prior known marketing response patterns. Thus, we propose CausalMMM, which integrates Granger causality in a variational inference framework to measure the causal relationships between different channels and predict the GMV with the regularization of both temporal and saturation marketing response patterns. Extensive experiments show that CausalMMM can not only achieve superior performance of causal structure learning on synthetic datasets with improvements of 5.7% to 7.1%, but also enhance the GMV prediction results on a representative E-commerce platform. 
在网络广告中,运用营销组合模型(MMM)预测品牌商店的总商品量(GMV) ,帮助决策者调整各种广告渠道的预算分配。利用回归技术的传统 MMM 方法可能无法处理营销的复杂性。尽管有些人试图对因果结构进行编码以便更好地进行预测,但他们受到因果结构是先验已知且不可变的这一严格限制。在本文中,我们定义了一个新的因果 MMM 问题,它可以自动地从数据中发现可解释的因果结构,并且产生更好的 GMV 预测。要实现因果 MMM,必须解决两个基本问题: (1)因果异质性。不同类型商店的因果关系结构差异很大。(2)市场反应模式。各种营销反应模式,即结转效应和形态效应,已在实践中得到验证。我们认为因果 MMM 需要动态地发现不同商店的特定因果结构,并且预测应该符合先前已知的营销反应模式。因此,我们提出了 CausalMMM,将格兰杰因果关系整合到一个变分推理框架中,以衡量不同渠道之间的因果关系,并通过时间和饱和营销反应模式的正则化来预测 GMV。大量的实验表明,CausalMMM 不仅在合成数据集上取得了优异的因果结构学习性能(提升5.7% 至7.1%),而且提高了在一个具有代表性的电子商务平台上的 GMV 预测结果。 code 0
SCAD: Subspace Clustering based Adversarial Detector Xinrong Hu, Wushuan Chen, Jie Yang, Yi Guo, Xun Yao, Bangchao Wang, Junping Liu, Ce Xu Western Sydney Univ, Sch Comp Data & Math Sci, Parramatta, NSW, Australia; Wuhan Text Univ, Sch Comp Sci & Artificial Intelligence, Wuhan, Hubei, Peoples R China; Univ Wollongong, Sch Comp & Informat Technol, Wollongong, NSW, Australia Adversarial examples pose significant challenges for Natural Language Processing (NLP) model robustness, often causing notable performance degradation. While various detection methods have been proposed with the aim of differentiating clean and adversarial inputs, they often require fine-tuning with ample data, which is problematic for low-resource scenarios. To alleviate this issue, a Subspace Clustering based Adversarial Detector (termed SCAD) is proposed in this paper, leveraging a union of subspaces to model the clean data distribution. Specifically, SCAD estimates feature distribution across semantic subspaces, assigning unseen examples to the nearest one for effective discrimination. The construction of semantic subspaces does not require many observations and is hence ideal for the low-resource setting. The proposed algorithm achieves detection results better than or competitive with previous state-of-the-art methods on a combination of three well-known text classification benchmarks and four attacking methods. Further empirical analysis suggests that SCAD effectively handles the low-resource setting where clean training data is limited. 对抗性示例对自然语言处理(NLP)模型的健壮性提出了重大挑战,常常导致显著的性能下降。虽然提出了各种检测方法,目的是区分清洁投入和对抗性投入,但这些方法往往需要利用充足的数据进行微调,这对于资源匮乏的情况是有问题的。为了解决这一问题,本文提出了一种基于子空间聚类的对抗检测器(SCAD) ,利用子空间的联合来建立干净的数据分布模型。具体来说,SCAD 估计特征在语义子空间中的分布,将看不见的例子分配给最近的一个以进行有效的区分。语义子空间的构造不需要很多观察,因此对于低资源设置是理想的。该算法结合了三种常用的文本分类基准和四种攻击方法,取得了比以往更好的检测效果。进一步的实证分析表明,SCAD 有效地缓解了清洁培训数据有限的低资源环境。 code 0
Capturing Temporal Node Evolution via Self-supervised Learning: A New Perspective on Dynamic Graph Learning Lingwen Liu, Guangqi Wen, Peng Cao, Jinzhu Yang, Weiping Li, Osmar R. Zaïane Univ Alberta, Alberta Machine Intelligence Inst, Edmonton, AB, Canada; Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Liaoning, Peoples R China; Peking Univ, Sch Software & Microelect, Beijing, Peoples R China Dynamic graphs play an important role in many fields like social relationship analysis, recommender systems and medical science, as graphs evolve over time. It is fundamental to capture the evolution patterns for dynamic graphs. Existing works mostly focus on constraining the temporal smoothness between neighboring snapshots but fail to capture sharp shifts, which can be beneficial for graph dynamics embedding. To solve this, we assume the evolution of dynamic graph nodes can be split into temporal shift embedding and temporal consistency embedding. Thus, we propose the Self-supervised Temporal-aware Dynamic Graph representation Learning framework (STDGL) for disentangling the temporal shift embedding from temporal consistency embedding via a well-designed auxiliary task from the perspectives of both node local and global connectivity modeling in a self-supervised manner, further enhancing the learning of interpretable graph representations and improving the performance of various downstream tasks. Extensive experiments on link prediction, edge classification and node classification tasks demonstrate that STDGL successfully learns the disentangled temporal shift and consistency representations. Furthermore, the results indicate significant improvements of our STDGL over the state-of-the-art methods, and appealing interpretability and transferability owing to the disentangled node representations. 
随着时间的推移,动态图在社会关系分析、推荐系统和医学科学等领域发挥着重要作用。捕捉动态图的演化模式是基础。现有的工作主要集中在限制相邻快照之间的时间平滑,但是未能捕捉到快速的移动,这有利于图动态嵌入。为了解决这个问题,我们假设动态图节点的演化可以分为时间移位嵌入和时间一致性嵌入。因此,我们提出了自监督时间感知动态图表示学习框架(STDGL) ,以自监督的方式从节点局部和全局连通性建模的角度,通过一个设计良好的辅助任务将时间移位嵌入与时间一致性嵌入分离,进一步提高可解释图表示的学习能力,改善各种下游任务的性能。在链路预测、边缘分类和节点分类任务上的大量实验表明,STDGL 成功地学习了分离时间漂移和一致性表示。此外,研究结果显示我们的 STDGL 比最先进的方法有显著的改进,并且由于分离的节点表示而具有吸引力的可解释性和可转移性。 code 0
Generative Models for Complex Logical Reasoning over Knowledge Graphs Yu Liu, Yanan Cao, Shi Wang, Qingyue Wang, Guanqun Bi Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China Answering complex logical queries over knowledge graphs (KGs) is a fundamental yet challenging task. Recently, query representation has been a mainstream approach to complex logical reasoning, making the target answer and query closer in the embedding space. However, there are still two limitations. First, prior methods model the query as a fixed vector, but ignore the uncertainty of relations on KGs. In fact, different relations may contain different semantic distributions. Second, traditional representation frameworks fail to capture the joint distribution of queries and answers, which can be learned by generative models that have the potential to produce more coherent answers. To alleviate these limitations, we propose a novel generative model, named DiffCLR, which exploits the diffusion model for complex logical reasoning to approximate query distributions. Specifically, we first devise a query transformation to convert logical queries into input sequences by dynamically constructing contextual subgraphs. Then, we integrate them into the diffusion model to execute a multi-step generative process, and a structure-enhanced self-attention is further designed for incorporating the structural features embodied in KGs. Experimental results on two benchmark datasets show our model effectively outperforms state-of-the-art methods, particularly in multi-hop chain queries with significant improvement. 
回答知识图表(KGs)上的复杂逻辑查询是一个基本但具有挑战性的任务。最近,查询表示已经成为复杂逻辑推理的主流方法,使得目标回答和查询在嵌入空间中更加接近。但是,仍然存在两个限制。首先,先验方法将查询建模为一个固定的向量,但忽略了 KG 上关系的不确定性。事实上,不同的关系可能包含不同的语义分布。其次,传统的表示框架未能捕捉到查询和答案的联合分布,这可以通过生成模型学习,这些模型有可能产生更一致的答案。为了减轻这些局限性,我们提出了一种新的生成模型,名为 DiffCLR,它利用复杂逻辑推理的扩散模型来近似查询分布。具体来说,我们首先设计一个查询转换,通过动态构造上下文子图将逻辑查询转换为输入序列。然后,我们将它们整合到扩散模型中,执行一个多步骤的生成过程,并进一步设计一个结构增强的自我注意,以整合 KG 所体现的结构特征。在两个基准数据集上的实验结果表明,该模型的性能优于目前最先进的方法,尤其是在多跳链查询方面有明显的改进。 code 0
A Linguistic Grounding-Infused Contrastive Learning Approach for Health Mention Classification on Social Media Usman Naseem, Jinman Kim, Matloob Khushi, Adam G. Dunn Brunel Univ, Dept Comp Sci, London, England; Univ Sydney, Sch Comp Sci, Sydney, Australia; Univ Sydney, Sch Med Sci, Sydney, Australia Social media users use disease and symptom words in different ways, including describing their personal health experiences figuratively or in other general discussions. The health mention classification (HMC) task aims to separate how people use terms, which is important in public health applications. Existing HMC studies address this problem using pretrained language models (PLMs). However, the remaining gaps in the area include the need for linguistic grounding, the requirement for large volumes of labelled data, and that solutions are often only tested on Twitter or Reddit, which provides limited evidence of the transportability of models. To address these gaps, we propose a novel method that uses a transformer-based PLM to obtain a contextual representation of target (disease or symptom) terms coupled with a contrastive loss to establish a larger gap between target terms' literal and figurative uses using linguistic theories. We introduce the use of a simple and effective approach for harvesting candidate instances from the broad corpus and generalising the proposed method using self-training to address the label scarcity challenge. Our experiments on publicly available health-mention datasets from Twitter (HMC2019) and Reddit (RHMD) demonstrate that our method outperforms the state-of-the-art HMC methods on both datasets for the HMC task. We further analyse the transferability and generalisability of our method and conclude with a discussion on the empirical and ethical considerations of our study. 
社交媒体用户使用疾病和症状词的方式各不相同,包括比喻性地描述他们的个人健康经历或在其他一般性讨论中使用。健康提及分类(HMC)任务旨在区分人们如何使用术语,这在公共卫生应用中非常重要。现有的 HMC 研究使用预训练语言模型(PLM)来解决这个问题。然而,这一领域仍然存在的差距包括需要语言基础、需要大量有标签的数据,以及解决方案往往只能在 Twitter 或 Reddit 上测试,因为它们提供的关于模型可移植性的证据有限。为了解决这些差距,我们提出了一种新的方法,使用基于变换器的 PLM 来获得目标(疾病或症状)术语的上下文表示,加上对比损失,使用语言学理论在目标术语的字面和比喻用法之间建立更大的差距。我们介绍了一种简单而有效的方法,用于从广泛的语料库中收集候选实例,并将提出的方法推广使用自我训练来解决标签稀缺性的挑战。我们在来自 Twitter (HMC2019)和 Reddit (RHMD)的公开可用的健康提及数据集上的实验表明,我们的方法在 HMC 任务的两个数据集上优于最先进的 HMC 方法。我们进一步分析了我们的方法的可转换性和普遍性,最后讨论了我们的研究的经验和伦理考虑。 code 0
GAD-NR: Graph Anomaly Detection via Neighborhood Reconstruction Amit Roy, Juan Shu, Jia Li, Carl Yang, Olivier Elshocht, Jeroen Smeets, Pan Li Graph Anomaly Detection (GAD) is a technique used to identify abnormal nodes within graphs, finding applications in network security, fraud detection, social media spam detection, and various other domains. A common method for GAD is Graph Auto-Encoders (GAEs), which encode graph data into node representations and identify anomalies by assessing the reconstruction quality of the graphs based on these representations. However, existing GAE models are primarily optimized for direct link reconstruction, resulting in nodes connected in the graph being clustered in the latent space. As a result, they excel at detecting cluster-type structural anomalies but struggle with more complex structural anomalies that do not conform to clusters. To address this limitation, we propose a novel solution called GAD-NR, a new variant of GAE that incorporates neighborhood reconstruction for graph anomaly detection. GAD-NR aims to reconstruct the entire neighborhood of a node, encompassing the local structure, self-attributes, and neighbor attributes, based on the corresponding node representation. By comparing the neighborhood reconstruction loss between anomalous nodes and normal nodes, GAD-NR can effectively detect any anomalies. Extensive experimentation conducted on six real-world datasets validates the effectiveness of GAD-NR, showcasing significant improvements (by up to 30% in AUC) over state-of-the-art competitors. The source code for GAD-NR is openly available. Importantly, the comparative analysis reveals that the existing methods perform well only in detecting one or two types of anomalies out of the three types studied. In contrast, GAD-NR excels at detecting all three types of anomalies across the datasets, demonstrating its comprehensive anomaly detection capabilities. 
图形异常检测(GAD)是一种识别图形中异常节点的技术,用于寻找网络安全、欺诈检测、社交媒体垃圾邮件检测以及其他领域的应用程序。GAD 的一种常用方法是图形自动编码器(GAE) ,它将图形数据编码成节点表示形式,并通过评估基于这些表示形式的图形的重构质量来识别异常。然而,现有的 GAE 模型主要针对直接链路重构进行了优化,导致图中连接的节点在潜在空间中聚集。因此,他们擅长探测集群式结构异常,但与更复杂的结构异常斗争,不符合集群。为了解决这个问题,我们提出了一种新的解决方案,称为 GAD-NR,这是一种新的 GAE 变体,它结合了图形异常检测的邻域重建。GAD-NR 的目标是在相应节点表示的基础上重构节点的整个邻域,包括节点的局部结构、自身属性和邻居属性。通过比较异常节点和正常节点的邻域重构损失,GAD-NR 可以有效地检测任何异常。在六个真实世界的数据集上进行的广泛实验验证了 GAD-NR 的有效性,显示了与最先进的竞争对手相比的显著改进(AUC 的改进幅度高达30%)。GAD-NR 的源代码是公开的。重要的是,比较分析表明,现有的方法只有在检测一个或两个类型的异常研究的三种类型中执行良好。相比之下,GAD-NR 在检测数据集中所有三种类型的异常方面表现出色,展示了其全面的异常检测能力。 code 0
Ad-load Balancing via Off-policy Learning in a Content Marketplace Hitesh Sagtani, Madan Gopal Jhawar, Rishabh Mehrotra, Olivier Jeunen Ad-load balancing is a critical challenge in online advertising systems, particularly in the context of social media platforms, where the goal is to maximize user engagement and revenue while maintaining a satisfactory user experience. This requires the optimization of conflicting objectives, such as user satisfaction and ads revenue. Traditional approaches to ad-load balancing rely on static allocation policies, which fail to adapt to changing user preferences and contextual factors. In this paper, we present an approach that leverages off-policy learning and evaluation from logged bandit feedback. We start by presenting a motivating analysis of the ad-load balancing problem, highlighting the conflicting objectives between user satisfaction and ads revenue. We emphasize the nuances that arise due to user heterogeneity and the dependence on the user's position within a session. Based on this analysis, we define the problem as determining the optimal ad-load for a particular feed fetch. To tackle this problem, we propose an off-policy learning framework that leverages unbiased estimators such as Inverse Propensity Scoring (IPS) and Doubly Robust (DR) to learn and estimate the policy values using offline collected stochastic data. We present insights from online A/B experiments deployed at scale across over 80 million users generating over 200 million sessions, where we find statistically significant improvements in both user satisfaction metrics and ads revenue for the platform. 
广告负载平衡是在线广告系统中的一个关键挑战,特别是在社交媒体平台的背景下,其目标是最大限度地提高用户参与度和收入,同时保持令人满意的用户体验。这需要优化相互冲突的目标,如用户满意度和广告收入。传统的广告负载平衡方法依赖于静态分配策略,不能适应用户偏好和上下文因素的变化。在本文中,我们提出了一种方法,利用非政策学习和评估的日志土匪反馈。我们首先对广告负载平衡问题进行了激励性分析,强调了用户满意度和广告收入之间的矛盾目标。我们强调由于用户异质性和对用户在会话中位置的依赖性而产生的细微差别。在此基础上,我们将该问题定义为确定特定提要的最佳广告负载。为了解决这个问题,我们提出了一个非策略学习框架,利用无偏估计量,如逆倾向评分(IPS)和双稳健(DR)来学习和估计使用离线收集的随机数据的策略值。我们展示了在线 A/B 实验的深刻见解,这些实验在超过8千万用户中进行,产生了超过2亿次的会话,我们发现在用户满意度指标和平台广告收入方面都有统计学意义上的显著改善。 code 0
ProGAP: Progressive Graph Neural Networks with Differential Privacy Guarantees Sina Sajadmanesh, Daniel GaticaPerez Graph Neural Networks (GNNs) have become a popular tool for learning on graphs, but their widespread use raises privacy concerns as graph data can contain personal or sensitive information. Differentially private GNN models have been recently proposed to preserve privacy while still allowing for effective learning over graph-structured datasets. However, achieving an ideal balance between accuracy and privacy in GNNs remains challenging due to the intrinsic structural connectivity of graphs. In this paper, we propose a new differentially private GNN called ProGAP that uses a progressive training scheme to improve such accuracy-privacy trade-offs. Combined with the aggregation perturbation technique to ensure differential privacy, ProGAP splits a GNN into a sequence of overlapping submodels that are trained progressively, expanding from the first submodel to the complete model. Specifically, each submodel is trained over the privately aggregated node embeddings learned and cached by the previous submodels, leading to an increased expressive power compared to previous approaches while limiting the incurred privacy costs. We formally prove that ProGAP ensures edge-level and node-level privacy guarantees for both training and inference stages, and evaluate its performance on benchmark graph datasets. Experimental results demonstrate that ProGAP can achieve up to 5%-10% higher accuracy than existing state-of-the-art differentially private GNNs. 
图形神经网络(GNN)已经成为一种流行的图形学习工具,但它的广泛使用引起了人们对隐私的关注,因为图形数据可以包含个人或敏感信息。最近提出了差异私有 GNN 模型,以保护隐私,同时仍然允许对图形结构的数据集进行有效的学习。然而,由于图的内在结构连通性,在 GNN 中实现准确性和隐私性之间的理想平衡仍然具有挑战性。在本文中,我们提出了一个新的差分私有 GNN 称为 ProGAP,它使用一个渐进的训练方案来提高这种准确性-隐私权衡。结合聚合扰动技术以确保差分隐私,ProGAP 将一个 GNN 分割成一系列重叠的子模型,这些子模型被逐步训练,从第一个子模型扩展到完整的模型。具体来说,每个子模型都是在以前的子模型学习和缓存的私有聚合节点嵌入上进行训练的,与以前的方法相比,导致表达能力增加,同时限制了产生的隐私成本。我们正式证明了 ProGAP 在训练阶段和推理阶段都保证了边界层和节点层的隐私保护,并对其在基准图数据集上的性能进行了评估。实验结果表明,ProGAP 可以达到高达5% -10% 的准确率比现有的国家最先进的差分私有 GNN。 code 0
Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling Marija Sakota, Maxime Peyrard, Robert West Generative language models (LMs) have become omnipresent across data science. For a wide variety of tasks, inputs can be phrased as natural language prompts for an LM, from whose output the solution can then be extracted. LM performance has consistently been increasing with model size - but so has the monetary cost of querying the ever larger models. Importantly, however, not all inputs are equally hard: some require larger LMs for obtaining a satisfactory solution, whereas for others smaller LMs suffice. Based on this fact, we design a framework for Cost-Effective Language Model Choice (CELMOC). Given a set of inputs and a set of candidate LMs, CELMOC judiciously assigns each input to an LM predicted to do well on the input according to a so-called meta-model, aiming to achieve high overall performance at low cost. The cost-performance trade-off can be flexibly tuned by the user. Options include, among others, maximizing total expected performance (or the number of processed inputs) while staying within a given cost budget, or minimizing total cost while processing all inputs. We evaluate CELMOC on 14 datasets covering five natural language tasks, using four candidate LMs of vastly different size and cost. With CELMOC, we match the performance of the largest available LM while achieving a cost reduction of 63%. Via our publicly available library, researchers as well as practitioners can thus save large amounts of money without sacrificing performance. 
生成语言模型(Generative language model,LMs)已经在数据科学中无处不在。对于各种各样的任务,输入可以表示为 LM 的自然语言提示,然后从 LM 的输出中提取解决方案。LM 的性能一直随着模型的大小而增加,但是查询越来越大的模型的成本也在增加。然而,重要的是,并非所有的输入都同样困难: 有些需要较大的 LM 来获得满意的解,而对于其他输入,较小的 LM 就足够了。基于这一事实,我们设计了一个具有成本效益的语言模型选择框架(CELMOC)。给定一组输入和一组候选 LM,CELMOC 明智地将每个输入分配给一个 LM,根据所谓的元模型预测该 LM 在输入上表现良好,旨在以较低的成本实现较高的整体性能。用户可以灵活地调整成本-性能权衡。选项包括,除其他外,最大化总预期性能(或处理的输入数量) ,同时保持在给定的成本预算内,或最小化总成本,同时处理所有输入。我们在涵盖五类自然语言任务的14个数据集上评估 CELMOC,使用四个规模和成本差异很大的候选 LM。借助 CELMOC,我们在降低成本63% 的同时达到了现有最大 LM 的性能。通过我们公开可用的软件库,研究人员和从业人员可以在不牺牲性能的情况下节省大量资金。 code 0
Causality Guided Disentanglement for Cross-Platform Hate Speech Detection Paras Sheth, Raha Moraffah, Tharindu S. Kumarage, Aman Chadha, Huan Liu Social media platforms, despite their value in promoting open discourse, are often exploited to spread harmful content. Current deep learning and natural language processing models used for detecting this harmful content overly rely on domain-specific terms affecting their capabilities to adapt to generalizable hate speech detection. This is because they tend to focus too narrowly on particular linguistic signals or the use of certain categories of words. Another significant challenge arises when platforms lack high-quality annotated data for training, leading to a need for cross-platform models that can adapt to different distribution shifts. Our research introduces a cross-platform hate speech detection model capable of being trained on one platform's data and generalizing to multiple unseen platforms. To achieve good generalizability across platforms, one way is to disentangle the input representations into invariant and platform-dependent features. We also argue that learning causal relationships, which remain constant across diverse environments, can significantly aid in understanding invariant representations in hate speech. By disentangling input into platform-dependent features (useful for predicting hate targets) and platform-independent features (used to predict the presence of hate), we learn invariant representations resistant to distribution shifts. These features are then used to predict hate speech across unseen platforms. Our extensive experiments across four platforms highlight our model's enhanced efficacy compared to existing state-of-the-art methods in detecting generalized hate speech. 
社交媒体平台,尽管它们在促进开放话语方面有价值,但经常被用来传播有害内容。目前用于检测这种有害内容的深度学习和自然语言处理模型过度依赖于影响其适应可推广的仇恨语音检测能力的特定领域术语。这是因为他们往往过于狭隘地关注特定的语言信号或某些类别的词的使用。当平台缺乏用于培训的高质量注释数据时,另一个重大挑战出现了,这导致需要能够适应不同分布变化的跨平台模型。我们的研究引入了一个跨平台的仇恨语音检测模型,该模型能够在一个平台的数据上进行训练,并且能够推广到多个看不见的平台。为了实现跨平台的良好通用性,一种方法是将输入表示分解为不变的和与平台相关的特征。我们还认为,学习因果关系,在不同的环境中保持不变,可以显着帮助理解仇恨言论的不变表征。通过将输入分离成与平台相关的特征(用于预测仇恨目标)和与平台无关的特征(用于预测仇恨的存在) ,我们学习了抗分布偏移的不变表示。然后,这些特性被用来预测跨看不见的平台的仇恨言论。我们在四个平台上的广泛实验突出了我们的模型相对于现有的最先进的检测广义仇恨言论的方法的增强功效。 code 0
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks. However, there is still much to learn about how well LLMs understand structured data, such as tables. Although tables can be used as input to LLMs with serialization, there is a lack of comprehensive studies that examine whether LLMs can truly comprehend such data. In this paper, we try to understand this by designing a benchmark to evaluate the structural understanding capabilities (SUC) of LLMs. The benchmark we create includes seven tasks, each with its own unique challenges, e.g., cell lookup, row retrieval, and size detection. We perform a series of evaluations on GPT-3.5 and GPT-4. We find that performance varied depending on several input choices, including table input format, content order, role prompting, and partition marks. Drawing from the insights gained through the benchmark evaluations, we propose self-augmentation for effective structural prompting, such as critical value / range identification using internal knowledge of LLMs. When combined with carefully chosen input choices, these structural prompting methods lead to promising improvements in LLM performance on a variety of tabular tasks, e.g., TabFact(↑2.31%), HybridQA(↑2.13%), SQA(↑2.72%), Feverous(↑0.84%), and ToTTo(↑5.68%). We believe that our open source benchmark and proposed prompting methods can serve as a simple yet generic selection for future research. 
大型语言模型(LLM)作为解决自然语言(NL)相关任务的少样本推理器正变得越来越有吸引力。然而,关于 LLM 对结构化数据(如表)的理解程度,还有很多需要了解的地方。虽然表格可以通过序列化作为 LLM 的输入,但是缺乏全面的研究来检查 LLM 是否能够真正理解这些数据。在本文中,我们试图通过设计一个基准来评估 LLM 的结构理解能力(SUC)来理解这一点。我们创建的基准测试包括七个任务,每个任务都有其独特的挑战,例如单元格查找、行检索和大小检测。我们对 GPT-3.5和 GPT-4进行了一系列的评估。我们发现,性能取决于几种输入选择,包括表输入格式、内容顺序、角色提示和分区标记。基于基准评估所获得的见解,我们提出了用于有效结构化提示的自我增强方法,例如利用 LLM 的内部知识进行临界值/范围识别。当结合精心选择的输入选项时,这些结构化的提示方法可以在各种表格任务中提高 LLM 的性能,例如 TabFact (↑2.31%) ,HybridQA (↑2.13%) ,SQA (↑2.72%) ,Feverous (↑0.84%)和 ToTTo (↑5.68%)。我们相信,我们的开源基准和提出的提示方法可以作为未来研究的一个简单而通用的选择。 code 0
Rethinking and Simplifying Bootstrapped Graph Latents Wangbin Sun, Jintang Li, Liang Chen, Bingzhe Wu, Yatao Bian, Zibin Zheng Graph contrastive learning (GCL) has emerged as a representative paradigm in graph self-supervised learning, where negative samples are commonly regarded as the key to preventing model collapse and producing distinguishable representations. Recent studies have shown that GCL without negative samples can achieve state-of-the-art performance as well as scalability improvement, with bootstrapped graph latent (BGRL) as a prominent step forward. However, BGRL relies on a complex architecture to maintain the ability to scatter representations, and the underlying mechanisms enabling the success remain largely unexplored. In this paper, we introduce an instance-level decorrelation perspective to tackle the aforementioned issue and leverage it as a springboard to reveal the potential unnecessary model complexity within BGRL. Based on our findings, we present SGCL, a simple yet effective GCL framework that utilizes the outputs from two consecutive iterations as positive pairs, eliminating the negative samples. SGCL only requires a single graph augmentation and a single graph encoder without additional parameters. Extensive experiments conducted on various graph benchmarks demonstrate that SGCL can achieve competitive performance with fewer parameters, lower time and space costs, and significant convergence speedup. 图形对比学习(GCL)已经成为图形自监督学习的一种典型范式,负样本通常被认为是防止模型崩溃和产生可区分表示的关键。最近的研究表明,无负样本的 GCL 可以实现最先进的性能和可扩展性的提高,引导图潜伏(BGRL)是一个突出的进步。然而,BGRL 依赖于一个复杂的体系结构来维护分散表示的能力,而使其成功的底层机制在很大程度上仍然是未知的。在本文中,我们引入了一个实例级的去相关视角来解决上述问题,并利用它作为一个跳板来揭示 BGRL 中潜在的不必要的模型复杂性。基于我们的发现,我们提出了 SGCL,一个简单而有效的 GCL 框架,利用两个连续迭代的输出作为正对,消除了负样本。SGCL 只需要一个图形扩展和一个没有附加参数的图形编码器。在各种图形基准上进行的大量实验表明,SGCL 能以较少的参数、较低的时间和空间开销以及显著的收敛速度获得具有竞争力的性能。 code 0
Temporal Blind Spots in Large Language Models Jonas Wallat, Adam Jatowt, Avishek Anand Delft Univ Technol, Dept Software Technol, Delft, Netherlands; L3S Res Ctr, Hannover, Germany; Univ Innsbruck, Dept Comp Sci, Innsbruck, Austria Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available. 大型语言模型(LLM)由于其在各种自然语言处理任务中无与伦比的零样本性能,近年来受到了广泛的关注。然而,在 LLM 中使用的预训练数据往往局限于特定的语料库,导致固有的新鲜度和时间范围的限制。因此,这引起了关于 LLM 对于涉及时间意图的任务的有效性的关注。在这项研究中,我们的目标是调查通用 LLM 在部署于需要时间理解的任务时的潜在局限性。我们特别关注通过三个流行的时态 QA 数据集处理事实性的时态知识。具体来说,我们观察到在关于过去的详细问题上表现不佳,令人惊讶的是,对于相当新的信息也是如此。在手动和自动测试中,我们发现了多个时间错误,并描述了 QA 性能恶化的条件。我们的分析有助于理解 LLM 的局限性,并为开发能够更好地满足面向时间的任务需求的未来模型提供了有价值的见解。代码已公开。 code 0
Efficient, Direct, and Restricted Black-Box Graph Evasion Attacks to Any-Layer Graph Neural Networks via Influence Function Binghui Wang, Minhua Lin, Tianxiang Zhou, Pan Zhou, Ang Li, Meng Pang, Hai Helen Li, Yiran Chen Penn State Univ, State Coll, PA USA; Huazhong Univ Sci & Technol, Wuhan, Peoples R China; Duke Univ, Durham, NC USA; IIT, Chicago, IL 60616 USA; Univ Maryland, College Pk, MD 20742 USA; Nanchang Univ, Nanchang, Peoples R China Graph neural network (GNN), the mainstream method to learn on graph data, is vulnerable to graph evasion attacks, where an attacker slightly perturbing the graph structure can fool trained GNN models. Existing work has at least one of the following drawbacks: 1) limited to directly attack two-layer GNNs; 2) inefficient; and 3) impractical, as they need to know full or part of GNN model parameters. We address the above drawbacks and propose an influence-based efficient, direct, and restricted black-box evasion attack to any-layer GNNs. Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, that are defined on GNNs and label propagation (LP), respectively. Then we observe that GNNs and LP are strongly connected in terms of our defined influences. Based on this, we can then reformulate the evasion attack to GNNs as calculating label influence on LP, which is inherently applicable to any-layer GNNs, while no need to know information about the internal GNN model. Finally, we propose an efficient algorithm to calculate label influence. Experimental results on various graph datasets show that, compared to state-of-the-art white-box attacks, our attack can achieve comparable attack performance, but has a 5-50x speedup when attacking two-layer GNNs. Moreover, our attack is effective to attack multi-layer GNNs. 
图形神经网络(GNN)是对图形数据进行学习的主流方法,容易受到图形规避攻击,攻击者稍微扰动图形结构就可以欺骗训练有素的 GNN 模型。现有的工作至少有以下一个缺点: 1)仅限于直接攻击两层 GNN; 2)效率低下; 3)不切实际,因为他们需要知道全部或部分 GNN 模型参数。针对上述缺点,本文提出了一种基于影响力的高效、直接、有限制的黑盒规避攻击方法。具体来说,我们首先介绍了两个影响函数,即特征标签影响和标签影响,它们分别定义在 GNN 和标签传播(LP)上。然后我们观察到 GNN 和 LP 在我们所定义的影响方面是强相关的。在此基础上,我们可以将对 GNN 的规避攻击重新表述为计算标签对 LP 的影响,这本质上适用于任意层的 GNN,而不需要知道内部 GNN 模型的信息。最后,提出了一种计算标签影响的有效算法。在各种图形数据集上的实验结果表明,与最先进的白盒攻击相比,我们的攻击可以达到相当的攻击性能,但是在攻击两层 GNN 时有5-50倍的加速效果。此外,我们的攻击对多层 GNN 同样有效。 code 0
CityCAN: Causal Attention Network for Citywide Spatio-Temporal Forecasting Chengxin Wang, Yuxuan Liang, Gary Tan code 0
Distribution Consistency based Self-Training for Graph Neural Networks with Sparse Labels Fali Wang, Tianxiang Zhao, Suhang Wang Few-shot node classification poses a significant challenge for Graph Neural Networks (GNNs) due to insufficient supervision and potential distribution shifts between labeled and unlabeled nodes. Self-training has emerged as a widely popular framework to leverage the abundance of unlabeled data, which expands the training set by assigning pseudo-labels to selected unlabeled nodes. Efforts have been made to develop various selection strategies based on confidence, information gain, etc. However, none of these methods takes into account the distribution shift between the training and testing node sets. The pseudo-labeling step may amplify this shift and even introduce new ones, hindering the effectiveness of self-training. Therefore, in this work, we explore the potential of explicitly bridging the distribution shift between the expanded training set and test set during self-training. To this end, we propose a novel Distribution-Consistent Graph Self-Training (DC-GST) framework to identify pseudo-labeled nodes that are both informative and capable of redeeming the distribution discrepancy and formulate it as a differentiable optimization task. A distribution-shift-aware edge predictor is further adopted to augment the graph and increase the model's generalizability in assigning pseudo labels. We evaluate our proposed method on four publicly available benchmark datasets and extensive experiments demonstrate that our framework consistently outperforms state-of-the-art baselines. 
由于标记节点和未标记节点之间的监督不足和潜在的分布转移,少镜头节点分类对图神经网络(GNN)提出了严峻的挑战。自我训练已经成为一种广泛流行的利用大量未标记数据的框架,它通过为选定的未标记节点分配伪标记来扩展训练集。人们努力发展各种基于信心、获取信息等的选择策略。然而,这些方法都没有考虑训练和测试节点集之间的分布转移。伪标记步骤可能会放大这种转变,甚至引入新的转变,从而阻碍自我训练的有效性。因此,在这项工作中,我们探索了在自我训练过程中明确地桥接扩展训练集和测试集之间的分布转移的潜力。为此,我们提出了一种新的分布一致图自训练(DC-GST)框架来识别信息量大且能够弥补分布差异的伪标记节点,并将其表述为一个可微优化任务。进一步采用分布移位感知的边缘预测器来增强图形,提高模型在分配伪标签时的泛化能力。我们在四个公开可用的基准数据集上评估了我们提出的方法,大量的实验表明,我们的框架始终优于最先进的基准。 code 0
FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes Haonan Wang, Ziwei Wu, Jingrui He Natl Univ Singapore, Sch Comp, Singapore, Singapore; Univ Illinois, Sch Informat Sci, Champaign, IL USA Empirical loss minimization during machine learning training can inadvertently introduce bias, stemming from discrimination and societal prejudices present in the data. To address the shortcomings of traditional fair machine learning methods-which often rely on sensitive information of training data or mandate significant model alterations-we present FairIF, a unique two-stage training framework. Distinctly, FairIF enhances fairness by recalibrating training sample weights using the influence function. Notably, it employs sensitive information from a validation set, rather than the training set, to determine these weights. This approach accommodates situations with missing or inaccessible sensitive training data. Our FairIF ensures fairness across demographic groups by retraining models on the reweighted data. It stands out by offering a plug-and-play solution, obviating the need for changes in model architecture or the loss function. We demonstrate that the fairness performance of FairIF is guaranteed during testing with only a minimal impact on classification performance. Additionally, we analyze that our framework adeptly addresses issues like group size disparities, distribution shifts, and class size discrepancies. Empirical evaluations on three synthetic and five real-world datasets across six model architectures confirm FairIF's efficiency and scalability. The experimental results indicate superior fairness-utility trade-offs compared to other methods, regardless of bias types or architectural variations. Moreover, the adaptability of FairIF to utilize pretrained models for subsequent tasks and its capability to rectify unfairness originating during the pretraining phase are further validated through our experiments. 
在机器学习训练期间,经验性损失最小化可能无意中引入偏见,这些偏见源于数据中存在的歧视和社会偏见。为了解决传统的公平机器学习方法的缺陷——这些方法通常依赖于敏感的训练数据信息或者要求重要的模型改变——我们提出了一个独特的两阶段训练框架 FairIF。显然,FairIF 通过使用影响函数重新校准训练样本权重来提高公平性。值得注意的是,它使用来自验证集而不是训练集的敏感信息来确定这些权重。这种方法适用于缺少或无法访问敏感训练数据的情况。我们的 FairIF 通过对重新加权数据的再培训模型来确保不同人口群体之间的公平性。它通过提供即插即用的解决方案而脱颖而出,避免了对模型架构或损失函数进行更改的需要。我们证明了在测试过程中,FairIF 的公平性能得到了保证,对分类性能的影响最小。此外,我们还分析了我们的框架能够很好地解决诸如团队规模差异、分布变化和班级规模差异等问题。通过对三个合成数据集和五个实际数据集在六个模型架构上的实验评估,证实了 FairIF 的有效性和可扩展性。实验结果表明,与其他方法相比,无论偏差类型或架构变化如何,公平-效用权衡优于其他方法。此外,通过实验进一步验证了 FairIF 对预训练模型用于后续任务的适应性以及对预训练阶段产生的不公平现象的纠正能力。 code 0
NeuralReconciler for Hierarchical Time Series Forecasting Shiyu Wang Ant Grp, Hangzhou, Zhejiang, Peoples R China Time series forecasting has wide-ranging applications in business intelligence, including predicting logistics demand and estimating power consumption in a smart grid, which subsequently facilitates decision-making processes. In many real-world scenarios, such as department sales of multiple Walmart stores across different locations, time series data possess hierarchical structures with non-linear and non-Gaussian properties. Thus, the task of leveraging structural information among hierarchical time series while learning from non-linear correlations and non-Gaussian data distributions becomes crucial to enhance prediction accuracy. This paper proposes a novel approach named NeuralReconciler for Hierarchical Time Series (HTS) prediction through trainable attention-based reconciliation and Normalizing Flow (NF). The latter is used to approximate the complex (usually non-Gaussian) data distribution for multivariate time series forecasting. To reconcile the HTS data, a new flexible reconciliation strategy via the attention-based encoder decoder neural network is proposed, which is distinct from current methods that rely on strong assumptions (e.g., all forecasts being unbiased estimates and the noise distribution being Gaussian). Furthermore, using the reparameterization trick, each independent component (i.e., forecasts via NF and attention-based reconciliation) is integrated into a trainable end-to-end model. Our proposed NeuralReconciler has been extensively experimented on real-world datasets and achieved consistent state-of-the-art performance compared to well-acknowledged and advanced baselines, with a 20% relative improvement on five benchmarks. 
时间序列预测在商业智能中有着广泛的应用,包括预测物流需求和估计智能电网的功耗,从而促进决策过程。在许多实际场景中,例如不同地点的多个沃尔玛商店的部门销售,时间序列数据具有具有非线性和非高斯属性的层次结构。因此,利用层次时间序列之间的结构信息,同时学习非线性相关和非高斯数据分布的任务成为提高预测精度的关键。提出了一种基于注意力协调和归一化流(NF)的层次时间序列(HTS)预测神经协调器方法。后者用于逼近多变量时间序列预测的复杂(通常是非高斯)数据分布。为了协调高温超导数据,提出了一种新的基于注意的编码器解码器神经网络的柔性协调策略,该策略不同于目前依赖于强假设的方法(例如,所有的预测都是无偏估计,噪声分布是高斯分布)。此外,使用重新参数化技巧,每个独立的组成部分(即,通过 NF 和基于注意力的协调预测)被集成到一个可训练的端到端模型中。我们提出的 NeuralReconciler 已经在真实世界的数据集上进行了广泛的实验,与公认的和先进的基线相比,取得了一致的最先进的性能,相对于5个基准有20% 的改善。 code 0
Follow the LIBRA: Guiding Fair Policy for Unified Impression Allocation via Adversarial Rewarding Xiaoyu Wang, Yonghui Guo, Bin Tan, Tao Yang, Dongbo Huang, Lan Xu, Hao Zhou, Xiangyang Li Tencent, Shenzhen, Peoples R China; Univ Sci & Technol China, Hefei, Peoples R China; Tencent Co, Shenzhen, Peoples R China The diverse advertiser demands (brand effects or immediate outcomes) lead to distinct selling (pre-agreed volumes with an underdelivery penalty or compete per auction) and pricing (fixed prices or varying bids) patterns in Guaranteed delivery (GD) and realtime bidding (RTB) advertising. This necessitates fair impression allocation to unify the two markets for promoting ad content diversity and overall revenue. Existing approaches often deprive RTB ads of equal exposure opportunities by prioritizing GD ads, and coarse-grained methods are inferior to 1) Ambiguous reward due to varied objectives and constraints of GD fulfillment and RTB utility, hindering measurement of each allocation's contribution to the global interests; 2) Intensified competition by the coexistence of GD and RTB ads, complicating their mutual relationships; 3) Policy degradation caused by evolving user traffic and bid landscape, requiring adaptivity to distribution shifts. We propose LIBRA, a generative-adversarial framework that unifies GD and RTB ads through request-level modeling. To guide the generative allocator, we solve convex optimization on historical data to derive hindsight optimal allocations that balance fairness and utility. We then train a discriminator to distinguish the generated actions from these solved latent expert policy's demonstrations, providing an integrated reward to align LIBRA with the optimal fair policy. LIBRA employs a self-attention encoder to capture the competitive relations among varying amounts of candidate ads per allocation. 
Further, it enhances the discriminator with information bottlenecks-based summarizer against overfitting to irrelevant distractors in the ad environment. LIBRA adopts a decoupled structure, where the offline discriminator continuously finetunes with newly-coming allocations and periodically guides the online allocation policy's updates to accommodate online dynamics. LIBRA has been deployed on the Tencent advertising system for over four months, with extensive experiments conducted. Online A/B tests demonstrate significant lifts in ad income (3.17%), overall click-through rate (1.56%), and cost-per-mille (3.20%), contributing a daily revenue increase of hundreds of thousands of RMB. 不同的广告客户需求(品牌效应或直接结果)导致不同的销售(预先商定的数量与交付不足的惩罚或竞争每拍卖)和定价(固定价格或不同出价)模式的保证交付(GD)和实时投标(RTB)广告。这需要公平的印象分配,以统一两个市场,促进广告内容的多样性和总体收入。现有的方法通常通过优先考虑 GD 广告来剥夺 RTB 广告的平等曝光机会,粗粒度的方法不如1)由于 GD 实现和 RTB 效用的不同目标和限制而产生的模糊奖励,阻碍了衡量每个分配对全球利益的贡献; 2)由于 GD 和 RTB 广告共存而加剧的竞争,使它们的相互关系复杂化; 3)由于用户流量和投标环境的变化而导致的政策退化,需要适应分配变化。我们提出了 LIBRA,一个通过请求级建模统一 GD 和 RTB 广告的生成对抗框架。为了指导生成分配器,我们解决历史数据的凸优化,得出事后的最优分配,平衡公平和效用。然后我们训练一个鉴别器来区分这些已解决的潜在专家政策的演示所产生的行为,提供一个综合的奖励来使 LIBRA 与最优公平政策保持一致。LIBRA 使用自我关注编码器来捕捉每次分配中不同数量的候选广告之间的竞争关系。进一步提高了基于信息瓶颈汇总器的识别能力,避免了广告环境中对不相关干扰物的过度拟合。LIBRA 采用解耦结构,离线鉴别器不断调整新的分配,并定期引导在线分配策略的更新以适应在线动态。LIBRA 已经在腾讯广告系统上部署了4个多月,并进行了广泛的实验。在线 a/b 测试显示,广告收入(3.17%)、总点进率(1.56%)和每公里成本(3.20%)都有显著提升,每天的收入增长达到数十万元人民币。 code 0
Continuous-time Autoencoders for Regular and Irregular Time Series Imputation Hyowon Wi, Yehjin Shin, Noseong Park Time series imputation is one of the most fundamental tasks for time series. Real-world time series datasets are frequently incomplete (or irregular with missing observations), in which case imputation is strongly required. Many different time series imputation methods have been proposed. Recent self-attention-based methods show the state-of-the-art imputation performance. However, it has been overlooked for a long time to design an imputation method based on continuous-time recurrent neural networks (RNNs), i.e., neural controlled differential equations (NCDEs). To this end, we redesign time series (variational) autoencoders based on NCDEs. Our method, called continuous-time autoencoder (CTA), encodes an input time series sample into a continuous hidden path (rather than a hidden vector) and decodes it to reconstruct and impute the input. In our experiments with 4 datasets and 19 baselines, our method shows the best imputation performance in almost all cases. 时间序列插补是时间序列最基本的任务之一。真实世界的时间序列数据集经常是不完整的(或不规则的,缺少观察值) ,在这种情况下强烈需要插补。人们提出了许多不同的时间序列插补方法。最近的基于自我注意的方法显示了最先进的插补性能。然而,基于连续时间递归神经网络(RNN)的插补方法,即神经控制微分方程(NCDE) ,长期以来一直被忽视。为此,我们重新设计了基于 NCDE 的时间序列(变分)自动编码器。我们的方法称为连续时间自动编码器(CTA) ,将输入的时间序列样本编码成一个连续的隐藏路径(而不是一个隐藏向量) ,并对其进行解码以重建和计算输入。在我们的4个数据集和19个基线的实验中,我们的方法在几乎所有情况下都显示了最佳的插补性能。 code 0
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu The growing prevalence of visually rich documents, such as webpages and scanned/digital-born documents (images, PDFs, etc.), has led to increased interest in automatic document understanding and information extraction across academia and industry. Although various document modalities, including image, text, layout, and structure, facilitate human information retrieval, the interconnected nature of these modalities presents challenges for neural networks. In this paper, we introduce WebLM, a multimodal pre-training network designed to address the limitations of solely modeling text and structure modalities of HTML in webpages. Instead of processing document images as unified natural images, WebLM integrates the hierarchical structure of document images to enhance the understanding of markup-language-based documents. Additionally, we propose several pre-training tasks to model the interaction among text, structure, and image modalities effectively. Empirical results demonstrate that the pre-trained WebLM significantly surpasses previous state-of-the-art pre-trained models across several webpage understanding tasks. The pre-trained models and code are available at https://github.com/X-LANCE/weblm. 视觉丰富的文档(如网页和扫描/数字文档(图像、 PDF 等))日益流行,导致学术界和行业对自动文档理解和信息抽取的兴趣增加。尽管包括图像、文本、布局和结构在内的各种文档模式有助于人类信息检索,但这些模式的相互关联性对神经网络提出了挑战。本文介绍了 WebLM,这是一个多模式预训练网络,旨在解决网页中纯文本建模和 HTML 结构模式的局限性。WebLM 不再将文档图像作为统一的自然图像处理,而是整合了文档图像的层次结构,以增强对基于标记语言的文档的理解。此外,我们提出了几个预训练任务,以建模文本,结构和图像模式之间的交互作用有效。实证结果表明,预先训练的 WebLM 在几个网页理解任务中显著超过了先前最先进的预先训练的模型。预先训练的模型和代码可在 https://github.com/x-lance/weblm 获得。 code 0
Towards Alignment-Uniformity Aware Representation in Graph Contrastive Learning Rong Yan, Peng Bao, Xiao Zhang, Zhongyi Liu, Hui Liu TravelSky Technol Ltd, Key Lab Intelligent Passenger, Serv Civil Aviat, Beijing, Peoples R China; Beijing Jiaotong Univ, Beijing, Peoples R China Graph Contrastive Learning (GCL) methods benefit from two key properties: alignment and uniformity, which encourage the representation of related objects together while pushing apart different objects. Most GCL methods aim to preserve alignment and uniformity through random graph augmentation strategies and indiscriminately negative sampling. However, their performance is highly sensitive to graph augmentation, which requires cumbersome trial-and-error and expensive domain-specific knowledge as guidance. Besides, these methods perform negative sampling indiscriminately, which inevitably suffers from sampling bias, i.e., negative samples from the same class as the anchor. To remedy these issues, we propose a unified GCL framework towards Alignment-Uniformity Aware Representation learning (AUAR), which can achieve better alignment while improving uniformity without graph augmentation and negative sampling. Specifically, we propose intra- and inter-alignment loss to align the representations of the node with itself and its cluster centroid to maintain label-invariant. Furthermore, we introduce a uniformity loss with theoretical analysis, which pushes the representations of unrelated nodes from different classes apart and tends to provide informative variance from different classes. Extensive experiments demonstrate that our method gains better performance than existing GCL methods in node classification and clustering tasks across three widely-used datasets. 
图形对比学习(GCL)方法受益于两个关键属性: 对齐和一致性,这两个属性鼓励相关对象一起表示,同时推开不同的对象。大多数 GCL 方法的目的是通过随机图增强策略和不加区分的负采样来保持对齐和一致性。然而,它们的性能对图增强非常敏感,这需要繁琐的试验和昂贵的特定领域的知识作为指导。此外,这些方法不加区分地进行负抽样,不可避免地会受到抽样偏差的影响,即来自与锚点同一类别的负抽样。为了解决这些问题,我们提出了一个统一的 GCL 框架来实现对齐-一致性感知表示学习(AUAR) ,该框架可以在不增加图增强和负抽样的情况下实现更好的对齐,同时提高一致性。具体来说,我们提出了内部和内部对齐丢失来对齐节点的表示和它自己以及它的聚类中心来保持标签不变性。此外,我们还引入了理论分析中的一致性损失,它将不同类别的不相关节点的表示分开,并倾向于提供来自不同类别的信息方差。大量的实验表明,在三个广泛使用的数据集上,我们的方法在节点分类和聚类任务方面比现有的 GCL 方法获得了更好的性能。 code 0
GAP: A Grammar and Position-Aware Framework for Efficient Recognition of Multi-Line Mathematical Formulas Zhe Yang, Qi Liu, Kai Zhang, Shiwei Tong, Enhong Chen Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Anhui, Peoples R China Formula recognition endeavors to automatically identify mathematical formulas from images. Currently, the Encoder-Decoder model has significantly advanced the translation from image to corresponding formula markups. Nonetheless, previous research primarily concentrated on single-line formula recognition, ignoring the recognition of multi-line formulas, which presents additional challenges such as more stringent grammatical restrictions and two-dimensional positions. In this work, we present GAP (Grammar And Position-Aware formula recognition), a comprehensive framework designed to tackle the challenges in multi-line mathematical formula recognition. First, to overcome the limitations imposed by grammar, we design a novel Grammar Aware Contrastive Learning (GACL) module, integrating complex grammar rules into the transcription model through a contrastive learning mechanism. Furthermore, primitive contrastive learning lacks clear directions for comprehending grammar rules and can lead to unstable convergence or prolonged training cycles. To enhance training efficiency, we propose Rank-Based Sampling (RBS) specialized for multi-line formulas, which guides the learning process by the importance ranking of different grammar errors. Finally, spatial location information is critical considering the two-dimensional nature of multi-line formulas. To aid the model in keeping track of that global information, we introduced a Visual Coverage (VC) mechanism that incorporates historical attention information into the image features via a parameter-free way. 
To validate the effectiveness of our GAP framework, we construct a new dataset Multi-Line containing 12,002 multi-line formulas and conduct extensive experiments to show the efficacy of our GAP framework in capturing grammatical rules, enhancing recognition accuracy, and enhancing training efficiency. Codes and datasets are available at https://github.com/Sinon02/GAP. 公式识别致力于从图像中自动识别数学公式。目前,编解码器模型已经大大提高了从图像到相应公式标记的转换。然而,以往的研究主要集中在单行公式的识别上,忽视了对多行公式的识别,这带来了更严格的语法限制和二维位置等额外的挑战。在这项工作中,我们提出了 GAP (语法和位置感知公式识别) ,一个全面的框架,旨在解决多线数学公式识别的挑战。首先,为了克服语法的局限性,我们设计了一个新的语法感知对比学习(GACL)模块,通过对比学习机制将复杂的语法规则集成到转录模型中。此外,原始对比学习缺乏理解语法规则的清晰方向,可能导致收敛不稳定或训练周期延长。为了提高训练效率,提出了一种基于排序的多行公式抽样方法(RBS) ,该方法通过对不同语法错误的重要性排序来指导学习过程。最后,考虑到多线性公式的二维性质,空间位置信息是至关重要的。为了帮助模型跟踪全局信息,我们引入了一种可视化覆盖(VC)机制,通过一种无参数的方式将历史注意力信息合并到图像特征中。为了验证我们的 GAP 框架的有效性,我们构建了一个包含12,002个多行公式的新数据集 Multi-Line,并进行了广泛的实验,以显示我们的 GAP 框架在捕获语法规则、提高识别准确性和提高训练效率方面的有效性。代码和数据集可在 https://github.com/sinon02/gap 获得。 code 0
Maximizing Malicious Influence in Node Injection Attack Xiao Zhang, Peng Bao, Shirui Pan Griffith Univ, Brisbane, Australia; Beijing Jiaotong Univ, Beijing, Peoples R China Graph neural networks (GNNs) have achieved impressive performance in various graph-related tasks. However, recent studies have found that GNNs are vulnerable to adversarial attacks. Node injection attacks (NIA) have become an emerging scenario of graph adversarial attacks, where the attacks are performed by injecting malicious nodes into the original graph instead of directly modifying it. In this paper, we focus on a more realistic scenario of NIA, where the attacker is only allowed to inject a small number of nodes to degrade the performance of GNNs with very limited information. We analyze the susceptibility of nodes, and based on this we propose a global node injection attack framework, MaxiMal, to maximize malicious information under a strict black-box setting. MaxiMal first introduces a susceptible-reverse influence sampling strategy to select neighbor nodes that are able to spread malicious information widely. Then contrastive loss is introduced to optimize the objective by updating the edges and features of the injected nodes. Extensive experiments on three benchmark datasets demonstrate the superiority of our proposed MaxiMal over the state-of-the-art approaches. 图神经网络(GNN)在各种与图有关的任务中取得了令人印象深刻的性能。然而,最近的研究发现 GNN 很容易受到对抗攻击。节点注入攻击(NIA)是一种新兴的图形对抗性攻击方案,通过将恶意节点注入原始图形而不是直接修改原始图形来实施攻击。在本文中,我们着重于一个更加真实的 NIA 场景,在这个场景中,攻击者只允许注入少量的节点,以非常有限的信息降低 GNN 的性能。分析了节点的易感性,并在此基础上提出了一种全局节点注入攻击框架 MaxiMal,该框架可以在严格的黑盒设置下最大化恶意信息。MaxiMal 首先引入易感-反向影响采样策略来选择能够广泛传播恶意信息的邻居节点。然后引入对比损失,通过更新注入节点的边缘和特征来优化目标。在三个基准数据集上的大量实验证明了我们提出的 MaxiMal 相对于最先进的方法的优越性。 code 0
Interpretable Imitation Learning with Dynamic Causal Relations Tianxiang Zhao, Wenchao Yu, Suhang Wang, Lu Wang, Xiang Zhang, Yuncong Chen, Yanchi Liu, Wei Cheng, Haifeng Chen Imitation learning, which learns agent policy by mimicking expert demonstration, has shown promising results in many applications such as medical treatment regimes and self-driving vehicles. However, it remains a difficult task to interpret control policies learned by the agent. Difficulties mainly come from two aspects: 1) agents in imitation learning are usually implemented as deep neural networks, which are black-box models and lack interpretability; 2) the latent causal mechanism behind agents' decisions may vary along the trajectory, rather than staying static throughout time steps. To increase transparency and offer better interpretability of the neural agent, we propose to expose its captured knowledge in the form of a directed acyclic causal graph, with nodes being action and state variables and edges denoting the causal relations behind predictions. Furthermore, we design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs. Concretely, we conduct causal discovery from the perspective of Granger causality and propose a self-explainable imitation learning framework. The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner. After the model is learned, we can obtain causal relations among states and action variables behind its decisions, exposing policies learned by it. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed framework in learning the dynamic causal graphs for understanding the decision-making of imitation learning while maintaining high prediction accuracy. 
模仿学习通过模仿专家演示来学习智能体策略,在医疗体制和自动驾驶车辆等许多应用领域取得了良好的效果。但是,要解释代理所学到的控制策略仍然是一项困难的任务。难点主要来自两个方面: 1)模拟学习中的智能体通常被实现为深层神经网络,这是黑箱模型,缺乏可解释性; 2)智能体决策背后的潜在因果机制可能会随着轨迹而变化,而不是在整个时间步骤中保持静态。为了增加透明度并提供更好的神经代理的可解释性,我们建议以有向无环因果图的形式揭示其捕获的知识,其中节点是作用,状态变量和边表示预测背后的因果关系。此外,我们设计这个因果发现过程是依赖于状态的,使它能够模拟潜在因果图中的动态。具体来说,我们从格兰杰因果关系的角度进行因果发现,并提出了一个自我解释的模仿学习框架。该框架由动态因果发现模块、因果编码模块和预测模块三部分组成,并以端到端的方式进行训练。在学习模型之后,我们可以获得其决策背后的状态和行动变量之间的因果关系,揭示它所学习的政策。实验结果表明,该方法能够有效地学习动态因果图,从而理解模拟学习的决策过程,同时保持较高的预测精度。 code 0
RDGCN: Reinforced Dependency Graph Convolutional Network for Aspect-based Sentiment Analysis Xusheng Zhao, Hao Peng, Qiong Dai, Xu Bai, Huailiang Peng, Yanbing Liu, Qinglang Guo, Philip S. Yu Aspect-based sentiment analysis (ABSA) is dedicated to forecasting the sentiment polarity of aspect terms within sentences. Employing graph neural networks to capture structural patterns from syntactic dependency parsing has been confirmed as an effective approach for boosting ABSA. In most works, the topology of dependency trees or dependency-based attention coefficients is often loosely regarded as edges between aspects and opinions, which can result in insufficient and ambiguous syntactic utilization. To address these problems, we propose a new reinforced dependency graph convolutional network (RDGCN) that improves the importance calculation of dependencies in both distance and type views. Initially, we propose an importance calculation criterion for the minimum distances over dependency trees. Under the criterion, we design a distance-importance function that leverages reinforcement learning for weight distribution search and dissimilarity control. Since dependency types often do not have explicit syntax like tree distances, we use global attention and mask mechanisms to design type-importance functions. Finally, we merge these weights and implement feature aggregation and classification. Comprehensive experiments on three popular datasets demonstrate the effectiveness of the criterion and importance functions. RDGCN outperforms state-of-the-art GNN-based baselines in all validations. 
基于体的情感分析(ABSA)致力于预测句子中体词的情感极性。利用图形神经网络从句法依赖分析中获取结构模式已被证实是提高 ABSA 的有效途径。在大多数作品中,依赖树或基于依赖的注意系数的拓扑结构往往被松散地视为方面和观点之间的边缘,这可能导致句法利用的不足和歧义。为了解决这些问题,我们提出了一种新的增强依赖图卷积网络(RDGCN) ,它改进了距离视图和类型视图中依赖关系的重要性计算。首先,我们提出了一个依赖树上最小距离的重要计算准则。根据该准则,我们设计了一个距离重要性函数,利用强化学习进行权重分布搜索和差异控制。由于依赖类型通常没有像树距离这样明确的语法,所以我们使用全局注意力和掩码机制来设计类型重要性函数。最后,合并这些权重,实现特征的聚合和分类。通过对三个常用数据集的综合实验,验证了该判据和重要性函数的有效性。在所有验证中,RDGCN 的性能优于最先进的基于 GNN 的基线。 code 0
CreST: A Credible Spatiotemporal Learning Framework for Uncertainty-aware Traffic Forecasting Zhengyang Zhou, Jiahao Shi, Hongbo Zhang, Qiongyu Chen, Xu Wang, Hongyang Chen, Yang Wang Univ Sci & Technol China USTC, Hefei, Peoples R China; Univ Sci & Technol China, Hefei, Peoples R China; Zhejiang Lab, Hangzhou, Peoples R China Spatiotemporal traffic forecasting plays a critical role in intelligent transportation systems, which empowers diverse urban services. Existing traffic forecasting frameworks usually devise various learning strategies to capture spatiotemporal correlations from the perspective of volume itself. However, we argue that previous traffic predictions are still unreliable due to two aspects. First, the influences of context factor-wise interactions on dynamic region-wise correlations are under-exploited. Second, the dynamics induce a credibility issue in forecasting that has not been well explored. In this paper, we exploit the informative traffic-related context factors to jointly tackle the dynamic regional heterogeneity and explain the stochasticity, towards credible uncertainty-aware traffic forecasting. Specifically, to internalize the dynamic contextual influences into the learning process, we design a context-cross relational embedding to capture interactions between each context, and generate virtual graph topology to dynamically relate pairwise regions with context embedding. To quantify the prediction credibility, we attribute data-side aleatoric uncertainty to contexts and re-utilize them for aleatoric uncertainty quantification. Then we couple a dual-pipeline learning with the same objective to produce the discrepancy of model outputs and quantify model-side epistemic uncertainty. These two uncertainties are fed through a spatiotemporal network for extracting uncertainty evolution patterns. Finally, comprehensive experiments and model deployments have corroborated the credibility of our framework. 
时空交通预测在智能交通系统中起着至关重要的作用,为城市提供多样化的服务。现有的交通流量预测框架通常设计各种学习策略,从体积本身的角度来捕捉时空相关性。然而,由于两个方面的原因,我们认为以前的流量预测仍然是不可靠的。首先,上下文因素相互作用对动态区域相关性的影响正在研究之中。其次,这种动态导致预测的可信度问题尚未得到很好的探讨。本文利用信息化交通相关背景因素,共同解决动态区域异质性和随机性问题,从而实现可靠的不确定性交通预测。为了将动态语境的影响内在化到学习过程中,我们设计了一个跨语境的关系嵌入来捕获每个语境之间的交互,并生成虚拟图拓扑来动态关联成对区域与语境嵌入。为了量化预测的可信度,我们将数据边的风险不确定性归因于上下文,并重新利用上下文对风险不确定性进行量化。然后,我们耦合具有相同目标的双流水线学习来产生模型输出的差异和量化模型边的认知不确定性。这两个不确定性通过一个时空网络提取不确定性演化模式。最后,全面的实验和模型部署验证了我们的框架的可信性。 code 0
Pitfalls in Link Prediction with Graph Neural Networks: Understanding the Impact of Target-link Inclusion & Better Practices Jing Zhu, Yuhang Zhou, Vassilis N. Ioannidis, Shengyi Qian, Wei Ai, Xiang Song, Danai Koutra Gait recognition is one of the most critical long-distance identification technologies and increasingly gains popularity in both research and industry communities. Despite the significant progress made in indoor datasets, much evidence shows that gait recognition techniques perform poorly in the wild. More importantly, we also find that some conclusions drawn from indoor datasets cannot be generalized to real applications. Therefore, the primary goal of this paper is to present a comprehensive benchmark study for better practicality rather than only a particular model for better performance. To this end, we first develop a flexible and efficient gait recognition codebase named OpenGait. Based on OpenGait, we deeply revisit the recent development of gait recognition by re-conducting the ablative experiments. Encouragingly, we detect some imperfect parts of certain prior works, as well as new insights. Inspired by these discoveries, we develop a structurally simple, empirically powerful, and practically robust baseline model, GaitBase. Experimentally, we comprehensively compare GaitBase with many current gait recognition methods on multiple public datasets, and the results reflect that GaitBase achieves significantly strong performance in most cases regardless of indoor or outdoor situations. Code is available at https://github.com/ShiqiYu/OpenGait. 
步态识别是最关键的远程识别技术之一,在研究和工业界越来越受欢迎。尽管在室内数据集方面取得了重大进展,但大量证据表明,步态识别技术在野外的表现较差。更重要的是,我们还发现,从室内数据集中得出的一些结论不能推广到实际应用中。因此,本文的主要目标是提出一个更好的实用性的全面的基准研究,而不仅仅是一个特定的模型,以获得更好的性能。为此,我们首先开发了一个灵活高效的步态识别代码库 OpenGait。基于 OpenGait,我们重新进行消融实验,深入回顾了近年来步态识别的发展。令人鼓舞的是,我们发现了某些先前工作中的不完善之处,并获得了新的见解。受到这些发现的启发,我们开发了一个结构简单、经验强大、实际可靠的基线模型 GaitBase。在实验上,我们全面比较了 GaitBase 和现有的多个公共数据集上的步态识别方法,结果表明 GaitBase 在大多数情况下无论室内还是室外都取得了显著的性能。代码可于 https://github.com/shiqiyu/opengait 获取。 code 0
MultiSPANS: A Multi-range Spatial-Temporal Transformer Network for Traffic Forecast via Structural Entropy Optimization Dongcheng Zou, Senzhang Wang, Xuefeng Li, Hao Peng, Yuandong Wang, Chunyang Liu, Kehua Sheng, Bo Zhang Traffic forecasting is a complex multivariate time-series regression task of paramount importance for traffic management and planning. However, existing approaches often struggle to model complex multi-range dependencies using local spatiotemporal features and road network hierarchical knowledge. To address this, we propose MultiSPANS. First, considering that an individual recording point cannot reflect critical spatiotemporal local patterns, we design multi-filter convolution modules for generating informative ST-token embeddings to facilitate attention computation. Then, based on ST-token and spatial-temporal position encoding, we employ Transformers to capture long-range temporal and spatial dependencies. Furthermore, we introduce structural entropy theory to optimize the spatial attention mechanism. Specifically, the structural entropy minimization algorithm is used to generate optimal road network hierarchies, i.e., encoding trees. Based on this, we propose a relative structural entropy-based position encoding and a multi-head attention masking scheme based on multi-layer encoding trees. Extensive experiments demonstrate the superiority of the presented framework over several state-of-the-art methods on real-world traffic datasets, and show that longer historical windows are effectively utilized. The code is available at https://github.com/SELGroup/MultiSPANS. 
交通量预测是一个复杂的多元时间序列回归任务,对交通管理和规划至关重要。然而,现有的方法往往难以利用局部时空特征和道路网层次知识建模复杂的多范围依赖关系。为了解决这个问题,我们提出了 MultiSPANS。首先,考虑到单个记录点不能反映关键的时空局部模式,我们设计了多滤波卷积模块来生成信息性 ST 令牌嵌入,以方便注意力计算。然后,基于 ST 令牌和空间-时间位置编码,我们使用 Transformer 来捕获长距离的时间和空间依赖。在此基础上,引入结构熵理论对空间注意机制进行优化。具体而言,采用结构熵最小化算法生成最优路网层次结构,即编码树。在此基础上,提出了一种基于相对结构熵的位置编码和基于多层编码树的多头注意掩蔽方案。通过大量的实验证明了该框架相对于现实世界交通数据集中的几种最新方法的优越性,并且有效地利用了较长的历史窗口。代码可在 https://github.com/selgroup/multispans 查阅。 code 0
WordGraph: A Python Package for Reconstructing Interactive Causal Graphical Models from Text Data Amine Ferdjaoui, Séverine Affeldt, Mohamed Nadif Univ Paris Cite, Ctr Borelli UMR 9010, Paris, France; Univ Paris Cite, SogetiLabs, Paris, France We present WordGraph, a Python package for exploring the topics of document corpora. WordGraph provides causal graphical models from text data vocabulary and proposes interactive visualizations of term networks. Our easy-to-use package is provided with a prebuilt pipeline to access the main modules through Jupyter widgets. It results in the encapsulation of a whole vocabulary exploration process within a single Jupyter notebook cell, with straightforward parameter settings and interactive plots. The WordGraph pipeline is fully customizable by adding/removing widgets or changing default parameters. To assist users who have no background in Python or Jupyter notebooks but are willing to explore large corpora topics, we also propose an automatic dashboard generation from the customizable Jupyter notebook pipeline in a web application style. WordGraph is available through a GitHub repository. 我们介绍 WordGraph,这是一个用于探索文档语料库主题的 Python 包。WordGraph 从文本数据词汇中提供因果图形模型,并提出术语网络的交互式可视化。我们的易于使用的软件包提供了一个预构建的管道,可以通过 Jupyter 小部件访问主要模块。它将整个词汇探索过程封装在一个单独的 Jupyter 笔记本单元中,具有简单的参数设置和交互式图形。通过添加/删除小部件或更改默认参数,WordGraph 管道是完全可定制的。为了帮助那些既没有 Python 背景,也没有 Jupyter 笔记本背景,但又愿意探索大型语料库主题的用户,我们还提议以 Web 应用程序风格从可定制的 Jupyter 笔记本管道自动生成仪表板。WordGraph 可以通过 GitHub 存储库获得。 code 0
EvidenceQuest: An Interactive Evidence Discovery System for Explainable Artificial Intelligence Ambreen Hanif, Amin Beheshti, Xuyun Zhang, Steven Wood, Boualem Benatallah, EuJin Foo Macquarie Univ, Sydney, NSW, Australia; Prospa, Sydney, NSW, Australia; Dublin City Univ, Dublin, Ireland Explainable Artificial Intelligence (XAI) aims to make artificial intelligence (AI) systems transparent and understandable to humans, providing clear explanations for the decisions made by AI models. This paper presents a novel pipeline and a digital dashboard that provides a user-friendly platform for interpreting the results of machine learning algorithms using XAI technology. The dashboard utilizes evidence-based design principles to deliver information clearly and concisely, enabling users to better understand the decisions made by their algorithms. We integrate XAI services into the dashboard to explain the algorithm's predictions, allowing users to understand how their models function and make informed decisions. We demonstrate a motivating scenario in banking and present how the proposed system enhances transparency and accountability and improves trust in the technology. 可解释人工智能(XAI)旨在使人工智能(AI)系统对人类透明、易懂,为 AI 模型的决策提供清晰的解释。本文提出了一种新的流水线和数字仪表板,为使用 XAI 技术解释机器学习算法的结果提供了一个用户友好的平台。仪表板利用基于证据的设计原则来清晰、简洁地传递信息,使用户能够更好地理解他们的算法所做的决策。我们将 XAI 服务集成到仪表板中,以解释算法的预测,使用户能够理解他们的模型是如何工作的,并做出明智的决策。我们展示了银行业的激励情景,并介绍了拟议的系统如何增强透明度和问责制,并提高对技术的信任。 code 0
Ginkgo-P: General Illustrations of Knowledge Graphs for Openness as a Platform Blaine Hill, Lihui Liu, Hanghang Tong Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA Accessibility and openness are two of the most important factors in motivating AI and Web research. For example, as the costs to train and deploy large Knowledge Graph (KG) systems increase, valuable auxiliary features such as visualization, explainability, and automation are often overlooked, diminishing impact and popularity. Furthermore, current KG research has undergone a vicissitude to become convoluted and abstract, dissuading collaboration. To this end, we present Ginkgo-P, a platform to automatically illustrate any KG algorithm with nothing but a script and a data file. Additionally, Ginkgo-P elucidates modern KG research on the UMLS dataset with interactive demonstrations on four categories: KG Node Recommendation, KG Completion, KG Question Answering, and KG Reinforcement Learning. These categories and their many applications are increasingly ubiquitous yet lack both introductory and advanced resources to accelerate interest and contributions: with just a few clicks, our demonstration addresses this by providing an open platform for users to integrate individual KG algorithms. The source code for Ginkgo-P is available: we hope that it will propel future KG systems to become more accessible as an open source project. 可访问性和开放性是激发人工智能和网络研究的两个最重要的因素。其中一个例子是,随着训练和部署大型知识图谱(KG)系统的成本增加,可视化、可解释性和自动化等有价值的辅助特性常常被忽视,影响力和受欢迎程度降低。此外,当前 KG 的研究也经历了一个变迁,变得复杂而抽象,阻碍了合作。为此,我们提出了 Ginkgo-P,一个只需一个脚本和一个数据文件即可自动展示任何 KG 算法的平台。此外,Ginkgo-P 通过四个类别的交互式演示阐述了基于 UMLS 数据集的现代 KG 研究: KG 节点推荐、KG 补全、KG 问答和 KG 强化学习。这些类别和它们的许多应用程序越来越普遍,但缺乏引入性和先进的资源来加速兴趣和贡献: 只需几次点击,我们的演示通过为用户提供一个开放平台来集成各个 KG 算法来解决这个问题。Ginkgo-P 的源代码是可用的: 我们希望它将推动未来的 KG 系统作为一个开源项目变得更容易访问。 code 0
Real-time E-bike Route Planning with Battery Range Prediction Zhao Li, Guoqi Ren, Yongchun Gu, Siwei Zhou, Xuanwu Liu, Jiaming Huang, Ming Li code 0
An Interpretable Brain Graph Contrastive Learning Framework for Brain Disorder Analysis Xuexiong Luo, Guangwei Dong, Jia Wu, Amin Beheshti, Jian Yang, Shan Xue code 0
Future Timelines: Extraction and Visualization of Future-related Content From News Articles Juwal Regev, Adam Jatowt, Michael Färber code 0
Temporal Graph Analysis with TGX Razieh Shirzadkhani, Shenyang Huang, Elahe Kooshafar, Reihaneh Rabbany, Farimah Poursafaei code 0
A Scalable Open-Source System for Segmenting Urban Areas with Road Networks Ming Zhang, Yanyan Li, Jianguo Duan, Jizhou Huang, Jingbo Zhou code 0
Some Useful Things to Know When Combining IR and NLP: The Easy, the Hard and the Ugly Omar Alonso, Kenneth Church Northeastern Univ, San Jose, CA USA; Amazon, Santa Clara, CA 95054 USA Deep nets such as GPT are at the core of the current advances in many systems and applications. Things are moving very fast, and it appears that techniques are out of date within weeks. How can we take advantage of new discoveries and incorporate them into our existing work? Are these radical new developments, repetitions of older concepts, or both? In this tutorial, we aim to bring interested researchers and practitioners up to speed on the recent and ongoing techniques around ML and Deep learning in the context of IR and NLP. Additionally, our goal is to clarify terminology, emphasize fundamentals, and outline new research opportunities. 像 GPT 这样的深网是当前许多系统和应用进展的核心。事情发展得非常快,而且技术似乎在几周内就过时了。我们如何利用新的发现,并将其纳入我们现有的工作?这些是激进的新发展、旧观念的重复,还是两者兼而有之?在本教程中,我们的目的是让感兴趣的研究人员和从业人员加快最近和正在进行的技术在机器学习和深度学习的背景下的 IR 和 NLP。此外,我们的目标是澄清术语,强调基本原理,并概述新的研究机会。 code 0
Introduction to Responsible AI Ricardo BaezaYates code 0
Towards Trustworthy Large Language Models Sanmi Koyejo, Bo Li code 0
Strategic ML: How to Learn With Data That 'Behaves' Nir Rosenfeld code 0
Learning Opinion Dynamics from Data Jacopo Lenti code 0
Gaussian Graphical Model-Based Clustering of Time Series Data Kohei Obata code 0
Using Causal Inference to Solve Uncertainty Issues in Dataset Shift Song Shuang, Muhammad Syafiq Bin Mohd Pozi code 0
Multi-Granular Text Classification with Minimal Supervision Yunyi Zhang code 0
Scaling Use-case Based Shopping using LLMs Sachin Farfade, Sachin Vernekar, Vineet Chaoji, Rajdeep Mukherjee code 0
HealAI: A Healthcare LLM for Effective Medical Documentation Sagar Goyal, Eti Rastogi, Sree Prasanna Rajagopal, Dong Yuan, Fen Zhao, Jai Chintagunta, Gautam Naik, Jeff Ward code 0
Mitigating Factual Inconsistency and Hallucination in Large Language Models Muneeswaran I, Advaith Shankar, Varun V, Saisubramaniam Gopalakrishnan, Vishal Vaddina code 0
Foundation Models for Aerial Robotics Ashish Kapoor code 0
Journey of Hallucination-minimized Generative AI Solutions for Financial Decision Makers Sohini Roychowdhury Generative AI has significantly reduced the entry barrier to the domain of AI owing to the ease of use and core capabilities of automation, translation, and intelligent actions in our day-to-day lives. Currently, large language models (LLMs) that power such chatbots are being utilized primarily for their automation capabilities for software monitoring, report generation, etc., and for specific personalized question answering capabilities, on a limited scope and scale. One major limitation of the currently evolving family of LLMs is 'hallucinations', wherein inaccurate responses are reported as factual. Hallucinations are primarily caused by biased training data, ambiguous prompts and inaccurate LLM parameters, and they mostly occur while combining mathematical facts with language-based context. Thus, monitoring and controlling for hallucinations becomes necessary when designing solutions that are meant for decision makers. In this work we present the three major stages in the journey of designing hallucination-minimized LLM-based solutions that are specialized for the decision makers of the financial domain, namely: prototyping, scaling and LLM evolution using human feedback. These three stages and the novel data-to-answer generation modules presented in this work are necessary to ensure that the Generative AI chatbots, autonomous reports and alerts are reliable and high-quality to aid key decision-making processes. 生成性人工智能大大降低了进入人工智能领域的门槛,因为在我们的日常生活中,自动化、翻译和智能行动具有易于使用和核心能力。目前,驱动聊天机器人的大语言模型(LLM)主要用于软件监控、报告生成等自动化功能,以及在有限的范围和规模内提供特定的个性化问题回答功能。当前发展中的 LLM 家族的一个主要限制是“幻觉”,其中不准确的反应被报道为事实。幻觉主要是由有偏见的训练数据、模糊的提示和不准确的 LLM 参数引起的,它们主要发生在将数学事实与基于语言的上下文相结合的情况下。因此,在为决策者设计解决方案时,对幻觉的监测和控制变得必要。在这项工作中,我们介绍了三个主要阶段的旅程设计幻觉最小化 LLM 为基础的解决方案,专门为金融领域的决策者,即: 原型,缩放和 LLM 进化使用人的反馈。这三个阶段和新的数据回答生成模块在这项工作是必要的,以确保生成人工智能聊天机器人,自主报告和警报是可靠的和高质量的,以帮助关键的决策过程。 code 0
Accelerating Pharmacovigilance using Large Language Models Mukkamala Venkata Sai Prakash, Ganesh Parab, Meghana Veeramalla, Siddartha Reddy, Varun V, Saisubramaniam Gopalakrishnan, Vishal Pagidipally, Vishal Vaddina code 0
Automated Topic Generation for the Mexican Platform for Access to Government Public Information During the Period 2003-2020 Hermelando CruzPérez, Alejandro MolinaVillegas code 0
Profiling Urban Mobility Patterns with High Spatial and Temporal Resolution: A Deep Dive into Cellphone Geo-position Data José Ignacio Huertas, Luisa Fernanda Chaparro Sierra code 0
Integrating Knowledge Graph Data with Large Language Models for Explainable Inference Carlos Efrain Quintero Narvaez, Raúl Monroy code 0
Genomic-World Fungi Data: Synteny Part Pedro EscobarTurriza, Luis MuñozMiranda, Alejandro PereiraSantana code 0
Preserving Heritage: Developing a Translation Tool for Indigenous Dialects Melissa Robles, Cristian A. Martínez, Juan C. Prieto, Sara Palacios, Rubén Manrique code 0
Automatic Extraction of Patterns in Digital News Articles of Femicides occurred in Mexico by Text Mining Techniques Jonathan ZárateCartas, Alejandro MolinaVillegas code 0
Integrity 2024: Integrity in Social Networks and Media Lluís Garcia Pueyo, Symeon Papadopoulos, Prathyusha Senthil Kumar, Aristides Gionis, Panayiotis Tsaparas, Vasilis Verroios, Giuseppe Manco, Anton Andryeyev, Stefano Cresci, Timos Sellis, Anthony McCosker Integrity 2024 is the fifth edition of the Workshop on Integrity in Social Networks and Media, held in conjunction with the ACM Conference on Web Search and Data Mining (WSDM) since the 2020 edition. The goal of the workshop is to bring together academic and industry researchers working on integrity, fairness, trust and safety in social networks to discuss the most pressing risks and cutting-edge technologies to reliably measure and mitigate them. The event consists of invited talks from academic experts and industry leaders as well as peer-reviewed papers and posters through an open call-for-papers. The workshop will take place on March 8th, 2024, in Mérida, Yucatán, Mexico. The call for papers and further information can be found at http://integrity-workshop.org. “诚信2024”是第五届社交网络和媒体诚信研讨会,自2020年以来与 ACM 网络搜索和数据挖掘会议(WSDM)一起举办。研讨会的目标是汇集学术界和行业研究人员在诚信、公平、信任和社会网络安全方面的工作,讨论最紧迫的风险和尖端技术,以可靠地衡量和减轻这些风险。这项活动包括邀请学术专家和业界领袖进行讲座,以及通过公开征集论文的方式进行同行评议的论文和海报。研讨会将于2024年3月8日在墨西哥尤卡坦州梅里达举行。征稿启事及更多信息请见 http://integrity-workshop.org。 code 0