New submissions for Mon, 3 Jul 23 #387
Labels
abstract meaning representation
argument mining
citation context analysis
computational social science
contrastive
cross-language information retrieval
cross-lingual information retrieval
data augmentation
extreme multi-label
knowledge discovery
knowledge graph
legal text
legal
mixup
multi-task
paraphrase
passage generation
plagiarism
robustness
scholarly document processing
scholarly
semantic similarity
similarity measure
simplification
summarization
text generation
Keyword: contrastive
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Authors: Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao
Arxiv: https://arxiv.org/abs/2306.17203
TLDR: The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production. However, previous methods in V2A have limited generation quality in terms of temporal synchronization and audio-visual relevance. We present Diff-Foley, a synchronous Video-To-Audio synthesis method with a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and Audio-visual
Repo: None
Masked Contrastive Graph Representation Learning for Age Estimation
Authors: Yuntao Shou, Xiangyong Cao, Deyu Meng
Arxiv: https://arxiv.org/abs/2306.17798
TLDR: Age estimation of face images is a crucial task with various practical applications in areas such as video surveillance and Internet access control. While deep learning-based age estimation frameworks, e.g., convolutional neural network (CNN), multi-layer perceptrons (MLP), and transformers have shown remarkable performance, they have limitations when modelling complex or irregular objects in an image that contains a large amount of redundant information. To address this issue, this paper utilizes the robustness property of graph
Repo: None
A Massive Scale Semantic Similarity Dataset of Historical English
Authors: Emily Silcock, Melissa Dell
Arxiv: https://arxiv.org/abs/2306.17810
TLDR: A diversity of tasks use language models trained on semantic similarity data. While there are a variety of datasets that capture semantic similarity, they are either constructed from modern web data or are relatively small datasets created in the past decade by human annotators. This study utilizes a novel source, newly digitized articles from off-copyright, local U.S. newspapers, to assemble a massive-scale semantic similarity dataset spanning 70 years from 1920 to 1989 and containing nearly 400M positive semantic similarity pairs
Repo: None
Keyword: data augmentation
EyeBAG: Accurate Control of Eye Blink and Gaze Based on Data Augmentation Leveraging Style Mixing
Authors: Bryan S. Kim, Jeong Young Jeong, Wonjong Ryu
Arxiv: https://arxiv.org/abs/2306.17391
TLDR: Recent developments in generative models have enabled the generation of photo-realistic human face images, and downstream tasks utilizing face generation technology have advanced accordingly. However, models for downstream tasks are yet substandard at eye control (e.g. eye blink, gaze redirection). To overcome such eye control problems, we introduce a novel framework consisting of two distinct modules: a blink control module and a gaze redirect module. We also propose a novel data augmentation method to train each module
Repo: None
DeepTagger: Knowledge Enhanced Named Entity Recognition for Web-Based Ads Queries
Authors: Simiao Zuo, Pengfei Tang, Xinyu Hu, Qiang Lou, Jian Jiao, Denis Charles
Arxiv: https://arxiv.org/abs/2306.17413
TLDR: Named entity recognition (NER) is a crucial task for online advertisement. State-of-the-art solutions leverage pre-trained language models for this task. However, three major challenges remain unresolved: web queries differ from natural language; web queries are short and lack contextual information; and labeled data for NER is scarce. We propose DeepTagger, a knowledge-enhanced NER model for web-based ads queries. The proposed knowledge enhancement framework leverages both model-
Repo: None
Impact of Noise on Calibration and Generalisation of Neural Networks
Authors: Martin Ferianc, Ondrej Bohdal, Timothy Hospedales, Miguel Rodrigues
Arxiv: https://arxiv.org/abs/2306.17630
TLDR: Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise such as label smoothing and MixUp have also been shown to improve calibration. Since noise can be added in various stages of the NN's training, it motivates the question of when and where the noise is the most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under
Repo: None
Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing
Authors: Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz
Arxiv: https://arxiv.org/abs/2306.17848
TLDR: Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting properties with respect to early layer non-local feature dependence, as well as self-attention mechanisms which enhance learning flexibility, enabling them to
Repo: None
Keyword: knowledge discovery
GPT-FinRE: In-context Learning for Financial Relation Extraction using Large Language Models
Authors: Pawan Kumar Rajpoot, Ankur Parikh
Arxiv: https://arxiv.org/abs/2306.17519
TLDR: Relation extraction (RE) is a crucial task in natural language processing (NLP) that aims to identify and classify relationships between entities mentioned in text. In the financial domain, relation extraction plays a vital role in extracting valuable information from financial documents, such as news articles, earnings reports, and company filings. This paper describes our solution to relation extraction on one such dataset, REFinD. The dataset was released along with a shared task as a part of the Fourth Workshop on Knowledge Discovery from
Repo: None
Keyword: knowledge graph
RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care
Authors: Rakhilya Lee Mekhtieva, Brandon Forbes, Dalal Alrajeh, Brendan Delaney, Alessandra Russo
Arxiv: https://arxiv.org/abs/2306.17175
TLDR: Clinical decision-making is a fundamental stage in delivering appropriate care to patients. In recent years several decision-making systems designed to aid the clinician in this process have been developed. However, technical solutions currently in use are based on simple regression models and are only able to take into account simple pre-defined multiple-choice features, such as patient age, pre-existing conditions, smoker status, etc. One particular source of patient data, that available decision-making systems are incapable
Repo: None
Keyword: mixup
Impact of Noise on Calibration and Generalisation of Neural Networks
Authors: Martin Ferianc, Ondrej Bohdal, Timothy Hospedales, Miguel Rodrigues
Arxiv: https://arxiv.org/abs/2306.17630
TLDR: Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise such as label smoothing and MixUp have also been shown to improve calibration. Since noise can be added in various stages of the NN's training, it motivates the question of when and where the noise is the most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under
Repo: None
Keyword: multi-task
Progressive Multi-task Learning Framework for Chinese Text Error Correction
Authors: Shirong Ma, Yinghui Li, Haojing Huang, Shulin Huang, Yangning Li, Hai-Tao Zheng, Ying Shen
Arxiv: https://arxiv.org/abs/2306.17447
TLDR: Chinese Text Error Correction (CTEC) aims to detect and correct errors in the input text, which benefits daily life and various downstream tasks. Recent approaches mainly employ Pre-trained Language Models (PLMs) to resolve the CTEC task and achieve tremendous success. However, previous approaches suffer from issues of over-correction and under-correction, and the former is especially conspicuous in the precision-critical CTEC tasks. To mitigate the issue of over-correction, we propose a
Repo: None
FedBone: Towards Large-Scale Federated Multi-Task Learning
Authors: Yiqiang Chen, Teng Zhang, Xinlong Jiang, Qian Chen, Chenlong Gao, Wuliang Huang
Arxiv: https://arxiv.org/abs/2306.17465
TLDR: Heterogeneous federated multi-task learning (HFMTL) is a federated learning technique that combines heterogeneous tasks of different clients to achieve more accurate, comprehensive predictions. In real-world applications, visual and natural language tasks typically require large-scale models to extract high-level abstract features. However, large-scale models cannot be directly applied to existing federated multi-task learning methods. Existing HFMTL methods also disregard the impact of gradient conflicts on multi-source
Repo: None
Achieving RGB-D level Segmentation Performance from a Single ToF Camera
Authors: Pranav Sharma, Jigyasa Singh Katrolia, Jason Rambach, Bruno Mirbach, Didier Stricker, Juergen Seiler
Arxiv: https://arxiv.org/abs/2306.17636
TLDR: Depth is a very important modality in computer vision, typically used as complementary information to RGB, provided by RGB-D cameras. In this work, we show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera. In order to fuse the IR and depth modalities of the ToF camera, we introduce a method utilizing
Repo: None
Feature Representation Learning for NL2SQL Generation Based on Coupling and Decoupling
Authors: Chenduo Hao, Xu Zhang, Chuanbao Gao, Deyu Zhou
Arxiv: https://arxiv.org/abs/2306.17646
TLDR: The NL2SQL task involves parsing natural language statements into SQL queries. While most state-of-the-art methods treat NL2SQL as a slot-filling task and use feature representation learning techniques, they overlook explicit correlation features between the SELECT and WHERE clauses and implicit correlation features between sub-tasks within a single clause. To address this issue, we propose the Clause Feature Correlation Decoupling and Coupling (CFCDC) model, which uses a feature representation
Repo: None
Improved NL2SQL based on Multi-layer Expert Network
Authors: Chenduo Hao, Xu Zhang
Arxiv: https://arxiv.org/abs/2306.17727
TLDR: The Natural Language to SQL (NL2SQL) technique is used to convert natural language queries into executable SQL statements. Typically, slot-filling is employed as a classification method for multi-task cases to achieve this goal. However, slot-filling can result in inaccurate SQL statement generation due to negative migration issues arising from different classification tasks. To overcome this limitation, this study introduces a new approach called Multi-Layer Expert Generate SQL (MLEG-SQL), which utilizes a dedicated
Repo: None
Keyword: robustness
Limits of Machine Learning for Automatic Vulnerability Detection
Authors: Niklas Risse, Marcel Böhme
Arxiv: https://arxiv.org/abs/2306.17193
TLDR: Recent results of machine learning for automatic vulnerability detection have been very promising indeed: Given only the source code of a function
Repo: None
Robust Roadside Perception for Autonomous Driving: an Annotation-free Strategy with Synthesized Data
Authors: Rusheng Zhang, Depu Meng, Lance Bassett, Shengyin Shen, Zhengxia Zou, Henry X. Liu
Arxiv: https://arxiv.org/abs/2306.17302
TLDR: Recently, with the rapid development in vehicle-to-infrastructure communication technologies, the infrastructure-based, roadside perception system for cooperative driving has become a rising field. This paper focuses on one of the most critical challenges: the data-insufficiency problem. The lack of high-quality labeled roadside sensor data with high diversity leads to low robustness and low transferability of current roadside perception systems. In this paper, a novel approach is proposed to address this problem by creating
Repo: None
Scaling Model Checking for DNN Analysis via State-Space Reduction and Input Segmentation (Extended Version)
Authors: Mahum Naseer, Osman Hasan, Muhammad Shafique
Arxiv: https://arxiv.org/abs/2306.17323
TLDR: Owing to their remarkable learning capabilities and performance in real-world applications, the use of machine learning systems based on Neural Networks (NNs) has been continuously increasing. However, various case studies and empirical findings in the literature suggest that slight variations to NN inputs can lead to erroneous and undesirable NN behavior. This has led to considerable interest in their formal analysis, aiming to provide guarantees regarding a given NN's behavior. Existing frameworks provide robustness and/or safety guarantees
Repo: None
Decentralized Motor Skill Learning for Complex Robotic Systems
Authors: Yanjiang Guo, Zheyuan Jiang, Yen-Jen Wang, Jingyue Gao, Jianyu Chen
Arxiv: https://arxiv.org/abs/2306.17411
TLDR: Reinforcement learning (RL) has achieved remarkable success in complex robotic systems (e.g., quadruped locomotion). In previous works, the RL-based controller was typically implemented as a single neural network with concatenated observation input. However, the corresponding learned policy is highly task-specific. Since all motors are controlled in a centralized way, out-of-distribution local observations can impact global motors through the single coupled neural network policy. In contrast, animals and humans can control
Repo: None
LIO-GVM: an Accurate, Tightly-Coupled Lidar-Inertial Odometry with Gaussian Voxel Map
Authors: Xingyu Ji, Shenghai Yuan, Pengyu Yin, Lihua Xie
Arxiv: https://arxiv.org/abs/2306.17436
TLDR: This letter presents an accurate and robust Lidar Inertial Odometry framework. We fuse LiDAR scans with IMU data using a tightly-coupled iterative error state Kalman filter for robust and fast localization. To achieve robust correspondence matching, we represent the points as a set of Gaussian distributions and evaluate the divergence in variance for outlier rejection. Based on the fitted distributions, a new residual metric, which demonstrates an improvement from merely quantifying distance to incorporating
Repo: None
Provable Robust Watermarking for AI-Generated Text
Authors: Xuandong Zhao, Prabhanjan Ananth, Lei Li, Yu-Xiang Wang
Arxiv: https://arxiv.org/abs/2306.17439
TLDR: As AI-generated text increasingly resembles human-written content, the ability to detect machine-generated content becomes crucial. To address this challenge, we present GPTWatermark, a robust and high-quality solution designed to ascertain whether a piece of text originates from a specific model. Our approach extends existing watermarking strategies and employs a fixed group design to enhance robustness against editing and paraphrasing attacks. We show that our watermarked language model enjoys strong provable guarantees on
Repo: None
CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning
Authors: Yang Liu, Weixing Chen, Guanbin Li, Liang Lin
Arxiv: https://arxiv.org/abs/2306.17462
TLDR: We present CausalVLR (Causal Visual-Linguistic Reasoning), an open-source toolbox containing a rich set of state-of-the-art causal relation discovery and causal inference methods for various visual-linguistic reasoning tasks, such as VQA, image/video captioning, medical report generation, model generalization and robustness, etc. These methods have been included in the toolbox with PyTorch implementations for NVIDIA computing systems. It
Repo: None
Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization
Authors: Stephen Hausler, Sourav Garg, Punarjay Chakravarty, Shubham Shrivastava, Ankit Vora, Michael Milford
Arxiv: https://arxiv.org/abs/2306.17529
TLDR: Most 6-DoF localization and SLAM systems use static landmarks but ignore dynamic objects because they cannot be usefully incorporated into a typical pipeline. Where dynamic objects have been incorporated, typical approaches have attempted relatively sophisticated identification and localization of these objects, limiting their robustness or general utility. In this research, we propose a middle ground, demonstrated in the context of autonomous vehicles, using dynamic vehicles to provide limited pose constraint information in a 6-DoF frame-by-frame Pn
Repo: None
Towards the extraction of robust sign embeddings for low resource sign language recognition
Authors: Mathieu De Coster, Ellen Rushe, Ruth Holmes, Anthony Ventresque, Joni Dambre
Arxiv: https://arxiv.org/abs/2306.17558
TLDR: Isolated Sign Language Recognition (SLR) has mostly been applied on relatively large datasets containing signs executed slowly and clearly by a limited group of signers. In real-world scenarios, however, we are met with challenging visual conditions, coarticulated signing, small datasets, and the need for signer independent models. To tackle this difficult problem, we require a robust feature extractor to process the sign language videos. One could expect human pose estimators to be ideal candidates
Repo: None
Control of Cross-Directional Systems with Approximate Symmetries
Authors: Idris Kempf, Paul Goulart, Stephen Duncan
Arxiv: https://arxiv.org/abs/2306.17565
TLDR: Structural symmetries of linear dynamical systems can be exploited for decoupling the dynamics and reducing the computational complexity of the controller implementation. However, in practical applications, inexact structural symmetries undermine the ability to decouple the system, resulting in the loss of any potential complexity reduction. To address this, we propose substituting an approximation with exact structural symmetries for the original system model, thereby introducing an approximation error. We focus on internal model controllers for cross-
Repo: None
Unscented Optimal Control for 3D Coverage Planning with an Autonomous UAV Agent
Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
Arxiv: https://arxiv.org/abs/2306.17588
TLDR: We propose a novel probabilistically robust controller for the guidance of an unmanned aerial vehicle (UAV) in coverage planning missions, which can simultaneously optimize both the UAV's motion and camera control inputs for the 3D coverage of a given object of interest. Specifically, the coverage planning problem is formulated in this work as an optimal control problem with logical constraints to enable a UAV agent to jointly: a) select a series of discrete camera field-of-view states which satisfy
Repo: None
Navigation of micro-robot swarms for targeted delivery using reinforcement learning
Authors: Akshatha Jagadish, Manoj Varma
Arxiv: https://arxiv.org/abs/2306.17598
TLDR: Micro robotics is quickly emerging as a promising technological solution for many medical treatments, with a focus on targeted drug delivery. Micro-robots are effective when working in swarms whose individual control is mostly infeasible owing to their minute size. Controlling a number of robots with a single controller is thus important, and artificial intelligence can help us perform this task successfully. In this work, we use the Reinforcement Learning (RL) algorithms Proximal Policy Optimization (PPO) and Robust Policy Optim
Repo: None
Impact of Noise on Calibration and Generalisation of Neural Networks
Authors: Martin Ferianc, Ondrej Bohdal, Timothy Hospedales, Miguel Rodrigues
Arxiv: https://arxiv.org/abs/2306.17630
TLDR: Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise such as label smoothing and MixUp have also been shown to improve calibration. Since noise can be added in various stages of the NN's training, it motivates the question of when and where the noise is the most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under
Repo: None
Masked Contrastive Graph Representation Learning for Age Estimation
Authors: Yuntao Shou, Xiangyong Cao, Deyu Meng
Arxiv: https://arxiv.org/abs/2306.17798
TLDR: Age estimation of face images is a crucial task with various practical applications in areas such as video surveillance and Internet access control. While deep learning-based age estimation frameworks, e.g., convolutional neural network (CNN), multi-layer perceptrons (MLP), and transformers have shown remarkable performance, they have limitations when modelling complex or irregular objects in an image that contains a large amount of redundant information. To address this issue, this paper utilizes the robustness property of graph
Repo: None
Keyword: semantic similarity
A Massive Scale Semantic Similarity Dataset of Historical English
Authors: Emily Silcock, Melissa Dell
Arxiv: https://arxiv.org/abs/2306.17810
TLDR: A diversity of tasks use language models trained on semantic similarity data. While there are a variety of datasets that capture semantic similarity, they are either constructed from modern web data or are relatively small datasets created in the past decade by human annotators. This study utilizes a novel source, newly digitized articles from off-copyright, local U.S. newspapers, to assemble a massive-scale semantic similarity dataset spanning 70 years from 1920 to 1989 and containing nearly 400M positive semantic similarity pairs
Repo: None
Keyword: summarization
SummQA at MEDIQA-Chat 2023: In-Context Learning with GPT-4 for Medical Summarization
Authors: Yash Mathur, Sanketh Rangreji, Raghav Kapoor, Medha Palavalli, Amanda Bertsch, Matthew R. Gormley
Arxiv: https://arxiv.org/abs/2306.17384
TLDR: Medical dialogue summarization is challenging due to the unstructured nature of medical conversations, the use of medical terminology in gold summaries, and the need to identify key information across multiple symptom sets. We present a novel system for the Dialogue2Note Medical Summarization tasks in the MEDIQA 2023 Shared Task. Our approach for section-wise summarization (Task A) is a two-stage process of selecting semantically similar dialogues and using the top-k similar dialog
Repo: None
Keyword: text generation
Empowering NLG: Offline Reinforcement Learning for Informal Summarization in Online Domains
Authors: Zhi-Xuan Tai, Po-Chuan Chen
Arxiv: https://arxiv.org/abs/2306.17174
TLDR: Our research introduces an innovative Natural Language Generation (NLG) approach that aims to optimize user experience and alleviate the workload of human customer support agents. Our primary objective is to generate informal summaries for online articles and posts using an offline reinforcement learning technique. In our study, we compare our proposed method with existing approaches to text generation and provide a comprehensive overview of our architectural design, which incorporates crawling, reinforcement learning, and text generation modules. By presenting this original approach, our paper makes a
Repo: None
High-throughput Simulation of Federated Learning via Resource-Aware Client Placement
Authors: Lorenzo Sani, Pedro Porto Buarque de Gusmão, Alex Iacob, Wanru Zhao, Xinchi Qiu, Yan Gao, Javier Fernandez-Marques, Nicholas Donald Lane
Arxiv: https://arxiv.org/abs/2306.17453
TLDR: Federated Learning (FL) is the privacy-preserving machine learning paradigm which collaboratively trains a model across millions of devices. Simulated environments are fundamental to large-scale FL research, allowing researchers to quickly test new ideas to solve system and statistical heterogeneity issues. This work proposes Pollen, a novel resource-aware system capable of speeding up FL simulations by efficiently placing clients across distributed and heterogeneous hardware. We propose minimising server-GPU communication and using
Repo: None