New submissions for Mon, 3 Jul 23 #387

Open
e-tornike opened this issue Jul 3, 2023 · 0 comments

Keyword: contrastive

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Authors: Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao
Arxiv: https://arxiv.org/abs/2306.17203
TLDR: The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production. However, previous methods in V2A have limited generation quality in terms of temporal synchronization and audio-visual relevance. We present Diff-Foley, a synchronous Video-To-Audio synthesis method with a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and Audio-visual
Repo: None

Masked Contrastive Graph Representation Learning for Age Estimation

Authors: Yuntao Shou, Xiangyong Cao, Deyu Meng
Arxiv: https://arxiv.org/abs/2306.17798
TLDR: Age estimation of face images is a crucial task with various practical applications in areas such as video surveillance and Internet access control. While deep learning-based age estimation frameworks, e.g., convolutional neural network (CNN), multi-layer perceptrons (MLP), and transformers have shown remarkable performance, they have limitations when modelling complex or irregular objects in an image that contains a large amount of redundant information. To address this issue, this paper utilizes the robustness property of graph
Repo: None

A Massive Scale Semantic Similarity Dataset of Historical English

Authors: Emily Silcock, Melissa Dell
Arxiv: https://arxiv.org/abs/2306.17810
TLDR: A diversity of tasks use language models trained on semantic similarity data. While there are a variety of datasets that capture semantic similarity, they are either constructed from modern web data or are relatively small datasets created in the past decade by human annotators. This study utilizes a novel source, newly digitized articles from off-copyright, local U.S. newspapers, to assemble a massive-scale semantic similarity dataset spanning 70 years from 1920 to 1989 and containing nearly 400M positive semantic similarity pairs
Repo: None

Keyword: data augmentation

EyeBAG: Accurate Control of Eye Blink and Gaze Based on Data Augmentation Leveraging Style Mixing

Authors: Bryan S. Kim, Jeong Young Jeong, Wonjong Ryu
Arxiv: https://arxiv.org/abs/2306.17391
TLDR: Recent developments in generative models have enabled the generation of photo-realistic human face images, and downstream tasks utilizing face generation technology have advanced accordingly. However, models for downstream tasks are yet substandard at eye control (e.g. eye blink, gaze redirection). To overcome such eye control problems, we introduce a novel framework consisting of two distinct modules: a blink control module and a gaze redirect module. We also propose a novel data augmentation method to train each module
Repo: None

DeepTagger: Knowledge Enhanced Named Entity Recognition for Web-Based Ads Queries

Authors: Simiao Zuo, Pengfei Tang, Xinyu Hu, Qiang Lou, Jian Jiao, Denis Charles
Arxiv: https://arxiv.org/abs/2306.17413
TLDR: Named entity recognition (NER) is a crucial task for online advertisement. State-of-the-art solutions leverage pre-trained language models for this task. However, three major challenges remain unresolved: web queries differ from natural language; web queries are short and lack contextual information; and labeled data for NER is scarce. We propose DeepTagger, a knowledge-enhanced NER model for web-based ads queries. The proposed knowledge enhancement framework leverages both model-
Repo: None

Impact of Noise on Calibration and Generalisation of Neural Networks

Authors: Martin Ferianc, Ondrej Bohdal, Timothy Hospedales, Miguel Rodrigues
Arxiv: https://arxiv.org/abs/2306.17630
TLDR: Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise such as label smoothing and MixUp have also been shown to improve calibration. Since noise can be added in various stages of the NN's training, it motivates the question of when and where the noise is the most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under
Repo: None
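Label smoothing and MixUp, the two noise types named in this abstract, are short enough to sketch. The example below is a minimal illustration and is not code from the paper. It makes three assumptions: targets are plain Python lists, label smoothing blends a one-hot target with the uniform distribution using a hypothetical `epsilon`, and MixUp draws its mixing weight from Beta(`alpha`, `alpha`) with an illustrative default for `alpha`.

```python
import random
from typing import List, Optional, Sequence, Tuple


def smooth_labels(label: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Label smoothing (illustrative): (1 - epsilon) * one_hot + epsilon * uniform."""
    off_value = epsilon / num_classes            # mass given to every class
    target = [off_value] * num_classes
    target[label] += 1.0 - epsilon               # remaining mass goes to the true class
    return target


def mixup(x1: Sequence[float], y1: Sequence[float],
          x2: Sequence[float], y2: Sequence[float],
          alpha: float = 0.2,
          rng: Optional[random.Random] = None) -> Tuple[List[float], List[float], float]:
    """MixUp (illustrative): convex combination of two inputs and their soft targets.

    The mixing weight lam is sampled from Beta(alpha, alpha); alpha is an assumed default.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    rng = random.Random(0)
    y_a = smooth_labels(0, num_classes=3)
    y_b = smooth_labels(2, num_classes=3)
    x_mix, y_mix, lam = mixup([0.0, 0.5], y_a, [1.0, 1.5], y_b, rng=rng)
    print(lam, x_mix, y_mix)
```

Both functions return soft targets that still sum to 1, which is why either can be plugged into a standard cross-entropy loss. Where in training such noise is applied is exactly the question the paper studies.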

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

Authors: Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz
Arxiv: https://arxiv.org/abs/2306.17848
TLDR: Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting properties with respect to early layer non-local feature dependence, as well as self-attention mechanisms which enhance learning flexibility, enabling them to
Repo: None

Keyword: knowledge discovery

GPT-FinRE: In-context Learning for Financial Relation Extraction using Large Language Models

Authors: Pawan Kumar Rajpoot, Ankur Parikh
Arxiv: https://arxiv.org/abs/2306.17519
TLDR: Relation extraction (RE) is a crucial task in natural language processing (NLP) that aims to identify and classify relationships between entities mentioned in text. In the financial domain, relation extraction plays a vital role in extracting valuable information from financial documents, such as news articles, earnings reports, and company filings. This paper describes our solution to relation extraction on one such dataset REFinD. The dataset was released along with a shared task as part of the Fourth Workshop on Knowledge Discovery from
Repo: None

Keyword: knowledge graph

RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care

Authors: Rakhilya Lee Mekhtieva, Brandon Forbes, Dalal Alrajeh, Brendan Delaney, Alessandra Russo
Arxiv: https://arxiv.org/abs/2306.17175
TLDR: Clinical decision-making is a fundamental stage in delivering appropriate care to patients. In recent years several decision-making systems designed to aid the clinician in this process have been developed. However, technical solutions currently in use are based on simple regression models and are only able to take into account simple pre-defined multiple-choice features, such as patient age, pre-existing conditions, smoker status, etc. One particular source of patient data, that available decision-making systems are incapable
Repo: None

Keyword: mixup

Impact of Noise on Calibration and Generalisation of Neural Networks

Authors: Martin Ferianc, Ondrej Bohdal, Timothy Hospedales, Miguel Rodrigues
Arxiv: https://arxiv.org/abs/2306.17630
TLDR: Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise such as label smoothing and MixUp have also been shown to improve calibration. Since noise can be added in various stages of the NN's training, it motivates the question of when and where the noise is the most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under
Repo: None

Keyword: multi-task

Progressive Multi-task Learning Framework for Chinese Text Error Correction

Authors: Shirong Ma, Yinghui Li, Haojing Huang, Shulin Huang, Yangning Li, Hai-Tao Zheng, Ying Shen
Arxiv: https://arxiv.org/abs/2306.17447
TLDR: Chinese Text Error Correction (CTEC) aims to detect and correct errors in the input text, which benefits people's daily lives and various downstream tasks. Recent approaches mainly employ Pre-trained Language Models (PLMs) to resolve the CTEC task and achieve tremendous success. However, previous approaches suffer from issues of over-correction and under-correction, and the former is especially conspicuous in the precision-critical CTEC tasks. To mitigate the issue of overcorrection, we propose a
Repo: None

FedBone: Towards Large-Scale Federated Multi-Task Learning

Authors: Yiqiang Chen, Teng Zhang, Xinlong Jiang, Qian Chen, Chenlong Gao, Wuliang Huang
Arxiv: https://arxiv.org/abs/2306.17465
TLDR: Heterogeneous federated multi-task learning (HFMTL) is a federated learning technique that combines heterogeneous tasks of different clients to achieve more accurate, comprehensive predictions. In real-world applications, visual and natural language tasks typically require large-scale models to extract high-level abstract features. However, large-volume models cannot be directly applied to existing federated multi-task learning methods. Existing HFMTL methods also disregard the impact of gradient conflicts on multi-source
Repo: None

Achieving RGB-D level Segmentation Performance from a Single ToF Camera

Authors: Pranav Sharma, Jigyasa Singh Katrolia, Jason Rambach, Bruno Mirbach, Didier Stricker, Juergen Seiler
Arxiv: https://arxiv.org/abs/2306.17636
TLDR: Depth is a very important modality in computer vision, typically used as complementary information to RGB, provided by RGB-D cameras. In this work, we show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera. In order to fuse the IR and depth modalities of the ToF camera, we introduce a method utilizing
Repo: None

Feature Representation Learning for NL2SQL Generation Based on Coupling and Decoupling

Authors: Chenduo Hao, Xu Zhang, Chuanbao Gao, Deyu Zhou
Arxiv: https://arxiv.org/abs/2306.17646
TLDR: The NL2SQL task involves parsing natural language statements into SQL queries. While most state-of-the-art methods treat NL2SQL as a slot-filling task and use feature representation learning techniques, they overlook explicit correlation features between the SELECT and WHERE clauses and implicit correlation features within sub-tasks within a single clause. To address this issue, we propose the Clause Feature Correlation Decoupling and Coupling (CFCDC) model, which uses a feature representation
Repo: None

Improved NL2SQL based on Multi-layer Expert Network

Authors: Chenduo Hao, Xu Zhang
Arxiv: https://arxiv.org/abs/2306.17727
TLDR: The Natural Language to SQL (NL2SQL) technique is used to convert natural language queries into executable SQL statements. Typically, slot-filling is employed as a classification method for multi-task cases to achieve this goal. However, slot-filling can result in inaccurate SQL statement generation due to negative transfer issues arising from different classification tasks. To overcome this limitation, this study introduces a new approach called Multi-Layer Expert Generate SQL (MLEG-SQL), which utilizes a dedicated
Repo: None

Keyword: robustness

Limits of Machine Learning for Automatic Vulnerability Detection

Authors: Niklas Risse, Marcel Böhme
Arxiv: https://arxiv.org/abs/2306.17193
TLDR: Recent results of machine learning for automatic vulnerability detection have been very promising indeed: Given only the source code of a function $f$, models trained by machine learning techniques can decide if $f$ contains a security flaw with up to 70% accuracy. But how do we know that these results are general and not specific to the datasets? To study this question, researchers proposed to amplify the testing set by injecting semantic preserving changes and found that the model's accuracy significantly drops. In other words,
Repo: None

Robust Roadside Perception for Autonomous Driving: an Annotation-free Strategy with Synthesized Data

Authors: Rusheng Zhang, Depu Meng, Lance Bassett, Shengyin Shen, Zhengxia Zou, Henry X. Liu
Arxiv: https://arxiv.org/abs/2306.17302
TLDR: Recently, with the rapid development in vehicle-to-infrastructure communication technologies, the infrastructure-based, roadside perception system for cooperative driving has become a rising field. This paper focuses on one of the most critical challenges: the data-insufficiency problem. The lack of high-quality labeled roadside sensor data with high diversity leads to low robustness and low transferability of current roadside perception systems. In this paper, a novel approach is proposed to address this problem by creating
Repo: None

Scaling Model Checking for DNN Analysis via State-Space Reduction and Input Segmentation (Extended Version)

Authors: Mahum Naseer, Osman Hasan, Muhammad Shafique
Arxiv: https://arxiv.org/abs/2306.17323
TLDR: Owing to their remarkable learning capabilities and performance in real-world applications, the use of machine learning systems based on Neural Networks (NNs) has been continuously increasing. However, various case studies and empirical findings in the literature suggest that slight variations to NN inputs can lead to erroneous and undesirable NN behavior. This has led to considerable interest in their formal analysis, aiming to provide guarantees regarding a given NN's behavior. Existing frameworks provide robustness and/or safety guarantees
Repo: None

Decentralized Motor Skill Learning for Complex Robotic Systems

Authors: Yanjiang Guo, Zheyuan Jiang, Yen-Jen Wang, Jingyue Gao, Jianyu Chen
Arxiv: https://arxiv.org/abs/2306.17411
TLDR: Reinforcement learning (RL) has achieved remarkable success in complex robotic systems (e.g., quadruped locomotion). In previous works, the RL-based controller was typically implemented as a single neural network with concatenated observation input. However, the corresponding learned policy is highly task-specific. Since all motors are controlled in a centralized way, out-of-distribution local observations can impact global motors through the single coupled neural network policy. In contrast, animals and humans can control
Repo: None

LIO-GVM: an Accurate, Tightly-Coupled Lidar-Inertial Odometry with Gaussian Voxel Map

Authors: Xingyu Ji, Shenghai Yuan, Pengyu Yin, Lihua Xie
Arxiv: https://arxiv.org/abs/2306.17436
TLDR: This letter presents an accurate and robust Lidar Inertial Odometry framework. We fuse LiDAR scans with IMU data using a tightly-coupled iterative error state Kalman filter for robust and fast localization. To achieve robust correspondence matching, we represent the points as a set of Gaussian distributions and evaluate the divergence in variance for outlier rejection. Based on the fitted distributions, a new residual metric, which demonstrates an improvement from merely quantifying distance to incorporating
Repo: None

Provable Robust Watermarking for AI-Generated Text

Authors: Xuandong Zhao, Prabhanjan Ananth, Lei Li, Yu-Xiang Wang
Arxiv: https://arxiv.org/abs/2306.17439
TLDR: As AI-generated text increasingly resembles human-written content, the ability to detect machine-generated content becomes crucial. To address this challenge, we present GPTWatermark, a robust and high-quality solution designed to ascertain whether a piece of text originates from a specific model. Our approach extends existing watermarking strategies and employs a fixed group design to enhance robustness against editing and paraphrasing attacks. We show that our watermarked language model enjoys strong provable guarantees on
Repo: None
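Green-list watermarks of the kind this abstract says it extends are commonly detected with a one-proportion z-test on how many tokens fall into a "green" subset of the vocabulary. The sketch below is a generic illustration, not the authors' GPTWatermark code. It assumes a single fixed green list for the whole text (in the spirit of the "fixed group design"), derived from a hypothetical secret `key` by hashing token ids; `fixed_green_list`, `green_z_score`, and the fraction `gamma` are illustrative names, and the paper's exact statistic and thresholds may differ.

```python
import hashlib
import math
from typing import Iterable, Set


def fixed_green_list(vocab_size: int, gamma: float, key: str) -> Set[int]:
    """Hypothetical helper: choose a fixed fraction gamma of token ids as 'green'.

    The partition depends only on the secret key, not on the preceding token.
    """
    ranked = sorted(range(vocab_size),
                    key=lambda tok: hashlib.sha256(f"{key}:{tok}".encode()).hexdigest())
    return set(ranked[: int(gamma * vocab_size)])


def green_z_score(tokens: Iterable[int], green: Set[int], gamma: float) -> float:
    """One-proportion z-test: how far the green-token count exceeds gamma * T."""
    tokens = list(tokens)
    t = len(tokens)
    if t == 0:
        raise ValueError("need at least one token to score")
    hits = sum(1 for tok in tokens if tok in green)
    return (hits - gamma * t) / math.sqrt(t * gamma * (1.0 - gamma))


if __name__ == "__main__":
    green = fixed_green_list(vocab_size=1000, gamma=0.5, key="secret")
    watermarked = sorted(green)[:50]   # toy "text" made only of green tokens
    print(green_z_score(watermarked, green, gamma=0.5))
```

Unwatermarked text should land near z = 0, while text generated with a bias toward green tokens produces a large positive score; the detection threshold is a design choice that trades false positives against missed detections.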

CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning

Authors: Yang Liu, Weixing Chen, Guanbin Li, Liang Lin
Arxiv: https://arxiv.org/abs/2306.17462
TLDR: We present CausalVLR (Causal Visual-Linguistic Reasoning), an open-source toolbox containing a rich set of state-of-the-art causal relation discovery and causal inference methods for various visual-linguistic reasoning tasks, such as VQA, image/video captioning, medical report generation, model generalization and robustness, etc. These methods have been included in the toolbox with PyTorch implementations on NVIDIA computing systems. It
Repo: None

Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization

Authors: Stephen Hausler, Sourav Garg, Punarjay Chakravarty, Shubham Shrivastava, Ankit Vora, Michael Milford
Arxiv: https://arxiv.org/abs/2306.17529
TLDR: Most 6-DoF localization and SLAM systems use static landmarks but ignore dynamic objects because they cannot be usefully incorporated into a typical pipeline. Where dynamic objects have been incorporated, typical approaches have attempted relatively sophisticated identification and localization of these objects, limiting their robustness or general utility. In this research, we propose a middle ground, demonstrated in the context of autonomous vehicles, using dynamic vehicles to provide limited pose constraint information in a 6-DoF frame-by-frame Pn
Repo: None

Towards the extraction of robust sign embeddings for low resource sign language recognition

Authors: Mathieu De Coster, Ellen Rushe, Ruth Holmes, Anthony Ventresque, Joni Dambre
Arxiv: https://arxiv.org/abs/2306.17558
TLDR: Isolated Sign Language Recognition (SLR) has mostly been applied on relatively large datasets containing signs executed slowly and clearly by a limited group of signers. In real-world scenarios, however, we are met with challenging visual conditions, coarticulated signing, small datasets, and the need for signer independent models. To tackle this difficult problem, we require a robust feature extractor to process the sign language videos. One could expect human pose estimators to be ideal candidates
Repo: None

Control of Cross-Directional Systems with Approximate Symmetries

Authors: Idris Kempf, Paul Goulart, Stephen Duncan
Arxiv: https://arxiv.org/abs/2306.17565
TLDR: Structural symmetries of linear dynamical systems can be exploited for decoupling the dynamics and reducing the computational complexity of the controller implementation. However, in practical applications, inexact structural symmetries undermine the ability to decouple the system, resulting in the loss of any potential complexity reduction. To address this, we propose substituting an approximation with exact structural symmetries for the original system model, thereby introducing an approximation error. We focus on internal model controllers for cross-
Repo: None

Unscented Optimal Control for 3D Coverage Planning with an Autonomous UAV Agent

Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
Arxiv: https://arxiv.org/abs/2306.17588
TLDR: We propose a novel probabilistically robust controller for the guidance of an unmanned aerial vehicle (UAV) in coverage planning missions, which can simultaneously optimize both the UAV's motion, and camera control inputs for the 3D coverage of a given object of interest. Specifically, the coverage planning problem is formulated in this work as an optimal control problem with logical constraints to enable a UAV agent to jointly: a) select a series of discrete camera field-of-view states which satisfy
Repo: None

Navigation of micro-robot swarms for targeted delivery using reinforcement learning

Authors: Akshatha Jagadish, Manoj Varma
Arxiv: https://arxiv.org/abs/2306.17598
TLDR: Micro robotics is quickly emerging as a promising technological solution for many medical treatments, with a focus on targeted drug delivery. Micro-robots are effective when working in swarms, whose individual control is mostly infeasible owing to their minute size. Controlling a number of robots with a single controller is thus important, and artificial intelligence can help us perform this task successfully. In this work, we use the Reinforcement Learning (RL) algorithms Proximal Policy Optimization (PPO) and Robust Policy Optim
Repo: None

Impact of Noise on Calibration and Generalisation of Neural Networks

Authors: Martin Ferianc, Ondrej Bohdal, Timothy Hospedales, Miguel Rodrigues
Arxiv: https://arxiv.org/abs/2306.17630
TLDR: Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise such as label smoothing and MixUp have also been shown to improve calibration. Since noise can be added in various stages of the NN's training, it motivates the question of when and where the noise is the most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under
Repo: None

Masked Contrastive Graph Representation Learning for Age Estimation

Authors: Yuntao Shou, Xiangyong Cao, Deyu Meng
Arxiv: https://arxiv.org/abs/2306.17798
TLDR: Age estimation of face images is a crucial task with various practical applications in areas such as video surveillance and Internet access control. While deep learning-based age estimation frameworks, e.g., convolutional neural network (CNN), multi-layer perceptrons (MLP), and transformers have shown remarkable performance, they have limitations when modelling complex or irregular objects in an image that contains a large amount of redundant information. To address this issue, this paper utilizes the robustness property of graph
Repo: None

Keyword: semantic similarity

A Massive Scale Semantic Similarity Dataset of Historical English

Authors: Emily Silcock, Melissa Dell
Arxiv: https://arxiv.org/abs/2306.17810
TLDR: A diversity of tasks use language models trained on semantic similarity data. While there are a variety of datasets that capture semantic similarity, they are either constructed from modern web data or are relatively small datasets created in the past decade by human annotators. This study utilizes a novel source, newly digitized articles from off-copyright, local U.S. newspapers, to assemble a massive-scale semantic similarity dataset spanning 70 years from 1920 to 1989 and containing nearly 400M positive semantic similarity pairs
Repo: None

Keyword: summarization

SummQA at MEDIQA-Chat 2023: In-Context Learning with GPT-4 for Medical Summarization

Authors: Yash Mathur, Sanketh Rangreji, Raghav Kapoor, Medha Palavalli, Amanda Bertsch, Matthew R. Gormley
Arxiv: https://arxiv.org/abs/2306.17384
TLDR: Medical dialogue summarization is challenging due to the unstructured nature of medical conversations, the use of medical terminology in gold summaries, and the need to identify key information across multiple symptom sets. We present a novel system for the Dialogue2Note Medical Summarization tasks in the MEDIQA 2023 Shared Task. Our approach for section-wise summarization (Task A) is a two-stage process of selecting semantically similar dialogues and using the top-k similar dialog
Repo: None

Keyword: text generation

Empowering NLG: Offline Reinforcement Learning for Informal Summarization in Online Domains

Authors: Zhi-Xuan Tai, Po-Chuan Chen
Arxiv: https://arxiv.org/abs/2306.17174
TLDR: Our research introduces an innovative Natural Language Generation (NLG) approach that aims to optimize user experience and alleviate the workload of human customer support agents. Our primary objective is to generate informal summaries for online articles and posts using an offline reinforcement learning technique. In our study, we compare our proposed method with existing approaches to text generation and provide a comprehensive overview of our architectural design, which incorporates crawling, reinforcement learning, and text generation modules. By presenting this original approach, our paper makes a
Repo: None

High-throughput Simulation of Federated Learning via Resource-Aware Client Placement

Authors: Lorenzo Sani, Pedro Porto Buarque de Gusmão, Alex Iacob, Wanru Zhao, Xinchi Qiu, Yan Gao, Javier Fernandez-Marques, Nicholas Donald Lane
Arxiv: https://arxiv.org/abs/2306.17453
TLDR: Federated Learning (FL) is the privacy-preserving machine learning paradigm which collaboratively trains a model across millions of devices. Simulated environments are fundamental to large-scale FL research, allowing researchers to quickly test new ideas to solve system and statistical heterogeneity issues. This work proposes Pollen, a novel resource-aware system capable of speeding up FL simulations by efficiently placing clients across distributed and heterogeneous hardware. We propose minimising server-GPU communication and using
Repo: None
@e-tornike e-tornike self-assigned this Jul 3, 2023