New submissions for Fri, 2 Jun 23 #367
Labels
abstract meaning representation
argument mining
citation context analysis
computational social science
contrastive
cross-language information retrieval
cross-lingual information retrieval
data augmentation
extreme multi-label
knowledge discovery
knowledge graph
legal text
legal
mixup
multi-task
paraphrase
passage generation
plagiarism
robustness
scholarly document processing
scholarly
semantic similarity
similarity measure
simplification
summarization
text generation
Keyword: abstract meaning representation
AMR4NLI: Interpretable and robust NLI measures from semantic graphs
Authors: Juri Opitz, Shira Wein, Julius Steen, Anette Frank, Nathan Schneider
Arxiv: https://arxiv.org/abs/2306.00936
TLDR: The task of natural language inference (NLI) asks whether a given premise (expressed in NL) entails a given NL hypothesis. NLI benchmarks contain human ratings of entailment, but the meaning relationships driving these ratings are not formalized. Can the underlying sentence pair relationships be made more explicit in an interpretable yet robust fashion? We compare semantic structures to represent premise and hypothesis, including sets of contextualized embeddings and semantic graphs (Abstract Meaning Representations), and measure whether
Repo: None
Keyword: contrastive
Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects
Authors: Stefan Thalhammer, Jean-Baptiste Weibel, Markus Vincze, Jose Garcia-Rodriguez
Arxiv: https://arxiv.org/abs/2306.00129
TLDR: Object pose estimation is important for object manipulation and scene understanding. In order to improve the general applicability of pose estimators, recent research focuses on providing estimates for novel objects, that is objects unseen during training. Such works use deep template matching strategies to retrieve the closest template connected to a query image. This template retrieval implicitly provides object class and pose. Despite the recent success and improvements of Vision Transformers over CNNs for many vision tasks, the state of the art uses CNN-based approaches
Repo: None
Contrastive Hierarchical Discourse Graph for Scientific Document Summarization
Authors: Haopeng Zhang, Xiao Liu, Jiawei Zhang
Arxiv: https://arxiv.org/abs/2306.00177
TLDR: The extended structural context has made scientific paper summarization a challenging task. This paper proposes CHANGES, a contrastive hierarchical graph neural network for extractive summarization of scientific papers. CHANGES represents a scientific paper with a hierarchical discourse graph and learns effective sentence representations with dedicated hierarchical graph information aggregation. We also propose a graph contrastive learning module to learn global theme-aware sentence representations. Extensive experiments on the PubMed and arXiv benchmark datasets prove the effectiveness of CHANGES and
Repo: None
CALICO: Self-Supervised Camera-LiDAR Contrastive Pre-training for BEV Perception
Authors: Jiachen Sun, Haizhong Zheng, Qingzhao Zhang, Atul Prakash, Z. Morley Mao, Chaowei Xiao
Arxiv: https://arxiv.org/abs/2306.00349
TLDR: Perception is crucial in the realm of autonomous driving systems, where bird's eye view (BEV)-based architectures have recently reached state-of-the-art performance. The desirability of self-supervised representation learning stems from the expensive and laborious process of annotating 2D and 3D data. Although previous research has investigated pretraining methods for both LiDAR and camera-based 3D object detection, a unified pretraining framework for multimodal BEV
Repo: None
Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning
Authors: Yuting Yang, Yuke Li, Binbin Du
Arxiv: https://arxiv.org/abs/2306.00755
TLDR: The unified streaming and non-streaming speech recognition model has achieved great success due to its comprehensive capabilities. In this paper, we propose to improve the accuracy of the unified model by bridging the inherent representation gap between the streaming and non-streaming modes with a contrastive objective. Specifically, the top-layer hidden representations at the same frame in the streaming and non-streaming modes are regarded as a positive pair, encouraging the representation of the streaming mode to be close to its non
Repo: None
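The positive-pair idea above can be illustrated with a generic InfoNCE loss. In this minimal sketch, which is an assumption and not the paper's exact objective, the streaming and non-streaming representations of the same frame form the positive pair and the non-streaming representations of the other frames serve as negatives; the function names, the cosine similarity, and the temperature of 0.1 are illustrative choices.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def frame_infonce(streaming: List[List[float]],
                  non_streaming: List[List[float]],
                  temperature: float = 0.1) -> float:
    """Average InfoNCE loss over frames (hypothetical setup).

    For frame t, (streaming[t], non_streaming[t]) is the positive pair and
    non_streaming[k] for k != t are negatives.
    """
    assert len(streaming) == len(non_streaming) and streaming
    total = 0.0
    for t, anchor in enumerate(streaming):
        logits = [cosine(anchor, other) / temperature for other in non_streaming]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[t]  # -log softmax probability of the positive
    return total / len(streaming)
```

When the two modes produce aligned frame representations the loss approaches zero, while mismatched frames are penalized; minimizing it therefore pulls each streaming frame toward its non-streaming counterpart.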
Topic-Guided Sampling For Data-Efficient Multi-Domain Stance Detection
Authors: Erik Arakelyan, Arnav Arora, Isabelle Augenstein
Arxiv: https://arxiv.org/abs/2306.00765
TLDR: Stance Detection is concerned with identifying the attitudes expressed by an author towards a target of interest. This task spans a variety of domains ranging from social media opinion identification to detecting the stance for a legal claim. However, the framing of the task varies within these domains, in terms of the data collection protocol, the label dictionary and the number of available annotations. Furthermore, these stance annotations are significantly imbalanced on a per-topic and inter-topic basis. These make multi-domain stance
Repo: None
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation
Authors: Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar
Arxiv: https://arxiv.org/abs/2306.00788
TLDR: Good data augmentation is one of the key factors that lead to the empirical success of self-supervised representation learning such as contrastive learning and masked language modeling, yet theoretical understanding of its role in learning good representations remains limited. Recent work has built the connection between self-supervised learning and approximating the top eigenspace of a graph Laplacian operator. Learning a linear probe on top of such features can naturally be connected to RKHS regression. In this
Repo: None
UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning
Authors: Xiao Dong, Runhui Huang, Xiaoyong Wei, Zequn Jie, Jianxing Yu, Jian Yin, Xiaodan Liang
Arxiv: https://arxiv.org/abs/2306.00813
TLDR: Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e.g., image-text semantic alignment) and image synthesis. On the other hand, fine-tuning pre-trained models with discriminative or generative capabilities such as CLIP and Stable Diffusion on domain-specific datasets has been shown to be effective in various tasks by adapting to specific domains. However, few studies have explored the possibility of learning both discrim
Repo: None
Domain Generalization for Domain-Linked Classes
Authors: Kimathi Kaai, Saad Hossain, Sirisha Rambhatla
Arxiv: https://arxiv.org/abs/2306.00879
TLDR: Domain generalization (DG) focuses on transferring domain-invariant knowledge from multiple source domains (available at train time) to an, a priori, unseen target domain(s). This requires a class to be expressed in multiple domains for the learning algorithm to break the spurious correlations between domain and class. However, in the real world, classes may often be domain-linked, i.e. expressed only in a specific domain, which leads to extremely poor generalization
Repo: None
LIV: Language-Image Representations and Rewards for Robotic Control
Authors: Yecheng Jason Ma, William Liang, Vaidehi Som, Vikash Kumar, Amy Zhang, Osbert Bastani, Dinesh Jayaraman
Arxiv: https://arxiv.org/abs/2306.00958
TLDR: We present Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations. Exploiting a novel connection between dual reinforcement learning and mutual information contrastive learning, the LIV objective trains a multi-modal representation that implicitly encodes a universal value function for tasks specified as language or image goals. We use LIV to pre-train the first control-centric vision-language representation from large human video
Repo: None
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Authors: Yonglong Tian, Lijie Fan, Phillip Isola, Huiwen Chang, Dilip Krishnan
Arxiv: https://arxiv.org/abs/2306.00984
TLDR: We investigate the potential of learning visual representations using synthetic images generated by text-to-image models. This is a natural question in light of the excellent performance of such models in generating high-quality images. We specifically consider Stable Diffusion, one of the leading open-source text-to-image models. We show that (1) when the generative model is configured with a proper classifier-free guidance scale, training self-supervised methods on synthetic images can match
Repo: None
Keyword: data augmentation
Building Manufacturing Deep Learning Models with Minimal and Imbalanced Training Data Using Domain Adaptation and Data Augmentation
Authors: Adrian Shuai Li, Elisa Bertino, Rih-Teng Wu, Ting-Yan Wu
Arxiv: https://arxiv.org/abs/2306.00202
TLDR: Deep learning (DL) techniques are highly effective for defect detection from images. Training DL classification models, however, requires vast amounts of labeled data, which is often expensive to collect. In many cases, the available training data is not only limited but may also be imbalanced. In this paper, we propose a novel domain adaptation (DA) approach to address the problem of labeled training data scarcity for a target learning task by transferring knowledge gained from an existing source dataset used for a similar learning task.
Repo: None
AfriNames: Most ASR models "butcher" African Names
Authors: Tobi Olatunji, Tejumade Afonja, Bonaventure F. P. Dossou, Atnafu Lambebo Tonja, Chris Chinenye Emezue, Amina Mardiyyah Rufai, Sahib Singh
Arxiv: https://arxiv.org/abs/2306.00253
TLDR: Useful conversational agents must accurately capture named entities to minimize error for downstream tasks, for example, asking a voice assistant to play a track from a certain artist, initiating navigation to a specific location, or documenting a laboratory result for a patient. However, where named entities such as "Ukachukwu" (Igbo), "Lakicia" (Swahili), or "Ingabire" (Rwandan) are spoken, automatic speech recognition (ASR
Repo: None
Provable Benefit of Mixup for Finding Optimal Decision Boundaries
Authors: Junsoo Oh, Chulee Yun
Arxiv: https://arxiv.org/abs/2306.00267
TLDR: We investigate how pair-wise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries in a binary linear classification problem. For a family of data distributions with a separability constant
Repo: None
CAISA at SemEval-2023 Task 8: Counterfactual Data Augmentation for Mitigating Class Imbalance in Causal Claim Identification
Authors: Akbar Karimi, Lucie Flek
Arxiv: https://arxiv.org/abs/2306.00346
TLDR: The class imbalance problem can cause machine learning models to perform poorly on the minority class as well as on the whole dataset. Using data augmentation techniques to increase the number of samples is one way to tackle this problem. We introduce a novel counterfactual data augmentation method for the identification of medical claims. In addition, we investigate the impact of this method and compare it with 3 other data augmentation techniques, showing that the proposed
Repo: None
A Novel Driver Distraction Behavior Detection Based on Self-Supervised Learning Framework with Masked Image Modeling
Authors: Yingzhi Zhang, Taiguo Li, Chao Li, Xinghong Zhou
Arxiv: https://arxiv.org/abs/2306.00543
TLDR: Driver distraction causes a significant number of traffic accidents every year, resulting in economic losses and casualties. Currently, the level of automation in commercial vehicles is far from completely unmanned, and drivers still play an important role in operating and controlling the vehicle. Therefore, driver distraction behavior detection is crucial for road safety. At present, driver distraction detection primarily relies on traditional Convolutional Neural Networks (CNN) and supervised learning methods. However, there are still challenges such as the high cost of labeled datasets
Repo: None
A Uniform Confidence Phenomenon in Deep Learning and its Implications for Calibration
Authors: Muthu Chidambaram, Rong Ge
Arxiv: https://arxiv.org/abs/2306.00740
TLDR: Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to poorly estimate their predictive uncertainty - in other words, they are frequently overconfident when they are wrong. Fixing this issue is known as model calibration, and has consequently received much attention in the form of modified training schemes and post-training calibration procedures. In this work, we present a significant hurdle to the calibration of modern models: deep neural nets have large neighborhoods of almost certain confidence around their training
Repo: None
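Overconfidence of the kind described above is commonly quantified with the expected calibration error (ECE), which bins predictions by confidence and averages the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below is a generic equal-width-bin ECE, not a metric taken from this paper; the default of 10 bins is an assumption.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE: sum over bins of (|B|/N) * |acc(B) - conf(B)|."""
    n = len(confidences)
    assert n == len(correct) and n > 0
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # a confidence of exactly 1.0 is placed in the last bin
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        acc = sum(1 for _, ok in b if ok) / len(b)
        conf = sum(c for c, _ in b) / len(b)
        ece += len(b) / n * abs(acc - conf)
    return ece
```

For example, a model that reports 0.95 confidence on four predictions but gets only two right has an ECE of 0.45, the kind of gap between confidence and accuracy that calibration methods try to close.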
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation
Authors: Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar
Arxiv: https://arxiv.org/abs/2306.00788
TLDR: Good data augmentation is one of the key factors that lead to the empirical success of self-supervised representation learning such as contrastive learning and masked language modeling, yet theoretical understanding of its role in learning good representations remains limited. Recent work has built the connection between self-supervised learning and approximating the top eigenspace of a graph Laplacian operator. Learning a linear probe on top of such features can naturally be connected to RKHS regression. In this
Repo: None
Geo-Tiles for Semantic Segmentation of Earth Observation Imagery
Authors: Sebastian Bullinger, Florian Fevers, Christoph Bodensteiner, Michael Arens
Arxiv: https://arxiv.org/abs/2306.00823
TLDR: To cope with the high requirements during the computation of semantic segmentations of earth observation imagery, current state-of-the-art pipelines divide the corresponding data into smaller images. Existing methods and benchmark datasets oftentimes rely on pixel-based tiling schemes or on geo-tiling schemes employed by web mapping applications. The selection of the subimages (comprising size, location and orientation) is crucial since it affects the available context information of each pixel, defines the number of tiles
Repo: None
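The geo-tiling schemes of web mapping applications mentioned above are typically the XYZ ("slippy map") tiles of the Web Mercator projection. For reference, the sketch below implements the standard OpenStreetMap-style formula that maps a latitude/longitude to a tile index at zoom level z; it illustrates the existing web-map scheme the abstract refers to, not the Geo-Tiles proposed in the paper.

```python
import math
from typing import Tuple


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) XYZ tile index containing a WGS84 point (Web Mercator)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # clamp into the valid range (points on the edge or beyond ~85.05 deg latitude)
    return max(0, min(x, n - 1)), max(0, min(y, n - 1))
```

Each zoom level doubles the number of tiles per axis, so a tile's ground footprint (and hence the context available to each pixel) is fixed by the zoom level rather than by the imagery.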
ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER
Authors: Sreyan Ghosh, Utkarsh Tyagi, Manan Suri, Sonal Kumar, S Ramaneswaran, Dinesh Manocha
Arxiv: https://arxiv.org/abs/2306.00928
TLDR: Complex Named Entity Recognition (NER) is the task of detecting linguistically complex named entities in low-context text. In this paper, we present ACLM (Attention-map aware keyword selection for Conditional Language Model fine-tuning), a novel data augmentation approach based on conditional generation to address the data scarcity problem in low-resource complex NER. ACLM alleviates the context-entity mismatch issue, a problem existing NER data augmentation techniques suffer from and often
Repo: None
Keyword: knowledge graph
Column Type Annotation using ChatGPT
Authors: Keti Korini, Christian Bizer
Arxiv: https://arxiv.org/abs/2306.00745
TLDR: Column type annotation is the task of annotating the columns of a relational table with the semantic type of the values contained in each column. Column type annotation is a crucial pre-processing step for data search and integration in the context of data lakes. State-of-the-art column type annotation methods either rely on matching table columns to properties of a knowledge graph or fine-tune pre-trained language models such as BERT. In this work, we take a different approach
Repo: None
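One way to cast column type annotation as a zero-shot prompting task is to serialize a sample of a column's cell values together with the candidate semantic types. The sketch below only builds such a prompt string; the template wording, the function name, and the example type labels are hypothetical and are not the prompts evaluated in the paper, and no ChatGPT API call is made.

```python
from typing import Sequence


def build_cta_prompt(values: Sequence[str], candidate_types: Sequence[str],
                     max_values: int = 5) -> str:
    """Serialize one column and a label set into a hypothetical zero-shot prompt."""
    sample = ", ".join(values[:max_values])  # truncate long columns to a few cells
    labels = ", ".join(candidate_types)
    return (
        "Classify the column into one of the following semantic types: "
        f"{labels}.\n"
        f"Column values: {sample}\n"
        "Answer with a single type."
    )


print(build_cta_prompt(["Berlin", "Paris", "Rome"], ["city", "country"]))
```

Restricting the answer to a fixed label set keeps the model's output easy to map back onto the annotation vocabulary.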
Keyword: legal
Datasets for Portuguese Legal Semantic Textual Similarity: Comparing weak supervision and an annotation process approaches
Authors: Daniel da Silva Junior, Paulo Roberto dos S. Corval, Aline Paes, Daniel de Oliveira
Arxiv: https://arxiv.org/abs/2306.00007
TLDR: The Brazilian judiciary has a large workload, resulting in long times to finish legal proceedings. In Resolution 469/2022, the Brazilian National Council of Justice established formal guidance for document and process digitalization, opening up the possibility of using automatic techniques to help with everyday tasks in the legal field, particularly given the large number of texts produced in routine legal procedures. Notably, Artificial Intelligence (AI) techniques allow for processing and extracting useful information from textual data, potentially speeding up the process
Repo: None
AI Imagery and the Overton Window
Authors: Sarah K. Amer
Arxiv: https://arxiv.org/abs/2306.00080
TLDR: AI-based text-to-image generation has undergone a significant leap in the production of visually comprehensive and aesthetic imagery over the past year, to the point where differentiating between a man-made piece of art and an AI-generated image is becoming more difficult. Generative Models such as Stable Diffusion, Midjourney and others are expected to affect several major industries in technological and ethical aspects. Striking the balance between raising human standard of life and work vs exploiting one group
Repo: None
Graph Colouring is Hard for Algorithms Based on Hilbert's Nullstellensatz and Gröbner Bases
Authors: Massimo Lauria, Jakob Nordström
Arxiv: https://arxiv.org/abs/2306.00125
TLDR: We consider the graph
Repo: None
Sustainable AI Regulation
Authors: Philipp Hacker
Arxiv: https://arxiv.org/abs/2306.00292
TLDR: This paper suggests that AI regulation needs a shift from trustworthiness to sustainability. With the carbon footprint of large generative AI models like ChatGPT or GPT-4 adding urgency to this goal, the paper develops a roadmap to make AI, and technology more broadly, environmentally sustainable. It explores two key dimensions: legal instruments to make AI greener, and methods to render AI regulation more sustainable. Concerning the former, transparency mechanisms, such as the disclosure of the GHG footprint
Repo: None
Towards Argument-Aware Abstractive Summarization of Long Legal Opinions with Summary Reranking
Authors: Mohamed Elaraby, Yang Zhong, Diane Litman
Arxiv: https://arxiv.org/abs/2306.00672
TLDR: We propose a simple approach for the abstractive summarization of long legal opinions that considers the argument structure of the document. Legal opinions often contain complex and nuanced argumentation, making it challenging to generate a concise summary that accurately captures the main points of the legal opinion. Our approach involves using argument role information to generate multiple candidate summaries, then reranking these candidates based on alignment with the document's argument structure. We demonstrate the effectiveness of our approach on a dataset of very large legal opinions
Repo: None
Topic-Guided Sampling For Data-Efficient Multi-Domain Stance Detection
Authors: Erik Arakelyan, Arnav Arora, Isabelle Augenstein
Arxiv: https://arxiv.org/abs/2306.00765
TLDR: Stance Detection is concerned with identifying the attitudes expressed by an author towards a target of interest. This task spans a variety of domains ranging from social media opinion identification to detecting the stance for a legal claim. However, the framing of the task varies within these domains, in terms of the data collection protocol, the label dictionary and the number of available annotations. Furthermore, these stance annotations are significantly imbalanced on a per-topic and inter-topic basis. These make multi-domain stance
Repo: None
Keyword: mixup
Provable Benefit of Mixup for Finding Optimal Decision Boundaries
Authors: Junsoo Oh, Chulee Yun
Arxiv: https://arxiv.org/abs/2306.00267
TLDR: We investigate how pair-wise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries in a binary linear classification problem. For a family of data distributions with a separability constant
Repo: None
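Mixup forms a new training example as a convex combination of two examples and of their labels, with the mixing weight λ drawn from a Beta(α, α) distribution. The minimal sketch below shows this standard construction on plain Python lists; α = 0.2 is an illustrative default, the optional `lam` argument exists only to make the example deterministic, and the paper's sample-complexity analysis is not reproduced here.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup_pair(x_i: Sequence[float], y_i: float,
               x_j: Sequence[float], y_j: float,
               alpha: float = 0.2,
               lam: Optional[float] = None) -> Tuple[List[float], float]:
    """Return (lam * x_i + (1 - lam) * x_j, lam * y_i + (1 - lam) * y_j)."""
    if lam is None:
        lam = random.betavariate(alpha, alpha)  # lam ~ Beta(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = lam * y_i + (1.0 - lam) * y_j
    return x_mix, y_mix


# Mixing a class-1 point with a class-0 point at lam = 0.25
print(mixup_pair([1.0, 0.0], 1.0, [0.0, 1.0], 0.0, lam=0.25))
```

Because both the input and the label are interpolated, mixed points lying between the two classes carry soft labels, which is what shapes the learned decision boundary in a binary linear problem.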
A Uniform Confidence Phenomenon in Deep Learning and its Implications for Calibration
Authors: Muthu Chidambaram, Rong Ge
Arxiv: https://arxiv.org/abs/2306.00740
TLDR: Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to poorly estimate their predictive uncertainty - in other words, they are frequently overconfident when they are wrong. Fixing this issue is known as model calibration, and has consequently received much attention in the form of modified training schemes and post-training calibration procedures. In this work, we present a significant hurdle to the calibration of modern models: deep neural nets have large neighborhoods of almost certain confidence around their training
Repo: None
Keyword: multi-task
Addressing Negative Transfer in Diffusion Models
Authors: Hyojun Go, JinYoung Kim, Yunsung Lee, Seunghyun Lee, Shinhyeok Oh, Hyeongdon Moon, Seungtaek Choi
Arxiv: https://arxiv.org/abs/2306.00354
TLDR: Diffusion-based generative models have achieved remarkable success in various domains. Such a model is trained on denoising tasks that encompass different noise levels simultaneously, representing a form of multi-task learning (MTL). However, analyzing and improving diffusion models from an MTL perspective remains under-explored. In particular, MTL can sometimes lead to the well-known phenomenon of
Repo: None
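The multi-task view follows from the standard DDPM training setup: each step samples a noise level t and trains the denoiser on x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε with ε ~ N(0, I), so every t acts as a separate denoising task. The sketch below assumes the common linear β schedule (1e-4 to 0.02 over 1000 steps) and illustrates only this generic forward process, not the paper's remedy for negative transfer.

```python
import math
import random
from typing import List, Sequence


def linear_alpha_bars(T: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noisy_sample(x0: Sequence[float], t: int, alpha_bars: Sequence[float],
                 rng: random.Random) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(a_t) * x_0, (1 - a_t) * I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


# One training example: sample a noise level (a "task") uniformly, then noise x0.
rng = random.Random(0)
alpha_bars = linear_alpha_bars()
t = rng.randrange(len(alpha_bars))
x_t = noisy_sample([1.0, -1.0], t, alpha_bars, rng)
```

Since ᾱ_t decreases monotonically, small t yields nearly clean inputs and large t nearly pure noise; these very different sub-tasks share one set of weights, which is where interference between tasks can arise.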
Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home
Authors: Eda Okur, Roddy Fuentes Alba, Saurav Sahay, Lama Nachman
Arxiv: https://arxiv.org/abs/2306.00482
TLDR: Enriching the quality of early childhood education with interactive math learning at home systems, empowered by recent advances in conversational AI technologies, is slowly becoming a reality. With this motivation, we implement a multimodal dialogue system to support play-based learning experiences at home, guiding kids to master basic math concepts. This work explores the Spoken Language Understanding (SLU) pipeline within a task-oriented dialogue system developed for Kid Space, with cascading Automatic Speech Recognition (ASR
Repo: None
RHFedMTL: Resource-Aware Hierarchical Federated Multi-Task Learning
Authors: Xingfu Yi, Rongpeng Li, Chenghui Peng, Fei Wang, Jianjun Wu, Zhifeng Zhao
Arxiv: https://arxiv.org/abs/2306.00675
TLDR: The rapid development of artificial intelligence (AI) over massive applications, including the Internet of Things on cellular networks, raises technical challenges such as privacy, heterogeneity and resource efficiency. Federated learning is an effective way to enable AI over massive distributed nodes with security. However, conventional works mostly focus on learning a single global model for a unique task across the network, and are generally less competent to handle multi-task learning (MTL) scenarios with stragglers at the expense of
Repo: None
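As background for the "single global model" setting criticized above, the classic federated baseline aggregates client models by averaging their parameters weighted by local dataset size (FedAvg). The sketch below shows only that generic aggregation step on flattened parameter vectors; it is not RHFedMTL's resource-aware hierarchical multi-task scheme.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of flattened client parameter vectors (FedAvg aggregation)."""
    assert client_weights and len(client_weights) == len(client_sizes)
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n_k in zip(client_weights, client_sizes):
        assert len(w) == dim  # all clients must share one model architecture
        for i in range(dim):
            global_w[i] += (n_k / total) * w[i]
    return global_w
```

Because every client is folded into one shared parameter vector, this scheme implicitly assumes a single common task, which is the limitation multi-task federated methods aim to lift.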
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
Authors: Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu
Arxiv: https://arxiv.org/abs/2306.00923
TLDR: Developing embodied agents in simulation has been a key research topic in recent years. Exciting new tasks, algorithms, and benchmarks have been developed in various simulators. However, most of them assume deaf agents in silent environments, while we humans perceive the world with multiple senses. We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear. Sonicverse models realistic continuous audio rendering in 3D environments in real
Repo: None
Keyword: paraphrase
BiSync: A Bilingual Editor for Synchronized Monolingual Texts
Authors: Josep Crego, Jitao Xu, François Yvon
Arxiv: https://arxiv.org/abs/2306.00400
TLDR: In our globalized world, a growing number of situations arise where people are required to communicate in one or several foreign languages. In the case of written communication, users with a good command of a foreign language may find assistance from computer-aided translation (CAT) technologies. These technologies often allow users to access external resources, such as dictionaries, terminologies or bilingual concordancers, thereby interrupting and considerably hindering the writing process. In addition, CAT systems assume that
Repo: None
Uncertainty-Aware Unlikelihood Learning Improves Generative Aspect Sentiment Quad Prediction
Authors: Mengting Hu, Yinhao Bai, Yike Wu, Zhen Zhang, Liqi Zhang, Hang Gao, Shiwan Zhao, Minlie Huang
Arxiv: https://arxiv.org/abs/2306.00418
TLDR: Recently, aspect sentiment quad prediction has received widespread attention in the field of aspect-based sentiment analysis. Existing studies extract quadruplets via pre-trained generative language models to paraphrase the original sentence into a templated target sequence. However, previous works only focus on what to generate but ignore what not to generate. We argue that considering the negative samples also leads to potential benefits. In this work, we propose a template-agnostic method to control the token-level generation
Repo: None
Keyword: robustness
Explainability in Simplicial Map Neural Networks
Authors: Eduardo Paluzo-Hidalgo, Miguel A. Gutiérrez-Naranjo, Rocio Gonzalez-Diaz
Arxiv: https://arxiv.org/abs/2306.00010
TLDR: Simplicial map neural networks (SMNNs) are topology-based neural networks with interesting properties such as universal approximation capability and robustness to adversarial examples under appropriate conditions. However, SMNNs present some bottlenecks for their possible application in high dimensions. First, no SMNN training process has been defined so far. Second, SMNNs require the construction of a convex polytope surrounding the input dataset. In this paper, we propose a SM
Repo: None
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Authors: Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Yike Guo, Jie Fu
Arxiv: https://arxiv.org/abs/2306.00107
TLDR: Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is primarily due to the distinctive challenges associated with modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic music understanding model
Repo: None
Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis
Authors: Angtian Wang, Wufei Ma, Alan Yuille, Adam Kortylewski
Arxiv: https://arxiv.org/abs/2306.00118
TLDR: Human vision demonstrates higher robustness than current AI algorithms under out-of-distribution scenarios. It has been conjectured that such robustness benefits from performing analysis-by-synthesis. Our paper formulates three vision tasks in a consistent manner using 3D modeling and 3D rendering, performing analysis-by-synthesis with render-and-compare algorithms on neural features. In this work, we introduce Neural Textured Deformable Meshes
Repo: None
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
Authors: Wei Xiao, Tsun-Hsuan Wang, Chuang Gan, Daniela Rus
Arxiv: https://arxiv.org/abs/2306.00148
TLDR: Diffusion model-based approaches have shown promise in data-driven planning, but they offer no safety guarantees, making them hard to apply in safety-critical applications. To address these challenges, we propose a new method, called SafeDiffuser, to ensure diffusion probabilistic models satisfy specifications by using a class of control barrier functions. The key idea of our approach is to embed the proposed finite-time diffusion invariance into the denoising diffusion procedure, which enables trustworthy
Repo: None
Measuring the Robustness of Natural Language Processing Models to Domain Shifts
Authors: Nitay Calderon, Naveh Porat, Eyal Ben-David, Zorik Gekhman, Nadav Oved, Roi Reichart
Arxiv: https://arxiv.org/abs/2306.00168
TLDR: Large Language Models have shown promising performance on various tasks, including fine-tuning, few-shot learning, and zero-shot learning. However, their performance on domains without labeled data still lags behind their performance on domains with labeled data, which we refer to as the Domain Robustness (DR) challenge. Existing research on DR suffers from disparate setups, lack of evaluation task variety, and reliance on challenge sets. In this paper, we explore the DR challenge of both fine-tun
Repo: None
Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees
Authors: Pengfei Li, Jianyi Yang, Shaolei Ren
Arxiv: https://arxiv.org/abs/2306.00172
TLDR: Many problems, such as online ad display, can be formulated as online bipartite matching. The crucial challenge lies in the nature of sequentially-revealed online item information, based on which we make irreversible matching decisions at each step. While numerous expert online algorithms have been proposed with bounded worst-case competitive ratios, they may not offer satisfactory performance in average cases. On the other hand, reinforcement learning (RL) has been applied to improve the average performance, but it lacks robust
Repo: None
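To make the sequential, irreversible decisions above concrete, the sketch below implements a simple greedy baseline for edge-weighted online bipartite matching: each arriving online item is matched to the still-free offline node with the highest positive edge weight, or left unmatched. This greedy rule is an illustrative baseline chosen here, not one of the expert algorithms or the learning-augmented method from the paper.

```python
from typing import Dict, List, Optional, Tuple


def greedy_online_matching(
        arrivals: List[Dict[str, float]]) -> Tuple[float, List[Optional[str]]]:
    """Match each arriving item, in order, to the best free offline node.

    arrivals[k] maps offline node ids to edge weights for the k-th online item.
    Decisions are irreversible: once matched, an offline node stays taken.
    """
    taken = set()
    total = 0.0
    decisions: List[Optional[str]] = []
    for edges in arrivals:
        free = {u: w for u, w in edges.items() if u not in taken and w > 0}
        if not free:
            decisions.append(None)  # no available neighbor: item stays unmatched
            continue
        best = max(free, key=lambda u: free[u])
        taken.add(best)
        total += free[best]
        decisions.append(best)
    return total, decisions


print(greedy_online_matching([{"a": 1.0, "b": 0.5}, {"a": 3.0}]))
```

Here greedy earns 1.0, while the offline optimum (matching the first item to "b" and the second to "a") earns 3.5, which illustrates why both worst-case guarantees and average-case performance matter in this setting.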
Accelerated Fingerprint Enhancement: A GPU-Optimized Mixed Architecture Approach
Authors: André Brasil Vieira Wyzykowski, Anil K. Jain
Arxiv: https://arxiv.org/abs/2306.00272
TLDR: This document presents a preliminary approach to latent fingerprint enhancement, fundamentally designed around a mixed Unet architecture. It combines the capabilities of the Resnet-101 network and Unet encoder, aiming to form a potentially powerful composite. This combination, enhanced with attention mechanisms and forward skip connections, is intended to optimize the enhancement of ridge and minutiae features in fingerprints. One innovative element of this approach includes a novel Fingerprint Enhancement Gabor layer, specifically designed for GPU computations. This
Repo: None
Efficient Deep Learning of Robust Policies from MPC using Imitation and Tube-Guided Data Augmentation
Authors: Andrea Tagliabue, Jonathan P. How
Arxiv: https://arxiv.org/abs/2306.00286
TLDR: Imitation Learning (IL) has been increasingly employed to generate computationally efficient policies from task-relevant demonstrations provided by Model Predictive Control (MPC). However, commonly employed IL methods are often data- and computationally-inefficient, as they require a large number of MPC demonstrations, resulting in long training times, and they produce policies with limited robustness to disturbances not experienced during training. In this work, we propose an IL strategy to efficiently compress a computationally expensive
Repo: None
CALICO: Self-Supervised Camera-LiDAR Contrastive Pre-training for BEV Perception
Authors: Jiachen Sun, Haizhong Zheng, Qingzhao Zhang, Atul Prakash, Z. Morley Mao, Chaowei Xiao
Arxiv: https://arxiv.org/abs/2306.00349
TLDR: Perception is crucial in the realm of autonomous driving systems, where bird's eye view (BEV)-based architectures have recently reached state-of-the-art performance. The desirability of self-supervised representation learning stems from the expensive and laborious process of annotating 2D and 3D data. Although previous research has investigated pretraining methods for both LiDAR and camera-based 3D object detection, a unified pretraining framework for multimodal BEV
Repo: None
The Survey, Taxonomy, and Future Directions of Trustworthy AI: A Meta Decision of Strategic Decisions
Authors: Caesar Wu, Yuan-Fang Lib, Pascal Bouvry
Arxiv: https://arxiv.org/abs/2306.00380
TLDR: When making strategic decisions, we are often confronted with overwhelming information to process. The situation can be further complicated when some pieces of evidence contradict each other or are paradoxical. The challenge then becomes how to determine which information is useful and which should be eliminated. This process is known as meta-decision. Likewise, when it comes to using Artificial Intelligence (AI) systems for strategic decision-making, placing trust in the AI itself becomes a meta-decision, given that many
Repo: None
Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking
Authors: Qingyue Wang, Liang Ding, Yanan Cao, Yibing Zhan, Zheng Lin, Shi Wang, Dacheng Tao, Li Guo
Arxiv: https://arxiv.org/abs/2306.00434
TLDR: Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data. Existing works mainly study common data- or model-level augmentation methods to enhance the generalization but fail to effectively decouple the semantics of samples, limiting the zero-shot performance of DST. In this paper, we present a simple and effective "divide, conquer and combine" solution, which explicitly
Repo: None
FMapping: Factorized Efficient Neural Field Mapping for Real-Time Dense RGB SLAM
Authors: Tongyan Hua, Haotian Bai, Zidong Cao, Lin Wang
Arxiv: https://arxiv.org/abs/2306.00579
TLDR: In this paper, we introduce FMapping, an efficient neural field mapping framework that facilitates the continuous estimation of a colorized point cloud map in real-time dense RGB SLAM. To achieve this challenging goal without depth, a key hurdle is improving efficiency and reducing the mapping uncertainty of the RGB SLAM system. To this end, we first build up a theoretical analysis by decomposing the SLAM system into tracking and mapping parts, and the mapping uncertainties are explicitly defined within the frame
Repo: None
Adversarial Robustness in Unsupervised Machine Learning: A Systematic Review
Authors: Mathias Lundteigen Mohus, Jinyue Li
Arxiv: https://arxiv.org/abs/2306.00687
TLDR: As the adoption of machine learning models increases, ensuring that models are robust against adversarial attacks is increasingly important. With unsupervised machine learning gaining more attention, ensuring that it is robust against attacks is vital. This paper conducts a systematic literature review on the robustness of unsupervised learning, collecting 86 papers. Our results show that most research focuses on privacy attacks, which have effective defenses; however, many attacks lack effective and general defensive measures. Based on the results, we formulate a model on
Repo: None
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL
Authors: Ruoxi Sun, Sercan O Arik, Hootan Nakhost, Hanjun Dai, Rajarishi Sinha, Pengcheng Yin, Tomas Pfister
Arxiv: https://arxiv.org/abs/2306.00739
TLDR: One impressive emergent capability of large language models (LLMs) is generation of code, including Structured Query Language (SQL) for databases. For the task of converting natural language text to SQL queries, Text-to-SQL, adaptation of LLMs is of paramount importance, both in in-context learning and fine-tuning settings, depending on the amount of adaptation data used. In this paper, we propose an LLM-based Text-to-SQL model SQL-
Repo: None
SlothSpeech: Denial-of-service Attack Against Speech Recognition Models
Authors: Mirazul Haque, Rutvij Shah, Simin Chen, Berrak Şişman, Cong Liu, Wei Yang
Arxiv: https://arxiv.org/abs/2306.00794
TLDR: Deep Learning (DL) models have become popular for executing different speech-related tasks, including automatic speech recognition (ASR). As ASR is being used in different real-time scenarios, it is important that the ASR model remains efficient against minor perturbations to the input. Hence, evaluating the efficiency robustness of ASR models is the need of the hour. We show that popular ASR models like the Speech2Text and Whisper models have dynamic computation based on
Repo: None
Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers
Authors: Ruotong Wang, Hongrui Chen, Zihao Zhu, Li Liu, Yong Zhang, Yanbo Fan, Baoyuan Wu
Arxiv: https://arxiv.org/abs/2306.00816
TLDR: Deep neural networks (DNNs) can be manipulated to exhibit specific behaviors when exposed to specific trigger patterns, without affecting their performance on normal samples. This type of attack is known as a backdoor attack. Recent research has focused on designing invisible triggers for backdoor attacks to ensure visual stealthiness. These triggers have demonstrated strong attack performance even under backdoor defense, which aims to eliminate or suppress the backdoor effect in the model. However, through experimental observations, we have noticed that these carefully designed invisible
Repo: None
Keyword: semantic similarity
Datasets for Portuguese Legal Semantic Textual Similarity: Comparing weak supervision and an annotation process approaches
Authors: Daniel da Silva Junior, Paulo Roberto dos S. Corval, Aline Paes, Daniel de Oliveira
Arxiv: https://arxiv.org/abs/2306.00007
TLDR: The Brazilian judiciary has a large workload, which makes legal proceedings take a long time to finish. The Brazilian National Council of Justice has established, in Resolution 469/2022, formal guidance for document and process digitalization, opening up the possibility of using automatic techniques to help with everyday tasks in the legal field, particularly given the large number of texts produced in routine legal procedures. Notably, Artificial Intelligence (AI) techniques allow for processing and extracting useful information from textual data, potentially speeding up the process
Repo: None
Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity
Authors: Katharina Hämmerl, Alina Fastowski, Jindřich Libovický, Alexander Fraser
Arxiv: https://arxiv.org/abs/2306.00458
TLDR: Previous work has shown that the representations output by contextual language models are more anisotropic than static type embeddings, and typically display outlier dimensions. This seems to be true for both monolingual and multilingual models, although much less work has been done on the multilingual context. Why these outliers occur and how they affect the representations is still an active area of research. We investigate outlier dimensions and their relationship to anisotropy in multiple pre-trained mult
Repo: None
Vocabulary-free Image Classification
Authors: Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci
Arxiv: https://arxiv.org/abs/2306.00917
TLDR: Recent advances in large vision-language models have revolutionized the image classification paradigm. Despite showing impressive zero-shot capabilities, a pre-defined set of categories, a.k.a. the vocabulary, is assumed at test time for composing the textual prompts. However, such an assumption can be impractical when the semantic context is unknown and evolving. We thus formalize a novel task, termed Vocabulary-free Image Classification (VIC), where we aim to assign to an input image
Repo: None
Keyword: similarity measure
End-to-End Document Classification and Key Information Extraction using Assignment Optimization
Authors: Ciaran Cooney, Joana Cavadas, Liam Madigan, Bradley Savage, Rachel Heyburn, Mairead O'Cuinn
Arxiv: https://arxiv.org/abs/2306.00750
TLDR: We propose end-to-end document classification and key information extraction (KIE) for automating document processing in forms. Through accurate document classification we harness known information from templates to enhance KIE from forms. We use text and layout encoding with a cosine similarity measure to classify visually-similar documents. We then demonstrate a novel application of mixed integer programming by using assignment optimization to extract key information from documents. Our approach is validated on an in-house dataset of noisy scanned forms.
Repo: None
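To illustrate the cosine-similarity classification step mentioned in the entry above, here is a minimal sketch of nearest-template classification. It assumes each document and each template has already been encoded as a fixed-length vector; the paper's actual text and layout encoder is not reproduced, and the function names, template labels, and vectors below are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def classify_by_template(doc_vec: list[float],
                         templates: dict[str, list[float]]) -> tuple[str, float]:
    """Return (label, score) of the template most similar to doc_vec.

    templates maps a (hypothetical) template label to its embedding.
    """
    best_label, best_score = "", -math.inf
    for label, tmpl_vec in templates.items():
        score = cosine_similarity(doc_vec, tmpl_vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical toy embeddings standing in for encoded form templates.
templates = {
    "invoice": [0.9, 0.1, 0.0],
    "claim_form": [0.1, 0.8, 0.3],
}
label, score = classify_by_template([0.8, 0.2, 0.1], templates)
print(label, round(score, 3))
```

Cosine similarity ignores vector magnitude, so two scans of the same form type can match even if their encodings differ in overall scale; only the direction of the embedding matters.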
Cross Modal Data Discovery over Structured and Unstructured Data Lakes
Authors: Mohamed Y. Eltabakh, Mayuresh Kunjir, Ahmed Elmagarmid, Mohammad Shahmeer Ahmad
Arxiv: https://arxiv.org/abs/2306.00932
TLDR: Organizations are collecting increasingly large amounts of data for data-driven decision making. These data are often dumped into a centralized repository, e.g., a data lake, consisting of thousands of structured and unstructured datasets. Perversely, such a mixture of datasets makes the problem of discovering elements (e.g., tables or documents) that are relevant to a user's query or an analytical task very challenging. Despite recent efforts in data discovery, the problem remains widely open, especially in the
Repo: None
Keyword: summarization
Contrastive Hierarchical Discourse Graph for Scientific Document Summarization
Authors: Haopeng Zhang, Xiao Liu, Jiawei Zhang
Arxiv: https://arxiv.org/abs/2306.00177
TLDR: The extended structural context has made scientific paper summarization a challenging task. This paper proposes CHANGES, a contrastive hierarchical graph neural network for extractive scientific paper summarization. CHANGES represents a scientific paper with a hierarchical discourse graph and learns effective sentence representations with dedicated hierarchical graph information aggregation. We also propose a graph contrastive learning module to learn global theme-aware sentence representations. Extensive experiments on the PubMed and arXiv benchmark datasets prove the effectiveness of CHANGES and
Repo: None
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Authors: Paul Roit, Johan Ferret, Lior Shani, Roee Aharoni, Geoffrey Cideron, Robert Dadashi, Matthieu Geist, Sertan Girgin, Léonard Hussenot, Orgad Keller, Nikola Momchev, Sabela Ramos, Piotr Stanczyk, Nino Vieillard, Olivier Bachem, Gal Elidan, Avinatan Hassidim, Olivier Pietquin, Idan Szpektor
Arxiv: https://arxiv.org/abs/2306.00186
TLDR: Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work, we leverage recent progress on textual entailment models to directly address this problem for abstractive summarization systems. We use reinforcement learning to optimize for factual consistency and explore the ensuing trade-offs, as improved consistency may
Repo: None
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Authors: Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou
Arxiv: https://arxiv.org/abs/2306.00398
TLDR: Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level while LM training and generation both occur at the token level. There is, therefore, a granularity mismatch between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternate training process, where we iterate between grounding the sequence-level preference into token
Repo: None
Towards Argument-Aware Abstractive Summarization of Long Legal Opinions with Summary Reranking
Authors: Mohamed Elaraby, Yang Zhong, Diane Litman
Arxiv: https://arxiv.org/abs/2306.00672
TLDR: We propose a simple approach for the abstractive summarization of long legal opinions that considers the argument structure of the document. Legal opinions often contain complex and nuanced argumentation, making it challenging to generate a concise summary that accurately captures the main points of the legal opinion. Our approach involves using argument role information to generate multiple candidate summaries, then reranking these candidates based on alignment with the document's argument structure. We demonstrate the effectiveness of our approach on a dataset of very large legal opinions
Repo: None
Keyword: text generation
Pre-Trained Language-Meaning Models for Multilingual Parsing and Generation
Authors: Chunliu Wang, Huiyuan Lai, Malvina Nissim, Johan Bos
Arxiv: https://arxiv.org/abs/2306.00124
TLDR: Pre-trained language models (PLMs) have achieved great success in NLP and have recently been used for tasks in computational semantics. However, these tasks do not fully benefit from PLMs since meaning representations are not explicitly included in the pre-training stage. We introduce multilingual pre-trained language-meaning models based on Discourse Representation Structures (DRSs), including meaning representations besides natural language texts in the same model, and design a new strategy to reduce the gap
Repo: None
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Authors: Paul Roit, Johan Ferret, Lior Shani, Roee Aharoni, Geoffrey Cideron, Robert Dadashi, Matthieu Geist, Sertan Girgin, Léonard Hussenot, Orgad Keller, Nikola Momchev, Sabela Ramos, Piotr Stanczyk, Nino Vieillard, Olivier Bachem, Gal Elidan, Avinatan Hassidim, Olivier Pietquin, Idan Szpektor
Arxiv: https://arxiv.org/abs/2306.00186
TLDR: Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work, we leverage recent progress on textual entailment models to directly address this problem for abstractive summarization systems. We use reinforcement learning to optimize for factual consistency and explore the ensuing trade-offs, as improved consistency may
Repo: None
Focused Prefix Tuning for Controllable Text Generation
Authors: Congda Ma, Tianyu Zhao, Makoto Shing, Kei Sawada, Manabu Okumura
Arxiv: https://arxiv.org/abs/2306.00369
TLDR: In a controllable text generation dataset, there exist unannotated attributes that could provide irrelevant learning signals to models that use it for training and thus degrade their performance. We propose focused prefix tuning (FPT) to mitigate the problem and to enable the control to focus on the desired attribute. Experimental results show that FPT can achieve better control accuracy and text fluency than baseline models in single-attribute control tasks. In multi-attribute control tasks, FPT achieves comparable control accuracy
Repo: None
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Authors: Rahul Madhavan, Rishabh Garg, Kahini Wadhawan, Sameep Mehta
Arxiv: https://arxiv.org/abs/2306.00374
TLDR: We propose a method to control the attributes of Language Models (LMs) for the text generation task using Causal Average Treatment Effect (ATE) scores and counterfactual augmentation. We explore this method in the context of LM detoxification, and propose the Causally Fair Language (CFL) architecture for detoxifying pre-trained LMs in a plug-and-play manner. Our architecture is based on a Structural Causal Model (SCM) that is
Repo: None
EEL: Efficiently Encoding Lattices for Reranking
Authors: Prasann Singhal, Jiacheng Xu, Xi Ye, Greg Durrett
Arxiv: https://arxiv.org/abs/2306.00947
TLDR: Standard decoding approaches for conditional text generation tasks typically search for an output hypothesis with high model probability, but this may not yield the best hypothesis according to human judgments of quality. Reranking to optimize for "downstream" metrics can better optimize for quality, but many metrics of interest are computed with pre-trained language models, which are slow to apply to large numbers of hypotheses. We explore an approach for reranking hypotheses by using Transformers to efficiently encode lattices of generated outputs, a
Repo: None