Textual-Visual-Alignment-and-Fusion-NetWork

Source code for our paper, "Multi-level Textual-Visual Alignment and Fusion Network for Multimodal Aspect-based Sentiment Analysis".

Dataset

For the visual objects in the dataset, we run YOLOv5x6 to detect the objects in each image.
We apply ClipCap to generate the image captions.
The face descriptions of the raw images are taken from FITE; many thanks to its authors.
The OCR text of the images is extracted with Google's Tesseract OCR engine.
The ANPs (adjective-noun pairs) of each image are obtained by following the image preprocessing procedure of VLP-MABSA.
The Twitter2015 and Twitter2017 datasets are available from BaiduYun Disk (code: 5fyu).
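
As an illustration of the OCR step listed above, here is a minimal sketch of how text might be extracted from a single image with Tesseract. It assumes the `pytesseract` wrapper and Pillow are installed and that the Tesseract binary is on the PATH. The function names `normalize_ocr_text` and `extract_ocr_text`, and the whitespace clean-up itself, are assumptions made for this example and are not taken from this repository.

```python
import re


def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_ocr_text(image_path: str) -> str:
    """Run Tesseract on one image and return cleaned text (hypothetical helper)."""
    # Imported lazily so normalize_ocr_text works without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img)
    return normalize_ocr_text(raw)
```

Calling `extract_ocr_text("path/to/image.jpg")` (a placeholder path) would return the recognized text as one line, which may be empty for images that contain no text.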