Textual-Visual-Alignment-and-Fusion-NetWork

Source code for our paper, "Multi-level Textual-Visual Alignment and Fusion Network for Multimodal Aspect-based Sentiment Analysis".

Dataset

  • Visual objects: we run YOLOv5x6 on each image in the dataset to detect objects.
  • Image captions: generated with ClipCap.
  • Face descriptions: the descriptions of the raw images come from FITE; many thanks to its authors.
  • OCR text: extracted from the images with Google's Tesseract OCR engine.
  • ANPs: obtained for each image by following the image preprocessing procedure of VLP-MABSA.
  • Twitter2015 and Twitter2017: available from BaiduYun Disk (code: 5fyu).
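
As an illustration of the OCR step listed above, here is a minimal sketch of extracting text from a single image with Tesseract. It is not taken from this repository: it assumes the `pytesseract` and `Pillow` packages plus a local Tesseract install, and the function names `normalize_ocr_text` and `extract_ocr_text` are hypothetical. The whitespace cleanup is likewise an assumption, not a documented part of the preprocessing.

```python
import re


def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_ocr_text(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image and return the cleaned text.

    Assumption: pytesseract, Pillow, and the Tesseract binary are installed.
    """
    # Imported lazily so normalize_ocr_text works without the OCR dependencies.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img, lang=lang)
    return normalize_ocr_text(raw)
```

Calling `extract_ocr_text` on an image path returns a single-line string that could be stored alongside the caption and object labels for that image.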