Source code for our paper "Multi-level Textual-Visual Alignment and Fusion Network for Multimodal Aspect-based Sentiment Analysis"
- For the visual objects in the datasets, we use YOLOv5x6 to detect objects (see the sketch after this list).
- We apply ClipCap to generate image captions.
- The face descriptions of the raw images are taken from FITE; many thanks to its authors.
- The OCR text of the images is extracted with Google's Tesseract OCR engine (see the sketch after this list).
- The adjective-noun pairs (ANPs) of each image are obtained following the image preprocessing procedure of VLP-MABSA.
- The Twitter2015 and Twitter2017 datasets are available on BaiduYun Disk (extraction code: 5fyu).
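
A minimal sketch of the visual-object detection step, assuming the Ultralytics YOLOv5 hub model and a placeholder image path `example.jpg` (not the paper's exact preprocessing script):

```python
import torch

# Load the pretrained YOLOv5x6 model from the Ultralytics hub.
model = torch.hub.load('ultralytics/yolov5', 'yolov5x6', pretrained=True)

# Run detection on a single image.
results = model('example.jpg')

# Detections as a DataFrame: xmin, ymin, xmax, ymax, confidence, class, name.
detections = results.pandas().xyxy[0]

# Keep the detected object class names, e.g. as textual visual-object tokens.
object_names = detections['name'].tolist()
print(object_names)
```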
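A minimal sketch of the OCR step, assuming a local Tesseract installation with the `pytesseract` wrapper; the image path `example.jpg` is a placeholder:

```python
from PIL import Image
import pytesseract

# Extract OCR text from one image with the Tesseract engine.
image = Image.open('example.jpg')
ocr_text = pytesseract.image_to_string(image)

# Light cleanup: collapse whitespace so the OCR text can be concatenated
# with the tweet text in later preprocessing.
ocr_text = ' '.join(ocr_text.split())
print(ocr_text)
```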