Source code for our paper "Multi-level Textual-Visual Alignment and Fusion Network for Multimodal Aspect-based Sentiment Analysis"
- For the visual objects in the datasets, we use YOLOv5x6 to detect objects (see the sketch after this list).
- We apply ClipCap to generate image captions.
- The face descriptions of the raw images are taken from FITE; many thanks to its authors.
- The OCR text of the images is extracted with Google's Tesseract OCR engine (see the sketch after this list).
- The adjective-noun pairs (ANPs) of each image are obtained following the image preprocessing procedure of VLP-MABSA.
- The Twitter2015 and Twitter2017 datasets are available on BaiduYun Disk (extraction code: 5fyu).
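
A minimal sketch of the visual-object detection step, assuming the Ultralytics YOLOv5 hub model and a placeholder image path `example.jpg` (not the paper's exact preprocessing script):

```python
import torch

# Load the pretrained YOLOv5x6 model from the Ultralytics hub.
model = torch.hub.load('ultralytics/yolov5', 'yolov5x6', pretrained=True)

# Run detection on a single image.
results = model('example.jpg')

# Detections as a DataFrame: xmin, ymin, xmax, ymax, confidence, class, name.
detections = results.pandas().xyxy[0]

# Keep the detected object class names, e.g. as textual visual-object tokens.
object_names = detections['name'].tolist()
print(object_names)
```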
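A minimal sketch of the OCR step, assuming a local Tesseract installation with the `pytesseract` wrapper; the image path `example.jpg` is a placeholder:

```python
from PIL import Image
import pytesseract

# Extract OCR text from one image with the Tesseract engine.
image = Image.open('example.jpg')
ocr_text = pytesseract.image_to_string(image)

# Light cleanup: collapse whitespace so the OCR text can be concatenated
# with the tweet text in later preprocessing.
ocr_text = ' '.join(ocr_text.split())
print(ocr_text)
```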