KIEV: Interactivity Proposals for Surveillance Videos
introduces a new task: spatio-temporal interactivity proposals
ImageNet-VidVRD: Video Visual Relation Detection
1,000 videos, 35 common subject/object categories and 132 relationships
VidOR: Annotating Objects and Relations in User-Generated Videos
10,000 videos selected from the YFCC100M collection, 80 object categories and 50 predicate categories
Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks (CVPR 2020)
annotations for 180,049 videos from the Something-Something dataset
Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs (CVPR 2020)
10K videos, 0.4M objects, 1.7M visual relationships
VidSitu: Visual Semantic Role Labeling for Video Understanding (CVPR 2021)
29K 10-second movie clips, richly annotated with a verb and its semantic roles every 2 seconds