- ImageNet VID
[Paper][Homepage]
30 categories, train: 3,862 video snippets, validation: 555 snippets
- YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
[Paper][Homepage]
380,000 video segments about 19s long, 5.6M bounding boxes, 23 types of objects
- Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations (CVPR 2021)
[Paper][Homepage]
15K annotated video clips supplemented with over 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes; each object has a manually annotated 3D bounding box
- DroneCrowd: Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark (CVPR 2021)
[Paper][Homepage]
112 video clips with 33,600 HD frames in various scenarios, 20,800 people trajectories with 4.8 million annotated heads, and several video-level attributes
- BOLD: Detecting Biological Locomotion in Video: A Computational Approach
[Paper]
1,348 videos, objects: human, terrestrial quadruped, bird, reptile, cetacean, seal, fish, stingray, eel, sea snake, insects, spiders, scorpion, lobster, ball, car, train, motorbike, submarine, airplane, helicopter, rocket, oscillating stuff
- Water detection through spatio-temporal invariant descriptors
[Paper][Dataset]
260 videos