We have successfully pre-trained and fine-tuned our AdaMAE on Kinetics400, Something-Something-V2, UCF101 and HMDB51 with this codebase.
- The pre-processing of Something-Something-V2 can be summarized into 3 steps:
  - Download the dataset from the official website.
  - Preprocess the dataset by converting the videos from .webm to .mp4 while keeping the original height of 240px. You can simply run:

        ffmpeg -i [input.webm] -c:v libx264 [output.mp4]

  - Generate the annotations needed by the dataloader (each line has the form "<path_to_video> <video_class>"). The annotations usually include train.csv, val.csv and test.csv (here test.csv is the same as val.csv). The format of each *.csv file is:

        dataset_root/video_1.mp4 label_1
        dataset_root/video_2.mp4 label_2
        dataset_root/video_3.mp4 label_3
        ...
        dataset_root/video_N.mp4 label_N
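The conversion step for Something-Something-V2 can be scripted over a whole directory. The following is a minimal sketch, not part of this codebase: it assumes `ffmpeg` is on the `PATH` and that all `.webm` files sit directly in one folder, and the names `build_ffmpeg_command` and `convert_directory` are hypothetical. It uses only the flags from the command shown in step 2.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Return the ffmpeg invocation from step 2 for a single video."""
    return ["ffmpeg", "-i", str(src), "-c:v", "libx264", str(dst)]


def convert_directory(src_dir, dst_dir, runner=subprocess.run):
    """Re-encode every .webm file in src_dir to an .mp4 file in dst_dir.

    `runner` is called once per video with the command list; pass a stub
    to do a dry run without invoking ffmpeg (hypothetical helper).
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.glob("*.webm")):
        dst = dst_dir / (src.stem + ".mp4")
        if dst.exists():  # skip videos that were already converted
            continue
        runner(build_ffmpeg_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

Calling `convert_directory("ssv2_webm", "ssv2_mp4")` (both folder names are placeholders) would re-encode each video once and skip any output file that already exists, which makes the script safe to re-run after an interruption.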
- The pre-processing of Kinetics400 can be summarized into 3 steps:
  - Download the dataset from the official website.
  - Preprocess the dataset by resizing the short edge of each video to 320px. You can refer to the MMAction2 Data Benchmark for TSN and SlowOnly.
  - Generate the annotations needed by the dataloader (each line has the form "<path_to_video> <video_class>"). The annotations usually include train.csv, val.csv and test.csv (here test.csv is the same as val.csv). The format of each *.csv file is:

        dataset_root/video_1.mp4 label_1
        dataset_root/video_2.mp4 label_2
        dataset_root/video_3.mp4 label_3
        ...
        dataset_root/video_N.mp4 label_N
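The annotation format is the same for both datasets, so one small script can produce all three files. The sketch below is illustrative only: `write_annotation_file` and `write_splits` are hypothetical names, and it assumes you already have `(video_path, label)` pairs for each split, with integer class indices as labels. Because the separator is a single space, it also assumes video paths contain no spaces.

```python
import shutil
from pathlib import Path


def write_annotation_file(samples, out_path):
    """Write one "<path_to_video> <video_class>" line per sample."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as f:
        for video_path, label in samples:
            f.write(f"{video_path} {label}\n")
    return out_path


def write_splits(train_samples, val_samples, out_dir):
    """Create train.csv and val.csv, then copy val.csv to test.csv."""
    out_dir = Path(out_dir)
    write_annotation_file(train_samples, out_dir / "train.csv")
    val_file = write_annotation_file(val_samples, out_dir / "val.csv")
    # test.csv is the same as val.csv, as described above
    shutil.copyfile(val_file, out_dir / "test.csv")
```

For example, `write_splits([("dataset_root/video_1.mp4", 0)], [("dataset_root/video_2.mp4", 1)], "annotations")` would create the three files in an `annotations` folder (a placeholder name).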
- We use decord to decode the videos on the fly during both pre-training and fine-tuning phases.
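To make on-the-fly decoding concrete, here is a minimal sketch of reading one clip with decord. It is not the dataloader used in this codebase: `sample_frame_indices` and `load_clip` are hypothetical helpers, the centered sampling strategy is an assumption, and the defaults `clip_len=16` and `stride=4` are illustrative values only. It requires `decord` to be installed.

```python
def sample_frame_indices(num_frames, clip_len, stride):
    """Pick clip_len frame indices, stride apart, centered in the video.

    Indices are clamped to the last frame when the video is too short.
    (Illustrative sampling strategy, assumed for this sketch.)
    """
    span = (clip_len - 1) * stride + 1
    start = max((num_frames - span) // 2, 0)
    return [min(start + i * stride, num_frames - 1) for i in range(clip_len)]


def load_clip(video_path, clip_len=16, stride=4):
    """Decode only the sampled frames of one video with decord."""
    # Imported lazily so the index helper works without decord installed.
    from decord import VideoReader, cpu

    reader = VideoReader(str(video_path), ctx=cpu(0))
    indices = sample_frame_indices(len(reader), clip_len, stride)
    # get_batch decodes just the requested frames: (clip_len, H, W, 3), uint8
    return reader.get_batch(indices).asnumpy()
```

Since only the requested frames are decoded, no frames need to be extracted to disk ahead of time; the annotation files can point directly at the .mp4 videos.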
These instructions are copied from the VideoMAE codebase. We thank the authors of VideoMAE for making their code available online!