[NeurIPS D&B 2024 Spotlight] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
💡 We also have other video generation projects that may interest you ✨.
Open-Sora-Plan
PKU-Yuan Lab and Tuzhan AI etc.
MagicTime
Shenghai Yuan, Jinfa Huang and Yujun Shi etc.
- ⏳⏳⏳ Evaluate more Text-to-Video Generation Models via ChronoMagic-Bench.
- [2024.09.30] 🔥 We have updated the calculation of the CHScore, making it more robust to temporally coherent disappearance of points. You can click here for the detailed implementation.
- [2024.09.26] ✨ Our paper has been accepted to the NeurIPS 2024 D&B track as a spotlight presentation.
- [2024.08.13] 🔥 We further evaluate EasyAnimate-V3 and CogVideoX-2B. The results are available here.
- [2024.06.30] 🔥 We release the code of the "Multi-Aspect Data Preprocessing", which is used to process the ChronoMagic-Pro dataset. Please click here and here to see more details.
- [2024.06.29] 🔥 Support evaluating customized Text-to-Video models. The code and instructions are available in this repo.
- [2024.06.28] 🔥 We release the ChronoMagic-Pro and ChronoMagic-ProH datasets. The datasets include 460K and 150K time-lapse video-text pairs respectively and can be downloaded at HF-Dataset-Pro and HF-Dataset-ProH.
- [2024.06.27] 🔥 We release the arXiv paper and Leaderboard for ChronoMagic-Bench, and you can click here to read the paper and here to see the leaderboard.
- [2024.06.26] 🔥 We release the testing prompts, reference videos and generated results by different models in ChronoMagic-Bench, and you can click here to see more details.
- [2024.06.25] 🔥 All code & datasets are coming soon! Stay tuned 👀!
ChronoMagic-Bench can reflect the physical prior capacity of Text-to-Video generation models.
- ChronoMagic-Bench: including 1649 time-lapse video-text pairs. (captioned by GPT-4o)
- ChronoMagic-Bench-150: including 150 time-lapse video-text pairs. (captioned by GPT-4o)
- ChronoMagic: including 2265 time-lapse video-text pairs. (captioned by GPT-4V)
- ChronoMagic-Pro: including 460K time-lapse video-text pairs. (captioned by ShareGPT4Video)
- ChronoMagic-ProH: including 150K time-lapse video-text pairs. (captioned by ShareGPT4Video)
In contrast to existing benchmarks, ChronoMagic-Bench emphasizes generating videos with high persistence and strong variation, i.e., metamorphic time-lapse videos with high physical prior content.
Benchmark | Type | Visual Quality | Text Relevance | Metamorphic Amplitude | Temporal Coherence |
---|---|---|---|---|---|
UCF-101 | General | ✔️ | ✔️ | ❌ | ❌ |
Make-a-Video-Eval | General | ✔️ | ✔️ | ❌ | ❌ |
MSR-VTT | General | ✔️ | ✔️ | ❌ | ❌ |
FETV | General | ✔️ | ✔️ | ❌ | ✔️ |
VBench | General | ✔️ | ✔️ | ❌ | ✔️ |
T2VScore | General | ✔️ | ✔️ | ❌ | ❌ |
ChronoMagic-Bench | Time-lapse | ✔️ | ✔️ | ✔️ | ✔️ |
We specifically design four major categories for time-lapse videos (as shown below), including biological, human-created, meteorological, and physical videos, and extend these to 75 subcategories. Based on this, we construct ChronoMagic-Bench, comprising 1,649 prompts and their corresponding reference time-lapse videos.
We visualize the evaluation results of various open-source and closed-source T2V generation models across ChronoMagic-Bench.
See numeric values at our Leaderboard 🥇🥈🥉
or you can run it locally:
cd LeadBoard
python app.py
We recommend setting up the environment as follows.
git clone --depth=1 https://github.com/PKU-YuanGroup/ChronoMagic-Bench.git
cd ChronoMagic-Bench
conda create -n chronomagic python=3.10
conda activate chronomagic
# install base packages
pip install -r requirements.txt
# install flash-attn
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/csrc/layer_norm && pip install .
cd ../../../
rm -r flash-attention
huggingface-cli download --repo-type model \
BestWishYsh/ChronoMagic-Bench \
--local-dir BestWishYsh/ChronoMagic-Bench
We provide the evaluation prompt lists of ChronoMagic-Bench here or here. You can use them to sample videos for evaluating your model. We also provide the reference videos for the corresponding evaluation prompts here.
Use ChronoMagic-Bench to evaluate videos and video generative models.
The generated videos should be named after the corresponding prompt ID in ChronoMagic-Bench and placed in the evaluation folder, which is structured as follows. We also provide input examples in the 'toy_video' folder.
# for open-source models
`-- input_video_folder
    |-- model_name_a
    |   |-- 1
    |   |   |-- 3d_printing_08.mp4
    |   |   `-- ...
    |   |-- 2
    |   |   |-- 3d_printing_08.mp4
    |   |   `-- ...
    |   `-- 3
    |       |-- 3d_printing_08.mp4
    |       `-- ...
    `-- model_name_b
        |-- 1
        |   |-- 3d_printing_08.mp4
        |   `-- ...
        |-- 2
        |   |-- 3d_printing_08.mp4
        |   `-- ...
        `-- 3
            |-- 3d_printing_08.mp4
            `-- ...
# for closed-source models
`-- input_video_folder
    |-- model_name_a
    |   |-- 3d_printing_08.mp4
    |   |-- animal_04.mp4
    |   `-- ...
    |-- model_name_b
    |   |-- 3d_printing_08.mp4
    |   `-- ...
    `-- ...
The filenames of all videos to be evaluated should be "videoid.mp4". For example, if the videoid is 3d_printing_08, the video filename should be "3d_printing_08.mp4". If this naming convention is not followed, the text relevance cannot be evaluated.
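Because a misnamed file silently breaks the text relevance evaluation, it can help to check the folder before running anything. The following is a minimal sketch, not part of the repository: `find_misnamed_videos` and `demo` are hypothetical helper names, and the list of prompt IDs is assumed to come from the evaluation prompt lists. It walks a model folder recursively, so it covers both the open-source layout (numbered subfolders) and the closed-source layout (videos directly under the model folder).

```python
import tempfile
from pathlib import Path


def find_misnamed_videos(model_dir, prompt_ids):
    """Return .mp4 files under model_dir whose stem is not a known prompt ID.

    Hypothetical helper: prompt_ids is assumed to be the list of video IDs
    taken from the ChronoMagic-Bench prompt lists.
    """
    known = set(prompt_ids)
    misnamed = []
    # rglob handles both layouts: files directly under the model folder,
    # or files inside numbered subfolders such as 1/, 2/, 3/.
    for video in sorted(Path(model_dir).rglob("*.mp4")):
        if video.stem not in known:
            misnamed.append(video)
    return misnamed


def demo():
    """Build a throwaway folder in the open-source layout and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "model_name_a"
        for run in ("1", "2"):
            (model_dir / run).mkdir(parents=True)
            (model_dir / run / "3d_printing_08.mp4").touch()
        # Deliberately misnamed file that should be reported.
        (model_dir / "2" / "3d printing 8.mp4").touch()
        bad = find_misnamed_videos(model_dir, ["3d_printing_08"])
        return [p.name for p in bad]


if __name__ == "__main__":
    print(demo())
```

Any path reported by the helper should be renamed to "videoid.mp4" before evaluation.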
We provide output examples in the 'results' folder. You can run the following command for testing, then modify the relevant parameters (such as model_names, input_folder, model_pth and openai_api) to suit the text-to-video (T2V) generation model you want to evaluate.
# To evaluate several models at once, list their names: --model_names name1 name2
python evaluate.py \
--eval_type "open" \
--model_names test \
--input_folder toy_video \
--output_folder results \
--video_frames_folder video_frames_folder_temp \
--model_pth_CHScore cotracker2.pth \
--model_pth_MTScore InternVideo2-stage2_1b-224p-f4.pt \
--num_workers 8 \
--openai_api "sk-UybXXX"
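When the evaluation is launched from a script, for example to loop over several models, the same command can be assembled as an argument list. This is a sketch under stated assumptions: `build_eval_command` is a hypothetical helper, it uses only the flags shown in the command above, and it assumes `--model_names` accepts several space-separated names as that command indicates.

```python
import shlex


def build_eval_command(model_names, input_folder, output_folder, openai_api,
                       eval_type="open", num_workers=8):
    """Assemble the evaluate.py argument list for one or more models.

    Hypothetical helper; the flags mirror the command in this README and
    the checkpoint filenames are the ones used in that example.
    """
    if not model_names:
        raise ValueError("at least one model name is required")
    return [
        "python", "evaluate.py",
        "--eval_type", eval_type,
        "--model_names", *model_names,
        "--input_folder", input_folder,
        "--output_folder", output_folder,
        "--video_frames_folder", "video_frames_folder_temp",
        "--model_pth_CHScore", "cotracker2.pth",
        "--model_pth_MTScore", "InternVideo2-stage2_1b-224p-f4.pt",
        "--num_workers", str(num_workers),
        "--openai_api", openai_api,
    ]


if __name__ == "__main__":
    cmd = build_eval_command(["name1", "name2"], "toy_video", "results", "sk-UybXXX")
    # Print a shell-safe string; the list itself can be passed to subprocess.run.
    print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell quoting problems when model names or paths contain spaces.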
If you only want to compute a single metric instead of all of them, you can follow the steps below. Before running, please modify the parameters in 'xxx.sh' as needed. (If you want to obtain the JSON to submit to the leaderboard, you can organize the output files in MTScore / CHScore / GPT4o-MTScore according to 'results' and then proceed with the following steps.)
# for MTScore
cd MTScore
bash get_mtscore.sh
# for CHScore
cd CHScore
bash get_chscore.sh
# for GPT4o-MTScore
cd GPT4o_MTScore
bash get_gp4omtscore.sh
Please refer to the folder UMT for how to compute the UMTScore.
python get_uploaded_json.py \
--input_path results/all \
--output_path results
After completing the above steps, you will obtain ChronoMagic-Bench-Input.json, and then you need to manually fill the JSON with UMT-FVD and UMTScore (as we calculate them separately). Finally, you can submit the JSON to HuggingFace.
To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for ChronoMagic-Bench evaluation. You can download them on Hugging Face. We also provide detailed explanations of the sampled videos and the detailed settings of the models under evaluation here.
ChronoMagic-Pro contains 460K time-lapse videos, each accompanied by a detailed caption. We also release a 150K higher-quality subset, ChronoMagic-ProH. Both datasets can be downloaded here and here, or with the following command. Some samples can be found on our Project Page.
# To download the higher-quality subset instead, replace ChronoMagic-Pro
# with ChronoMagic-ProH in both places below.
huggingface-cli download --repo-type dataset \
--resume-download BestWishYsh/ChronoMagic-Pro \
--local-dir BestWishYsh/ChronoMagic-Pro \
--local-dir-use-symlinks False
Please refer to the folder Multi-Aspect_Preprocessing for how the ChronoMagic-Pro data is processed.
- This project wouldn't be possible without the following open-sourced repositories: CoTracker, InternVideo2, UMT, FETV, VBench, Panda-70M, ShareGPT4Video and LAION Aesthetic Predictor.
- The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
- The service is a research preview. Please contact us if you find any potential violations. ([email protected])
If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.
@article{yuan2024chronomagic,
title={ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation},
author={Yuan, Shenghai and Huang, Jinfa and Xu, Yongqi and Liu, Yaoyang and Zhang, Shaofeng and Shi, Yujun and Zhu, Ruijie and Cheng, Xinhua and Luo, Jiebo and Yuan, Li},
journal={arXiv preprint arXiv:2406.18522},
year={2024}
}