Requesting to Add Video Evaluation Benchmark - VELOCITI #166

Open
varungupta31 opened this issue Jul 6, 2024 · 1 comment

Comments

@varungupta31

Hi there!
Thanks for your effort in maintaining this amazing repository.

This is a request to add our recent work on the evaluation of video-language models. We propose an evaluation benchmark, VELOCITI.

Please find the relevant details below:

Title:

VELOCITI: Can Video-Language Models Bind Semantic Concepts Through Time?

About
Given the rapid pace at which Video-Language Models (VLMs) are being proposed, our primary motivation is to provide a benchmark that evaluates current state-of-the-art as well as upcoming VLMs on compositionality, a fundamental aspect of vision-language understanding. This is achieved through carefully designed tests that evaluate various aspects of perception and binding. With this, we aim to provide a more accurate gauge of VLM capabilities, encouraging research towards improving VLMs and preventing shortcomings from percolating into the systems that rely on such models.

ArXiv
https://arxiv.org/abs/2406.10889v1

GitHub
https://github.com/katha-ai/VELOCITI

Project Page and Demo
https://katha-ai.github.io/projects/velociti/

Please let me know if I missed any required details.
Thanks for your time.

@xjtupanda
Collaborator

Sorry for the late response. It has been incorporated now.
Please also consider citing our work:

@article{yin2024survey,
  title={A survey on multimodal large language models},
  author={Yin, Shukang and Fu, Chaoyou and Zhao, Sirui and Li, Ke and Sun, Xing and Xu, Tong and Chen, Enhong},
  journal={National Science Review},
  pages={nwae403},
  year={2024},
  publisher={Oxford University Press}
}

@article{fu2023mme,
  title={MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models},
  author={Fu, Chaoyou and Chen, Peixian and Shen, Yunhang and Qin, Yulei and Zhang, Mengdan and Lin, Xu and Yang, Jinrui and Zheng, Xiawu and Li, Ke and Sun, Xing and others},
  journal={arXiv preprint arXiv:2306.13394},
  year={2023}
}

@article{fu2024mme,
  title={MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs},
  author={Fu, Chaoyou and Zhang, Yi-Fan and Yin, Shukang and Li, Bo and Fang, Xinyu and Zhao, Sirui and Duan, Haodong and Sun, Xing and Liu, Ziwei and Wang, Liang and others},
  journal={arXiv preprint arXiv:2411.15296},
  year={2024}
}
