- 🔍 Evaluation of MLLMs under misleading inputs
- 📊 Uncertainty quantification metrics
- 🎯 Explicit & Implicit misleading experiments
- 🔬 Comprehensive model comparison
- 📝 Reproducible results and visualization
Before running the code, set up the required environments for `glm`, `llava`, `MiniCPM-V`, and `mmstar`.
📥 Installation Steps:
- Navigate to the `env` folder.
- Create each conda environment from its corresponding `.yml` file:
  ```bash
  conda env create -f env/glm.yml
  conda env create -f env/llava.yml
  conda env create -f env/MiniCPM-V.yml
  conda env create -f env/mmstar.yml
  ```
- Activate the required environment:
  ```bash
  conda activate <ENV_NAME>
  ```
Download the Multimodal Uncertainty Benchmark (MUB) dataset here.
Extract the downloaded images and place them in the `extract_img_all` folder.
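As an optional sanity check, you can confirm the images landed in the expected folder. This is a minimal sketch; the folder name comes from this README, but the image extensions and expected count are assumptions, not specified here.

```python
from pathlib import Path

img_dir = Path("extract_img_all")
# Count common image files; the extensions are an assumption about the dataset format.
images = [p for p in img_dir.rglob("*") if p.suffix.lower() in {".jpg", ".jpeg", ".png"}]
print(f"Found {len(images)} images in {img_dir.resolve()}")
```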
Evaluated Open-source and Closed-source Models:
MiniCPM-v-v2; Phi-3-vision; YiVL-6b; Qwen-VL-Chat; Deepseek-VL-7b-Chat; LLaVA-NeXT-7b-vicuna; MiniCPM-Llama3-v2.5; GLM4V-9Bchat; CogVLM-chat; InternVL-Chat-V1-5; LLaVA-Next-34b; Yi-VL-34b; GPT-4o; Gemini-Pro; Claude3-OpusV; Glm-4V
Run the explicit misleading experiments:
```bash
bash MR_test.sh
```
- Open `implicit/misleading_generate/my_tool.py` and fill in your API key.
- Run:
  ```bash
  bash implicit/misleading_generate/mislead_generate.sh
  ```
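For orientation only, here is a hypothetical sketch of what an implicit-misleading-hint generator might look like. It is not the repo's actual `my_tool.py`; the client, model name, prompt, and function name are all illustrative assumptions, and it presumes an OpenAI-compatible chat API.

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible endpoint; replace with the key you filled into my_tool.py.
client = OpenAI(api_key="YOUR_API_KEY")

def generate_implicit_hint(question: str, correct_answer: str) -> str:
    """Ask an LLM to write a subtly misleading hint for a VQA item (illustrative only)."""
    prompt = (
        "Write one short hint that subtly points away from the correct answer "
        "without stating a wrong answer outright.\n"
        f"Question: {question}\nCorrect answer: {correct_answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption: any chat-completion model could be used here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()
```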
Use the generated data in `implicit/mislead_output`:
```bash
bash implicit/Implicit_MR_test/implicit_MR_test.sh
```
Results are saved in:
- 📁 `result/test_dataset_6`
  - `.jsonl` → detailed outputs
  - `.txt` → model's Misleading Rate (MR)
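If you want to analyze the detailed `.jsonl` outputs yourself, the sketch below computes one common reading of a misleading rate: the fraction of initially-correct answers that flip after the misleading prompt. The field names `correct_before` and `correct_after` are hypothetical; adjust them to the actual schema of the repo's output, and treat the `.txt` summary as the authoritative MR.

```python
import json
from pathlib import Path

def misleading_rate(jsonl_path: str) -> float:
    """Fraction of initially-correct answers that become incorrect under misleading.

    Assumes hypothetical per-record fields 'correct_before' and 'correct_after';
    rename them to match the repo's actual .jsonl schema.
    """
    initially_correct = misled = 0
    for line in Path(jsonl_path).read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record["correct_before"]:
            initially_correct += 1
            if not record["correct_after"]:
                misled += 1
    return misled / initially_correct if initially_correct else 0.0
```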
- Open `extract2table/extract2table.py` and modify `txt_folder_paths` as needed (a standalone aggregation sketch follows this list).
- Run:
  ```bash
  python extract2table/extract2table.py
  ```
- The formatted table is saved in 📁 `extract2table/Tables/`.
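If you prefer a quick table without the script, here is a minimal sketch. It assumes each folder in `txt_folder_paths` holds one `.txt` summary per model and that the last line of each file carries the MR figure; the repo's actual file layout may differ.

```python
from pathlib import Path

# Hypothetical folders holding per-model .txt MR summaries (adjust to your runs).
txt_folder_paths = ["result/test_dataset_6"]

rows = []
for folder in txt_folder_paths:
    for txt_file in sorted(Path(folder).glob("*.txt")):
        lines = txt_file.read_text().strip().splitlines()
        if not lines:
            continue
        # Assumption: the final line of each summary file reports the MR.
        rows.append((txt_file.stem, lines[-1]))

# Print a simple Markdown table: model name vs. reported MR line.
print("| Model | MR summary |")
print("|-------|------------|")
for model, summary in rows:
    print(f"| {model} | {summary} |")
```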
If you use this work, please cite:
```bibtex
@article{yourpaper2024,
  title={Exploring Response Uncertainty in MLLMs: An Empirical Evaluation Under Misleading Scenarios},
  author={Authors' Names},
  journal={arXiv preprint arXiv:2411.02708},
  year={2024}
}
```
For any issues, please open a GitHub issue or reach out via email: [email protected]