This repository contains an implementation of ManyICL. Prepare a dataframe, configure your API key, modify the prompt, and run it!
Please note that this code repo is intended for research purposes and might not be suitable for large-scale production.
Install the required packages with pip:

```bash
pip install -r requirements.txt
```
- Get your API key from here.
- Replace the placeholder in `ManyICL/LMM.py` (Line 2).
Note that you need a Google Cloud project for this.
- In the Google Cloud console, go to the Dashboard.
- Click the project selection list at the top of the page. In the "Select a resource" window that appears, select a project. Note the project ID displayed in the Project info section.
- Replace the placeholder in `ManyICL/LMM.py` (Line 3).
- If you're developing locally or on Colab (not on a GCP instance), you need to authenticate by following this instruction.
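For reference, the placeholder lines in `ManyICL/LMM.py` look roughly like the sketch below; the variable names here are illustrative and may differ from the actual file. For local or Colab development, Application Default Credentials are typically set up with `gcloud auth application-default login`.

```python
# Top of ManyICL/LMM.py -- illustrative sketch only; the actual variable
# names on Lines 2-3 may differ from what is shown here.
API_KEY = "YOUR_API_KEY_HERE"        # Line 2: replace with your API key
PROJECT_ID = "your-gcp-project-id"   # Line 3: replace with your GCP project ID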
Prepare two pandas dataframes: one for the demonstration set and one for the test set. You can find examples under the `dataset/` folder. Note that the index column should contain the filenames of the images. Here's a quick preview:
| Index | Forest | Golf course | Freeway |
|---|---|---|---|
| forest39.jpeg | 1 | 0 | 0 |
| golfcourse53.jpeg | 0 | 1 | 0 |
| freeway97.jpeg | 0 | 0 | 1 |
Note that we only include 42 images from the UCMerced dataset for illustration purposes.
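If you want to build your own demo/test CSVs in this format, a minimal pandas sketch looks like the following; the class names, filenames, and output path are illustrative and should be adapted to your dataset.

```python
# Minimal sketch of building a demo/test dataframe in the expected format:
# image filenames as the index, one-hot class labels as the columns.
import pandas as pd

demo = pd.DataFrame(
    {
        "Forest": [1, 0, 0],
        "Golf course": [0, 1, 0],
        "Freeway": [0, 0, 1],
    },
    index=["forest39.jpeg", "golfcourse53.jpeg", "freeway97.jpeg"],
)
demo.index.name = "Index"
# Same layout applies to test.csv; the target directory must already exist.
demo.to_csv("ManyICL/dataset/UCMerced/demo.csv")
```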
```
ManyICL/
├── LMM.py
├── dataset
│   └── UCMerced
│       ├── demo.csv
│       ├── test.csv
│       └── images
│           ├── forest39.jpeg
│           ├── forest47.jpeg
│           ├── freeway09.jpeg
│           ├── freeway97.jpeg
│           └── ...
├── prompt.py
└── run.py
```
Modify the prompt in `prompt.py` if needed.
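As a purely hypothetical example of the kind of edit you might make there (the actual template structure and variable names in `prompt.py` will differ):

```python
# Hypothetical prompt template -- illustrative only; see prompt.py for the
# template the repo actually uses.
PROMPT_TEMPLATE = (
    "Given the demonstration images above, classify the final image into "
    "exactly one of the following categories: {class_names}. "
    "Answer with the category name only."
)
```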
Run the experiment script; it will save all the raw responses to `UCMerced_21shot_Gemini1.5_7.pkl` (21 shots = 1 shot per class × 21 classes in UCMerced).

```bash
python3 ManyICL/run.py --dataset=UCMerced --num_shot_per_class=1 --num_qns_per_round=7
```
Run the evaluation script; it reads the raw responses and prints the accuracy score.

```bash
python3 ManyICL/eval.py --dataset=UCMerced --num_shot_per_class=1 --num_qns_per_round=7
```
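To inspect the saved raw responses yourself, you can load the pickle directly; the exact structure of the object is determined by `run.py`, so print a small sample to explore it.

```python
# Peek at the saved raw responses; the filename follows the
# {dataset}_{total_shots}shot_{model}_{num_qns_per_round}.pkl pattern above.
import pickle

with open("UCMerced_21shot_Gemini1.5_7.pkl", "rb") as f:
    raw_responses = pickle.load(f)

print(type(raw_responses))  # explore the structure from here
```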
If you find our work useful in your research, please consider citing:
```bibtex
@misc{jiang2024manyshot,
      title={Many-Shot In-Context Learning in Multimodal Foundation Models},
      author={Yixing Jiang and Jeremy Irvin and Ji Hun Wang and Muhammad Ahmed Chaudhry and Jonathan H. Chen and Andrew Y. Ng},
      year={2024},
      eprint={2405.09798},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
We thank Dr. Jeff Dean, Yuhui Zhang, Dr. Mutallip Anwar, Kefan Dong, Rishi Bommasani, Ravi B. Sojitra, Chen Shani and Annie Chen for their feedback on the ideas and manuscript.