This is the repository for CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features and Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction.
Links to papers: CSA (https://openreview.net/forum?id=6Mg7pjG7Sw) and Any2Any (https://arxiv.org/abs/2411.10513)
Link to blog: CSA
Canonical Similarity Analysis (CSA) matches CLIP in multimodal tasks with far less data, mapping unimodal features into a multimodal space without extensive GPU training.
Any2Any retrieves effectively from incomplete multimodal data, achieving 35% Recall@5 on the KITTI dataset, on par with baseline models.
- Prerequisites
- Adding more datasets
- Reproducing the results
- Core code of CSA
- Core code of Any2Any
- Disclaimer
- Citation
To run the code, install the packages using Poetry:
poetry lock && poetry install
Alternatively, you can install the packages with pip (see pyproject.toml for the dependency list).
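If you go the pip route, installing from the repository root is the usual approach for a pyproject.toml-based project (this command is a common convention, not taken from the original instructions):
pip install .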
To add more datasets, add the dataset configs to configs/main.yaml and fill in the code wherever a "TODO: add more dataset" comment appears; a hypothetical config entry is sketched below.
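For illustration only, a new dataset entry might look like the following. The key names here (name, img_path, text_path) are made up; mirror the structure of the existing entries in configs/main.yaml instead of copying this verbatim:

```yaml
# configs/main.yaml (hypothetical entry; match the keys of existing datasets)
my_dataset:
  name: my_dataset                      # identifier used by the experiment scripts
  img_path: /data/my_dataset/images     # path to the first modality's data
  text_path: /data/my_dataset/texts     # path to the second modality's data
```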
To reproduce the results, download the datasets and update their corresponding paths in the configs/main.yaml file.
Then, run the following command for the experiment you want to reproduce (each experiment is described at the top of its .py file):
poetry run python mmda/<experiment>.py
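For example, to run the Any2Any conformal retrieval experiment (this script is listed below as part of the Any2Any core code):
poetry run python mmda/any2any_conformal_retrieval.py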
For the core code of CSA, see class NormalizedCCA in mmda/utils/cca_class.py and the function weighted_corr_sim in mmda/utils/sim_utils.py.
A notebook demonstrating the usage of the core CSA code is available at mmda/csa_example.ipynb.
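As a rough illustration of how these pieces fit together: fit CCA on paired unimodal features to obtain projections into a shared space, then score pairs with the correlation-weighted similarity. The two imports point at real files named in this README, but the constructor, method name, and signatures below are hypothetical sketches, not the confirmed API; see mmda/csa_example.ipynb for the authoritative usage.

```python
import numpy as np

# Real modules per this README; the signatures used below are assumptions.
from mmda.utils.cca_class import NormalizedCCA
from mmda.utils.sim_utils import weighted_corr_sim

# Toy stand-ins for N paired unimodal features (e.g., image and text encoders).
img_feats = np.random.randn(500, 768)  # N x d1
txt_feats = np.random.randn(500, 384)  # N x d2

cca = NormalizedCCA()                               # hypothetical constructor
z_img, z_txt, corr = cca.fit_transform_train_data(  # hypothetical method name
    img_feats, txt_feats
)

# CSA scores a pair via a correlation-weighted similarity in the shared space.
score = weighted_corr_sim(z_img, z_txt, corr)       # hypothetical signature
```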
For the core code of Any2Any, see mmda/any2any_conformal_retrieval.py and mmda/exps/any2any_retrieval.py. The configs for KITTI (image-to-LiDAR retrieval) are in mmda/utils/liploc_model.py, which is modified from the Lip-loc repository.
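For intuition, the conformal-prediction step in Any2Any-style retrieval can be sketched as follows: calibrate a nonconformity threshold on held-out scores of known-correct pairs, then use it to decide whether a (possibly incomplete) modality pair yields a trustworthy score. This is an illustrative sketch of split conformal calibration, not the repo's implementation; every name below is made up:

```python
import numpy as np

def calibrate_threshold(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Split conformal calibration (generic sketch, not the repo's code).

    cal_scores are nonconformity scores (e.g., 1 - cosine similarity) of
    known-correct pairs from a held-out calibration set. Returns a threshold
    tau such that, with probability >= 1 - alpha, a correct match's
    nonconformity score falls below tau.
    """
    n = len(cal_scores)
    # Standard conformal quantile with the finite-sample (n + 1) correction.
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return float(np.quantile(cal_scores, min(q, 1.0)))

# Toy usage: calibrate on 200 held-out correct-pair scores, then test a query.
rng = np.random.default_rng(0)
cal = rng.beta(2, 8, size=200)  # stand-in nonconformity scores
tau = calibrate_threshold(cal, alpha=0.1)
query_score = 0.12
print("trustworthy match" if query_score <= tau else "abstain / fall back")
```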
Some of the code is modified from the ASIF and Lip-loc repositories and leverages datasets and models from Hugging Face. We tried our best to cite the sources of all code, models, and datasets used. If we missed any, please let us know.
If you find this repo useful for your research, please consider citing our papers:
@inproceedings{li2025csa,
  title={{CSA}: Data-efficient Mapping of Unimodal Features to Multimodal Features},
  author={Po-han Li and Sandeep P. Chinchali and Ufuk Topcu},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=6Mg7pjG7Sw}
}
@misc{li2024any2anyincompletemultimodalretrieval,
  title={Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction},
  author={Po-han Li and Yunhao Yang and Mohammad Omama and Sandeep Chinchali and Ufuk Topcu},
  year={2024},
  eprint={2411.10513},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.10513},
}