
Multi-modal Data Alignment (MMDA)

This is the repository for CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features and Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction.

Links to papers: CSA and Any2Any

Link to blog: CSA

TL;DR

Canonical Similarity Analysis (CSA) matches CLIP in multimodal tasks with far less data, mapping unimodal features into a multimodal space without extensive GPU training.

Any2Any effectively retrieves from incomplete multimodal data, achieving 35% Recall@5 on the KITTI dataset, on par with baseline models that operate on complete modalities.


Prerequisites

To run the code, you need to install the packages using poetry:

poetry lock && poetry install

Or, you can install the packages using pip (check the pyproject.toml for the dependencies).

Adding more datasets

To add a new dataset, add its configs in configs/main.yaml and fill in the code wherever a "TODO: add more dataset" comment appears (a sketch of this pattern follows the command below). To reproduce the results, download the datasets and change their corresponding paths in configs/main.yaml. Then, run the following command for the experiment you want to run (see the description of each experiment at the head of its .py file):

poetry run python mmda/<experiment>.py
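
A hypothetical sketch of the pattern behind the "TODO: add more dataset" comments, assuming loaders that return paired per-modality feature arrays; the function names and config fields below are illustrative, not the repo's actual API:

# Illustrative dispatch only; the real code lives wherever the
# "TODO: add more dataset" comments appear.
def load_my_dataset(cfg):
    """Your new loader: return paired features, one array per modality."""
    raise NotImplementedError

def load_dataset(cfg):
    if cfg["dataset"] == "my_dataset":  # added alongside the configs/main.yaml entry
        return load_my_dataset(cfg)
    # TODO: add more dataset
    raise ValueError(f"Unknown dataset: {cfg['dataset']}")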

CSA: System plot and major results

(Figures: CSA system diagram; results on COSMOS; results on ImageNet.)

Reproducing the results

Core code of CSA

To see the core code of CSA, see the class NormalizedCCA in mmda/utils/cca_class.py and the function weighted_corr_sim in mmda/utils/sim_utils.py. A notebook demonstrating the usage of CSA's core code is in mmda/csa_example.ipynb.
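
For intuition, here is a self-contained sketch of the idea behind CSA, written with scikit-learn's generic CCA rather than the repo's NormalizedCCA: project paired unimodal features into a canonical space, then score a pair by an inner product weighted by the per-dimension canonical correlations. It illustrates the technique only; see the notebook for the actual API.

import numpy as np
from sklearn.cross_decomposition import CCA

# Synthetic paired features sharing a latent structure.
rng = np.random.default_rng(0)
shared = rng.standard_normal((500, 32))
img = shared @ rng.standard_normal((32, 128)) + 0.1 * rng.standard_normal((500, 128))
txt = shared @ rng.standard_normal((32, 64)) + 0.1 * rng.standard_normal((500, 64))

# Project both modalities into the shared canonical space.
cca = CCA(n_components=16).fit(img, txt)
u, v = cca.transform(img, txt)

# Canonical correlations serve as per-dimension similarity weights.
w = np.array([np.corrcoef(u[:, i], v[:, i])[0, 1] for i in range(u.shape[1])])

def csa_like_sim(ui, vj):
    """Correlation-weighted similarity between one image and one text."""
    ui = ui / np.linalg.norm(ui)
    vj = vj / np.linalg.norm(vj)
    return float(np.sum(w * ui * vj))

print(csa_like_sim(u[0], v[0]))  # paired sample: high similarity
print(csa_like_sim(u[0], v[1]))  # mismatched pair: lower similarity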

Core code of Any2Any

To see the core code of Any2Any, see mmda/any2any_conformal_retrieval.py and mmda/exps/any2any_retrieval.py. The configs for KITTI (image-to-LiDAR retrieval) are in mmda/utils/liploc_model.py, which is modified from the Lip-loc repository.
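
As intuition for the conformal step, here is a generic split-conformal calibration sketch for retrieval; it illustrates the technique only, and the similarity model and variable names are assumptions, not the repo's implementation.

import numpy as np

rng = np.random.default_rng(1)
n_cal = 200
true_sims = rng.beta(8, 2, n_cal)  # similarities of correct pairs (calibration set)

# Standard split-conformal threshold: nonconformity = 1 - similarity.
alpha = 0.1  # target miscoverage
scores = 1.0 - true_sims
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
threshold = 1.0 - np.sort(scores)[k - 1]

def prediction_set(query_sims):
    """Indices of gallery items whose similarity clears the threshold."""
    return np.flatnonzero(query_sims >= threshold)

gallery_sims = rng.beta(2, 5, 1000)  # one query scored against the gallery
print(prediction_set(gallery_sims)[:10])

Under exchangeability of calibration and test pairs, each prediction set contains the true match with probability at least 1 - alpha, which is the coverage guarantee that conformal retrieval builds on.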

Disclaimer

Some of the code is modified from the ASIF and Lip-loc repositories, and we leverage datasets and models from Hugging Face. We have tried our best to cite the sources of all code, models, and datasets used. If we missed any, please let us know.

Citation

If you find this repo useful for your research, please consider citing our papers:

@inproceedings{li2025csa,
    title={{CSA}: Data-efficient Mapping of Unimodal Features to Multimodal Features},
    author={Po-han Li and Sandeep P. Chinchali and Ufuk Topcu},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=6Mg7pjG7Sw}
}

@misc{li2024any2anyincompletemultimodalretrieval,
      title={Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction}, 
      author={Po-han Li and Yunhao Yang and Mohammad Omama and Sandeep Chinchali and Ufuk Topcu},
      year={2024},
      eprint={2411.10513},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.10513}, 
}
