iburenko/multimodal-reading-group

multimodal-reading-group


| Date | Paper | Authors | Code | Demo | Comments |
|---|---|---|---|---|---|
| 01.02.2024 | Visual Instruction Tuning | H. Liu, C. Li, Q. Wu, Y. J. Lee | GitHub, Project Page | Demo | |
| 08.02.2024 | When and why vision-language models behave like bags-of-words, and what to do about it? | M. Yuksekgonul, F. Bianchi, P. Kalluri, D. Jurafsky, J. Zou | https://github.com/mertyg/vision-language-models-are-bows | Colab | Why did they expect CLIP to take word order into account, given that CLIP is trained to match a bag of words with the corresponding image? |
| 22.02.2024 | Learning Transferable Visual Models From Natural Language Supervision | A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever | GitHub, Project Page | Colab | See also: open source implementation of CLIP; Scaling laws for contrastive language-image learning |
| 29.02.2024 | Continue | | | | Fig. 2 is unclear. How do they obtain a vector for a bag-of-words? |
| 07.03.2024 | Still (sic!) continue | | | | It seems that they train using BoW, even though their inference pipeline does not reflect this. |
| 14.03.2024 | Sigmoid Loss for Language Image Pre-Training | X. Zhai, B. Mustafa, A. Kolesnikov, L. Beyer | HuggingFace | | |
| 21.03.2024 | Continue | | | | |
| 28.03.2024 | Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning | W. Liang, Y. Zhang, Y. Kwon, S. Yeung, J. Zou | GitHub, Project Page | | |
| 04.04.2024 | What Makes Training Multi-modal Classification Networks Hard? | Wang, Tran, Feiszli | | | |
| 11.04.2024 | MultiBench: Multiscale Benchmarks for Multimodal Representation Learning | Liang, Lyu, Fan, Wu, Cheng, Wu, Chen, Wu, Lee, Zhu, Salakhutdinov, Morency | GitHub, Project Page | Demos | |
| 16.04.2024 | Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Tong, Liu, Zhai, Ma, LeCun, Xie | GitHub, Project Page | HuggingFace | |
| 23.04.2024 | Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies | Li, Xie, Cubuk | | | |
| 30.04.2024 | Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | Lu, Peng, Cheng, Galley, Chang, Wu, Zhu, Gao | GitHub, Project Page | | |
| 07.05.2024 | Many-Shot In-Context Learning | Agarwal, Singh, Zhang, Bohnet, Chan, Anand, Abbas, Nova, Co-Reyes, Chu, Behbahani, Faust, Larochelle | Not Provided | | |
| 28.05.2024 | BABILong: a long-context needle-in-a-haystack benchmark for LLMs | Kuratov, Bulatov, Anokhin, Sorokin, Sorokin, Burtsev | GitHub | | |
| 04.06.2024 | Continue | | | | |
| 11.06.2024 | 4M: Massively Multimodal Masked Modeling | Mizrahi, Bachmann, Kar, Yeo, Gao, Dehghan, Zamir | GitHub, Project Page | | |
| 18.06.2024 | Continue | | | | |
| 25.06.2024 | GLaMM: Pixel Grounding Large Multimodal Model | Rasheed, Maaz, Shaji, Shaker, Khan, Cholakkal, Anwer, Xing, Yang, Khan | GitHub, Project Page | Demo | |
| 02.07.2024 | Code Reading Group | | | | |
| 09.07.2024 | Knowledge Distillation: Gemma 2 (pdf), MobileLLM, Knowledge distillation, On-Policy distillation of Language Models | | | | |
| 16.07.2024 | Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Menon, Zemel, Vondrick | Project Page | | |
| 23.07.2024 | Multimodal Neurons in Artificial Neural Networks | Goh, Cammarata, Voss, Carter, Petrov, Schubert, Radford, Olah | | | |
| 30.07.2024 | Continue + (very briefly) CLIPPO | | | | |
| 06.08.2024 | Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think! | Hessel, Lee | | | |
| 13.08.2024 | Graph of Thoughts and Monte Carlo Tree Search: Monte Carlo Tree Search from Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B; Graph of Thoughts; Large Language Monkeys; STaR: Self-Taught Reasoner; Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents; Bonus! DeepSeek-Prover-V1.5 | | Tinygrad example of MCTS | | |
| 15.10.2024 | | | | | |
| 22.10.2024 | Calibrating Multimodal Learning | Ma, Zhang, Wu, Fu, Hu | | | |
| 08.11.2024 | Towards Mamba: the S4 model and related topics: HiPPO, S4 paper, Annotated S4 blog post | Gu, Goel, Ré | GitHub | | |
| 15.11.2024 | Continue: HiPPO and S4 | | | | |
| 22.11.2024 | Continue: Mamba & differences between Transformers and SSMs | | | | |

Datasets and benchmarks
Surveys
Representation Learning
Latent Space Structure
Fusion
Modality Competition. Quantitative Methods of Detection of Suboptimality.
