Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"
Updated Jan 9, 2025 · Python
🚀 ReVisual-R1 is a 7B open-source multimodal language model trained with a three-stage curriculum (cold-start pre-training, multimodal reinforcement learning, and text-only reinforcement learning) to deliver faithful, concise, and self-reflective state-of-the-art visual and textual reasoning.
[CVPR 2025] Code for "Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering".