Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"
Updated Jan 9, 2025 · Python
🚀 ReVisual-R1 is a 7B open-source multimodal language model trained with a three-stage curriculum (cold-start pre-training, multimodal reinforcement learning, and text-only reinforcement learning) to deliver faithful, concise, and self-reflective state-of-the-art visual and textual reasoning.
[CVPR 2025] Code for "Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering".