- 🌱 I’m currently an AI Research Resident at FPT Software AI Center (AIC) and a former AI Engineer at Data & AI Lab (DAL), VNG Corporation.
- (Large) Multimodal Model Reasoning & Understanding: (Large) Multimodal Models (LMMs), Image/Video Understanding, Vision-Language Compositionality, Structured Representation.
- Efficient (Large) Multimodal Models: Parameter-Efficient Fine-Tuning (PEFT) (Efficient Training), Distilled/Small Models (Efficient Inference), Token Merging/Pruning (Efficient Input).
- (Large) Multimodal Model Generation: Multimodal Chatbot, Visual Programming, Embodied Agent.

My research experience spans Intelligent Surveillance Systems, Image/Video Understanding, Multimodal Learning, and PEFT, including:
- [2023-Present] Efficient Cross-Modal Learning & Understanding: Video-Language Matching, Parameter-Efficient Fine-Tuning (PEFT), Multimodal Compositionality, Structured Representation (Scene Graph Generation).
- [2021-2023] Intelligent Industrial/Traffic Systems Applications: Tracked-Vehicle to Video Retrieval, Person/Vehicle Re-Identification, Person/Vehicle Tracking, Face Recognition/Verification.