Developed by Nitin Tiwari, Sagar Malhotra and Savio Rodrigues.
This repository demonstrates how to run inference with the PaliGemma vision-language model on Android through the Hugging Face Gradio Client API, for tasks such as zero-shot object detection, image captioning, and visual question answering.
Visual question-answering, zero-shot object detection, image captioning
Referring Expression Segmentation
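
With PaliGemma, the task is usually selected by a short text prefix sent alongside the image. The sketch below shows one way an Android client might build those prompts before sending them to the hosted model. The class and method names are hypothetical and not taken from this repository. The prefixes (`caption en`, `answer en`, `detect`, `segment`) follow the conventions documented for the PaliGemma mix checkpoints, so check them against the model you actually call.

```java
/**
 * Hypothetical helper (not part of this repository) that builds
 * PaliGemma-style task prompts. The prefixes are assumptions based on
 * the conventions documented for the PaliGemma "mix" checkpoints.
 */
public final class PaliGemmaPrompts {
    private PaliGemmaPrompts() {}

    /** Image captioning in English. */
    public static String caption() {
        return "caption en";
    }

    /** Visual question answering in English. */
    public static String answer(String question) {
        return "answer en " + question.trim();
    }

    /** Zero-shot object detection; multiple labels are joined with " ; ". */
    public static String detect(String... labels) {
        return "detect " + String.join(" ; ", labels);
    }

    /** Referring expression segmentation for a single expression. */
    public static String segment(String expression) {
        return "segment " + expression.trim();
    }
}
```

For example, `PaliGemmaPrompts.detect("cat", "dog")` returns `detect cat ; dog`, which would be sent to the model together with the image.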