VideoGemma

This repository contains the code for the VideoGemma multimodal language model.

VideoGemma combines the LanguageBind video encoder with the performant, flexible Gemma LLM in a LLaVA-style architecture.
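Concretely, the LLaVA recipe places a small projector between the video encoder and the LLM, and prepends the projected video tokens to the text embeddings before they enter the language model. Below is a minimal PyTorch sketch of that wiring; the class name, attribute names, and sizes are illustrative, not the repository's actual code.

    import torch
    import torch.nn as nn

    class VideoProjector(nn.Module):
        """LLaVA-style MLP mapping video features into the LLM embedding space.

        Illustrative sketch only; see the repository code for the real module.
        """
        def __init__(self, video_dim: int, llm_dim: int):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(video_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, llm_dim),
            )

        def forward(self, video_features: torch.Tensor) -> torch.Tensor:
            # video_features: (batch, num_video_tokens, video_dim) from LanguageBind
            return self.mlp(video_features)

    # The projected video tokens are prepended to the text token embeddings
    # before they enter Gemma, e.g.:
    # inputs = torch.cat([projector(video_features), text_embeddings], dim=1)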

Getting started

We recommend using Dev Containers to create the environment.

I don't want a container

  1. Install PyTorch.

  2. Install the Python dependencies:

    pip3 install -r requirements.txt
    pip3 install git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d

  3. For checkpoint loading and model configuration, see run_finetune.ipynb. A loading sketch follows this list.
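For orientation only, here is a minimal sketch of what loading and running the model might look like. Every name below (the videogemma module, VideoGemmaForCausalLM, load_video, generate_from_video) is a hypothetical placeholder, not the repository's actual API; run_finetune.ipynb is the authoritative reference.

    # Hypothetical sketch: these imports and method names are placeholders,
    # not the repository's real API. See run_finetune.ipynb for actual usage.
    import torch
    from videogemma import VideoGemmaForCausalLM, load_video  # hypothetical

    model = VideoGemmaForCausalLM.from_pretrained("path/to/checkpoint")  # hypothetical
    model.to("cuda" if torch.cuda.is_available() else "cpu").eval()

    frames = load_video("clip.mp4", num_frames=8)  # hypothetical helper
    with torch.no_grad():
        print(model.generate_from_video(frames, prompt="Describe the video."))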

Pretrained checkpoints

A pretrained checkpoint for the model can be found here: HF 🤗.

  • The model's projector has been pretrained for 1 epoch on the Valley dataset.
  • The LLM and the projector have then been jointly fine-tuned on the Video-ChatGPT dataset (see the sketch after this list).
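This matches the usual two-stage LLaVA recipe. Below is a minimal sketch of the stage-dependent parameter freezing, assuming hypothetical projector, llm, and video_encoder attributes, and assuming (as in LLaVA) that the video encoder stays frozen throughout; the repository's training code is authoritative.

    # Hypothetical attribute names; the real configuration lives in the repo.
    def configure_trainable_params(model, stage: str) -> None:
        if stage == "pretrain_projector":
            # Stage 1: only the projector learns (1 epoch on Valley).
            for p in model.parameters():
                p.requires_grad = False
            for p in model.projector.parameters():
                p.requires_grad = True
        elif stage == "finetune_joint":
            # Stage 2: LLM and projector fine-tune jointly on Video-ChatGPT;
            # the video encoder is assumed frozen, as in LLaVA.
            for p in model.parameters():
                p.requires_grad = True
            for p in model.video_encoder.parameters():
                p.requires_grad = False
        else:
            raise ValueError(f"unknown stage: {stage}")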
