Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" (ICLR 2025) 🎉
[2025.1] 🔥 Release repo and test code.
[2025.2] 🔥 Release StreamBench.
1. Video agent with training-free and decoupled architecture.
2. Multi-round interaction with memory-enhanced knowledge during inference.
3. Faster video processing speed.
Selective Frame Stacking: reduce the redundant video frame feature storage.
Memory Formation: update memory and retrieve the related information as in-context.
Contextual Summarization: reorganize in-context as prompt for MLLM.
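As a rough illustration, the Selective Frame Stacking step above can be thought of as storing a new frame feature only when it differs enough from what is already kept. The sketch below is a minimal, assumed version of that idea; the cosine-similarity rule, threshold, and feature shapes are illustrative and not StreamChat's exact procedure.

```python
import torch
import torch.nn.functional as F

def stack_frame_features(frame_features, sim_threshold=0.9):
    """Keep a frame feature only if it differs enough from the last stored one
    (illustrative sketch, not StreamChat's exact rule)."""
    bank = []
    for feat in frame_features:                       # feat: 1-D visual feature of one frame
        if bank and F.cosine_similarity(feat, bank[-1], dim=0) >= sim_threshold:
            continue                                  # near-duplicate frame -> skip storing it
        bank.append(feat)
    return torch.stack(bank)                          # compact feature bank used for memory formation
```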
StreamBench is designed to evaluate model performance on online (streaming) videos.
It covers 4 key domains and 16 sub-class video types.
The videos cover a broad range of lengths, spread evenly across 6 duration categories.
It contains 6 kinds of questions (Object Search, Long-term Memory Search, Short-term Memory Search, Conversational Interaction, Knowledge-based Question Answering, and Simple Factual) to provide more comprehensive evaluation results.
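For per-category results, the annotation file can be grouped by question type. The field names below (e.g. "question_type") are assumptions about the JSON schema, so check streaming_bench_v0.3.json before relying on them.

```python
import json
from collections import defaultdict

# Field names here are assumed -- inspect streaming_bench_v0.3.json for the real schema.
with open("StreamBench_v0.3/streaming_bench_v0.3.json") as f:
    annotations = json.load(f)

by_type = defaultdict(list)
for item in annotations:
    by_type[item["question_type"]].append(item)       # e.g. "Object Search", "Simple Factual"

for q_type, items in sorted(by_type.items()):
    print(f"{q_type}: {len(items)} questions")
```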
You need at least 2x 80 GB GPUs to run inference.
Apologies for the messy code; we are working on cleaning it up.
Download StreamBench.
StreamBench_v0.3
├── Ego
│   └── all_videos
├── WebvVideo
├── Movie
└── streaming_bench_v0.3.json
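A quick way to confirm the download matches the layout above (the root path is wherever you extracted the benchmark):

```python
import os

root = "StreamBench_v0.3"  # adjust to your download location
for rel in ["Ego/all_videos", "WebvVideo", "Movie", "streaming_bench_v0.3.json"]:
    status = "ok" if os.path.exists(os.path.join(root, rel)) else "missing"
    print(f"[{status}] {rel}")
```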
Download the LLaMA 3, LongVA, and embedding model weights.
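One option is to fetch the checkpoints with huggingface_hub. The repo IDs below are the commonly used Hugging Face releases and are assumptions on our part, so swap in the exact checkpoints you intend to evaluate (LLaMA 3 is gated and requires access approval).

```python
from huggingface_hub import snapshot_download

# Assumed repo IDs -- replace with the exact checkpoints you plan to use.
checkpoints = {
    "meta-llama/Meta-Llama-3-8B-Instruct":    "weights/llama3",        # LLaMA 3 (gated)
    "lmms-lab/LongVA-7B":                     "weights/longva",
    "sentence-transformers/all-MiniLM-L6-v2": "weights/minilm-l6",
    "mixedbread-ai/mxbai-colbert-large-v1":   "weights/mxbai-colbert-large-v1",
}
for repo_id, local_dir in checkpoints.items():
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
```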
git clone https://github.com/hmxiong/StreamChat.git
cd StreamChat
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# change model setting
Change the 'embedding_model_dict -> minilm-l6' path in memory_bank/memory_retrieval/configs/model_config.py.
Change the 'embedding_model_id' in inference_streaming_longva_v2.py to the save path of the mxbai-colbert-large-v1 model.
Change the LLaMA 3 and LongVA model save paths in inference_streamchat_v0.3.sh.
All settings that need to be changed are marked with 'Your_xxxxx'.
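For reference, these edits boil down to pointing the marked entries at your local weight directories. The snippet below is a sketch of what they might look like after editing; the exact layout of model_config.py may differ.

```python
# memory_bank/memory_retrieval/configs/model_config.py (sketch; actual layout may differ)
embedding_model_dict = {
    "minilm-l6": "/path/to/all-MiniLM-L6-v2",            # was 'Your_xxxxx'
}

# inference_streaming_longva_v2.py
embedding_model_id = "/path/to/mxbai-colbert-large-v1"    # was 'Your_xxxxx'
```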
# run script
bash inference_streamchat_v0.3.sh
You can adjust the parameters in the script; a full run takes about 28 hours to produce results.
TODO:
- Test code.
- Data for StreamBench.
- Online Demo.
- Single GPU inference.
- Support for more models.
If you find this work helpful for your research, please consider citing our work.
@misc{xiong2025streamingvideounderstandingmultiround,
title={Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge},
author={Haomiao Xiong and Zongxin Yang and Jiazuo Yu and Yunzhi Zhuge and Lu Zhang and Jiawen Zhu and Huchuan Lu},
year={2025},
eprint={2501.13468},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.13468},
}
StreamChat is built upon the following outstanding works: LongVA, LLaVA-NeXT, ChatUnivi, InternVL, MemoryBank, FreeVA, LLaVA-VID, Flash-VStream, Video-online. Thanks!