SMC++: Masked Learning of Unsupervised Video Semantic Compression

PyTorch Implementation of our paper:

SMC++: Masked Learning of Unsupervised Video Semantic Compression

Yuan Tian, Guo Lu, and Guangtao Zhai [ArXiv]

This paper is an extended version of our work originally presented at ICCV 2023 "Non-Semantics Suppressed Mask Learning for Unsupervised Video SemanticCompression".

Main Contribution

Most video compression methods focus on human visual perception, neglecting semantic preservation. This leads to severe semantic loss during the compression, hampering downstream video analysis tasks. In this paper, we propose a Masked Video Modeling (MVM)-powered compression framework that particularly preserves video semantics, by jointly mining and compressing the semantics in a self-supervised manner. While MVM is proficient at learning generalizable semantics through the masked patch prediction task, it may also encode non-semantic information like trivial textural details, wasting bitcost and bringing semantic noises. To suppress this, we explicitly regularize the non-semantic entropy of the compressed video in the MVM token space. The proposed framework is instantiated as a simple Semantic-Mining-then-Compression (SMC) model. Furthermore, we extend SMC as an advanced SMC++ model from several aspects. First, we equip it with a masked motion prediction objective, leading to better temporal semantic learning ability. Second, we introduce a Transformer-based compression module, to improve the semantic compression efficacy. Considering that directly mining the complex redundancy among heterogeneous features in different coding stages is non-trivial, we introduce a compact blueprint semantic representation to align these features into a similar form, fully unleashing the power of the Transformer-based compression module. Extensive results demonstrate the proposed SMC and SMC++ models show remarkable superiority over previous methods.

Environment Setup

Create conda env
```
conda env create -f smc_env.yml
```
Install VVenC 1.5.0

Download the binary VVenC library (For ubuntu only, if you use other Linux version, please build VVenC from the source code)
```
Link: https://pan.baidu.com/s/1wm92tY27QJW4lm8d3Dxz_A?pwd=tzkf 
Password: tzkf 
```
Put the VVenC folder under this project

Pre-trained SMC++ Model

Download the pre-trained SMC++ model, and put it under the folder "pretrain_models/coding"
```
Link: https://pan.baidu.com/s/1vk6bRpj5G3ffeg7MwIdpIA?pwd=ekuh 
Password: ekuh 
```

Video Object Segmentation (VOS) Task

In this section, we adopt the XMEM approach as the VOS task evaluator.

Download the official XMEM models, XMem.pth' and XMem-s012.pth', from the link, and save them to the folder "XMem/saves".
```
Link: https://pan.baidu.com/s/1pPUunMOmRn5LoOaW6z641g?pwd=isol 
Password: isol 
```
Download our re-organized DAVIS2017 dataset, and put the file "DAVIS_raw_480.tar.gz" under the folder "XMem/dataset_space"
```
Link: https://pan.baidu.com/s/1O0HD1bldq1EmXVJUPp2n8Q?pwd=r17g 
Password: r17g  
```
Note: For this datasete, we download the original 1080p frames, and down-sample the frames into 480p. Compared to the officially released 480p frames, our re-organized version is with less artifacts, thus more suitable for evaluating different compression methods.

De-compress the DAVIS2017 dataset

cd XMem/dataset_space
tar zxvf DAVIS_raw_480.tar.gz

Apply semantic compression to the DAVIS2017 dataset

python VideoDatasetList/Davis2017/transcode_scripts_smcplus_GOP10.py

Evaluate the XMEM model on the compressed DAVIS2017 dataset Chang the "davis_path " option within eval_results/eval_smcplus.py to your "...DAVIS_raw_480/trainval" path.
```
cd XMem/
python eval_results/eval_smcplus.py
```
The evaluation result for VOS task will be saved in "val_results/logs_evalseg".

Video Action Recognition Task

We re-use the codes of mmaction2 to inherit their rich pre-trained action models. Configuring mmaction2 is a bit tedious, please be patient.

Install mmcv and mmaction2

cd mmcv_1_6_0
pip install -e . 

cd mmaction2_v0_20_0
python setup.py develop

Download all pre-trained action models and put them into the folder `pretrain_models/action'.

Link: https://pan.baidu.com/s/1ixi5AYd0YQyQQQlnHFZtkw?pwd=ri6q 
Password: ri6q

Dataset-Action Model Pair	Model file name
timesformer-hmdb51	timesformer_hmdb51_best_top1_acc_epoch_5.pth
timesformer_ucf101	timesformer_ucf101_best_top1_acc_epoch_8.pth
tsm_hmdb51	tsm_hmdb51_best_top1_acc_epoch_10.pth
slowfast_hmdb51	slowfast_hmdb51_epoch_33.pth
slowfast_ucf101	slowfast_ucf101_epoch_40.pth
slowfast_k400	slowfast_k400_slowfast_r50_4x16x1_256e_kinetics400_rgb_20210722-04e43ed4.pth
tsm_ucf101	tsm_ucf101_tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb_20210630-1fae312b.pth
tsm_k400	tsm_k400_tsm_r50_1x1x8_50e_kinetics400_rgb_20200607-af7fb746.pth
tsm_diving48	tsm_diving48_tsm_r50_video_1x1x8_50e_diving48_rgb_20210426-aba5aa3d.pth

Video action dataset preparation

We provide our well-organized HMDB51/UCF101 Dataset for quick start-up. For other large-scale datasets like Kinetic400 and Diving48, please download them from the official weblink.
- HMDB51:
```
Link: https://pan.baidu.com/s/1Yvv4YSGlmHdDRqaEH7ijjg?pwd=qo62 
Password: qo62 
```
Put the downloaded "hmdb51_extracted.tar.gz" under the folder "VideoDatasetList",and decompress it.
```
tar zxvf hmdb51_extracted.tar.gz
```
- UCF101:
```
Link: https://pan.baidu.com/s/1FXXisNGnAu7Uv9opHaIGrw?pwd=964p 
Password: 964p 
```
Put the downloaded "ucf101_jpg.tar.gz" under the folder "VideoDatasetList",and decompress it.
```
tar zxvf ucf101_jpg.tar.gz
```
Re-produce action results
- Slowfast-UCF101:
Set the data_root/data_root_val/data_root_compress/data_root_val_compress field in the configuration file `mmaction2_v0_20_0/configs/lbvu/slowfast_ucf101.py' to the directory of your extracted UCF101 dataset;
```
cd mmaction2_v0_20_0
python configs/SMC_Plus/test_slowfast_ucf101.py
```
If the GPU memory is insufficient, please reduce the value of the `data.test_dataloader.videos_per_gpu' field.
- TSM- HMDB51: Set the data_root/data_root_val/data_root_compress/data_root_val_compress field in the configuration file `mmaction2_v0_20_0/configs/lbvu/tsm_hmdb.py' to the directory of your extracted HMDB51 dataset;
```
cd mmaction2_v0_20_0
python configs/SMC_Plus/test_tsm_hmdb.py
```
If the GPU memory is insufficient, please reduce the value of the `data.test_dataloader.videos_per_gpu' field.
- Other Action models and datasets:
Please refer to `mmaction2_v0_20_0/configs/SMC_Plus/test_other_action_models.py'.

Other Info

References

This repository is built upon the following projects.

Citation

Please [★star] this repo and [cite] the following papers if you feel our project and codes useful to your research:

@article{tian2024smcplus,
  title={SMC++: Masked Learning of Unsupervised Video Semantic Compression},
  author={Tian, Yuan and Lu, Guo and Zhai, Guangtao},
  journal={arXiv},
  year={2024}
}

@inproceedings{tian2023non,
  title={Non-Semantics Suppressed Mask Learning for Unsupervised Video Semantic Compression},
  author={Tian, Yuan and Lu, Guo and Zhai, Guangtao and Gao, Zhiyong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13610--13622},
  year={2023}
}

Contact

For any questions, please feel free to open an issue or contact:

Yuan Tian: [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
SMC_ICCV2023		SMC_ICCV2023
VideoDatasetList/Davis2017		VideoDatasetList/Davis2017
XMem		XMem
mmaction2_v0_20_0		mmaction2_v0_20_0
mmcv_1_6_0		mmcv_1_6_0
vtm_configs		vtm_configs
.gitignore		.gitignore
README.md		README.md
ReadMe		ReadMe
common.py		common.py
dataset_online_seg.py		dataset_online_seg.py
datasets_video.py		datasets_video.py
main_eval_compress_online_lsgan.py		main_eval_compress_online_lsgan.py
opts.py		opts.py
resnet.py		resnet.py
smc_env.yaml		smc_env.yaml
transforms.py		transforms.py
util.py		util.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SMC++: Masked Learning of Unsupervised Video Semantic Compression

Main Contribution

Environment Setup

Pre-trained SMC++ Model

Video Object Segmentation (VOS) Task

Video Action Recognition Task

Other Info

References

Citation

Contact

About

Releases

Packages

Languages

tianyuan168326/VideoSemanticCompression-Pytorch

Folders and files

Latest commit

History

Repository files navigation

SMC++: Masked Learning of Unsupervised Video Semantic Compression

Main Contribution

Environment Setup

Pre-trained SMC++ Model

Video Object Segmentation (VOS) Task

Video Action Recognition Task

Other Info

References

Citation

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages