This repository contains the code and steps to participate in EPIC-KITCHENS VISOR Semi-Supervised Video Object Segmentation Challenge. It contains the steps to get the baseline results and converting them into the required sumbission format.
To participate and submit to this VISOR Semi-Supervised VOS challenge, register at the VISOR Semi-Supervised VOS Challenge
Please Go to the download section of EPIC-KITCHENS VISOR official webpage to download the whole dataset, providing RGB frames for train/val/test and masks for train/val splits. If you are interested in our data generation pipeline, please also check our VISOR paper.
Since most methods follow DAVIS format, we provide a script to convert our dataset to DAVIS style easily. Please refer to VISOR to DAVIS-like for the needed steps and the conversion script.
Once the dataset is converted to DAVIS-like format, the dataset structure would be as follows:
|- VISOR_2022
|- val_data_mapping.json
|- train_data_mapping.json
|- JPEGImages
|- Annotations
|- ImageSets
|- 2022
|- train.txt
|- val.txt
|- val_unseen.txt
Where val.txt
contains the set of seqeunces that belongs to the validation split and val_unseen.txt
contains the subset of the validation split for the unseen kitchens.
We use the official DAVIS evaluation code in our benchmark, we adjusted the code to include the last frame of each sequence as part of the final score.
Moreover, we used STM as a baseline for this challange, please refer to STM inference and evaluation for the details on how to evaluate on the baseline with the provided pre-trained weights.
If you want to participate and evaluate on our Codalab. Then you can use pngs_to_visor.py
script that would convert the PNGs val prediction into VISOR-like JSON file, the script takes these arguments:
masks_path
: path to where the PNGs(predictions) are stored, by default it's ../predictions
. They path sould follow this structure:
|- masks_path
|- P01_107_seq_xxxx
|- P01_107_frame_xxxxxxxxxx.png
|- PXX_XXX_seq_xxxx
|- PXX_(X)XX_frame_xxxxxxxxxx.png
mapping_file
: the mapping file of your data, this would be saved when you did VISOR to DAVIS-like step. by default it's ../VISOR_2022/val_data_mapping.json
out_json_name
: the file name of the output JSON. by default it's val.json
Once the val.json
, you can compress it by running zip -j val.zip val.json
command.
We invite the highest-performing teams from the validation set to evaluate their methods on the VISOR test set. Since the annotations for the test set are not disclosed, we provide instructions on packaging your code and model into a Docker image. This enables us to evaluate your solution on our servers and provide you with the performance metrics on the test set.
A starter Dockerfile is provided with a basic environment. You are required to build it by customizing it with your specific Python packages and versions. Upon cloning the current repository, you will obtain the following file structure:
|- C6-SemiVOS
|- Dockerfile
|- codes
|- your inference codes and model should be there
|- requirements
|- requirements.txt
By following these instructions, you can build and run the VISOR VOS Docker image, allowing you to evaluate the test set within the container.
-
Prepare your code and requirements:
- Place your model and inference code in the
./codes
directory. - In
./requirements/requirements.txt
file, list all the Python packages required for your code, along with their version. For example:numpy==1.19.5 opencv-python==4.5.3 # Add more packages as needed
- Place your model and inference code in the
-
Update the Dockerfile:
- In the Dockerfile, update the first line to match your desired Python version. For example, if you want to use Python 3.8, modify the line to:
FROM python:3.8-slim-buster
. - The Dockerfile also includes some basic environments such as installing some essential tools such as libgl1-mesa-dev and libglib2.0-0 which are important to run OpenCV in most of the time.
- In the last line of the Dockerfile, put the running command for your inference code (instead of the current dummy one) supposing that VISOR dataset is placed in
/app/data
path in the docker (will be loaded to that path in step 4 when we create the docker container). The output predictions should be stored in/app/data/results
, this is critical since when we want to evaluate, we'll be checking/app/data/results
for evaluating your predictions. This an example inference run:
CMD python Training-Code-of-STM/eval.py -g '0' -s test -y 22 -p /app/codes/davis_weights/coco_lr_fix_skip_0_1_release_resnet50_400000_32_399999.pth -D /app/data/
- IMPORTANT NOTE: Don't forget to run the inference on the test set rather than the validation set. The test sequences would be located in the same structure as the val set. This is what to expect when you put the parameters of the test set:
|- VISOR_2022 |- val_data_mapping.json |- train_data_mapping.json |- JPEGImages |- 480p |- all train,val and test sequences |- Annotations |- 480p |- all train,val and test sequences |- ImageSets |- 2022 |- train.txt |- val.txt |- val_unseen.txt |- test.txt
Where test.txt contains the sequences for the test set
- IMPORTANT NOTE: Don't include any evaluation scripts in the inference code, keep as lightweight as generating output PNGs based on your latest model weights.
- We do the inference and the evaluation on 480p resolution. And we only use the sparse frames provided in the test set.
- In the Dockerfile, update the first line to match your desired Python version. For example, if you want to use Python 3.8, modify the line to:
-
Build the Docker image:
- Open a terminal or command prompt.
- Change your current directory to the provided Docker directory
- Build the Docker image using the following command:
where visor_vos is a sample name for the docker image, you can use any image name.
docker build -t visor_vos .
-
Export the Docker image: Once you have built the image, you can export it to a file using the docker save command. This command saves the image as a tar archive. Specify the repository you want to export. For example:
docker save -o <output_file>.tar visor_vos
-
Upload the generated tar file and send a link to [email protected] and CC [email protected] using your registrered Codalab email. We'll reply to you with the test set performance.
-
[optional] Run the Docker container on the validation set: If you'd like to use verify the docker on the validation dataset as a way to make sure that it's working proberly, it would be as simple as changing the
test.txt
toval.txt
.- Change the last line in the Dockerfile to use the val data instead of test (as you have the GT for the val data). Then rebuild the docker image for the validation data (step 3)
- With the docker image successfully built, you can now run the container using the following command (I assume the image name is visor_vos):
If you figured out any problem, you can enter the docker with an interactive session by:
docker run --gpus all -it visor_vos -v /path/on/host/to/visor/dataset:/app/data
docker run --gpus all -it visor_vos -v /path/on/host/to/visor/dataset:/app/data
- The
-v
mount the data (VISOR root directory) from your machine to the countainer, this would be saved in/app/data
, your code should read it from there. - The
--gpus all
flag ensures that the container can access all available GPUs on the host machine. - The
-it
flag enables an interactive terminal session within the container, allowing you to interact with it. - Once the container is running, you can execute your inference script. Make sure that the necessary data and configuration files are accessible within the container.
- The
- The results would be stored in your local mounted directory, i.e. the path of the provided dataset, then you can evlauate them using evaluation code
As the result you'd have a image tar file to share with us, we're going to use it to run your code as the test set is private.
When use this repo, any of our models or dataset, you need to cite the VISOR paper
@inproceedings{VISOR2022,
title = {EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations},
author = {Darkhalil, Ahmad and Shan, Dandan and Zhu, Bin and Ma, Jian and Kar, Amlan and Higgins, Richard and Fidler, Sanja and Fouhey, David and Damen, Dima},
booktitle = {Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
year = {2022}
}
The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.