
Commit

update readme: extending recaptioning
ermu2001 committed Apr 29, 2024
1 parent 6c81a88 commit da17e15
Showing 2 changed files with 72 additions and 2 deletions.
68 changes: 66 additions & 2 deletions DATA.md
@@ -32,6 +32,70 @@ Also other fantastic repos integrating these benchmarks are helpful in the proce
- [VideoLlava](https://github.com/PKU-YuanGroup/Video-LLaVA/tree/main/videollava)
- [IG-VLM](https://github.com/imagegridworth/IG-VLM/tree/main)

### Recaptioning
#### Inter4k

This is a dataset of 1000 high-resolution video samples. We prepare the data following the instructions from their [official website](https://alexandrosstergiou.github.io/datasets/Inter4K/index.html).

#### Extending Recaptioning
The recaptioning part is designed to be extensible.

The inference script [tasks/eval/recaption/pllava_recaption.py](tasks/eval/recaption/pllava_recaption.py) uses the dataset class [RecaptionDataset](tasks/eval/recaption/__init__.py#L197). The dataset configuration is kept in its data_list_info attribute:
```python
data_list_info = OrderedDict({
    # "Panda70M": OrderedDict(
    #     json_relpath="Panda70M/annotations.json",
    #     prefix="DATAS/Recaption/Panda70M/videos",
    #     data_type="video",
    #     bound=False,
    #     key_rename_map={
    #         # 'caption': 'hint',
    #     },
    #     name_key='video_name',
    #     postfix=('mp4', 'mkv', 'webm'),
    #     recaption_type=RecaptionSample,
    # ),  # has no start & end
    "Inter4K": OrderedDict(
        json_relpath="Inter4K/annotations.json",
        prefix="DATAS/Recaption/Inter4K/60fps/UHD",
        data_type="video",
        bound=False,
        key_rename_map={
            # 'caption': 'hint',
        },
        name_key='video_name',
        postfix=('mp4', 'mkv', 'webm'),
        recaption_type=CaptionSample,
    ),  # has no start & end
})
```
Each entry contains the relative path to an annotation JSON file, which holds a list where every item is a sample to be captioned. For example, Inter4K/annotations.json looks like:
```json
[
{
"video_name": "973"
},
...
]
```
and the directory DATAS/Recaption/Inter4K/60fps/UHD would look like:
```
$ ls DATAS/Recaption/Inter4K/60fps/UHD
1.mp4 134.mp4 170.mp4 ....
```

Since only the raw video is needed for direct captioning, the annotation file only has to contain the name of each video under the "prefix" directory.
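
Because the annotation file is just a list of video names, it can be generated with a few lines of Python. Below is a minimal sketch assuming the Inter4K layout above; the helper script is illustrative and not part of the repo:
```python
# Hypothetical helper (not part of the repo): builds an annotations.json
# for a recaption dataset by listing every video under the prefix directory.
import json
from pathlib import Path

prefix = Path("DATAS/Recaption/Inter4K/60fps/UHD")  # root directory of the videos ("prefix")
postfix = ("mp4", "mkv", "webm")                    # accepted file extensions

samples = [
    {"video_name": path.stem}                       # file name without extension, e.g. "973"
    for path in sorted(prefix.iterdir())
    if path.suffix.lstrip(".").lower() in postfix
]

out_path = Path("DATAS/Recaption/Inter4K/annotations.json")
out_path.write_text(json.dumps(samples, indent=4))
print(f"wrote {len(samples)} samples to {out_path}")
```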

Extending a dataset for captioning consists of the following steps:
1. have all the videos downloaded
2. construct an annotations.json file in the format described above
3. configure the recaption dataset [here](tasks/eval/recaption/__init__.py#L197) (a sketch of such an entry follows this list), where you need to set:
    - json_relpath: the relative path of the annotation file
    - prefix: the root directory of the videos
    - postfix: a list of all file extensions used by these videos
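
For step 3, adding a dataset amounts to adding one more entry to the data_list_info OrderedDict shown earlier. Below is a sketch of such an entry for a hypothetical dataset called "MyClips" (the name and paths are placeholders; the keys mirror the Inter4K entry):
```python
# Hypothetical entry for a new dataset, to be pasted into the
# data_list_info OrderedDict shown above. "MyClips" and its paths are
# placeholders; CaptionSample comes from tasks/eval/recaption/__init__.py.
"MyClips": OrderedDict(
    json_relpath="MyClips/annotations.json",   # relative path of the annotation file
    prefix="DATAS/Recaption/MyClips/videos",   # root directory holding the video files
    data_type="video",
    bound=False,                               # keep False, see the note below
    key_rename_map={},
    name_key='video_name',
    postfix=('mp4', 'mkv', 'webm'),            # file extensions to look for
    recaption_type=CaptionSample,
),
```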

The other options are experimental, so stick with the default settings used for Inter4K. The recommended video length is around 5-20 seconds.

P.S. "bound" is meant to ensure that the video passed to the model contains no scene transitions. This part wasn't tested, so set bound to false and make sure each original video file is a single continuous clip. But always feel free to explore and contribute to PLLaVA!
6 changes: 6 additions & 0 deletions README.md
@@ -340,6 +340,12 @@ bash scripts/gallery.sh

Feel free to use the compare version to compare different models' results, or use the single-gallery version to check out one model's results; they are basically the same. Check out this [script](scripts/gallery.sh) for more details.

#### For Captioning and Recaptioning
Follow the instructions in [DATA.md](DATA.md#extending-recaptioning) to extend the recaptioning data in a few steps.

Feel free to point us to high-quality video datasets; we would proceed to run captioning on them.


# :page_facing_up: Citation

If you find this project useful in your research, please consider citing:
