
Commit

update readme: extending recaptioning
ermu2001 committed Apr 29, 2024
1 parent 6c81a88 commit da17e15
Showing 2 changed files with 72 additions and 2 deletions.
68 changes: 66 additions & 2 deletions DATA.md
@@ -32,6 +32,70 @@ Also other fantastic repos integrating these benchmarks are helpful in the proce
- [VideoLlava](https://github.com/PKU-YuanGroup/Video-LLaVA/tree/main/videollava)
- [IG-VLM](https://github.com/imagegridworth/IG-VLM/tree/main)

### Recaptioning
#### Inter4k

This is a dataset of 1000 high-resolution video samples. We prepare the data following the instructions from their [official website](https://alexandrosstergiou.github.io/datasets/Inter4K/index.html).

#### Extending Recaptioning
The recaptioning part is designed to be extensible.

The inference script [tasks/eval/recaption/pllava_recaption.py](tasks/eval/recaption/pllava_recaption.py) uses the dataset class [RecaptionDataset](tasks/eval/recaption/__init__.py#L197). The dataset configuration is kept in its data_list_info attribute:
```python
data_list_info = OrderedDict({
    # "Panda70M": OrderedDict(
    #     json_relpath="Panda70M/annotations.json",
    #     prefix="DATAS/Recaption/Panda70M/videos",
    #     data_type="video",
    #     bound=False,
    #     key_rename_map={
    #         # 'caption': 'hint',
    #     },
    #     name_key='video_name',
    #     postfix=('mp4', 'mkv', 'webm'),
    #     recaption_type=RecaptionSample,
    # ),  # has no start & end
    "Inter4K": OrderedDict(
        json_relpath="Inter4K/annotations.json",
        prefix="DATAS/Recaption/Inter4K/60fps/UHD",
        data_type="video",
        bound=False,
        key_rename_map={
            # 'caption': 'hint',
        },
        name_key='video_name',
        postfix=('mp4', 'mkv', 'webm'),
        recaption_type=CaptionSample,
    ),  # has no start & end
})
```
Each entry contains the relative path to an annotation JSON file, which holds a list where every item is a sample to be captioned. For example, Inter4K/annotations.json looks like:
```json
[
{
"video_name": "973"
},
...
]
```
and the directory DATAS/Recaption/Inter4K/60fps/UHD would look like:
```
$ ls DATAS/Recaption/Inter4K/60fps/UHD
1.mp4 134.mp4 170.mp4 ....
```

Since only the raw video is needed for direct captioning, the annotation file only has to contain the name of each video under the "prefix" directory.
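
Because the annotation file is just a list of video names, it can be generated with a few lines of Python. Below is a minimal sketch assuming the Inter4K layout above; the helper script is illustrative and not part of the repo:
```python
# Hypothetical helper (not part of the repo): builds an annotations.json
# for a recaption dataset by listing every video under the prefix directory.
import json
from pathlib import Path

prefix = Path("DATAS/Recaption/Inter4K/60fps/UHD")  # root directory of the videos ("prefix")
postfix = ("mp4", "mkv", "webm")                    # accepted file extensions

samples = [
    {"video_name": path.stem}                       # file name without extension, e.g. "973"
    for path in sorted(prefix.iterdir())
    if path.suffix.lstrip(".").lower() in postfix
]

out_path = Path("DATAS/Recaption/Inter4K/annotations.json")
out_path.write_text(json.dumps(samples, indent=4))
print(f"wrote {len(samples)} samples to {out_path}")
```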

Extending a dataset for captioning consists of the following steps:
1. have all the videos downloaded
2. construct an annotations.json file in the format described above
3. configure the recaption dataset [here](tasks/eval/recaption/__init__.py#L197) (a sketch of such an entry follows this list), where you need to set:
    - json_relpath: the relative path of the annotation file
    - prefix: the root directory of the videos
    - postfix: a list of all file extensions used by these videos
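
For step 3, adding a dataset amounts to adding one more entry to the data_list_info OrderedDict shown earlier. Below is a sketch of such an entry for a hypothetical dataset called "MyClips" (the name and paths are placeholders; the keys mirror the Inter4K entry):
```python
# Hypothetical entry for a new dataset, to be pasted into the
# data_list_info OrderedDict shown above. "MyClips" and its paths are
# placeholders; CaptionSample comes from tasks/eval/recaption/__init__.py.
"MyClips": OrderedDict(
    json_relpath="MyClips/annotations.json",   # relative path of the annotation file
    prefix="DATAS/Recaption/MyClips/videos",   # root directory holding the video files
    data_type="video",
    bound=False,                               # keep False, see the note below
    key_rename_map={},
    name_key='video_name',
    postfix=('mp4', 'mkv', 'webm'),            # file extensions to look for
    recaption_type=CaptionSample,
),
```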

The other options are experimental, so stick with the default settings used for Inter4K. The recommended video length is around 5-20 seconds.

P.S. "bound" is meant to ensure that the video passed to the model contains no scene transitions. This part wasn't tested, so set bound to false and make sure each original video file is a single continuous clip. But always feel free to explore and contribute to PLLaVA!
6 changes: 6 additions & 0 deletions README.md
@@ -340,6 +340,12 @@ bash scripts/gallery.sh

Feel free to use the compare version to compare different models' results, or use the single-gallery version to check out one model's results; they are basically the same. Check out this [script](scripts/gallery.sh) for more details.

#### For Captioning and Recaptioning
Follow the instructions in [DATA.md](DATA.md#extending-recaptioning) to extend the recaptioning data in a few steps.

Feel free to point us to high-quality video datasets; we would proceed to run captioning on them.


# :page_facing_up: Citation

If you find this project useful in your research, please consider citing:
