add documents
nobu-g committed Mar 3, 2024
1 parent 33e7ef6 commit 5089390
Showing 4 changed files with 88 additions and 1 deletion.
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -8,6 +8,7 @@ repos:
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
        exclude: "docs/.*pdf"
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.2.0
    hooks:
48 changes: 47 additions & 1 deletion README.md
@@ -1 +1,47 @@
# J-CRe3 Dataset
# J-CRe3: Japanese Conversation dataset for Real-world Reference Resolution

![Dataset Overview](https://raw.githubusercontent.com/riken-grp/J-CRe3/main/docs/overview.pdf)

## Overview

J-CRe3 is a Japanese conversation dataset that contains egocentric video and dialogue audio of real-world conversations between two people.
The conversations feature a robot helping its master with mundane daily tasks, including many object manipulations.
It contains 93 scenario-based dialogues with 2,131 utterances and 11,024 seconds of video.
As shown in the figure above, the dataset is annotated with the following information:

- Bounding boxes

- Textual references

- Text-to-object references

## Distributed Files

- `annotation/`: Bounding box and text-to-object reference annotations. These files are the direct output of the annotation tool, and most annotations are omitted from them to keep the annotation process efficient.

- `transcription/`: Transcriptions of the dialogue audio. This directory contains the files generated directly by the transcription tool (ELAN). For processed transcriptions, refer to the files under the `knp/` directory.

- `knp/`: Processed transcriptions of the dialogue audio with textual reference annotations. The files in this directory are in [the KNP format](https://rhoknp.readthedocs.io/en/latest/format/index.html#knp), which can be easily parsed by [rhoknp](https://github.com/ku-nlp/rhoknp).

- `id/`: Dialogue ID files that define the train/valid/test split.
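
The directory layout above could be consumed along the following lines. This is only a minimal sketch under stated assumptions: that each split file under `id/` is plain text with one dialogue ID per line, and that relation annotations in the KNP files appear as `<rel type="..." .../>` tags with `type` as the first attribute. The function names `read_split_ids` and `count_rel_types` are hypothetical, and the regular expression is a lightweight stand-in for a proper parser such as rhoknp.

```python
import re
from collections import Counter
from pathlib import Path


def read_split_ids(id_file: Path) -> list[str]:
    """Return the dialogue IDs listed in a split file.

    Assumption: one dialogue ID per line; blank lines are ignored.
    """
    text = id_file.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Assumption: relation tags look like <rel type="ガ" target="..." .../>.
REL_TYPE = re.compile(r'<rel type="([^"]+)"')


def count_rel_types(knp_text: str) -> Counter:
    """Tally relation types found in a KNP-format string (regex stand-in)."""
    return Counter(REL_TYPE.findall(knp_text))
```

For example, `read_split_ids(Path("id/train.id"))` would return the training dialogue IDs if a file with that (hypothetical) name exists, and `count_rel_types` could then be applied to the text of each matching file under `knp/`.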

## Statistics

Refer to [statistics.md](./docs/statistics.md).

## License

TBW

## Citation

```bibtex
@inproceedings{ueda-2024-jcre3,
title={J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution},
author={Nobuhiro Ueda and Hideko Habe and Yoko Matsui and Akishige Yuguchi and Seiya Kawano and Yasutomo Kawanishi and Sadao Kurohashi and Koichiro Yoshino},
booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
year={2024},
pages={},
address={Turin, Italy},
}
```
Binary file added docs/overview.pdf
Binary file not shown.
40 changes: 40 additions & 0 deletions docs/statistics.md
@@ -0,0 +1,40 @@
# Statistics of J-CRe3 dataset

| | Train | Val. | Test | Total |
|----------------------------|---------|--------|--------|---------|
| Dialogues | 75 | 9 | 9 | 93 |
| Frames | 8,780 | 922 | 1,360 | 11,062 |
| Bounding boxes | 65,431 | 5,079 | 9,184 | 79,694 |
| Object instances | 1,435 | 119 | 187 | 1,741 |
| Rel = | 1,306 | 102 | 160 | 1,568 |
| Rel ガ | 2,091 | 155 | 297 | 2,543 |
| Rel ヲ | 1,134 | 74 | 136 | 1,344 |
| Rel ニ | 1,228 | 96 | 148 | 1,472 |
| Rel ト | 42 | 1 | 5 | 48 |
| Rel デ | 110 | 25 | 10 | 145 |
| Rel カラ | 65 | 1 | 2 | 68 |
| Rel ヨリ | 3 | 0 | 0 | 3 |
| Rel ヘ | 0 | 0 | 0 | 0 |
| Rel マデ | 2 | 0 | 0 | 2 |
| Rel ガ2 | 522 | 47 | 65 | 634 |
| Rel ヲ2 | 19 | 1 | 1 | 21 |
| Rel ニ2 | 24 | 3 | 2 | 29 |
| Rel ト2 | 0 | 0 | 0 | 0 |
| Rel デ2 | 1 | 0 | 0 | 1 |
| Rel カラ2 | 0 | 0 | 0 | 0 |
| Rel ヨリ2 | 0 | 0 | 0 | 0 |
| Rel ヘ2 | 0 | 0 | 0 | 0 |
| Rel マデ2 | 0 | 0 | 0 | 0 |
| Rel ノ | 385 | 16 | 32 | 433 |
| Rel ノ? | 2 | 2 | 0 | 4 |
| Rel 修飾 | 3 | 0 | 3 | 6 |
| Rel トイウ | 1 | 0 | 0 | 1 |
| Rel = | 1,306 | 102 | 160 | 1,568 |
| Rel ノ | 385 | 16 | 32 | 433 |
| Rel ノ? | 2 | 2 | 0 | 4 |
| Rel ノ≒ | 53 | 0 | 9 | 62 |
| Rel ノ?≒ | 0 | 0 | 0 | 0 |
| Zero references | 6,016 | 453 | 708 | 7,177 |
| Total object classes | 158 | 42 | 49 | N/A |
| Bounding boxes per frame | 7.5 | 5.5 | 6.8 | N/A |
| Object instances per frame | 0.2 | 0.1 | 0.1 | N/A |
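
The derived entries in this table can be cross-checked from the per-split counts. The sketch below copies four count rows from the table and assumes, without confirmation from the source, that each Total is the sum over splits and that the per-frame rows are per-split ratios rounded to one decimal place. The helper name `per_frame` is hypothetical.

```python
# Per-split counts copied from the table above, in (Train, Val., Test) order.
counts = {
    "Dialogues": (75, 9, 9),
    "Frames": (8780, 922, 1360),
    "Bounding boxes": (65431, 5079, 9184),
    "Object instances": (1435, 119, 187),
}

# Assumption: the Total column is the sum over the three splits.
totals = {name: sum(split) for name, split in counts.items()}


def per_frame(row: str) -> list[float]:
    """Divide a count row by the frame counts, rounded to one decimal."""
    return [round(n / f, 1) for n, f in zip(counts[row], counts["Frames"])]


print(totals)                         # {'Dialogues': 93, 'Frames': 11062, 'Bounding boxes': 79694, 'Object instances': 1741}
print(per_frame("Bounding boxes"))    # [7.5, 5.5, 6.8]
print(per_frame("Object instances"))  # [0.2, 0.1, 0.1]
```

Under these assumptions the computed values agree with the Total column and with the two per-frame rows as listed.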
