DSTC9 Track 1 Evaluation Dataset

This directory contains the evaluation dataset for DSTC9 Track 1.

Evaluation Data

DSTC9 Track 1 evaluation data includes the following three subsets:

Subset #1: the test partition of the augmented MultiWOZ 2.1 collected by the same methods as the training/validation datasets.
Subset #2: multi-domain human-human written conversations about touristic information for San Francisco.
Subset #3: multi-domain human-human spoken conversations (with manual transcriptions) about touristic information for San Francisco.

We are releasing the following data and resources:

logs.json: the test instances listed in a random order with no identifier of what subset each instance belongs to.
labels.json: the ground-truth labels/responses for the test instances.
knowledge.json: the knowledge candidates for all three subsets, comprising 12,039 snippets across five domains and 668 entities in total; it is a superset of the knowledge.json for the training/validation set.
db.json: the domain DB entries for Subsets #2 and #3.

All JSON formats are the same as those of the training/validation resources.
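
A minimal sketch of loading the released files, for readers who want a quick sanity check before running a system. The helper names `load_resources` and `check_alignment` are hypothetical, the directory holding the files is left as a parameter, and the check assumes that logs.json and labels.json are parallel lists with one label per test instance; verify that assumption against your copy of the data.

```python
import json
from pathlib import Path

# File names taken from the list of released resources above.
RESOURCE_FILES = ("logs.json", "labels.json", "knowledge.json", "db.json")


def load_resources(directory):
    """Load every released JSON file that exists in `directory` (hypothetical helper)."""
    resources = {}
    for name in RESOURCE_FILES:
        path = Path(directory) / name
        if path.exists():
            with path.open(encoding="utf-8") as f:
                resources[name] = json.load(f)
    return resources


def check_alignment(resources):
    """Return the instance count, assuming logs and labels are parallel lists."""
    logs = resources["logs.json"]
    labels = resources["labels.json"]
    if len(logs) != len(labels):
        raise ValueError(f"{len(logs)} instances but {len(labels)} labels")
    return len(logs)
```

Files that are absent from the directory are skipped rather than treated as errors, so the same helper can be pointed at a partial copy of the data.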

Participation

Each participating team will submit up to 5 system outputs for the test instances in logs.json.

The system outputs must follow the same format as labels.json for the training/validation sets. Before making your submission, please verify that every output file passes the following script without errors:

$ python scripts/check_results.py --dataset test --dataroot data_eval/ --outfile [YOUR_SYSTEM_OUTPUT_FILE]
Found no errors, output file is valid

Any invalid submission will be excluded from the official evaluation.
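
When preparing several system outputs, the check can be scripted. The following is a sketch under stated assumptions, not part of the official tooling: the helper names are hypothetical, the command line mirrors the invocation shown above, and a file is treated as valid only when the script prints the "Found no errors" message (the script's exit-code behavior is not assumed).

```python
import subprocess
import sys

MAX_SUBMISSIONS = 5  # each team may submit up to 5 system outputs


def build_check_command(outfile, dataroot="data_eval/", dataset="test"):
    """Build the check_results.py command line shown above for one output file."""
    return [
        sys.executable, "scripts/check_results.py",
        "--dataset", dataset,
        "--dataroot", dataroot,
        "--outfile", outfile,
    ]


def validate_outputs(outfiles):
    """Run the check script on each file; map file name to True if reported valid."""
    if len(outfiles) > MAX_SUBMISSIONS:
        raise ValueError(f"at most {MAX_SUBMISSIONS} system outputs are allowed")
    results = {}
    for outfile in outfiles:
        proc = subprocess.run(
            build_check_command(outfile), capture_output=True, text=True
        )
        # Assumption: validity is signalled by the success message on stdout.
        results[outfile] = "Found no errors" in proc.stdout
    return results
```

Run it from the repository root so that the relative path scripts/check_results.py resolves, for example `validate_outputs(["run1.json", "run2.json"])` with your own file names.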

Once you're ready, please submit by completing the ~~Submission Form~~ no later than 11:59 PM UTC-12 (Anywhere on Earth) on September 28, 2020.
