This repository contains automatic discourse annotation of the Daily Dialog dataset. It covers 991 dialogs, with each utterance segmented into sentences. The annotation was performed with ChatGPT. Currently, dd_annotation_results.tsv contains three kinds of annotation:
- open goals annotation with no set classes. Prompt
- annotation according to the Goals-Plans-Action theory. Prompt
- annotation according to the DuRecDial 2.0 annotation scheme. In this case, we also attempted to track and annotate goal states (with little success, though). Prompt
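To get a first look at the annotation file, a minimal sketch like the one below may help. It uses only the Python standard library. The column names of dd_annotation_results.tsv are not documented here, so the helpers take the column name as a parameter; `load_annotations` and `summarize` are hypothetical names chosen for this example, not functions from the repository.

```python
import csv
from collections import Counter


def load_annotations(path):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def summarize(rows, column):
    """Count how often each value appears in the given column."""
    return Counter(row[column] for row in rows)
```

A reasonable first step is `rows = load_annotations("dd_annotation_results.tsv")` followed by `print(rows[0].keys())` to see which columns the file actually has, and only then `summarize(rows, <column name>)` on one of them.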
The other files contain analysis of the annotation results, specifically of the open goals and the DuRecDial-style goals.
With dd_analysis.py, you can view the goal-annotated dialogs in a user-friendly streamlit interface.
With dd_visualisation.ipynb, you can take a look at a larger-scale dataset analysis, with all annotated dialogs presented as graphs using networkx.
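As an illustration of what a graph view of annotated dialogs can look like, the sketch below counts goal-to-goal transitions and turns them into weighted directed edges. This is an assumption about one plausible representation, not a description of what dd_visualisation.ipynb actually does; the input format (one list of goal labels per dialog) and all function names are hypothetical.

```python
from collections import Counter


def goal_transitions(dialogs):
    """Count (previous_goal, next_goal) pairs across all dialogs.

    `dialogs` is assumed to be an iterable of lists of goal labels,
    one list per dialog, in utterance order.
    """
    counts = Counter()
    for goals in dialogs:
        counts.update(zip(goals, goals[1:]))
    return counts


def to_weighted_edges(counts):
    """Convert transition counts into (source, target, weight) triples."""
    return [(src, dst, n) for (src, dst), n in counts.items()]


def build_graph(counts):
    """Build a directed graph from the counts (requires networkx)."""
    import networkx as nx  # third-party; imported here so the rest runs without it

    graph = nx.DiGraph()
    graph.add_weighted_edges_from(to_weighted_edges(counts))
    return graph
```

The edge weight is the number of times one goal label directly follows another, so frequent transitions can be drawn with thicker edges.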
dd_processing.ipynb is a technical file for preprocessing the annotated data; you don't have to run it. The results of this preprocessing are found in dialog_data/processed.