Tool to automatically extract (in)direct speech from German and English texts.
docker run --rm -e LANGUAGE='en' -v /absolute/path/to/input/files/:/inputs fynnos/quite:latest python -m quite /inputs > outputfilename.csv
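The command processes all files in the input folder and writes the results as CSV to standard output, which the redirect above stores in outputfilename.csv.
All input files must be plain text encoded as UTF-8. Text crawled from the internet should ideally be cleaned beforehand.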
To process German texts, replace LANGUAGE='en' with LANGUAGE='de'.
The CSV output has the following columns:
filename,subject,subject_start,subject_end,subject references,cue,cue_start,cue_end,quote,quote_start,quote_end
filename: name of the input file the quote was extracted from
subject: speaker of the quote, possibly empty
subject_start: start of the subject text span in characters (inclusive)
subject_end: end of the subject text span in characters (exclusive)
subject references: empty unless QUITE is configured to use a compatible co-reference resolution service
cue: cue/trigger/indicator of speech, usually a verb, possibly empty
cue_start: analogous to subject_start
cue_end: analogous to subject_end
quote: the extracted quotation, (in)direct speech
quote_start: analogous to subject_start
quote_end: analogous to subject_end
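
Because the start offsets are inclusive and the end offsets exclusive, they map directly onto Python slicing. The following is a minimal sketch of reading the output CSV with the standard library. It assumes that the file begins with a header row (pass has_header=False otherwise) and that offsets of empty fields are left blank; neither is confirmed by the description above. The names read_quotes and span and the sample row are hypothetical and not part of QUITE.

```python
import csv
import io

# Column names in the order listed above.
COLUMNS = [
    "filename", "subject", "subject_start", "subject_end",
    "subject references", "cue", "cue_start", "cue_end",
    "quote", "quote_start", "quote_end",
]
OFFSET_COLUMNS = [c for c in COLUMNS if c.endswith(("_start", "_end"))]


def read_quotes(csv_file, has_header=True):
    """Yield one dict per CSV row, with offset columns converted to int.

    Blank offsets become None (assumption: empty fields have blank offsets).
    """
    reader = csv.DictReader(csv_file, fieldnames=None if has_header else COLUMNS)
    for row in reader:
        for col in OFFSET_COLUMNS:
            value = row.get(col)
            row[col] = int(value) if value else None
        yield row


def span(text, start, end):
    """Return text[start:end]; start is inclusive, end is exclusive."""
    if start is None or end is None:
        return ""
    return text[start:end]


# Invented sample data for illustration only, not real QUITE output.
text = "Anna said that she was tired."
sample_csv = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "a.txt,Anna,0,4,,said,5,9,that she was tired,10,28\n"
)
for row in read_quotes(sample_csv):
    print(span(text, row["quote_start"], row["quote_end"]))
```

For real output, open outputfilename.csv with encoding="utf-8" and newline="" and pass the file object to read_quotes; the text for span would be the content of the input file named in the filename column.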