
Transcriber

Maarten Janssen edited this page Aug 23, 2020 · 3 revisions

Transcriber (.trs) is the file format used by the Transcriber tool. It is an XML-based format for transcribing spoken data, which encodes some metadata, speaker turns, and their alignment to an audio/video file.

Import

trs2teitok.pl

Options

Command line options of the tool:

  • debug: debugging mode
  • output: name of the output file; if empty, output goes to STDOUT
  • morerev: add more revision statements
  • file: name of the input file

Specifications

The script converts Episode, Section, and Turn elements to ab, ug, and u, respectively. Here ug is an utterance group, modeled after lg and similar grouping elements; it is not part of standard TEI. Within each turn, the script converts the strings between Sync elements to tokens.
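The mapping described above can be illustrated with a minimal sketch. This is not the code of trs2teitok.pl (which is a Perl script); it is a Python approximation written under assumptions: the function names, the tok element name, and the who and start attributes are hypothetical choices for illustration, and each non-empty string following a Sync element is treated as a single token.

```python
import xml.etree.ElementTree as ET


def convert_turn(turn):
    """Turn -> u; each string between Sync elements becomes one token."""
    u = ET.Element("u")
    if turn.get("speaker"):
        u.set("who", turn.get("speaker"))  # attribute name is an assumption
    # Collect (sync time, text) pairs; text before the first Sync has no time.
    segments = [(None, turn.text or "")]
    for child in turn:
        if child.tag == "Sync":
            segments.append((child.get("time"), child.tail or ""))
        else:
            # Other inline elements: keep their trailing text in the open segment.
            time, text = segments[-1]
            segments[-1] = (time, text + (child.tail or ""))
    for time, text in segments:
        text = text.strip()
        if not text:
            continue
        tok = ET.SubElement(u, "tok")  # element name is an assumption
        tok.text = text
        if time is not None:
            tok.set("start", time)  # attribute name is an assumption
    return u


def convert(trs_xml):
    """Episode -> ab, Section -> ug, Turn -> u, as described in the text."""
    root = ET.fromstring(trs_xml)
    out = ET.Element("text")
    for episode in root.iter("Episode"):
        ab = ET.SubElement(out, "ab")
        for section in episode.findall("Section"):
            ug = ET.SubElement(ab, "ug")
            for turn in section.findall("Turn"):
                ug.append(convert_turn(turn))
    return out


# A tiny hand-written input, not taken from a real corpus.
SAMPLE = """<Trans><Episode><Section type="report" startTime="0" endTime="2.5">
<Turn speaker="spk1" startTime="0" endTime="2.5">
<Sync time="0"/>
hello
<Sync time="1.2"/>
world
</Turn></Section></Episode></Trans>"""

if __name__ == "__main__":
    print(ET.tostring(convert(SAMPLE), encoding="unicode"))
```

If a transcriber placed Sync elements around phrases rather than words, this sketch would emit one token per phrase, which is the limitation that the one-string-per-token assumption carries.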

Known issues

The format does not really specify what the Sync elements synchronise; the script currently assumes that they delimit words, but that assumption will not always be correct.

Export

No export has been provided yet.
