Annotation Layer and Metadata Field Names

ctschroeder edited this page Oct 9, 2014 · 17 revisions
Annotation Layer and Metadata Field Names for Coptic SCRIPTORIUM Documents
Annotation Layer Names
tok tokens, the smallest possible unit to be annotated; may be smaller than the morphemes in orig
orig see transcription guidelines for the smallest unit of language (morpheme or word level; smaller than the bound group level); orthography is from the original text (diplomatic, edition, etc.); includes supralinear strokes and other markings from the manuscript
orig_group bound groups using the original orthography, including supralinear strokes and other markings
norm_group bound groups (same structure as orig_group but with normalized spelling, etc., so content is based on norm)
norm normalized version of orig
pos part of speech tags
lang language of origin tags (Hebrew, Greek, Latin, Aramaic, etc.)
morph see transcription guidelines sections 4.3 and 4.4 for morphs that are below the word level -- this is where words containing mnt, at, ref are annotated a second time
note notes that normally would go in a TEI XML <note note="xxx"> tag
hi@rend see transcription guidelines sections 4.2 & 5; text renderings
lb@n line breaks -- numbered according to the original manuscript
cb@n column breaks -- numbered according to the original manuscript
pb_xml@id page numbers of original manuscript (not the current repository numbering) (TEI XML <pb xml:id="xxx">)
ignore:note notes that will NOT be imported into ANNIS or exported as TEI or PAULA XML; private notations from annotators/encoders/editors
translation English translation
p paragraph breaks for translation
verse verse of text written as number (always use for biblical texts of any kind, including Sahidica)
vid formerly verse@id (Sahidica)
chapter chapter of text as number (not necessary -- in metadata)
chapter@cname chapter of text written as text and number (not necessary -- in other data)
chapter@cid chapter id (Sahidica-- not necessary)
verse@vname verse of text written as text and number (e.g. 1 Corinthians 1:10) (not necessary -- in other data)
add_place
Preferred order of layers: tok, orig, orig_group, norm_group, norm, pos, morph, lang, translation, lb@n, cb@n, pb_xml@id, p
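The preferred order can be applied mechanically when arranging layer columns. The following is a minimal sketch, not part of the project's tooling: the function name sort_layers is hypothetical, and placing unlisted layers (e.g. note) at the end in their existing order is an assumption, since the page does not say where they go.

```python
# Preferred layer order as listed on this page.
# The page-break layer is spelled as in its definition above (pb_xml@id).
PREFERRED_ORDER = [
    "tok", "orig", "orig_group", "norm_group", "norm", "pos", "morph",
    "lang", "translation", "lb@n", "cb@n", "pb_xml@id", "p",
]


def sort_layers(layer_names):
    """Return layer names in the preferred order.

    Layers not in the preferred list are placed at the end, keeping
    their original relative order (an assumption; the page does not
    specify a position for them).
    """
    rank = {name: i for i, name in enumerate(PREFERRED_ORDER)}
    fallback = len(PREFERRED_ORDER)
    # sorted() is stable, so unlisted layers keep their input order.
    return sorted(layer_names, key=lambda name: rank.get(name, fallback))


if __name__ == "__main__":
    print(sort_layers(["pos", "note", "tok", "norm", "orig"]))
```

Calling sort_layers on the column names of a sheet gives the order in which to arrange them; it does not rename or validate anything.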
METADATA in meta sheet
Coptic_edition
Greek_source
corpus
title
author
language
annotation
project
translation
msName
pages_from
pages_to
msContents_title@type
msContents_title@n
repository
collection
idno
version@n
version@date
source_info
license copyright statement for Sahidica; CC-BY for everything else
respStatement?
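A meta sheet can be checked against the field names above to catch misspelled keys. This is a rough sketch under stated assumptions: check_meta is a hypothetical helper, the meta sheet is assumed to have been read into a plain dict already, and respStatement is left out because the page marks it with a question mark. The page does not say which fields are mandatory, so "absent" fields are only reported, not treated as errors.

```python
# Metadata field names as listed on this page (respStatement omitted
# because the page marks it as uncertain).
META_FIELDS = [
    "Coptic_edition", "Greek_source", "corpus", "title", "author",
    "language", "annotation", "project", "translation", "msName",
    "pages_from", "pages_to", "msContents_title@type",
    "msContents_title@n", "repository", "collection", "idno",
    "version@n", "version@date", "source_info", "license",
]


def check_meta(meta):
    """Compare a dict of metadata against the listed field names.

    Returns (absent, unknown): listed fields with no entry in `meta`,
    and keys in `meta` that are not among the listed field names.
    """
    absent = [field for field in META_FIELDS if field not in meta]
    unknown = [key for key in meta if key not in META_FIELDS]
    return absent, unknown


if __name__ == "__main__":
    absent, unknown = check_meta({"corpus": "example", "titel": "example"})
    print("absent:", absent)
    print("unknown:", unknown)
```

An unknown key such as titel usually points to a typo in the sheet rather than a new field.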