diff --git a/Test/expected-results/test.epub b/Test/expected-results/test.epub
index c50e88870..9e89ccdcd 100644
Binary files a/Test/expected-results/test.epub and b/Test/expected-results/test.epub differ
diff --git a/Test/expected-results/test.isosch b/Test/expected-results/test.isosch
index 476b0e2a9..2844175fb 100644
--- a/Test/expected-results/test.isosch
+++ b/Test/expected-results/test.isosch
@@ -11,7 +11,6 @@
-
@valueDatcat
is present in the immediate context, this attribute takes on role (a), while @valueDatcat
performs role (b).@datcat
is needed.@datcat
attribute, except that it addresses not its containing element, but an object that is being referenced or modeled by its containing element.<namespace>
element.<gram>
according to some convenient and shared typology, ideally one defined in an external reference taxonomy, such as the CLARIN Concept
+ Registry.
Sample values include: 1] pos (part of speech); 2] gen (gender); 3] num (number); 4] animate; 5] proper
In this example
The example below presents the TEI encoding of the name-value pair
+ <part of speech, common
+ noun>
, where the name (key) part
+ of speech
is abbreviated as POS
, and the value, common noun
is symbolized by NN
. The
+ entire name-value pair is encoded by means of the element NN
.
The name (i.e., the key) to the data
+ category part of speech
, while the attribute value to the data category
NN
is the symbol for common noun used e.g. in the CLAWS-7 tagset defined by the
+ University Centre for Computer Corpus Research on Language at the University of Lancaster. The
+ very same data category used for tagging an early version of the British National Corpus, and
+ coming from the BNC Basic
+ (C5) tagset, uses the symbol NN0
(rather than NN
). Making these values semantically
+ interoperable would be extremely difficult without a human expert if they
+ were not anchored in a single point of an established reference taxonomy of morphosyntactic
+ data categories. In the case at hand, the string http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545
is both
+ a persistent identifier of the data category in question and a pointer to a shared definition
+ of
While the symbols NN
, NN0
, and many others (often coming from languages other than
+ English) are implicitly members of the container category part of speech
, it is sometimes
+ useful not to rely on such an implicit relationship but rather to use an explicit identifier
+ for that data category, to distinguish it from other morphosyntactic data categories, such
+ as gender, tense, etc. For that purpose, the above example uses the
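The interoperability argument above can be illustrated with a minimal Python sketch. It assumes a hypothetical lookup table that anchors tagset-specific symbols to persistent identifiers. The tagset labels `CLAWS7` and `C5` and the function name `same_data_category` are illustrative only. The single handle URI is the one quoted in the text.

```python
# Minimal sketch: tagset-specific symbols anchored to a shared PID.
# The lookup table and tagset labels are hypothetical; only the handle
# URI is taken from the text.
COMMON_NOUN_PID = (
    "http://hdl.handle.net/11459/"
    "CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"
)

# (tagset, symbol) -> persistent identifier of the data category
TAG_TO_PID = {
    ("CLAWS7", "NN"): COMMON_NOUN_PID,
    ("C5", "NN0"): COMMON_NOUN_PID,
}


def same_data_category(tag_a, tag_b):
    """True if both (tagset, symbol) pairs resolve to the same known PID."""
    pid_a = TAG_TO_PID.get(tag_a)
    pid_b = TAG_TO_PID.get(tag_b)
    return pid_a is not None and pid_a == pid_b


if __name__ == "__main__":
    print(same_data_category(("CLAWS7", "NN"), ("C5", "NN0")))  # True
```

Because the comparison goes through the PID rather than the symbols themselves, `NN` and `NN0` are recognized as the same category without any tagset-specific rule.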
If the feature structure markup exemplified above is to be repeated many times in a single
+ document, it is much more efficient to gather the persistent identifiers in a single place and to
+ only reference them, implicitly or directly, from feature structure markup. The following
+ example is much more concise than the one above and relies on the concepts of feature structure declaration and
+ feature value library, discussed in chapter
The assumption here is that the relevant feature values are collected in a place that the
+ annotation document in question has access to — preferably, a single document per linguistic
+ resource, for example an
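The following is a minimal Python sketch of how references into a feature value library might be resolved. It assumes a simplified fragment that is not schema-valid TEI. In that fragment, a `<symbol>` inside `<fvLib>` carries an `xml:id` and a `datcat` attribute, and each `<f>` points to it with `fVal`. The wrapper element, the placement of `datcat` on `<symbol>`, and the function name `resolve_fvals` are assumptions. Only the handle URI comes from the text.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified fragment (not schema-valid TEI): one library entry, two references.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <fvLib>
    <symbol xml:id="NN" value="NN"
      datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>
  </fvLib>
  <fs><f name="POS" fVal="#NN"/></fs>
  <fs><f name="POS" fVal="#NN"/></fs>
</body>"""


def resolve_fvals(xml_text):
    """Return (feature name, symbol value, datcat) for each <f> whose fVal
    points to an <fvLib> entry in the same document."""
    root = ET.fromstring(xml_text)
    # Index every library value by its xml:id so the PID is stored only once.
    library = {
        entry.get(XML_ID): entry
        for lib in root.iter(TEI + "fvLib")
        for entry in lib
        if entry.get(XML_ID)
    }
    resolved = []
    for f in root.iter(TEI + "f"):
        ref = f.get("fVal", "")
        key = ref[1:] if ref.startswith("#") else ref
        entry = library.get(key)
        if entry is not None:
            resolved.append((f.get("name"), entry.get("value"), entry.get("datcat")))
    return resolved


if __name__ == "__main__":
    for row in resolve_fvals(SAMPLE):
        print(row)
```

The persistent identifier appears once, in the library, and every feature structure reaches it indirectly through `fVal`.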
The example below presents an
Note that these Guidelines do not prescribe a specific choice between
In the context of dictionaries designed with semantic interoperability in mind, the following
+ example ensures that the
For efficiency, references to the particular data categories in this type of interoperable markup
+ are best provided in a single place within the dictionary (or a single
+ place within the project), rather than being repeated inside every entry. For the container
+ elements, this can be achieved at the level of
Another possibility is to shorten the URIs by means of the
This mechanism carries implications that are not always wanted. In the case at
+ hand, for instance, it suggests that the identifiers pos
and adj
belong to a namespace
+ associated with the CLARIN Concept Registry (CCR), whereas it is solely a shorthand
+ mechanism whose scope is the current resource. Documenting this clearly in the header of the
+ dictionary is therefore advised.
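The shorthand mechanism can be sketched in Python as follows. The sketch is loosely modeled on the `ident`, `matchPattern` and `replacementPattern` attributes of TEI `<prefixDef>`. Whether the original example uses that element is not shown here. The prefix `ex`, the `example.org` base URI, the mapping of `ccr:commonNoun`, and the function name `expand` are hypothetical.

```python
import re

# Hypothetical prefix definitions, loosely modeled on TEI <prefixDef>:
# (ident, matchPattern, replacementPattern). $1, $2, ... in the replacement
# refer to groups captured by matchPattern.
PREFIX_DEFS = [
    ("ccr", r"commonNoun",
     "http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"),
    ("ex", r"([a-z]+)", "http://example.org/datcat/$1"),
]


def expand(pointer, prefix_defs=PREFIX_DEFS):
    """Expand a shorthand such as 'ccr:commonNoun' into a full URI.

    The pointer is returned unchanged when no definition matches, so
    ordinary URIs like 'http://...' pass through untouched.
    """
    prefix, sep, local = pointer.partition(":")
    if not sep:
        return pointer
    for ident, match_pattern, replacement in prefix_defs:
        if ident != prefix:
            continue
        m = re.fullmatch(match_pattern, local)
        if m:
            # Substitute $n with the n-th group captured by matchPattern.
            return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), replacement)
    return pointer


if __name__ == "__main__":
    print(expand("ccr:commonNoun"))
    print(expand("ex:adj"))  # http://example.org/datcat/adj
```

The expansion table exists only inside the resource. This is why the text recommends documenting it in the header: a reader cannot infer from `ccr:` alone that no real CCR namespace is involved.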
Yet another possibility is to associate the information about the relationship between a TEI
+ markup element and the data category that it is intended to model already at the level of
+ modeling the dictionary resource, that is, at the level of the ODD, in
The
Above, the
ISO 12620:2009 is a standard describing the data model and procedures for a Data
- Category Registry (DCR). Data categories are defined as elementary descriptors in a
- linguistic structure. In the DCR data model, each data category is assigned a
- unique Persistent IDentifier (PID), i.e., a URI. Linguistic resources or, preferably,
- their schemas that make use of data categories from a DCR should refer to them using
- this PID. For XML-based resources, like TEI documents, ISO 12620:2009 normative
- Annex A gives a small Data Category Reference XML vocabulary (also available online
- at
The TEI Abstract Model can be expressed as a hierarchy of attribute-value matrices (AVMs)
+ of various types and of various levels of complexity, nested or grouped in various ways. At
+ the most abstract level, an AVM consists of an information container and the value
+ (contents) of that container.
+A simple example of an XML serialization of such structures is, on the one hand, the opening
+ and closing tags that delimit and name the container, and, on the other, the content enclosed
+ by the two tags that constitutes the value. An analogous example is an
+ attribute name and the value of that attribute.
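To make the container/value reading concrete, the following minimal Python sketch extracts the same (name, value) pair from an element serialization and from an attribute serialization. It assumes namespace-free XML, a hypothetical `<w>` wrapper element, and a hypothetical function name `name_value_pairs`.

```python
import xml.etree.ElementTree as ET


def name_value_pairs(xml_text):
    """Collect (container, value) pairs from attributes and leaf elements."""
    pairs = []
    for el in ET.fromstring(xml_text).iter():
        # Attribute serialization: the attribute name names the container.
        pairs.extend(el.attrib.items())
        # Element serialization: the tag pair names the container and the
        # text enclosed by the tags is the value (leaf elements only).
        if len(el) == 0 and el.text and el.text.strip():
            pairs.append((el.tag, el.text.strip()))
    return pairs


if __name__ == "__main__":
    element_form = name_value_pairs("<w><pos>commonNoun</pos></w>")
    attribute_form = name_value_pairs('<w pos="commonNoun"/>')
    print(element_form == attribute_form)  # True: both give [('pos', 'commonNoun')]
```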
+In a TEI XML example of two equivalent serializations expressing the name-value pair
+ <part-of-speech,common-noun>
, namely
+ <pos>commonNoun</pos>
and
+ pos="common-noun"
, one would
+ classify the element
The
The value of the
Historically,
Note that no constraint prevents the occurrence of a combination of
+
ISO
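+ 12620:2009 is an international standard for the data model and procedures of a Data Category Registry (DCR). Data categories are defined as elementary descriptors in a linguistic structure. In the DCR data model, each data category is assigned a unique persistent identifier (PID), i.e., a URI. Linguistic resources that use data categories from a DCR, or preferably their schemas, should refer to them using this PID. For XML-based resources such as TEI documents (
ISO 12620:2009 is an international standard for the data model and procedures of a Data Category Registry (DCR). Data categories are defined as elementary descriptors in a linguistic structure. In the DCR data model, each data category is assigned a unique persistent identifier (PID), i.e., a URI. Linguistic resources that use data categories from a DCR, or preferably their schemas, should refer to them using this PID. For XML-based resources such as TEI documents (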
The
The
@valueDatcat
is present in the immediate context, this attribute takes on role (a), while @valueDatcat
performs role (b).@datcat
is needed.@datcat
attribute, except that it addresses not its containing element, but an object that is being referenced or modeled by its containing element.<namespace>
element.In this example, a date is given in a Mediaeval text measured "from the creation of the world", which is normalised +
In this example, a date is given in a Mediaeval text measured from the creation of the world
, which is normalized
(in
In this example
A much fuller list of values for the
The list of values for the
A much more complete list of values for the attribute
Attribute
In the mask Tutankhamun wears a nemes headcloth which has the royal insignia of a cobra (Wadjet) and vulture
- (Nekhbet) on it. These are thought respectively to symbolise Tutankhamun's rule of both Lower Egypt and Upper
+ (Nekhbet) on it. These are thought respectively to symbolize Tutankhamun's rule of both Lower Egypt and Upper Egypt. His ears are pierced for earrings. The mask has rich inlays of coloured glass and gemstones, including lapis lazuli surrounding the eye and eyebrows, quartz for the eyes, obsidian for the pupils. The broad collar is made up of carnelian, feldspar, turquoise, amazonite, faience and other stones.
@@ -59329,6 +59528,7 @@ Feature Value