GML encoding of code lists #56

Open
TatjanaKutzner opened this issue Nov 2, 2021 · 27 comments

@TatjanaKutzner
Contributor

In the meeting on 21 October 2021 we discussed the GML encoding of code lists.

  • The code lists should be kept simple, i.e. they should only contain codes with associated code descriptions.

  • In addition, metadata information should be added to the code lists, in particular metadata attributes for the language of the code list, the CityGML data type to which the code list refers and the CityGML version.
    -> This information is enough to allow for providing code lists for the same data type in different languages. These language specific code lists contain the same codes, but the descriptions are provided in different languages.
    -> In this way, by means of the metadata information also the code lists in other languages can be identified, even if the CityGML file refers to a code list in a specific language only.

  • The code lists are to be represented using GML 3.2.1 dictionaries.
    -> Metadata information can be added to the dictionaries by means of a <gml:metaDataProperty> element.
    -> The <gml:identifier> element can be used for the code.
    -> The <gml:name> element can be used for the description, also in different languages, as the element has multiplicity "unbounded".

  • We also discussed providing an additional CSV encoding.
    The CSV files should contain three columns.
    The first column indicates whether the row contains a code or metadata.
    The second column contains the code or name of the metadata attribute.
    The third column contains the code description or the metadata value.
    -> When a code value in a CityGML instance links to an external code list, the encoding of the external code list can be determined based on the suffix .gml or .csv.
    -> The code lists could then simply be managed using Excel.
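
A minimal sketch of how a consumer might handle the proposed CSV encoding, assuming comma-separated rows and the row markers "metadata" and "code" in the first column. Both the marker strings and the example URL are assumptions for illustration; the thread does not fix them.

```python
import csv
import io
from urllib.parse import urlparse


def detect_encoding(codespace_uri):
    """Guess the code list encoding from the URI suffix (.gml or .csv)."""
    path = urlparse(codespace_uri).path.lower()
    if path.endswith(".gml"):
        return "gml"
    if path.endswith(".csv"):
        return "csv"
    return None  # unknown suffix, e.g. a registry or API endpoint


def parse_codelist_csv(text):
    """Split a three-column CSV code list into metadata and codes.

    Column 1: row kind (assumed markers: "metadata" or "code")
    Column 2: code, or name of the metadata attribute
    Column 3: code description, or metadata value
    """
    metadata, codes = {}, {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        kind, key, value = (cell.strip() for cell in row)
        if kind == "metadata":
            metadata[key] = value
        elif kind == "code":
            codes[key] = value
    return metadata, codes


# Hypothetical file content, reusing the roof type codes from this thread.
SAMPLE = """\
metadata,language,en
metadata,dataType,RoofTypeValue
code,1000,flat roof
code,3100,saddle roof
"""
```

Calling `parse_codelist_csv(SAMPLE)` yields one dictionary for the metadata rows and one for the code rows, so the same file can still be edited as a flat table in a spreadsheet.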

Homework tasks:

  • Claus will check whether CityGML 2 code lists will validate against GML 3.2.1
  • Thomas and Tatjana will provide an example for a code list encoded as csv file
@cmheazel
Collaborator

cmheazel commented Nov 3, 2021

The NGA maintains a registry of codelists designed for run-time access from GML documents. It is located at https://nsgreg.nga.mil/voc/registers.jsp?register=CDV. The intent of this register is to remove the codelist values from the data model, managing them as a separate resource. Codelist values are identified in a GML document through a URI. That URI consists of the URI for the register, followed by the codelist name, followed by the codelist value. An advantage of this approach is that the codelist value is resolvable to a definition of that value.
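
As a rough illustration of the URI composition described above (register URI, then codelist name, then codelist value), here is a small sketch. The "/" separator and the example register address are assumptions; the actual NGA URI layout may differ.

```python
from urllib.parse import quote


def codelist_value_uri(register_uri, codelist_name, value):
    """Compose register URI + codelist name + codelist value.

    The "/" separator is an assumption made for this sketch.
    """
    parts = [register_uri.rstrip("/"), quote(codelist_name, safe=""), quote(value, safe="")]
    return "/".join(parts)


# Hypothetical register address, for illustration only.
EXAMPLE = codelist_value_uri("https://example.org/register/", "RoleCode", "originator")
```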

@PeterParslow
Contributor

The European Commission's INSPIRE Code Lists are managed at https://inspire.ec.europa.eu/codelist towards a similar (the same?) result.

The preferred approach from the GML documents is an encoding like this:

Where the URI goes "straight to" the required code list value; the xlink:title provides a utility value for those parsers who do not want to dereference the URI. (Some schemas allow the element to also have text content carrying a national translation of the code list value). The definition can be retrieved in various formats by dereferencing the URI.

@cmheazel
Collaborator

@PeterParslow The registers are comparable. The difference is how the code list values are encoded. While INSPIRE uses xlinks, NGA uses gco:CodeListValue_Type. Here is an example (from gmd:CI_ResponsibleParty in ISO 19115):
<gmd:role>
    <gmd:CI_RoleCode codeList="http://api.nsgreg.nga.mil/codelist/RoleCode" codeListValue="originator"/>
</gmd:role>
Both approaches work. It would be best to pick one and use it consistently. Do the ISO schemas we use push us toward one over the other?

@PeterParslow
Contributor

@cmheazel : I have to ask "which ISO schemas?"

@3DXScape
Collaborator

Sorry for dropping the ball on codelists. Even though I was tied up in another SWG at the time, I did do some prototyping for the 21 October meeting. My suggestion was for

  1. an implementation-independent logical model
    CodelistModel

  2. a "neutral" archive format editable with basic tools, e.g. a JSON text file

  3. a web service that
    a. validates and loads an archive for a specified domain e.g. CityGML 3.0
    b. serves individual codelists in whatever (multiple) encodings we choose

  4. a static file store in an OGC domain with materialized versions of all registered codelists in all supported encodings

  5. a utility to periodically refresh the static file store

@clausnagel
Member

clausnagel commented Nov 18, 2021

All code list values are encoded using gml:CodeType. Thus, we can use the codeSpace attribute to reference a codelist, just as in INSPIRE and in previous versions of CityGML. We do not mandate how the codelist values shall be stored and managed. So, using a registry as NGA does and referencing this registry via a URI would be perfectly fine. If there is no registry (local, regional, nation-wide, global, or otherwise), then codelists can also be stored as files, and these files can be referenced via URIs. For this case, we want to define two encodings: a GML encoding and a CSV encoding. So, for instance, if users want to provide their codelists as GML files, they shall use the specified GML encoding (which is just the default GML 3.2.1 encoding for dictionaries). If they want to share them as Excel files, they need to make sure that consumers can read and understand them.

I think this approach is very flexible and lightweight, as we can neither manage nor define a global registry for all CityGML users and use cases. And it is very much in line with how CityGML 2.0 and 1.0 work.

@3DXScape
Collaborator

I have made some progress in prototyping this, using some 2.0 codelists as a starting point.
CityGML_3.0_CodelistDomainJson.txt
AuxiliaryTrafficAreaFunctionValueXML.txt
AuxiliaryTrafficAreaFunctionValueJSON.txt

@3DXScape
Collaborator

I am also happy with Claus' comment. See you soon.

@clausnagel
Member

Thanks for your thoughts, @3DXScape. I would welcome having a third predefined JSON encoding.

@TatjanaKutzner
Contributor Author

In our web meeting on 18 November 2021 we discussed that each code list should have the following metadata information

  • the data type to which the code list refers
  • XML namespace of the corresponding CityGML module in which the code list is defined. The CityGML 3 conceptual model currently does not define code lists with duplicate names in different modules. But this information will be helpful for ADEs that will be defined in the future and that might have duplicate names.
  • the language in which the descriptions of the code values are provided
  • the authority/issuer of the code list
  • the version of the code list, this could be a date or version number
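
The five metadata items listed above could be modelled roughly as follows. This is only a sketch: the class and field names are made up for illustration, and the helper shows the idea from the first comment that two language variants of one code list can be matched through their metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeListMetadata:
    """Hypothetical container for the metadata items discussed in the meeting."""

    data_type: str   # data type to which the code list refers
    namespace: str   # XML namespace of the CityGML module (or ADE)
    language: str    # language of the code descriptions
    authority: str   # authority/issuer of the code list
    version: str     # kept as text: may be a date or a version number

    def same_list_other_language(self, other):
        """True if both describe the same data type but in different languages."""
        same_type = (self.data_type, self.namespace) == (other.data_type, other.namespace)
        return same_type and self.language != other.language
```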

GML does not provide a content model for the metadata. We need to define our own UML class with the required metadata attributes and encode it in XML to be able to add the metadata to the GML code lists.
-> Tatjana will check how to do this with ShapeChange.

Furthermore, we discussed that we do not intend to specify the encoding of code lists in a normative way.
As in the CityGML 2.0 specification, we will rather include a chapter on the encoding of code lists that provides users and developers with examples on how code lists can be encoded in the formats GML, CSV, and JSON. These code lists can serve as templates for users who want to define their own code lists.

We will try to provide non-empty code lists for all code lists defined in the conceptual model. These code lists will only serve as examples, they will not provide exhaustive lists of possible code list values. These code lists may be used by others, but everybody is still free to create their own code lists.

Setting up a registry for the code lists and defining a REST API to access the code list values are tasks that our group cannot deal with as part of the GML encoding specification.

@TatjanaKutzner
Contributor Author

The results from the web meeting on 15 December 2021.

Metadata for code lists

We discussed the following slides on the encoding of the metadata information that is to be provided with each code list: https://github.com/opengeospatial/CityGML-3.0Encodings/blob/master/CityGML/Encoding%20Rules/Metadata%20for%20code%20lists.pdf

We agreed on making use of option 2, i.e. providing our own container element for the metadata information.

The XML schema for the metadata is available here:
https://github.com/opengeospatial/CityGML-3.0Encodings/blob/master/CityGML/Schema/codeListMetaData.xsd

Code list structure

We agreed on the following CityGML 3.0 codelist structure:

<gml:Dictionary gml:id="roofTypes">
    <gml:metaDataProperty>
        <cmd:CodeListMetaData>
            <cmd:dataType>RoofTypeValue</cmd:dataType>
            <cmd:namespace>http://www.opengis.net/citygml/building/3.0</cmd:namespace>
            <cmd:language>en</cmd:language>
            <cmd:authority>xyz</cmd:authority>
            <cmd:version>1.0</cmd:version>
        </cmd:CodeListMetaData>
    </gml:metaDataProperty>
    <gml:description>Roof type values</gml:description>
    <gml:identifier codeSpace="https://ogc.org/citygml/3.0/codelists/gml/rooftypes">RoofTypeValue</gml:identifier>
    <gml:dictionaryEntry>
        <gml:Definition gml:id="id1">
            <gml:description>roof primarily a single plane, not necessarily level</gml:description>
            <gml:identifier codeSpace="https://ogc.org/citygml/3.0/codelists/gml/rooftypes">1000</gml:identifier>
            <gml:name>flat roof</gml:name>
        </gml:Definition>
    </gml:dictionaryEntry>
    <gml:dictionaryEntry>
        <gml:Definition gml:id="id2">
            <gml:description>a roof that has a ridge and two gables</gml:description>
            <gml:identifier codeSpace="https://ogc.org/citygml/3.0/codelists/gml/rooftypes">3100</gml:identifier>
            <gml:name>saddle roof</gml:name>
        </gml:Definition>
    </gml:dictionaryEntry>
</gml:Dictionary>

For the individual code list entries we defined the following:

  • gml:description: can be used to provide a textual description of the code, is optional
  • gml:identifier: contains the code, the code is unique within the code list, the code can be a number or a string
  • gml:name: provides a human-readable representation of the code, is optional and should be provided when the code is given as a number
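
To illustrate how a consumer could read the entries of such a dictionary, here is a minimal sketch using Python's standard xml.etree.ElementTree. The GML 3.2 namespace URI is the standard one; the sample omits the metadata block, and the function name is made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {"gml": "http://www.opengis.net/gml/3.2"}

# Reduced version of the structure above (metadata block left out).
SAMPLE = """\
<gml:Dictionary xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="roofTypes">
  <gml:identifier codeSpace="https://ogc.org/citygml/3.0/codelists/gml/rooftypes">RoofTypeValue</gml:identifier>
  <gml:dictionaryEntry>
    <gml:Definition gml:id="id1">
      <gml:identifier codeSpace="https://ogc.org/citygml/3.0/codelists/gml/rooftypes">1000</gml:identifier>
      <gml:name>flat roof</gml:name>
    </gml:Definition>
  </gml:dictionaryEntry>
  <gml:dictionaryEntry>
    <gml:Definition gml:id="id2">
      <gml:identifier codeSpace="https://ogc.org/citygml/3.0/codelists/gml/rooftypes">3100</gml:identifier>
      <gml:name>saddle roof</gml:name>
    </gml:Definition>
  </gml:dictionaryEntry>
</gml:Dictionary>
"""


def read_codes(xml_text):
    """Return {code: name or None} for every entry of a gml:Dictionary."""
    root = ET.fromstring(xml_text)
    codes = {}
    for definition in root.findall("gml:dictionaryEntry/gml:Definition", NS):
        code = definition.findtext("gml:identifier", namespaces=NS)
        if code is None:
            continue  # entry without identifier: nothing to index
        name = definition.findtext("gml:name", namespaces=NS)
        codes[code.strip()] = name.strip() if name else None  # gml:name is optional
    return codes
```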

@TatjanaKutzner
Contributor Author

@3DXScape I had a look at the chapter on ADEs and have some comments and questions.

  • What is the purpose of providing an XML schema for the code lists? As far as I understood from our discussions, we intend to use GML Dictionaries for representing code lists. And for GML Dictionaries an XML schema already exists. It's not mentioned in the chapter that we use the predefined GML Dictionaries.
  • Is the intention of the XML schema that users can validate their code lists instances against the schema?
  • Why a new element <Codelist> as root element and not the existing GML element <gml:Dictionary>?
  • The elements <gml:identifier> and <gml:name> for the individual code list entries are currently defined as mandatory in the XML schema. We once discussed that only <gml:identifier> should be mandatory and <gml:name> optional. <gml:identifier> does not necessarily need to contain a code value, but can instead also contain a text value and in that case it is not required to provide the text value in the <name> element as well. The same for the <gml:description> element, this should also be optional. See also my comment above from December.
  • Also, we have created an XML schema for the metadata. This schema is not mentioned in the chapter. The metadata elements have now become part of the new XML schema for code lists. And a new <Metadata> element is introduced that will contain the metadata information. Does that mean that we do not make use of the <gml:metaDataProperty> element any more in the code list GML instances? The instance provided in the chapter, however, still uses the <gml:metaDataProperty> element.

Regarding the metadata elements:

  • Should the metadata elements be briefly explained? I think it would make sense.
  • The metadata element <version> was defined as xs:string in the metadata XML schema, because we said that the value could be a date or version number. Now it's defined as xs:decimal.
  • The metadata element <namespace> was defined as xs:anyURI in the metadata XML schema, now it's defined as xs:string.

I noted down several things from the meeting where we discussed code lists. Would it make sense to provide these things as recommendations?

  • The dictionary must always have a gml:id
  • The <gml:identifier> element contains the code and is mandatory, the <gml:name> element contains a human-readable representation of the code. But the gml:name element is only required, when the code is a numeric code.
  • The gml:description element can be used for additional descriptions of the code, but is optional.
  • The same code space should always be used for the whole dictionary and the individual entries.
  • Code lists can be extended by referencing external code lists.
  • In the GML instance document, the codespace links to a code list file or API.
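
Some of the points above lend themselves to an automated check. The following is a sketch only, covering the gml:id, identifier, name, and code space points; the function name and the message texts are made up, and "numeric code" is interpreted here as "digits only".

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"
NS = {"gml": GML}


def check_dictionary(xml_text):
    """Return a list of problems found in a gml:Dictionary code list."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.get(f"{{{GML}}}id") is None:
        problems.append("dictionary has no gml:id")
    top = root.find("gml:identifier", NS)
    expected_space = top.get("codeSpace") if top is not None else None
    for definition in root.findall("gml:dictionaryEntry/gml:Definition", NS):
        ident = definition.find("gml:identifier", NS)
        if ident is None or not (ident.text or "").strip():
            problems.append("entry without gml:identifier")
            continue
        code = ident.text.strip()
        # The same code space should be used for the dictionary and its entries.
        if ident.get("codeSpace") != expected_space:
            problems.append(f"{code}: code space differs from the dictionary")
        # gml:name is only expected when the code is numeric.
        if code.isdigit() and definition.find("gml:name", NS) is None:
            problems.append(f"{code}: numeric code without gml:name")
    return problems
```

An empty list means none of the checked points was violated; it does not replace validation against the GML schema.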

The CSV example is completely different from what @thomashkolbe and I suggested some time ago.
We sent this by e-mail some time ago. It's this file:
Codelist_BuildingClassValue_en.csv
We probably should discuss this further.

@3DXScape
Collaborator

3DXScape commented Mar 4, 2022

I will realign the chapter as you describe. We should review the results at the next meeting.

@3DXScape
Collaborator

3DXScape commented Mar 4, 2022

What is the purpose of providing an XML schema for the code lists? As far as I understood from our discussions, we intend to use GML Dictionaries for representing code lists. And for GML Dictionaries an XML schema already exists. It's not mentioned in the chapter that we use the predefined GML Dictionaries.

Good point. I ignored the details of the discussion. I originally wanted an XML encoding not tied to external schemas. I will use the GML Dictionary as it is in 3.2.1.

Is the intention of the XML schema that users can validate their code lists instances against the schema?

No. It was intended to be informative for consumers of our example codelists. Most real world codelists defined by government agencies or other communities will not follow our choice.

Why a new element <Codelist> as root element and not the existing GML element <gml:Dictionary>?

See above for the reason. I am happy to explicitly use the gml:Dictionary structure.

The elements <gml:identifier> and <gml:name> for the individual code list entries are currently defined as mandatory in the XML schema. We once discussed that only <gml:identifier> should be mandatory and <gml:name> optional. <gml:identifier> does not necessarily need to contain a code value, but can instead also contain a text value and in that case it is not required to provide the text value in the <name> element as well. The same for the <gml:description> element, this should also be optional. See also my comment above from December.

I do not see a difference between a "code" value and a "text" value. Again, I have no problem using the name field and the description field for essentially the same purpose if it aligns better.

Also, we have created an XML schema for the metadata. This schema is not mentioned in the chapter. The metadata elements have now become part of the new XML schema for code lists. And a new <Metadata> element is introduced that will contain the metadata information. Does that mean that we do not make use of the <gml:metaDataProperty> element any more in the code list GML instances? The instance provided in the chapter, however, still uses the <gml:metaDataProperty> element.

Will change.

Regarding the metadata elements:
Should the metadata elements be briefly explained? I think it would make sense.
The metadata element <version> was defined as xs:string in the metadata XML schema, because we said that the value could be a date or version number. Now it's defined as xs:decimal.

Yes.

The metadata element <namespace> was defined as xs:anyURI in the metadata XML schema, now it's defined as xs:string.
I noted down several things from the meeting where we discussed code lists. Would it make sense to provide these things as recommendations?

Yes.

The dictionary must always have a gml:id
The gml:identifier element contains the code and is mandatory, the gml:name element contains a human-readable representation of the code. But the gml:name element is only required, when the code is a numeric code.

I think it makes sense to just make the name element optional. An identifier can be opaque even if it is not a number.

I think the discussion was that a name is only helpful if the identifier is some opaque value, such as a number. My opinion is that the description is the best place to explain opaque codes. I will agree with having an optional name, since we have legacy codelist entries with both codes and names.

The gml:description element can be used for additional descriptions of the code, but is optional.
Always the same code space should be used for the whole dictionary and the individual entries.
Code lists can be extended by referencing external code lists.
In the GML instance document, the codespace links to a code list file or API.

OK.

@PeterParslow
Contributor

Do we intend to include anything about how we recommend/expect the code lists to be referenced?

gml:CodeType in GML 3.2.1 and 3.2.2 primarily expects e.g. <gml:name codeSpace="http://www.ukusa.gov/placenames">St Paul</gml:name>, but the codeSpace is optional.

Should we say (somewhere!) that codeSpace is required?

GML 3.3 then deprecates this ("The use of CodeType to reference code list entries is deprecated", which has always been slightly odd in an 'extension'!)

So in that sense, the CityGML schemas do not conform to GML 3.3.

@3DXScape
Collaborator

3DXScape commented Apr 7, 2022

Good point. I see that most (all?) of the examples in the repo and in the document do not have a codeSpace attribute. The attribute seems logically necessary in order to interpret or validate the code value. One way to solve the GML issues is to change the mapping of CodeList stereotyped types from gml:CodeType to the Core type "Code", which is in the CM and seems to be a clone of CodeType but not in the GML namespace. (Maybe I am missing something in the history of Code??)

Optionality would still be an issue. We add a requirement that restricts the allowed multiplicity to 1 or to 1..* That would support testing whether a codelist value was indeed valid in the specified codelist. It might also give access to other information specific to an external codelist that might enhance the user experience, e.g. a plaintext alias or description for the code values, all outside the scope of our standard.

The downside of removing optionality would be breaking of backwards compatibility.

@PeterParslow
Contributor

"We add a requirement that restricts the allowed multiplicity to 1 or to 1..*" - better be "1", because XML doesn't allow an element to have more than one attribute with the same name, so an element of type gml:CodeType can't reference more than one codeSpace.

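The point about duplicate attribute names can be checked with any XML parser. A small sketch with Python's standard library (the element name and the example code space URNs are placeholders):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One codeSpace attribute: fine.
ONE = '<function codeSpace="urn:example:a">hospital</function>'
# Two codeSpace attributes on the same element: rejected by the parser.
TWO = '<function codeSpace="urn:example:a" codeSpace="urn:example:b">hospital</function>'
```

So a multiplicity of 1..* for codeSpace could not be expressed on a single element of type gml:CodeType.
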
@clausnagel
Member

clausnagel commented Apr 9, 2022

The downside of making the XML attribute codeSpace mandatory is that you always have to have a codelist. Remember that the CM does not define codelists. And then please check how many thematic attributes in the GML encoding are of type gml:CodeType. You will not be able to use one of those attributes without defining a codelist beforehand.

A (possibly undocumented?) idea of CityGML 2.0 is that you can simply store a clear-text string as value of a gml:CodeType attribute without the need for referencing a codelist via codeSpace. For example, you could simply store the value "hospital" as bldg:function of a bldg:Building element. I think this should still be possible in CityGML 3.0.

@thomashkolbe

I agree with Claus' comment.

@3DXScape
Collaborator

3DXScape commented Apr 9, 2022

That is two votes for leaving the codeSpace attribute optional. If we do that, then we should give some more definition to the use of the attribute when it is present. Here's my suggestion:

A gml:CodeValue datatype may have an optional codeSpace attribute. The purpose of the codeSpace attribute is to identify the codelist from which the value is taken. An authority sponsoring the identified codelist may have rules governing the use and meaning of the codelist's values and metadata. The authority may also have rules defining such things as the use and interpretation of values that do not appear in the codelist, the duplication of values, or any other characteristics of a codelist. All such properties and uses of a codelist or codelist value are outside the scope of this CityGML 3.0 Encoding Specification. This does not in any way restrict the ability of an authority sponsoring a codelist, as denoted by the use of a codeSpace attribute, to have application-specific codelist values, rules, interpretations, and uses specified in an independent specification.

@thomashkolbe

This suggestion looks good to me!

@PeterParslow
Contributor

I think it would be helpful to include something in the note about the benefit to users of having the codeSpace attribute populated, such as:

"Providing the codeSpace, especially when you use a dereferenceable identifier, should allow the user easy access to further information about the value, such as more detail about what it means."

@3DXScape
Collaborator

Incorporating that suggestion and tidying up the language a bit:

A gml:CodeValue datatype may have an optional codeSpace attribute. The purpose of the codeSpace attribute is to identify the codelist from which the value has been taken. Providing the codeSpace, especially with a dereferenceable identifier, allows easy access to further information detailing rules governing the use and meaning of the codelist's values and metadata. There may also be rules defining such things as the use and interpretation of values that do not appear in the codelist, the duplication of values, or any other characteristics of a codelist. All such properties and uses of a codelist or codelist value are outside the scope of this CityGML 3.0 Encoding Specification. This does not in any way restrict the ability of an authority sponsoring a codelist, as denoted by the use of a codeSpace attribute, to have application-specific codelist values, rules, interpretations, and uses in an independent specification.

@3DXScape
Collaborator

The language of my previous text, that is!

@TatjanaKutzner
Contributor Author

I like the text.

@clausnagel
Member

Also looks good to me. Thanks, @3DXScape and @PeterParslow

@3DXScape
Collaborator

3DXScape commented Jun 6, 2022

A question about Claus' comment of April 9: since the gml:CodeType codeSpace attribute is optional, can we have the following requirements:

  1. if the codeSpace attribute is present and resolves to a valid remote resource, then the content of the gml:CodeType element SHALL match the gml:identifier property of some entry in a codelist encoded as a gml:Dictionary, or it SHALL match some codelist value in the codelist if the encoding is not gml:Dictionary, and
  2. if the codeSpace attribute is absent or does not resolve to a valid remote resource, then the content of the gml:CodeType element SHALL be a non-empty string.

This seems necessary to me, since a gml:CodeType value that does not occur in the codelist referenced via its codeSpace attribute seems pointless.
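
The two proposed requirements could be tested along these lines. This is a sketch under assumptions: `resolve` is a hypothetical callable that returns the set of codes behind a codeSpace URI, or None when the URI does not resolve; how the codes are fetched and decoded (GML, CSV, JSON) is left out.

```python
def check_code_value(value, codespace=None, resolve=None):
    """Check a gml:CodeType value against the two proposed requirements.

    resolve: hypothetical callable, codeSpace URI -> set of codes, or None
             if the URI does not resolve to a valid remote resource.
    """
    codes = resolve(codespace) if (codespace and resolve) else None
    if codes is not None:
        # Requirement 1: the value must be one of the codes in the codelist.
        return value in codes
    # Requirement 2: no usable codelist, so any non-empty string is accepted.
    return bool(value and value.strip())


# Stand-in for a real lookup: a fixed mapping from codeSpace URI to codes.
KNOWN = {"https://ogc.org/citygml/3.0/codelists/gml/rooftypes": {"1000", "3100"}}
```

With `resolve=KNOWN.get`, the value "1000" passes for the roof type code space and "9999" fails, while a clear-text value such as "hospital" without any codeSpace is accepted, which keeps the CityGML 2.0 behaviour Claus described.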


6 participants