From d4d340730ffdbfd856ded8dba752e5e309af46c9 Mon Sep 17 00:00:00 2001
From: stephanef
Date: Wed, 17 Jan 2024 17:00:08 -0500
Subject: [PATCH] Added guideline for creating distribution

---
 docs/index.html | 1313 +++++++++++++++++++++++++----------------------
 1 file changed, 694 insertions(+), 619 deletions(-)

diff --git a/docs/index.html b/docs/index.html
index aa99eca..8fd25ab 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -20763,33 +20763,83 @@

Distribution Metadata

primary function: efficient and reliable data delivery.


Guidelines for Creating DCAT Distributions


The following guidelines are designed to help determine the most effective way to structure DCAT distributions, whether as a single file, a multi-file package, or multiple distributions. The choice depends on the dataset's characteristics, user needs, and the data's intended use. Consider these guidelines to ensure your distributions are user-friendly, accessible, and align with best practices in data management.


When selecting a distribution format, it is important to consider factors such as the interdependence of files, the ease of user accessibility, the size and downloadability of the data, the frequency of updates, and the diversity of formats required. A thoughtful approach to these criteria will help in creating a distribution strategy that is both practical for data providers and beneficial for end-users, enhancing the overall effectiveness of data sharing and utilization.
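To make these considerations concrete, the following non-normative JSON-LD sketch describes one dataset offered both as a single CSV file and as a ZIP package of interdependent files. All identifiers, titles, and URLs are hypothetical placeholders, and the properties follow the standard DCAT vocabulary rather than any normative DCAT-US template:

  {
    "@context": {
      "dcat": "http://www.w3.org/ns/dcat#",
      "dct": "http://purl.org/dc/terms/"
    },
    "@id": "https://example.gov/dataset/bus-stops",
    "@type": "dcat:Dataset",
    "dct:title": "City Bus Stops",
    "dcat:distribution": [
      {
        "@id": "https://example.gov/dataset/bus-stops/csv",
        "@type": "dcat:Distribution",
        "dct:description": "Single-file distribution for direct download and immediate use",
        "dcat:mediaType": { "@id": "https://www.iana.org/assignments/media-types/text/csv" },
        "dcat:downloadURL": { "@id": "https://example.gov/files/bus-stops.csv" }
      },
      {
        "@id": "https://example.gov/dataset/bus-stops/zip",
        "@type": "dcat:Distribution",
        "dct:description": "Multi-file package for users who need the complete set of interdependent files",
        "dcat:packageFormat": { "@id": "http://publications.europa.eu/resource/authority/file-type/ZIP" },
        "dcat:downloadURL": { "@id": "https://example.gov/files/bus-stops.zip" }
      }
    ]
  }

Giving each distribution its own description signals its intended use, so consumers can choose the form that fits their needs.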


File-centric Properties

This section focuses on the properties central to the file-centric aspects of dcat:Distribution. These properties are crucial for ensuring datasets are accessible and usable in their practical forms, addressing the aspects of data encoding, structure, packaging, presentation, media type, and language.

@@ -21094,625 +21161,633 @@

Distribution Metadata

Example (packaging format): ./examples/distribution/packaging-format.jsonld


Data Quality

The quality of a dataset plays a pivotal role in shaping trust, reusability, and the overall performance of applications that rely on it. As a result, it is imperative to integrate data quality information seamlessly into both the data publishing and consumption processes. This inclusion allows for a thorough evaluation of a dataset's quality, thereby determining its suitability for a particular application.

Thorough documentation of data quality significantly streamlines the dataset selection process, enhancing the likelihood of reuse. Regardless of domain-specific nuances, documenting data quality and explicitly stating known quality issues in metadata are fundamental practices. Typically, assessing quality involves multiple dimensions, each encapsulating characteristics of importance to both data publishers and consumers.

The Data Quality Vocabulary (DQV) defines machine-readable concepts such as measurements and criteria to assess quality across various dimensions [[VOCAB-DQV]]. Tailored heuristics designed for specific assessment scenarios rely on quality indicators, which encompass data content, metadata, and human ratings. These indicators offer valuable insights into the dataset's suitability for its intended purpose.


In the context of integrating data quality information into DCAT resources (Dataset, Distribution, Data Service, Dataset Series), the Data Quality Vocabulary [[VOCAB-DQV]] provides a structured and standardized way to represent and assess quality information for fitness of use. The key components of DQV relevant to this discussion are dqv:QualityMeasurement, dqv:Metric, dqv:Dimension, and the property dqv:hasQualityMeasurement. Here's how each of these elements is used:
    +
  • dqv:QualityMeasurement: This class represents a specific measurement or assessment of quality. It's a quantifiable value that indicates how well a dataset performs against a particular quality metric. A dqv:QualityMeasurement instance is associated with a specific dataset and linked to the metric it measures.
  • dqv:Metric: This class represents the standard or criterion used to assess a particular aspect of quality. Metrics are the yardsticks against which quality is evaluated. Each metric is typically associated with a quality dimension. For example, a metric could measure the accuracy of data, its timeliness, or its completeness.
  • dqv:Dimension: This class represents the various dimensions or categories of data quality, such as accuracy, timeliness, or completeness. Quality dimensions help categorize different aspects of data quality, providing a framework for comprehensive assessment.
  • dqv:hasQualityMeasurement: This property is used to link a resource to a dqv:QualityMeasurement. It indicates that the dataset has been evaluated in terms of quality and specifies the measurement. This linkage is crucial for conveying the results of quality assessments to data consumers, enabling them to understand the quality aspects that have been measured and the outcomes of those measurements.

Using these DQV elements, data publishers can document the quality of their datasets in a structured and meaningful way. This documentation includes specific measurements of quality, the criteria used for these assessments, and the quality dimensions they relate to. The use of DQV thus enhances transparency and helps data consumers make informed decisions about the suitability of a dataset for their specific needs.
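As a non-normative illustration of these four elements working together, the JSON-LD sketch below attaches a completeness measurement to a distribution. The distribution, measurement, metric, and dimension IRIs are hypothetical placeholders; in practice, the metric and dimension would come from a community-agreed vocabulary:

  {
    "@context": {
      "dcat": "http://www.w3.org/ns/dcat#",
      "dqv": "http://www.w3.org/ns/dqv#",
      "skos": "http://www.w3.org/2004/02/skos/core#",
      "xsd": "http://www.w3.org/2001/XMLSchema#"
    },
    "@graph": [
      {
        "@id": "https://example.gov/dataset/bus-stops/csv",
        "@type": "dcat:Distribution",
        "dqv:hasQualityMeasurement": { "@id": "https://example.gov/quality/measurement-001" }
      },
      {
        "@id": "https://example.gov/quality/measurement-001",
        "@type": "dqv:QualityMeasurement",
        "dqv:isMeasurementOf": { "@id": "https://example.gov/quality/completenessRatio" },
        "dqv:value": { "@value": "0.98", "@type": "xsd:double" }
      },
      {
        "@id": "https://example.gov/quality/completenessRatio",
        "@type": "dqv:Metric",
        "skos:definition": "Ratio of populated mandatory fields to the total number of mandatory fields.",
        "dqv:inDimension": { "@id": "https://example.gov/quality/completeness" }
      },
      {
        "@id": "https://example.gov/quality/completeness",
        "@type": "dqv:Dimension",
        "skos:prefLabel": "Completeness"
      }
    ]
  }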

-

DCAT relies on established vocabularies, including the versioning section of the PAV ontology and terms from - [[PAV]], [[DCTERMS]], [[OWL2-OVERVIEW]], and [[VOCAB-ADMS]].

+

+ The use of shareable controlled vocabularies for dqv:Metric + and dqv:Dimension is highly + encouraged + within communities. These standardized vocabularies facilitate consistent and precise communication of data + quality aspects across different datasets and applications. By adopting such vocabularies, communities can + ensure that their data quality metrics and dimensions are universally understood, enhancing interoperability + and + the effective use of data across diverse systems and contexts. +

-

It's important to note that versioning applies to all primary DCAT-US resources, including Catalogs, Catalog - Records, Datasets, Dataset Series, and Distributions.

-

The versioning approach within DCAT is designed to complement existing methods specific to certain resources - (such as versioning properties for ontologies in [[OWL2-OVERVIEW]]) and those prevalent in particular domains. - A - detailed comparison with other vocabularies can be found in section 11.4: Complementary Approaches to - Versioning. -

+
+ -

Versioning is closely linked to community conventions, data management strategies, and existing processes. - Data - providers bear the responsibility of determining when and why a new version should be released.

- -

Handling Dataset Changes

-

Datasets published on the Web are subject to change over time. Some datasets are updated on a regular - schedule, - while others evolve as improvements in data collection methods make updates necessary. To manage these changes - effectively, new versions of a dataset may be created. However, there isn't a unanimous consensus on when - changes - to a dataset should categorize it as an entirely new dataset or simply a new version. Below, we outline - scenarios - where most publishers would agree that a revision warrants consideration as a new version:

- -

Scenarios:

-
    -
  1. - Scenario 1: Creation of a new bus stop that needs to be added to the dataset. -
  2. -
  3. - Scenario 2: Removal of an existing bus stop, necessitating its deletion from the dataset. -
  4. -
  5. - Scenario 3: Identification and correction of an error in one of the existing bus stops - stored - in the dataset. -
  6. -
+
+

Versioning


Versioning is a concept used to describe the relationship between an original resource and its variations, updates, or translations. In this section, we explore how DCAT (Data Catalog Vocabulary) is employed to document versions resulting from updates or modifications throughout a resource's lifecycle.

DCAT relies on established vocabularies, including the versioning section of the PAV ontology and terms from [[PAV]], [[DCTERMS]], [[OWL2-OVERVIEW]], and [[VOCAB-ADMS]].

It's important to note that versioning applies to all primary DCAT-US resources, including Catalogs, Catalog Records, Datasets, Dataset Series, and Distributions.

The versioning approach within DCAT is designed to complement existing methods specific to certain resources (such as versioning properties for ontologies in [[OWL2-OVERVIEW]]) and those prevalent in particular domains. A detailed comparison with other vocabularies can be found in section 11.4: Complementary Approaches to Versioning.

Versioning is closely linked to community conventions, data management strategies, and existing processes. Data providers bear the responsibility of determining when and why a new version should be released.

Handling Dataset Changes

Datasets published on the Web are subject to change over time. Some datasets are updated on a regular schedule, while others evolve as improvements in data collection methods make updates necessary. To manage these changes effectively, new versions of a dataset may be created. However, there isn't a unanimous consensus on when changes to a dataset should categorize it as an entirely new dataset or simply a new version. Below, we outline scenarios where most publishers would agree that a revision warrants consideration as a new version:

Scenarios:

  1. Scenario 1: Creation of a new bus stop that needs to be added to the dataset.
  2. Scenario 2: Removal of an existing bus stop, necessitating its deletion from the dataset.
  3. Scenario 3: Identification and correction of an error in one of the existing bus stops stored in the dataset.

In general, when dealing with datasets that represent time series or spatial series, such as data for different regions or years, these are not typically regarded as multiple versions of the same dataset. Instead, each dataset covers a distinct set of observations about the world and should be treated as a new dataset. This principle also applies to datasets collecting data about weekly weather forecasts for a specific city, where a new dataset is created each week to store data for that particular week.

Scenarios 1 and 2 may trigger major version updates, while Scenario 3 is likely to trigger only a minor version update. However, the distinction between minor and major versions is less critical than ensuring that any changes are clearly indicated by incrementing the version number. Even for minor changes, maintaining a record of different dataset versions is essential for ensuring the dataset's reliability. Publishers should be mindful that a dataset may be in use by one or more data consumers, and they should take reasonable steps to inform those consumers when a new version is released. For real-time data, an automated timestamp can serve as a version identifier. It's crucial for publishers to adopt a consistent and informative approach to versioning for each dataset, ensuring that data consumers can effectively understand and work with evolving data.
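A non-normative sketch (with hypothetical identifiers) of how a new version can point back to its predecessor and record what changed, using the dcat:version, dcat:previousVersion, and adms:versionNotes properties from the vocabularies cited above:

  {
    "@context": {
      "dcat": "http://www.w3.org/ns/dcat#",
      "dct": "http://purl.org/dc/terms/",
      "adms": "http://www.w3.org/ns/adms#",
      "xsd": "http://www.w3.org/2001/XMLSchema#"
    },
    "@id": "https://example.gov/dataset/bus-stops/2.0.0",
    "@type": "dcat:Dataset",
    "dct:title": "City Bus Stops",
    "dcat:version": "2.0.0",
    "adms:versionNotes": "Major update: added newly created bus stops and removed decommissioned ones (Scenarios 1 and 2).",
    "dcat:previousVersion": { "@id": "https://example.gov/dataset/bus-stops/1.1.0" },
    "dct:issued": { "@value": "2024-01-17", "@type": "xsd:date" }
  }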



Dataset Series

A Dataset Series is a collection of related datasets that share common characteristics, making them part of a cohesive group. This section provides guidance on the effective use of Dataset Series within data catalogs, emphasizing the benefits and considerations for publishers and users alike.

A Dataset Series is a way for publishers to convey that a dataset is evolving across specific dimensions and is available as a set of related datasets. However, choosing to group datasets this way depends on the use case. Since it demands extra metadata management from the publisher, it's optional. For instance, a dataset updated frequently via an API may not require individual records for each yearly snapshot unless the publisher wishes to share each snapshot's lifecycle.

Why Use Dataset Series?

Implementing Dataset Series offers several advantages:

  • Organizational Clarity: Helps categorize and group datasets, making it easier for users to find and navigate related sets of data.
  • Efficient Data Management: Streamlines the management of multiple datasets, providing a structured approach for updates and maintenance.
  • User Experience: Enhances data discoverability and understanding, as users can perceive the broader context of individual datasets within a collective series.

Guidelines for Implementing Dataset Series

When using Dataset Series, consider the following best practices:

    -
  • Initiate a Dataset Series exclusively for managing multiple, interconnected datasets, ensuring each dataset is significant independently and contributes to the series' overall narrative.
  • Maintain up-to-date metadata for the Dataset Series, reflecting any addition or removal of datasets. Consider discontinuing the series if it no longer contains any datasets, particularly when persistent identifiers are employed.
  • Refrain from categorizing a single, frequently updated dataset as a Dataset Series, and avoid associating distributions directly with a series. Distributions pertain to individual datasets within the series.
  • Ensure a coherent and strong thematic or contextual connection among the members of a Dataset Series, defined by shared attributes such as topic, time frame, or publisher, among others.
  • Uphold high-quality metadata standards for both individual datasets and the Dataset Series, with specific series guidelines superseding general practices where necessary.
Expressing Relationships and Connections

Articulating the interconnections between datasets in a series is crucial for user understanding and data management:

    -
  • Employ consistent metadata descriptors to clarify the relationships and commonalities within the series.
  • Utilize versioning for datasets that evolve or expand over time, helping users track changes and understand the dataset's history.
  • Highlight the distinct features of each dataset, ensuring its standalone value is clear, while also emphasizing its role in the broader series.
  • For more complex relationships, especially in automated or tightly interconnected collections, leverage specific DCAT properties (e.g., dcat:inSeries, dcat:first, dcat:prev, dcat:last) to express the nuanced connections, as sketched below. Refer to the DCAT versioning guidelines for detailed practices.
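The following non-normative sketch (with hypothetical identifiers) shows a weekly dataset declaring its membership in a series with dcat:inSeries and linking to the preceding week with dcat:prev:

  {
    "@context": {
      "dcat": "http://www.w3.org/ns/dcat#",
      "dct": "http://purl.org/dc/terms/"
    },
    "@graph": [
      {
        "@id": "https://example.gov/series/weekly-weather-forecasts",
        "@type": "dcat:DatasetSeries",
        "dct:title": "Weekly Weather Forecasts"
      },
      {
        "@id": "https://example.gov/dataset/weather-forecast-2024-W03",
        "@type": "dcat:Dataset",
        "dct:title": "Weather Forecast, Week 3 of 2024",
        "dcat:inSeries": { "@id": "https://example.gov/series/weekly-weather-forecasts" },
        "dcat:prev": { "@id": "https://example.gov/dataset/weather-forecast-2024-W02" }
      }
    ]
  }

Consistent with the guidelines above, distributions would be attached to the member datasets rather than to the series itself.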
+

Impact on Metadata

Being part of a Dataset Series may necessitate specific metadata considerations:

  • Adjust metadata to emphasize the unique aspects of each dataset within the series, such as different time periods, geographical areas, or methodologies.
  • Ensure that metadata reflects the cohesive nature of the series, helping users understand the context and relationship between individual datasets.
-

Controlled Vocabularies


Importance of Controlled Vocabularies

Controlled vocabularies are predetermined sets of terms that have been carefully curated to ensure consistency, accuracy, and standardized representation of concepts within a specific domain. In the context of DCAT-US, controlled vocabularies are used to define and constrain the values of specific metadata elements. These vocabularies enable the creation of a common language for describing datasets, facilitating data integration and harmonization across different repositories.

The use of controlled vocabularies in DCAT-US offers several key benefits:

  • Consistency: By providing a predefined list of terms, controlled vocabularies ensure consistent representation and labeling of metadata elements. This consistency promotes data interoperability and simplifies data integration efforts, as different datasets can be mapped to a shared set of controlled terms.
  • Enhanced search and discovery: Controlled vocabularies enable more effective search and discovery of datasets. By aligning metadata elements with standardized terms, users can easily navigate and explore datasets based on their specific domain knowledge. Furthermore, controlled vocabularies facilitate the development of advanced search capabilities, such as faceted search, which allows users to refine search results based on predefined categories or facets.
  • Data harmonization: In a diverse data landscape where multiple agencies and organizations produce and manage datasets, controlled vocabularies help in harmonizing the data representation. By agreeing on a set of controlled terms, data publishers can ensure that similar concepts are represented consistently across different datasets. This harmonization promotes data integration and interoperability, enabling meaningful analysis and comparison of data from various sources.
+ +
+ +

Requirements for controlled vocabularies

The following is a list of requirements that were identified for the controlled vocabularies to be recommended in this Application Profile.


Controlled vocabularies SHOULD:

  • Be published under an open license.
  • Be operated and/or maintained by an agency of the US Government, by a recognised standards organization, or another trusted organization.
  • Be properly documented.
  • Have labels in English, and optionally in Spanish.
  • Contain a relatively small number of terms (e.g., 10-25) that are general enough to enable a wide range of resources to be classified.
  • Have terms that are identified by URIs, with each URI resolving to documentation about the term.
  • Have associated persistence and versioning policies.
+ +

These criteria do not intend to define a set of requirements for controlled vocabularies in + general; they are + only intended to be used for the selection of the controlled vocabularies that are proposed for + this Application + Profile.

+ +

Controlled vocabularies to be used

In the table below, a number of properties are listed with controlled vocabularies that MUST be used for the listed properties. The declaration of the following controlled vocabularies as mandatory ensures a minimum level of interoperability.

Compared with [[DCAT-AP-20200608]], DCAT-US makes use of additional controlled vocabularies mandated by [[DATA-GOV-REG]] and operated by the Data.gov Registry, with the only exception of the coordinate reference systems register maintained by OGC [[OGC-EPSG]].

For two of these controlled vocabularies, namely the NGDA spatial data themes [[NGDA-THEMES]] and the ISO topic categories [[ISO-19115-1]], the DCAT-US Working Group has defined a set of harmonised mappings to the Data.gov Vocabularies Data Themes [[DATA-GOV-THEME]], in order to facilitate the identification of the relevant theme in [[DATA-GOV-THEME]] for geospatial/statistical metadata.
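As a non-normative illustration of referencing a term from a controlled vocabulary, the sketch below assigns a theme to a dataset. The theme IRI and label are hypothetical placeholders, not actual entries in the Data.gov registry:

  {
    "@context": {
      "dcat": "http://www.w3.org/ns/dcat#",
      "skos": "http://www.w3.org/2004/02/skos/core#"
    },
    "@id": "https://example.gov/dataset/bus-stops",
    "@type": "dcat:Dataset",
    "dcat:theme": {
      "@id": "https://example.gov/vocab/theme/transportation",
      "@type": "skos:Concept",
      "skos:prefLabel": "Transportation"
    }
  }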


Other controlled vocabularies

In addition to the proposed common vocabularies above, which are mandatory to ensure minimal interoperability, implementers are encouraged to publish and to use further region- or domain-specific vocabularies that are available online. While those may not be recognised by general implementations of the Application Profile, they may serve to increase interoperability across applications in the same region or domain. Examples are the full set of concepts in GCMD [[GCMD]], and numerous other schemes.

For geospatial metadata, the working group has identified additional vocabularies for geographic identifiers and for keywords (with controlled vocabularies).


JSON-LD context file

One common technical question is the format in which the data is being exchanged. For DCAT-US 3.0 conformance, it is not mandatory that this happens in an RDF serialisation, but the exchanged format SHOULD be unambiguously transformable into RDF. For JSON, a popular format to exchange data between systems, the DCAT-US profile provides a JSON-LD context file. JSON-LD is a W3C Recommendation [[[json-ld11]]] that provides a standard approach to interpret JSON structures as RDF. The provided JSON-LD context file can be used by implementers to base their data exchange upon, and so create a DCAT-US conformant data exchange. This JSON-LD context is not normative, i.e. other JSON-LD contexts are allowed to create a conformant DCAT-US data exchange. The JSON-LD context file is downloadable here.
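To illustrate the mechanism with a minimal, hand-written context (not the DCAT-US context file itself), the sketch below maps plain JSON keys to IRIs so that an ordinary-looking JSON document can be interpreted as RDF; the dataset and file URLs are hypothetical:

  {
    "@context": {
      "title": "http://purl.org/dc/terms/title",
      "distribution": {
        "@id": "http://www.w3.org/ns/dcat#distribution",
        "@type": "@id"
      }
    },
    "@id": "https://example.gov/dataset/bus-stops",
    "title": "City Bus Stops",
    "distribution": "https://example.gov/dataset/bus-stops/csv"
  }

Expanded against this context, the familiar keys title and distribution resolve to the dct:title and dcat:distribution IRIs, yielding RDF triples without changing how the JSON is produced or consumed.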


JSON Schemas

One common technical question is the format in which the data is being exchanged. For DCAT-US 3.0 conformance, it is not mandatory that this happens in an RDF serialisation, but the exchanged format SHOULD be unambiguously transformable into RDF.

+

For JSON, which is a widely adopted format for data exchange between systems, the DCAT-US profile offers an + informative JSON Schema. This schema aids in understanding the structure expected for DCAT-US compliant data + exchanges in JSON format.

- -
+

+ JSON Schema offers a compact way to describe and validate the structure and content of JSON data, ensuring + specific formatting and value constraints. However, it's more limited than JSON-LD context and RDF serialization + due to its focus on structure over meaning. +

-

SHACL Validation

+

+ JSON Schema's focus on structural validation forms a contrast with JSON-LD and RDF's capabilities. JSON-LD and + RDF + go beyond just validation, allowing the creation of a graph of interconnected entities that can be easily + integrated and reused across various contexts. This interconnectedness is fundamental to the concept of the + semantic web, where data is not only readable but also comprehensible to machines. +

Specifically, JSON-LD facilitates the representation of data as a graph, making it suitable for more complex, interlinked data representations, which is a cornerstone of linked data systems. This graph-based approach stands in contrast to the tree-like structures that JSON Schema is confined to, limiting its utility in scenarios requiring extensive data interconnectivity and reusability.

Implementers can use the provided JSON Schema for their data exchanges, aligning with DCAT-US standards. However, it's non-normative, meaning alternatives creating compliant exchanges are also valid. Download the current JSON Schema here.
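For illustration only (the fragment below is a hand-written sketch, not the DCAT-US JSON Schema), a JSON Schema can express structural constraints such as requiring a title on a dataset object and a download URL on each of its distributions:

  {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["title"],
    "properties": {
      "title": { "type": "string" },
      "distribution": {
        "type": "array",
        "items": {
          "type": "object",
          "required": ["downloadURL"],
          "properties": {
            "downloadURL": { "type": "string", "format": "uri" }
          }
        }
      }
    }
  }

Note what is absent: nothing in the schema says that downloadURL denotes the dcat:downloadURL property. That meaning is exactly what a JSON-LD context supplies.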


SHACL Validation

- -
-
-

Namespaces

-

Namespaces and prefixes used in normative parts of this recommendation are shown in the following - table:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +

In order to verify whether a catalog adheres to the stipulated constraints in this Application Profile, the constraints are articulated utilizing SHACL [[SHACL]]. All constraints in this specification that were amenable to SHACL expression translation have been incorporated. Consequently, this set of SHACL expressions can be employed to construct a validation check for data exchange between two systems, a common scenario being one catalog being harvested into another.

For example, it may be recognized that the data being exchanged doesn't include the organizations' details since they are uniquely identified by a dereferenceable URI. In this scenario, enforcing rules about the mandatory presence of a name for each organization may not be pertinent. Rigorously applying the DCAT-US SHACL expressions would trigger errors, even though the data is accessible via an alternative route. In this context, it's acceptable to omit this check during the validation phase.

This example underscores that to achieve an optimal user experience during a validation process, it's crucial to consider the actual data transferred between systems and apply only the constraints relevant to the data exchange. To facilitate this, the SHACL expressions are organized into separate files, aligning with common validation configurations.
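As a minimal, non-normative sketch of the kind of constraint involved (the shape IRI is hypothetical, and the normative shapes live in the files referenced below), the following SHACL node shape, given here in JSON-LD, requires every dcat:Dataset to carry at least one dct:title:

  {
    "@context": {
      "sh": "http://www.w3.org/ns/shacl#",
      "dcat": "http://www.w3.org/ns/dcat#",
      "dct": "http://purl.org/dc/terms/"
    },
    "@id": "https://example.gov/shapes/DatasetShape",
    "@type": "sh:NodeShape",
    "sh:targetClass": { "@id": "dcat:Dataset" },
    "sh:property": {
      "sh:path": { "@id": "dct:title" },
      "sh:minCount": 1,
      "sh:message": "A dataset must have at least one dct:title."
    }
  }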

The SHACL application profile for DCAT-US can be found here.

+

Namespaces

Namespaces and prefixes used in normative parts of this recommendation are shown in the following table:

Prefix          Namespace IRI                                     Source
adms            http://www.w3.org/ns/adms#                        [[VOCAB-ADMS]]
cnt             http://www.w3.org/2011/content#                   [[Content-in-RDF10]]
dcat            http://www.w3.org/ns/dcat#                        [[VOCAB-DCAT]]
dcat-us         http://resources.data.gov/ontology/dcat-us#       [[DCAT-US]]
dct             http://purl.org/dc/terms/                         [[DCTERMS]]
dqv             http://www.w3.org/ns/dqv#                         [[VOCAB-DQV]]
foaf            http://xmlns.com/foaf/0.1/                        [[FOAF]]
gsp             http://www.opengis.net/ont/geosparql#             [[GeoSPARQL]]
locn            http://www.w3.org/ns/locn#                        [[LOCN]]
odrs            https://schema.theodi.org/odrs/                   [[ODRS]]
org             http://www.w3.org/ns/org#                         [[VOCAB-ORG]]
prov            http://www.w3.org/ns/prov#                        [[PROV]]
rdf             http://www.w3.org/1999/02/22-rdf-syntax-ns#       [[RDF-SYNTAX-GRAMMAR]]
rdfs            http://www.w3.org/2000/01/rdf-schema#             [[RDF-SCHEMA]]
schema          http://schema.org/                                [[schema-org]]
sdmx-attribute  http://purl.org/linked-data/sdmx/2009/attribute#  [[SDMX-ATTRIBUTE]]
skos            http://www.w3.org/2004/02/skos/core#              [[SKOS-REFERENCE]]
spdx            http://spdx.org/rdf/terms#                        [[SPDX]]
vcard           http://www.w3.org/2006/vcard/ns#                  [[VCARD-RDF]]
xsd             http://www.w3.org/2001/XMLSchema#                 [[XMLSCHEMA11-2]]