Ontological data annotation to utilise data sets with OEKG in the future #6

Open
chrwm opened this issue Jul 20, 2022 · 8 comments

@chrwm
Member

chrwm commented Jul 20, 2022

Are there requirements to be met when annotating data sets today so that these data sets can be used with OEKG applications in the future?

I understand that the OEKG builds upon metadata in RDF format. However, at the moment, no OEMetadata standard has been developed in RDF format. Therefore, OEMetadata v1.5.1 in JSON format currently has to be used to annotate tabular data ontologically.
I am asking so that we do not put a lot of work into annotating data sets ontologically, only to find that the annotations are not usable in the future.

@chrwm chrwm added the question Further information is requested label Jul 20, 2022
@adelmemariani
Contributor

The only requirement is that all fields in the meta information (under the tables) and all column names are associated with OEO concepts, or with concepts from other ontologies where the OEO does not cover the concept yet. Also this might help.
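To make the requirement concrete, here is a minimal sketch of a check that lists the columns still lacking a concept annotation. It assumes the oemetadata v1.5.1 layout discussed in this thread (`resources` → `schema` → `fields` → `isAbout`); the function name `unannotated_fields` and the sample document are made up for illustration and are not part of any OEP tooling.

```python
def unannotated_fields(metadata):
    """Return the names of columns without any isAbout entry that has a path."""
    missing = []
    for resource in metadata.get("resources", []):
        fields = (resource.get("schema") or {}).get("fields", [])
        for field in fields:
            annotations = field.get("isAbout") or []
            # A column counts as annotated once at least one entry points to a concept.
            if not any(entry.get("path") for entry in annotations):
                missing.append(field.get("name"))
    return missing


# Hypothetical sample document, reduced to the keys the check looks at.
example = {
    "resources": [
        {
            "schema": {
                "fields": [
                    {
                        "name": "thermal efficiency",
                        "isAbout": [
                            {
                                "name": "energy conversion efficiency",
                                "path": "http://openenergy-platform.org/ontology/oeo/OEO_00140049",
                            }
                        ],
                    },
                    {
                        "name": "unmapped column",
                        "isAbout": [{"name": None, "path": None}],
                    },
                ]
            }
        }
    ]
}

print(unannotated_fields(example))
```

Running something like this before publishing a data set would show which columns still need a concept from the OEO or another ontology.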

As the OEKG can be populated repeatedly, we can read the values (via Python) from different formats (e.g., JSON) and update the knowledge graph accordingly. The main point here is that the development of the OEKG is not a static and one-shot operation. It is a recurring and evolving process. In this process, data formats seem not to be very critical, however, conceptualization (matching meta-data to ontological concepts) is important.
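As a rough illustration of "read the values via Python and update the knowledge graph", the sketch below turns the `isAbout` annotations of a JSON metadata document into subject/predicate/object tuples and adds them to a set. Everything named here is an assumption for the example: the predicate IRI, the table IRI, the way column subjects are built, and the use of a plain Python set instead of a real triple store. It is not the actual OEKG code.

```python
import json

# Placeholder predicate; the vocabulary actually used by the OEKG is not given in this thread.
IS_ABOUT = "http://example.org/isAbout"


def annotation_triples(metadata_json, table_id):
    """Yield (subject, predicate, object) tuples for every annotated column."""
    metadata = json.loads(metadata_json)
    for resource in metadata.get("resources", []):
        for field in (resource.get("schema") or {}).get("fields", []):
            for concept in field.get("isAbout") or []:
                if concept.get("path"):
                    yield (f"{table_id}#{field['name']}", IS_ABOUT, concept["path"])


def update_graph(graph, metadata_json, table_id):
    """Add the triples to a set-based graph; repeated runs add no duplicates."""
    graph.update(annotation_triples(metadata_json, table_id))
    return graph


# Hypothetical metadata document with one column annotated by two concepts.
doc = json.dumps({
    "resources": [{
        "schema": {
            "fields": [{
                "name": "thermal_efficiency",
                "isAbout": [
                    {"name": "heat generation process",
                     "path": "http://openenergy-platform.org/ontology/oeo/oeo-physical/OEO_00010248"},
                    {"name": "energy conversion efficiency",
                     "path": "http://openenergy-platform.org/ontology/oeo/OEO_00140049"},
                ],
            }]
        }
    }]
})

graph = set()
update_graph(graph, doc, "http://example.org/table/power_plants")
update_graph(graph, doc, "http://example.org/table/power_plants")  # second run, same input
print(len(graph))
```

Because the graph is keyed on the concept IRIs rather than on the file format, re-reading the same metadata later (or from another serialization) leaves the result unchanged, which is the sense in which the format is less critical than the concept mapping.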

@Ludee
Member

Ludee commented Sep 22, 2022

That is important information and should be documented properly.
We need to decide on documentation formats.
Perhaps we create a new subpage under "Ontology" at the OEP or a new section "Knowledge Graph"?
In addition, we need developer documentation. This can be part of the existing OEP-RtD or a separate one here.

@chrwm
Member Author

chrwm commented Sep 23, 2022

"The main point here is that the development of the OEKG is not a static and one-shot operation. It is a recurring and evolving process. In this process, data formats seem not to be very critical, however, conceptualization (matching meta-data to ontological concepts) is important."

From this, I understand that connecting a concept and the data is paramount, and how it is done is secondary.

For documentation purposes, here is how we go about it in the SEDOS project.

We'll use oemetadata v1.5.1.

----- Case1 -----
In cases where there is a single suitable ontology concept in the OEO, we'll use the keys subject, isAbout, and valueReference as intended.

----- Case2 ----- (UPDATED)
In cases where no single OEO concept fits but several OEO concepts are suitable in combination, we'll enter them as a list of dicts in the isAbout key.

For example: thermal efficiency of a heat power plant (as column in a tabular data set)

The concept thermal efficiency is not (yet, as of 23.09.22) available in the OEO, but the concepts heat generation process and energy conversion efficiency are:

(UPDATED example)

"resources": [
        {
            "profile": null,
            "name": null,
            "path": null,
            "format": null,
            "encoding": null,
            "schema": {
                "fields": [
                    {
                        "name": "thermal efficiency",
                        "description": "The column holds the values of the thermal efficiency of a heat power plant",
                        "type": null,
                        "unit": null,
                        "isAbout": [
                            {
                                "name": "heat generation process",
                                "path": "http://openenergy-platform.org/ontology/oeo/oeo-physical/OEO_00010248"
                            },
                            {
                                "name": "energy conversion efficiency",
                                "path": "http://openenergy-platform.org/ontology/oeo/OEO_00140049"
                            }
                        ],
                        "valueReference": [
                            {
                                "value": null,
                                "name": null,
                                "path": null
                            }
                        ]
                    },

We're aware that this violates the intended use of name (using "" it at least fits its schema), but this is our interpretation of your comment: "data formats seem not to be very critical, however, conceptualization (matching meta-data to ontological concepts) is important."
@adelmemariani do you agree or do you have concerns with it?
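For anyone who wants to generate such Case 2 entries instead of writing them by hand, a minimal sketch could look like this. The helper `make_field` is hypothetical (it is not part of the oemetadata tooling); the keys and the two concept IRIs are taken from the example above.

```python
import json


def make_field(name, description, concepts):
    """Build one oemetadata-style field entry from (label, IRI) pairs."""
    return {
        "name": name,
        "description": description,
        "type": None,
        "unit": None,
        # One dict per concept; several dicts express a compound annotation (Case 2).
        "isAbout": [{"name": label, "path": iri} for label, iri in concepts],
        "valueReference": [{"value": None, "name": None, "path": None}],
    }


field = make_field(
    "thermal efficiency",
    "The column holds the values of the thermal efficiency of a heat power plant",
    [
        ("heat generation process",
         "http://openenergy-platform.org/ontology/oeo/oeo-physical/OEO_00010248"),
        ("energy conversion efficiency",
         "http://openenergy-platform.org/ontology/oeo/OEO_00140049"),
    ],
)

# None is serialized as JSON null, matching the template above.
print(json.dumps(field, indent=4))
```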

----- Case3 -----
In cases where there is NO suitable ontology concept in the OEO, we'll copy the term used in the data directly to the name key for further data processing in SEDOS.
Note: This is SEDOS-specific and needed for data processing. Normally one would leave the annotation in isAbout empty.

For example: fantasy power plant parameter

"resources": [
        {
            "profile": null,
            "name": null,
            "path": null,
            "format": null,
            "encoding": null,
            "schema": {
                "fields": [
                    {
                        "name": "fantasy power plant parameter",
                        "description": "The column holds values of a parameter, whose concept is not yet available in the OEO, of a fantasy power plant",
                        "type": null,
                        "unit": null,
                        "isAbout": [
                            {
                                "name": "fantasy power plant parameter",
                                "path": null
                            }
                        ],
                        "valueReference": [
                            {
                                "value": null,
                                "name": null,
                                "path": null
                            }
                        ]
                    },
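The three cases can be told apart mechanically from the isAbout entries, which may help when processing the annotated tables later. This is only a sketch of that idea under the conventions described above; the function name `annotation_case` and the return value 0 for "not annotated at all" are assumptions made for the example.

```python
def annotation_case(field):
    """Classify a field as Case 1, 2 or 3; return 0 if it carries no annotation."""
    entries = field.get("isAbout") or []
    with_path = [entry for entry in entries if entry.get("path")]
    if len(with_path) == 1:
        return 1  # single suitable ontology concept
    if len(with_path) > 1:
        return 2  # several concepts used in combination
    if any(entry.get("name") for entry in entries):
        return 3  # term copied from the data, no concept IRI available
    return 0      # nothing annotated


case1 = {"isAbout": [{"name": "energy conversion efficiency",
                      "path": "http://openenergy-platform.org/ontology/oeo/OEO_00140049"}]}
case2 = {"isAbout": [{"name": "heat generation process",
                      "path": "http://openenergy-platform.org/ontology/oeo/oeo-physical/OEO_00010248"},
                     {"name": "energy conversion efficiency",
                      "path": "http://openenergy-platform.org/ontology/oeo/OEO_00140049"}]}
case3 = {"isAbout": [{"name": "fantasy power plant parameter", "path": None}]}
empty = {"isAbout": [{"name": None, "path": None}]}

print([annotation_case(f) for f in (case1, case2, case3, empty)])
```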

@l-emele

l-emele commented Sep 29, 2022

In the second case, thermal efficiency can be fully expressed with the OEO: 'energy conversion efficiency' 'process attribute of' some 'heat generation process'. That also shows the relation between the two concepts, which is additional information compared to just listing the involved classes.

@chrwm
Member Author

chrwm commented Oct 11, 2022

Thanks for the hint! However, the metadata should only annotate concepts; it does not have the role of mapping relations. The mapping of relations, in the context of a subset of the OEO, is after all achieved by the OEKG.

@l-emele

l-emele commented Oct 13, 2022

I don't think that simply listing classes that are somehow involved is a good solution. We once talked about an oeo-module for composed classes. So there could then be a composed class XYZ SubClassOf: 'energy conversion efficiency' 'process attribute of' some 'heat generation process', and XYZ would then be the class referenced in the metadata.
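To sketch how that could look from the data side: the metadata would reference only the composed class, and a lookup would expand it into its definition when needed. The registry below is purely illustrative. No such module exists yet, XYZ is the placeholder name from above, and the class and function names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Composition:
    """Definition of a composed class as attribute, relation and process labels."""
    attribute: str
    relation: str
    process: str

    def expression(self):
        # Renders the definition in the notation used in this thread.
        return f"'{self.attribute}' '{self.relation}' some '{self.process}'"


# Hypothetical content of a "compositions" module; XYZ is a placeholder.
COMPOSITIONS = {
    "XYZ": Composition(
        attribute="energy conversion efficiency",
        relation="process attribute of",
        process="heat generation process",
    ),
}


def expand(class_name):
    """Return the definition of a composed class, or None if it is not composed."""
    composition = COMPOSITIONS.get(class_name)
    return composition.expression() if composition else None


print(expand("XYZ"))
```

The annotator would then only have to pick XYZ, while the relation between the two concepts stays available to the knowledge graph.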

@stap-m : Do you remember if we documented the idea of this oeo-module for composed classes for data annotation and the knowledge graph somewhere?

@stap-m
Contributor

stap-m commented Oct 13, 2022

We documented the idea in the etherpad of the 6th project meeting. However, the notes that were taken are not elaborate in any way. This is all there is:

Discussion on the combination of terms

  • create a new class "warmwasserbedarf" (hot water demand) and use it
  • But the number of combinations is very high
  • It is not that easy in RDF
  • -> create a module "compositions" in the OEO

Clustering the compositions:

  • those related to projections
  • those related to narratives
  • those related to the study report (authorship, ...)

@chrwm
Member Author

chrwm commented Oct 19, 2022

The partners in the SEDOS project will use the oemetadata and OEO in the user role rather than the developer role. Thus, it should be as user-friendly and easy as possible to work with both.
Diving into the axioms seems to be error-prone and this additional workload is difficult to justify from the user's point of view.
I argue for a simple solution for the user and welcome a technical solution in the backend to achieve this, as it seems to have been suggested.
