Enhancing Query Results with Schema Alignment in an Aggregator #112

Open
maartyman opened this issue Jun 9, 2023 · 3 comments
Labels
challenge technical problem applied to a use case proposal: changes needed 👷

Comments

@maartyman
maartyman commented Jun 9, 2023

Pitch

This is a challenge on using aggregators to create a view over your Solid pod data. It proposes a solution to the issue raised in the "What's in a POD" paper. We suggest using an agent that exposes a SPARQL endpoint through which other parties can query your pod, but where each query is first rewritten based on a mapping.

We focus on a personal health data sharing scenario, inspired by the We Are platform in Flanders [https://we-are-health.be]. Citizens are asked to fill in a health questionnaire known as GGDM. As this pertains to personal information, the answers are stored in their pod using a GGDM vocabulary designed for that purpose. Now assume a regional research survey (RRS) asks people for access to their GGDM data in order to study diabetes. Alice is willing to participate, but only wants to share selected information. Moreover, for her diabetes status, she refers to her health record, which was written directly to her pod at the hospital. This record uses the FHIR vocabulary [7], however. Thus, Alice instructs her Web agent to invoke two schema mappings that define her view for RRS: (1) directly retrieve only selected GGDM answers; and (2) transform her diabetes status from FHIR to GGDM. Now RRS, contacting Alice’s Web agent, may pose a query to retrieve all available GGDM answers, on condition that her diabetes status is positive. POD-QUERY will automatically rewrite this query, checking the diabetes status in FHIR and returning only the answers (e.g., eating habits and exercising) that Alice chose to share. As another example, RRS may ask how many GGDM answers Alice makes available. In general, arbitrary client queries can be posed, but they will be rewritten to return only what Alice wants to make available to this party.
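To make the two mappings concrete, here is a minimal Python sketch of the view Alice defines for RRS. It is only an illustration under loud assumptions: it materialises the view as a dictionary instead of rewriting SPARQL, so it is not POD-QUERY or the UHasselt rewriter. The `has_diabetes` flag, `ggdm:no`, `ggdm:question5`, and the answer strings are hypothetical stand-ins; only `ggdm:question2`, `ggdm:question9-1`, `ggdm:question10`, and `ggdm:yes` come from the queries in this issue.

```python
# Hypothetical pod contents; the FHIR record is reduced to a single flag.
GGDM_ANSWERS = {
    "ggdm:question9-1": "two portions a day",  # eating habits
    "ggdm:question10": "three times a week",   # exercising
    "ggdm:question5": "kept private",          # hypothetical question Alice does not share
}
FHIR_RECORD = {"has_diabetes": True}

# Mapping (1): the GGDM questions Alice is willing to share with RRS.
SHARED_QUESTIONS = {"ggdm:question9-1", "ggdm:question10"}


def build_view(ggdm_answers, fhir_record, shared_questions):
    """Return the GGDM-shaped view that RRS is allowed to see."""
    view = {q: a for q, a in ggdm_answers.items() if q in shared_questions}
    # Mapping (2): express the FHIR diabetes status as an answer to ggdm:question2.
    view["ggdm:question2"] = "ggdm:yes" if fhir_record["has_diabetes"] else "ggdm:no"
    return view


def answers_if_diabetic(view):
    """Mimic the RRS query: return the shared answers only if question2 is 'yes'."""
    if view.get("ggdm:question2") != "ggdm:yes":
        return {}
    return {q: a for q, a in view.items() if q != "ggdm:question2"}


view = build_view(GGDM_ANSWERS, FHIR_RECORD, SHARED_QUESTIONS)
result = answers_if_diabetic(view)
```

With the sample data, `result` contains the eating-habits and exercising answers but not the private one, and it would be empty if the FHIR flag were `False`.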

This challenge is a collaboration with UHasselt: they have built a query rewriter for the schema alignment, and we supply the aggregator that creates and maintains the view.

Desired solution

The solution should be an aggregator that receives queries and then uses the query rewriter provided by UHasselt to rewrite them based on predetermined mappings. Note that automatic view creation and rule discovery and selection are NOT required for this challenge.
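The flow described above (receive a query, rewrite it with a predetermined mapping, execute it) could be wired together roughly as follows. This is a sketch under assumptions, not the actual design: the `Rewriter` and `Engine` signatures, the `Aggregator` class, and the party and mapping names are all hypothetical, since the interface of the UHasselt rewriter is not specified in this issue. The fake rewriter and engine exist only so the example runs end to end.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures; the real rewriter and query engine may look different.
Rewriter = Callable[[str, str], str]  # (query, mapping) -> rewritten query
Engine = Callable[[str], list[dict]]  # rewritten query -> result rows


@dataclass
class Aggregator:
    rewriter: Rewriter
    engine: Engine
    mappings: dict[str, str]  # requesting party -> predetermined mapping

    def handle(self, party: str, query: str) -> list[dict]:
        """Rewrite the incoming query with the party's mapping, then execute it."""
        if party not in self.mappings:
            raise PermissionError(f"no view defined for {party}")
        rewritten = self.rewriter(query, self.mappings[party])
        return self.engine(rewritten)


# Stand-ins so the sketch is runnable without a real rewriter or SPARQL engine.
def fake_rewriter(query: str, mapping: str) -> str:
    return f"# rewritten with {mapping}\n{query}"


executed: list[str] = []


def fake_engine(query: str) -> list[dict]:
    executed.append(query)  # record what was actually executed
    return [{"count": "2"}]


agg = Aggregator(fake_rewriter, fake_engine, {"rrs": "alice-rrs-mapping"})
rows = agg.handle("rrs", "SELECT * WHERE { ?s ?p ?o }")
```

The point of the design is that the engine never sees the client's original query, only the rewritten one, and a party without a mapping gets no answer at all.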

Acceptance criteria

The desired solution should include a user interface (UI) that allows users to select different queries:

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix ggdm: <https://vito.be/schema/ggdm#>
prefix sur:  <https://w3id.org/survey-ontology#>
prefix prov: <http://www.w3.org/ns/prov#>

# On condition that the diabetes status is positive (answer yes to question2),
# retrieve available GGDM answers on age, eating habits, and exercising.
SELECT ?age ?fruits ?exercise
WHERE {
  ?completedQ2 sur:answeredIn ?_s ;
               sur:completesQuestion ggdm:question2 ;
               sur:hasAnswer ggdm:yes .
  ?_s prov:wasAssociatedWith ?person .
  OPTIONAL {
    ?completedQ9_1 sur:completesQuestion ggdm:question9-1 ;
                   sur:hasAnswer ?fruits .
  }
  OPTIONAL {
    ?completedQ10 sur:completesQuestion ggdm:question10 ;
                  sur:hasAnswer ?exercise .
  }
  OPTIONAL {
    ?person foaf:age ?age .
  }
}

prefix sur:  <https://w3id.org/survey-ontology#>

# How many GGDM questions are available?
SELECT ( COUNT(DISTINCT ?completedQuestion) AS ?count )
WHERE {
  ?completedQuestion sur:answeredIn ?session .
}

The first query focuses on the schema alignment aspect: the hospital record (in the FHIR ontology) supplies results for a query written against GGDM. The second query shows the privacy aspect: not all questionnaire answers are returned.
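One way the privacy aspect of the second query might be checked is to compare the count over the raw pod data with the count seen through the view. The Python sketch below is purely illustrative: the record identifiers and the shared/private flags are hypothetical, and a real check would run the SPARQL query against the aggregator instead.

```python
# Hypothetical completed-question records: (identifier, shared with RRS?)
COMPLETED = [
    ("completedQ9_1", True),
    ("completedQ10", True),
    ("completedQ5", False),  # hypothetical answer Alice keeps private
    ("completedQ7", False),  # hypothetical answer Alice keeps private
]


def count_completed(records, view_only):
    """Count distinct completed questions, optionally restricted to the shared view."""
    return len({qid for qid, shared in records if shared or not view_only})


total = count_completed(COMPLETED, view_only=False)   # what the pod holds
visible = count_completed(COMPLETED, view_only=True)  # what RRS can count
```

If the view works as intended, `visible` is smaller than `total` whenever Alice withholds at least one answer.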

maartyman added challenge technical problem applied to a use case proposal: pending ❓ labels Jun 9, 2023
@pheyvaer
Contributor

pheyvaer commented Jun 9, 2023

Two things about the acceptance criteria:

  • Can you provide these different queries?
  • Can you provide something concrete for "effectiveness"? I would think that this depends on the aforementioned queries.

I don't understand why you need a query rewriter for schema alignment when the alignment happens in the aggregator.

@maartyman
Author

Made some changes and added the different queries!

@pheyvaer
Contributor

pheyvaer commented Sep 6, 2023

@maartyman Can you add concrete steps for the acceptance criteria? You can find an example at #120.
