Determining query containment for the registered queries to improve the scalability of the solid stream aggregator. #103

Open
argahsuknesib opened this issue Apr 20, 2023 · 2 comments
Assignees
Labels
challenge technical problem applied to a use case proposal: pending ❓

Comments

@argahsuknesib

Pitch

This challenge extends challenge 84 and is part of scenario 16. The solid stream aggregator lets a query agent maintain a continuous view of a stream stored in a solid pod by registering a query. In the scenario, multiple query clients can request a continuous view of the stream. A naive approach would execute every query registered by the query agents separately, but this does not scale. Because the queries processed by the aggregator are similar, though not identical, and run over common data, it is vital to find the overlap between them and execute only the unique queries. We will use the DAHCC dataset and the solid stream aggregator to test the query containment algorithm.

Desired solution

The desired solution is a query containment algorithm that determines which queries are contained in other, already registered queries. The algorithm should handle queries registered in RSP-QL syntax. RSP-QL can be reduced to SPARQL by removing the constructs needed only for stream-based queries, such as window, step, and range, so the algorithm should also work on SPARQL queries. The resulting algorithm should help manage multiple views in the solid project.
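To illustrate the reduction from RSP-QL to SPARQL, here is a minimal sketch that strips the stream-specific clauses with regular expressions. It assumes the common RSP-QL layout (`REGISTER ... AS`, `FROM NAMED WINDOW ... ON STREAM ... [RANGE ... STEP ...]`, and `WINDOW :w { ... }` blocks without nested braces). The function name `rspql_to_sparql` and the example IRIs are hypothetical, and a real implementation would more likely use a proper parser.

```python
import re


def rspql_to_sparql(query: str) -> str:
    """Strip stream-specific RSP-QL clauses, leaving a plain SPARQL query (sketch)."""
    # Drop the "REGISTER <operator> <output> AS" header.
    query = re.sub(r"REGISTER\s+\w+\s+<[^>]*>\s+AS\s*", "", query, flags=re.IGNORECASE)
    # Drop window declarations together with their [RANGE ... STEP ...] parameters.
    query = re.sub(
        r"FROM\s+NAMED\s+WINDOW\s+\S+\s+ON\s+STREAM\s+\S+\s*\[[^\]]*\]\s*",
        "",
        query,
        flags=re.IGNORECASE,
    )
    # Unwrap "WINDOW :w { ... }" blocks; assumes the block has no nested braces.
    query = re.sub(r"WINDOW\s+\S+\s*\{([^{}]*)\}", r"\1", query, flags=re.IGNORECASE)
    # Collapse whitespace so equivalent queries are easier to compare.
    return " ".join(query.split())


# Hypothetical RSP-QL query used only for illustration.
RSPQL_EXAMPLE = """
PREFIX : <http://example.org/>
REGISTER RStream <output> AS
SELECT ?s ?o
FROM NAMED WINDOW :w1 ON STREAM :stream1 [RANGE 10 STEP 5]
WHERE {
  WINDOW :w1 { ?s :p ?o }
}
"""

print(rspql_to_sparql(RSPQL_EXAMPLE))
```

Two RSP-QL queries that differ only in their window parameters reduce to the same SPARQL query here, so a real registry would still have to compare the window definitions separately before sharing results.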

Acceptance criteria

The developed query containment algorithm is employed in the query registry of the solid stream aggregator to determine whether a query newly registered by a query agent is contained in the queries already registered or executed in the query registry.
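As a rough sketch of how the registry could use such an algorithm, the example below keeps a list of executing queries and only starts a new one when no existing query contains it. All names (`Query`, `QueryRegistry`, `naive_contained`) are hypothetical and not taken from the solid stream aggregator. The containment check is a deliberately naive placeholder that is sound but incomplete, assuming set semantics and identical variable names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Query:
    select: tuple[str, ...]       # projected variables
    patterns: frozenset[Triple]   # triple patterns of the WHERE clause


def naive_contained(new: Query, old: Query) -> bool:
    """Placeholder check: same projection, and every pattern of `old` also occurs in `new`."""
    return new.select == old.select and old.patterns <= new.patterns


class QueryRegistry:
    def __init__(self, is_contained: Callable[[Query, Query], bool] = naive_contained):
        self.is_contained = is_contained
        self.executing: list[Query] = []

    def register(self, query: Query) -> Optional[Query]:
        """Return the executing query that contains `query`, or None if `query` was added as new."""
        for existing in self.executing:
            if self.is_contained(query, existing):
                return existing
        self.executing.append(query)
        return None


registry = QueryRegistry()
all_obs = Query(("?s",), frozenset({("?s", ":type", ":Obs")}))
obs_with_value = Query(("?s",), frozenset({("?s", ":type", ":Obs"), ("?s", ":value", "?v")}))

print(registry.register(all_obs) is None)            # first query is executed as-is
print(registry.register(obs_with_value) is all_obs)  # second query is covered by the first
```

Passing the containment check in as a parameter keeps the registry independent of the fragment of SPARQL the final algorithm ends up supporting.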

Pointers

As aggregation is still a novel research topic, a number of assumptions were made:

  • Long-term server-side authenticated sessions are assumed to be solved, so the authentication part of this challenge is not taken into account.
  • The containment problem is undecidable over the full SPARQL syntax. Therefore, only a part of the SPARQL syntax is considered.
  • The registered queries are either in RSP-QL syntax or are SPARQL SELECT queries.
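One decidable fragment that could serve as a starting point is SELECT queries whose WHERE clause is a plain basic graph pattern, treated as conjunctive queries under set semantics. For that fragment, Q1 is contained in Q2 exactly when there is a homomorphism from Q2 to Q1. The sketch below searches for such a homomorphism by backtracking. It is an illustration under those assumptions, not the algorithm chosen for this challenge, and the function names and prefixed terms are hypothetical.

```python
from typing import Optional, Sequence

Triple = tuple[str, str, str]
Mapping = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def extend(mapping: Mapping, q2_term: str, q1_term: str) -> Optional[Mapping]:
    """Map a Q2 term onto a Q1 term; return the extended mapping, or None on conflict."""
    if not is_var(q2_term):
        # Constants of Q2 must appear unchanged in Q1.
        return mapping if q2_term == q1_term else None
    if q2_term in mapping:
        return mapping if mapping[q2_term] == q1_term else None
    return {**mapping, q2_term: q1_term}


def search(remaining: Sequence[Triple], q1_patterns: Sequence[Triple], mapping: Mapping) -> bool:
    """Backtracking: send each remaining Q2 pattern to some Q1 pattern."""
    if not remaining:
        return True
    first, rest = remaining[0], remaining[1:]
    for candidate in q1_patterns:
        extended: Optional[Mapping] = mapping
        for q2_term, q1_term in zip(first, candidate):
            extended = extend(extended, q2_term, q1_term)
            if extended is None:
                break
        if extended is not None and search(rest, q1_patterns, extended):
            return True
    return False


def contained_in(q1_select, q1_patterns, q2_select, q2_patterns) -> bool:
    """True if Q1 is contained in Q2 (BGP-only queries, set semantics)."""
    if len(q1_select) != len(q2_select):
        return False
    mapping: Optional[Mapping] = {}
    # Projected variables must correspond position by position.
    for q2_var, q1_var in zip(q2_select, q1_select):
        mapping = extend(mapping, q2_var, q1_var)
        if mapping is None:
            return False
    return search(list(q2_patterns), list(q1_patterns), mapping)


narrow = (["?x"], [("?x", ":type", ":Obs"), ("?x", ":value", "?v")])
broad = (["?s"], [("?s", ":type", ":Obs")])
print(contained_in(*narrow, *broad))  # True: the narrow query is contained in the broad one
print(contained_in(*broad, *narrow))  # False: the broad query returns more results
```

The search is exponential in the worst case, which matches the known NP-completeness of conjunctive query containment; operators such as OPTIONAL, FILTER, and UNION are outside this sketch.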

Scenarios

The challenge is part of a larger scenario on an aggregated view on sensitive personal health data streams. The scenario is described in issue 16.

@rubensworks

Query containment is also needed for mapping queries to indexes such as shapetrees and type indexes, so I'm very interested in this!

@pheyvaer pheyvaer assigned pheyvaer and unassigned RubenVerborgh and pheyvaer Apr 20, 2023
@pheyvaer
Contributor

@pbonte Once you are done with the review of the challenge, can you assign it to me? Thanks!


5 participants