Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve inefficient SPARQL query template #307

Closed
1 of 3 tasks
surchs opened this issue Apr 14, 2024 · 2 comments · Fixed by #308
Closed
1 of 3 tasks

Improve inefficient SPARQL query template #307

surchs opened this issue Apr 14, 2024 · 2 comments · Fixed by #308
Assignees
Labels
bug:performance Performance defects or response time degradation. released This issue/pull request has been released. type:bug Defects in shipped code and fixes for those defects

Comments

@surchs
Copy link
Contributor

surchs commented Apr 14, 2024

Our current SPARQL query template has a couple of problems that make it slow:

Unnecessary nesting of OPTIONAL clauses

api/app/api/utility.py

Lines 232 to 237 in 1e9ef34

SELECT ?subject (count(distinct ?phenotypic_session) as ?num_matching_phenotypic_sessions)
WHERE {{
?subject a nb:Subject.
OPTIONAL {{
?subject nb:hasSession ?phenotypic_session.
?phenotypic_session a nb:PhenotypicSession.

and

api/app/api/utility.py

Lines 258 to 264 in 1e9ef34

SELECT ?subject (count(distinct ?imaging_session) as ?num_matching_imaging_sessions)
WHERE {{
?subject a nb:Subject.
OPTIONAL {{
?subject nb:hasSession ?imaging_session.
?imaging_session nb:hasAcquisition/nb:hasContrastType ?image_modal.
}}

(even worse)

in order of importance

  • by not explicitly putting the desired session type in the triple pattern, we have to traverse the entire graph to see if there is anything that fits the pattern with nb:hasAcquisition/nb:hasContrastType -> very slow!
  • the OPTIONAL statement is not necessary. If a subject does not have a phenotypic or imaging session, we don't need to look any further anyway
  • we have access to ?subject from the outer scope, so no need to restate it here

also:

api/app/api/utility.py

Lines 207 to 211 in 1e9ef34

OPTIONAL {{
?session nb:hasAcquisition/nb:hasContrastType ?image_modal.
OPTIONAL {{
?session nb:hasFilePath ?session_file_path.
}}

is most likely not necessary - it would only help to capture those subjects who do not have any file system path associated and thus would not be datalad gettable.

TODOs:

  • explicitly state session types for pheno and imaging session in the triple patterns of the sub-queries
  • remove superfluous OPTIONAL statements (they are expensive)
  • remove repeated triple patterns in sub-queries

From initial testing, this will let us cut query execution time by an order of magnitude.

See also: neurobagel/planning#142

@surchs surchs added bug:performance Performance defects or response time degradation. type:bug Defects in shipped code and fixes for those defects labels Apr 14, 2024
@surchs surchs self-assigned this Apr 15, 2024
@surchs
Copy link
Contributor Author

surchs commented Apr 15, 2024

Turns out the main thing was really just stating explicitly what session type we want for the imaging session. That cut the query execution time by ~ 90%

@surchs
Copy link
Contributor Author

surchs commented Apr 16, 2024

🚀 Issue was released in v0.2.1 🚀

@surchs surchs added the released This issue/pull request has been released. label Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug:performance Performance defects or response time degradation. released This issue/pull request has been released. type:bug Defects in shipped code and fixes for those defects
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant