Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve OmegaConfigLoader performance when global/variable interpolations are involved #4322

Closed
ravi-kumar-pilla opened this issue Nov 12, 2024 · 6 comments · Fixed by #4367

Comments

@ravi-kumar-pilla
Copy link
Contributor

Description

Extending on the investigation in #3893 , OmegaConfigLoader lags in resolving catalog configurations when global/variable interpolations are involved.

Context

Some previous observations:
#3893 (comment)
#3893 (comment)

Steps to Reproduce

Run stress test which creates a single catalog with variable interpolations - https://github.com/kedro-org/kedro/blob/test/ocl-bottleneck/kedro_benchmarks/temp_investigate_ocl/ocl_plot_variables.py

Expected Result

Reduce the time spent on below methods (when interpolations are involved, the bottleneck seems to be OmegaConf.to_container)

Actual Result

#3893 (comment)

Your Environment

image
  • Kedro version used (pip show kedro or kedro -V): 0.19.9
  • Python version used (python -V): 3.11
  • Operating system and version: macOS and Sonoma 14.7.1
@noklam
Copy link
Contributor

noklam commented Nov 18, 2024

During the investigation, I found that there was a slow bit in _set_globals_value. I didn't spent enough time to fix it, but with a quick fix it improves roughly from 1.5s -> 0.9s, but there are probably more.

@noklam
Copy link
Contributor

noklam commented Nov 18, 2024

Particularly, there is an obvious slow path global_oc get created and destroyed for every reference of $globals

def _get_globals_value(self, variable: str, default_value: Any = _NO_VALUE) -> Any:
"""Return the globals values to the resolver"""
if variable.startswith("_"):
raise InterpolationResolutionError(
"Keys starting with '_' are not supported for globals."
)
globals_oc = OmegaConf.create(self._globals)

@ravi-kumar-pilla
Copy link
Contributor Author

Implemented suggestions mentioned by @noklam and Matthias (comment). Next steps would be to discuss on the suggestion made here which might need major backend change.

What could be a huge improvement, at least in my case, is to keep using OmegaConf objects in the rest of the kedro project (as opposed to dicts). This will probably be a major backend change but you would then postpone to_container calls as long as possible (and in our case skipping many as we only use a portion of the catalog on every kedro run)

@astrojuanlu
Copy link
Member

@Galileo-Galilei proposed the same in #2973

@astrojuanlu
Copy link
Member

@ravi-kumar-pilla What's the before/after results of this PR? Is there a way to trigger the benchmarks from https://github.com/kedro-org/kedro-benchmark-results/ ?

@ravi-kumar-pilla
Copy link
Contributor Author

@ravi-kumar-pilla What's the before/after results of this PR? Is there a way to trigger the benchmarks from https://github.com/kedro-org/kedro-benchmark-results/ ?

Hi @astrojuanlu , You can find some observations in the description of the PR - #4367 (comment)

I tried running asv benchmark locally but somehow find hard to interpret it. I will try to read on that. To publish the results, I need to tweak the workflow but I am not sure if we can publish the results to kedro-benchmark-results as this is a test branch and not main

@github-project-automation github-project-automation bot moved this from In Review to Done in Kedro Framework Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants