
Cache building operations in DataflowPlanBuilder #1448

Merged: 4 commits merged into main on Oct 11, 2024
Conversation

@plypaul (Contributor) commented on Oct 7, 2024:

This PR adds a few LRU caches to handle building operations within the DataflowPlanBuilder. Since the same metric may be used multiple times in a derived metric (or across queries), this can yield significant performance improvements. Please review by commit, as there were signature / type changes made to keep the changes cleaner.
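
For context, a minimal sketch of the cache-before-build pattern described here. The `LruCache` name and the get/set shape match the diff below, but this implementation and the usage comments are illustrative assumptions, not the PR's code:

```python
from collections import OrderedDict
from typing import Generic, Optional, TypeVar

KeyT = TypeVar("KeyT")
ValueT = TypeVar("ValueT")


class LruCache(Generic[KeyT, ValueT]):
    """Bounded cache that evicts the least-recently-used entry once full."""

    def __init__(self, max_cache_items: int) -> None:
        self._max_cache_items = max_cache_items
        self._cache = OrderedDict()

    def get(self, key: KeyT) -> Optional[ValueT]:
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)  # Mark as most recently used.
        return self._cache[key]

    def set(self, key: KeyT, value: ValueT) -> None:
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._max_cache_items:
            self._cache.popitem(last=False)  # Evict the least-recently-used entry.


# Hypothetical usage inside a builder method: check the cache keyed by a frozen
# parameter set before running the expensive build, then store the result.
#
#     cached_result = self._find_source_node_recipe_cache.get(parameter_set)
#     if cached_result is not None:
#         return cached_result
#     result = self._find_source_node_recipe(parameter_set)
#     self._find_source_node_recipe_cache.set(parameter_set, result)
#     return result
```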

@courtneyholcomb (Contributor) left a comment:

LGTM!

@@ -100,6 +106,10 @@

logger = logging.getLogger(__name__)

_MEASURE_RECIPE_CACHE_SIZE = 1000
@courtneyholcomb (Contributor) commented:

Curious if we've put much thought into what the size should be? I guess we'll just experiment and tune it accordingly?

@plypaul (Author) replied:

These are based on rough sizes of semantic manifests and performance tests on the larger ones we've seen. We do need to encapsulate all these caches into a single object and add instrumentation.
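
The encapsulation and instrumentation follow-up mentioned here could look roughly like the sketch below; all of these names are hypothetical and not from this PR:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CacheMetrics:
    """Hit/miss counters for a single cache."""

    hits: int = 0
    misses: int = 0

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


@dataclass
class DataflowPlanBuilderCacheStats:
    """Aggregates metrics for all caches used by the builder so sizes can be tuned."""

    metrics_by_cache_name: Dict[str, CacheMetrics] = field(default_factory=dict)

    def record_hit(self, cache_name: str) -> None:
        self.metrics_by_cache_name.setdefault(cache_name, CacheMetrics()).hits += 1

    def record_miss(self, cache_name: str) -> None:
        self.metrics_by_cache_name.setdefault(cache_name, CacheMetrics()).misses += 1
```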

def __init__( # noqa: D107
self, find_source_node_recipe_cache_size: int = 1000, build_any_metric_output_node_cache_size: int = 1000
) -> None:
self._find_source_node_recipe_cache = LruCache[FindSourceNodeRecipeParameterSet, FindSourceNodeRecipeResult](
@courtneyholcomb (Contributor) commented:

It would be pretty cool if we could use the new LruCache while still using decorator syntax to maintain readability of the methods that are cached. But I guess it's also debatable if that's more readable (e.g., it obscures the cache code if you end up needing to debug something there). And probably not worth the effort it would take to make it work anyway...

@plypaul (Author) replied:

Yeah, the obscuring of the dependencies with a decorator would make it harder to follow.
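
For illustration, a hypothetical version of the decorator approach discussed above: it keeps the cached method's body clean, but the lookup/insert logic (and the cache it depends on) is hidden behind `getattr`, which is the readability trade-off being weighed. Only `LruCache` comes from the PR; everything else here is assumed:

```python
import functools


def lru_cached_method(cache_attribute_name: str):
    """Cache a single-argument method's result in an LruCache stored on the instance."""

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, parameter_set):
            cache = getattr(self, cache_attribute_name)
            cached_result = cache.get(parameter_set)
            if cached_result is not None:
                return cached_result
            result = method(self, parameter_set)
            cache.set(parameter_set, result)
            return result

        return wrapper

    return decorator


# Hypothetical usage:
#
#     @lru_cached_method("_find_source_node_recipe_cache")
#     def _find_source_node_recipe(self, parameter_set): ...
#
# The PR instead calls cache.get() / cache.set() explicitly at the call sites,
# which keeps the caching visible when debugging.
```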

Base automatically changed from p--short-term-perf--22 to main October 10, 2024 16:42
@plypaul merged commit 804bee5 into main on Oct 11, 2024
11 checks passed
@plypaul deleted the p--short-term-perf--23 branch on October 11, 2024 14:30