
Cache building operations in DataflowPlanBuilder #1448

Merged: 4 commits merged into main on Oct 11, 2024
Conversation

@plypaul (Contributor) commented on Oct 7, 2024:

This PR adds a few LRU caches to handle building operations within the DataflowPlanBuilder. Since the same metric may be used multiple times in a derived metric (or across queries), this can yield significant performance improvements. Please review by commit, as there were signature / type changes made to keep the changes cleaner.
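
For context, a minimal sketch of the cache-before-build pattern described here. The `LruCache` name and the get/set shape match the diff below, but this implementation and the usage comments are illustrative assumptions, not the PR's code:

```python
from collections import OrderedDict
from typing import Generic, Optional, TypeVar

KeyT = TypeVar("KeyT")
ValueT = TypeVar("ValueT")


class LruCache(Generic[KeyT, ValueT]):
    """Bounded cache that evicts the least-recently-used entry once full."""

    def __init__(self, max_cache_items: int) -> None:
        self._max_cache_items = max_cache_items
        self._cache = OrderedDict()

    def get(self, key: KeyT) -> Optional[ValueT]:
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)  # Mark as most recently used.
        return self._cache[key]

    def set(self, key: KeyT, value: ValueT) -> None:
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._max_cache_items:
            self._cache.popitem(last=False)  # Evict the least-recently-used entry.


# Hypothetical usage inside a builder method: check the cache keyed by a frozen
# parameter set before running the expensive build, then store the result.
#
#     cached_result = self._find_source_node_recipe_cache.get(parameter_set)
#     if cached_result is not None:
#         return cached_result
#     result = self._find_source_node_recipe(parameter_set)
#     self._find_source_node_recipe_cache.set(parameter_set, result)
#     return result
```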

@courtneyholcomb (Contributor) left a comment:

LGTM!

@@ -100,6 +106,10 @@

logger = logging.getLogger(__name__)

_MEASURE_RECIPE_CACHE_SIZE = 1000
@courtneyholcomb (Contributor) commented:

Curious if we've put much thought into what the size should be? I guess we'll just experiment and tune it accordingly?

@plypaul (Author) replied:

These are based on rough sizes of semantic manifests and performance tests on the larger ones we've seen. We do need to encapsulate all these caches into a single object and add instrumentation.
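
The encapsulation and instrumentation follow-up mentioned here could look roughly like the sketch below; all of these names are hypothetical and not from this PR:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CacheMetrics:
    """Hit/miss counters for a single cache."""

    hits: int = 0
    misses: int = 0

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


@dataclass
class DataflowPlanBuilderCacheStats:
    """Aggregates metrics for all caches used by the builder so sizes can be tuned."""

    metrics_by_cache_name: Dict[str, CacheMetrics] = field(default_factory=dict)

    def record_hit(self, cache_name: str) -> None:
        self.metrics_by_cache_name.setdefault(cache_name, CacheMetrics()).hits += 1

    def record_miss(self, cache_name: str) -> None:
        self.metrics_by_cache_name.setdefault(cache_name, CacheMetrics()).misses += 1
```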

def __init__( # noqa: D107
self, find_source_node_recipe_cache_size: int = 1000, build_any_metric_output_node_cache_size: int = 1000
) -> None:
self._find_source_node_recipe_cache = LruCache[FindSourceNodeRecipeParameterSet, FindSourceNodeRecipeResult](
@courtneyholcomb (Contributor) commented:

It would be pretty cool if we could use the new LruCache while still using decorator syntax to maintain readability of the methods that are cached. But I guess it's also debatable if that's more readable (e.g., it obscures the cache code if you end up needing to debug something there). And probably not worth the effort it would take to make it work anyway...

@plypaul (Author) replied:

Yeah, the obscuring of the dependencies with a decorator would make it harder to follow.
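
For illustration, a hypothetical version of the decorator approach discussed above: it keeps the cached method's body clean, but the lookup/insert logic (and the cache it depends on) is hidden behind `getattr`, which is the readability trade-off being weighed. Only `LruCache` comes from the PR; everything else here is assumed:

```python
import functools


def lru_cached_method(cache_attribute_name: str):
    """Cache a single-argument method's result in an LruCache stored on the instance."""

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, parameter_set):
            cache = getattr(self, cache_attribute_name)
            cached_result = cache.get(parameter_set)
            if cached_result is not None:
                return cached_result
            result = method(self, parameter_set)
            cache.set(parameter_set, result)
            return result

        return wrapper

    return decorator


# Hypothetical usage:
#
#     @lru_cached_method("_find_source_node_recipe_cache")
#     def _find_source_node_recipe(self, parameter_set): ...
#
# The PR instead calls cache.get() / cache.set() explicitly at the call sites,
# which keeps the caching visible when debugging.
```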

Base automatically changed from p--short-term-perf--22 to main October 10, 2024 16:42
@plypaul merged commit 804bee5 into main on Oct 11, 2024
11 checks passed
@plypaul deleted the p--short-term-perf--23 branch on October 11, 2024 14:30