-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache building operations in DataflowPlanBuilder
#1448
Conversation
bf1803c
to
450e45e
Compare
5a2200e
to
0c254f2
Compare
450e45e
to
d44fd4a
Compare
0c254f2
to
33b1a10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -100,6 +106,10 @@ | |||
|
|||
logger = logging.getLogger(__name__) | |||
|
|||
_MEASURE_RECIPE_CACHE_SIZE = 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious if we've put much thought into what the size should be? I guess we'll just experiment and tune it accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are based on rough sizes of semantic manifests and performance tests on the larger ones we've seen. We do need to encapsulate all these caches into a single object and add instrumentation.
def __init__( # noqa: D107 | ||
self, find_source_node_recipe_cache_size: int = 1000, build_any_metric_output_node_cache_size: int = 1000 | ||
) -> None: | ||
self._find_source_node_recipe_cache = LruCache[FindSourceNodeRecipeParameterSet, FindSourceNodeRecipeResult]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be pretty cool if we could use the new LruCache
while still using decorator syntax to maintain readability of the methods that are cached. But I guess it's also debatable if that's more readable (e.g., it obscures the cache code if you end up needing to debug something there). And probably not worth the effort it would take to make it work anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the obscuring of the dependencies with a decorator would make it harder to follow.
d44fd4a
to
23754cb
Compare
33b1a10
to
8d704e1
Compare
23754cb
to
ef0eddc
Compare
8d704e1
to
8829989
Compare
8829989
to
a3d5d63
Compare
This PR adds a few LRU caches to handle building operations within the
DataflowPlanBuilder
. Since the same metric may be used multiple times in a derived metric (or between queries), there can be significant performance improvements. Please view by commit as there were signature / type changes to making the changes cleaner.