Performance bottleneck in get_online_features due to repeated metadata resolution from registry #4710
Comments
Thanks for digging in. `_get_online_request_context` is the function meant to be cached; it was introduced by #4041 for that exact purpose. I simply never found time to implement actual caching afterwards. P.S. As for a lot of relevant functions moving to …
Have you thought about any implementation for this cache mechanism?
That's interesting. I'm not against something radical, but I think the best first step would be to keep it implementation-agnostic, so it could work with all online stores, not just Redis (although we can have a Redis-specific caching layer on top). I was thinking of caching … The invalidation is the tricky part, I think. The easy solution would be some sort of time-based or cache-size-based TTL. A better solution would be to somehow bind the cache invalidation/refresh to the registry cache, so the online store metadata cache gets invalidated only when the registry cache is updated and there are changes in the underlying metadata.
Makes a lot of sense.
I was thinking of a similar approach to start with. We could use something like Python's `lru_cache` or cachetools to manage the cache. With `lru_cache` we can only set `maxsize`, as far as I know, but with cachetools we can set `maxsize` or a TTL.
This solution seems interesting, but I have to take a look at the implementation to see how to connect the dots here. How do we know, from the online store module, when the registry cache is updated?
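For illustration, here is a minimal sketch of the two options being discussed; `resolve_feature_service_metadata` is a placeholder name, not an existing Feast function:

```python
import functools
import cachetools

# Option 1: functools.lru_cache -- size-based eviction only.
@functools.lru_cache(maxsize=128)
def resolve_feature_service_metadata(feature_service_name: str, project: str):
    ...  # expensive registry lookups would happen here

# Option 2: cachetools.TTLCache -- size-based *and* time-based eviction.
_metadata_cache = cachetools.TTLCache(maxsize=128, ttl=300)

@cachetools.cached(_metadata_cache)
def resolve_feature_service_metadata_ttl(feature_service_name: str, project: str):
    ...  # same lookups, but entries also expire after 300 seconds
```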
+1 to `lru_cache`
We've struggled with this as well, and it's similar to what I raised in Slack (here). We use a wrapper around Feast so we can extend it, and we use the remote registry. During online serving, the methods that interact with the registry are: …
Then we implement a cache on the …

```python
import cachetools

def cache_based_on_ttl(
    cache_max_size: int, cache_ttl_seconds: int
) -> cachetools.TTLCache | cachetools.LRUCache:
    if cache_ttl_seconds == 0:
        return cachetools.LRUCache(maxsize=cache_max_size)
    return cachetools.TTLCache(
        maxsize=cache_max_size,
        ttl=cache_ttl_seconds,
    )
```

so we enable either a TTL cache or an LRU cache. The registry config has a way of passing the … Then we override those specific methods so that the cache is used.
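A minimal sketch of how that overriding might look, assuming the `cache_based_on_ttl` helper above; `CachedRegistryWrapper` is an illustrative name, not the actual wrapper:

```python
class CachedRegistryWrapper:
    """Wraps a registry and memoizes selected lookups."""

    def __init__(self, registry, cache_max_size: int = 128, cache_ttl_seconds: int = 60):
        self._registry = registry
        self._cache = cache_based_on_ttl(cache_max_size, cache_ttl_seconds)

    def get_feature_service(self, name: str, project: str):
        # Check the cache first; fall back to the wrapped registry on a miss.
        key = ("feature_service", name, project)
        try:
            return self._cache[key]
        except KeyError:
            value = self._registry.get_feature_service(name, project)
            self._cache[key] = value
            return value
```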
Registries have a … The downside of applying the cache at the instance level is that instances might not be garbage collected, but that should not be a problem in this case because we're not instantiating a lot of registries.
@franco-bocci I think this is a little different from what you're doing. I guess with your wrapper you're overcoming the fact that RemoteRegistry doesn't extend CachingRegistry (and even if it did, the way CachingRegistry maintains a cache is less than ideal, but that's a different story). What Breno is suggesting here is to cache the metadata-wrangling work at a higher level: not just data retrieval from the registry, but also the work Feast needs to do to stitch those metadata objects together. @breno-costa sounds good, we can start without integration between the two caches; I'm fine with that, and I don't know how best to connect those dots either. I think it's important for the cache to be TTL-based, since in a realistic scenario a single web server might be serving just a single feature service; invalidation that's size-based only would mean invalidation never happens in those cases. cachetools looks like a better fit.
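A rough sketch of what TTL-based caching at that higher level could look like; the exact parameters of `_get_online_request_context` are an assumption here, not Feast's confirmed API:

```python
import cachetools
from cachetools.keys import hashkey

# TTL-based eviction matters here: a server that only ever serves one
# feature service would never trigger size-based eviction on its own.
_request_context_cache = cachetools.TTLCache(maxsize=64, ttl=300)

def cached_online_request_context(store, features):
    key = hashkey(tuple(features))  # tuple() makes the feature list hashable
    try:
        return _request_context_cache[key]
    except KeyError:
        # Assumed call; the real helper's signature may differ.
        context = _get_online_request_context(store, features)
        _request_context_cache[key] = context
        return context
```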
It would solve the problem, but with the current design we would be applying a cache to a private method, and that cache would be controlled separately from the registry's cache. I agree that we would also benefit from caching the wrangling happening under the hood, which what I shared does not solve, but maybe we could model it better (somehow) to avoid spreading caches around?
I took a quick look at the codebase. The method `online_store.get_online_features` receives a registry instance:

```python
def get_online_features(
    self,
    config: RepoConfig,
    features: Union[List[str], FeatureService],
    entity_rows: Union[
        List[Dict[str, Any]],
        Mapping[str, Union[Sequence[Any], Sequence[ValueProto], RepeatedValue]],
    ],
    registry: BaseRegistry,
    project: str,
    full_feature_names: bool = False,
) -> OnlineResponse: ...
```

However, the problem is that every registry implementation manages its cache invalidation in a non-standardized way. Some examples: …
The best solution seems to be creating a method in … With that change in place, we could use the new method within … P.S.: this proposed new registry cache interface can be handled in another issue.
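One way to connect those dots could be a version token on the registry that changes whenever its cache is refreshed; this is a hypothetical sketch, not an existing Feast interface:

```python
class VersionedCachingRegistry:
    """Hypothetical: exposes a token that downstream caches can key on."""

    def __init__(self):
        self._cache_version = 0

    def refresh(self, project=None):
        # ... existing registry refresh logic would run here ...
        self._cache_version += 1  # bump whenever cached metadata may have changed

    @property
    def cache_version(self) -> int:
        return self._cache_version

# A metadata cache can then include the token in its key, so entries built
# from stale registry state are simply never hit again:
# key = (project, tuple(feature_refs), registry.cache_version)
```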
Is your feature request related to a problem? Please describe.
I'm running some benchmarks with the Python SDK and profiling the code to understand more about its execution. Here's the profile report for my benchmark. If you look at it, the `_prepare_entities_to_read_from_online_store` method and its sub-calls account for more than half of the execution time. On each call, it needs to resolve metadata from the registry, and this takes a lot of time relative to other parts of the code.

However, in an ML inference scenario, we usually create a feature service for each ML model application when it's deployed. The ML application calls the method `get_online_features` using the same feature service, i.e. all calls use the same metadata. The current SDK implementation creates unnecessary overhead since it resolves the same metadata on every call.
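For reference, a profile like the one above can be reproduced with a sketch along these lines; `store`, `feature_service`, and `rows` are assumed to be an initialized `FeatureStore`, an applied feature service, and the benchmark's entity rows:

```python
import cProfile
import pstats

with cProfile.Profile() as profiler:
    for _ in range(1000):
        store.get_online_features(features=feature_service, entity_rows=rows)

# Sort by cumulative time to surface _prepare_entities_to_read_from_online_store.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```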
Describe the solution you'd like
I don't know whether it's possible to make metadata resolution more efficient. If not, a potential solution would be to cache the metadata in the SDK itself. There could be some configuration to turn this caching on/off.
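As a rough illustration of such a toggle, under the assumption that cache settings would live on the online store config (nothing like `MetadataCacheConfig` exists in Feast today):

```python
from dataclasses import dataclass
import cachetools

# Hypothetical settings object; Feast has no such config today.
@dataclass
class MetadataCacheConfig:
    enabled: bool = False
    ttl_seconds: int = 300
    max_size: int = 128

def make_metadata_cache(cfg: MetadataCacheConfig):
    # Return None when disabled so callers fall back to uncached resolution.
    if not cfg.enabled:
        return None
    return cachetools.TTLCache(maxsize=cfg.max_size, ttl=cfg.ttl_seconds)
```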
Additional context
I see that many functions used to get online features have been moved to `utils.py`. This can make changes and optimizations more complex. There are over 500 lines of code in these util functions.