
Performance bottleneck in get_online_features due to repeated metadata resolution from registry #4710

Open
breno-costa opened this issue Oct 28, 2024 · 10 comments
Labels
kind/feature New feature or request

Comments

@breno-costa
Contributor

Is your feature request related to a problem? Please describe.
I'm running some benchmarks with the Python SDK and profiling the code to understand more about its execution. Here's the profile report for my benchmark.

[profile report screenshot]

If you look at it, the _prepare_entities_to_read_from_online_store method and its sub-calls account for more than half of the execution time. On each call, it needs to resolve metadata from the registry, and this takes a lot of time relative to the other parts of the code.

However, in an ML inference scenario, we usually create a feature service for each ML model application when it's deployed. The ML application calls get_online_features using the same feature service, i.e. all calls use the same metadata. The current SDK implementation creates unnecessary overhead since it resolves the same metadata on every call.

Describe the solution you'd like
I don't know whether it's possible to make the metadata resolution itself more efficient. If not, a potential solution would be to cache the metadata in the SDK itself, with some configuration to turn this caching on/off.

Additional context
I see that many functions used to get online features have been moved to utils.py. This can make changes and optimizations more complex; there are over 500 lines of code in these util functions.

@breno-costa breno-costa added the kind/feature New feature or request label Oct 28, 2024
@tokoko
Collaborator

tokoko commented Oct 28, 2024

Thanks for digging in. _get_online_request_context is the function introduced by #4041 that was meant to be cached for that exact purpose. I simply never found time to implement the actual caching afterwards.

P.S. As for a lot of relevant functions moving to utils.py, that was an easy and dirty way to move the get_online_features logic from feature_store.py to online_store.py. Some of the functions the online flow depended on were also used by get_historical_features, so I shoved everything into utils instead. Feel free to move anything online store-specific to online_store.py.

@breno-costa
Contributor Author

Thanks for digging in. _get_online_request_context is the function introduced by #4041 that was meant to be cached for that exact purpose. I simply never found time to implement the actual caching afterwards.

Have you thought about any implementation for this cache mechanism?
We've already created a custom cache implementation here, but it's quite radical. We even cache the hash keys and parts of the Redis keys, since they use feature names. I'm not sure we'd want to implement this level of caching here, although it is very efficient.

@tokoko
Collaborator

tokoko commented Oct 28, 2024

That's interesting. I'm not against radical, but I think the best first step would be to keep it implementation-agnostic so it could work with all online stores, not just Redis (although we could have a Redis-specific caching layer on top). I was thinking of caching the _get_online_request_context call outputs keyed by either the feature service name or some sort of concatenation of feature names if a list of features is used instead.

The invalidation is the tricky part, I think. The easy solution would be some sort of time-based or cache-size-based expiry. A better solution would be to somehow bind the cache invalidation/refresh to the registry cache, so the online store metadata cache gets invalidated only when the registry cache is updated and there are changes in the underlying metadata.
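
A minimal sketch of that keying idea, assuming a cachetools TTL cache and a caller-supplied resolver (the names below are illustrative, not the actual Feast internals):

from typing import Callable

import cachetools

# Cache keyed by feature service name, or by a stable concatenation of
# feature refs, with a TTL so registry changes are eventually picked up.
_request_context_cache: cachetools.TTLCache = cachetools.TTLCache(maxsize=64, ttl=300)


def _context_cache_key(features) -> str:
    # A FeatureService caches under its name; a plain list of feature refs
    # caches under a sorted concatenation.
    if hasattr(features, "name"):
        return f"feature_service:{features.name}"
    return "features:" + ",".join(sorted(features))


def cached_request_context(features, resolve: Callable):
    """Return a cached request context, resolving and storing it on a miss."""
    key = _context_cache_key(features)
    if key not in _request_context_cache:
        _request_context_cache[key] = resolve(features)
    return _request_context_cache[key]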

@breno-costa
Contributor Author

That's interesting. I'm not against radical, but I think the best first step would be to keep it implementation-agnostic so it could work with all online stores, not just Redis.

Makes a lot of sense.

I was thinking of caching the _get_online_request_context call outputs keyed by either the feature service name or some sort of concatenation of feature names if a list of features is used instead.

I was thinking of starting with a similar approach. We could use something like Python's lru_cache or cachetools to manage the cache. With lru_cache, we can only set maxsize as far as I know, but with cachetools, we can set maxsize or a TTL.
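
For illustration, a small comparison of the two options (the resolve_metadata_* functions below are hypothetical placeholders, not Feast APIs):

import functools

import cachetools


# functools.lru_cache only lets us bound the cache size.
@functools.lru_cache(maxsize=128)
def resolve_metadata_lru(feature_service_name: str):
    ...  # expensive registry/metadata resolution would go here


# cachetools supports both a max size and a TTL, so entries expire even if
# the same feature service is requested on every call.
_metadata_cache = cachetools.TTLCache(maxsize=128, ttl=600)


@cachetools.cached(cache=_metadata_cache)
def resolve_metadata_ttl(feature_service_name: str):
    ...  # expensive registry/metadata resolution would go here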

A better solution would be to somehow bind the cache invalidation/refresh to the registry cache, so the online store metadata cache gets invalidated only when the registry cache is updated and there are changes in the underlying metadata.

This solution seems interesting, but I have to take a look at the implementation to see how to connect the dots here. How would the online store module know when the registry cache is updated?

@franciscojavierarceo
Member

+1 to lru_cache

@franco-bocci
Contributor

We've struggled with this as well, and it's similar to what I raised in Slack (here).

We use a wrapper around Feast so we can extend it, and we use the remote registry. During online serving, the methods that interact with the registry are:

remote_registry_methods_to_cache = (
    "get_feature_service",
    "list_feature_views",
    "list_stream_feature_views",
    "list_on_demand_feature_views",
    "list_entities",
    "get_entity",
)

Then we implement a cache on the RemoteRegistry, which is the client that calls the registry server, with something like:

import cachetools


def cache_based_on_ttl(
    cache_max_size: int, cache_ttl_seconds: int
) -> cachetools.TTLCache | cachetools.LRUCache:
    # A TTL of 0 means "never expire": fall back to a plain LRU cache.
    if cache_ttl_seconds == 0:
        return cachetools.LRUCache(maxsize=cache_max_size)

    return cachetools.TTLCache(
        maxsize=cache_max_size,
        ttl=cache_ttl_seconds,
    )

so we enable either a TTL cache or an LRU cache. The registry config already has a way of passing ttl_seconds (or something like that), so the existing configuration can be reused.

Then we override those specific methods so that the cache is used.
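
A rough sketch of what those overrides could look like, reusing the cache_based_on_ttl helper above (the CachingRemoteRegistry wrapper and the import path are assumptions for illustration; the method signatures follow the ones listed earlier):

from cachetools.keys import hashkey

from feast.infra.registry.remote import RemoteRegistry  # import path assumed


class CachingRemoteRegistry(RemoteRegistry):  # hypothetical wrapper
    def __init__(self, *args, cache_max_size: int = 128, cache_ttl_seconds: int = 60, **kwargs):
        super().__init__(*args, **kwargs)
        self._cache = cache_based_on_ttl(cache_max_size, cache_ttl_seconds)

    def _cached(self, method_name: str, *args, **kwargs):
        # Include the method name in the key so different methods never collide.
        key = hashkey(method_name, *args, **kwargs)
        if key not in self._cache:
            self._cache[key] = getattr(super(), method_name)(*args, **kwargs)
        return self._cache[key]

    def get_feature_service(self, name: str, project: str, allow_cache: bool = False):
        return self._cached("get_feature_service", name, project, allow_cache)

    def list_feature_views(self, project: str, allow_cache: bool = False):
        return self._cached("list_feature_views", project, allow_cache)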

@franco-bocci
Contributor

Registries have a def refresh(self, project: Optional[str] = None) method. As the cache would be implemented at the instance level, it can be invalidated during that call.

The downside of applying a cache at the instance level is that instances might not be garbage collected, but that should not be a problem in this case because we're not instantiating a lot of registries.
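
Continuing that hypothetical wrapper, the existing refresh method could double as the invalidation point:

from typing import Optional

from feast.infra.registry.remote import RemoteRegistry  # import path assumed


class CachingRemoteRegistry(RemoteRegistry):  # same hypothetical wrapper as above
    def refresh(self, project: Optional[str] = None):
        # Refreshing the registry also clears the instance-level metadata cache,
        # so invalidation rides on the existing refresh flow.
        self._cache.clear()
        return super().refresh(project)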

@tokoko
Collaborator

tokoko commented Oct 30, 2024

@franco-bocci I think this is a little different from what you're doing. I guess with your wrapper you're overcoming the fact that RemoteRegistry doesn't extend CachingRegistry (and even if it did, the way CachingRegistry maintains a cache is less than ideal, but that's a different story).

What Breno is suggesting here is to cache the metadata wrangling work at a higher level: not just data retrieval from the registry, but also the work that Feast needs to do to stitch those metadata objects together. _get_online_request_context encapsulates all the methods you listed, plus lots of other work, so that's a better code point to cache imho.

@breno-costa sounds good, we can start without integration between the two caches; I'm fine with that, I don't know how best to connect those dots either. I think it's important for the cache to be TTL-based, as in a realistic scenario a single web server might just be serving a single feature service. Invalidation that's size-based only would mean that invalidation never happens in those cases. cachetools looks like a better fit.

@privatedumbo
Contributor

It would solve the problem, but with the current design we would be applying a cache to a private method that would be controlled separately from the registry's cache.

I agree that we would also benefit from caching the wrangling happening under the hood, which what I shared does not solve, but maybe we could model it better (somehow) to avoid spreading caches around?

@breno-costa
Contributor Author

breno-costa commented Oct 30, 2024

I took a quick look at the codebase. The method online_store.get_online_features receives a registry object as an argument. In theory, we can use this registry object to access a "source of truth" method for cache invalidation.

    def get_online_features(
        self,
        config: RepoConfig,
        features: Union[List[str], FeatureService],
        entity_rows: Union[
            List[Dict[str, Any]],
            Mapping[str, Union[Sequence[Any], Sequence[ValueProto], RepeatedValue]],
        ],
        registry: BaseRegistry,
        project: str,
        full_feature_names: bool = False,
    ) -> OnlineResponse:

However, the problem is that every registry implementation manages its cache invalidation in a non-standardized way. Some examples:

The best solution seems to be adding a method to the BaseRegistry interface that returns whether the registry cache is expired (e.g. def is_cache_expired(self)), which every Registry subclass must implement.

With that change in place, we could use the new method within online_store.get_online_features to invalidate the cached feature metadata. Wdyt?
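
A minimal sketch of how that could fit together, assuming is_cache_expired gets added to the interface (the classes and helper below are illustrative, not existing Feast code):

import time
from abc import ABC, abstractmethod


class BaseRegistry(ABC):  # sketch only; the real interface lives in Feast
    @abstractmethod
    def is_cache_expired(self) -> bool:
        """Return True when the registry's cached metadata is stale."""


class TtlBasedRegistry(BaseRegistry):  # hypothetical subclass for illustration
    def __init__(self, ttl_seconds: float):
        self._ttl_seconds = ttl_seconds
        self._refreshed_at = time.monotonic()

    def is_cache_expired(self) -> bool:
        return (time.monotonic() - self._refreshed_at) > self._ttl_seconds


def maybe_invalidate(registry: BaseRegistry, context_cache: dict) -> None:
    # What online_store.get_online_features could do before serving a request:
    # drop the cached request contexts whenever the registry reports staleness.
    if registry.is_cache_expired():
        context_cache.clear()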

ps: this proposed new registry cache interface can be handled in another issue.
