Return different matrix types for online serving #4714

Open
franciscojavierarceo opened this issue Oct 29, 2024 · 8 comments
Assignees
Labels
kind/feature New feature or request

Comments

@franciscojavierarceo
Member

Is your feature request related to a problem? Please describe.
We should allow Feature Views to return matrices/tensors natively, for example torch.Tensor objects.

At the moment, for some features we require the client to serialize the output into a matrix before running inference. Feast should support executing these transformations and serializing the data into matrices for both online and offline retrieval.

Describe the solution you'd like

features: torch.Tensor = store.get_online_features()

Describe alternatives you've considered
The alternative is not supporting this, which is the current state and leaves users to write their own brittle logic to handle various complexities.

Additional context
@HaoXuAI @tokoko I know we discussed sklearn pipelines in the past and I thought I'd share my thoughts.

@franciscojavierarceo franciscojavierarceo added the kind/feature New feature or request label Oct 29, 2024
@HaoXuAI
Collaborator

HaoXuAI commented Oct 29, 2024

The torch feature is nice. I guess we need to relax the "timestamp" constraints in our APIs, since it probably doesn't make much sense to attach a timestamp to an embedding feature?

@breno-costa
Contributor

The method store.get_online_features(...) returns an OnlineResponse object that has conversion methods such as to_dict() and to_df(). Should this suggestion be implemented as another conversion method, such as to_torch()?
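
To make the conversion-method idea concrete, here is a minimal sketch of what such a step would have to do. It assumes the response is available as a column-oriented mapping of feature name to a list of values; the helper `columns_to_matrix` and the feature names are hypothetical and are not part of Feast.

```python
from typing import Dict, List, Sequence


def columns_to_matrix(
    columns: Dict[str, Sequence[float]], feature_names: List[str]
) -> List[List[float]]:
    """Turn a column-oriented mapping into row-major rows (one row per entity)."""
    if not feature_names:
        return []
    n_rows = len(columns[feature_names[0]])
    for name in feature_names:
        if len(columns[name]) != n_rows:
            raise ValueError(f"column {name!r} has a different length")
    return [
        [float(columns[name][i]) for name in feature_names]
        for i in range(n_rows)
    ]


# Stand-in for a to_dict()-style result; names and values are made up.
response_dict = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.5, 0.25],
    "acc_rate": [0.9, 0.75],
}

# Only the model inputs are selected; the entity key is left out.
matrix = columns_to_matrix(response_dict, ["conv_rate", "acc_rate"])

# With PyTorch installed, torch.tensor(matrix) would turn this into a tensor.
```

The column order is passed in explicitly because a model expects its inputs in a fixed order, which a plain dictionary does not guarantee by contract.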

@franciscojavierarceo
Member Author

> The torch feature is nice. I guess we need to relax the "timestamp" constraints in our APIs, since it probably doesn't make much sense to attach a timestamp to an embedding feature?

Agreed.

@franciscojavierarceo
Member Author

@breno-costa that would be a serialization step, though. We would want to treat torch tensors (or xgb.DMatrix) as a first-class data type.

The concrete examples I'm thinking of are one hot encoding or impact encoding. It'd be useful for us to handle this for MLEs natively, especially when handling unseen categories.
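
As an illustration of the unseen-category problem, here is a minimal sketch of a one-hot encoder that reserves an extra slot for values it was not fitted on. The class name `SimpleOneHotEncoder` and the category values are hypothetical; this is one common way to handle unseen categories, not a description of how Feast does or would do it.

```python
from typing import Dict, List


class SimpleOneHotEncoder:
    """One-hot encoder with a trailing 'unknown' slot for unseen categories."""

    def __init__(self, categories: List[str]) -> None:
        # Fix the column order at fit time so training and serving agree.
        self.index: Dict[str, int] = {c: i for i, c in enumerate(categories)}
        self.unknown_slot = len(self.index)

    def encode(self, value: str) -> List[float]:
        row = [0.0] * (len(self.index) + 1)
        # Unseen values fall through to the reserved last column.
        row[self.index.get(value, self.unknown_slot)] = 1.0
        return row


encoder = SimpleOneHotEncoder(["sedan", "suv"])
known = encoder.encode("suv")
unseen = encoder.encode("truck")
```

Because the vector width is fixed when the encoder is built, a value that first appears at serving time still yields a row of the shape the model was trained on.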

@dandawg
Contributor

dandawg commented Oct 29, 2024

This plus sparse tensors/sparse matrices could be a really cool optimization -- less data, faster io, more powerful API.
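
To show where the "less data" saving would come from, here is a minimal sketch that stores only the non-zero entries of a matrix in coordinate (COO) form. The helper `dense_to_coo` and the sample matrix are hypothetical and not part of Feast.

```python
from typing import List, Tuple


def dense_to_coo(
    matrix: List[List[float]],
) -> Tuple[List[int], List[int], List[float], Tuple[int, int]]:
    """Return (row indices, column indices, values, shape) for non-zero entries."""
    rows: List[int] = []
    cols: List[int] = []
    values: List[float] = []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                rows.append(r)
                cols.append(c)
                values.append(float(v))
    n_cols = len(matrix[0]) if matrix else 0
    return rows, cols, values, (len(matrix), n_cols)


# A mostly-zero matrix, as one-hot encoded features tend to be.
dense = [
    [0, 0, 1],
    [0, 0, 0],
    [2, 0, 0],
]
rows, cols, values, shape = dense_to_coo(dense)

# Nine cells shrink to two (row, col, value) triples plus the shape.
```

Index lists, values, and a shape are the same ingredients that sparse constructors in libraries such as PyTorch and SciPy take, so a payload in this form could be handed to them without first expanding to a dense matrix.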

@franciscojavierarceo
Member Author

> This plus sparse tensors/sparse matrices could be a really cool optimization -- less data, faster io, more powerful API.

Exactly.

@HaoXuAI
Collaborator

HaoXuAI commented Oct 29, 2024

If we can leverage Arrow as our primary format, then I believe it can be converted directly to pandas/torch with the Arrow APIs.

@franciscojavierarceo
Member Author

Cool, I'll check that out. This is basically the next step after vector support toward making NLP a first-class citizen.

@franciscojavierarceo franciscojavierarceo self-assigned this Nov 8, 2024
4 participants