Return different matrix types for online serving #4714
Comments
A torch feature would be nice. I guess we'd need to relax the "timestamp" constraints in our APIs, since it probably doesn't make much sense to attach a timestamp to an embedding feature?
The method
@breno-costa that code is a serialization step though. We would want to treat Torch Tensors (or xgb.DMatrix) as a first-class data type. The concrete examples I'm thinking of are one-hot encoding or impact encoding. It'd be useful for us to handle this for MLEs natively, especially when handling unseen categories.
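To make the unseen-category case concrete, here is a minimal sketch in plain Python. It is not Feast code: `fit_vocabulary` and `one_hot` are hypothetical helper names, and routing unseen values to one extra catch-all column is just one common convention, assumed here for illustration.

```python
from typing import Dict, List, Sequence


def fit_vocabulary(values: Sequence[str]) -> Dict[str, int]:
    """Map each category seen at training time to a column index."""
    vocab: Dict[str, int] = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab)
    return vocab


def one_hot(values: Sequence[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Encode values as rows; the last column is a bucket for unseen categories."""
    width = len(vocab) + 1  # one extra column for categories not in the vocabulary
    rows = []
    for value in values:
        row = [0] * width
        row[vocab.get(value, width - 1)] = 1
        rows.append(row)
    return rows


# Hypothetical data: "purple" never appeared when the vocabulary was fitted.
vocab = fit_vocabulary(["red", "green", "red", "blue"])
matrix = one_hot(["green", "purple"], vocab)
```

Today each client has to carry a copy of this logic and keep its vocabulary in sync with training, which is the kind of drift a native matrix type could avoid.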
This plus sparse tensors/sparse matrices could be a really cool optimization: less data, faster I/O, more powerful API.
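As a rough illustration of the "less data" point, the sketch below stores only the non-zero entries of a matrix in coordinate (COO) form. It is plain Python with hypothetical helper names (`to_coo`, `to_dense`), not a proposal for the actual storage format.

```python
from typing import List, Tuple

Coo = Tuple[List[int], List[int], List[float], Tuple[int, int]]


def to_coo(dense: List[List[float]]) -> Coo:
    """Keep only non-zero entries as (row indices, column indices, values, shape)."""
    rows: List[int] = []
    cols: List[int] = []
    vals: List[float] = []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:
                rows.append(i)
                cols.append(j)
                vals.append(value)
    shape = (len(dense), len(dense[0]) if dense else 0)
    return rows, cols, vals, shape


def to_dense(coo: Coo) -> List[List[float]]:
    """Rebuild the dense matrix from its COO form."""
    rows, cols, vals, (n_rows, n_cols) = coo
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, value in zip(rows, cols, vals):
        dense[i][j] = value
    return dense


# A one-hot style matrix: 8 cells, only 2 of them non-zero.
dense = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
coo = to_coo(dense)
```

For one-hot features with a large vocabulary, the number of stored values grows with the number of rows rather than rows times categories.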
Exactly.
If we can leverage Arrow as our primary format, then I believe it can be converted directly to pandas/torch with the Arrow APIs.
Cool, I'll check that out. This is basically the next step after vector support toward making NLP a first-class citizen.
Is your feature request related to a problem? Please describe.
We should allow Feature Views to return matrices/tensors natively, for example torch.tensors.
At the moment, for some features we require the client to serialize the output into a matrix before running inference. Feast should support executing these transformations and serializing the data into matrices for both online and offline retrieval.
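For context, the client-side step described above looks roughly like the sketch below. It assumes a column-oriented dict of feature name to list of values, similar in shape to what an online response yields when converted to a dict. The feature names, the values, and the `columns_to_matrix` helper are all hypothetical.

```python
from typing import Dict, List, Sequence


def columns_to_matrix(
    columns: Dict[str, Sequence[float]],
    feature_order: Sequence[str],
) -> List[List[float]]:
    """Turn column-oriented features into a row-major matrix with a fixed column order."""
    lengths = {len(columns[name]) for name in feature_order}
    if len(lengths) > 1:
        raise ValueError("all feature columns must have the same number of rows")
    n_rows = lengths.pop() if lengths else 0
    return [
        [float(columns[name][i]) for name in feature_order]
        for i in range(n_rows)
    ]


# Hypothetical online response for two entities.
response = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.25, 0.5],
    "acc_rate": [0.75, 1.0],
}
# The model expects columns in exactly this order; the entity key is dropped.
features = columns_to_matrix(response, ["conv_rate", "acc_rate"])
```

Column order, missing values, and dtype handling all live in client code here, which is the brittle part this request would move into Feast.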
Describe the solution you'd like
Describe alternatives you've considered
The alternative is not supporting this, which is the current state and leaves users to write their own brittle logic to handle various complexities.
Additional context
@HaoXuAI @tokoko I know we discussed sklearn pipelines in the past and I thought I'd share my thoughts.