-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement key batching for cassandra online store in go feature server. #165
Conversation
if !ok { | ||
keyBatchSize = 10.0 | ||
log.Warn().Msg("key_batch_size not specified, defaulting to batches of size 10") | ||
} | ||
cassandraConfig.keyBatchSize = int(keyBatchSize.(float64)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want a different default value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the default value to 5.
Limiting the keyBatchSize max to 100 since maximum clustering-key cartesian product size is 100
if c.keyBatchSize == 1 { | ||
return c.UnbatchedKeysOnlineRead(ctx, entityKeys, featureViewNames, featureNames) | ||
} else { | ||
return c.BatchedKeysOnlineRead(ctx, entityKeys, featureViewNames, featureNames) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need separate implementations? Usually IN clause with 1 value is same as =
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that, using = avoids query planning overhead even though its not that significant. Also, with the current logic where key batching is done, using it for single key queries would be a overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be. There is so much duplicated code between them at the moment. We may need to refactor a bit to avoid code duplication.
@@ -155,6 +160,13 @@ func extractCassandraConfig(onlineStoreConfig map[string]any) (*CassandraConfig, | |||
} | |||
cassandraConfig.requestTimeoutMillis = int64(requestTimeoutMillis.(float64)) | |||
|
|||
keyBatchSize, ok := onlineStoreConfig["key_batch_size"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to add key_batch_size
field in Cassandra Config in Python as well. It may fail to deserialize when feature_store.yaml has this field for materialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sdk/python/feast/infra/online_stores/contrib/cassandra_online_store/cassandra_online_store.py
Outdated
Show resolved
Hide resolved
if c.keyBatchSize == 1 { | ||
return c.UnbatchedKeysOnlineRead(ctx, entityKeys, featureViewNames, featureNames) | ||
} else { | ||
return c.BatchedKeysOnlineRead(ctx, entityKeys, featureViewNames, featureNames) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be. There is so much duplicated code between them at the moment. We may need to refactor a bit to avoid code duplication.
Co-authored-by: Bhargav Dodla <[email protected]>
Co-authored-by: Bhargav Dodla <[email protected]>
Co-authored-by: Bhargav Dodla <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactoring the code for single and batch reads will be done in later PRs.
What this PR does / why we need it:
feat: Implement key batching for cassandra online store in go feature server.
Which issue(s) this PR fixes:
Misc