Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement key batching for cassandra online store in go feature server. #165

Merged
merged 13 commits into from
Jan 29, 2025

Conversation

vanitabhagwat
Copy link
Collaborator

@vanitabhagwat vanitabhagwat commented Jan 21, 2025

What this PR does / why we need it:

feat: Implement key batching for cassandra online store in go feature server.

Which issue(s) this PR fixes:

Misc

@vanitabhagwat vanitabhagwat changed the title Cassandra go batching feat: Cassandra go batching Jan 21, 2025
@vanitabhagwat vanitabhagwat changed the title feat: Cassandra go batching feat: Implement key batching for cassandra online store in go feature server. Jan 22, 2025
Comment on lines +164 to +168
if !ok {
keyBatchSize = 10.0
log.Warn().Msg("key_batch_size not specified, defaulting to batches of size 10")
}
cassandraConfig.keyBatchSize = int(keyBatchSize.(float64))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want a different default value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the default value to 5.

Limiting the keyBatchSize max to 100 since maximum clustering-key cartesian product size is 100
Comment on lines +562 to +566
if c.keyBatchSize == 1 {
return c.UnbatchedKeysOnlineRead(ctx, entityKeys, featureViewNames, featureNames)
} else {
return c.BatchedKeysOnlineRead(ctx, entityKeys, featureViewNames, featureNames)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need separate implementations? Usually IN clause with 1 value is same as =

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that, using = avoids query planning overhead even though its not that significant. Also, with the current logic where key batching is done, using it for single key queries would be a overkill?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be. There is so much duplicated code between them at the moment. We may need to refactor a bit to avoid code duplication.

@@ -155,6 +160,13 @@ func extractCassandraConfig(onlineStoreConfig map[string]any) (*CassandraConfig,
}
cassandraConfig.requestTimeoutMillis = int64(requestTimeoutMillis.(float64))

keyBatchSize, ok := onlineStoreConfig["key_batch_size"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to add key_batch_size field in Cassandra Config in Python as well. It may fail to deserialize when feature_store.yaml has this field for materialization.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +562 to +566
if c.keyBatchSize == 1 {
return c.UnbatchedKeysOnlineRead(ctx, entityKeys, featureViewNames, featureNames)
} else {
return c.BatchedKeysOnlineRead(ctx, entityKeys, featureViewNames, featureNames)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be. There is so much duplicated code between them at the moment. We may need to refactor a bit to avoid code duplication.

vanitabhagwat and others added 3 commits January 29, 2025 11:17
Co-authored-by: Bhargav Dodla <[email protected]>
Co-authored-by: Bhargav Dodla <[email protected]>
Co-authored-by: Bhargav Dodla <[email protected]>
Copy link
Collaborator

@EXPEbdodla EXPEbdodla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring the code for single and batch reads will be done in later PRs.

@vanitabhagwat vanitabhagwat merged commit 9b70f4b into master Jan 29, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants