-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ENH]: sysdb changes to support moving collection hard deletes to garbage collector #4607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
|
// SetDeleteMode sets the delete mode for testing | ||
func (c *Coordinator) SetDeleteMode(mode DeleteMode) { | ||
c.deleteMode = mode | ||
func (s *Coordinator) BatchGetCollectionSoftDeleteStatus(ctx context.Context, req *coordinatorpb.BatchGetCollectionSoftDeleteStatusRequest) (*coordinatorpb.BatchGetCollectionSoftDeleteStatusResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review: should we instead extend the GetCollection
method to allow returning soft deleted collections, and then get the statuses with multiple calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a single call to reduce the roundtrips. GetCollection
should not return soft deleted collections for its current usages.
afe5084
to
9aed3e4
Compare
c570721
to
bcbf758
Compare
9aed3e4
to
ca845a3
Compare
Move Collection Hard Delete Responsibility from sysdb to Garbage Collector, Always Enable Soft Delete This PR refactors the sysdb to always operate in soft-delete mode (removing the previous flag and associated configuration), and shifts the responsibility for hard deletion of collections from the sysdb's background cleaner to the garbage collector. The soft deletion remains immediate upon request, but a new explicit API ( Key Changes: Affected Areas: Potential Impact: Functionality: Removes risk of accidental hard deletion and makes the deletion lifecycle explicit and more controllable. Upgrades API surface by providing new batch query and explicit permanent deletion methods. Performance: Background cleanup logic is removed; no ongoing DB cleanups from sysdb. Deletion batch operations are more efficient than multiple single queries. Security: Explicit hard deletes may be easier to audit; potential window post-soft delete where data is still recoverable up to GC. Scalability: Decoupling hard deletes from sysdb daemon should make deletion scale better and offer more flexible cleanup strategies (GC-managed); batch APIs improve efficiency. Review Focus: Testing Needed• Verify soft delete works for all collection types and is always enabled. Code Quality Assessmentgo/pkg/sysdb/coordinator/coordinator.go: Much simplified, with soft/hard delete modes removed and API surface clarified. go/pkg/sysdb/grpc/collection_service.go: Now exposes explicit FinishCollectionDeletion endpoint; batch status methods correctly implemented. go/pkg/sysdb/metastore/db/dao/collection.go: Batch status methods could use a pre-check for empty input for DB efficiency. idl/chromadb/proto/coordinator.proto: API changes are clear; new methods/fields documented in code. go/pkg/sysdb/metastore/s3/impl.go: Improved robustness in version history checks; further guard checks for nil recommended. go/pkg/common/errors.go: New error codes for API clarity added. tests and mock files: All tests and mocks updated for new behaviors; legacy tests removed as appropriate. go/pkg/sysdb/coordinator/table_catalog.go: Well-structured, good use of error handling, but lock ordering in deletion requires careful scrutiny for deadlock. Best PracticesAPI Design: Data Consistency: Testing/Regression: Code Robustness: Potential Issues• If clients depend on immediate hard deletion or the soft/hard delete mode toggling, this is a breaking change and may require coordination. This summary was automatically generated by @propel-code-bot |
bcbf758
to
f72bd63
Compare
ca845a3
to
ccc9b82
Compare
f72bd63
to
c62bf3f
Compare
ccc9b82
to
4dcc3a4
Compare
c62bf3f
to
d6bf939
Compare
4dcc3a4
to
f2adff1
Compare
d6bf939
to
e0a0f88
Compare
f2adff1
to
660013f
Compare
7f275bb
to
a22cc8a
Compare
b10b4b0
to
77a9d62
Compare
3cd7274
to
8f20043
Compare
716f29a
to
9356e10
Compare
8f20043
to
4a3b406
Compare
9356e10
to
8253c18
Compare
4a3b406
to
e533b0c
Compare
8253c18
to
e1a6311
Compare
return err | ||
} | ||
if rootCollection == nil { | ||
return errors.New("root collection not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think there is an error constant for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did not see one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in ForkCollection
we directly used common.ErrCollectionNotFound
. Maybe we should have a separate error constant for this. I'm ok with this now
// Remove collection being deleted from the dependencies | ||
updatedDependencies := make([]*coordinatorpb.CollectionVersionDependency, 0) | ||
for _, dependency := range lineageFile.Dependencies { | ||
if dependency.TargetCollectionId != deleteCollection.ID.String() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when dependency.SourceCollectionId == deleteCollection.ID
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dependency.SourceCollectionId == deleteCollection.ID
should never evaluate to true at this point because only leaf nodes in the dependency tree are eligible for hard deletion
but this would be good to add as an invariant 👍
e1a6311
to
c85ee20
Compare
e533b0c
to
3baca7c
Compare
b2befca
to
950e307
Compare
950e307
to
d1c4e49
Compare
Merge activity
|
Description of changes
Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?
n/a