-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ENH]: sysdb changes to support moving collection hard deletes to garbage collector #4607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
|
// SetDeleteMode sets the delete mode for testing | ||
func (c *Coordinator) SetDeleteMode(mode DeleteMode) { | ||
c.deleteMode = mode | ||
func (s *Coordinator) BatchGetCollectionSoftDeleteStatus(ctx context.Context, req *coordinatorpb.BatchGetCollectionSoftDeleteStatusRequest) (*coordinatorpb.BatchGetCollectionSoftDeleteStatusResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review: should we instead extend the GetCollection
method to allow returning soft deleted collections, and then get the statuses with multiple calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a single call to reduce the roundtrips. GetCollection
should not return soft deleted collections for its current usages.
afe5084
to
9aed3e4
Compare
c570721
to
bcbf758
Compare
9aed3e4
to
ca845a3
Compare
[ENH]: sysdb changes to support moving collection hard deletes to garbage collector This major PR re-architects the sysdb collection deletion flow by enabling soft delete as the only deletion mode and shifting responsibility for hard deletion of collections to an external garbage collector. It removes all prior server-side soft delete flags and automatic hard deletion logic, introduces distinct 'SoftDeleteCollection' and 'FinishCollectionDeletion' (hard delete) code paths, adapts the proto/grpc APIs accordingly, and ensures correct lineage file management with UUIDv7 filenames. A new batch API for efficiently querying soft delete status across multiple collections is also added. Key Changes: Affected Areas: Potential Impact: Functionality: Collection hard deletes are no longer triggered by sysdb itself but must be issued explicitly (e.g., by a garbage collector process). All code and clients must now follow the new two-step deletion process. Legacy flows and flags for soft/hard delete are removed. Performance: Reduced server-side cron-like activity for purging collections; batch APIs for soft delete status improve efficiency for clients like GC. Slight impact in GC latency depending on garbage collector implementation. Security: No direct changes to authorization, but deletion pathways are now more explicit and controlled, reducing accidental permanent data loss. Scalability: Should improve with externalization of collection purging and batch status fetches, and more scalable lineage management with UUIDv7. Review Focus: Testing Needed• Verify the two-step deletion flow: collections should move to soft-delete state and only be hard-deleted via the Code Quality Assessmentgo/pkg/sysdb/coordinator/coordinator.go: Cleaner interface for clients, well-abstracted two-step flow. go/pkg/sysdb/metastore/s3/impl.go: Improved nil checks. Reviewer requested additional guards for robustness, which is acknowledged but could be further improved. go/pkg/sysdb/coordinator/table_catalog.go: Substantial refactor, mostly clear, includes explicit row-level locking for potential race conditions. Some error handling improvements; edge cases are considered. go/pkg/sysdb/grpc/collection_service.go: Clear separation of soft vs. hard delete, improved logging. New gRPC methods are well-documented. go/pkg/sysdb/metastore/db/dao/collection.go: Batch queries implemented as expected; suggestion to check empty input for further efficiency; mostly concise. idl/chromadb/proto/coordinator.proto: Major service definition changes; structure is consistent and well-documented. Best PracticesTesting: Config Hygiene: API Design: Robustness: Potential Issues• Potential for race conditions on lineage file updates (mitigated with row locks, but should be carefully verified under load/concurrent operations) This summary was automatically generated by @propel-code-bot |
bcbf758
to
f72bd63
Compare
ca845a3
to
ccc9b82
Compare
f72bd63
to
c62bf3f
Compare
ccc9b82
to
4dcc3a4
Compare
c62bf3f
to
d6bf939
Compare
4dcc3a4
to
f2adff1
Compare
d6bf939
to
e0a0f88
Compare
f2adff1
to
660013f
Compare
e0a0f88
to
7f275bb
Compare
660013f
to
b10b4b0
Compare
7f275bb
to
a22cc8a
Compare
b10b4b0
to
77a9d62
Compare
3cd7274
to
8f20043
Compare
716f29a
to
9356e10
Compare
8f20043
to
4a3b406
Compare
9356e10
to
8253c18
Compare
4a3b406
to
e533b0c
Compare
8253c18
to
e1a6311
Compare
return err | ||
} | ||
if rootCollection == nil { | ||
return errors.New("root collection not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think there is an error constant for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did not see one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in ForkCollection
we directly used common.ErrCollectionNotFound
. Maybe we should have a separate error constant for this. I'm ok with this now
// Remove collection being deleted from the dependencies | ||
updatedDependencies := make([]*coordinatorpb.CollectionVersionDependency, 0) | ||
for _, dependency := range lineageFile.Dependencies { | ||
if dependency.TargetCollectionId != deleteCollection.ID.String() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when dependency.SourceCollectionId == deleteCollection.ID
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dependency.SourceCollectionId == deleteCollection.ID
should never evaluate to true at this point because only leaf nodes in the dependency tree are eligible for hard deletion
but this would be good to add as an invariant 👍
return err | ||
} | ||
|
||
tc.metaDomain.CollectionDb(txCtx).UpdateCollectionLineageFilePath(rootCollection.ID, rootCollection.LineageFileName, newLineageFileFullName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warn: potential race condition with forkCollection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be fixed with the lock I added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the lock seems to be on the deleted collection, while we need to lock the root collection to prevent race right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 whoops yes, we need to lock both
sorry tired brain today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
careful about the lock order. not sure if a deadlock is possible as forking lock two collections as well
e1a6311
to
c85ee20
Compare
e533b0c
to
3baca7c
Compare
c0914e0
to
b2befca
Compare
b2befca
to
950e307
Compare
Description of changes
Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?
n/a