Skip to content

[ENH]: sysdb changes to support moving collection hard deletes to garbage collector #4607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 28, 2025

Conversation

codetheweb
Copy link
Contributor

@codetheweb codetheweb commented May 22, 2025

Description of changes

  • sysdb no longer has a flag for soft delete mode, soft delete is always enabled
  • sysdb no longer automatically transitions soft deleted collections to hard deleted
  • lineage file name is now uuidv7
    • It was previously based on the new target collection ID that was just added to the file, which works when adding dependencies but not when removing dependencies. UUIDv7 is time-sortable and random.
  • added BatchGetCollectionSoftDeleteStatus method (is called/tested in the next PR)

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

n/a

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link

⚠️ The Helm chart was updated without a version bump. Your changes will only be published if the version field in k8s/distributed-chroma/Chart.yaml is updated.

// SetDeleteMode sets the delete mode for testing
func (c *Coordinator) SetDeleteMode(mode DeleteMode) {
c.deleteMode = mode
func (s *Coordinator) BatchGetCollectionSoftDeleteStatus(ctx context.Context, req *coordinatorpb.BatchGetCollectionSoftDeleteStatusRequest) (*coordinatorpb.BatchGetCollectionSoftDeleteStatusResponse, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review: should we instead extend the GetCollection method to allow returning soft deleted collections, and then get the statuses with multiple calls?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a single call to reduce the roundtrips. GetCollection should not return soft deleted collections for its current usages.

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch 3 times, most recently from afe5084 to 9aed3e4 Compare May 22, 2025 20:11
@codetheweb codetheweb force-pushed the feat-validate-file-paths-during-gc branch from c570721 to bcbf758 Compare May 22, 2025 22:23
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 9aed3e4 to ca845a3 Compare May 22, 2025 22:23
@codetheweb codetheweb marked this pull request as ready for review May 22, 2025 23:16
@codetheweb codetheweb requested a review from sanketkedia May 22, 2025 23:16
Copy link
Contributor

propel-code-bot bot commented May 22, 2025

Move Collection Hard Delete Responsibility from sysdb to Garbage Collector, Always Enable Soft Delete

This PR refactors the sysdb to always operate in soft-delete mode (removing the previous flag and associated configuration), and shifts the responsibility for hard deletion of collections from the sysdb's background cleaner to the garbage collector. The soft deletion remains immediate upon request, but a new explicit API (FinishCollectionDeletion) is introduced for initiating hard (permanent) deletion, which downstream automation (GC) should call. The PR also introduces a new batch API for querying soft-deleted statuses (BatchGetCollectionSoftDeleteStatus), replaces lineage file naming with UUIDv7 for better consistency, and makes several adjustments to tests and supporting code for the new deletion flow. Configuration parameters, CLI flags, and k8s deployment values related to soft delete cleanup internals are removed or cleaned up accordingly.

Key Changes:
• Sysdb now always enables soft-delete; config flags and soft/hard delete toggles are removed.
• Responsibility for hard-deleting collections (removal from DB) is shifted to the garbage collector, via the new FinishCollectionDeletion API.
• Lineage file naming is switched to UUIDv7 to avoid race conditions and maintain sortable/random semantics.
• Addition of BatchGetCollectionSoftDeleteStatus (with DB and gRPC plumbing) for efficient status checks.
• All previous soft/hard delete mode code paths, background cleaners, and associated test logic are removed or refactored.
• Protobuf/service definitions updated to reflect new APIs and removed legacy logic/config.
• Extensive refactor of server, coordinator, service, and DAO layers to align with the new explicit deletion flow.

Affected Areas:
• sysdb coordinator, grpc, and catalog logic
• collection create/delete/update APIs (Go and protobuf)
• testing: many unit and integration tests refactored
• S3 lineage/version-file naming and handling
• k8s deployment values, CLI/config handling

Potential Impact:

Functionality: Removes risk of accidental hard deletion and makes the deletion lifecycle explicit and more controllable. Upgrades API surface by providing new batch query and explicit permanent deletion methods.

Performance: Background cleanup logic is removed; no ongoing DB cleanups from sysdb. Deletion batch operations are more efficient than multiple single queries.

Security: Explicit hard deletes may be easier to audit; potential window post-soft delete where data is still recoverable up to GC.

Scalability: Decoupling hard deletes from sysdb daemon should make deletion scale better and offer more flexible cleanup strategies (GC-managed); batch APIs improve efficiency.

Review Focus:
• Deletion lifecycle: guarantee that collections can only be hard deleted via explicit API.
• Concurrency and locking: confirm there is no deadlock or race related to lineage updates, root/fork deletion.
• API backward compatibility, especially for clients expecting the old soft/hard delete behavior.
• Error handling and new error constants (e.g., collection not found vs. not soft deleted).
• Removal of background cleaner and its effect on old code/tests.
• Any potential edge cases in the new lineage update + UUID file naming logic.

Testing Needed

• Verify soft delete works for all collection types and is always enabled.
• Test that only explicit FinishCollectionDeletion requests (by GC or tests) result in hard DB removals.
• Validate lineage file naming with UUIDv7 is always unique and monotonic.
• Exercise new BatchGetCollectionSoftDeleteStatus API with various payloads.
• Run all regression and integration tests for collection lifecycle, particularly for forked/GCed data.
• Test proto and CLI/k8s configurations for expected backward-incompatible changes (esp. removed flags).

Code Quality Assessment

go/pkg/sysdb/coordinator/coordinator.go: Much simplified, with soft/hard delete modes removed and API surface clarified.

go/pkg/sysdb/grpc/collection_service.go: Now exposes explicit FinishCollectionDeletion endpoint; batch status methods correctly implemented.

go/pkg/sysdb/metastore/db/dao/collection.go: Batch status methods could use a pre-check for empty input for DB efficiency.

idl/chromadb/proto/coordinator.proto: API changes are clear; new methods/fields documented in code.

go/pkg/sysdb/metastore/s3/impl.go: Improved robustness in version history checks; further guard checks for nil recommended.

go/pkg/common/errors.go: New error codes for API clarity added.

tests and mock files: All tests and mocks updated for new behaviors; legacy tests removed as appropriate.

go/pkg/sysdb/coordinator/table_catalog.go: Well-structured, good use of error handling, but lock ordering in deletion requires careful scrutiny for deadlock.

Best Practices

API Design:
• Explicit deletion lifecycle; batch APIs for efficiency
• Clear error handling and constants

Data Consistency:
• Use of fine-grained locking for fork/deletion operations
• Unique, monotonic lineage file naming

Testing/Regression:
• Extensive regression suite updated
• Mocks and removals handled cleanly

Code Robustness:
• Null/nil checks improved (though further strengthening suggested in comments)

Potential Issues

• If clients depend on immediate hard deletion or the soft/hard delete mode toggling, this is a breaking change and may require coordination.
• Deadlock is possible if lock order is inconsistent between delete and fork code paths; documentation suggests this has been considered, but must be validated.
• Window between soft and hard deletion may allow for accidental recovery or delay GC-side reclamation.
• Lineage file update logic could race with forking/deletion (mitigated by locking but needs to be verified).

This summary was automatically generated by @propel-code-bot

@codetheweb codetheweb force-pushed the feat-validate-file-paths-during-gc branch from bcbf758 to f72bd63 Compare May 22, 2025 23:24
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from ca845a3 to ccc9b82 Compare May 22, 2025 23:24
@codetheweb codetheweb force-pushed the feat-validate-file-paths-during-gc branch from f72bd63 to c62bf3f Compare May 23, 2025 00:00
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from ccc9b82 to 4dcc3a4 Compare May 23, 2025 00:01
@codetheweb codetheweb force-pushed the feat-validate-file-paths-during-gc branch from c62bf3f to d6bf939 Compare May 23, 2025 00:48
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 4dcc3a4 to f2adff1 Compare May 23, 2025 00:49
@codetheweb codetheweb force-pushed the feat-validate-file-paths-during-gc branch from d6bf939 to e0a0f88 Compare May 23, 2025 01:27
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from f2adff1 to 660013f Compare May 23, 2025 01:27
@codetheweb codetheweb force-pushed the feat-validate-file-paths-during-gc branch from 7f275bb to a22cc8a Compare May 23, 2025 17:11
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from b10b4b0 to 77a9d62 Compare May 23, 2025 17:11
@codetheweb codetheweb force-pushed the feat-validate-file-paths-during-gc branch 2 times, most recently from 3cd7274 to 8f20043 Compare May 23, 2025 22:55
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch 2 times, most recently from 716f29a to 9356e10 Compare May 23, 2025 22:55
@codetheweb codetheweb force-pushed the feat-validate-file-paths-during-gc branch from 8f20043 to 4a3b406 Compare May 23, 2025 23:36
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 9356e10 to 8253c18 Compare May 23, 2025 23:36
@codetheweb codetheweb force-pushed the feat-validate-file-paths-during-gc branch from 4a3b406 to e533b0c Compare May 27, 2025 17:22
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 8253c18 to e1a6311 Compare May 27, 2025 17:22
return err
}
if rootCollection == nil {
return errors.New("root collection not found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think there is an error constant for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did not see one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in ForkCollection we directly used common.ErrCollectionNotFound. Maybe we should have a separate error constant for this. I'm ok with this now

// Remove collection being deleted from the dependencies
updatedDependencies := make([]*coordinatorpb.CollectionVersionDependency, 0)
for _, dependency := range lineageFile.Dependencies {
if dependency.TargetCollectionId != deleteCollection.ID.String() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when dependency.SourceCollectionId == deleteCollection.ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dependency.SourceCollectionId == deleteCollection.ID should never evaluate to true at this point because only leaf nodes in the dependency tree are eligible for hard deletion
but this would be good to add as an invariant 👍

@codetheweb codetheweb changed the base branch from feat-validate-file-paths-during-gc to graphite-base/4607 May 27, 2025 23:03
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from e1a6311 to c85ee20 Compare May 27, 2025 23:03
@codetheweb codetheweb force-pushed the graphite-base/4607 branch from e533b0c to 3baca7c Compare May 27, 2025 23:03
@graphite-app graphite-app bot changed the base branch from graphite-base/4607 to main May 27, 2025 23:03
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch 3 times, most recently from b2befca to 950e307 Compare May 28, 2025 00:17
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 950e307 to d1c4e49 Compare May 28, 2025 21:14
@codetheweb codetheweb merged commit 3d65425 into main May 28, 2025
72 checks passed
Copy link
Contributor Author

Merge activity

@codetheweb codetheweb deleted the feat-gc-hard-delete-collection-sysdb branch May 28, 2025 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants