Skip to content

[ENH]: soft delete databases, add FinishDatabaseDeletion gRPC method to hard delete databases #4627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: feat-gc-hard-delete-collection
Choose a base branch
from

Conversation

codetheweb
Copy link
Contributor

@codetheweb codetheweb commented May 23, 2025

Description of changes

Databases are now always soft deleted instead of being hard deleted. Databases are hard deleted when the new FinishDatabaseDeletion method is called (the garbage collector calls this in the next PR).

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Updated existing test to cover soft delete behavior.

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

n/a

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 738703d to af4051e Compare May 23, 2025 22:55
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from b099250 to c84c9a6 Compare May 23, 2025 22:55
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from af4051e to ed65c55 Compare May 23, 2025 22:55
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from c84c9a6 to add6683 Compare May 23, 2025 22:56
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from ed65c55 to 230d5ac Compare May 23, 2025 22:56
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from add6683 to 42b9bff Compare May 23, 2025 22:56
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 230d5ac to cf6858c Compare May 23, 2025 23:36
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch 2 times, most recently from 156ccde to 0ee676b Compare May 23, 2025 23:40
@codetheweb codetheweb changed the title [ENH]: soft delete databases, hard delete from garbage collector [ENH]: soft delete databases, add FinishDatabaseDeletion gRPC method to hard delete databases May 23, 2025
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 0ee676b to 6ff85b5 Compare May 23, 2025 23:42
return nil
})
}

func (tc *Catalog) GetAllDatabases(ctx context.Context, ts types.Timestamp) ([]*model.Database, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method was never used

@codetheweb codetheweb marked this pull request as ready for review May 23, 2025 23:43
@codetheweb codetheweb requested a review from sanketkedia May 23, 2025 23:43
Copy link
Contributor

Implement Soft Delete for Databases and Add FinishDatabaseDeletion gRPC

This PR reworks database deletion semantics in the system: database deletions now perform a soft delete by marking the database as deleted, rather than hard-removing it. A new FinishDatabaseDeletion gRPC method is introduced, enabling hard deletion of databases that were soft deleted and now meet eligibility criteria (e.g., after a cutoff time and only if they have no collections). All relevant code paths, protobuf definitions, Rust and Go interfaces, implementations, and tests are modified to support this two-phase deletion strategy.

Key Changes:
• All database deletions are now soft deletes (is_deleted flag set).
• New FinishDatabaseDeletion gRPC and API flows for hard deletion, including proto definitions, Go, and Rust implementations.
• Coordinator and DAO layers updated to support new deletion logic and eligibility checks (e.g., only hard delete databases with no collections after cutoff).
• Test suite refactored to validate soft deletion, hard deletion, and related behaviors (including cutoff time logic).
• All code paths, APIs, and proto interfaces for hard delete are now unified under the new method; prior never-used methods removed.

Affected Areas:
• Database deletion logic (Go: dao, dbmodel, coordinator, gRPC service, test)
• Proto API (idl/chromadb/proto/coordinator.proto)
• Rust bindings for SysDB/client API and error types
• Test suites for database/collection lifecycle

Potential Impact:

Functionality: Changes how databases are deleted throughout the system; existing delete flows perform only soft delete, requiring explicit follow-up for hard deletion via the new FinishDatabaseDeletion endpoint.

Performance: Batch hard deletion is implemented efficiently (in chunks of 1000); no major perf regressions expected but GC timing may be affected.

Security: Soft delete reduces risk of accidental data loss, but it's critical that the hard delete only acts on eligible databases.

Scalability: Improved deletion scalability via batch hard delete and separation of soft/hard deletion, facilitating large-scale cleanup without blocking key operations.

Review Focus:
• Correctness of soft vs. hard deletion logic and eligibility checks.
• Correct propagation of is_deleted/soft-deleted databases through all query layers.
• API compatibility (clients must adapt to the two-phase delete).
• Backward compatibility: ensure no regressions in existing flows lacking hard-deletion.

Testing Needed

• Verify soft-deleted databases are excluded from normal queries immediately after mark/delete.
• Test FinishDatabaseDeletion: multiple scenarios (recently deleted, already deleted, with/without collections, cutoff time in past/future).
• Full integration test passes (pytest/yarn/cargo as reported).

Code Quality Assessment

go/pkg/sysdb/metastore/db/dao/database.go: Major refactor, clear transaction usage for safe updates, all major code paths adapted.

go/pkg/sysdb/grpc/tenant_database_service_test.go: Robust and expanded test coverage for soft/hard deletion and edge cases.

go/pkg/sysdb/coordinator/table_catalog.go: All API logic correctly updated; unused methods removed.

idl/chromadb/proto/coordinator.proto: Well-structured proto additions, backward compatible.

rust/sysdb/src/sysdb.rs: Fully adapted client interface, with structured error handling.

Best Practices

API Design:
• Two-phase deletion aligns with robust data retention policies
• Proto contracts kept forward and backward compatible in most cases

Testing:
• Tests for both soft and hard deletion cases, cutoff logic

Transactionality:
• Database changes occur in transactions
• Eligibility requirements enforced before physical deletion

Potential Issues

• If a collection reference is not cleaned up properly, hard deletion may not proceed (check for lingering collections).
• External clients may need awareness of new API for ultimate cleanup.
• Potential for time skew between systems when using cutoff.

This summary was automatically generated by @propel-code-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant