Skip to content

[ENH]: perform collection hard deletes from garbage collector #4605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: feat-gc-hard-delete-collection-sysdb
Choose a base branch
from

Conversation

codetheweb
Copy link
Contributor

@codetheweb codetheweb commented May 22, 2025

Description of changes

This updates the garbage collector to transition soft deleted collections to hard deleted when eligible.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

I updated the proptest with a new transition for deleting collections.

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

n/a

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link
Contributor Author

codetheweb commented May 22, 2025

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 5b21248 to 5b6a52b Compare May 22, 2025 17:29
@codetheweb codetheweb changed the base branch from feat-validate-file-paths-during-gc to feat-gc-hard-delete-collection-sysdb May 22, 2025 17:29
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from f545524 to 822e31d Compare May 22, 2025 18:06
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 5b6a52b to 65e13b1 Compare May 22, 2025 18:06
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 822e31d to afe5084 Compare May 22, 2025 18:13
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 65e13b1 to 9ab64f5 Compare May 22, 2025 18:13
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from afe5084 to 9aed3e4 Compare May 22, 2025 20:11
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch 2 times, most recently from 8aff535 to d78d710 Compare May 22, 2025 21:54
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes in this file provide a little more debugging info + a new defensive check

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 9aed3e4 to ca845a3 Compare May 22, 2025 22:23
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch 4 times, most recently from 720af86 to c815632 Compare May 22, 2025 23:11
@codetheweb codetheweb changed the title [ENH]: move collection hard deletes to garbage collector [ENH]: perform collection hard deletes from garbage collector May 22, 2025
@codetheweb codetheweb marked this pull request as ready for review May 22, 2025 23:13
@codetheweb codetheweb requested a review from sanketkedia May 22, 2025 23:13
Copy link
Contributor

propel-code-bot bot commented May 22, 2025

Garbage Collector Now Transitions & Finalizes Hard Deletes of Soft-Deleted Collections

This PR refactors and extends the garbage collection system to fully support and perform hard deletion of collections once they have been soft deleted and all eligible versions are pruned. The hard-delete process is now handled within the garbage collector, with systematic checks to ensure no dependent forked collections remain alive before escalation from soft to hard delete, and includes batch APIs to check and finalize deletion status through sysdb. Comprehensive changes span orchestrators, computation operators, sysdb, test harnesses, and the supporting types.

Key Changes:
• Garbage Collector orchestrator expanded to process soft-deleted collections, transitioning them to hard-deleted after all fork descendants are soft deleted.
• Introduced batch_get_collection_soft_delete_status and finish_collection_deletion funcs in sysdb (and GrpcSysDb), including wire-up from orchestrator.
• Operators and orchestrators now propagate soft delete status and ensure versions in soft-deleted collections are always marked for deletion.
• Testing harness and proptests upgraded to validate soft→hard delete transitions and invariants, with new tests for forked/fan-out scenarios.
• Defensive checks and improved debug output for graph invariants and connectivity added to version graph construction.
• API/types expanded for new sysdb batch soft delete status logic.

Affected Areas:
• garbage_collector/orchestrator_v2.rs
• garbage_collector/operators/compute_versions_to_delete_from_graph.rs
• garbage_collector/tests/proptest_helpers/garbage_collector_reference.rs
• garbage_collector/tests/proptest_helpers/garbage_collector_under_test.rs
• sysdb.rs, test_sysdb.rs
• types/api_types.rs
• construct_version_graph_orchestrator.rs

Potential Impact:

Functionality: Introduces new behavior: soft-deleted collections are only fully (hard) deleted by GC after all forked descendants are also soft-deleted; ensures safe pruning and data integrity.

Performance: Minimal overhead added by extra sysdb checks; batch queries optimize status retrieval. No major regressions expected.

Security: No sensitive security changes. Preserves integrity by ensuring hard deletes only when structurally safe.

Scalability: Batch operations and targeted traversal optimize handling for large numbers of collections/forks. Safe for scale.

Review Focus:
• Correctness of soft→hard delete state transition (including forked collections).
• That all sysdb/Grpc interfaces are called safely and with anticipated parameters.
• Tests cover edge conditions: forking, simultaneous deletes, partial deletion trees.
• Defensive checks for graph connectivity, and proper error escalation.
• Thread safety and error handling in new async sysdb/batch methods.

Testing Needed

• Run and inspect all proptests (now covering fork and soft/hard delete edge cases).
• Exercise full fork-and-delete lifecycles; verify that collections only disappear when all descendants are deleted.
• Confirm sysdb batch status matches expectations after GC runs.
• Check debug output and graph connectivity invariants in unusual topologies.

Code Quality Assessment

rust/garbage_collector/src/garbage_collector_orchestrator_v2.rs: Expanded state, control flow, and error handling; separation of soft/hard delete stages is clear.

rust/garbage_collector/src/operators/compute_versions_to_delete_from_graph.rs: Correct separation of logic for normal and soft-deleted collections; new tests clear.

rust/sysdb/src/sysdb.rs: Well-organized batch and finalize APIs; explicit errors and branching for mode.

rust/sysdb/src/test_sysdb.rs: Test helpers now properly simulate new logic.

rust/types/src/api_types.rs: New enums for batch status errors, well-integrated.

rust/garbage_collector/src/construct_version_graph_orchestrator.rs: Defensive, additional error cases; clear.

rust/garbage_collector/tests/proptest_helpers/garbage_collector_reference.rs: Significant refactor and logic addition; explicit, well-annotated. Edge-cases covered.

Best Practices

Error Handling:
• Explicit error propagation
• Defensive checks for invariants and graph structure

Test Coverage:
• Extended and edge-case property tests for all critical new paths

API Design:
• Batch interfaces
• Clear, explicit state transitions

Potential Issues

• If forks are created or deleted rapidly in pathological topologies, timing bugs or race conditions could occur in hard delete escalation (should be caught by proptests).
• Batch sysdb calls may need further optimization or retries if underlying storage becomes inconsistent.
• Backward compatibility for sysdb implementations not supporting new batch/check/finalize APIs is not fully maintained; TODOs present for sqlite/test.

This summary was automatically generated by @propel-code-bot

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from ca845a3 to ccc9b82 Compare May 22, 2025 23:24
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from c815632 to d56d085 Compare May 22, 2025 23:24
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from ccc9b82 to 4dcc3a4 Compare May 23, 2025 00:01
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from d56d085 to ab68c6d Compare May 23, 2025 00:02
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 4dcc3a4 to f2adff1 Compare May 23, 2025 00:49
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from ab68c6d to 3c756a0 Compare May 23, 2025 00:49
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from f2adff1 to 660013f Compare May 23, 2025 01:27
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 3c756a0 to 7e8bfb4 Compare May 23, 2025 01:28
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 660013f to b10b4b0 Compare May 23, 2025 02:08
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 7e8bfb4 to 59c1d1c Compare May 23, 2025 02:09
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from b10b4b0 to 77a9d62 Compare May 23, 2025 17:11
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 59c1d1c to 738703d Compare May 23, 2025 17:12
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch 2 times, most recently from 716f29a to 9356e10 Compare May 23, 2025 22:55
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch 3 times, most recently from ed65c55 to 230d5ac Compare May 23, 2025 22:56
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection-sysdb branch from 9356e10 to 8253c18 Compare May 23, 2025 23:36
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 230d5ac to cf6858c Compare May 23, 2025 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant