Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up old logs at the beginning of baseline resync. #643

Merged
merged 1 commit into from
Feb 12, 2025

Conversation

Besroy
Copy link
Contributor

@Besroy Besroy commented Feb 11, 2025

If the follower restarts during baseline resync, it will replay remaining logs first. However, we have already cleared shard info at the beginning of resync, so we cannot get shard while replaying logs, which will raise errors.
This change clean up old logs in log store at the beginning of baseline resync to avoid this situation.

Copy link
Collaborator

@xiaoxichen xiaoxichen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my 0.02

The truncate is misleading and as far as we spent effort on removing this part, do not added back, a explicit HomeRaftLogStore::purge_logs() or HomeRaftLogStore::purge_all_logs() might have better readability.

Another suggestion is do not do it in save_logical_snp_obj as it will be occur in every snapshot batch, better to add it to HomeObject::pg_destroy then call down to repl_dev_::purge() , in the implementation of RaftReplDev we clean up logs.

@Besroy Besroy force-pushed the clean_logs_at_the_begin_of_br branch from fbc7b9d to 862b4a3 Compare February 11, 2025 08:41
@codecov-commenter
Copy link

codecov-commenter commented Feb 11, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.

Project coverage is 66.00%. Comparing base (1a0cef8) to head (362387a).
Report is 132 commits behind head on master.

Files with missing lines Patch % Lines
.../lib/replication/log_store/home_raft_log_store.cpp 0.00% 4 Missing ⚠️
src/lib/replication/repl_dev/raft_repl_dev.h 0.00% 2 Missing ⚠️
src/lib/replication/repl_dev/solo_repl_dev.h 0.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #643      +/-   ##
==========================================
+ Coverage   56.51%   66.00%   +9.49%     
==========================================
  Files         108      109       +1     
  Lines       10300    11000     +700     
  Branches     1402     1506     +104     
==========================================
+ Hits         5821     7261    +1440     
+ Misses       3894     3022     -872     
- Partials      585      717     +132     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Besroy pushed a commit to Besroy/HomeObject that referenced this pull request Feb 11, 2025
@Besroy
Copy link
Contributor Author

Besroy commented Feb 11, 2025

my 0.02

The truncate is misleading and as far as we spent effort on removing this part, do not added back, a explicit HomeRaftLogStore::purge_logs() or HomeRaftLogStore::purge_all_logs() might have better readability.

Another suggestion is do not do it in save_logical_snp_obj as it will be occur in every snapshot batch, better to add it to HomeObject::pg_destroy then call down to repl_dev_::purge() , in the implementation of RaftReplDev we clean up logs.

Thanks! updated to purge and purge_all_logs at HS side. save_logical_snp_obj will be executed at the begin of snapshot, then obj type will be set with snp_obj_id_type_app, so it won't be occur in every snapshot batch. But calling in pg_destroy is more cleaner, will submit a HO pr after HS pr merged.

@Besroy Besroy requested a review from xiaoxichen February 11, 2025 09:40
This change is necessary for baseline resync and can be called by the upper layer to purge existing logs, which resolves the following issue:
If a follower restarts during baseline resync, it will replay the remaining logs first.
However, shard info has already been cleared at the beginning of resync (from the HO side),
making it impossible to retrieve shard info while replaying logs, which results in errors.
@Besroy Besroy force-pushed the clean_logs_at_the_begin_of_br branch from 862b4a3 to 362387a Compare February 11, 2025 09:42
@xiaoxichen
Copy link
Collaborator

Thinking about moving the HeapChunkSelector::reset_pg_chunks down into purge to make HS perge self contain, but it is up to you and @Hooper9973, just a cometic change.

LGTM

@Besroy
Copy link
Contributor Author

Besroy commented Feb 12, 2025

Thinking about moving the HeapChunkSelector::reset_pg_chunks down into purge to make HS perge self contain, but it is up to you and @Hooper9973, just a cometic change.

Are you suggesting that we perform both log cleaning and chunk resetting within the RaftReplDev::purge, allowing HS to fully purge itself without requiring HO to manage chunk resetting? That would be fine. However, we are considering how to achieve this, but haven't found a very elegant solution:
Way 1: Call the selector's reset directly, but this requires HS to understand pg_id or the chunk selector to understand group_id.
Way 2: Add an on_purge method in the listener, but this would create a call chain of HSHomeObject::pg_destroy -> ReplDev::purge -> ReplicationStateMachine::on_purge -> HSHomeObject::purge, which might be too lengthy.

Would it be better if we added a purge_hs_resources method at the HO level, which would encompass both the purge of repl_dev and the reset of the chunk?

@xiaoxichen
Copy link
Collaborator

Yes you are right...lets merge this as is

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JacksonYao287 JacksonYao287 merged commit fb6fd08 into eBay:master Feb 12, 2025
21 checks passed
Besroy pushed a commit to Besroy/HomeObject that referenced this pull request Feb 12, 2025
Besroy pushed a commit to Besroy/HomeObject that referenced this pull request Feb 12, 2025
Besroy pushed a commit to Besroy/HomeObject that referenced this pull request Feb 12, 2025
xiaoxichen pushed a commit to eBay/HomeObject that referenced this pull request Feb 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants