
Replication Snapshot and Compact #368

Merged: 23 commits merged from yk_repl_new into eBay:master on Apr 18, 2024
Conversation

@yamingk (Contributor) commented Apr 6, 2024

Reviewers, please focus on resource mgr, home raft repl log, raft repl dev, and JournalVDev.
There is not much logic change in the other layers/files.

Changes:

  1. Add snapshot creation to the listener (HomeObject PR is in progress) and compact handling (called by raft; in the repl dev it persists compact_lsn as a barrier for truncation, see the sketch after this list).
    Note: this is not a generic snapshot that we will implement for other use cases (e.g. HomeBlks). This snapshot creation is the bare minimum for the nuraft snapshot and the HomeObject use case.
  2. Add home raft log store truncation. The truncation is triggered by the nuraft server after a snapshot is created.
    HomeStore remembers the compact_lsn, and later the resource manager triggers truncation of all log stores in the system.
    HomeStore doesn't reserve any extra log items; the reservation is done through a dynamic config which is used to initialize the raft server on startup.
  3. Add a resource manager timer to trigger raft log store truncation as needed.
  4. Add a JournalVDev critical high watermark; in its callback it triggers raft log store truncation immediately, so that it doesn't need to wait for the next timer audit.
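
A minimal sketch of the compact barrier from items 1 and 2. The names here (on_compact, persist_compact_lsn_to_superblk, m_compact_lsn) are hypothetical, since the actual HomeStore interfaces are not shown on this page:

// Hypothetical sketch, not the actual HomeStore code: the repl dev records
// the LSN that nuraft compacted up to and persists it, so truncation can
// never run past it, and can catch up after a reboot.
#include <atomic>
#include <cstdint>
#include <cstdio>

class RaftReplDev {
public:
    // Called when nuraft compacts after a snapshot is created.
    void on_compact(int64_t compact_lsn) {
        m_compact_lsn.store(compact_lsn, std::memory_order_release);
        persist_compact_lsn_to_superblk(compact_lsn); // survives reboot
    }

    int64_t compact_lsn() const {
        return m_compact_lsn.load(std::memory_order_acquire);
    }

private:
    void persist_compact_lsn_to_superblk(int64_t lsn) {
        // Stand-in for a superblock write; the real code persists to disk.
        std::printf("persist compact_lsn=%lld\n", static_cast<long long>(lsn));
    }
    std::atomic<int64_t> m_compact_lsn{0}; // 0 until the first compact
};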

Testing:

Snapshot creation and truncation can be verified from the logs with the command line below:
./bin/test_raft_repl_dev --gtest_filter=Snapshot_and_Compact --log_mods replication:debug --num_io=999999 --snapshot_distance=200 --num_raft_logs_resv=20000 --res_mgr_audit_timer_ms=120000
This also needs to be added to the replication long-duration test.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 36.90476%, with 53 lines in your changes missing coverage. Please review.

Project coverage is 57.84%. Comparing base (60808d0) to head (deb52b8).

Files Patch % Lines
src/lib/replication/repl_dev/raft_repl_dev.cpp 0.00% 15 Missing ⚠️
src/lib/common/resource_mgr.cpp 62.85% 10 Missing and 3 partials ⚠️
.../lib/replication/log_store/home_raft_log_store.cpp 0.00% 13 Missing ⚠️
src/lib/replication/log_store/repl_log_store.cpp 0.00% 4 Missing ⚠️
src/lib/replication/repl_dev/raft_repl_dev.h 0.00% 4 Missing ⚠️
...rc/lib/replication/repl_dev/raft_state_machine.cpp 0.00% 2 Missing ⚠️
src/lib/checkpoint/cp_mgr.cpp 0.00% 1 Missing ⚠️
src/lib/replication/service/raft_repl_service.cpp 0.00% 1 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##           master     #368      +/-   ##
==========================================
+ Coverage   57.76%   57.84%   +0.07%     
==========================================
  Files         108      107       -1     
  Lines        9519     9563      +44     
  Branches     1233     1231       -2     
==========================================
+ Hits         5499     5532      +33     
- Misses       3481     3493      +12     
+ Partials      539      538       -1     


@@ -106,7 +106,7 @@ void LogStoreService::start(bool format) {
}

void LogStoreService::stop() {
device_truncate(nullptr, true, false);
// device_truncate(nullptr, true, false);
Contributor:

So device_truncate() is called only from the resource mgr. Should we call it even if there is no resource crunch?

Contributor (Author):

This device_truncate calls into log_dev to do the actual truncate. We should not do that without setting the compact_lsn in the logstore (through nuraft::compact), because only nuraft knows what the correct compact_lsn should be. We already persist the compact_lsn in the superblock, so after a reboot it can catch up on the next timer schedule or when the next critical alert is hit. I don't quite get what the 'crunch issue' in your comment refers to.

res_mgr_timer_ms * 1000 * 1000, true /* recurring */, nullptr /* cookie */, iomgr::reactor_regex::all_worker,
[this](void*) {
// all resource timely audit routine should arrive here;
this->trigger_truncate();
Contributor:

The periodic execution affects UT too. Should we trigger the truncate only when the vdev size exceeds some threshold?

Contributor (Author):

Can you specify how it is affected? Sorry, I probably didn't catch the question. Right now the only threshold we have is the critical one, and this timer does a batch truncate for all the log stores in HomeStore system-wide.

Collaborator:

There are some UTs, mostly in test_journal_vdev.cpp, that carefully verify the journal offset/size etc. in a controlled manner (append some logs, check, append some logs, check, truncate, check).

I think the concern here is that once time-based truncation is introduced, UTs following this pattern are no longer possible.

Contributor (Author) @yamingk commented Apr 16, 2024:

This is the reason we didn't reserve at log_dev but let nuraft do the reservation: we don't want to make changes in the log_dev layer, because that would break a lot of the assumptions the current UTs make in test_journal_vdev and test_log_dev.

I spent more than a week on that journey and finally realized that doing the reservation in logdev (crying out loud) would basically require redesigning all the UTs in those layers...

In summary, the change in this PR won't break any assumptions in the layers below the log store. Multiple truncations across multiple timer events won't cause any side effects (even with no snapshot/compact in between), because they are protected by the safe truncation boundary, which is also derived from compact_lsn (whose only source is the nuraft::compact call). A sketch of this boundary follows below.
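
A minimal sketch of the boundary described above, assuming it is computed as the min of the reservation limit and compact_lsn; the helper name safe_truncation_boundary mirrors the comment, the rest is illustrative:

// Illustrative only: the truncation point is capped both by the reserved
// log count (from dynamic config, applied at raft server startup) and by
// compact_lsn (set only via nuraft::compact). With no new compact between
// two timer events the boundary does not move, so repeated timer-driven
// truncations are no-ops.
#include <algorithm>
#include <cstdint>

int64_t safe_truncation_boundary(int64_t last_lsn, int64_t num_reserved_logs,
                                 int64_t compact_lsn) {
    return std::min(last_lsn - num_reserved_logs, compact_lsn);
}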

Contributor:

The concern is min(last - reserved, compact_lsn): if nuraft hasn't called compact, compact_lsn is 0 and the periodic timer will truncate all entries.


#endif

// high watermark check for the entire journal vdev;
if (resource_mgr().check_journal_vdev_size(m_vdev.used_size(), m_vdev.size())) {
Collaborator:

As m_event_cb is nullptr, how is this expected to link to truncation?

Contributor (Author):

Collaborator @xiaoxichen commented Apr 17, 2024:

Oh ok, it is a bit weird, as the code path here looks like:

if (DETECTION) {
    update_metrics
    do callback to trigger action
}

however m_event_cb is nullptr and the actual callback is done in the detection part (resource_mgr().check_journal_vdev_size).

Can I read this as your TODO to move to m_event_cb?

Contributor (Author) @yamingk commented Apr 17, 2024:

Yes. The goal is to move all of the layer-specific registered callbacks in the resource mgr to a general event_cb. Here in this journal vdev, I think the log store is the only consumer, and we can see whether there is any use case where this event_cb is still needed, since most of the job is done via the resource manager.
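
A hedged sketch of the shape being discussed, in which detection and action live together in the resource manager (which is why m_event_cb can stay nullptr). The names check_journal_vdev_size and trigger_truncate follow the snippets above; everything else (threshold value, metrics hook) is assumed:

// Illustrative sketch, not the actual HomeStore code: the high-watermark
// check itself fires the truncation, instead of routing through m_event_cb.
#include <cstdint>

class ResourceMgr {
public:
    // Returns true when the journal vdev crosses the critical high
    // watermark; in that case it triggers truncation immediately rather
    // than waiting for the next timer audit.
    bool check_journal_vdev_size(uint64_t used_size, uint64_t total_size) {
        if (total_size == 0) { return false; }
        uint64_t const used_pct = (100 * used_size) / total_size;
        if (used_pct >= m_critical_hw_pct) {
            // update metrics here ...
            trigger_truncate(); // action fired from the detection path
            return true;
        }
        return false;
    }

private:
    void trigger_truncate() {
        // Stand-in: the real resource manager batch-truncates every log
        // store in the system up to its safe truncation boundary.
    }
    uint32_t m_critical_hw_pct{95}; // assumed threshold percentage
};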

Collaborator @xiaoxichen left a comment:

lgtm


@yamingk yamingk merged commit 6ae31e1 into eBay:master Apr 18, 2024
21 checks passed
@yamingk yamingk deleted the yk_repl_new branch May 14, 2024 21:58