
Restartability of ReplDev #285

Merged: 1 commit merged into eBay:master from restart_repldev on Jan 30, 2024

Conversation

hkadayam (Contributor):

The following changes are done as part of this commit:

  1. Because of the ordering of services, a restart of homestore did not recover the ReplDev. Fixed the ordering and also made the logstore and data services start/stop inside ReplService start/stop.

  2. Error handling was not present, so test failures were hard to comprehend. Added error handling for propose, commit, and write failures.

  3. Aesthetic change to have open_log_store return a future< logstore > instead of taking a callback (see the sketch after this list).

  4. Added a become_leader() call to ReplDev and fixed test cases to use it, ensuring writes are issued on the correct leader.

  5. Added tests that restart the ReplDev, write after the restart, and validate the data.
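
A minimal sketch of the callback-to-future change described in item 3, assuming folly futures; the LogStore type and open_log_store signature below are illustrative stand-ins rather than HomeStore's exact API:

#include <folly/futures/Future.h>
#include <functional>
#include <memory>

struct LogStore {};  // illustrative stand-in for the homestore logstore type

// Before: the caller supplies a completion callback that fires once recovery has opened the store.
void open_log_store_cb(int store_id, std::function< void(std::shared_ptr< LogStore >) > on_open);

// After: the call returns a future the caller chains on, which composes better with the
// service start/stop ordering changes described in item 1.
folly::Future< std::shared_ptr< LogStore > > open_log_store(int store_id) {
    return folly::makeFuture(std::make_shared< LogStore >());
}

int main() {
    open_log_store(1)
        .thenValue([](std::shared_ptr< LogStore > store) { (void)store; /* use the opened store */ })
        .get();
    return 0;
}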

// to start log store
if (has_log_service() && inp_params.auto_recovery) { m_log_service->start(is_first_time_boot() /* format */); }

m_cp_mgr->start_timer();
Contributor:

Now that HomeObject is also using CP (which doesn't rely on any other services), does it make sense to treat CP as a standalone service as well? Nothing needs to change here; it's just that if CP becomes a service, we might want to treat it like one, e.g. call service->start and start the timer inside start.

Contributor Author:

I don't think CP itself provides a service as such; it is an internal facility that upper layers can tag along with. Not sure it would buy us much.

@@ -286,8 +294,8 @@ nlohmann::json LogStoreFamily::dump_log_store(const log_dump_req& dump_req) {
if (dump_req.log_store == nullptr) {
folly::SharedMutexWritePriority::ReadHolder holder(m_store_map_mtx);
@yamingk (Contributor), Jan 23, 2024:

Nitpick: now that mutable is applied to m_store_map_mtx, we could probably make dump_log_store a const function as well.
You can ignore this comment if it becomes messy or you want to keep it as is.
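
A small, self-contained sketch of the nitpick, assuming folly's SharedMutexWritePriority; the class and members below are illustrative, not the actual LogStoreFamily code:

#include <folly/SharedMutex.h>
#include <cstdint>
#include <map>
#include <string>

// Declaring the mutex mutable lets a logically read-only method take the read
// lock while being declared const, which is what enables a const dump_log_store.
class StoreMap {
public:
    size_t count() const {
        folly::SharedMutexWritePriority::ReadHolder holder(m_mtx);
        return m_stores.size();
    }

private:
    mutable folly::SharedMutexWritePriority m_mtx;
    std::map< uint32_t, std::string > m_stores;
};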

if (rreq->state.load() & uint32_cast(repl_req_state_t::BLK_ALLOCATED)) {
auto blkid = rreq->local_blkid;
data_service().async_free_blk(blkid).thenValue([blkid](auto&& err) {
HS_LOG_ASSERT(!err, "freeing blkid={} upon error failed, potential to cause blk leak", blkid.to_string());
@yamingk (Contributor), Jan 23, 2024:

In the free callback, should we also reset rreq->state to clear the BLK_ALLOCATED and DATA_RECEIVED flags?
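
A rough sketch of what this suggestion would look like (the PR keeps the flags; see the reply below), assuming rreq->state is the atomic bitmask of repl_req_state_t flags used in the snippet above:

// Inside the async_free_blk completion: clear the flags that no longer hold once the blocks are freed.
rreq->state.fetch_and(~(uint32_cast(repl_req_state_t::BLK_ALLOCATED) |
                        uint32_cast(repl_req_state_t::DATA_RECEIVED)));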

Contributor Author:

The idea of the state is to record which stages we have gone through. I felt it would be more meaningful for debugging to keep them rather than erase them. Once errors are handled, these flags have little benefit beyond debugging.

@yamingk (Contributor) commented Jan 23, 2024:

update conan version

// Release the buffer which holds the packets
RD_LOG(INFO, "Data Channel: Data push completed for rreq=[{}]", rreq->to_compact_string());
rreq->fb_builder.Release();
rreq->pkts.clear();
});
}

void RaftReplDev::handle_error(repl_req_ptr_t const& rreq, ReplServiceError err) {
Contributor:

If an error happens, should we send a message to notify the followers to drop the pushed data and free the corresponding blocks?

Contributor Author:

I thought about that; maybe we need to, but right now I feel this scenario (and many others) will be covered by ticket #279, which garbage collects all of them and independently calls handle_error().


RD_LOG(TRACE, "Received Raft log_entry=[term={}], journal_entry=[{}] ", lentry->get_term(), jentry->to_string());
// For inline data we don't need to transform anything
if (tmp_jentry->code != journal_type_t::HS_LARGE_DATA) { return nullptr; }
Contributor:

Nit: I'd prefer to rename code to req_type here, so it is clearer to the reader.

Contributor Author:

Will do as part of my next PR, as I am changing some of these parameters for the destroy_repl_dev feature.

@@ -141,14 +165,14 @@ repl_req_ptr_t RaftStateMachine::transform_journal_entry(nuraft::ptr< nuraft::lo
auto new_buf = nuraft::buffer::expand(*rreq->raft_journal_buf(),
Contributor:

Can you explain when local_size > remote_size will happen? Would two homestore instances have different default block sizes?

Contributor Author:

No, but in other use cases (not HomeObject) the allocated blocks are not contiguous, so the MultiBlkId could have more than one entry, and that could differ across replicas. On the proposer the blocks could be contiguous, so the serialized BlkId is just 1 byte, whereas a follower might have, say, 4 non-contiguous blocks and hence 6 bytes. In that case we allocate a fresh buffer and copy into it (it should be rare, but can happen). For the HomeObject use case this code block will not be exercised.
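
A small, self-contained illustration (not HomeStore code) of why the serialized blkid can take more space on one replica than another, under the assumption that a fragmented allocation serializes one entry per extent:

#include <cstdint>
#include <cstdio>
#include <vector>

// One extent of an allocation: starting block number plus a run length.
struct BlkExtent {
    uint64_t start_blk;
    uint16_t nblks;
};

// Serialized size grows with the number of extents, so a proposer that got one
// contiguous run can produce a smaller journal entry than a follower whose
// allocator had to hand out scattered blocks for the same request.
size_t serialized_size(const std::vector< BlkExtent >& multi_blkid) {
    return sizeof(uint16_t) /* extent count */ + multi_blkid.size() * sizeof(BlkExtent);
}

int main() {
    std::vector< BlkExtent > proposer{{100, 4}};                 // 4 contiguous blocks
    std::vector< BlkExtent > follower{{7, 1}, {42, 1}, {91, 2}}; // same 4 blocks, fragmented
    std::printf("proposer=%zu bytes, follower=%zu bytes\n",
                serialized_size(proposer), serialized_size(follower));
    return 0;
}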

@@ -120,7 +139,7 @@ void SoloReplService::load_repl_dev(sisl::byte_view const& buf, void* meta_cooki
{
std::unique_lock lg(m_rd_map_mtx);
auto [_, happened] = m_rd_map.emplace(group_id, rdev);
(void) happened;
(void)happened;
Contributor:

Nit: do we need this line?

Contributor Author:

I don't remember adding this line, but it is needed for the optimized build to succeed (otherwise happened is an unused variable).
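
A minimal, self-contained example of the idiom under discussion, assuming a build that treats unused-variable warnings as errors (as the author describes):

#include <map>

int main() {
    std::map< int, int > m;
    // emplace() returns {iterator, inserted}; if `happened` is never read, a warnings-as-errors
    // build can fail, so the void-cast marks it as intentionally unused.
    auto [it, happened] = m.emplace(1, 2);
    (void)happened;
    return (it->second == 2) ? 0 : 1;
}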

hs()->data_service().start();
hs()->logstore_service().start(hs()->is_first_time_boot());

// Step 6: Iterate over all the repl devs and ask each one to join its raft group.
@JacksonYao287 (Contributor), Jan 29, 2024:

Do we have any mechanism to check for a partial raft_repl_dev? For example, when creating a raft_repl_dev, the leader succeeds in adding the first follower to the group but fails to add the second follower. We also need to consider this for recovery.

Right now, for this case, we just return an error to the upper layer and do not do any cleanup. I think it is the responsibility of the raft repl service to clean up all partial raft groups, even across restarts, so that we can make sure there is no leak of raft_repl_devs.

Contributor Author:

We will be adding a "remove_repl_dev" API as part of the next work. That will add "destroy" functionality for the repl_dev (which we can use for all such error handling).

return;
}

// Free the blks which are already allocated
@yamingk (Contributor), Jan 29, 2024:

I believe the follower_create_req function, which calls do_alloc_blk, also needs to check the status and call this handle_error, correct?

Contributor Author:

Good point. Today that is not a recoverable error and we crash. I am not sure we can handle the error. Perhaps we can discuss and handle it as part of the "Error recovery/validation" strategy.

Contributor:

Yes, sounds good.

@yamingk (Contributor) commented Jan 29, 2024:

Looks good to me except for one comment. The other comments are either minor or for informational purposes.

@@ -120,15 +144,60 @@ void RaftReplDev::push_data_to_all_followers(repl_req_ptr_t rreq) {

group_msg_service()
->data_service_request_unidirectional(nuraft_mesg::role_regex::ALL, PUSH_DATA, rreq->pkts)
Contributor:

What's the behaviour if the raft log append succeeded but the data push failed for one of the followers?

Contributor Author:

If the data push fails for only one follower, the leader doesn't get any error. The follower that has not received the data will do a fetch (after a timeout) and get the data. In the meantime, quorum would have been reached for that log entry and it would have committed (assuming there is one more healthy follower).

RD_LOG(INFO, "Data Channel: Data Write completed rreq=[{}]", rreq->to_compact_string());
if (err) {
RD_DBG_ASSERT(false, "Error in writing data");
handle_error(rreq, ReplServiceError::DRIVE_WRITE_ERROR);
Contributor:

If there are any errors, should we free the allocated blks?

Contributor Author:

The handle_error method does that.

@@ -36,15 +37,20 @@ std::shared_ptr< GenericReplService > GenericReplService::create(cshared< ReplAp

GenericReplService::GenericReplService(cshared< ReplApplication >& repl_app) :
m_repl_app{repl_app}, m_my_uuid{repl_app->get_my_repl_id()} {
m_sb_bufs.reserve(100);
Contributor:

Should this be a dynamic config, something like MAX_NUM_PG_IN_SYSTEM?

Contributor Author:

This is just a reserve, not a tunable. Do we need to clutter the dynamic config with this? (Also, homestore doesn't know anything about PGs.)


HomeStoreDynamicConfig::init_settings_default();

LOGINFO("Homestore is loading with following services: {}", m_services.list());
if (has_meta_service()) { m_meta_service = std::make_unique< MetaBlkService >(); }
if (has_log_service()) { m_log_service = std::make_unique< LogStoreService >(); }
if (has_data_service()) { m_data_service = std::make_unique< BlkDataService >(std::move(s_custom_chunk_selector)); }
if (has_index_service()) { m_index_service = std::make_unique< IndexService >(std::move(s_index_cbs)); }
if (has_repl_data_service()) {
m_repl_service = GenericReplService::create(std::move(s_repl_app));
Contributor:

Repl and data services are not mutually exclusive now with with_repl_data_service()?

Contributor Author:

I probably didn't quite catch your comment. Can you please elaborate?

@@ -127,16 +131,23 @@ class TestReplicatedDB : public homestore::ReplDevListener {
LOGINFO("[Replica={}] Received rollback on lsn={}", g_helper->replica_num(), lsn);
}

void on_error(ReplServiceError error, const sisl::blob& header, const sisl::blob& key,
cintrusive< repl_req_ctx >& ctx) override {
Contributor:

Should this assert(), since we are not expecting any error here?

Contributor Author:

Well, I thought about that, but I wanted to extend this for negative checks, where we propose on the leader and expect an error, etc., so I didn't put an assert here. Later we will add a counter and validate that.
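
A rough sketch of the counter-based validation the author mentions as future work (hypothetical, not part of this PR), extending the TestReplicatedDB listener shown in the diff above:

// Hypothetical member of TestReplicatedDB: count errors instead of asserting, so negative
// tests can verify that an expected error was actually delivered.
std::atomic< uint64_t > m_error_count{0};

void on_error(ReplServiceError error, const sisl::blob& header, const sisl::blob& key,
              cintrusive< repl_req_ctx >& ctx) override {
    LOGINFO("[Replica={}] Received error={} on repl_dev", g_helper->replica_num(),
            static_cast< int >(error));
    ++m_error_count;
}

// A negative test could then propose on a non-leader (or inject a write failure) and
// assert that m_error_count reaches the expected value.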

@JacksonYao287 (Contributor) left a comment:

LGTM, no more comments from my side. Thanks for the work, @hkadayam.

hkadayam merged commit 198657e into eBay:master on Jan 30, 2024 (20 checks passed).
hkadayam deleted the restart_repldev branch on January 30, 2024 17:41.