
Restartability of ReplDev #285

Merged: 1 commit merged into eBay:master from restart_repldev on Jan 30, 2024

Conversation

hkadayam (Contributor):

The following changes are done as part of this commit:

  1. Because of the ordering of services, a restart of homestore did not recover the ReplDev. Fixed the ordering and also made the logstore and data services start/stop inside ReplService start/stop.

  2. Error handling was not present, so test failures were hard to comprehend. Added error handling for propose, commit, and write failures.

  3. Aesthetic change to have open_log_store return a future< logstore > instead of taking a callback (see the sketch after this list).

  4. Added a become_leader() call to ReplDev and fixed test cases to use it, ensuring writes are issued on the correct leader.

  5. Added tests that restart the ReplDev, write after the restart, and validate the data.
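
A minimal sketch of the callback-to-future change described in item 3, assuming folly futures; the LogStore type and open_log_store signature below are illustrative stand-ins rather than HomeStore's exact API:

#include <folly/futures/Future.h>
#include <functional>
#include <memory>

struct LogStore {};  // illustrative stand-in for the homestore logstore type

// Before: the caller supplies a completion callback that fires once recovery has opened the store.
void open_log_store_cb(int store_id, std::function< void(std::shared_ptr< LogStore >) > on_open);

// After: the call returns a future the caller chains on, which composes better with the
// service start/stop ordering changes described in item 1.
folly::Future< std::shared_ptr< LogStore > > open_log_store(int store_id) {
    return folly::makeFuture(std::make_shared< LogStore >());
}

int main() {
    open_log_store(1)
        .thenValue([](std::shared_ptr< LogStore > store) { (void)store; /* use the opened store */ })
        .get();
    return 0;
}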

// to start log store
if (has_log_service() && inp_params.auto_recovery) { m_log_service->start(is_first_time_boot() /* format */); }

m_cp_mgr->start_timer();
Contributor:

Now that HomeObject is also using CP (which doesn't rely on any other services), does it make sense to treat CP as a standalone service as well? Nothing needs to change here; it's just that if CP becomes a service, we might want to treat it like one, e.g. call service->start and start the timer inside start.

Contributor Author:

I don't think CP itself provides a service as such; it is an internal facility that upper layers can tag along with. Not sure it would buy us much.

@@ -286,8 +294,8 @@ nlohmann::json LogStoreFamily::dump_log_store(const log_dump_req& dump_req) {
if (dump_req.log_store == nullptr) {
folly::SharedMutexWritePriority::ReadHolder holder(m_store_map_mtx);
@yamingk (Contributor), Jan 23, 2024:

Nitpick: now that mutable is applied to m_store_map_mtx, we could probably make dump_log_store a const function as well.
You can ignore this comment if it becomes messy or you want to keep it as is.
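
A small, self-contained sketch of the nitpick, assuming folly's SharedMutexWritePriority; the class and members below are illustrative, not the actual LogStoreFamily code:

#include <folly/SharedMutex.h>
#include <cstdint>
#include <map>
#include <string>

// Declaring the mutex mutable lets a logically read-only method take the read
// lock while being declared const, which is what enables a const dump_log_store.
class StoreMap {
public:
    size_t count() const {
        folly::SharedMutexWritePriority::ReadHolder holder(m_mtx);
        return m_stores.size();
    }

private:
    mutable folly::SharedMutexWritePriority m_mtx;
    std::map< uint32_t, std::string > m_stores;
};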

if (rreq->state.load() & uint32_cast(repl_req_state_t::BLK_ALLOCATED)) {
auto blkid = rreq->local_blkid;
data_service().async_free_blk(blkid).thenValue([blkid](auto&& err) {
HS_LOG_ASSERT(!err, "freeing blkid={} upon error failed, potential to cause blk leak", blkid.to_string());
@yamingk (Contributor), Jan 23, 2024:

In the free callback, should we also reset rreq->state to clear the BLK_ALLOCATED and DATA_RECEIVED flags?
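
A rough sketch of what this suggestion would look like (the PR keeps the flags; see the reply below), assuming rreq->state is the atomic bitmask of repl_req_state_t flags used in the snippet above:

// Inside the async_free_blk completion: clear the flags that no longer hold once the blocks are freed.
rreq->state.fetch_and(~(uint32_cast(repl_req_state_t::BLK_ALLOCATED) |
                        uint32_cast(repl_req_state_t::DATA_RECEIVED)));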

Contributor Author:

The idea of the state is to record which stages we have gone through. I felt it would be more meaningful for debugging to keep them rather than erase them. Once errors are handled, these flags have little benefit beyond debugging.

@yamingk (Contributor) commented Jan 23, 2024:

update conan version

// Release the buffer which holds the packets
RD_LOG(INFO, "Data Channel: Data push completed for rreq=[{}]", rreq->to_compact_string());
rreq->fb_builder.Release();
rreq->pkts.clear();
});
}

void RaftReplDev::handle_error(repl_req_ptr_t const& rreq, ReplServiceError err) {
Contributor:

If an error happens, should we send a message to notify the followers to drop the pushed data and free the corresponding blocks?

Contributor Author:

I thought about that; maybe we need to, but right now I feel this scenario (and many others) will be covered by ticket #279, which garbage collects all of them and independently calls handle_error().


RD_LOG(TRACE, "Received Raft log_entry=[term={}], journal_entry=[{}] ", lentry->get_term(), jentry->to_string());
// For inline data we don't need to transform anything
if (tmp_jentry->code != journal_type_t::HS_LARGE_DATA) { return nullptr; }
Contributor:

Nit: I'd prefer to rename code to req_type here, so it is clearer to the reader.

Contributor Author:

Will do as part of my next PR, as I am changing some of these parameters for the destroy_repl_dev feature.

@@ -141,14 +165,14 @@ repl_req_ptr_t RaftStateMachine::transform_journal_entry(nuraft::ptr< nuraft::lo
auto new_buf = nuraft::buffer::expand(*rreq->raft_journal_buf(),
Contributor:

Can you explain when local_size > remote_size will happen? Would two homestore instances have different default block sizes?

Contributor Author:

No, but in other use cases (not HomeObject) the allocated blocks are not contiguous, so the MultiBlkId could have more than one entry, and that could differ across replicas. On the proposer the blocks could be contiguous, so the serialized BlkId is just 1 byte, whereas a follower might have, say, 4 non-contiguous blocks and hence 6 bytes. In that case we allocate a fresh buffer and copy into it (it should be rare, but can happen). For the HomeObject use case this code block will not be exercised.
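
A small, self-contained illustration (not HomeStore code) of why the serialized blkid can take more space on one replica than another, under the assumption that a fragmented allocation serializes one entry per extent:

#include <cstdint>
#include <cstdio>
#include <vector>

// One extent of an allocation: starting block number plus a run length.
struct BlkExtent {
    uint64_t start_blk;
    uint16_t nblks;
};

// Serialized size grows with the number of extents, so a proposer that got one
// contiguous run can produce a smaller journal entry than a follower whose
// allocator had to hand out scattered blocks for the same request.
size_t serialized_size(const std::vector< BlkExtent >& multi_blkid) {
    return sizeof(uint16_t) /* extent count */ + multi_blkid.size() * sizeof(BlkExtent);
}

int main() {
    std::vector< BlkExtent > proposer{{100, 4}};                 // 4 contiguous blocks
    std::vector< BlkExtent > follower{{7, 1}, {42, 1}, {91, 2}}; // same 4 blocks, fragmented
    std::printf("proposer=%zu bytes, follower=%zu bytes\n",
                serialized_size(proposer), serialized_size(follower));
    return 0;
}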

@@ -120,7 +139,7 @@ void SoloReplService::load_repl_dev(sisl::byte_view const& buf, void* meta_cooki
{
std::unique_lock lg(m_rd_map_mtx);
auto [_, happened] = m_rd_map.emplace(group_id, rdev);
(void) happened;
(void)happened;
Contributor:

Nit: do we need this line?

Contributor Author:

I don't remember adding this line, but it is needed for the optimized build to succeed (otherwise happened is an unused variable).
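
A minimal, self-contained example of the idiom under discussion, assuming a build that treats unused-variable warnings as errors (as the author describes):

#include <map>

int main() {
    std::map< int, int > m;
    // emplace() returns {iterator, inserted}; if `happened` is never read, a warnings-as-errors
    // build can fail, so the void-cast marks it as intentionally unused.
    auto [it, happened] = m.emplace(1, 2);
    (void)happened;
    return (it->second == 2) ? 0 : 1;
}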

hs()->data_service().start();
hs()->logstore_service().start(hs()->is_first_time_boot());

// Step 6: Iterate over all the repl devs and ask each one to join its raft group.
@JacksonYao287 (Contributor), Jan 29, 2024:

Do we have any mechanism to check for a partial raft_repl_dev? For example, when creating a raft_repl_dev, the leader succeeds in adding the first follower to the group but fails to add the second follower. We also need to consider this for recovery.

Right now, for this case, we just return an error to the upper layer and do not do any cleanup. I think it is the responsibility of the raft repl service to clean up all partial raft groups, even across restarts, so that we can make sure there is no leak of raft_repl_devs.

Contributor Author:

We will be adding a "remove_repl_dev" API as part of the next work. That will add "destroy" functionality for the repl_dev (which we can use for all such error handling).

return;
}

// Free the blks which are already allocated
@yamingk (Contributor), Jan 29, 2024:

I believe the follower_create_req function, which calls do_alloc_blk, also needs to check the status and call this handle_error, correct?

Contributor Author:

Good point. Today that is not a recoverable error and we crash. I am not sure we can handle the error. Perhaps we can discuss and handle it as part of the "Error recovery/validation" strategy.

Contributor:

Yes, sounds good.

@yamingk (Contributor) commented Jan 29, 2024:

Looks good to me except for one comment. The other comments are either minor or for informational purposes.

@@ -120,15 +144,60 @@ void RaftReplDev::push_data_to_all_followers(repl_req_ptr_t rreq) {

group_msg_service()
->data_service_request_unidirectional(nuraft_mesg::role_regex::ALL, PUSH_DATA, rreq->pkts)
Contributor:

What's the behaviour if the raft log append succeeded but the data push failed for one of the followers?

Contributor Author:

If the data push fails for only one follower, the leader doesn't get any error. The follower that has not received the data will do a fetch (after a timeout) and get the data. In the meantime, quorum would have been reached for that log entry and it would have committed (assuming there is one more healthy follower).

RD_LOG(INFO, "Data Channel: Data Write completed rreq=[{}]", rreq->to_compact_string());
if (err) {
RD_DBG_ASSERT(false, "Error in writing data");
handle_error(rreq, ReplServiceError::DRIVE_WRITE_ERROR);
Contributor:

If there are any errors, should we free the allocated blks?

Contributor Author:

The handle_error method does that.

@@ -36,15 +37,20 @@ std::shared_ptr< GenericReplService > GenericReplService::create(cshared< ReplAp

GenericReplService::GenericReplService(cshared< ReplApplication >& repl_app) :
m_repl_app{repl_app}, m_my_uuid{repl_app->get_my_repl_id()} {
m_sb_bufs.reserve(100);
Contributor:

Should this be a dynamic config, something like MAX_NUM_PG_IN_SYSTEM?

Contributor Author:

This is just a reserve, not a tunable. Do we need to clutter the dynamic config with this? (Also, homestore doesn't know anything about PGs.)


HomeStoreDynamicConfig::init_settings_default();

LOGINFO("Homestore is loading with following services: {}", m_services.list());
if (has_meta_service()) { m_meta_service = std::make_unique< MetaBlkService >(); }
if (has_log_service()) { m_log_service = std::make_unique< LogStoreService >(); }
if (has_data_service()) { m_data_service = std::make_unique< BlkDataService >(std::move(s_custom_chunk_selector)); }
if (has_index_service()) { m_index_service = std::make_unique< IndexService >(std::move(s_index_cbs)); }
if (has_repl_data_service()) {
m_repl_service = GenericReplService::create(std::move(s_repl_app));
Contributor:

Repl and data services are not mutually exclusive now with with_repl_data_service()?

Contributor Author:

I probably didn't quite catch your comment. Can you please elaborate?

@@ -127,16 +131,23 @@ class TestReplicatedDB : public homestore::ReplDevListener {
LOGINFO("[Replica={}] Received rollback on lsn={}", g_helper->replica_num(), lsn);
}

void on_error(ReplServiceError error, const sisl::blob& header, const sisl::blob& key,
cintrusive< repl_req_ctx >& ctx) override {
Contributor:

Should this assert(), since we are not expecting any error here?

Contributor Author:

Well, I thought about that, but I wanted to extend this for negative checks, where we propose on the leader and expect an error, etc., so I didn't put an assert here. Later we will add a counter and validate that.
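
A rough sketch of the counter-based validation the author mentions as future work (hypothetical, not part of this PR), extending the TestReplicatedDB listener shown in the diff above:

// Hypothetical member of TestReplicatedDB: count errors instead of asserting, so negative
// tests can verify that an expected error was actually delivered.
std::atomic< uint64_t > m_error_count{0};

void on_error(ReplServiceError error, const sisl::blob& header, const sisl::blob& key,
              cintrusive< repl_req_ctx >& ctx) override {
    LOGINFO("[Replica={}] Received error={} on repl_dev", g_helper->replica_num(),
            static_cast< int >(error));
    ++m_error_count;
}

// A negative test could then propose on a non-leader (or inject a write failure) and
// assert that m_error_count reaches the expected value.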

@JacksonYao287 (Contributor) left a comment:

LGTM, no more comments from my side. Thanks for the work, @hkadayam.

hkadayam merged commit 198657e into eBay:master on Jan 30, 2024 (20 checks passed).
hkadayam deleted the restart_repldev branch on January 30, 2024 17:41.