Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshot pool integration, first version #673

Merged
merged 4 commits into from
Jul 31, 2024

Conversation

letFunny
Copy link
Contributor

This PR integrates the pool with the snapshot installation. This is a first version without the real implementation for the callbacks and I/O. It solves a few issues found along the way, namely it includes:

  • pool integration for background work,
  • emulating the async nature of rpc sending by scheduling a callback using libuv,
  • first versions of rpc_fill and is_a_trigger, that means that the code can now generate primitive messages and expect the correct ones,
  • tests now assert that the messages sent are the correct ones,
  • added missing state which I noticed when writing new tests.

See the commit list for more information as the functionality is split into self-contained units.

@letFunny letFunny force-pushed the snapshot-pool-integration branch from a1ab273 to fd908d3 Compare July 17, 2024 09:10
Copy link

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 84.23237% with 38 lines in your changes missing coverage. Please review.

Project coverage is 77.87%. Comparing base (054b511) to head (defc3f3).
Report is 12 commits behind head on master.

Files Patch % Lines
src/raft/recv_install_snapshot.c 70.00% 8 Missing and 22 partials ⚠️
src/lib/threadpool.c 50.00% 4 Missing and 2 partials ⚠️
test/raft/unit/test_snapshot.c 98.44% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #673      +/-   ##
==========================================
+ Coverage   77.84%   77.87%   +0.02%     
==========================================
  Files         197      197              
  Lines       27761    27953     +192     
  Branches     5493     5529      +36     
==========================================
+ Hits        21611    21767     +156     
- Misses       4333     4346      +13     
- Partials     1817     1840      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@just-now just-now left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first series of comments to test_snapshot.c. review is in progress.

if (sm_state(&leader->rpc.sm) != RPC_FILLED) {
return false;
}
return state == LS_PAGE_READ
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's worth defining a macro/function IN() with the following semantics: IN(state, (LS_PAGE_READ, ..., LS_CHECK_F_HAS_SIGS))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the implementation locally to use the IN macro and the only problem is that the macro might be difficult to read and debug, thoughts?

#define IN_1(E, X) E == X
#define IN_2(E, X, ...) E == X || IN_1(E,__VA_ARGS__)
#define IN_3(E, X, ...) E == X || IN_2(E,__VA_ARGS__)
#define IN_4(E, X, ...) E == X || IN_3(E,__VA_ARGS__)
#define IN_5(E, X, ...) E == X || IN_4(E,__VA_ARGS__)
#define IN_6(E, X, ...) E == X || IN_5(E,__VA_ARGS__)
#define IN_7(E, X, ...) E == X || IN_6(E,__VA_ARGS__)
#define IN_8(E, X, ...) E == X || IN_7(E,__VA_ARGS__)
#define IN_9(E, X, ...) E == X || IN_8(E,__VA_ARGS__)

#define GET_IN_MACRO(_1,_2,_3,_4,_5,_6,_7,_8,_9,NAME,...) NAME
#define IN(E, ...) \
  GET_IN_MACRO(__VA_ARGS__,IN_9,IN_8,IN_7,IN_6,IN_5,IN_4,IN_3,IN_2,IN_1)(E,__VA_ARGS__)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this with @cole-miller

/* Decorates the callback used when the pool work is done to set the test
* fixture flag to true, then calls the original callback.*/
static void test_fixture_work_cb(pool_work_t *w) {
global_fixture.work_done = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a good time to invent an implementation of is_main_thread() macro or a function is much closer than we thought... When I looked an wait_work() it wasn't obvious that the spin-lock runs in the same thread with test_fixture_work_cb().

Please put a // PRE(is_main_thread()) to both functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also add some comments to the global_fixture and these functions.

test/raft/unit/test_snapshot.c Show resolved Hide resolved
test/raft/unit/test_snapshot.c Show resolved Hide resolved
test/raft/unit/test_snapshot.c Outdated Show resolved Hide resolved
data->cb(data->s, 0);
}

int uv_sender_send_op(struct sender *s,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, uv_sender_*() functions can be static.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always forget static and const, always.

.ht_create = pool_ht_create_op,
.work_queue = pool_work_queue_op,
.sender_send = uv_sender_send_op,
.is_main_thread = pool_is_main_thread_op,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better not to override is_main_thread().
You have an implementation of pool_is_main_thread_op()
Maybe, it's worth thinking about an implementation designed around uv_key_create(). We need to check if it's a good idea to mix __tls and uv_

Copy link
Contributor Author

@letFunny letFunny Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is __tls? EDIT: probably "thread-local-storage".

MUNIT_UNUSED void *user_data)
{
/* Prevent hangs. */
alarm(2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so if you do some changes and the tests breaks in a way that makes wait_work or wait_msg_sent stall, then the execution is terminated after 2s and the test fails but you can still see the results of the rest.

test/raft/unit/test_snapshot.c Outdated Show resolved Hide resolved
test/raft/unit/test_snapshot.c Show resolved Hide resolved
Copy link
Contributor

@just-now just-now left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this patch.

test/raft/unit/test_snapshot.c Show resolved Hide resolved
test/raft/unit/test_snapshot.c Show resolved Hide resolved
if (sm_state(&leader->rpc.sm) != RPC_FILLED) {
return false;
}
return state == LS_PAGE_READ
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this with @cole-miller

Comment on lines +531 to +537
PRE(incoming != M_MSG_SENT && incoming != M_TIMEOUT &&
incoming != M_WORK_DONE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need it here?
It's quite clear from the lines above that this assumption is obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason is that from this line on this is a valid pointer. If you were to do msg->type on M_MSG_SENT, or any of the others, the program would crash.

if (UNLIKELY(rc != 0)) {
sm_move(&rpc->sm, RPC_ERROR);
return;
}

follower_tick(follower, M_MSG_SENT);
}

static bool is_a_trigger_leader(const struct leader *leader, const struct raft_message *incoming)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it seem less noisy? What do you think about such a style?

static bool is_a_trigger_leader(const struct leader *leader, const struct raft_message *incoming)
{
        int state = sm_state(&leader->sm);

	return
		ergo(incoming == M_WORK_DONE,
		     IN(state, (LS_HT_WAIT, LS_PAGE_SENT,
				LS_PERSISTED_SIG_PART, LS_PAGE_READ))) &&
		ergo(incoming == M_MSG_SENT,
		     sm_state(&leader->rpc.sm) != RPC_FILLED &&
		     IN(state, (LS_PAGE_READ, LS_SNAP_DONE, LS_F_NEEDS_SNAP,
				LS_REQ_SIG_LOOP, LS_CHECK_F_HAS_SIGS))) &&
		ergo(incoming == M_TIMEOUT,
		     IN(state, (LS_PAGE_READ, LS_SNAP_DONE, LS_F_NEEDS_SNAP,
				LS_REQ_SIG_LOOP, LS_CHECK_F_HAS_SIGS,
				LS_WAIT_SIGS))) &&
		ergo(state == LS_F_ONLINE,
		     incoming->type == RAFT_IO_APPEND_ENTRIES_RESULT) &&
		ergo(sm_state(&leader->rpc.sm) != RPC_SENT, false) &&
		ergo(state == LS_PAGE_READ,
		     IN(incoming->type, (RAFT_IO_INSTALL_SNAPSHOT_CP_RESULT
					 RAFT_IO_INSTALL_SNAPSHOT_MV_RESULT))) &&
		ergo(IN(state, (LS_SNAP_DONE, LS_F_NEEDS_SNAP)),
		     incoming->type == RAFT_IO_INSTALL_SNAPSHOT_RESULT) &&
		ergo(IN(state, (LS_REQ_SIG_LOOP, LS_CHECK_F_HAS_SIGS)),
		     incoming->type == RAFT_IO_SIGNATURE_RESULT);
}

cc @cole-miller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going for this style I would prefer to have two distinct blocks because in the first three ergos the pointer is not valid M_WORK_DONE, M_MSG_SENT and M_TIMEOUT are equal to 1, 2 and 3, which means that doing incoming->type will crash the program.

My fear is that when we come back and do changes in the future we miss that bit about the ordering because it is not explicit.

Copy link
Contributor

@cole-miller cole-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments after reading through. This PR is really well-organized, nice work!

test/raft/unit/test_snapshot.c Show resolved Hide resolved
test/raft/unit/test_snapshot.c Outdated Show resolved Hide resolved
src/raft/recv_install_snapshot.h Show resolved Hide resolved
src/raft/recv_install_snapshot.h Show resolved Hide resolved
src/raft/recv_install_snapshot.h Show resolved Hide resolved
@cole-miller cole-miller self-requested a review July 25, 2024 17:52
src/lib/threadpool.c Outdated Show resolved Hide resolved
src/lib/threadpool.c Outdated Show resolved Hide resolved
src/lib/threadpool.c Outdated Show resolved Hide resolved
src/lib/threadpool.c Outdated Show resolved Hide resolved
@cole-miller cole-miller self-requested a review July 30, 2024 14:19
Copy link
Contributor

@cole-miller cole-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is coming along nicely!

@letFunny letFunny force-pushed the snapshot-pool-integration branch from 1fc2144 to 2ce00ea Compare July 31, 2024 15:14
@letFunny letFunny force-pushed the snapshot-pool-integration branch from 2ce00ea to defc3f3 Compare July 31, 2024 15:51
@cole-miller cole-miller merged commit 05ae53f into canonical:master Jul 31, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants