Add an API for querying the last raft log entry #683

cole-miller · 2024-08-14T03:48:09Z

This enables better automation of dqlite_node_recover_ext by exposing enough information to determine which of several nodes has the most up-to-date log. See canonical/go-dqlite#299. Exposing this from libdqlite is the first step, then we can add a wrapper API in go-dqlite.

The implementation in this PR is as simple as calling the load method of the uv raft_io object. This is wasteful---it loads all snapshots and entries into memory---but that seems acceptable to me, since it's no more expensive than restarting the node normally and the use-case is an exceptional failure condition (loss of quorum).

Signed-off-by: Cole Miller [email protected]

codecov · 2024-08-14T03:57:04Z

Codecov Report

Attention: Patch coverage is 87.93103% with 7 lines in your changes missing coverage. Please review.

Project coverage is 77.89%. Comparing base (2d0d71b) to head (057725e).
Report is 21 commits behind head on master.

Files	Patch %	Lines
src/raft/raft.c	64.70%	1 Missing and 5 partials ⚠️
src/server.c	83.33%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #683      +/-   ##
==========================================
+ Coverage   77.87%   77.89%   +0.02%     
==========================================
  Files         197      197              
  Lines       27953    28066     +113     
  Branches     5534     5552      +18     
==========================================
+ Hits        21768    21863      +95     
- Misses       4346     4350       +4     
- Partials     1839     1853      +14

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

cole-miller · 2024-08-14T04:20:46Z

Rather than fix the external libraft CI job, I am going to block this on a separate PR to make --enable-build-raft the default and only option.

Other test failures seem to be nondeterministic and due to these numbers varying from test to test, which I need to dig into:

https://github.com/cole-miller/dqlite/blob/6546d543d339691c1ee1f185cbcd02aa4a6be1ba/test/integration/test_cluster.c#L124-L134

letFunny

Thanks for this Cole, just a couple of questions.

letFunny · 2024-08-22T08:40:18Z

test/integration/test_cluster.c

+	 * but we seem to get two of them pretty consistently in this test.) */
+	size_t fudge = n_records >= 2200 ? 6 :
+			n_records >= 993 ? 3 :
+			2;


This seems a bit brittle especially the part about the barriers which are non-deterministic which could lead to CI failures down the line. What about changing the assertion so that if n_records < 993 then, last_entry_index <= n_records + 3 && last_entry_index >= n_records? (or something similar, the idea is to use an interval and not a single value)

I've modified this to use the calculated number just as a lower bound. Feels to me like writing something like last_entry_index <= n_records + CONSTANT is not much better than picking an exact value because we don't have an explanation for the specific CONSTANT; I would rather just have the assertion reflect our belief about the code i.e. "some number of barriers, we don't know how many".

letFunny · 2024-08-22T08:43:39Z

src/raft/raft.c

+	if (rv != 0) {
+		return rv;
+	}
+	POST(ERGO(n_entries == 0, snapshot != NULL));


This is going to crash when raft_io_describe_last_entry is called and there are no entries in the raft log + snapshots, correct? If that is the case, should this be a crash or an empty return?

Good point, thanks, this could potentially be used with the data directory of a node that never got the chance to write a log entry to disk (when bootstrapping we always write the entry with the initial configuration, but joiner nodes can start with nothing at all). In this case we should probably return (0, 0) since that will compare less than the tuple for any node that does have entries on disk.

Fixed in the latest commit.

letFunny

Forgot to publish one comment :P

just-now

Looks good.

src/raft/raft.c

src/server.c

cole-miller · 2024-08-23T05:21:17Z

Updated to reflect all review comments plus I have addded a commit to fix #689 to avoid the friction of a separate PR. If we decide that the behavior for one-node elections should stay the same I can update the assertions in this PR accordingly. (Backed out the election change, will submit a separate PR per discussion offline.)

test/integration/test_cluster.c

This enables better automation of dqlite_node_recover_ext by exposing enough information to determine which of several nodes has the most up-to-date log. Signed-off-by: Cole Miller <[email protected]>

Signed-off-by: Cole Miller <[email protected]>

We should gracefully handle the case where there are no snapshots and no entries on disk. Signed-off-by: Cole Miller <[email protected]>

letFunny

Thanks Cole, I reviewed the last three commits and everything is there. I only added one comment to set an upper bound for the assertions with the extra records. Also, I remember we talked about the term number not increasing, is that what is happening here? (only one node, term is not increased)

test/integration/test_cluster.c

cole-miller · 2024-08-26T14:53:44Z

Also, I remember we talked about the term number not increasing, is that what is happening here? (only one node, term is not increased)

Yep.

Signed-off-by: Cole Miller <[email protected]>

cole-miller changed the title ~~Most up to date~~ Add an API for querying the last raft log entry Aug 14, 2024

cole-miller requested a review from just-now August 14, 2024 03:59

cole-miller force-pushed the most-up-to-date branch 2 times, most recently from 47b7ea2 to 6546d54 Compare August 14, 2024 04:17

cole-miller force-pushed the most-up-to-date branch from d1f09c8 to 1b9e443 Compare August 21, 2024 17:11

letFunny reviewed Aug 22, 2024

View reviewed changes

just-now approved these changes Aug 22, 2024

View reviewed changes

src/raft/raft.c Show resolved Hide resolved

src/server.c Outdated Show resolved Hide resolved

cole-miller requested a review from letFunny August 23, 2024 05:21

cole-miller commented Aug 23, 2024

View reviewed changes

test/integration/test_cluster.c Show resolved Hide resolved

cole-miller added 6 commits August 23, 2024 09:37

Add raft and dqlite APIs for querying the last log entry

373ce3e

This enables better automation of dqlite_node_recover_ext by exposing enough information to determine which of several nodes has the most up-to-date log. Signed-off-by: Cole Miller <[email protected]>

Incorporate new dqlite API in an existing test

8b65b3e

Signed-off-by: Cole Miller <[email protected]>

Update test to make the number of barriers more deterministic

d7ad1aa

Signed-off-by: Cole Miller <[email protected]>

Address review comments

7875f3e

Signed-off-by: Cole Miller <[email protected]>

Update cluster test following review

144fd9f

Signed-off-by: Cole Miller <[email protected]>

Rectify, document, and test edge case of describe_last_entry

8f012e7

We should gracefully handle the case where there are no snapshots and no entries on disk. Signed-off-by: Cole Miller <[email protected]>

cole-miller force-pushed the most-up-to-date branch from 2115606 to 8f012e7 Compare August 23, 2024 13:45

letFunny approved these changes Aug 26, 2024

View reviewed changes

test/integration/test_cluster.c Outdated Show resolved Hide resolved

cole-miller added 4 commits August 26, 2024 11:09

Address review comment

0f6d5ad

Signed-off-by: Cole Miller <[email protected]>

Update new edge case test too

881bc4c

Signed-off-by: Cole Miller <[email protected]>

Add a shim for external raft build

642ad9a

Signed-off-by: Cole Miller <[email protected]>

Guard tests that rely on new API

057725e

Signed-off-by: Cole Miller <[email protected]>

cole-miller merged commit 5248296 into canonical:master Aug 26, 2024
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add an API for querying the last raft log entry #683

Add an API for querying the last raft log entry #683

cole-miller commented Aug 14, 2024 •

edited

Loading

codecov bot commented Aug 14, 2024 •

edited

Loading

cole-miller commented Aug 14, 2024

letFunny left a comment

letFunny Aug 22, 2024

cole-miller Aug 23, 2024

letFunny Aug 22, 2024

cole-miller Aug 23, 2024

cole-miller Aug 23, 2024

letFunny left a comment

just-now left a comment

cole-miller commented Aug 23, 2024 •

edited

Loading

letFunny left a comment

cole-miller commented Aug 26, 2024

Add an API for querying the last raft log entry #683

Add an API for querying the last raft log entry #683

Conversation

cole-miller commented Aug 14, 2024 • edited Loading

codecov bot commented Aug 14, 2024 • edited Loading

Codecov Report

cole-miller commented Aug 14, 2024

letFunny left a comment

Choose a reason for hiding this comment

letFunny Aug 22, 2024

Choose a reason for hiding this comment

cole-miller Aug 23, 2024

Choose a reason for hiding this comment

letFunny Aug 22, 2024

Choose a reason for hiding this comment

cole-miller Aug 23, 2024

Choose a reason for hiding this comment

cole-miller Aug 23, 2024

Choose a reason for hiding this comment

letFunny left a comment

Choose a reason for hiding this comment

just-now left a comment

Choose a reason for hiding this comment

cole-miller commented Aug 23, 2024 • edited Loading

letFunny left a comment

Choose a reason for hiding this comment

cole-miller commented Aug 26, 2024

cole-miller commented Aug 14, 2024 •

edited

Loading

codecov bot commented Aug 14, 2024 •

edited

Loading

cole-miller commented Aug 23, 2024 •

edited

Loading