Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reads block page reuse for one extra transaction #830

Closed
mconst opened this issue Jul 24, 2024 · 2 comments · Fixed by #831
Closed

Reads block page reuse for one extra transaction #830

mconst opened this issue Jul 24, 2024 · 2 comments · Fixed by #831

Comments

@mconst
Copy link
Contributor

mconst commented Jul 24, 2024

In the following situation, the pages that tx 1 adds to the freed tree get processed by tx 2, as expected:

tx 1:
    adds page A to the freed tree
    commits durably
tx 2:
    commits durably, freeing page A

And if there's a live read of tx 0, that correctly blocks the pages from being freed, because they're still reachable from tx 0:

read begins (reading tx 0)
tx 1:
    adds page A to the freed tree
    commits durably
tx 2:
    commits durably, can't free page A

But a live read of tx 1 also blocks freeing, which I believe is incorrect:

tx 1:
    adds page A to the freed tree
    commits durably
read begins (reading tx 1)
tx 2:
    commits durably, should free page A but doesn't!

Page A isn't reachable from tx 1 -- that's why tx 1 added it to the freed tree! The last transaction it was reachable from is tx 0. So I think it should get freed here, as long as there are no live reads of tx 0 or earlier.

In fact, redb already relies on the fact that it's safe to free page A in this situation. In the first example above (with no live reads), consider what happens if we crash partway through committing tx 2: we'll have to roll back to the last durable commit, which was tx 1. So tx 1 needs to remain valid, just as if there were a live read on it! Effectively, there's always a live read on the last durable commit, since we could crash at any time and need to roll back to it.

In other words, the first example and the last example are equivalent. So if it's safe to free page A in the former case (which redb already does), it should also be safe to free it in the latter case.

By itself, fixing this won't save a significant amount of disk space, because of #829. But if we can fix #829 as well, then this'll cut the disk usage of a sequence of large transactions in half, in the common case where there are concurrent reads.

I believe the fix is trivial, and I'm happy to write it up if you want -- let me know if that would be useful!

@cberner
Copy link
Owner

cberner commented Jul 26, 2024

Nice find, and thanks for the report!

@mconst
Copy link
Contributor Author

mconst commented Jul 26, 2024

Awesome, thanks for the quick fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants