This repository was archived by the owner on Oct 18, 2023. It is now read-only.

Commit 8bececd

sqld: deduplicate data stored in wallog
**!!! early draft, full of debug prints, barely works !!!**

This draft contains experiments around deduplicating our own `wallog` format with libSQL WAL. The potential win here is reducing write and space amplification from 2x to around 1.08x.

The main idea is as follows: `wallog` is only used to store frame metadata, and frame data is stored only in the main database file or in the WAL. That's very simple to implement in a single-node system, but it gets complicated with replicas, because a replica is allowed to ask the primary for any arbitrary wallog frame.

The rough idea for dealing with replicas is to:

1. Make sure that we control checkpoints. Autocheckpoint is off, and we only issue a checkpoint operation on the primary ourselves, explicitly and periodically.
2. All streaming of frames to replicas must finish before we issue a checkpoint operation.
3. We only checkpoint in TRUNCATE mode, i.e. a write lock is taken and the whole WAL log is rewritten to the main db file. That simplifies lots of edge (sic!) cases.
4. Once we checkpoint, we drop the previous `wallog`, and instead only store the following information. Let's assume that the main db file has N pages. Pages 1..N are now available as frames X..X+N in the `wallog`, and X is the oldest frame a replica should ever ask for; anything before X is out-of-date anyway. If any replica asks for an earlier frame, it gets an error message saying "please drop whatever you're doing and start asking for frames X or greater instead".
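The post-checkpoint bookkeeping described in the last step can be sketched roughly as follows. This is a minimal illustration, not code from the commit: `LogWindow`, `FrameRequest`, `classify`, and `page_for` are hypothetical names, and the sketch assumes the frame range is half-open, so that the N pages map to exactly N frames starting at X.

```rust
/// Hypothetical outcome of a replica asking the primary for a frame.
#[derive(Debug, PartialEq, Eq)]
pub enum FrameRequest {
    /// The frame is at or after X, so the primary may serve it.
    Serve(u64),
    /// The frame predates the last checkpoint: the replica must drop
    /// what it is doing and start again from `oldest` (X).
    Restart { oldest: u64 },
}

/// Hypothetical state the primary keeps after a TRUNCATE checkpoint.
pub struct LogWindow {
    /// X: the oldest frame number a replica should ever ask for.
    pub oldest: u64,
    /// N: number of pages in the main db file at checkpoint time.
    pub db_pages: u64,
}

impl LogWindow {
    pub fn after_checkpoint(oldest: u64, db_pages: u64) -> Self {
        Self { oldest, db_pages }
    }

    /// Decide how to answer a replica's request for `frame_no`.
    pub fn classify(&self, frame_no: u64) -> FrameRequest {
        if frame_no < self.oldest {
            FrameRequest::Restart { oldest: self.oldest }
        } else {
            FrameRequest::Serve(frame_no)
        }
    }

    /// For a frame synthesised by the checkpoint, return the 1-based page
    /// number in the main db file that backs it.
    /// Assumption: frame X + i corresponds to page i + 1, for i in 0..N.
    pub fn page_for(&self, frame_no: u64) -> Option<u64> {
        if frame_no >= self.oldest && frame_no < self.oldest + self.db_pages {
            Some(frame_no - self.oldest + 1)
        } else {
            None
        }
    }
}
```

With X = 100 and N = 4, a request for frame 99 would be answered with `Restart { oldest: 100 }`, while frame 100 would be served from page 1 of the main db file. Frames past the checkpointed range would have to come from the WAL instead, which this sketch does not model.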
1 parent c63b220 commit 8bececd

3 files changed: +159 -63 lines changed


sqld/src/replication/frame.rs

+60
@@ -34,6 +34,66 @@ pub struct Frame {
     data: Bytes,
 }
 
+#[repr(transparent)]
+#[derive(Clone, Copy, Debug, Zeroable, Pod)]
+// NOTICE: frame number 0 indicates that the frame is in the main db file.
+// Any other number indicates that it's in the WAL file.
+// We do not use an enum here in order to make this struct transparently
+// serializable for C code and on-disk representation.
+pub struct FrameLocation {
+    pub frame_no: u32,
+}
+
+impl FrameLocation {
+    pub const IN_MAIN_DB_FILE: u32 = 0;
+
+    pub fn new(frame_no: u32) -> Self {
+        Self { frame_no }
+    }
+
+    pub fn in_wal_file(frame_no: u32) -> Self {
+        assert_ne!(frame_no, FrameLocation::IN_MAIN_DB_FILE);
+        Self { frame_no }
+    }
+
+    pub fn in_main_db_file() -> Self {
+        Self {
+            frame_no: Self::IN_MAIN_DB_FILE,
+        }
+    }
+}
+
+#[repr(C)]
+#[derive(Clone, Copy, Debug, Zeroable, Pod)]
+pub struct FrameRef {
+    pub header: FrameHeader,
+    pub location: FrameLocation,
+    _pad: u32,
+}
+
+impl FrameRef {
+    pub const SIZE: usize = size_of::<Self>();
+
+    pub fn new(header: FrameHeader, location: FrameLocation) -> Self {
+        Self {
+            header,
+            location,
+            _pad: 0,
+        }
+    }
+
+    pub fn as_bytes(&self) -> Bytes {
+        Bytes::copy_from_slice(bytes_of(self))
+    }
+
+    pub fn try_from_bytes(data: Bytes) -> anyhow::Result<Self> {
+        anyhow::ensure!(data.len() == Self::SIZE, "invalid frame size");
+        try_from_bytes(&data)
+            .copied()
+            .map_err(|e| anyhow::anyhow!(e))
+    }
+}
+
 impl fmt::Debug for Frame {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
         f.debug_struct("Frame")
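The fixed-size layout used by `FrameRef` in the hunk above can be illustrated with a standard-library-only sketch. The `Zeroable`/`Pod` derives and the `bytes_of`/`try_from_bytes` helpers in the diff look like the `bytemuck` crate, but that is an inference; the sketch below encodes fields by hand instead. `MiniHeader`, `MiniFrameRef`, `is_in_main_db_file`, and the helper functions are hypothetical stand-ins, since the real `FrameHeader` fields are not shown in this hunk. The explicit `_pad` field presumably exists so the struct has no implicit padding bytes; that is also an assumption.

```rust
use std::mem::size_of;

/// Mirrors the diff: frame number 0 means the page is in the main db file.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FrameLocation {
    pub frame_no: u32,
}

impl FrameLocation {
    pub const IN_MAIN_DB_FILE: u32 = 0;

    /// Hypothetical helper, not part of the diff.
    pub fn is_in_main_db_file(&self) -> bool {
        self.frame_no == Self::IN_MAIN_DB_FILE
    }
}

/// Hypothetical stand-in for `FrameHeader` (real fields not shown in the hunk).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MiniHeader {
    pub frame_no: u64,
    pub page_no: u32,
    pub size_after: u32,
}

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MiniFrameRef {
    pub header: MiniHeader,
    pub location: FrameLocation,
    _pad: u32, // explicit padding keeps the size a multiple of 8
}

fn read_u32(data: &[u8], at: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&data[at..at + 4]);
    u32::from_le_bytes(buf)
}

fn read_u64(data: &[u8], at: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&data[at..at + 8]);
    u64::from_le_bytes(buf)
}

impl MiniFrameRef {
    pub const SIZE: usize = size_of::<Self>();

    pub fn new(header: MiniHeader, location: FrameLocation) -> Self {
        Self { header, location, _pad: 0 }
    }

    /// Encode every field as little-endian, in declaration order.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(Self::SIZE);
        out.extend_from_slice(&self.header.frame_no.to_le_bytes());
        out.extend_from_slice(&self.header.page_no.to_le_bytes());
        out.extend_from_slice(&self.header.size_after.to_le_bytes());
        out.extend_from_slice(&self.location.frame_no.to_le_bytes());
        out.extend_from_slice(&self._pad.to_le_bytes());
        out
    }

    /// Reject buffers of the wrong length, as the diff's `try_from_bytes` does.
    pub fn try_from_bytes(data: &[u8]) -> Result<Self, String> {
        if data.len() != Self::SIZE {
            return Err("invalid frame size".to_string());
        }
        let header = MiniHeader {
            frame_no: read_u64(data, 0),
            page_no: read_u32(data, 8),
            size_after: read_u32(data, 12),
        };
        let location = FrameLocation { frame_no: read_u32(data, 16) };
        Ok(Self { header, location, _pad: read_u32(data, 20) })
    }
}
```

Using the sentinel value 0 rather than an enum keeps the record a flat run of integers, which matches the comment in the diff about staying transparently serializable for C code and the on-disk representation.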
