-
Notifications
You must be signed in to change notification settings - Fork 0
Variable Block Size
Currently the protocol block size is locked at 128 KiB. This is a good size for small to medium sized files, but for very large files the list of blocks becomes unwieldy by itself. The block list is something that load into memory fairly often for comparison between old and new and so on, so keeping the block list per file to a reasonable size is desirable. To achieve that we could use a variable block size.
-
The block size would be decided when a file is detected locally / scanned.
-
There should be a defined minimum block size (128 KiB, unchanged) and maximum block size (8 MiB or something?). Block sizes should probably be kept to powers of two, i.e. 128 KiB, 256 KiB, 512 KiB and so on.
-
The block size should be chosen so that the number of blocks is close to some ideal number per file, with some defined mechanism for rounding up and down so that two devices will agree on a block size when seeing the same file, in isolation.
-
The block size may change for an existing file. When a device receives new index data for a file with a different block size it will need to rehash the file with the new block size, then compute the differences and perform the sync as usual. The block size in use should be that for the newest version of a file. Conflict resolution and so on applies as usual, and the block size in use will be the one for the "winning" file.
-
The block size need not necessarily apply to the blocks transferred in Request/Response messages. In fact, we should probably stick to a smaller size there in order to avoid sending huge messages and the resulting issues with timeouts etc.
The FileInfo
gains a new BlockSize
field.
The following implements the suggested algorithm, selecting a block size optimizing for up to 2000 blocks per file.
func sizing(fileSize int) (blockSize, numBlocks int) {
const (
MinBlockSizeExp = 17 // 2^17 = 128 Kib minimum block size
MaxBlockSizeExp = 24 // 2^24 = 16 MiB maximum block size
DesiredBlocks = 2000 // Will aim for 0-2000 blocks per file
)
blockSizeExp := log2(fileSize / DesiredBlocks)
// Clip to min/max
if blockSizeExp < MinBlockSizeExp {
blockSizeExp = MinBlockSizeExp
} else if blockSizeExp > MaxBlockSizeExp {
blockSizeExp = MaxBlockSizeExp
}
// blockSize = 2^blockSizeExp
blockSize = 1 << blockSizeExp
// numBlocks = fileSize/blockSize rounded upwards
numBlocks = fileSize / blockSize
if fileSize%blockSize != 0 {
numBlocks++
}
return
}
func log2(n int) uint {
if n <= 0 {
return 0
}
for i := uint(1); i < 64; i++ {
if n>>i == 0 {
return i
}
}
return 64
}
Resulting example sizings:
File Size | Blocks | Block Size |
---|---|---|
1 KiB | 1 | 131072 |
1 MiB | 8 | 131072 |
499 MiB | 1996 | 262144 |
500 MiB | 1000 | 524288 |
501 MiB | 1002 | 524288 |
1 GiB | 1024 | 1048576 |
2 GiB | 1024 | 2097152 |
3 GiB | 1536 | 2097152 |
500 GiB | 32000 | 16777216 |