Skip to content

Variable Block Size

Jakob Borg edited this page Jul 20, 2015 · 5 revisions

Currently the protocol block size is locked at 128 KiB. This is a good size for small to medium sized files, but for very large files the list of blocks becomes unwieldy by itself. The block list is something that load into memory fairly often for comparison between old and new and so on, so keeping the block list per file to a reasonable size is desirable. To achieve that we could use a variable block size.

Constraints and Considerations

  • The block size would be decided when a file is detected locally / scanned.

  • There should be a defined minimum block size (128 KiB, unchanged) and maximum block size (8 MiB or something?). Block sizes should probably be kept to powers of two, i.e. 128 KiB, 256 KiB, 512 KiB and so on.

  • The block size should be chosen so that the number of blocks is close to some ideal number per file, with some defined mechanism for rounding up and down so that two devices will agree on a block size when seeing the same file, in isolation.

  • The block size may change for an existing file. When a device receives new index data for a file with a different block size it will need to rehash the file with the new block size, then compute the differences and perform the sync as usual. The block size in use should be that for the newest version of a file. Conflict resolution and so on applies as usual, and the block size in use will be the one for the "winning" file.

  • The block size need not necessarily apply to the blocks transferred in Request/Response messages. In fact, we should probably stick to a smaller size there in order to avoid sending huge messages and the resulting issues with timeouts etc.

Protocol Changes

The FileInfo gains a new BlockSize field.

Block Size Calculation

The following implements the suggested algorithm, selecting a block size optimizing for up to 2000 blocks per file.

func sizing(fileSize int) (blockSize, numBlocks int) {
	const (
		MinBlockSizeExp = 17   // 2^17 = 128 Kib minimum block size
		MaxBlockSizeExp = 24   // 2^24 = 16 MiB maximum block size
		DesiredBlocks   = 2000 // Will aim for 0-2000 blocks per file
	)

	blockSizeExp := log2(fileSize / DesiredBlocks)

	// Clip to min/max
	if blockSizeExp < MinBlockSizeExp {
		blockSizeExp = MinBlockSizeExp
	} else if blockSizeExp > MaxBlockSizeExp {
		blockSizeExp = MaxBlockSizeExp
	}

	// blockSize = 2^blockSizeExp
	blockSize = 1 << blockSizeExp

	// numBlocks = fileSize/blockSize rounded upwards
	numBlocks = fileSize / blockSize
	if fileSize%blockSize != 0 {
		numBlocks++
	}
	return
}

func log2(n int) uint {
	if n <= 0 {
		return 0
	}
	for i := uint(1); i < 64; i++ {
		if n>>i == 0 {
			return i
		}
	}
	return 64
}

Resulting example sizings:

File Size Blocks Block Size
1 KiB 1 131072
1 MiB 8 131072
499 MiB 1996 262144
500 MiB 1000 524288
501 MiB 1002 524288
1 GiB 1024 1048576
2 GiB 1024 2097152
3 GiB 1536 2097152
500 GiB 32000 16777216