The WAL / disk easily becomes the bottleneck for a Ra system, especially in limited cloud-based environments where disks have limits on ops per second and MB per second. Hence it may be worthwhile to try to reduce the size of the data going to disk in the WAL and disk segments.
#186 allows the serialisation function to be pluggable. This was mostly done to try out `term_to_iovec/1` instead of `term_to_binary/1`. Experiments showed minimal or no benefit, mostly due to hard-coded settings inside Erlang/OTP that cap the number of buffers a vectored write can use (64), resulting in excessive syscalls.
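For reference, the difference between the two calls can be seen with a quick shell experiment like the one below (the term is a hypothetical stand-in for a log entry payload; `erlang:term_to_iovec/1` is available from OTP 23):

```erlang
%% Shell-level comparison of the two serialisation calls.
%% The term is a made-up stand-in for a Ra log entry payload.
Term = {append, #{key => <<"k1">>, value => <<0:1024/unit:8>>}},
Bin  = term_to_binary(Term),
IoV  = erlang:term_to_iovec(Term),   %% off-heap binaries are referenced rather than copied
io:format("flat binary: ~b bytes~niovec: ~b segments, ~b bytes total~n",
          [byte_size(Bin), length(IoV), iolist_size(IoV)]).
```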
So what else can we do? We can try to reduce the "fixed" (ish) overhead of each serialised term. Especially for RabbitMQ there will be multiple occurrences of the same atoms, e.g. `undefined`, `basic_message`, each of which gets serialised as string data in the binary representation. If we could provide a serialisation function that uses an atom cache to replace any atoms with an integer index (like the distribution layer does), then we may be able to reduce the fixed disk overhead. How much depends on the workload, but it may be enough to have a significant benefit.
The first task would be to write and validate `term_to_binary/1` and `binary_to_term/1` in pure Erlang.
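A pure-Erlang encoder for a small subset of the external term format might look like the sketch below (module name, exports and the supported type subset are illustrative; the tag values come from the erl_ext_dist reference linked at the end). Validating by round-tripping through the built-in `binary_to_term/1` avoids having to match `term_to_binary/1` byte for byte (which e.g. compacts plain byte lists into STRING_EXT):

```erlang
%% Minimal sketch: encode integers, atoms, binaries, tuples and lists
%% using the external term format tags from erl_ext_dist.html.
-module(ra_ext_term).
-export([t2b/1, validate/1]).

t2b(Term) ->
    iolist_to_binary([131, enc(Term)]).           %% 131 = version magic

enc(I) when is_integer(I), I >= 0, I < 256 ->
    <<97, I:8>>;                                  %% SMALL_INTEGER_EXT
enc(I) when is_integer(I), I >= -(1 bsl 31), I < (1 bsl 31) ->
    <<98, I:32/signed-big>>;                      %% INTEGER_EXT
enc(A) when is_atom(A) ->
    Utf8 = atom_to_binary(A, utf8),
    true = byte_size(Utf8) < 256,                 %% only short atom names in this sketch
    <<119, (byte_size(Utf8)):8, Utf8/binary>>;    %% SMALL_ATOM_UTF8_EXT
enc(B) when is_binary(B) ->
    <<109, (byte_size(B)):32/big, B/binary>>;     %% BINARY_EXT
enc(T) when is_tuple(T), tuple_size(T) < 256 ->
    [<<104, (tuple_size(T)):8>> |                 %% SMALL_TUPLE_EXT
     [enc(E) || E <- tuple_to_list(T)]];
enc([]) ->
    <<106>>;                                      %% NIL_EXT
enc(L) when is_list(L) ->
    [<<108, (length(L)):32/big>>,                 %% LIST_EXT
     [enc(E) || E <- L],
     <<106>>].                                    %% NIL_EXT tail

%% Round-trip validation against the built-in decoder.
validate(Term) ->
    Term =:= binary_to_term(t2b(Term)).
```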
When that is done and we have some idea of the performance hit, they can then be extended with an atom cache.
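A minimal sketch of what that atom-cache extension could look like, building on the hypothetical encoder above (the private tag value, the cache representation and the function shape are all assumptions, not an existing format):

```erlang
%% Illustrative only: atoms already seen are written as a 2-byte cache
%% index under a private tag instead of their full UTF-8 name. The tag
%% value (240) is outside the standard external-format tag space; the
%% encoder would have to thread the cache through enc/2, and the WAL
%% would need to persist or rebuild the cache on recovery so that the
%% decoder can map indexes back to atoms.
-define(CACHED_ATOM, 240).

enc_atom(A, Cache) when is_atom(A), is_map(Cache) ->
    case maps:find(A, Cache) of
        {ok, Idx} ->
            {<<?CACHED_ATOM, Idx:16/big>>, Cache};
        error ->
            Utf8 = atom_to_binary(A, utf8),
            Idx = maps:size(Cache),
            {<<119, (byte_size(Utf8)):8, Utf8/binary>>,  %% full atom on first sight
             Cache#{A => Idx}}
    end.
```

In this sketch each repeated atom costs 3 bytes instead of 2 bytes plus its name length, so the actual saving depends on how atom-heavy the entries are.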
See: https://erlang.org/doc/apps/erts/erl_ext_dist.html for reference.