The WAL / disk easily becomes the bottleneck for a Ra system, especially in limited cloud-based environments where disks have limits on ops per second and MB per second. Hence it may be worthwhile to try to reduce the size of the data going to disk in the WAL and disk segments.
#186 allows the serialisation function to be pluggable. This was mostly done to try out `term_to_iovec/1` instead of `term_to_binary/1`. Experiments showed minimal or no benefit, mostly due to hard-coded settings inside Erlang/OTP that cap the number of buffers a vectored write can use (64), resulting in excessive syscalls.
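For reference, the difference between the two calls can be seen with a quick shell experiment like the one below (the term is a hypothetical stand-in for a log entry payload; `erlang:term_to_iovec/1` is available from OTP 23):

```erlang
%% Shell-level comparison of the two serialisation calls.
%% The term is a made-up stand-in for a Ra log entry payload.
Term = {append, #{key => <<"k1">>, value => <<0:1024/unit:8>>}},
Bin  = term_to_binary(Term),
IoV  = erlang:term_to_iovec(Term),   %% off-heap binaries are referenced rather than copied
io:format("flat binary: ~b bytes~niovec: ~b segments, ~b bytes total~n",
          [byte_size(Bin), length(IoV), iolist_size(IoV)]).
```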
So what else can we do? We can try to reduce the "fixed" (ish) overhead of each serialised term. Especially for RabbitMQ there will be multiple occurrences of the same atoms, e.g. `undefined`, `basic_message`, each of which gets serialised as string data in the binary representation. If we could provide a serialisation function that uses an atom cache to replace any atoms with an integer index (like the distribution layer does), then we may be able to reduce the fixed disk overhead. How much depends on the workload, but it may be enough to have a significant benefit.
The first task would be to write and validate `term_to_binary/1` and `binary_to_term/1` in pure Erlang.
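A pure-Erlang encoder for a small subset of the external term format might look like the sketch below (module name, exports and the supported type subset are illustrative; the tag values come from the erl_ext_dist reference linked at the end). Validating by round-tripping through the built-in `binary_to_term/1` avoids having to match `term_to_binary/1` byte for byte (which e.g. compacts plain byte lists into STRING_EXT):

```erlang
%% Minimal sketch: encode integers, atoms, binaries, tuples and lists
%% using the external term format tags from erl_ext_dist.html.
-module(ra_ext_term).
-export([t2b/1, validate/1]).

t2b(Term) ->
    iolist_to_binary([131, enc(Term)]).           %% 131 = version magic

enc(I) when is_integer(I), I >= 0, I < 256 ->
    <<97, I:8>>;                                  %% SMALL_INTEGER_EXT
enc(I) when is_integer(I), I >= -(1 bsl 31), I < (1 bsl 31) ->
    <<98, I:32/signed-big>>;                      %% INTEGER_EXT
enc(A) when is_atom(A) ->
    Utf8 = atom_to_binary(A, utf8),
    true = byte_size(Utf8) < 256,                 %% only short atom names in this sketch
    <<119, (byte_size(Utf8)):8, Utf8/binary>>;    %% SMALL_ATOM_UTF8_EXT
enc(B) when is_binary(B) ->
    <<109, (byte_size(B)):32/big, B/binary>>;     %% BINARY_EXT
enc(T) when is_tuple(T), tuple_size(T) < 256 ->
    [<<104, (tuple_size(T)):8>> |                 %% SMALL_TUPLE_EXT
     [enc(E) || E <- tuple_to_list(T)]];
enc([]) ->
    <<106>>;                                      %% NIL_EXT
enc(L) when is_list(L) ->
    [<<108, (length(L)):32/big>>,                 %% LIST_EXT
     [enc(E) || E <- L],
     <<106>>].                                    %% NIL_EXT tail

%% Round-trip validation against the built-in decoder.
validate(Term) ->
    Term =:= binary_to_term(t2b(Term)).
```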
When that is done and we have some idea of the performance hit, they can then be extended with an atom cache.
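A minimal sketch of what that atom-cache extension could look like, building on the hypothetical encoder above (the private tag value, the cache representation and the function shape are all assumptions, not an existing format):

```erlang
%% Illustrative only: atoms already seen are written as a 2-byte cache
%% index under a private tag instead of their full UTF-8 name. The tag
%% value (240) is outside the standard external-format tag space; the
%% encoder would have to thread the cache through enc/2, and the WAL
%% would need to persist or rebuild the cache on recovery so that the
%% decoder can map indexes back to atoms.
-define(CACHED_ATOM, 240).

enc_atom(A, Cache) when is_atom(A), is_map(Cache) ->
    case maps:find(A, Cache) of
        {ok, Idx} ->
            {<<?CACHED_ATOM, Idx:16/big>>, Cache};
        error ->
            Utf8 = atom_to_binary(A, utf8),
            Idx = maps:size(Cache),
            {<<119, (byte_size(Utf8)):8, Utf8/binary>>,  %% full atom on first sight
             Cache#{A => Idx}}
    end.
```

In this sketch each repeated atom costs 3 bytes instead of 2 bytes plus its name length, so the actual saving depends on how atom-heavy the entries are.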
See: https://erlang.org/doc/apps/erts/erl_ext_dist.html for reference.