From 0a0a9e2b05e70db7c3a678b6af1198a0a12cc987 Mon Sep 17 00:00:00 2001 From: Christopher Haster Date: Wed, 30 Oct 2024 14:49:45 -0500 Subject: [PATCH] README.md - Expanded the 255-byte limit section Perhaps too much actually... The alternative field representations for higher-order finite-fields may be too much info when concatenating/ interleaving multiple codewords into one is realistically a better solution... Oh well, it's already written... --- README.md | 41 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index cbf018e..53928f0 100644 --- a/README.md +++ b/README.md @@ -1436,11 +1436,50 @@ And some caveats: 2. Limited to 255 byte codewords - the non-zero elements of GF(256). + An important step in Reed-Solomon is mapping each possible error + location to a non-zero element of our finite field $X_j=g^j$. + Unfortunately our finite-field is, well, finite, so there's only so + many non-zero elements we can use before error-locations start to + alias. + + This gives us a maximum codeword size of 255 bytes in GF(256), + including the bytes used for ECC. A bit annoying, but math is math. + + In theory you can increase the maximum codeword size by using a larger + finite-field, but this gets a bit tricky because the log/pow table + approach used in ramrsbd stops being practical. 512 bytes of tables + for GF(256) is fine, but 128 KiBs of tables for GF(2^16)? Not so + much... + + 1. If you have [carryless-multiplication][clmul] hardware available, + GF(2^n) multiplication can be implemented efficiently by combining + multiplication and [Barret reduction][barret-reduction]. + + Division can then be implemented on top of multiplication by + leveraging the fact that $a^{n-3} = a^{-1}$ for any element $a$ in + GF(n). [Binary exponentiation][binary-exponentiation] can make this + somewhat efficient. + + 2. In the same way GF(256) is defined as an + [extension field][extension-field] of GF(2), we can define GF(2^16) + as an extension field of GF(256), where each element is a 2 byte + polynomial containing digits in GF(256). + + This can be convenient if you already need GF(256) tables for other + parts of the codebase. + + Or, a simpler alternative, you can just pack multiple "physical" + codewords into one "logical" codeword. + + You can even consider interleaving the physical codewords if you want + to maintain the systematic encoding or are trying to protect against + specific error patterns. + 3. Support for known-location "erasures" left as an exercise for the reader. All of the above math assumes we don't know the location of errors, - which is usually the case for block devices. + which is the most common case for block devices. But it turns out if we _do_ know the location of errors, via parity bits or some other side-channel, we can do quite a bit better. We