Reed-Solomon Erasure Code engine in pure Go.
More info in my blogs (in Chinese).
This is not the fastest version; if you want the fastest one (more than 5 GB/s per physical core), please send me an email.
- Coding over GF(2^8).
- Primitive polynomial: x^8 + x^4 + x^3 + x^2 + 1 (0x1d)
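To make the field arithmetic concrete, here is a minimal, unoptimized GF(2^8) multiply over this primitive polynomial (a sketch only; the library itself uses lookup tables and SIMD):

```go
package main

import "fmt"

// gfMul multiplies two elements of GF(2^8) using the primitive
// polynomial x^8 + x^4 + x^3 + x^2 + 1 (low byte 0x1d), via the
// shift-and-reduce ("Russian peasant") method.
func gfMul(a, b byte) byte {
	var p byte
	for b > 0 {
		if b&1 != 0 {
			p ^= a // add (XOR) the current multiple of a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d // reduce modulo the primitive polynomial
		}
		b >>= 1
	}
	return p
}

func main() {
	fmt.Println(gfMul(2, 128)) // x * x^7 = x^8 ≡ x^4+x^3+x^2+1 = 29
}
```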
It's based on Klauspost's ReedSolomon, with some optimizations/changes:
- Use a Cauchy matrix as the generator matrix. A Vandermonde matrix needs more operations to preserve the property that any square subset of rows is invertible
- There is a math tool (mathtools/gentables.go) for generating the primitive polynomial and its log table, exp table, multiply table, inverse table, etc. It's also a good way to learn how the Galois field works
- Use a single core to encode
- New Go versions have added some new instructions, some of which we need here. The byte sequences in the asm files have been replaced with instructions now (unfortunately, I still had to add some new byte codes)
- Dropped the inverse-matrix cache. Statistically, only 2-3% of shards ever need to be repaired, and calculating the inverse matrix is fast, so I don't think the cache improves performance much
- Instead of copying data, I use maps to record the positions of shards. Reconstruction is as fast as encoding now
- AVX instructions are never mixed with SSE instructions, so we don't need "VZEROUPPER" to avoid AVX-SSE transition penalties; this seems to improve performance
- Some of Golang's asm opcodes make me uncomfortable, especially "MOVQ", so I sometimes use byte codes to operate on the lower part of a register (thanks to asm2plan9s)
- I dropped the "TEST whether in_data is empty" part in the asm files
- No R8-R15 registers in the asm code, because they need one more byte to encode
- Only imports the Go standard library
- ...
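Two of the items above, the generated tables and the Cauchy generator matrix, can be sketched in plain Go. This is illustrative only: the function names are mine, the real tables come from mathtools/gentables.go, and the hot path uses SIMD rather than these loops:

```go
package main

import "fmt"

const poly = 0x11d // x^8 + x^4 + x^3 + x^2 + 1; low byte is 0x1d

var exp [512]byte // exp[i] = 2^i; doubled so table lookups can skip a mod-255
var log [256]byte // log[x] = i such that 2^i = x (2 is primitive for 0x11d)

func init() {
	x := 1
	for i := 0; i < 255; i++ {
		exp[i] = byte(x)
		log[x] = byte(i)
		x <<= 1
		if x&0x100 != 0 {
			x ^= poly // reduce modulo the primitive polynomial
		}
	}
	for i := 255; i < 512; i++ {
		exp[i] = exp[i-255]
	}
}

// gfMul multiplies via the tables: a*b = 2^(log a + log b).
func gfMul(a, b byte) byte {
	if a == 0 || b == 0 {
		return 0
	}
	return exp[int(log[a])+int(log[b])]
}

// gfInv returns 1/a for nonzero a, since a * 2^(255 - log a) = 2^255 = 1.
func gfInv(a byte) byte {
	return exp[255-int(log[a])]
}

// cauchyMatrix builds the parity rows of a Cauchy generator matrix:
// element (i, j) = 1/(x_i + y_j) with x_i = data+i and y_j = j.
// All x_i and y_j are distinct (for data+parity <= 256), so every
// square submatrix is invertible, which is what reconstruction needs.
func cauchyMatrix(data, parity int) [][]byte {
	m := make([][]byte, parity)
	for i := range m {
		m[i] = make([]byte, data)
		for j := 0; j < data; j++ {
			m[i][j] = gfInv(byte(data+i) ^ byte(j)) // + is XOR in GF(2^8)
		}
	}
	return m
}

func main() {
	fmt.Println(gfMul(2, 128)) // 29: x * x^7 = x^8 ≡ x^4+x^3+x^2+1
	fmt.Println(len(cauchyMatrix(10, 4))) // 4 parity rows of 10 coefficients
}
```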
To get the package use the standard:
go get github.com/templexxx/reedsolomon
This section assumes you know the basics of Reed-Solomon encoding. A good start is this Backblaze blog post.
There are only three public functions in the package: NewMatrix, Encode and Reconst.
NewMatrix: returns a [][]byte used for encoding and reconstruction;
Encode: calculates the parity of the data shards;
Reconst: recalculates lost data or parity shards from the shards still present.
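The exact signatures aren't shown here, so below is a self-contained toy (2 data shards + 1 parity shard) illustrating the arithmetic that encoding and reconstruction perform; encode, reconstData0 and the coefficients are my own illustration, not the package's API:

```go
package main

import "fmt"

// gfMul multiplies in GF(2^8) with the primitive polynomial 0x1d.
func gfMul(a, b byte) byte {
	var p byte
	for b > 0 {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d
		}
		b >>= 1
	}
	return p
}

// gfInv finds 1/a by brute force (a table lookup in a real implementation).
func gfInv(a byte) byte {
	for b := 1; b < 256; b++ {
		if gfMul(a, byte(b)) == 1 {
			return byte(b)
		}
	}
	return 0
}

// encode computes one parity shard from one generator-matrix row:
// parity[k] = gen[0]*data[0][k] + gen[1]*data[1][k] (+ is XOR).
func encode(gen []byte, data [][]byte) []byte {
	parity := make([]byte, len(data[0]))
	for k := range parity {
		parity[k] = gfMul(gen[0], data[0][k]) ^ gfMul(gen[1], data[1][k])
	}
	return parity
}

// reconstData0 recovers a lost data[0] by solving the encode equation:
// data[0][k] = (parity[k] - gen[1]*data[1][k]) / gen[0].
func reconstData0(gen []byte, data1, parity []byte) []byte {
	inv := gfInv(gen[0])
	out := make([]byte, len(parity))
	for k := range out {
		out[k] = gfMul(inv, parity[k]^gfMul(gen[1], data1[k]))
	}
	return out
}

func main() {
	gen := []byte{3, 5} // hypothetical coefficients; real rows come from the generator matrix
	data := [][]byte{{1, 2, 3, 4}, {5, 6, 7, 8}}
	parity := encode(gen, data)
	fmt.Println(reconstData0(gen, data[1], parity)) // recovers data[0]: [1 2 3 4]
}
```

With more parity shards, reconstruction generalizes to inverting the square submatrix of surviving rows, which is why the Cauchy property matters.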
Performance depends mainly on:
- number of parity shards
- number of cores of CPU (if you want to use parallel version)
- CPU instruction extensions (AVX2 or SSSE3)
- unit size of calculation
- size of shards
- speed of memory (much of the time goes to reading/writing memory :D)
Example of performance on my mid-2014 MacBook (i5-4278U 2.6GHz, 2 physical cores), 128KB per shard, single core (fast version):
Encode/Reconst | data+parity (lost data shards) | Speed (MB/s) |
---|---|---|
E | 10+4 | 5558.60 |
R | 10+4 (1) | 19050.82 |
R | 10+4 (2) | 9725.64 |
R | 10+4 (3) | 6974.09 |
R | 10+4 (4) | 5528.68 |