Explain why this "Reed-Solomon" differs so much from storage ones #17

Open: wants to merge 1 commit into master
README.md: 10 changes (10 additions, 0 deletions)
@@ -77,6 +77,16 @@ I invite you to clone [reed-solomon-simd] and run your own benchmark:
$ cargo bench main
```

## Applications

This crate implements the Reed-Solomon codes used in distributed systems and cryptography, not the Reed-Solomon codes with error location and correction that suit local storage.

Reed-Solomon codes have classically provided two functions: error location and error correction. Implementations involve matrix arithmetic or other techniques with complexity worse than `O(length * shards)`. As a result, classical storage applications use few shards, which lets them work in small fields like `GF(2^8)` (a Reed-Solomon code over `GF(2^8)` has at most about 256 shards).
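
To make the classical cost concrete, here is a minimal sketch of matrix-style systematic encoding over `GF(2^8)`. It assumes the reduction polynomial `0x11D` and places original shard `i` at field point `i` (both illustrative choices, not necessarily what any particular storage library uses). The hypothetical `encode` builds a `recovery_count` by `k` matrix of Lagrange coefficients once, and then every byte of every recovery shard is a `k`-term dot product over all original shards. The field's 256 elements also cap the total shard count.

```rust
/// Multiplies two elements of GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
/// The choice of polynomial is an assumption for this sketch.
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    while b != 0 {
        if (b & 1) != 0 {
            product ^= a; // addition in GF(2^8) is XOR
        }
        let carry = (a & 0x80) != 0;
        a <<= 1;
        if carry {
            a ^= 0x1D; // replace the overflowed x^8 with x^4 + x^3 + x^2 + 1
        }
        b >>= 1;
    }
    product
}

/// Inverse of a nonzero element: a^254, since the multiplicative group has order 255.
fn gf_inv(a: u8) -> u8 {
    assert!(a != 0, "zero has no inverse");
    let (mut result, mut base, mut exp) = (1u8, a, 254u32);
    while exp > 0 {
        if (exp & 1) == 1 {
            result = gf_mul(result, base);
        }
        base = gf_mul(base, base);
        exp >>= 1;
    }
    result
}

/// Lagrange basis polynomial L_i evaluated at x, for the points 0, 1, ..., k-1.
/// Subtraction in GF(2^8) is also XOR.
fn lagrange_coeff(i: usize, k: usize, x: u8) -> u8 {
    let xi = i as u8;
    let (mut num, mut den) = (1u8, 1u8);
    for j in (0..k).filter(|&j| j != i) {
        let xj = j as u8;
        num = gf_mul(num, x ^ xj);
        den = gf_mul(den, xi ^ xj);
    }
    gf_mul(num, gf_inv(den))
}

/// Hypothetical systematic encoder: original shard i holds f(i) byte by byte,
/// and recovery shard r holds f(k + r), where f has degree below k.
fn encode(original: &[Vec<u8>], recovery_count: usize) -> Vec<Vec<u8>> {
    let k = original.len();
    assert!(k > 0 && k + recovery_count <= 256, "GF(2^8) has only 256 points");
    let len = original[0].len();
    // The recovery_count x k encoding matrix, computed once.
    let matrix: Vec<Vec<u8>> = (0..recovery_count)
        .map(|r| (0..k).map(|i| lagrange_coeff(i, k, (k + r) as u8)).collect())
        .collect();
    // Each output byte touches all k original shards: O(len * k * recovery_count) total.
    matrix
        .iter()
        .map(|row| {
            (0..len)
                .map(|b| {
                    row.iter()
                        .zip(original)
                        .fold(0u8, |acc, (&c, shard)| acc ^ gf_mul(c, shard[b]))
                })
                .collect()
        })
        .collect()
}
```

For example, four one-byte original shards holding `0, 1, 2, 3` lie on the line `f(x) = x`, so three recovery shards come out as `4, 5, 6`: interpolation reproduces any polynomial of degree below `k`.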

In cryptography and distributed systems, we often employ Lagrange polynomials, aka Reed-Solomon codes, for data distribution, but such uses need many more shards, which requires larger fields like prime fields or `GF(2^16)`. In these cases, encoding and decoding can be done with FFTs (over suitable prime fields) or additive FFTs with special field representations (over binary fields like `GF(2^16)`), instead of matrix-like arithmetic. All this yields much faster codes, with complexities like `O(length * log shards)`, but it sacrifices error location and error correction: such codes can rebuild shards that are missing at known positions (erasures), but cannot find and fix corrupted ones.
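
The FFT route can be sketched over the prime field `P = 65537` (`2^16 + 1`), chosen here as an assumption because its multiplicative group has order `2^16` and is generated by `3`, so power-of-two roots of unity exist. This hypothetical `encode` is non-systematic for brevity: it treats `k` data symbols as polynomial coefficients and evaluates them at `n` roots of unity with a radix-2 number-theoretic transform in `O(n log n)`. The hypothetical `recover_at` rebuilds an erased evaluation from any `k` surviving ones by naive `O(k^2)` Lagrange interpolation; a fast decoder would use FFT-based techniques there too. Note that it must be told which positions are missing, since nothing in it locates corrupted shards.

```rust
/// The prime 2^16 + 1. Its multiplicative group has order 2^16 and is generated by 3,
/// so it has a primitive n-th root of unity for every power of two n <= 2^16.
const P: u64 = 65_537;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut result = 1;
    base %= P;
    while exp > 0 {
        if (exp & 1) == 1 {
            result = result * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    result
}

/// Modular inverse via Fermat's little theorem (a must be nonzero mod P).
fn inv_mod(a: u64) -> u64 {
    pow_mod(a, P - 2)
}

/// Recursive radix-2 NTT: returns f(omega^0), ..., f(omega^(n-1)) for the polynomial
/// with coefficients `coeffs`, where n = coeffs.len() is a power of two and omega
/// is a primitive n-th root of unity mod P.
fn ntt(coeffs: &[u64], omega: u64) -> Vec<u64> {
    let n = coeffs.len();
    if n == 1 {
        return coeffs.to_vec();
    }
    let even: Vec<u64> = coeffs.iter().step_by(2).copied().collect();
    let odd: Vec<u64> = coeffs.iter().skip(1).step_by(2).copied().collect();
    let omega_sq = omega * omega % P;
    let (e, o) = (ntt(&even, omega_sq), ntt(&odd, omega_sq));
    let mut out = vec![0; n];
    let mut w = 1;
    for i in 0..n / 2 {
        // f(w^i) = E(w^2i) + w^i O(w^2i), and w^(n/2) = -1 gives the other half.
        let t = w * o[i] % P;
        out[i] = (e[i] + t) % P;
        out[i + n / 2] = (e[i] + P - t) % P;
        w = w * omega % P;
    }
    out
}

/// Hypothetical non-systematic encoder: k data symbols (each < P) become n evaluations.
fn encode(data: &[u64], n: usize) -> Vec<u64> {
    assert!(n.is_power_of_two() && data.len() <= n && n <= 1 << 16);
    let omega = pow_mod(3, (P - 1) / n as u64); // primitive n-th root of unity
    let mut coeffs = data.to_vec();
    coeffs.resize(n, 0);
    ntt(&coeffs, omega)
}

/// Rebuilds f(x) from any k known (point, value) pairs by Lagrange interpolation.
/// The erased positions must be known in advance: this is erasure recovery only.
fn recover_at(known: &[(u64, u64)], x: u64) -> u64 {
    let mut acc = 0;
    for (i, &(xi, yi)) in known.iter().enumerate() {
        let (mut num, mut den) = (1, 1);
        for (j, &(xj, _)) in known.iter().enumerate() {
            if i != j {
                num = num * ((x + P - xj) % P) % P;
                den = den * ((xi + P - xj) % P) % P;
            }
        }
        acc = (acc + yi * num % P * inv_mod(den)) % P;
    }
    acc
}
```

With `data = [1, 2, 3, 4]` and `n = 8`, position 0 is the evaluation at `x = 1`, so it holds `1 + 2 + 3 + 4 = 10`, and `recover_at` returns 10 again from any four of the other positions.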

This trade-off makes sense because errors in cryptography and distributed systems are adversarial: if errors occur, they could easily overwhelm error location or correction anyway. We can always detect errors using hashes, of course, so these applications leave error detection to another layer of the protocol.
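
This layering can be sketched as follows, assuming the protocol publishes one digest per shard. The hypothetical `filter_shards` drops every shard whose digest does not match, so corrupted shards simply become erasures, and the hypothetical `can_recover` checks whether enough shards survive for an erasure-only decoder. `digest` uses the standard library's `DefaultHasher` purely as a stand-in: it is not collision-resistant against an adversary, and a real protocol would use a cryptographic hash such as SHA-256 or a Merkle tree.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest. NOT adversarially secure: a real protocol would use a
/// cryptographic hash (e.g. SHA-256) or a Merkle tree here.
fn digest(shard: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    shard.hash(&mut hasher);
    hasher.finish()
}

/// Keeps each received shard only if it matches the digest published for its index.
/// Missing or corrupted shards both come out as `None`, i.e. erasures.
fn filter_shards(received: Vec<Option<Vec<u8>>>, expected: &[u64]) -> Vec<Option<Vec<u8>>> {
    received
        .into_iter()
        .zip(expected)
        .map(|(shard, &want)| shard.filter(|s| digest(s) == want))
        .collect()
}

/// An erasure-only decoder needs at least `original_count` surviving shards.
fn can_recover(shards: &[Option<Vec<u8>>], original_count: usize) -> bool {
    shards.iter().filter(|s| s.is_some()).count() >= original_count
}
```

Swapping in a cryptographic hash changes only `digest`; the decoder itself never has to locate errors.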

## Simple usage

1. Divide data into equal-sized original shards.