doc/decompressor.txt

Let's try to decompress [97, 98, 257].
We need a convenient way to simulate the algorithm.

Read 97.
It means that the first prefix is "a".
The remote (compressor's) dictionary assigned code 257 to "ax" (x is an unknown symbol).

Read 98.
It means that the next prefix is "b".
So the remote dictionary assigned code 257 to "ab".
The remote dictionary assigned code 258 to "bx".

Read 257.
It means that the next prefix is "ab" (from the dictionary).
So the remote dictionary assigned code 258 to "ba".

Destination is "abab".

-----

Now let's try to decompress [97, 98, 258].

97 and 98 mean the same as before.

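Read 258.
There is no 258 in the dictionary yet.
We know that the remote dictionary assigned code 258 to "bx".
So the destination looks like "abbx".
But x is the symbol that follows the prefix "b", and that symbol is the first symbol of "bx".
So x is "b" and the remote dictionary assigned code 258 to "bb".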

Destination is "abbb".
This is the most complex part of the decompression algorithm.
We need to describe it for the general case.

-----

We have the current prefix "a...b".
We know that the remote dictionary assigned code "c" to "a...bx".

Then we read this code "c" from the source.
It means that the destination looks like "a...ba...bx".
So "a...bx" is "a...ba".
So we have two identical strings that overlap at one symbol: the last "a" of the first string is the first "a" of the second.

"...a...ba....."
         |
"........a...ba"

We read the next code (last used code + 1) from the source, and it is not in the current dictionary yet.
In this case the next prefix equals the current prefix + the first symbol of the current prefix.

1. "a(b)(bb)": current prefix code <= 255.
2. "...(a...b)(a...ba)": current prefix code >= 257.
So this situation is possible for any prefix code.
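
Both examples and the general rule can be checked with a minimal Python sketch.
It is only an illustration: it keeps whole strings in a dict, and it ignores the clear code and the max code limit.
The names "decompress" and "FIRST_FREE_CODE" are assumptions made for this sketch.

```python
FIRST_FREE_CODE = 257  # codes 0-255 are symbols, 256 is the clear code


def decompress(codes):
    """Decode a list of LZW codes using a plain code -> bytes dictionary."""
    dictionary = {}  # only codes >= 257 are stored here
    next_code = FIRST_FREE_CODE
    prefix = None
    output = []

    for code in codes:
        if code < 256:
            entry = bytes([code])
        elif code in dictionary:
            entry = dictionary[code]
        elif code == next_code and prefix is not None:
            # The code is not in the dictionary yet:
            # it is the current prefix + the first symbol of the current prefix.
            entry = prefix + prefix[:1]
        else:
            raise ValueError(f"invalid code {code}")

        if prefix is not None:
            # Mirror the remote dictionary: prefix + first symbol of the new entry.
            dictionary[next_code] = prefix + entry[:1]
            next_code += 1

        output.append(entry)
        prefix = entry

    return b"".join(output)
```

decompress([97, 98, 257]) uses only known codes, decompress([97, 98, 258]) takes the special branch on code 258.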

-----

What operations does the decompressor's dictionary require?
1. Is the code present in the dictionary?
2. Output the code.
3. Store the next code in the dictionary by the prefix code and the first symbol of the current code.

We don't need to find a code by its symbol sequence.

The compressor has to maintain a mapping from the current code to the next code.
The problem is that one current code has many next codes.

The decompressor requires the opposite mapping.
Each code has only one previous code.
This makes the decompressor simple, and it looks impossible to speed it up.
So we can use a single implementation of the dictionary.
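
The difference between the two mappings can be shown with a small Python sketch.
The dictionary contents and the names "entries", "next_codes" and "previous_code_by_code" are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

# Hypothetical dictionary contents: code -> (previous code, last symbol).
entries = {257: (97, "b"), 258: (97, "c"), 259: (257, "a")}

# Compressor view: one current code can have many next codes, one per symbol.
next_codes = defaultdict(dict)
for code, (previous_code, symbol) in entries.items():
    next_codes[previous_code][symbol] = code

# Decompressor view: every code has exactly one previous code.
previous_code_by_code = {
    code: previous_code for code, (previous_code, _) in entries.items()
}
```

Here code 97 has two next codes (257 and 258), while each of the codes 257, 258 and 259 has a single previous code.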

-----

We need just one array for the code mapping.
It can be named "previous_codes".

Indexes of this array correspond to codes >= 257.
Values are previous codes, including symbols.
Any code except the max code can be the previous code of another code.

The max code (2 ** 16) - 1 can't be a previous code.
So values will be between 0 and (2 ** 16) - 2.
An undefined previous code will be equal to (2 ** 16) - 1.

We don't need to initialize or clear it.
The algorithm will access only initialized codes.

-----

We need a separate array to map a code to its last symbol.
It can be named "last_symbol_by_codes".

Indexes of this array correspond to codes >= 257.

We don't need to initialize or clear it.
The algorithm will access only initialized symbols.
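
A minimal Python sketch of how a code can be mapped to an index in both arrays.
It assumes that code 257 is stored at index 0; the helper "code_index" and the constants are hypothetical names.
Python has to fill the arrays with some value, C code could leave the memory uninitialized.

```python
FIRST_FREE_CODE = 257
MAX_CODE = 2**16 - 1
DICTIONARY_SIZE = MAX_CODE - FIRST_FREE_CODE + 1  # (2 ** 16) - 257

previous_codes = [0] * DICTIONARY_SIZE
last_symbol_by_codes = bytearray(DICTIONARY_SIZE)


def code_index(code):
    """Translate a dictionary code (>= 257) into an array index."""
    return code - FIRST_FREE_CODE


# Store "ab" as code 257: the previous code is 97 ("a"), the last symbol is "b".
previous_codes[code_index(257)] = 97
last_symbol_by_codes[code_index(257)] = ord("b")
```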

-----

Following previous codes yields the symbols of a code from last to first, so we have to write the output in the opposite direction.
We can use another array as an output buffer.
It can be named "output_buffer".

Let's imagine we compressed "aa..." data.
Each code will be equal to the previous code + the "a" symbol.
"aa" - 257, "aaa" - 258, ..., "a...a" (length (2 ** 16) - 1 - 255) - ((2 ** 16) - 1).
So the max length of the buffer will be (2 ** 16) - 256 = (2 ** 16) - 257 + 1.

We don't need to initialize or clear it.
The algorithm will access only initialized bytes.
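
The max length can be double-checked with a short Python sketch.
It assumes the "aa..." data described above, where every code is one symbol longer than the previous one; the variable names are hypothetical.

```python
FIRST_FREE_CODE = 257
MAX_CODE = 2**16 - 1

length = 1  # the single symbol "a" (code 97)
for code in range(FIRST_FREE_CODE, MAX_CODE + 1):
    length += 1  # each code is the previous code + "a"

max_length = length  # length of the string behind the max code
```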

-----

For example [97, 98, 257, 259, 98] -> "abababab":

Imaginary dictionary:
ab - 257
ba - 258
aba - 259
abab - 260

Previous codes:
0 (257) -> 97
1 (258) -> 98
2 (259) -> 257
3 (260) -> 259

Last symbol by codes:
0 (257) -> "b"
1 (258) -> "a"
2 (259) -> "a"
3 (260) -> "b"

Output buffer (reversed):
257 -> ba
259 -> aba
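
The example can be replayed with a Python sketch that uses the three arrays.
It is an illustration under assumptions, not the actual implementation.
The names "decompress_with_arrays", "store" and "expand" are hypothetical, clear code handling is omitted, and Python fills the arrays with zeros only because it cannot leave them uninitialized.

```python
FIRST_FREE_CODE = 257  # codes 0-255 are symbols, 256 is the clear code
MAX_CODE = 2**16 - 1
DICTIONARY_SIZE = MAX_CODE - FIRST_FREE_CODE + 1  # (2 ** 16) - 257


def decompress_with_arrays(codes):
    previous_codes = [0] * DICTIONARY_SIZE
    last_symbol_by_codes = bytearray(DICTIONARY_SIZE)
    output_buffer = bytearray(DICTIONARY_SIZE + 1)

    next_code = FIRST_FREE_CODE
    prefix_code = None
    prefix_first_symbol = None
    destination = bytearray()

    def store(previous_code, last_symbol):
        nonlocal next_code
        if next_code > MAX_CODE:
            return  # the dictionary is full
        index = next_code - FIRST_FREE_CODE
        previous_codes[index] = previous_code
        last_symbol_by_codes[index] = last_symbol
        next_code += 1

    def expand(code):
        # Fill the buffer from its end while walking back to the first symbol.
        position = len(output_buffer)
        while code >= FIRST_FREE_CODE:
            index = code - FIRST_FREE_CODE
            position -= 1
            output_buffer[position] = last_symbol_by_codes[index]
            code = previous_codes[index]
        position -= 1
        output_buffer[position] = code  # the chain ends with a plain symbol
        return output_buffer[position:]

    for code in codes:
        if code < 256 or FIRST_FREE_CODE <= code < next_code:
            data = expand(code)
            if prefix_code is not None:
                store(prefix_code, data[0])
        elif code == next_code and prefix_code is not None and code <= MAX_CODE:
            # Not in the dictionary yet: prefix + first symbol of the prefix.
            store(prefix_code, prefix_first_symbol)
            data = expand(code)
        else:
            raise ValueError(f"invalid code {code}")

        destination += data
        prefix_code = code
        prefix_first_symbol = data[0]

    used = next_code - FIRST_FREE_CODE
    return bytes(destination), previous_codes[:used], bytes(last_symbol_by_codes[:used])
```

For [97, 98, 257, 259, 98] it returns the destination b"abababab" together with the used part of both arrays: [97, 98, 257, 259] and b"baab".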

-----

We don't need to clear the dictionary.
The algorithm will access only initialized data in all arrays.

-----

How much memory will it consume?
Previous codes:       ((2 ** 16) - 257) * sizeof(code_t)
Last symbol by codes: (2 ** 16) - 257
Output buffer:        (2 ** 16) - 257 + 1
Total: ~ 261 KB (261117 bytes, assuming sizeof(code_t) = 2)
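
The sum of the three sizes can be computed with a few lines of Python.
The sketch assumes that code_t takes 2 bytes, which is enough for codes up to (2 ** 16) - 1; the constant names are hypothetical.

```python
CODE_SIZE = 2  # assumption: code_t is an unsigned 16-bit integer
ENTRY_COUNT = 2**16 - 257

previous_codes_size = ENTRY_COUNT * CODE_SIZE
last_symbol_by_codes_size = ENTRY_COUNT
output_buffer_size = ENTRY_COUNT + 1

total_size = previous_codes_size + last_symbol_by_codes_size + output_buffer_size
```

Under this assumption total_size is 261117 bytes.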

-----

PS If block mode is disabled, we don't need the clear code (256).
So the first dictionary code can move from 257 to 256.