Issue-101, Design Documentation #116

Closed
149 changes: 149 additions & 0 deletions docs/design.md
@@ -0,0 +1,149 @@
BRRO Compressor
--

BRRO Compressor compresses WavBRRO-formatted time-series data. It provides the following compressor options:
1. Auto
2. Noop
3. Fast Fourier Transform (FFT)
4. Polynomial
5. Constant
6. Inverse Distance Weighting (IDW)

If the compressor is set to **auto**, BRRO Compressor decides which algorithm to use for each
data chunk.
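
To make the idea of **auto** concrete, here is a minimal sketch of one way such a selection could work: try each candidate on a chunk, discard those that exceed the allowed error, and keep the smallest result. The two toy compressors, the `auto_select` name, and the error metric (maximum absolute error relative to the largest magnitude in the chunk) are assumptions made for illustration; this is not the selection logic BRRO Compressor actually implements.

```python
def noop(chunk):
    """Toy 'noop' compressor: stores every sample unchanged."""
    return list(chunk), list(chunk)


def constant(chunk):
    """Toy 'constant' compressor: stores only the mean of the chunk."""
    mean = sum(chunk) / len(chunk)
    return [mean], [mean] * len(chunk)


def max_relative_error(original, restored):
    """Assumed error metric: worst absolute error divided by the largest magnitude."""
    scale = max(abs(v) for v in original) or 1.0
    return max(abs(o - r) for o, r in zip(original, restored)) / scale


def auto_select(chunk, max_error_pct):
    """Return (name, encoded) for the smallest encoding within the error bound."""
    candidates = {"noop": noop, "constant": constant}
    best = None
    for name, compress in candidates.items():
        encoded, restored = compress(chunk)
        if max_relative_error(chunk, restored) * 100 > max_error_pct:
            continue  # this candidate loses too much information
        if best is None or len(encoded) < len(best[1]):
            best = (name, encoded)
    return best


flat_name, _ = auto_select([10.0, 10.1, 9.9, 10.0], max_error_pct=5)
spiky_name, _ = auto_select([1.0, 100.0, 1.0, 100.0], max_error_pct=5)
```

With these toy candidates, a nearly flat chunk is small enough in error to be stored as a single constant, while a spiky chunk falls back to storing every sample.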

Compressor usage:

```
Usage: brro-compressor [OPTIONS] <INPUT>

Arguments:
  <INPUT>  input file

Options:
      --compressor <COMPRESSOR>
          Select a compressor, default is auto [default: auto] [possible values: auto, noop, fft, constant, polynomial, idw]
  -e, --error <ERROR>
          Sets the maximum allowed error for the compressed data, must be between 0 and 50. Default is 5 (5%). 0 is lossless compression 50 will do a median filter on the data. In between will pick optimize for the error [default: 5]
  -u
          Uncompresses the input file/directory
  -c, --compression-selection-sample-level <COMPRESSION_SELECTION_SAMPLE_LEVEL>
          Samples the input data instead of using all the data for selecting the optimal compressor. Only impacts speed, might or not increased compression ratio. For best results use 0 (default). Only works when compression = Auto. 0 will use all the data (slowest) 6 will sample 128 data points (fastest) [default: 0]
      --verbose
          Verbose output, dumps everysample in the input file (for compression) and in the ouput file (for decompression)
  -h, --help
          Print help
  -V, --version
          Print version
```

CSV Compressor
--

CSV Compressor compresses CSV-formatted time-series data. It uses BRRO Compressor's functionality to compress
the data.

Compression flow:
1. Reads the provided time-series data as CSV
2. Transforms the values of the time-series data into WavBRRO and generates the VSRI
3. Compresses the resulting WavBRRO

Decompression flow:
1. Reads the compressed WavBRRO (we call it **BRO**)
2. Decompresses the data
3. Reads the VSRI and retrieves the timestamps
4. Outputs time-series data as CSV

In its current state, it generates only a **single** WavBRRO, BRO, and VSRI file each, which together contain the time-series data.
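
The two flows above can be sketched roughly as follows. This is an illustration only: the CSV column names `timestamp` and `value`, the function names, and the dictionary-based segment layout are assumptions, and the actual WavBRRO encoding and compression steps are replaced by simply carrying the list of values through.

```python
import csv
import io


def build_segments(timestamps):
    """Group timestamps into VSRI-like runs that share a constant step."""
    segments = []
    for x, ts in enumerate(timestamps):
        if segments:
            seg = segments[-1]
            if seg["count"] == 1:
                # The second sample of a segment fixes its sampling rate.
                seg["rate"] = ts - seg["y0"]
                seg["count"] = 2
                continue
            if ts == seg["y0"] + seg["rate"] * seg["count"]:
                seg["count"] += 1
                continue
        # First sample overall, or a gap: start a new segment here.
        segments.append({"rate": 0, "x0": x, "y0": ts, "count": 1})
    return segments


def expand_segments(segments):
    """Rebuild every timestamp from the segments."""
    return [s["y0"] + s["rate"] * i for s in segments for i in range(s["count"])]


def compress_csv(text):
    """Compression flow: read CSV, split values from timestamps, index the timestamps."""
    rows = list(csv.DictReader(io.StringIO(text)))
    values = [float(row["value"]) for row in rows]  # stand-in for WavBRRO -> BRO
    segments = build_segments([int(row["timestamp"]) for row in rows])
    return values, segments


def decompress_csv(values, segments):
    """Decompression flow: restore timestamps from the index and emit CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "value"])
    writer.writerows(zip(expand_segments(segments), values))
    return out.getvalue()


SAMPLE = "timestamp,value\n0,1.5\n15,2.5\n30,3.5\n100,4.5\n115,5.5\n"
values, segments = compress_csv(SAMPLE)
restored = decompress_csv(values, segments)
```

In this sketch the jump from 30 to 100 breaks the constant step, so the timestamps are stored as two segments, and the round trip reproduces the input CSV.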

CSV Compressor usage:

```
Usage: csv-compressor [OPTIONS] <INPUT>

Arguments:
  <INPUT>  Path to input

Options:
  -o, --output <OUTPUT>
          Defines where the result will be stored
  -u
          Defines if we should uncompress input
      --no-compression

      --output-vsri
          Enables output of generated VSRI
      --output-wavbrro
          Enables output of generated WavBrro
      --output-csv
          Enable output result of decompression in CSV format
      --compressor <COMPRESSOR>
          Select a compressor, default is auto [default: auto] [possible values: auto, noop, fft, constant, polynomial, idw]
  -e, --error <ERROR>
          Sets the maximum allowed error for the compressed data, must be between 0 and 50. Default is 5 (5%). 0 is lossless compression 50 will do a median filter on the data. In between will pick optimize for the error [default: 5]
  -c, --compression-selection-sample-level <COMPRESSION_SELECTION_SAMPLE_LEVEL>
          Samples the input data instead of using all the data for selecting the optimal compressor. Only impacts speed, might or not increased compression ratio. For best results use 0 (default). Only works when compression = Auto. 0 will use all the data (slowest) 6 will sample 128 data points (fastest) [default: 0]
  -h, --help
          Print help
  -V, --version
          Print version
```

WavBRRO
--
The WavBRRO lib crate contains an implementation of the WavBRRO format. The format is based on the WAV format and is
used to store raw time-series data.

For more details on WavBRRO, see the [WavBRRO README](../wavbrro/README.md).

Vsri
--
The Vsri lib crate contains an implementation of the VSRI (Very Small Rolo Index). The index is designed to detect gaps
in continuous data that shares the same sampling rate.

Each continuous segment of data is mapped to a line using the formula y = mx + b, stored together with the number of
points in the segment, where:
- m - the sampling rate
- b - the series' initial point in time, given as [x, y]
- x - the sample # in the data file; this is ALWAYS sequential, so there are no holes in the samples
- y - the time

This way, finding the position of a given time within a segment comes down to solving the above equation for x, provided
the time is later than the segment's initial point.
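
As a small sketch of that lookup, assume the line for a segment passes through its initial point (X0, Y0), so that y = Y0 + m(x - X0). The function names below are hypothetical, and returning `None` for times that fall outside the segment or between two samples is a choice made for this example, not necessarily what the crate does.

```python
def timestamp_of(rate, x0, y0, x):
    """Forward direction: y = Y0 + m * (x - X0)."""
    return y0 + rate * (x - x0)


def sample_at(rate, x0, y0, count, ts):
    """Solve the segment's line for x; None if ts is not a sample of this segment."""
    if ts < y0:
        return None  # earlier than the segment's initial point
    offset, remainder = divmod(ts - y0, rate)
    if remainder != 0 or offset >= count:
        return None  # between two samples, or past the end of the segment
    return x0 + offset


# Segment values borrowed from the example index in this document:
# sampling rate 15, X0 = 166, Y0 = 58505, 63 samples.
position = sample_at(15, 166, 58505, 63, 58520)
```

Here `position` is 167, the sample right after the segment's first one, and `timestamp_of(15, 166, 58505, 167)` maps it back to 58520.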

Index structure:
1. index_name: name of the file being indexed
2. min_ts: the lowest timestamp (TS) available in this file
3. max_ts: the highest timestamp (TS) available in this file
4. vsri_segments: description of each segment:
    1. Sampling rate
    2. Initial sample position X0
    3. Initial sample timestamp Y0
    4. Number of samples in the segment
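
For readers who prefer to see the structure as types, the fields above could be modeled as in the following sketch. The top-level field names follow the list; the class names, the segment field names, and the sample values are hypothetical and do not come from the Vsri crate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VsriSegment:
    sampling_rate: int      # m in y = mx + b
    initial_position: int   # X0, first sample number of the segment
    initial_timestamp: int  # Y0, timestamp of that first sample
    sample_count: int       # number of samples in the segment


@dataclass
class VsriIndex:
    index_name: str
    min_ts: int
    max_ts: int
    vsri_segments: List[VsriSegment] = field(default_factory=list)

    def total_samples(self) -> int:
        """Samples are sequential, so the total is the sum over all segments."""
        return sum(segment.sample_count for segment in self.vsri_segments)


# Made-up values: two segments of 4 and 2 samples with a gap in between.
index = VsriIndex(
    index_name="example",
    min_ts=1000,
    max_ts=1110,
    vsri_segments=[VsriSegment(10, 0, 1000, 4), VsriSegment(10, 4, 1100, 2)],
)
```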

Example of content of an index:

    55745
    59435
    15,0,55745,166
    15,166,58505,63

Where:

- 55745 - min_ts
- 59435 - max_ts
- 15,0,55745,166 - the first segment:
    1. 15 - Sampling rate
    2. 0 - initial sample position for the segment X0
    3. 55745 - initial sample timestamp for the segment Y0
    4. 166 - the number of samples in the first segment
- 15,166,58505,63 - the second segment:
    1. 15 - Sampling rate
    2. 166 - initial sample position for the segment X0
    3. 58505 - initial sample timestamp for the segment Y0
    4. 63 - the number of samples in the second segment
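
The example can be checked with a short script. The sketch below assumes the index content is laid out exactly as shown (min_ts, max_ts, then one comma-separated segment per line); `parse_index` and `last_timestamp` are hypothetical helper names used only here.

```python
def parse_index(text):
    """Parse the textual index: min_ts, max_ts, then one segment per line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    min_ts, max_ts = int(lines[0]), int(lines[1])
    segments = [tuple(int(part) for part in line.split(",")) for line in lines[2:]]
    return min_ts, max_ts, segments


def last_timestamp(segment):
    """Timestamp of the final sample in a (rate, x0, y0, count) segment."""
    rate, _x0, y0, count = segment
    return y0 + rate * (count - 1)


EXAMPLE = """\
55745
59435
15,0,55745,166
15,166,58505,63
"""

min_ts, max_ts, segments = parse_index(EXAMPLE)
first, second = segments
gap = second[2] - last_timestamp(first)
```

The first segment ends at 55745 + 15 × 165 = 58220, so the second segment starts 285 time units later instead of the expected 15; that jump is the gap the index records. The second segment ends at 58505 + 15 × 62 = 59435, which matches max_ts, and the sample positions stay sequential (0 + 166 = 166).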

BRRO
--

For more details on BRRO, including what it is and the concept behind it, you can refer to this [paper](../paper/BRRO.md).
Contributor


I think that we can also add information about the full flow of the compression-decompression and possible use cases

Collaborator Author


I added some information on its concept, reading/writing flows for time-series data storing.
