This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

docs: enhanced readme with a lot more info
pedrohba1 committed Feb 15, 2024
1 parent 0db49fb commit 31eb720
Showing 3 changed files with 82 additions and 14 deletions.
84 changes: 73 additions & 11 deletions Readme.md
@@ -1,25 +1,87 @@
# Flat files decoder for Firehose

[![CI status](https://github.com/semiotic-ai/flat-files-decoder/workflows/ci/badge.svg)][gh-ci]

<!-- TODO: Seve, please check whether what I wrote makes sense -->
This crate is designed to decompress and decode headers from [binary files, known as flat files,](https://github.com/streamingfast/firehose-ethereum/blob/develop/proto/sf/ethereum/type/v2/type.proto) generated by Firehose. Flat files store all the information necessary to reconstruct the transaction and receipt tries. The decoder also checks the validity of
the receipt roots and transaction roots present in the block headers by recalculating them from the block body data. Details of the dbin file format can be found [here](https://github.com/streamingfast/dbin?tab=readme-ov-file).

This tool was first presented as a means to enhance the performance and verifiability of The Graph protocol. However,
it can also serve as a solution to the EIP-4444 problem of full nodes ceasing to serve historical data older than one year:
the flat files this crate decodes could double as an archival format similar to era1 files, especially
if they can be verified.
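
As a rough illustration of the decode-and-verify step, here is a minimal library-usage sketch. The crate path `flat_files_decoder` and the error handling are assumptions; `extract_blocks` is the public function shown in `src/lib.rs` below.

```rust
use std::fs::File;
use std::io::BufReader;

fn main() {
    // Open the example flat file shipped with the repository.
    let file = File::open("example0017686312.dbin").expect("cannot open flat file");
    let reader = BufReader::new(file);

    // `extract_blocks` decodes the flat file and verifies that the receipt and
    // transaction roots match the values recomputed from the block bodies.
    // The crate path is an assumption for this sketch.
    let blocks = flat_files_decoder::extract_blocks(reader).expect("decode or verification failed");
    println!("decoded {} verified blocks", blocks.len());
}
```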

## Getting Started

### Prerequisites
- [Rust (stable)](https://www.rust-lang.org/tools/install)
- Cargo (Comes with Rust by default)
- [protoc](https://grpc.io/docs/protoc-installation/)
- Firehose `.dbin` files to decode
  - An example file, `example0017686312.dbin`, is provided

## Running

### Commands

The tool provides the following commands for various operations:

- `stream`: Stream data continuously.
- `decode`: Decode files from input to output.
- `help`: Print this message or the help of the given subcommand(s).

### Options
- `--input <path>`: Specify a directory or single file to read from (default: `input_files`)
- `--output <path>`: Specify a directory to write output files to (if omitted, nothing is written to disk)

You can use the following options with any command (see the example after this list):

- `-h, --help`: Print help information for a command and its options.
- `-V, --version`: Print the version information of the tool.
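
For example, to print the help of a subcommand or the tool's version (the `--` keeps cargo from consuming the flags):

```bash
# show help for the decode subcommand
cargo run -- decode --help

# show the tool's version
cargo run -- --version
```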


#### NOTICE: whether streaming or decoding from a directory, the tool verifies that the receipt root and transaction root of every block match the roots recomputed from the block body data

## Usage Examples

Here are some examples of how to use the commands:

1. To stream data continuously from `stdin`:

```bash
# stream, reading from stdin
cargo run stream

# or pipe a flat file into stdin
cat example0017686312.dbin | cargo run stream
```

This will output the decoded header records as bytes to `stdout`.
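
The `stream` subcommand also accepts an `--end-block` flag (see `src/main.rs` below); the flag spelling assumes clap's default kebab-case renaming:

```bash
# stop streaming once the given block height is reached
cat example0017686312.dbin | cargo run stream --end-block 17686312
```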

2. To check a folder of dbin files:

```bash
cargo run decode --input ./input_files/
```

The decoded block headers are stored as JSON in the output folder (when `--output` is provided).
By passing `--headers-dir`, a folder of known-valid block headers can be supplied to compare
against the decoded flat files. Valid headers can be pulled from the [sync committee subprotocol](https://github.com/ethereum/annotated-spec/blob/master/altair/sync-protocol.md) for post-merge data.
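
A hypothetical invocation combining these flags (the directory names are placeholders):

```bash
# decode ./input_files, validate against headers in ./valid_headers,
# and write the decoded headers as JSON to ./output_files
cargo run decode --input ./input_files/ --headers-dir ./valid_headers/ --output ./output_files/
```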

<!-- TODO: once the header_accumulator is made public, link it here -->
**NOTICE:** For pre-merge data, a different approach using [header accumulators](https://github.com/ethereum/portal-network-specs/blob/8ad5bc33cb0d4485d2eab73bf2decc43e7566a8f/history-network.md#the-header-accumulator) is necessary, since
sync committees do not cover those headers.

## Goals
<!-- TODO: Any other goals I should add? -->
We hope that the flat files decoder will be able to handle
both post-merge and pre-merge data. Post-merge data can be validated
against the Consensus Layer via the sync committee subprotocol. Pre-merge data requires
header accumulators, adding an extra step beyond decoding the flat files.

## Benchmarking
- Run `cargo bench` in the root directory of the project
- Benchmark results will be output to the terminal
- Benchmark time includes reading from disk & writing output to disk
- Results can be found in `target/criterion/report/index.html`

For proper benchmarking of future improvements, fixes, and features, please compare against saved baselines.
Refer to [the end of this section of the Criterion documentation](https://bheisler.github.io/criterion.rs/book/user_guide/command_line_options.html) for more information on creating and comparing baselines.
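
For example, a typical baseline workflow with Criterion looks like this:

```bash
# save a baseline before making changes
cargo bench -- --save-baseline before

# ...apply your changes, then compare against the saved baseline
cargo bench -- --baseline before
```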
6 changes: 3 additions & 3 deletions src/lib.rs
@@ -48,6 +48,7 @@ pub enum DecodeInput {
Path(String),
Reader(Box<dyn Read>),
}

/**
* Decode & verify flat files from a directory or a single file.
* Input can be a directory or a file.
@@ -210,9 +211,8 @@ pub fn extract_blocks<R: Read>(mut reader: R) -> Result<Vec<Block>, DecodeError>
///
/// # Arguments
///
/// * `end_block`: For blocks after the merge, Ethereum sync committee should be used. This is why the default block
/// for this param is the MERGE_BLOCK (block 15537393)
/// * `reader`: where bytes are read from
/// * `writer`: where bytes are written to
pub async fn stream_blocks<R: Read, W: Write>(
6 changes: 6 additions & 0 deletions src/main.rs
@@ -13,17 +13,23 @@ struct Cli {
enum Commands {
/// Stream data continuously
Stream {
/// decompress .dbin files if they are compressed with zstd
#[clap(short, long, default_value = "false")]
decompress: bool,
/// the block to end streaming
#[clap(short, long)]
end_block: Option<usize>,
},
/// Decode files from input to output
Decode {
/// input folder where flat files are stored
#[clap(short, long)]
input: String,
#[clap(long)]
/// folder where valid headers are stored so decoded blocks can be validated against
/// their headers.
headers_dir: Option<String>,
/// output folder where decoded headers will be stored as .json
#[clap(short, long)]
output: Option<String>,
},
