This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

docs: enhanced readme with a lot more info
pedrohba1 committed Feb 15, 2024
1 parent 0db49fb commit 31eb720
Showing 3 changed files with 82 additions and 14 deletions.
84 changes: 73 additions & 11 deletions Readme.md
@@ -1,25 +1,87 @@
# Flat files decoder for Firehose

[![CI status](https://github.com/semiotic-ai/flat-files-decoder/workflows/ci/badge.svg)][gh-ci]

<!-- TODO: Seve, please check whether what I wrote makes sense -->
This crate is designed to decompress and decode headers from [binary files, known as flat files,](https://github.com/streamingfast/firehose-ethereum/blob/develop/proto/sf/ethereum/type/v2/type.proto) generated by Firehose. Flat files store all the information necessary to reconstruct the transaction and receipt tries. The decoder also checks the validity of
the receipt roots and transaction roots present in the block headers by recalculating them from the block body data. Details of the dbin file format can be found [here](https://github.com/streamingfast/dbin?tab=readme-ov-file).

This tool was first presented as a means to enhance the performance and verifiability of The Graph protocol. However,
it can also serve as a solution to the EIP-4444 problem of full nodes ceasing to serve historical data older than one year:
the flat files this crate decodes could double as an archival format similar to era1 files, especially
if they can be verified.
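
As a rough illustration of the decode-and-verify step, here is a minimal library-usage sketch. The crate path `flat_files_decoder` and the error handling are assumptions; `extract_blocks` is the public function shown in `src/lib.rs` below.

```rust
use std::fs::File;
use std::io::BufReader;

fn main() {
    // Open the example flat file shipped with the repository.
    let file = File::open("example0017686312.dbin").expect("cannot open flat file");
    let reader = BufReader::new(file);

    // `extract_blocks` decodes the flat file and verifies that the receipt and
    // transaction roots match the values recomputed from the block bodies.
    // The crate path is an assumption for this sketch.
    let blocks = flat_files_decoder::extract_blocks(reader).expect("decode or verification failed");
    println!("decoded {} verified blocks", blocks.len());
}
```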

## Getting Started

### Prerequisites
- [Rust (stable)](https://www.rust-lang.org/tools/install)
- Cargo (Comes with Rust by default)
- [protoc](https://grpc.io/docs/protoc-installation/)
- Firehose `.dbin` files to decode
  - An example file, `example0017686312.dbin`, is provided

## Running

### Commands

The tool provides the following commands for various operations:

- `stream`: Stream data continuously.
- `decode`: Decode files from input to output.
- `help`: Print this message or the help of the given subcommand(s).

### Options
- `--input <path>`: Specify a directory or single file to read from (default: `input_files`)
- `--output <path>`: Specify a directory to write output files to (if omitted, nothing is written to disk)

You can use the following options with any command (see the example after this list):

- `-h, --help`: Print help information for a command and its options.
- `-V, --version`: Print the version information of the tool.
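
For example, to print the help of a subcommand or the tool's version (the `--` keeps cargo from consuming the flags):

```bash
# show help for the decode subcommand
cargo run -- decode --help

# show the tool's version
cargo run -- --version
```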


#### NOTICE: whether streaming or decoding from a directory, the tool verifies that the receipt root and transaction root of every block match the roots recomputed from the block body data

## Usage Examples

Here are some examples of how to use the commands:

1. To stream data continuously from `stdin`:

```bash
# stream, reading from stdin
cargo run stream

# or pipe a flat file into stdin
cat example0017686312.dbin | cargo run stream
```

This will output the decoded header records as bytes to `stdout`.
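
The `stream` subcommand also accepts an `--end-block` flag (see `src/main.rs` below); the flag spelling assumes clap's default kebab-case renaming:

```bash
# stop streaming once the given block height is reached
cat example0017686312.dbin | cargo run stream --end-block 17686312
```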

2. To check a folder of dbin files:

```bash
cargo run decode --input ./input_files/
```

The decoded block headers are stored as JSON in the output folder (when `--output` is provided).
By passing `--headers-dir`, a folder of known-valid block headers can be supplied to compare
against the decoded flat files. Valid headers can be pulled from the [sync committee subprotocol](https://github.com/ethereum/annotated-spec/blob/master/altair/sync-protocol.md) for post-merge data.
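
A hypothetical invocation combining these flags (the directory names are placeholders):

```bash
# decode ./input_files, validate against headers in ./valid_headers,
# and write the decoded headers as JSON to ./output_files
cargo run decode --input ./input_files/ --headers-dir ./valid_headers/ --output ./output_files/
```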

<!-- TODO: once the header_accumulator is made public, link it here -->
**NOTICE:** For pre-merge data, a different approach using [header accumulators](https://github.com/ethereum/portal-network-specs/blob/8ad5bc33cb0d4485d2eab73bf2decc43e7566a8f/history-network.md#the-header-accumulator) is necessary, since
sync committees do not cover those headers.

## Goals
<!-- TODO: Any other goals I should add? -->
We hope that the flat files decoder will be able to handle
both post-merge and pre-merge data. Post-merge data can be validated
against the Consensus Layer via the sync committee subprotocol. Pre-merge data requires
header accumulators, adding an extra step beyond decoding the flat files.

## Benchmarking
- Run `cargo bench` in the root directory of the project
- Benchmark results will be output to the terminal
- Benchmark time includes reading from disk & writing output to disk
- Results can be found in `target/criterion/report/index.html`

For proper benchmarking of future improvements, fixes, and features, please compare against saved baselines.
Refer to [the end of this section of the Criterion documentation](https://bheisler.github.io/criterion.rs/book/user_guide/command_line_options.html) for more information on creating and comparing baselines.
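
For example, a typical baseline workflow with Criterion looks like this:

```bash
# save a baseline before making changes
cargo bench -- --save-baseline before

# ...apply your changes, then compare against the saved baseline
cargo bench -- --baseline before
```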
6 changes: 3 additions & 3 deletions src/lib.rs
@@ -48,6 +48,7 @@ pub enum DecodeInput {
Path(String),
Reader(Box<dyn Read>),
}

/**
* Decode & verify flat files from a directory or a single file.
* Input can be a directory or a file.
@@ -210,9 +211,8 @@ pub fn extract_blocks<R: Read>(mut reader: R) -> Result<Vec<Block>, DecodeError>
///
/// # Arguments
///
/// * `end_block`: For blocks after the merge, Ethereum sync committee should be used. This is why the default block
/// for this param is the MERGE_BLOCK (block 15537393)
/// * `reader`: where bytes are read from
/// * `writer`: where bytes are written to
pub async fn stream_blocks<R: Read, W: Write>(
6 changes: 6 additions & 0 deletions src/main.rs
@@ -13,17 +13,23 @@ struct Cli {
enum Commands {
/// Stream data continuously
Stream {
/// decompress .dbin files if they are compressed with zstd
#[clap(short, long, default_value = "false")]
decompress: bool,
/// the block to end streaming
#[clap(short, long)]
end_block: Option<usize>,
},
/// Decode files from input to output
Decode {
/// input folder where flat files are stored
#[clap(short, long)]
input: String,
#[clap(long)]
/// folder where valid headers are stored so decoded blocks can be validated against
/// their headers.
headers_dir: Option<String>,
/// output folder where decoded headers will be stored as .json
#[clap(short, long)]
output: Option<String>,
},
