
Data Overview

Source

The NEAR Foundation maintains a Lake Indexer [https://docs.near.org/concepts/advanced/near-lake-framework] that flushes block and shard files to a public S3 bucket. Instead of setting up a Streamline feed to request blocks, transactions, and receipts by block number and tx hash, we sync with this bucket and copy the files over to Snowflake.
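A rough sketch of that sync in Snowflake, assuming an external stage over the public bucket (the stage, table, and bucket path here are illustrative, not the production Streamline setup):

```sql
-- Illustrative sketch only: point an external stage at the public Lake bucket
-- and land each JSON file as a single raw VARIANT row.
CREATE STAGE IF NOT EXISTS near_lake_stage
    URL = 's3://near-lake-data-mainnet/'   -- assumed bucket path
    FILE_FORMAT = (TYPE = JSON);

CREATE TABLE IF NOT EXISTS bronze_lake_raw (
    data      VARIANT,
    _filename STRING
);

COPY INTO bronze_lake_raw (data, _filename)
FROM (
    SELECT $1, METADATA$FILENAME
    FROM @near_lake_stage
);
```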

These files are loaded by dbt into silver__streamline_blocks and silver__streamline_shards, which are then flattened and transformed into all of our downstream data.

dbt Models

The dbt models can be considered in 3 stages.

LOAD AND EXTRACT RAW ELEMENTS

Stage 1 loads the raw data and builds intermediate tables of blocks, transactions, and receipts.

BLOCKS

Blocks are the simplest workflow. The blocks.json files contain the header and block metadata that we need for the final fact_blocks table.
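As a sketch, pulling the header fields might look like this (JSON paths are assumptions based on the Lake block shape, and the bare table name stands in for the dbt ref):

```sql
-- Sketch: extract header fields from the raw block JSON.
-- NEAR header timestamps are in nanoseconds, hence the scale of 9.
SELECT
    data:header:height::NUMBER                          AS block_id,
    data:header:hash::STRING                            AS block_hash,
    data:header:prev_hash::STRING                       AS prev_hash,
    TO_TIMESTAMP_NTZ(data:header:timestamp::NUMBER, 9)  AS block_timestamp,
    data:author::STRING                                 AS block_author
FROM silver__streamline_blocks;
```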

SHARDS

Shards contain the high-level objects chunk and receipt_execution_outcome. Chunks contain transaction objects, each with the transaction input and the primary receipt. Every transaction creates at least one receipt, which calls a method on a contract; that call may create subsequent receipts.

Each receipt then contains the actions that are executed, the logs that are emitted, and the subsequent receipt_outcome_id.
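A sketch of flattening a shard into one row per receipt execution outcome (JSON paths and column names are assumptions based on the Lake shard shape):

```sql
-- Sketch: explode each shard's receipt_execution_outcome array into
-- one row per receipt, keeping the actions, logs, and child receipt ids.
SELECT
    s.data:shard_id::NUMBER                        AS shard_id,
    r.value:receipt:receipt_id::STRING             AS receipt_id,
    r.value:receipt:receiver_id::STRING            AS receiver_id,
    r.value:receipt:receipt:Action:actions         AS actions,
    r.value:execution_outcome:outcome:logs         AS logs,
    r.value:execution_outcome:outcome:receipt_ids  AS receipt_outcome_id
FROM silver__streamline_shards s,
     LATERAL FLATTEN(input => s.data:receipt_execution_outcome) r;
```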

MAP RECEIPTS TO PARENT TRANSACTIONS

Receipts are executed and processed asynchronously and are included in the block in which the receipt is sealed. Each receipt only contains a reference to its child receipts, and the initiating transaction hash is not part of the output. Thus, we must map the tree of all receipts down the chain of child receipts until each receipt has no receipt_outcome_id. This is executed in the model silver__streamline_receipts_final with the views in models/silver/streamline/helpers, using a recursive ancestry tree.
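Conceptually, the recursion looks something like the sketch below. This is a simplified illustration rather than the actual model: table and column names are assumptions, and receipt_outcome_id is assumed to be an array of child receipt ids.

```sql
-- Sketch: walk the receipt tree so every receipt inherits the tx_hash
-- of its ancestor. Assumes tx_hash is populated only on primary receipts.
WITH RECURSIVE ancestry AS (
    -- Anchor: primary receipts carry the hash of the transaction
    -- that created them (known from the chunk's transaction objects).
    SELECT receipt_id, tx_hash, receipt_outcome_id
    FROM receipts
    WHERE tx_hash IS NOT NULL

    UNION ALL

    -- Recurse: a child receipt inherits its parent's tx_hash.
    SELECT child.receipt_id, parent.tx_hash, child.receipt_outcome_id
    FROM ancestry parent
    JOIN receipts child
      ON ARRAY_CONTAINS(child.receipt_id::VARIANT, parent.receipt_outcome_id)
)
SELECT receipt_id, tx_hash
FROM ancestry;
```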

The outcome of this mapping is a receipts table with the transaction hash appended to each receipt record. We must have every receipt in the transaction execution to complete this mapping; a gap will lead to null values in tx_hash downstream.

These are then modeled into silver__streamline_transactions_final, which builds an object containing all of the receipts and calculates total gas burnt (a function of the gas burnt by each receipt); a sketch follows the note below.

  • Note - the object is worth revisiting. Does anyone actually query the transactions table to access receipts? Probably not. The final model is still required to append block_timestamp from blocks and to aggregate total gas.
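A minimal sketch of that aggregation, assuming illustrative table and column names:

```sql
-- Sketch: one row per transaction, with a receipts array and total gas
-- assumed here to be the transaction's own gas plus its receipts' gas.
SELECT
    t.tx_hash,
    b.block_timestamp,
    ARRAY_AGG(r.receipt_json) WITHIN GROUP (ORDER BY r.receipt_id) AS receipts,
    t.tx_gas_burnt + SUM(r.gas_burnt)                              AS total_gas_burnt
FROM transactions t
JOIN receipts_final r ON r.tx_hash  = t.tx_hash
JOIN blocks b         ON b.block_id = t.block_id
GROUP BY t.tx_hash, b.block_timestamp, t.tx_gas_burnt;
```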

CURATED MODELS

The final stage is everything downstream of receipts. Curated models largely depend on the actions executed (input) and the logs emitted (output). FunctionCall actions are base64-encoded method calls on deployed contracts, and they hold the majority of the input data for curated models.
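For example, decoding FunctionCall arguments could look like the sketch below, where actions is assumed to be an array column on the flattened receipts and the paths follow the Lake action shape:

```sql
-- Sketch: decode base64 FunctionCall args into queryable JSON.
-- TRY_* variants return NULL instead of erroring on non-JSON args.
SELECT
    receipt_id,
    a.value:FunctionCall:method_name::STRING AS method_name,
    TRY_PARSE_JSON(
        TRY_BASE64_DECODE_STRING(a.value:FunctionCall:args::STRING)
    )                                        AS decoded_args
FROM receipts_final,
     LATERAL FLATTEN(input => actions) a
WHERE a.value:FunctionCall IS NOT NULL;
```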