From f085b228c6af58cfb12878c8b06d6ff54af0d334 Mon Sep 17 00:00:00 2001 From: Vlad <13818348+walldiss@users.noreply.github.com> Date: Mon, 27 Jan 2025 18:07:27 +0100 Subject: [PATCH] refactor --- docs/adr/reconstruction.md | 427 ++++++++++++++++++++----------------- 1 file changed, 228 insertions(+), 199 deletions(-) diff --git a/docs/adr/reconstruction.md b/docs/adr/reconstruction.md index cc525f0041..95bbc1a199 100644 --- a/docs/adr/reconstruction.md +++ b/docs/adr/reconstruction.md @@ -1,6 +1,6 @@ ## Abstract -This document proposes a new block reconstruction protocol that addresses key bottlenecks in the current implementation, specifically targeting duplicate request reduction and bandwidth optimization. The protocol introduces a structured approach to data retrieval with explicit focus on network resource efficiency and scalability. +This document proposes a new block reconstruction protocol that addresses key bottlenecks in the current implementation, specifically targeting duplicate request reduction and bandwidth optimization. The protocol introduces a structured approach to data retrieval with an explicit focus on network resource efficiency and scalability. ## Motivation @@ -21,266 +21,295 @@ This proposal aims to implement an efficient reconstruction protocol that: - Supports network scaling across multiple dimensions - Maintains stability under varying network conditions -Engineering time. +#### Engineering Time -Initial draft aims to be optimised in terms of engineering efforts required for first iteration of implementation. Optimizations marked as optional can be implemented in subsequent updates based on network performance metrics. +The initial draft aims to be optimized in terms of engineering efforts required for the first iteration of implementation. Optimizations marked as optional can be implemented in subsequent updates based on network performance metrics. ## Specification -### Performance Requirements +### General Performance Requirements 1. Base Scenario Support: - - 32MB block size - - Minimum required light nodes for 32MB blocks - - Network of 50+ full nodes + - 32MB block size + - Minimum required light nodes for 32MB blocks + - Network of 50+ full nodes 2. Performance Metrics (in order of priority): - - System stability - - Reconstruction throughput (blocks/second) - - Per-block reconstruction time + - System stability + - Reconstruction throughput (blocks/second) + - Per-block reconstruction time 3. Scalability Dimensions: - - Block size scaling - - Node count scaling + - Block size scaling + - Node count scaling +### Protocol Flow - -### Protocol Flow - +Diagram below outlines the high-level flow of the proposed protocols. The detailed specifications are provided in the subsequent sections. Full flow diiagrams are available in the end of this document. ``` -1. Connection Establishment - Client Server - |---- Handshake (node type) ----->| - |<---- Handshake (node type) -----| - -2. Bitmap Subscription +1. Bitmap Subscription Client Server - |---- Subscribe to bitmap ------------->| - |<---- Initial bitmap ------------------| - |<---- Updates ------------------------| - |<---- End updates(full eds/max samples-| + |---- Subscribe to bitmap -------------->| + |<---- Initial bitmap -------------------| + |<---- Updates -------------------------| + |<---- End updates(full eds/max samples)-| -3. Data Request +2. Data Request Client Server |---- Request(bitmap) ----------->| |<---- [Samples + Proof] parts ---| ``` -### Bitmap Implementation - -The protocol utilizes Roaring Bitmaps for efficient bitmap operations and storage. Roaring Bitmaps provide several advantages for the reconstruction protocol: - -1. Efficient Operations - - Fast logical operations (AND, OR, XOR) - - Optimized for sparse and dense data - - Memory-efficient storage - -2. Implementation Benefits - - Native support for common bitmap operations - - Optimized serialization - - Efficient iteration over set bits - - Support for rank/select operations - -3. Performance Characteristics - - O(1) for most common operations - - Compressed storage format - - Efficient memory utilization - - Fast bitmap comparisons - - -### Core Protocol Components +## Core Components -#### 1. Connection Management +### 1. Reconstruction Process +There should be a global per-block coordinator process that will be responsible for managing the data request process. -Handshakes, Node Type Identification -- Node type must be declared during connection handshake by each of connection nodes -- Active connection state maintenance on full nodes. List should be maintained by node type -- Connection state should allow subscription for updated +1. Request Initiation may have multiple strategies: + - Immediate request of all missing samples upon bitmap receipt + - (Optional): Delayed start for bandwidth optimization + - Wait for X% peer responses + - Fixed time delay + - Complete EDS availability + - Pre-confirmation threshold + - Combination of conditions -#### 2. Sample Bitmap Protocol +2. Select which samples to request and from which peers -2.1. Bitmap Subscription Flow - -- Requested by block height -- Implements one-way stream for bitmap updates -- Full nodes: Stream until complete EDS bitmap -- Light nodes: Stream until max sampling amount reached -- Include end-of-subscription flag for easier debugging - -2.2. Update Requirements -- Minimum update frequency: every 10 seconds -- Server-side update triggers: - - Time-based: Every 5 seconds. Default in base implementation. - - Optional: Change-based with threshold -- Penalty system for delayed updates - -#### 3. Reconstruction State Management +The first iteration of the decision engine can be implemented as simply as possible to allow easier testing of other components and prove the concept. The base properties should be: +- Eliminate requests for duplicate data +- Do not request data that can be derived from other data. Request just enough data for successful reconstruction + +#### First Implementation: +1. Subscribe to bitmap updates +2. Handle bitmap updates. If any sample is not stored and not in progress, request it from a peer + - Keep track of in-progress requests in local state +3. Handle sample responses + - Verify proofs. If a proof is invalid, the peer should be penalized + - Store samples in local store and update local Have state + - Remove samples from in-progress bitmap + - Clean up information about sample coordinates from remote state to free up memory +4. If reconstruction is complete, clean up local state and shut down the reconstruction process + +#### Potential Optimizations +- Skip encoded derivable data in response by requesting from peers that have all shares from the same rows/columns +- Optimize proof sizes through range requests +- Optimize proof sizes through subtree proofs if adjacent subroots are stored +- Parallel request distribution to reduce network load on single peers +- Request from peers with smaller latency -3.1. Global State Components +### 2. State Management +1. Remote State will store information about peers that have samples for given coordinates. If it has full rows/columns, it will be stored in a separate list. ```go type RemoteState struct { coords [][]peers // Peer lists by coordinates - rows []peers // Peers with row data - cols []peers // Peers with column data + rows []peers // Peers with full row data + cols []peers // Peers with full column data available bitmap // Available samples bitmap } -// Basic peer list structure.Structure might be replaced later to implement better -//peer scoring mechanics +// Basic peer list structure. Structure might be replaced later to implement better +// peer scoring mechanics type peers []peer.ID ``` -3.2. State Query Interface +Query Interface Remote state: -- GetPeersWithSample(Coords) -> []peer.ID -- GetPeersWithAxis(axisIdx, axisType) -> []peer.ID -- Available() -> bitmap - -Local state: -- InProgress() -> bitmap. Tracks ongoing fetch sessions to prevent requesting duplicates -- Have() -> bitmap. Tracks - - - -#### 4. Data Request -There should be global per block coordinator process that will be responsible for managing the data request process. - -4.1. Request Initiation may have multiple strategies: -- Immediate request of all missing samples bitmap receipt -- Optional: Delayed start for bandwidth optimization - - Wait for X% peer responses - - Fixed time delay - - Complete EDS availability - - Pre-confirmation threshold - - Combination of conditions - -4.1.1 what and were (to request) biggest question. - -First iteration of decision engine can be done as simple as possible to allow easier testing of other components and prove the concept. The base properties should be: -- Eliminate requests for duplicate data -- Do not request data, that can be derived from other data. Request just enough data for successful reconstruction - - -Following potential optimization trategies -- Skip encoded derivable data in response by requesting from peers that have all shares from the same rows/columns -- Optimize proof sizes through range requests -- Optimize proof sizes through subtree proofs, if adjacent subroots are stored -- Parallel request distribution to reduce network load on single peers -- Request from peers with smaller latency +```go +func (s *RemoteState) GetPeersWithSample(coords []Coord) []peer.ID +func (s *RemoteState) GetPeersWithAxis(axisIdx int, axisType AxisType) []peer.ID +func (s *RemoteState) Available() bitmap +``` +Progress state: +```go +// Tracks ongoing fetch sessions to prevent requesting duplicates +func (s *ProgressState) InProgress() bitmap +// Tracks samples that are already stored locally to notify peers about it +func (s *ProgressState) Have() bitmap +``` -4.2. Sample Request Protocol +## Bitmap Protocol +### Client +- Client should send a request to subscribe to bitmap updates +- If the subscription gets closed or interrupted, client should re-subscribe -4.2.1. Request +#### Request +```protobuf +message SubscribeBitmapRequest { + uint64 height = 1; +} +``` -- Update InProgress bitmap before request initiation -- Use shrex for data retrieval. Send bitmap for data request. Shrex has built-in support for requesting rows and samplies, however bitmap-based data retrieval is more general and would be easier to support in future, because it allows requesting multiples pieces of data with single request instead of multiple requests -```go -type SampleRequest struct { - Samples *roaring.Bitmap +### Server +- Server implements a one-way stream for bitmap updates +- Server should send the first bitmap update immediately +- Next updates should be sent every 5 seconds + - (Optional): Server can send updates more frequently if there is a significant change in the bitmap +- Server should send an end-of-subscription flag when no more updates are expected + - Full nodes: stream until EDS is available on server + - Light nodes: stream until max sampling amount is reached + - (Optional): Light node can send a single response with bitmap and end flag upon successful sampling + +#### Response +``` +message BitmapUpdate { + Bitmap bitmap = 1; + bool is_end = 2; } ``` +The protocol utilizes Roaring Bitmaps for efficient bitmap operations and storage. Roaring Bitmaps provide several advantages for the reconstruction protocol: -4.2.2. Response -- (base version) Server should respond with samples with proofs +1. Efficient Operations + - Fast logical operations (AND, OR, XOR) + - Optimized for sparse and dense data + - Memory-efficient storage -Optiomisations: -- Server may split response into multiple parts -Each part contains packed samples in Range format. -- Each part of shoold have specified message prefix. It would allow client to identify sample container type, which would help to maintain backwards compatibility in case of breaking changes in packing algorithm -- Adjacent samples can be packed together to share common proofs -- If both Row and column are requested, intersections share can be sent once. +2. Implementation Benefits + - Native support for common bitmap operations + - Optimized serialization + - Efficient iteration over set bits + - Support for rank/select operations +3. Performance Characteristics + - O(1) for most common operations + - Compressed storage format + - Efficient memory utilization + - Fast bitmap comparisons + +The protocol will use 32-bit encoding for bitmaps to have greater multi-language support. Implementation can use one of the encoding-compatible libraries: +- Go: https://github.com/RoaringBitmap/roaring +- Rust: https://github.com/RoaringBitmap/roaring-rs +- C++: https://github.com/RoaringBitmap/CRoaring +- Java: https://github.com/RoaringBitmap/RoaringBitmap + +## Samples Request Protocol + +### Request + +- Use shrex for data retrieval +- Send bitmap for data request. Bitmap should contain coordinates for requested samples +```protobuf +message SampleRequest { + height uint64 = 1; + Bitmap bitmap = 2; +} +``` -4.3 Client response handling -- If response is timeout, retry request from other peers. Potentially penalize slow peer. -- Client should verify proofs. If proof is invalid, peer should be penalized -- Verified samples should stored and added to local state `Have` -- Clean up InProgress bitmap - - Clean up information from remote state to free up memory -- If reconstruction is complete clean up local state, shut down reconstruction process and all subscriptions +### Response +Server should respond with samples with proofs defined in shwap CIP [past link]. +```protobuf +message SamplesResponse { + repeated Sample samples = 1; +} +``` -### Server-Side Implementation +#### Optimizations: +- Adjacent samples can have common proofs. Server would need to send shares with common proof in a single response. + Each part contains packed samples in Range format +- If both Row and column are requested, intersection share can be sent once -#### 1. Storage Interface -New storage format needs to be implemented for efficient storage of sampels proofs. The format will be used +## Storage Backend +A new storage format needs to be implemented for efficient storage of sample proofs. The format will be initially used for storing ongoing reconstruction process and later can be used for light node storage. -1.1. Core Requirements +1. Core Requirements - Sample storage with proofs - Allow purge of proofs on successful reconstruction - Bitmap query support - Row/column access implementation - Accessor interface compliance -1.2. Optional Optimizations +2. Optional Optimizations - Bitmap subscription support for callbacks -- Efficient proof generation to reduce proofs size overhead - -#### 2. Request Processing - -2.1. Sample Request Handling -- Process bitmap-based requests -- Support multi-part responses -- Implement range-based packaging -- Optimize proof generation - -2.2. Response Optimization -- Common proof sharing -- Interval-based response packaging -- Share deduplication for intersections - - - -### Optimization Details - -1. Bandwidth Optimization - - Share common proofs across samples - - Package adjacent samples - - Optimize proof sizes for known data - - Implement efficient encoding schemes - -2. Request Distribution - - Load balancing across peers - - Geographic optimization - - Parallel request handling - - Adaptive timeout management - -3. State Management Optimization - - Efficient bitmap operations - - Memory-optimized state tracking - - Progressive cleanup of completed data - - Optimized proof verification - -## Rationale - -The design decisions in this proposal are driven by: -1. Need for efficient bandwidth utilization -2. Importance of stable reconstruction under varying conditions -3. Support for network scaling -4. Maintenance of security guarantees - - +- Efficient proof generation to reduce proof size overhead ## Backwards Compatibility -In lifespan of protocol it may requires a coordinated network upgrade. Implementation should allow for: +In the lifespan of the protocol, it may require a coordinated network upgrade. Implementation should allow for: 1. Version negotiation 2. Transition period support 3. Fallback mechanisms -## List of core components - - -1. Handshake protocol -2. Active connections management -3. Bitmap subscription protocol -4. Decision engine and state management -5. Client to request set of samples -6. Samples store -7. Samples server with packaging - - - +## List of Core Components + +1. Bitmap subscription protocol +2. Decision engine and state management +3. Client to request set of samples +4. Samples store +5. Samples server with packaging + + +## Full reconstruction process diagram +```mermaid +sequenceDiagram + participant N as New Node + participant RS as Remote State + participant RP as Reconstruction Processor + participant PS as Progress State + participant SS as Samples Store + participant P1 as Peer 1 + participant P2 as Peer 2 + participant P3 as Peer 3 + + Note over N,P3: Phase 1: Bitmap Discovery + N->>+P1: Subscribe to bitmap updates + N->>+P2: Subscribe to bitmap updates + N->>+P3: Subscribe to bitmap updates + + P1-->>-N: Initial bitmap + P2-->>-N: Initial bitmap + P3-->>-N: Initial bitmap + + N->>RS: Update remote state + + Note over N,P3: Phase 2: Reconstruction Process Start + N->>RP: Initialize reconstruction + activate RP + + loop Process Bitmaps + RP->>PS: Check progress state + RP->>RS: Query available samples + + Note over RP: Select optimal samples & peers + + par Request Samples + RP->>P1: GetSamples(bitmap subset 1) + RP->>P2: GetSamples(bitmap subset 2) + RP->>P3: GetSamples(bitmap subset 3) + end + + PS->>PS: Mark requests as in-progress + end + + Note over N,P3: Phase 3: Sample Processing + par Process Responses + P1-->>RP: Samples + Proofs 1 + P2-->>RP: Samples + Proofs 2 + P3-->>RP: Samples + Proofs 3 + end + + loop For each response + RP->>SS: Verify & store samples + RP->>PS: Update progress + RP->>RS: Update available samples + end + + Note over N,P3: Phase 4: Completion + opt Reconstruction Complete + RP->>SS: Finalize reconstruction + RP->>PS: Clear progress state + RP->>RS: Clear remote state + deactivate RP + end + + Note over N,P3: Phase 5: Continuous Updates + loop Until Complete + P1-->>N: Bitmap updates + P2-->>N: Bitmap updates + P3-->>N: Bitmap updates + N->>RS: Update remote state + end +``` \ No newline at end of file