
indexer-alt: Alternative indexer architecture POC #19868

Merged: 33 commits into main from amnn/idx-poc on Oct 18, 2024

Conversation

amnn (Member) commented Oct 15, 2024

Description

A ground up rewrite/proof-of-concept of an alternative indexer pipeline, with the following major changes implemented, or planned:

Ingestion

  • Instrumented to count the number of bytes, checkpoints, transactions, events, and objects read, and their latencies.
  • Handles transient errors with back-off, and checkpoints that are not found yet with a constant delay before retrying (see the sketch after this section).
  • Doesn't impose a hard timeout, as that would only lead to wasted work (dropping partially fetched data and refetching it).
  • Hardcodes the remote store as the only ingestion source for now.
  • No longer writes anything to disk (no watermarking, no GC). Progress is tracked locally, and the service relies on whatever is running it to tell it where to pick back up from.
  • Re-uses futures::stream and tokio channels to handle communication and parallelism between ingestion processing and commit workers, which simplifies and decouples a lot of the implementation.
  • This also means there is no more reducer functionality: with each producer having its own committer, it was not necessary.
  • Tested using a mock HTTP server.

On my local machine, the new ingestion framework very slightly outperforms the existing one: 160 checkpoints per second at checkpoint 64M and 60 to 70 cps at checkpoint 9M, compared to 130 and 50 cps respectively on the existing framework. The new framework is limited by my local bandwidth.
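
For illustration, here is a minimal sketch of the retry policy described above -- transient errors back off, missing checkpoints wait a constant delay. The names (`FetchError`, `fetch_with_retry`) and the delay values are hypothetical, not the actual implementation:

```rust
use std::time::Duration;
use tokio::time::sleep;

/// Hypothetical outcome of a single fetch attempt against the remote store.
enum FetchError {
    /// The checkpoint has not been produced or uploaded yet.
    NotFound,
    /// Network hiccup, bad status, failed deserialization, etc.
    Transient,
}

/// Keep retrying a single checkpoint until it is successfully fetched:
/// transient errors back off (capped), while "not found" waits a constant
/// delay, since the checkpoint simply doesn't exist yet.
async fn fetch_with_retry<F, Fut>(mut fetch: F) -> Vec<u8>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<Vec<u8>, FetchError>>,
{
    let mut backoff = Duration::from_millis(100);
    loop {
        match fetch().await {
            Ok(bytes) => return bytes,
            Err(FetchError::NotFound) => sleep(Duration::from_millis(500)).await,
            Err(FetchError::Transient) => {
                sleep(backoff).await;
                backoff = (backoff * 2).min(Duration::from_secs(10));
            }
        }
    }
}
```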

Writer

  • Again, lots of instrumentation: Every pipeline automatically instruments the number of checkpoints it's processed, how many rows it's created, how long that took, how long it took to write them out, how many retries it took, etc.
  • Writes to each table are split into their own concurrent pipelines.
  • Pipelines commit data for checkpoints out-of-order and rely on keeping the watermark table up-to-date to maintain consistency for readers.
  • Uniform framework for defining pipelines, where each pipeline just describes how it processes, how it commits to the DB, and how to tune the concurrency parameters for its needs (see the trait sketch after this list).
  • Avoid cloning data as much as possible.
  • No more partition management: everything that was partitioned before is destined for the KV store, so we can do away with partitioning for now.
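
As a sketch of what such a uniform pipeline definition could look like -- the trait and method names below are illustrative (only `CHUNK_SIZE` also appears in the code quoted later in the review), and the placeholder types stand in for the real ones:

```rust
use async_trait::async_trait;

// Placeholders standing in for the real checkpoint data and DB connection types.
struct CheckpointData;
struct DbConnection;

/// Each pipeline describes how it turns a checkpoint into rows, how it
/// commits those rows to the DB, and how its batching/concurrency is tuned.
#[async_trait]
trait Handler {
    /// Pipeline name, used for metrics and the watermarks table.
    const NAME: &'static str;

    /// Maximum number of rows to write in a single DB statement.
    const CHUNK_SIZE: usize = 1000;

    /// Number of concurrent committers for this pipeline.
    const CONCURRENCY: usize = 1;

    /// The row type this pipeline produces.
    type Row: Send + Sync + 'static;

    /// Turn one checkpoint's worth of data into rows (pure computation).
    fn process(checkpoint: &CheckpointData) -> anyhow::Result<Vec<Self::Row>>;

    /// Write a batch of rows out, returning how many rows were written.
    async fn commit(rows: &[Self::Row], conn: &mut DbConnection) -> anyhow::Result<usize>;
}
```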

Schema

  • Removes redundancies kept around for backwards compatibility.
  • Makes naming conventions consistent across the codebase.
  • Applies all the schema changes we had been wanting to make.
  • Avoids redundant writes: the object contents do not need to be written to all of objects, objects_snapshot, objects_history and full_objects_history.
  • Tries as much as possible to separate tables that are overwritten from tables that are appended to. In particular, the objects table is split into objects (live object IDs, versions and owners), objects_type (append-only) and coins (only changes when coins are involved), to avoid marking more rows dead than necessary (see the sketch below).
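
A hypothetical sketch of that split, expressed as Diesel `table!` declarations; the column names and types here are illustrative only, not the actual schema:

```rust
// One row per live object: overwritten as versions and owners change.
diesel::table! {
    objects (object_id) {
        object_id -> Bytea,
        object_version -> Int8,
        owner_kind -> Int2,
        owner_id -> Nullable<Bytea>,
    }
}

// Append-only: an object's type never changes after creation.
diesel::table! {
    objects_type (object_id) {
        object_id -> Bytea,
        package -> Bytea,
        module -> Text,
        name -> Text,
    }
}

// Only touched when coins are involved, to limit the rows marked dead.
diesel::table! {
    coins (object_id) {
        object_id -> Bytea,
        coin_type -> Text,
        coin_balance -> Int8,
        owner_id -> Bytea,
    }
}
```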

To be done in the near term

Experiments that this POC now unlocks:

  • Setting up dashboards to visualise all the data coming from the indexer.
  • Running the indexer in production and tuning its parameters, to see how it compares with the existing pipeline in the best case.
  • Investigating whether @lxfind's finding about parallelism on writes to the DB holds up in this new architecture (in the old architecture, 5 parallel writers yielded the best performance; does that still hold true here, given that there is currently only one writer per table?).
  • COPY ... FROM -- can we use this instead of INSERT to speed up pipelines that write to "append-only" tables?
  • Splitting out key-value tables into their own DB.

To be done in the medium term

If this seems like a promising architecture, there is still much that needs to be done to make it a full indexer:

  • Pruning and watermarking
  • Starting from formal snapshot
  • Compatibility checks and running migrations
  • Integration into our testing setup
  • Integration into our existing readers
  • Completing the schema
  • Handing over between reading checkpoints from the remote store and from a fullnode, based on how close the indexer is to the tip of the network.

Test plan

Lots of experiments to do (and some new unit tests)


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:


lxfind (Contributor) left a comment:

Also curious: where do you think is the best place to instrument and log TPS in this implementation?

amnn (Member, Author) commented Oct 16, 2024

> Also curious: where do you think is the best place to instrument and log TPS in this implementation?

My thinking is that you should already be able to extract rates of transaction ingestion/processing/commit from the existing metrics:

  • The ingestion service tracks the number of transactions it sees coming in.
  • Each pipeline tracks the number of rows it produces from processing, and how many have been committed.

So for pipelines whose progress is measured in transactions, this should come out naturally from looking at the commit rate for their rows. For pipelines that are not inherently transaction-based, it makes slightly less sense to track transactions, but we can still do it, and I would do it by instrumenting the watermarking flow (i.e. each pipeline includes a gauge for its checkpoint and transaction high watermark). I guess this is why you were interested in the watermarking TODO?
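
As a rough sketch of what that instrumentation could boil down to with the prometheus crate (the metric and label names here are illustrative, not necessarily the ones the indexer uses):

```rust
use prometheus::{IntCounterVec, IntGaugeVec, Opts, Registry};

/// Per-pipeline counters and watermark gauges; TPS-style rates can then be
/// derived on the dashboard side from the counter's rate of change.
struct PipelineMetrics {
    committed_rows: IntCounterVec,
    watermark_checkpoint: IntGaugeVec,
    watermark_transaction: IntGaugeVec,
}

impl PipelineMetrics {
    fn new(registry: &Registry) -> prometheus::Result<Self> {
        let committed_rows = IntCounterVec::new(
            Opts::new("indexer_committed_rows", "Rows committed, by pipeline"),
            &["pipeline"],
        )?;
        let watermark_checkpoint = IntGaugeVec::new(
            Opts::new("indexer_watermark_checkpoint", "Checkpoint high watermark"),
            &["pipeline"],
        )?;
        let watermark_transaction = IntGaugeVec::new(
            Opts::new("indexer_watermark_transaction", "Transaction high watermark"),
            &["pipeline"],
        )?;

        registry.register(Box::new(committed_rows.clone()))?;
        registry.register(Box::new(watermark_checkpoint.clone()))?;
        registry.register(Box::new(watermark_transaction.clone()))?;

        Ok(Self { committed_rows, watermark_checkpoint, watermark_transaction })
    }
}
```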

crates/sui-indexer-alt/src/handlers/mod.rs (resolved)

let indexed = entry.get_mut();
let values = &mut indexed.values;
if batch_values.len() + values.len() > H::CHUNK_SIZE {
Contributor commented:

Today the indexer also splits commit batches in some special cases:

  • At epoch boundaries, so that each batch stays within one epoch. One reason for that is to avoid moving data among partitions -- extra complexity that partitioning introduced; as we discussed, I will experiment and see if we can get rid of that. Another potential reason might be that the epoch-end state is needed for some governance API -- do you know if we still have a dependency left there?
  • The intentional lag for objects_snapshot -- is that handled?

amnn (Member, Author) replied:

Let's leave objects_snapshot aside, because the way I would solve that is by adding an explicit buffer task on top of an existing objects pipeline. Let's also set partitioning aside, because all the tables this pipeline implementation caters to are either KV tables (which won't be in Postgres in the long run) or tables that aren't currently dynamically partitioned. But tell me more about the requirement related to the Governance API: I don't have enough context to answer that, and it may be an interesting exercise to see how it fits into this architecture.

Contributor replied:

Okay, we can leave partitioning and objects_snapshot aside for now. @emmazzz probably knows more about the Governance API.

crates/sui-indexer-alt/src/handlers/mod.rs (resolved)
wlmyng (Contributor) left a comment:

first pass

crates/sui-indexer-alt/src/lib.rs (resolved)
Comment on lines +177 to +181
// If an override has been provided, start ingestion from there, otherwise start ingestion
// from just after the lowest committer watermark across all enabled pipelines.
let first_checkpoint = self
.first_checkpoint
.unwrap_or(self.first_checkpoint_from_watermark);
Contributor commented:

Do we actually do this right now? We do set the watermark when adding a new pipeline, but Indexer's first_checkpoint_from_watermark doesn't seem to be updated based on the min committer watermark across all pipelines.

amnn (Member, Author) replied:

> Indexer's first_checkpoint_from_watermark doesn't seem to be updated based on the min committer watermark across all pipelines

What about L146-149 above?

crates/sui-indexer-alt/src/models/watermarks.rs (outdated, resolved)
gegaowp (Contributor) left a comment:

Looks good to me to merge as the first PR, with the planned changes as follow-ups.

## Description

Adding tests for the ingestion service. In particular, how it handles
various error scenarios, back-pressure, buffering, etc.

## Test plan

```
sui$ cargo nextest run -p sui-indexer-alt
```
## Description

Assume that whatever store the checkpoint data is being fetched from is
correct. If it was not possible to get a full response back from the
store, or that response could not be successfully deserialized, back off
and try again.

## Test plan

New unit test:

```
sui$ cargo nextest run -p sui-indexer-alt -- retry_on_deserialization_error
```
## Description

Add a label to the transient retries counter to mark the reason why
we're retrying, in case there are trends in particular retries.
## Description

Add a helper function to log retrying a fetched checkpoint. Updates a
counter, and traces the error and reason.
## Description

Recover from failures to send a request to the remote store in the first
place (Initially, I mistakenly thought that `send` would only fail if
the request was malformed, and that we would otherwise get a Response of
some kind, but this is not true).

## Test plan

New unit test:

```
sui$ cargo nextest run -p sui-indexer-alt -- retry_on_request_error
```
## Description

If ingestion is talking to a server that repeatedly produces transient
errors, it can get stuck. This change gives us a way to tell the
ingestion service, from the outside, to give up.

## Test plan

New unit tests:

```
sui$ cargo nextest run -p sui-indexer-alt -- fail_on_cancel
```
## Description

Initial set-up integrating a DB connection pool into the indexer.

## Test plan

Run the indexer and check that the stats from the DB pool are propagated
to Prometheus.
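
As a rough illustration of the set-up (the real code may use an async pool; the pool size here is illustrative), a Diesel r2d2 pool for Postgres might be constructed like this:

```rust
use diesel::pg::PgConnection;
use diesel::r2d2::{ConnectionManager, Pool};

/// Build a Postgres connection pool. Minimal sketch under assumed settings,
/// not the indexer's actual configuration.
fn db_pool(database_url: &str) -> Pool<ConnectionManager<PgConnection>> {
    let manager = ConnectionManager::<PgConnection>::new(database_url);
    Pool::builder()
        .max_size(16) // illustrative; would be tuned per deployment
        .build(manager)
        .expect("failed to build DB connection pool")
}
```

Pool statistics (e.g. `pool.state()`, which reports total and idle connections) can then be exported as metrics, which is what the test plan above checks.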
## Description

Handle the case where one part of the indexer panics and we need to
cleanly shutdown.

## Test plan

Introduced a panic in the dummy ingester and made sure everything winds
down correctly.
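
A minimal sketch of this kind of shutdown pattern, using `tokio::task::JoinSet` and `tokio_util::sync::CancellationToken`; the task bodies are placeholders, not the indexer's actual tasks:

```rust
use tokio::task::JoinSet;
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() {
    let cancel = CancellationToken::new();
    let mut tasks = JoinSet::new();

    // Placeholder workers: real tasks do their work until cancelled.
    for name in ["ingestion", "handler", "committer"] {
        let cancel = cancel.clone();
        tasks.spawn(async move {
            cancel.cancelled().await;
            println!("{name} wound down cleanly");
        });
    }

    // A task that simulates a panicking pipeline.
    tasks.spawn(async { panic!("simulated pipeline failure") });

    // If any task panics (or otherwise fails to join), cancel everything else
    // so the whole indexer winds down instead of hanging.
    while let Some(result) = tasks.join_next().await {
        if let Err(e) = result {
            if e.is_panic() {
                eprintln!("a task panicked; cancelling remaining tasks");
            }
            cancel.cancel();
        }
    }
}
```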
## Description

Introduce a framework for writing indexing pipelines and use it to add a
pipeline for checkpoints.

## Test plan

Tested the pipeline locally.
## Description
A pipeline to fill what was previously `full_objects_history`.

## Test plan
Pipeline was run locally.
## Description

Port over an idealised KV transactions table:

- Removes the following fields:
  - `transaction_digest`
  - `transaction_kind`
  - `success_command_count`
- Standardise naming for `tx_sequence_number` and `cp_sequence_number`.
- Standardise representation of arrays of BCS encoded values (use a
  BCS-encoded array instead).

So that, in the end, the schema matches the one we would eventually
store in the KV store.

This PR also re-implements balance change calculation (rather than
relying on the implementation in `sui-json-rpc`).

## Test plan

Pipeline run locally.
## Description

Introduce an `Indexer` type to reduce the boilerplate of introducing a
new pipeline, and gracefully shutting everything down.

This also prepares the codebase for watermarks support: The initial
checkpoint to start ingestion at can be calculated by taking the MIN
checkpoint in the watermark table across all pipelines.

## Test plan

Unit tests:

```
sui$ cargo nextest run -p sui-indexer-alt
```

And run the indexer locally.
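
For illustration, the watermark lookup might boil down to a Diesel query along these lines; the table and column names here are assumptions, not necessarily the real schema:

```rust
use diesel::dsl::min;
use diesel::prelude::*;

// Hypothetical watermarks schema; the real column names may differ.
diesel::table! {
    watermarks (pipeline) {
        pipeline -> Text,
        checkpoint_hi_inclusive -> Int8,
    }
}

/// First checkpoint to ingest: just after the lowest committed watermark
/// across all enabled pipelines, or 0 if no pipeline has written one yet.
fn first_checkpoint_from_watermark(conn: &mut PgConnection) -> QueryResult<u64> {
    let lowest: Option<i64> = watermarks::table
        .select(min(watermarks::checkpoint_hi_inclusive))
        .first(conn)?;

    Ok(lowest.map_or(0, |cp| cp as u64 + 1))
}
```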
## Description

Introduce the watermarks table, and update the committer to update it,
taking into account out-of-order writes to the underlying tables.

This change also allows the indexer to pick up where it left off by
consulting the watermarks table for the high watermarks it had
previously written.

It also introduces the ability to configure the range of checkpoints
being indexed (setting an upper and lower bound).

The metrics introduced by this change can be used to track per-pipeline
checkpoint rate, epoch rate and transaction rate.

Some TODOs have also been added for various tests and experiments that
should be run based on this work.

## Test plan

A lot of manual testing. Some TODOs have been left for more extensive
testing once there is more of a test set-up (depends on tempdb and
auto-migration work).
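
For illustration, the out-of-order bookkeeping could look something like the following sketch, with checkpoints as bare `u64`s; it is not the actual committer code:

```rust
use std::collections::BTreeSet;

/// Track a contiguous high watermark while checkpoints are committed out of
/// order: completed checkpoints are buffered in a set, and the watermark only
/// advances across a gap-free prefix.
struct Watermark {
    /// Exclusive: everything below this has been committed.
    next: u64,
    /// Checkpoints committed ahead of the contiguous prefix.
    pending: BTreeSet<u64>,
}

impl Watermark {
    fn new(start: u64) -> Self {
        Self { next: start, pending: BTreeSet::new() }
    }

    /// Record a committed checkpoint, returning the new (exclusive) watermark
    /// if it advanced -- the point at which the watermarks table is updated.
    fn commit(&mut self, checkpoint: u64) -> Option<u64> {
        self.pending.insert(checkpoint);
        let before = self.next;
        while self.pending.remove(&self.next) {
            self.next += 1;
        }
        (self.next != before).then_some(self.next)
    }
}

fn main() {
    let mut wm = Watermark::new(0);
    assert_eq!(wm.commit(1), None);    // gap at 0, no progress yet
    assert_eq!(wm.commit(0), Some(2)); // 0 and 1 are now contiguous
    assert_eq!(wm.commit(2), Some(3));
}
```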
## Description

This makes it so that calls to `diesel setup` or `diesel migration run`
will not forget to add the copyright notice to the schema file.

## Test plan

Run the following:

```
sui-indexer-alt$ diesel migration run --database-url="postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" --migration-dir migrations
```

Notice that the schema doesn't change (the copyright notice is not
removed).
## Description

Add the sequence number for the checkpoint that the pipeline processed
to the tracing message, so it's easier to follow along.

## Test plan

Run the indexer locally.
## Description
Previously, the service would just hang after having wound down all
pipelines during a graceful shutdown. This is because the graceful
shutdown process was left waiting for an explicit interrupt signal to
unblock the whole shutdown process.

This change relaxes that constraint: If the service is already shutting
down, then there is no need to wait for the explicit Ctrl-C. This makes
it feasible to run the indexer for specific ranges of checkpoints.

Note that we don't want to call the global cancellation token once
ingestion is done, because that might stop a downstream task before it
has acted on the downloaded checkpoints. Instead, the ingestion service
communicates completion by closing its channel, which propagates through
the handlers and committers.

## Test plan

```
RUST_LOG="info,sui_indexer_alt=debug" cargo run -p sui-indexer-alt --release --  \
  --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
  --remote-store-url https://checkpoints.mainnet.sui.io                          \
  --last-checkpoint 1000
```
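
A minimal sketch of that completion signal, assuming plain `tokio` mpsc channels and `u64`s standing in for checkpoints: the sender side simply drops its handle when ingestion is done, and each downstream stage drains what is left before exiting.

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u64>(16);

    let producer = tokio::spawn(async move {
        for cp in 0..1000u64 {
            if tx.send(cp).await.is_err() {
                break; // receiver has gone away
            }
        }
        // `tx` is dropped here, closing the channel.
    });

    let consumer = tokio::spawn(async move {
        // `recv` returns None only once the channel is closed *and* drained,
        // so no buffered checkpoints are lost during shutdown.
        while let Some(cp) = rx.recv().await {
            let _ = cp; // handle / commit the checkpoint
        }
        println!("channel closed, committer winding down");
    });

    let _ = tokio::join!(producer, consumer);
}
```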
## Description

This makes it easier to perform selective backfills.

## Test plan

```
sui$ RUST_LOG="info" cargo run -p sui-indexer-alt --release --                   \
  --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
  --remote-store-url https://checkpoints.mainnet.sui.io                          \
  --last-checkpoint 1000                                                         \
  --pipeline kv_transactions                                                     \
  --pipeline tx_affected_objects
```

Note that for now, the system does clip off the end of the checkpoint
stream, rather than waiting for all workers to finish, and that still
needs to be fixed.
## Description

When running the indexer on a finite range of checkpoints, make sure
committers' buffers are completely empty before shutting down their
tasks; otherwise we may not fully write out all the intended rows for
the range of checkpoints provided (there may be some data left at the
bottom of the barrel).

## Test plan

Ran the following:

```
cargo run -p sui-indexer-alt --release -- \
  --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
  --remote-store-url https://checkpoints.mainnet.sui.io \
  --last-checkpoint 2000
```

Corroborated that the data in the DB at the end of the run:

- Stops at the expected checkpoint (not before or after)
- Matches counts of rows in the production mainnet DB for the equivalent
  tables at the same checkpoints.

This can/should be made into an automated test, but that requires tempdb
and migrations to be implemented (a comment has been added to this
effect).
## Description

Add a flag to skip writing to the watermark table. This is useful when
an indexer is running in parallel with another indexer, and it is not
productive for it to write to the watermarks table because it will
always be behind.

## Test plan

Tested locally.
amnn merged commit 27595b4 into main on Oct 18, 2024, with 50 checks passed.
amnn deleted the amnn/idx-poc branch on Oct 18, 2024 at 11:09.