indexer-alt: Alternative indexer architecture POC #19868
Conversation
crates/sui-indexer/src/backfill/backfill_instances/ingestion_backfills/digest_task.rs
Also curious: where do you think is the best place to instrument and log TPS in this implementation?

My thinking is that you should already be able to extract rates of transaction ingestion/processing/commit from the existing metrics. For pipelines whose progress is measured in transactions, it should come out naturally from looking at the commit rate for their rows. For pipelines that are not inherently transaction-based, tracking transactions makes slightly less sense, but we can still do it, and I would do it by instrumenting the watermarking flow (i.e. each pipeline includes a gauge for its checkpoint and transaction high watermark). I guess this is why you were interested in the watermarking TODO?
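As a rough illustration of that suggestion, here is a minimal sketch of per-pipeline high-watermark gauges. All names here are hypothetical, and plain atomics stand in for the prometheus gauges a real implementation would use:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-pipeline watermark gauges (illustrative names, not
/// the actual metrics in this PR). In the real indexer these would be
/// prometheus `IntGauge`s updated from the watermarking flow.
struct PipelineWatermarks {
    checkpoint_hi: AtomicU64,
    transaction_hi: AtomicU64,
}

impl PipelineWatermarks {
    fn new() -> Self {
        Self {
            checkpoint_hi: AtomicU64::new(0),
            transaction_hi: AtomicU64::new(0),
        }
    }

    /// Record the high watermarks after a run of checkpoints has been
    /// committed. `fetch_max` keeps the gauges monotonic even if commits
    /// are reported out of order.
    fn record(&self, checkpoint: u64, tx_hi: u64) {
        self.checkpoint_hi.fetch_max(checkpoint, Ordering::Relaxed);
        self.transaction_hi.fetch_max(tx_hi, Ordering::Relaxed);
    }

    /// TPS then falls out of sampling the transaction high watermark at
    /// two points in time.
    fn tps(tx_before: u64, tx_after: u64, secs: f64) -> f64 {
        tx_after.saturating_sub(tx_before) as f64 / secs
    }
}
```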
```rust
let indexed = entry.get_mut();
let values = &mut indexed.values;
if batch_values.len() + values.len() > H::CHUNK_SIZE {
```
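The quoted condition is where the committer decides to cut a write chunk. A self-contained sketch of that rule, with `CHUNK_SIZE` standing in for the handler's `H::CHUNK_SIZE` (illustrative names throughout):

```rust
/// Minimal sketch of the batching rule in the snippet above: values
/// accumulate per table, and a chunk is cut as soon as adding the
/// incoming batch would overflow the chunk size. `CHUNK_SIZE` is a
/// stand-in for `H::CHUNK_SIZE` on the handler.
const CHUNK_SIZE: usize = 4;

fn add_batch(values: &mut Vec<u64>, batch_values: Vec<u64>, chunks: &mut Vec<Vec<u64>>) {
    if batch_values.len() + values.len() > CHUNK_SIZE {
        // Cut a chunk before it grows past CHUNK_SIZE.
        chunks.push(std::mem::take(values));
    }
    values.extend(batch_values);
}
```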
Today the indexer also changes how it batches commit data in some special cases:
- At epoch boundaries, batches are split so that each batch falls within a single epoch. One reason for that is to avoid moving data among partitions -- another complexity that partitioning introduced; as we discussed, I will experiment and see if we can get rid of that. Another potential reason might be that the epoch-end state is needed for some governance API -- do you know if we still have a dependency left there?
- There is an intentional lag for objects_snapshot -- is that handled?
Let's leave objects_snapshot aside, because the way I would solve that is by adding an explicit buffer task on an existing objects pipeline. Let's also set partitioning aside, because all the tables this pipeline implementation caters to are either KV tables, which won't be in postgres in the long run, or tables that aren't currently dynamically partitioned. But tell me more about the requirement related to the Governance API: I don't have enough context to answer that, and it may be an interesting exercise to see how it fits into this architecture.
Okay, we can leave partitioning and objects_snapshot aside for now. @emmazzz probably knows more about the Governance API.
first pass
```rust
// If an override has been provided, start ingestion from there, otherwise start ingestion
// from just after the lowest committer watermark across all enabled pipelines.
let first_checkpoint = self
    .first_checkpoint
    .unwrap_or(self.first_checkpoint_from_watermark);
```
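For context, the fallback value could be derived from pipeline watermarks roughly like the hypothetical helper below (names assumed, not the actual code): start just after the MIN committer watermark across all enabled pipelines, or from genesis when there are no watermarks yet.

```rust
/// Hypothetical sketch of deriving the ingestion start point. The real
/// indexer reads committer watermarks from the watermarks table; here
/// they are just a slice of checkpoint sequence numbers.
fn first_checkpoint(override_cp: Option<u64>, committer_watermarks: &[u64]) -> u64 {
    let from_watermark = committer_watermarks
        .iter()
        .copied()
        .min()
        .map(|wm| wm + 1) // start just after the lowest watermark
        .unwrap_or(0); // no watermarks yet: start from genesis
    override_cp.unwrap_or(from_watermark)
}
```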
Do we actually do this right now? We do set the watermark when adding a new pipeline, but the Indexer's first_checkpoint_from_watermark doesn't seem to be updated to the min committer watermark across all pipelines.
> Indexer's first_checkpoint_from_watermark doesn't seem to be updated per the min committer watermark across all pipelines

What about L146-149 above?
crates/sui-indexer/src/backfill/backfill_instances/ingestion_backfills/digest_task.rs
Looks good to me to merge as the first PR, with the planned changes as follow-ups.
## Description Adding tests for the ingestion service. In particular, how it handles various error scenarios, back-pressure, buffering, etc. ## Test plan ``` sui$ cargo nextest run -p sui-indexer-alt ```
## Description Assume that whatever store checkpoint data is being fetched from is correct. If it was not possible to get a full response back from the store, or that response could not be successfully deserialized, back-off and try again. ## Test plan New unit test: ``` sui$ cargo nextest run -p sui-indexer-alt -- retry_on_deserialization_error ```
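The retry policy described here might look roughly like the following sketch; the function name, error type, and back-off schedule are assumptions for illustration, not the PR's actual API:

```rust
use std::time::Duration;

/// Illustrative sketch of the "treat the store as correct, back off and
/// retry" policy: any failure to fetch or deserialize a checkpoint is
/// treated as transient and retried with exponential back-off.
fn fetch_with_retry<T>(
    mut fetch: impl FnMut() -> Result<T, String>,
    max_attempts: u32,
) -> Result<T, String> {
    let mut delay = Duration::from_millis(10);
    let mut attempt = 0;
    loop {
        match fetch() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e);
                }
                // A real implementation would also bump a transient-retry
                // counter labelled with the reason for the failure.
                std::thread::sleep(delay);
                delay = delay * 2; // exponential back-off
            }
        }
    }
}
```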
## Description Add a label to the transient retries counter to mark the reason why we're retrying, in case there are trends in particular retries.
## Description Add a helper function to log retrying a fetched checkpoint. Updates a counter, and traces the error and reason.
## Description Recover from failures to send a request to the remote store in the first place (Initially, I mistakenly thought that `send` would only fail if the request was malformed, and that we would otherwise get a Response of some kind, but this is not true). ## Test plan New unit test: ``` sui$ cargo nextest run -p sui-indexer-alt -- retry_on_reuest_error ```
## Description If ingestion is stuck talking to a server that is repeatedly producing transient errors, it can get stuck. This change gives us a way to tell the ingestion service to give up from outside. ## Test plan New unit tests: ``` sui$ cargo nextest run -p sui-indexer-alt -- fail_on_cancel ```
## Description Initial set-up integrating a DB connection pool into the indexer. ## Test plan Run the indexer and check the stats from the DB pool are propagated to prometheus.
## Description Handle the case where one part of the indexer panics and we need to cleanly shutdown. ## Test plan Introduced a panic in the dummy ingester and make sure everything winds down correctly.
## Description Introduce a framework for writing indexing pipelines and use it to add a pipeline for checkpoints. ## Test plan Tested the pipeline locally.
## Description A pipeline to fill what was previously `full_objects_history`. ## Test plan Pipeline was run locally.
## Description Port over an idealised KV transactions table: - Removes the following fields: - `transaction_digest` - `transaction_kind` - `success_command_count` - Standardise naming for `tx_sequence_number` and `cp_sequence_number`. - Standardise representation of arrays of BCS encoded values (use a BCS-encoded array instead). So that in the end, the schema matches the schema we would eventually store in the KV store. This PR also re-implements balance change calculation (rather than relying on the implementation in `sui-json-rpc`). ## Test plan Pipeline run locally.
## Description Introduce an `Indexer` type to reduce the boilerplate of introducing a new pipeline, and gracefully shutting everything down. This also prepares the codebase for watermarks support: The initial checkpoint to start ingestion at can be calculated by taking the MIN checkpoint in the watermark table across all pipelines. ## Test plan Unit tests: ``` sui$ cargo nextest run -p sui-indexer-alt ``` And run the indexer locally.
## Description Introduce the watermarks table, and update the committer to update it, taking into account out-of-order writes to the underlying tables. This change also allows the indexer to pick up where it left off by consulting the watermarks table for the high watermarks it had previously written. It also introduces the ability to configure the range of checkpoints being indexed (set an upper and lowerbound). The metrics introduced by this change can be used to track per-pipeline checkpoint rate, epoch rate and transaction rate. Some TODOs have also been added for various tests and experiments that should be run based on this work. ## Test plan A lot of manual testing. Some TODOs have been left for more extensive testing once there is more of a test set-up (depends on tempdb and auto-migration work).
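One way to picture the out-of-order handling described above: buffer committed checkpoints and only advance the watermark across a contiguous prefix. A hypothetical sketch, not the PR's actual types:

```rust
use std::collections::BTreeSet;

/// Sketch of advancing a watermark in the face of out-of-order writes:
/// committed checkpoints are buffered in a set, and the high watermark
/// only moves across a contiguous prefix, so a gap is never skipped.
struct Watermark {
    hi: Option<u64>,        // highest contiguously-committed checkpoint
    pending: BTreeSet<u64>, // committed out of order, above a gap
}

impl Watermark {
    fn new() -> Self {
        Watermark { hi: None, pending: BTreeSet::new() }
    }

    /// Record that `cp` was committed; returns the new high watermark.
    fn commit(&mut self, cp: u64) -> Option<u64> {
        self.pending.insert(cp);
        let mut next = self.hi.map_or(0, |h| h + 1);
        while self.pending.remove(&next) {
            self.hi = Some(next);
            next += 1;
        }
        self.hi
    }
}
```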
## Description This makes it so that calls to `diesel setup` or `diesel migration run` no longer drop the copyright notice from the schema file. ## Test plan Run the following: ``` sui-indexer-alt$ diesel migration run --database-url="postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" --migration-dir migrations ``` Notice that the schema doesn't change (the copyright notice is not removed)
## Description Add the sequence number for the checkpoint that the pipeline processed to the tracing message, so it's easier to follow along. ## Test plan Run the indexer locally.
## Description Previously, the service would just hang after having wound down all pipelines during a graceful shutdown. This is because the graceful shutdown process was left waiting for an explicit interrupt signal to unblock the whole shutdown process. This change relaxes that constraint: If the service is already shutting down, then there is no need to wait for the explicit Ctrl-C. This makes it feasible to run the indexer for specific ranges of checkpoints. Note that we don't want to call the global cancellation token once ingestion is done, because that might stop a downstream task before it has acted on the downloaded checkpoints. Instead, the ingestion service communicates completion by closing its channel, which propagates through the handlers and committers. ## Test plan ``` RUST_LOG="info,sui_indexer_alt=debug" cargo run -p sui-indexer-alt --release -- \ --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \ --remote-store-url https://checkpoints.mainnet.sui.io \ --last-checkpoint 1000 ```
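The end-of-stream idea described here can be sketched with a plain std channel standing in for the tokio channels the indexer actually uses: dropping the sender signals completion, and the downstream stage drains everything before exiting. Names are illustrative:

```rust
use std::sync::mpsc;
use std::thread;

/// Sketch of shutdown-by-channel-close: instead of cancelling downstream
/// tasks, the ingestion side drops its sender when it is done, and each
/// downstream stage exits once its receiver has drained.
fn run_pipeline(checkpoints: Vec<u64>) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();

    let committer = thread::spawn(move || {
        let mut committed = Vec::new();
        // `recv` returns Err only after the sender is dropped AND the
        // channel is empty, so no downloaded checkpoint is lost.
        while let Ok(cp) = rx.recv() {
            committed.push(cp);
        }
        committed
    });

    for cp in checkpoints {
        tx.send(cp).unwrap();
    }
    drop(tx); // "ingestion done" -- propagates as a clean end-of-stream

    committer.join().unwrap()
}
```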
## Description This makes it easier to perform selective backfills. ## Test plan ``` sui$ RUST_LOG="info" cargo run -p sui-indexer-alt --release -- \ --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \ --remote-store-url https://checkpoints.mainnet.sui.io \ --last-checkpoint 1000 \ --pipeline kv_transactions \ --pipeline tx_affected_objects ``` Note that for now, the system does clip off the end of the checkpoint stream, rather than waiting for all workers to finish, and that still needs to be fixed.
## Description When running the indexer on a finite range of checkpoints, make sure committers' buffers are completely empty before shutting down their tasks; otherwise we may not fully write out all the intended rows for the range of checkpoints provided (there may be some data left at the bottom of the barrel). ## Test plan Ran the following: ``` cargo run -p sui-indexer-alt --release -- \ --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \ --remote-store-url https://checkpoints.mainnet.sui.io \ --last-checkpoint 2000 ``` Corroborated that the data that results in the DB at the end: - Stops at the expected checkpoint (not before or after) - Matches counts of rows in the production mainnet DB for the equivalent tables at the same checkpoints. This can/should be made into an automated test, but that requires tempdb and migrations to be implemented (a comment has been added to this effect).
## Description Add a flag to skip writing to the watermark table. This would be useful when an indexer is running in parallel with another indexer and it's just not even productive to try writing to the watermarks table, because it will always be behind. ## Test plan Tested locally.
## Description
A ground-up rewrite/proof-of-concept of an alternative indexer pipeline, with the following major changes implemented or planned:

### Ingestion
Uses `futures::stream` and tokio channels to handle communication and parallelism between ingestion, processing and commit workers. This simplifies and decouples a lot of the implementation. On my local machine, the new ingestion framework very slightly out-performs the existing one: 160 checkpoints per second at checkpoint 64M, and 60 to 70 cps at checkpoint 9M, compared to 130 and 50 cps respectively on the existing framework. The new framework is limited by my local bandwidth.
### Writer

### Schema
`objects`, `objects_snapshot`, `objects_history` and `full_objects_history`: split the `objects` table into separate `objects` (live object IDs, versions and owners), `objects_type` (append-only) and `coins` (only changes when coins are involved) tables, to avoid marking more rows dead than we need to.

### To be done in the near term
Experiments that this POC now unlocks:
- `COPY ... FROM` -- can we use this instead of `INSERT` to speed up pipelines that write to "append-only" tables?

### To be done in the medium term
If this seems like a promising architecture, there is still much that needs to be done to make it a full indexer:
## Test plan
Lots of experiments to do (and some new unit tests).