Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

indexer-alt: event indices #19926

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

indexer-alt: event indices #19926

wants to merge 3 commits into from

Conversation

amnn
Copy link
Member

@amnn amnn commented Oct 19, 2024

Description

This PR includes three changes (separated across three commits). The first one fixes some edge cases in the committer related to handling checkpoint streams that are either empty themselves, or produce no data to write, the second introduces the event index tables roughly as they are in the current schema, and the last commit suggests a simplification.

The main changes to the schema are as follows:

  • When filters correspond to different levels of nesting in a cascading type/module filter, they are all held in a single table, with all the cascading filter's fields, and separate indices are added to this table to deal with each level of nesting.
    • We need to confirm that the reader side can handle this change as well.
  • They do not include an event sequence number: They will be used to fetch the transaction that emitted the event in question, and to fetch the relevant events, we will need to scan through that transactions events (which will be bounded by the protocol limit on number of events emitted).
  • The struct instantiation table stores the instantiation information (name and type params) in two separate fields, with the type params stored as a BCS encoded array of TypeTag. This is to keep footprint down as the textual representation of a type can be up to twice as large as the BCS representation.
  • Names have been standardised.

Test plan

Run the indexer twice on the first 1000 checkpoints and using only the event pipelines. This produces no information on the first round, but updates the watermark, and on the second iteration, it will iterate through no checkpoints before shutting down:

sui$ cargo run -p sui-indexer-alt --                                             \
  --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
  --remote-store-url https://checkpoints.mainnet.sui.io                          \
  --last-checkpoint 1000                                                         \
  --pipeline ev_emit_mod --pipeline ev_struct_inst

Then run the same pipelines over checkpoints 9,000,000 to 9,002,000, which do include events:

sui$ cargo run -p sui-indexer-alt --                                             \
  --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
  --remote-store-url https://checkpoints.mainnet.sui.io                          \
  --first-checkpoint 9000000 --last-checkpoint 9002000                           \
  --pipeline ev_emit_mod --pipeline ev_struct_inst

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

@amnn amnn self-assigned this Oct 19, 2024
Copy link

vercel bot commented Oct 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 19, 2024 5:22pm
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Oct 19, 2024 5:22pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Oct 19, 2024 5:22pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Oct 19, 2024 5:22pm

## Description

This change handles two edge cases related to out-of-order commits that
were uncovered through the work on event indices. In both cases the
committer could get stuck, because none of the conditions guarding
select arms were met, but the scenario was different in each case:

- In the first case, the committer could get stuck because there were no
  more pending rows to write, but there were still pre-committed
  checkpoints to handle, and the logic to update watermarks based on the
  pre-committed checkpoints was guarded by a testing that `pending_rows
  > 0`.
- In the second case, the committer could get stuck because the pipeline
  shutdown before any checkpoints came through, meaning it had no work
  to do.

The fix was to allow the main `poll.tick()` arm to run if the receiving
channel was closed, or there were pending precommits left. A short
circuit was also added to move empty batches directly into the precommit
list because we can treat them as already written out.

## Test plan

Ran the following pipeline twice: The first time, it should exit without
writing any data (except the watermark), and the second time it should
just exit because there is no data between its watermark and the given
last checkpoint:

```
sui$ cargo run -p sui-indexer-alt --                                             \
  --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
  --remote-store-url https://checkpoints.mainnet.sui.io                          \
  --last-checkpoint 1000 --pipeline ev_struct_inst
```
## Description

Adding pipelines to index all tables used to filter events. They differ
from the equivalent schemas in the existing indexer in the following
ways:

- They only mention the transaction sequence number, and not the event
  sequent number. To use these tables, we first filter down to the
  transaction containing the event, and then scan the events in that
  transaction.
- Struct instantiations are stored as a separate name field and then a
  BCS encoded type tag. This is to reduce their footprint (package IDs
  weight twice as much when stored as text compared to BCS), and because
  we only ever filter using an exact match, so we don't need to store
  the instantiation as text.

## Test plan

Ran the indexer locally and spot checked the events.
## Description

Instead of having a separate table for each component of the cascading
index, have a single table, and add multiple indices to it for each of
the cascading cases.

This should reduce the footprint and ingress to the DB, but mildly
increases the risk that the DB picks a bad query plan.

## Test plan

Run all the existing tests, and also run an experiment to confirm that
the DB can successfully plan queries against this kind of schema.

Our initial fear was that if we had multiple indices on a single table,
then the DB may pick the wrong index, and there is still a chance that
might happen if we add indices for disparate filters to the same table
(i.e. we combine the event emit module and event struct instantiation
fields into one table), but we can guarantee that the reader will only
issue one query to each of these merged tables, and it should entirely
overlap with one of its indices.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant