-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
indexer-alt: event indices #19926
Open
amnn
wants to merge
3
commits into
main
Choose a base branch
from
amnn/idx-alt-evs
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
indexer-alt: event indices #19926
+265
−20
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The latest updates on your projects. Learn more about Vercel for Git ↗︎
3 Skipped Deployments
|
## Description This change handles two edge cases related to out-of-order commits that were uncovered through the work on event indices. In both cases the committer could get stuck, because none of the conditions guarding select arms were met, but the scenario was different in each case: - In the first case, the committer could get stuck because there were no more pending rows to write, but there were still pre-committed checkpoints to handle, and the logic to update watermarks based on the pre-committed checkpoints was guarded by a testing that `pending_rows > 0`. - In the second case, the committer could get stuck because the pipeline shutdown before any checkpoints came through, meaning it had no work to do. The fix was to allow the main `poll.tick()` arm to run if the receiving channel was closed, or there were pending precommits left. A short circuit was also added to move empty batches directly into the precommit list because we can treat them as already written out. ## Test plan Ran the following pipeline twice: The first time, it should exit without writing any data (except the watermark), and the second time it should just exit because there is no data between its watermark and the given last checkpoint: ``` sui$ cargo run -p sui-indexer-alt -- \ --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \ --remote-store-url https://checkpoints.mainnet.sui.io \ --last-checkpoint 1000 --pipeline ev_struct_inst ```
## Description Adding pipelines to index all tables used to filter events. They differ from the equivalent schemas in the existing indexer in the following ways: - They only mention the transaction sequence number, and not the event sequent number. To use these tables, we first filter down to the transaction containing the event, and then scan the events in that transaction. - Struct instantiations are stored as a separate name field and then a BCS encoded type tag. This is to reduce their footprint (package IDs weight twice as much when stored as text compared to BCS), and because we only ever filter using an exact match, so we don't need to store the instantiation as text. ## Test plan Ran the indexer locally and spot checked the events.
amnn
force-pushed
the
amnn/idx-alt-evs
branch
from
October 19, 2024 16:37
59571f6
to
bb72581
Compare
## Description Instead of having a separate table for each component of the cascading index, have a single table, and add multiple indices to it for each of the cascading cases. This should reduce the footprint and ingress to the DB, but mildly increases the risk that the DB picks a bad query plan. ## Test plan Run all the existing tests, and also run an experiment to confirm that the DB can successfully plan queries against this kind of schema. Our initial fear was that if we had multiple indices on a single table, then the DB may pick the wrong index, and there is still a chance that might happen if we add indices for disparate filters to the same table (i.e. we combine the event emit module and event struct instantiation fields into one table), but we can guarantee that the reader will only issue one query to each of these merged tables, and it should entirely overlap with one of its indices.
amnn
force-pushed
the
amnn/idx-alt-evs
branch
from
October 19, 2024 17:20
3b81f17
to
e57832f
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR includes three changes (separated across three commits). The first one fixes some edge cases in the committer related to handling checkpoint streams that are either empty themselves, or produce no data to write, the second introduces the event index tables roughly as they are in the current schema, and the last commit suggests a simplification.
The main changes to the schema are as follows:
TypeTag
. This is to keep footprint down as the textual representation of a type can be up to twice as large as the BCS representation.Test plan
Run the indexer twice on the first 1000 checkpoints and using only the event pipelines. This produces no information on the first round, but updates the watermark, and on the second iteration, it will iterate through no checkpoints before shutting down:
Then run the same pipelines over checkpoints 9,000,000 to 9,002,000, which do include events:
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.