moose-code changed the title from "[Idea] Supporting historical sync source alternative to RPC" to "[Feature] Supporting historical sync source alternative to RPC" on May 16, 2024
Hi there!
I've been experimenting with an alternative data source (as opposed to the RPC) for the default friend-tech ponder indexer created in the `pnpm create ponder` flow.
It's an interesting case, as there are more than 6 million events to be indexed, and from what I can tell it will take more than 15 hours to sync using a paid RPC service. My goal was to populate the ponder_sync.db >100x faster than with an RPC, and then start the ponder service and let it index from this inserted data.
So far I've found some pretty interesting results:
1 - I am able to fetch all the required log, block and transaction data for the ponder_sync.db in this case (6m+ events, blocks and txs) in around 12 minutes from scratch.
2 - Writing to the sqlite db is currently the bottleneck, and takes on the order of 24 minutes for the fetched data to be written and persisted to the db.
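For rough context, here is what those figures imply, as a back-of-the-envelope sketch. It assumes the RPC sync takes exactly 15 hours (stated above only as a lower bound) and that the fetch and write phases run back to back rather than overlapping, so the ratios are indicative only.

```python
# Figures quoted in this issue, in minutes.
rpc_minutes = 15 * 60      # "more than 15 hours" via RPC, so a lower bound
fetch_minutes = 12         # fetching logs, blocks and transactions
write_minutes = 24         # writing them to sqlite

# Speedup if only the fetch is counted.
fetch_speedup = rpc_minutes / fetch_minutes

# Speedup if fetch and write are assumed to run one after the other.
end_to_end_speedup = rpc_minutes / (fetch_minutes + write_minutes)

print(f"fetch only: {fetch_speedup:.0f}x, fetch + write: {end_to_end_speedup:.0f}x")
```

Under these assumptions the sqlite write phase, not the data fetch, is what separates the current numbers from the >100x goal.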
The experimental approach essentially opens a connection to the ponder_sync.db and continually batch inserts required data into the blocks, logs, logFilterIntervals and transactions tables. It runs completely independently of the ponder core code and no ponder core code is modified. Following this alternative historical sync, running the main ponder process successfully starts indexing from this data.
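To illustrate the batch-insert pattern, here is a minimal sketch using Python's built-in `sqlite3` module. The `demo_logs` table, its columns, and the `bulk_insert_logs` helper are hypothetical stand-ins; this is not the actual ponder_sync.db schema, nor the code from my experiment.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_insert_logs(conn, rows, batch_size=10_000):
    """Insert rows in batches, one transaction per batch.

    Returns the number of rows submitted (duplicates are ignored by sqlite).
    """
    submitted = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT OR IGNORE INTO demo_logs (block_number, log_index, data) "
                "VALUES (?, ?, ?)",
                chunk,
            )
        submitted += len(chunk)
    return submitted


# Hypothetical, simplified table; the real sync tables have more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_logs ("
    "block_number INTEGER, log_index INTEGER, data TEXT, "
    "PRIMARY KEY (block_number, log_index))"
)

# Fake log rows standing in for fetched data.
rows = ((n // 3, n % 3, f"0x{n:04x}") for n in range(25_000))
submitted = bulk_insert_logs(conn, rows)
stored = conn.execute("SELECT COUNT(*) FROM demo_logs").fetchone()[0]
print(submitted, stored)
```

Grouping many rows into one transaction avoids paying a commit per row, which is usually the main cost when bulk loading sqlite.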
The alternative to the RPC that enables this speedup is hypersync (disclaimer: I am part of the team). It's a fast, flexible alternative to the RPC, catered to data-heavy use cases such as indexing.
A deeper integration of hypersync as an additional alternative to RPC in the historical sync service in ponder core might allow Ponder users to achieve much quicker (>100x) historical sync times. This opt-in alternative could be useful for Ponder users who value or require this performance for their specific use case. I completely understand that this increases code complexity and maintenance. I was wondering whether it might be worth it given the advantages it unlocks, and I'm curious to hear your thoughts and considerations.
Here is an example repo for you to try it out too: https://github.com/enviodev/friendtech-ponder-hypersync/tree/main
(Keep in mind it's currently bottlenecked by being a single-threaded process that is frequently blocked by sqlite inserts; this would change.)
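One possible way to remove that blocking, sketched here as an assumption rather than a description of the example repo, is to hand batches to a dedicated writer thread through a bounded queue so that fetching can continue while sqlite commits. The `demo_blocks` table and `fetch_batches` function are hypothetical placeholders for the real tables and the real network fetch.

```python
import queue
import sqlite3
import threading

SENTINEL = None  # signals the writer that no more batches will arrive


def writer(db_path, q, result):
    """Consume batches from the queue and insert each in its own transaction."""
    # The connection is created inside the thread that uses it.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS demo_blocks (number INTEGER PRIMARY KEY, hash TEXT)"
    )
    while (batch := q.get()) is not SENTINEL:
        with conn:  # one commit per batch
            conn.executemany("INSERT OR IGNORE INTO demo_blocks VALUES (?, ?)", batch)
    result["rows"] = conn.execute("SELECT COUNT(*) FROM demo_blocks").fetchone()[0]
    conn.close()


def fetch_batches(n_batches, batch_size):
    """Stand-in for the network fetch; yields lists of fake block rows."""
    for b in range(n_batches):
        start = b * batch_size
        yield [(i, f"0x{i:064x}") for i in range(start, start + batch_size)]


# A bounded queue applies backpressure if the writer falls behind the fetcher.
q = queue.Queue(maxsize=8)
result = {}
t = threading.Thread(target=writer, args=(":memory:", q, result))
t.start()

for batch in fetch_batches(n_batches=5, batch_size=1_000):
    q.put(batch)  # blocks only when the queue is full
q.put(SENTINEL)
t.join()

print(result["rows"])
```

With this shape the fetch loop only waits when the queue is full, so total time tends toward the slower of the two phases rather than their sum.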