Missing Blocks After RPC Failure, Resolved by Pruning DB #1008

Closed
iHiteshAgrawal opened this issue Jul 31, 2024 · 3 comments

Comments

@iHiteshAgrawal

iHiteshAgrawal commented Jul 31, 2024

We are experiencing an issue where a number of blocks are missing from our index after encountering RPC failures. We are using a load balancer for our RPC connections.

Summary:

  1. RPC calls fail at specific block heights (the RPC was down; a verification sketch follows below).
  2. The blocks at those heights are missing from the indexer after switching to a healthy RPC (without pruning the database).

Fix: Pruning the database and restarting the indexer successfully recovered the missing blocks.

PS: The load balancer doesn't seem to switch to another RPC URL after one fails at a particular request.

Related: #974 and #861
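
A minimal sketch of how one might confirm which heights the RPC could not serve, assuming Viem's createPublicClient and its exported BlockNotFoundError (the URL and block range are placeholders):

```ts
import { createPublicClient, http, BlockNotFoundError } from "viem";
import { mainnet } from "viem/chains";

// Placeholder endpoint; swap in the RPC behind the load balancer.
const client = createPublicClient({
  chain: mainnet,
  transport: http("https://rpc.example.com"),
});

// Walk a block range and collect heights the RPC cannot serve.
async function findUnservableBlocks(from: bigint, to: bigint) {
  const missing: bigint[] = [];
  for (let n = from; n <= to; n++) {
    try {
      await client.getBlock({ blockNumber: n });
    } catch (error) {
      if (error instanceof BlockNotFoundError) missing.push(n);
      else throw error; // surface transport errors, e.g. HTTP 500s
    }
  }
  return missing;
}
```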

@0xOlias
Collaborator

0xOlias commented Aug 5, 2024

Thanks for opening. You're correct that the loadBalance transport does not "skip" an inner transport if it starts returning errors. We're working on a new transport that combines the behavior of loadBalance and Viem's fallback transport to handle this scenario.
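
In the meantime, a possible workaround is to nest Viem's fallback transport inside each load-balanced slot, so a failing provider is skipped rather than retried indefinitely. A minimal sketch, assuming loadBalance is exported from @ponder/utils (the URLs are placeholders):

```ts
import { loadBalance } from "@ponder/utils";
import { fallback, http } from "viem";

// Each load-balanced slot is itself a fallback chain: if the primary
// in a slot starts erroring, Viem moves to that slot's backup instead
// of the slot going dark. URLs are placeholders.
export const transport = loadBalance([
  fallback([
    http("https://rpc-a-primary.example.com"),
    http("https://rpc-a-backup.example.com"),
  ]),
  fallback([
    http("https://rpc-b-primary.example.com"),
    http("https://rpc-b-backup.example.com"),
  ]),
]);
```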

Regarding the missing blocks issue, it would be helpful to understand how specifically the RPC was "down". If it was fully down (e.g., returning 500 status codes), it seems unlikely that Ponder would continue marking block ranges as cached. However, if it was serving incorrect data (e.g., an incorrect block.logsBloom or incomplete eth_getLogs responses), that could cause the issue you're seeing. Any additional info you have here would be helpful.
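
One way to probe for that failure mode (this is not Ponder's internal validation, just a hypothetical consistency check with a placeholder URL) is to compare a block's logsBloom against what eth_getLogs returns for the same height:

```ts
import { createPublicClient, http } from "viem";
import { mainnet } from "viem/chains";

// A zeroed 256-byte bloom filter (512 hex characters).
const EMPTY_BLOOM = "0x" + "0".repeat(512);

const client = createPublicClient({
  chain: mainnet,
  transport: http("https://rpc.example.com"), // placeholder
});

async function checkBlockConsistency(blockNumber: bigint) {
  const block = await client.getBlock({ blockNumber });
  const logs = await client.getLogs({ fromBlock: blockNumber, toBlock: blockNumber });
  const bloomEmpty = block.logsBloom === EMPTY_BLOOM;
  if (bloomEmpty && logs.length > 0) {
    // Definitely inconsistent: logs exist but the bloom says none do.
    console.warn(`block ${blockNumber}: empty logsBloom but ${logs.length} logs returned`);
  }
  if (!bloomEmpty && logs.length === 0) {
    // Suspicious but not conclusive: bloom filters admit false
    // positives, so a non-empty bloom with zero logs can be benign.
    console.warn(`block ${blockNumber}: non-empty logsBloom but no logs returned`);
  }
}
```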

Also, if you happen to still have the "corrupted" database around, it would be helpful to inspect it. Let me know if you do and perhaps you could DM me a connection string (or database file if using SQLite) on Telegram.

@iHiteshAgrawal
Author

Thanks for the insight on the load balancer behavior. We're looking forward to the new transport that addresses this.

To clarify the RPC issue, it was indeed completely down, returning 500 status codes. We also observed errors in the Ponder logs, specifically messages like "BlockNotFoundError: Block at hash .... could not be found". This suggests that Ponder was attempting to process blocks, couldn't access them, and then skipped those blocks after we switched RPCs.

Unfortunately, we no longer have access to the corrupted database. We performed the pruning operation as a way to quickly recover and resume normal operations.

@0xOlias
Collaborator

0xOlias commented Aug 7, 2024

Got it, thanks for the response. Without the corrupted database, there's not much we can do. However, we are working on an improvement to the internal realtime sync engine, and before we ship that we will improve testing for the "temporary 500" case to make sure we aren't marking invalid data as cached/valid.

Closing for now. If this happens again, please keep the corrupted database so we can inspect it, and re-open. Thanks again for reporting, every issue helps.

@0xOlias 0xOlias closed this as completed Aug 7, 2024