Skip to content

Decouple blocking content validation from main Overlay Service loop #342

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mrferris opened this issue Jun 12, 2022 · 3 comments · Fixed by #373
Closed

Decouple blocking content validation from main Overlay Service loop #342

mrferris opened this issue Jun 12, 2022 · 3 comments · Fixed by #373

Comments

@mrferris
Copy link
Collaborator

mrferris commented Jun 12, 2022

This thought relates to content validation code that was merged in #331. Let me know if I'm wrong, or if this is a duplicate/already discussed.

I think that we should remove any blocking async code in the main overlay_service.rs loop, which currently occurs here. This line's await was added in order to call the chain of process_content->process_received_content->validate_content, where validate_content is a blocking async function that makes this entire chain of functions need to be async.

The reasoning for avoiding this is that as long as this function is being awaited on, the select! loop cannot proceed. This means that no queries can progress, requests from the overlay cannot be acted upon, our responses to other nodes cannot happen (leading to peer requests timing out / us being marked disconnected), bucket refreshes can't happen, etc.

Under nominal conditions this shouldn't block for long, but an infura request taking longer than expected would turn our node into a brick for that time. And later it gets even worse if we need to request a new accumulator snapshot in order to validate data: the content query to do that won't be able to progress, leaving us deadlocked unable to get the content that we need in order to validate.

An alternative would be to send our unvalidated content to a dedicated content validation thread, which will then store it if valid.

@mrferris mrferris changed the title Remove blocking validation from main Overlay Service loop Decouple blocking validation from main Overlay Service loop Jun 12, 2022
@mrferris mrferris changed the title Decouple blocking validation from main Overlay Service loop Decouple blocking content validation from main Overlay Service loop Jun 12, 2022
@njgheorghita
Copy link
Collaborator

And later it gets even worse if we need to request a new accumulator snapshot in order to validate data: the content query to do that won't be able to progress, leaving us deadlocked unable to get the content that we need in order to validate.

Hahaha, yup definitely seems like this is going to be an issue. Thanks for typing this up. I'm almost done with purely infura based validation, which I'll try to wrap up quickly then I'll spend some time on this issue

@njgheorghita
Copy link
Collaborator

@mrferris Digging into this one now. Feels like a separate "validate and store content" thread is a suitable solution (for this issue and #340). But want to point out that in #345 @ogenev introduced an async fn process_accept() to the overlay_service which will also need to be updated in order to fully remove the blocking await. It is possible that the delay from this call may not be as significant as the time it would take to validate a piece of content, therefore it's not as big of a concern.

@carver
Copy link
Collaborator

carver commented Jun 29, 2022

Yeah, this tokio::select loop pattern may just be making our life difficult. Trin's event-loop handling here should probably be spawning all of its handling code as non-blocking right? If tokio::select isn't good at that, maybe we should look at a different approach. Rather than figure out one-by-one how to offload it into a different thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants