feat: Detect connection failures forwarded from warcprox and retry th… #285

Open
wants to merge 6 commits into
base: master


adam-miller
Contributor

…em with backoff

@adam-miller
Contributor Author

Brozzler had no real retry loop for failed pages. The one exception was a failed connection to warcprox, and that case retried in a tight loop. Warcprox returns connection failures and timeouts to the browser as 502 and 504 status codes, so I'm checking for those and adding a retry loop with backoff. This is done by adding a retry_after field to the page in rethinkdb and adjusting the query for claiming a page. On its own, that change causes a tight loop on claiming the site, so I also add a delay there to avoid immediately re-claiming a site that was just disclaimed.
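
The flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `Page`, `is_proxy_failure`, and `claimable` are hypothetical names, and an in-memory list stands in for the rethinkdb claim query.

```python
import datetime
from dataclasses import dataclass
from typing import Optional

# Status codes that, per the description, warcprox returns to the browser
# on connection failure (502) and timeout (504).
PROXY_FAILURE_STATUSES = {502, 504}


@dataclass
class Page:
    # Hypothetical stand-in for a brozzler page document.
    url: str
    retry_after: Optional[datetime.datetime] = None


def is_proxy_failure(status_code: int) -> bool:
    """True if the response status indicates a failure forwarded by the proxy."""
    return status_code in PROXY_FAILURE_STATUSES


def claimable(pages, now):
    """Return pages whose retry_after is unset or already in the past."""
    return [p for p in pages if p.retry_after is None or p.retry_after <= now]
```

A page with no `retry_after` is always eligible, so pages that have never failed are unaffected by the extra condition.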

brozzler/worker.py (outdated)
retry_delay = min(60, 60 * (1.5**page.failed_attempts))
page.retry_after = doublethink.utcnow() + datetime.timedelta(
    seconds=retry_delay
)
page.failed_attempts = (page.failed_attempts or 0) + 1
Contributor Author

If we're setting this in populate_defaults, we don't need a default here.
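
For reference, here is a small sketch of a capped exponential backoff in the spirit of the quoted snippet. Two observations motivate it: as quoted, `min(60, 60 * (1.5**page.failed_attempts))` evaluates to 60 for any non-negative attempt count, because `1.5**n` is never below 1; and the exponent reads `failed_attempts` before the `or 0` fallback is applied. The function names and the one-hour cap below are assumptions for illustration, not values taken from this PR.

```python
import datetime


def retry_delay_seconds(failed_attempts, base=60.0, factor=1.5, cap=3600.0):
    """Exponential backoff: base * factor**attempts, capped at `cap` seconds.

    base=60 and factor=1.5 mirror the quoted snippet; cap=3600 is an
    assumed value chosen only so the delay can actually grow.
    """
    attempts = failed_attempts or 0  # tolerate a missing/None counter
    return min(cap, base * factor**attempts)


def schedule_retry(now, failed_attempts):
    """Return (retry_after, new_failed_attempts) for a page that just failed."""
    delay = retry_delay_seconds(failed_attempts)
    retry_after = now + datetime.timedelta(seconds=delay)
    return retry_after, (failed_attempts or 0) + 1
```

If `failed_attempts` is guaranteed a default elsewhere (for example in `populate_defaults`, as the comment above suggests), the `or 0` fallbacks in this sketch become unnecessary.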

@adam-miller adam-miller marked this pull request as ready for review November 12, 2024 21:43