validator: retries on 5xx & 429's #19

Merged (1 commit) on Mar 19, 2024
31 changes: 25 additions & 6 deletions validator/service/service.go
@@ -97,6 +97,29 @@ type CheckBlobResult struct {
MismatchedData []string
}

+// shouldRetry returns true if the status code is one of the retryable status codes
+func shouldRetry(status int) bool {
+	switch status {
+	case http.StatusInternalServerError, http.StatusBadGateway, http.StatusServiceUnavailable, http.StatusGatewayTimeout, http.StatusTooManyRequests:
+		return true
+	default:
+		return false
+	}
+}

+// fetchWithRetries fetches the sidecar and handles retryable error cases (5xx status codes + 429 + connection errors)
+func fetchWithRetries(ctx context.Context, endpoint BlobSidecarClient, id string, format Format) (int, storage.BlobSidecars, error) {
+	return retry.Do2(ctx, retryAttempts, retry.Exponential(), func() (int, storage.BlobSidecars, error) {
+		status, resp, err := endpoint.FetchSidecars(id, format)
+
+		if err == nil && status != http.StatusOK && shouldRetry(status) {


Should you retry on (certain) errors as well? I can see occasional connectivity issues causing errors to be returned.

Collaborator Author

I believe this will already retry when endpoint.FetchSidecars returns an error (for example, due to a connectivity issue). In that case err != nil, so the code falls through and returns the original error, which triggers the retry.

The added if statement just ensures that err is also set (and therefore triggers a retry) when the status code is one of the retryable 5xx codes (500, 502, 503, 504) or 429.

If you think it's cleaner, we could change this to:

if shouldRetry(status) { return 0, BlobSidecars{}, errors.New(...) }


Ah, I see. No, it's fine as is; I should have taken a bit more time to understand what was going on.

+			err = fmt.Errorf("retryable status code: %d", status)
+		}
+
+		return status, resp, err
+	})
+}

// checkBlobs iterates all blocks in the range start:end and checks that the blobs from the beacon-node and blob-api
// are identical, when encoded in both JSON and SSZ.
func (a *ValidatorService) checkBlobs(ctx context.Context, start phase0.Slot, end phase0.Slot) CheckBlobResult {
@@ -108,19 +131,15 @@ func (a *ValidatorService) checkBlobs(ctx context.Context, start phase0.Slot, en

l := a.log.New("format", format, "slot", slot)

- blobStatus, blobResponse, blobError := retry.Do2(ctx, retryAttempts, retry.Exponential(), func() (int, storage.BlobSidecars, error) {
- return a.blobAPI.FetchSidecars(id, format)
- })
+ blobStatus, blobResponse, blobError := fetchWithRetries(ctx, a.blobAPI, id, format)

if blobError != nil {
result.ErrorFetching = append(result.ErrorFetching, id)
l.Error(validationErrorLog, "reason", "error-blob-api", "error", blobError, "status", blobStatus)
continue
}

- beaconStatus, beaconResponse, beaconErr := retry.Do2(ctx, retryAttempts, retry.Exponential(), func() (int, storage.BlobSidecars, error) {
- return a.beaconAPI.FetchSidecars(id, format)
- })
+ beaconStatus, beaconResponse, beaconErr := fetchWithRetries(ctx, a.beaconAPI, id, format)

if beaconErr != nil {
result.ErrorFetching = append(result.ErrorFetching, id)