Retry upload requests under certain conditions #210

Open
zachmullen wants to merge 1 commit into master from retry-requests

Conversation

zachmullen (Contributor):

Just posting this as a draft for preliminary review. I still need to test it manually. It's also a reasonable target for automated testing.

zachmullen force-pushed the retry-requests branch 3 times, most recently from 1c5610f to 4c76e98 on March 5, 2021 02:00
zachmullen marked this pull request as ready for review on March 9, 2021 14:29
zachmullen (Contributor, Author):

I've tested this manually, so I'm marking it ready for review.

One thing I discovered is that shutting down my MinIO instance worked well for simulating an axios network error, but switching Chrome devtools to "offline" mode triggers a different error type that I'm not sure we should attempt to catch.

There is no test framework in this repository yet, so I haven't added any unit tests. I think it would be good to do so, but I didn't want to blow up the scope of the PR.

  const axiosErr = (error as AxiosError);
  return axiosErr.isAxiosError && (
    !axiosErr.response
    || [429, 500, 502, 503, 504].includes(axiosErr.response.status)
Contributor:

Why not all 5xx errors?

zachmullen (Author):

@danlamanna suggested this list, and I added one extra code to his suggestion. I think I prefer a whitelist of specific codes, such that any "unexpected" situation will break out of our retry loop, but I'm happy to expand that list if there are other codes you want to add.

Contributor:

429 makes sense, but I think we should retry the entire range of 5xx errors unless we know it's inappropriate. @danlamanna what do you think?

  );
}
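
For reference in that discussion, here is a minimal sketch of a predicate that retries on any 5xx status (plus 429 and no-response network errors), as suggested above; the name shouldRetry and the standalone signature are assumed for illustration, not taken from the PR:

```ts
import { AxiosError } from 'axios';

// Sketch only: the function name and signature are assumptions; the PR's
// predicate uses an explicit status list instead of the full 5xx range.
function shouldRetry(error: unknown): boolean {
  const axiosErr = error as AxiosError;
  return axiosErr.isAxiosError && (
    // No response at all: a network-level failure, worth retrying.
    !axiosErr.response
    // Too Many Requests, or any server-side (5xx) error.
    || axiosErr.response.status === 429
    || (axiosErr.response.status >= 500 && axiosErr.response.status < 600)
  );
}
```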

async function retry<T>(
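
The helper's signature is truncated in the excerpt above. Purely as an illustration, here is one plausible shape for such a wrapper, inferred from its call sites later in the diff (retry<AxiosResponse>(() => ...)); the attempt count, backoff schedule, and reuse of the shouldRetry predicate sketched above are assumptions, not the PR's actual implementation:

```ts
// Sketch only, not the PR's code: re-invoke the given async function,
// backing off exponentially, until it succeeds, the error is deemed
// non-retryable, or the attempt budget is exhausted.
async function retry<T>(
  fn: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= attempts - 1 || !shouldRetry(error)) {
        throw error;
      }
      // Wait 1s, 2s, 4s, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```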
Contributor:

What do you think about using one of these:

I prefer https://www.npmjs.com/package/retry-axios, since it uses proper Axios interceptors.

zachmullen (Author):

Personally I'd prefer not to have to rewrite this, but if you think using one of those libraries is going to be a better experience, go ahead and make the decision.

brianhelba (Contributor), Mar 18, 2021:

Using https://www.npmjs.com/package/retry-axios has the potential to remove most of the code (and maintenance burden) here; I think it would be worth trying. @zachmullen If you don't have time, I'd be happy to try adding it.

Another commenter:

FWIW, I'm using axios-retry in NLI, and it works fine. retry-axios does look a little more refined though.

zachmullen (Author):

@brianhelba that would be great if you want to take a crack at it. 👍

zachmullen (Author):

Not 100% sure, but from the README it looks like axios-retry doesn't provide a callback for when a retry is happening, whereas retry-axios does.
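
For comparison, a sketch of how retry-axios might be attached to the client's axios instance; the option names follow the retry-axios README, while the baseURL, retry counts, and logging below are placeholders rather than a vetted configuration:

```ts
import axios from 'axios';
import * as rax from 'retry-axios';

// Sketch only: attach retry-axios interceptors to a dedicated axios instance.
const api = axios.create({ baseURL: 'https://example.invalid/api/' }); // placeholder URL
api.defaults.raxConfig = {
  instance: api,
  retry: 3,             // retries for requests that received an HTTP error response
  noResponseRetries: 3, // retries for network errors with no response at all
  statusCodesToRetry: [[429, 429], [500, 599]],
  backoffType: 'exponential',
  // The per-attempt hook that axios-retry's README did not appear to offer.
  onRetryAttempt: (err) => {
    const cfg = rax.getConfig(err);
    console.warn(`Upload request retry attempt #${cfg?.currentRetryAttempt}`);
  },
};
rax.attach(api);
```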

@@ -140,15 +186,18 @@ export default class S3FileFieldClient {
   protected async completeUpload(
     multipartInfo: MultipartInfo, parts: UploadedPart[],
   ): Promise<void> {
-    const response = await this.api.post('upload-complete/', {
+    const response = await retry<AxiosResponse>(() => this.api.post('upload-complete/', {
Contributor:

Since POST is not necessarily idempotent, do we know what happens if a client re-calls the endpoint mistakenly (perhaps due to the network dropping only the response)?

If this endpoint is actually idempotent, maybe we should change it to a PUT request.

zachmullen (Author):

I think the endpoint should either be idempotent or, if not, return a 400 when a duplicate request happens, which would break us out of the retry loop.

@@ -168,8 +219,10 @@ export default class S3FileFieldClient {
    * @param multipartInfo Signed information returned from /upload-complete/.
    */
   protected async finalize(multipartInfo: MultipartInfo): Promise<string> {
-    const response = await this.api.post('finalize/', {
+    const response = await retry<AxiosResponse>(() => this.api.post('finalize/', {
Contributor:

Again, do we know what happens when this is called repeatedly?

zachmullen (Author):

Hopefully either a 200 OK or a 400 Bad Request, but I don't know for sure.

     // Send the CompleteMultipartUpload operation to S3
-    await axios.post(completeUrl, body, {
+    await retry<AxiosResponse>(() => axios.post(completeUrl, body, {
Contributor:

I assume that if this is called repeatedly, AWS will either:

  • idempotently succeed again, which is fine
  • return an error in the body, which is unrecoverable (and the retries should stop)

I don't think we need to do anything more here, but we should be sure to handle this with #209.
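
A hedged sketch of the body check that second bullet implies: S3 can report a CompleteMultipartUpload failure as an <Error> element inside an HTTP 200 response, so status-code-based retry logic alone will not notice it. The helper below is illustrative, not the PR's code:

```ts
import axios from 'axios';

// Sketch only: after the (possibly retried) POST succeeds at the HTTP level,
// inspect the XML body, since S3 may embed an error in a 200 response to
// CompleteMultipartUpload.
async function completeMultipartUpload(completeUrl: string, body: string): Promise<void> {
  const response = await axios.post<string>(completeUrl, body);
  if (typeof response.data === 'string' && response.data.includes('<Error>')) {
    // Not retryable from the client's perspective; surface it to the caller.
    throw new Error(`CompleteMultipartUpload failed: ${response.data}`);
  }
}
```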
