Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement fsync and handle write errors #313

Merged
merged 1 commit into from
Jun 29, 2023
Merged

Conversation

passaro
Copy link
Contributor

@passaro passaro commented Jun 23, 2023

Implement fsync to allow users to complete a put request and receive confirmation that it succeeded or failed. If a file handle is released without a call to fsync, release will still complete the upload as before.

Wrap the UploadRequest in the file handle in a new UploadState enum, in order to detect:

  • on release, whether the request had been already completed by an fsync call,
  • write is invoked after an fsync,
  • write (or fsync) is invoked after a previous call failed.

Also adds support for put failures to FailureClient.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@passaro passaro temporarily deployed to PR integration tests June 23, 2023 14:55 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 23, 2023 14:55 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 23, 2023 14:55 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 23, 2023 14:55 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 26, 2023 10:42 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 26, 2023 10:42 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 26, 2023 10:42 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 26, 2023 10:42 — with GitHub Actions Inactive
@passaro passaro changed the title Implement fsync and handle write errors (v2) Implement fsync and handle write errors Jun 26, 2023
@passaro
Copy link
Contributor Author

passaro commented Jun 26, 2023

Note that this PR replaces #308

@passaro passaro marked this pull request as ready for review June 26, 2023 10:45
FileHandleType::Read { .. } => return Ok(()),
};
let Some(in_progress_request) = request.take() else {
error!(key=?file_handle.full_key, "upload already completed or aborted");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked about tracking a little more state in the handle so we could distinguish these two cases. Is that worth doing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should be warn!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added UploadState to keep track of failure vs completion.

Comment on lines 650 to 651
let request = request.into_inner();
let size = request.size() as usize;
let put = request.complete().await;
let result = match put {
Ok(_) => {
debug!(key, size, "put succeeded");
Ok(())
}
Err(e) => {
error!(key, size, "put failed, object was not uploaded: {e:?}");
// This won't actually be seen by the user because `release` is async, but
// it's the right thing to do.
Err(libc::EIO)
if let Some(in_progress_request) = request {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be

if let Some(request) = request.into_inner() {
    // ...

match in_progress_request.complete().await {
Ok(_) => debug!(key, size, "put succeeded"),
Err(e) => {
error!(key, size, "put failed, object was not uploaded: {e:?}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep the comment about the error not being user-visible.

bucket: String,
key: String,
next_request_offset: u64,
request: Client::PutObjectRequest,
_handle: Handle,
Copy link
Member

@jamesbornholt jamesbornholt Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the handle should be here -- it doesn't really have anything to do with S3 uploads (it's a filesystem concept). It should probably be back in fs. If that gets too messy, we could factor out a WriteFileHandle type and change the enum to FileHandleType::Write(AsyncMutex<Option<WriteFileHandle>>) so that we can put the transition logic onto that struct instead.

So something like

struct WriteFileHandle {
    upload: UploadRequest<Client, WriteHandle>,
    token: WriteHandle,
}

impl WriteFileHandle {
    fn complete(mut self) -> Result<(), libc::c_int> {
        // ...the common logic that lives in both `fsync` and `release` right now...

        // and explicitly complete the write handle
        self.token.finish_writing()?;
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new UploadState struct does something similar, but I've actually moved the WriteHandle out of it and I'm calling finish_writing() only on release.
I still need to think a bit more about what should happen to the inode on failure and whether the current finish_writing() needs to behave differently in case of failure (e.g. should it still convert local directories to remote?). But maybe we can review that in a separate change.

Comment on lines 843 to 850
impl Drop for WriteHandle {
fn drop(&mut self) {
if let Err(e) = self.finish_writing() {
error!(error=?e, ino=self.ino, "error closing file handle");
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment -- I don't like moving fallible stuff like this into Drop impls, it robs you of any chance to expose the error.

@passaro passaro temporarily deployed to PR integration tests June 27, 2023 14:46 — with GitHub Actions Inactive
@passaro passaro had a problem deploying to PR integration tests June 27, 2023 14:46 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests June 27, 2023 14:46 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests June 27, 2023 14:46 — with GitHub Actions Failure
@passaro passaro temporarily deployed to PR integration tests June 27, 2023 15:05 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 27, 2023 15:05 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 27, 2023 15:05 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 27, 2023 15:05 — with GitHub Actions Inactive
Comment on lines 120 to 127
let upload = match self {
Self::InProgress(upload) => upload,
Self::Failed | Self::Completed => {
// Nothing to do.
return Ok(());
}
};
Self::complete_upload(upload, key).await
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
let upload = match self {
Self::InProgress(upload) => upload,
Self::Failed | Self::Completed => {
// Nothing to do.
return Ok(());
}
};
Self::complete_upload(upload, key).await
match self {
Self::InProgress(upload) => Self::complete_upload(upload, key).await,
Self::Failed | Self::Completed => Ok(()),
}

}

async fn complete(&mut self, key: &str) -> Result<(), libc::c_int> {
let upload = match std::mem::replace(self, Self::Completed) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda weird because it can replace Self::Failed with Self::Completed. I think that only messes up error messages in some edge cases, but probably we should try to get it right anyway.

@passaro passaro had a problem deploying to PR integration tests June 29, 2023 05:31 — with GitHub Actions Failure
@passaro passaro deployed to PR integration tests June 29, 2023 05:31 — with GitHub Actions Active
@passaro passaro deployed to PR integration tests June 29, 2023 05:31 — with GitHub Actions Active
@passaro passaro deployed to PR integration tests June 29, 2023 05:31 — with GitHub Actions Active
@passaro passaro enabled auto-merge (squash) June 29, 2023 05:36
@passaro passaro had a problem deploying to PR integration tests June 29, 2023 10:05 — with GitHub Actions Failure
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 10:05 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 10:05 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 10:05 — with GitHub Actions Inactive
Signed-off-by: Alessandro Passaro <[email protected]>
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:13 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:13 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:13 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:13 — with GitHub Actions Inactive
@passaro passaro merged commit c89f05d into awslabs:main Jun 29, 2023
@passaro passaro deleted the fsync-v2 branch July 3, 2023 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants