Upload integrity considerations #883
-
You are correct: it is not possible to end up with partial content in your remote bucket with either s3fs or gcsfs. Each call either completes or it does not: if a success code is returned, it certainly completed; if an error code is received, it certainly failed; if no code is returned, it may have succeeded, but you do not know.

Larger uploads are split into several calls (a multipart upload), and the file in the bucket is not changed to the new state until that whole process is completed. This means that, if something goes wrong, the uploaded parts can remain on the server without the upload ever being completed. We try to avoid this situation, but some cases (like sudden power loss) cannot be avoided. You would be charged for storing these parts even though the bucket doesn't show their contents, but they will expire and be deleted on some timescale. See the s3fs methods .list_multipart_uploads and .clear_multipart_uploads.
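To make the all-or-nothing behaviour concrete, here is a toy in-memory model. It is a hypothetical sketch, not the s3fs or gcsfs API: ToyBucket, upload() and their methods are illustrative names only. It assumes the semantics described above, namely that parts are staged invisibly and the visible object changes only when the upload is completed. It shows that an interrupted upload leaves the old object intact, while the staged parts still take up (billable) storage until they are cleaned up.

```python
"""Toy model of multipart-upload visibility.

Hypothetical illustration only: ToyBucket and upload() are not part of
s3fs, gcsfs, or any cloud SDK.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyBucket:
    # Objects visible to readers: key -> complete contents.
    objects: Dict[str, bytes] = field(default_factory=dict)
    # Staged parts of uploads that were started but not (yet) completed.
    pending: Dict[int, List[bytes]] = field(default_factory=dict)
    next_id: int = 0

    def start_upload(self) -> int:
        upload_id = self.next_id
        self.next_id += 1
        self.pending[upload_id] = []
        return upload_id

    def upload_part(self, upload_id: int, data: bytes) -> None:
        # Parts are stored, but readers of `objects` cannot see them.
        self.pending[upload_id].append(data)

    def complete(self, upload_id: int, key: str) -> None:
        # The single step that changes the visible object, all at once.
        parts = self.pending.pop(upload_id)
        self.objects[key] = b"".join(parts)

    def stored_bytes(self) -> int:
        # What you would be billed for: visible objects plus orphaned parts.
        visible = sum(len(v) for v in self.objects.values())
        staged = sum(len(p) for parts in self.pending.values() for p in parts)
        return visible + staged

    def clear_pending(self) -> None:
        # Analogue of clearing abandoned multipart uploads.
        self.pending.clear()


def upload(bucket: ToyBucket, key: str, data: bytes, part_size: int,
           fail_after: Optional[int] = None) -> None:
    """Upload `data` in parts; optionally simulate a crash before part `fail_after`."""
    upload_id = bucket.start_upload()
    for i, start in enumerate(range(0, len(data), part_size)):
        if fail_after is not None and i == fail_after:
            raise ConnectionError("simulated interruption mid-upload")
        bucket.upload_part(upload_id, data[start:start + part_size])
    bucket.complete(upload_id, key)


if __name__ == "__main__":
    bucket = ToyBucket(objects={"data.csv": b"old"})
    try:
        upload(bucket, "data.csv", b"new contents!", part_size=4, fail_after=2)
    except ConnectionError:
        pass
    print(bucket.objects["data.csv"])  # b'old': readers never see a partial file
    print(bucket.stored_bytes())       # 11: 3 visible bytes + 8 orphaned bytes
    bucket.clear_pending()
    upload(bucket, "data.csv", b"new contents!", part_size=4)
    print(bucket.objects["data.csv"])  # b'new contents!'
```

The key design point the model captures is that complete() is the only operation that touches the visible objects, so a failure at any earlier step can only leave orphaned parts behind, never a truncated file. In real use with s3fs, the cleanup step corresponds to the .list_multipart_uploads and .clear_multipart_uploads methods mentioned above; clear_pending here is only an analogue.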
-
Hi @martindurant, thanks for a great tool.
I'm wondering whether I can use s3fs/gcsfs in such a way that a file is either uploaded in full or not created at all. This would help a lot with ensuring that cloud storage contains only 'correct' data.
AFAIK that's how AWS S3 operates (there is no 'partial upload'), but I am not sure about fsspec/gcsfs.