Support accessing previous versions of objects #1191

Open
anelson opened this issue Dec 9, 2024 · 4 comments
Labels
enhancement New feature or request

@anelson

anelson commented Dec 9, 2024

Tell us more about this new feature.

We have a use case in which we need to process customers' S3 objects using a third-party tool that only operates on POSIX-like files. We're using mountpoint-s3 very successfully to let us use this tool on potentially very large S3 objects, without having to download them locally.

However, in some cases we want to perform this processing on older versions of those objects. I don't see a way to access older versions in the FUSE filesystem as presented. So today we fall back to the S3 REST APIs to discover these older versions, and we would have to download the objects to a local temp directory in order to process them. This is not ideal, particularly since this tool seldom needs to read every byte of a file to do its work.

I'm curious whether there are any plans to provide a way to access older versions of objects. One option might be a hack like a .previous_versions/ directory at the root of the mount, in which each object key is represented as a directory containing one "file" per version, named with that version's S3 version ID. That's just the first idea that comes to mind; we're not picky about the details as long as we can surface prior versions of objects as POSIX-like files.
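A minimal sketch of how the proposed .previous_versions/ layout could map onto S3 versions, written in Python purely for illustration. It assumes the version records have already been fetched (for example via the S3 ListObjectVersions API) and that a read on a resolved path would become a GetObject call with that version ID. All names here (`ObjectVersion`, `VERSIONS_ROOT`, `list_version_dir`, `resolve_version_path`) are hypothetical and not part of mountpoint-s3; keys with empty segments or trailing slashes would need escaping that this sketch ignores.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical mount-relative root for versioned views (not a real Mountpoint feature).
VERSIONS_ROOT = PurePosixPath("/.previous_versions")


@dataclass(frozen=True)
class ObjectVersion:
    key: str              # S3 object key, e.g. "data/report.csv"
    version_id: str       # S3 version ID for this version
    is_latest: bool       # True for the current version
    last_modified: datetime


def list_version_dir(versions, key):
    """Entries of the hypothetical directory /.previous_versions/<key>/, newest first."""
    matching = [v for v in versions if v.key == key]
    matching.sort(key=lambda v: v.last_modified, reverse=True)
    return [VERSIONS_ROOT / key / v.version_id for v in matching]


def resolve_version_path(path):
    """Map /.previous_versions/<key>/<version_id> back to (key, version_id).

    Raises ValueError for paths outside the root or without a version component.
    """
    rel = PurePosixPath(path).relative_to(VERSIONS_ROOT)  # ValueError if outside root
    if len(rel.parts) < 2:
        raise ValueError(f"not a version file path: {path}")
    *key_parts, version_id = rel.parts
    return "/".join(key_parts), version_id
```

Here every version, including the current one, is listed so that a tool can see the full history in one directory; excluding the latest version (already visible at its normal path) would be an equally reasonable choice.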

@anelson anelson added the enhancement New feature or request label Dec 9, 2024
@muddyfish
Contributor

Thanks for creating this issue. I just wanted to get a few clarifications. Do you need to be able to view multiple versions at once, or is a point-in-time view fine? If a point-in-time view is fine, is the timestamp known at mount time?

@anelson
Author

anelson commented Dec 11, 2024

In this scenario we would need to be able to see all versions of an object. It's not critical for us that this include new versions created since the mount, but being able to access just a single non-current version of an object isn't enough for our use case.

@dannycjones
Contributor

Thanks for sharing the use case, Adam! I see where the need to access multiple versions may be coming from.

We don't have any plans to support this right now but I'll leave the issue open so we can gauge interest (through 👍 reactions to the issue).

As an aside, if you didn't need multiple versions, you could try using Amazon S3 Object Lambda to provide a point-in-time view of your bucket through Mountpoint. There's a blog post published in October 2024 covering that use case, although it would require you to create the access point in advance, knowing the point in time you want to view: https://aws.amazon.com/blogs/storage/access-a-point-in-time-with-amazon-s3-object-lambda/

@anelson
Author

anelson commented Dec 17, 2024

Thanks, Danny, for the suggestion.

I'm afraid that solution would be a problem for us for reasons of cost. If the bucket has millions of objects in it, the addition of Object Lambda plus a database would dramatically increase the cost to process a bucket.

3 participants