Add support for S3 storage #22

Open
abollini opened this issue Aug 9, 2019 · 3 comments

Comments

@abollini
Contributor

abollini commented Aug 9, 2019

Thanks for the great project. We are currently running the system with AWS S3 storing the source images and two pre-scaled sizes (a thumbnail and a larger preview). Access to S3 is currently hidden behind s3fs, but we are hitting some limitations and issues that we guess are related to digilib's internal caching of the underlying storage structure (sometimes it doesn't see new files). Moreover, images flow into S3 continuously without passing through s3fs (which is known not to support distributed access to S3), so this could also be the cause, or at least a contributing cause.

To improve the situation we want to evaluate the effort needed to implement a direct S3 connector for digilib and, if this turns out to be a suitable solution, we would like to contribute it back.

Do you have any additional documentation about how the storage layer works in digilib? Can you give us some direction on how to make digilib modular and plug in a different storage layer?

@robcast
Owner

robcast commented Aug 9, 2019

It would be really useful to have an S3 storage access option!

I have added a storage connector for the CDSTAR storage system in the last version of digilib. The code lives in the common-cdstar module, and there is some documentation on how to enable it in the digilib-config documentation under "Storage backends". The code is only lightly tested ;-)

Sorry, there is no real documentation on how to implement a different storage layer, but you can check out the code for the CDSTAR plugin. I'll be happy to help where I can and answer any questions.

The code to detect filesystem changes relies on change dates for directories, and I haven't tested it thoroughly in a long time ;-)
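
To illustrate why a change-date check can miss new files, here is a minimal Python sketch of a directory listing cache that refreshes only when the directory's modification time changes. This is not digilib's actual code; `DirListingCache` and `demo` are hypothetical names, and resetting the timestamp with `os.utime` only simulates a backend that does not update directory dates when a file arrives by another route.

```python
import os
import tempfile


class DirListingCache:
    """Caches directory listings; re-reads a directory only when its mtime changes."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, sorted file names)

    def list(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # directory looks unchanged, reuse the old listing
        names = sorted(os.listdir(path))
        self._entries[path] = (mtime, names)
        return names


def demo():
    """Returns the listings seen before, during, and after a missed update."""
    cache = DirListingCache()
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "a.jpg"), "w").close()
        os.utime(d, ns=(1_000_000_000, 1_000_000_000))
        first = cache.list(d)

        # A new file arrives, but the directory timestamp stays the same
        # (assumption: this mimics a backend that does not touch it).
        open(os.path.join(d, "b.jpg"), "w").close()
        os.utime(d, ns=(1_000_000_000, 1_000_000_000))
        stale = cache.list(d)

        # Once the timestamp changes, the cache notices the new file.
        os.utime(d, ns=(2_000_000_000, 2_000_000_000))
        fresh = cache.list(d)
    return first, stale, fresh
```

In this sketch the second listing still lacks `b.jpg`, which matches the "sometimes it doesn't see new files" symptom described above.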

I started some optimizations: digilib tries to read the image sizes of a whole "directory" from metadata first, so it does not have to load the images to get the sizes. It may still do an extra image load the first time to determine size and type, which I want to get rid of.
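
Roughly, the metadata-first idea looks like the following Python sketch. It is an illustration under assumptions, not the digilib implementation: `sizes_for_directory`, `dir_metadata` and `load_size` are hypothetical names, and the metadata is assumed to be a plain mapping from file name to pixel size.

```python
from typing import Callable, Dict, Iterable, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def sizes_for_directory(
    names: Iterable[str],
    dir_metadata: Dict[str, Size],
    load_size: Callable[[str], Size],
) -> Dict[str, Size]:
    """Resolve image sizes from directory metadata, loading an image only as a fallback."""
    sizes = {}
    for name in names:
        size = dir_metadata.get(name)
        if size is None:
            # No metadata entry: fall back to the expensive image load.
            size = load_size(name)
        sizes[name] = size
    return sizes
```

With an S3 backend the fallback would mean fetching at least part of the object, so a complete metadata listing saves one request per image.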

I have not included code to deal with multiple sizes for thumbnails, but that would be a useful feature and I would like to help with that. The normal filesystem code does this, and I planned to make the mechanism more pluggable to also enable mixed online and disk caches (TextGrid does this with custom code).
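
As a sketch of what handling multiple sizes could mean for a storage connector: pick the smallest pre-scaled version that still covers the requested size, and fall back to the largest one otherwise. The selection rule and the name `pick_prescaled` are assumptions for illustration, not a description of how the filesystem code decides.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def pick_prescaled(available: List[Size], wanted: Size) -> Size:
    """Choose the smallest stored version that is at least as large as the request."""
    if not available:
        raise ValueError("no image versions available")
    big_enough = [
        s for s in available if s[0] >= wanted[0] and s[1] >= wanted[1]
    ]
    if big_enough:
        # Smallest area among the versions that cover the request.
        return min(big_enough, key=lambda s: s[0] * s[1])
    # Nothing is large enough: use the largest version there is.
    return max(available, key=lambda s: s[0] * s[1])
```

For the setup described in this issue (a thumbnail, a larger preview and the source image), a small request would be served from the thumbnail and only large requests would touch the source image.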

@abollini
Contributor Author

abollini commented Sep 4, 2019

Hi @robcast
thanks for this extra information and sorry for the late reply.
We have not yet decided whether to implement this; honestly, we are currently also investigating alternatives such as a serverless IIIF implementation. We will post any updates on this work here.

@robcast
Owner

robcast commented Sep 5, 2019

Serverless IIIF sounds interesting. Are you thinking of FaaS-like IIIF servers or a static level 0 implementation? Let us know what you find.
