Large files upload #22

Closed
wants to merge 12 commits into from

Conversation

mpucholblasco commented Dec 30, 2019

We are going to use s3-sftp-proxy in our production environment and we realised that large files are kept entirely in memory, so we needed a powerful instance.

This PR allows large file uploads without having to store them in memory. Instead, it uses the S3 multipart upload feature (more information is included in the Details section of the readme). To improve performance, parallel uploads are also included on both sides (SFTP server and S3).
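To make the approach concrete, here is a minimal sketch of a streamed multipart upload using the AWS SDK for Go's s3manager package. It only illustrates the technique, it is not the code added by this PR; the package and function names, the 5 MiB part size, and the concurrency of 4 are arbitrary choices for the example.

```go
package s3proxy // hypothetical package name for this sketch

import (
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploadStream streams data coming from the SFTP client straight to S3 as a
// multipart upload, so memory use is bounded by PartSize * Concurrency
// instead of by the size of the file.
func uploadStream(sess *session.Session, bucket, key string, body io.Reader) error {
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 5 * 1024 * 1024 // 5 MiB parts (the S3 minimum part size)
		u.Concurrency = 4            // upload up to 4 parts in parallel
	})
	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   body, // read incrementally, never buffered in full
	})
	return err
}
```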

We are now going to work on parallel downloads using multipart downloads (if that turns out to be feasible).
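On the download side, a parallel multipart download could look roughly like the sketch below, again using the SDK's s3manager package. This is only an assumption about one possible approach, not code from this PR: the downloader needs an io.WriterAt so it can write ranged GETs out of order, so the example spools to a local file (the dest path is hypothetical).

```go
package s3proxy // hypothetical package name for this sketch

import (
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// downloadParallel fetches an object with several ranged GETs in flight at
// once, writing each range at its offset in a local spool file.
func downloadParallel(sess *session.Session, bucket, key, dest string) (int64, error) {
	f, err := os.Create(dest)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	downloader := s3manager.NewDownloader(sess, func(d *s3manager.Downloader) {
		d.PartSize = 5 * 1024 * 1024 // fetch the object in 5 MiB ranges
		d.Concurrency = 4            // run up to 4 ranged GETs at once
	})
	return downloader.Download(f, &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
}
```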

It also includes: #13, #17, #18, #19, #20.

Fixes #21

mpucholblasco and others added 12 commits December 23, 2019 13:59
* Added audit logging for file put

* Added audit logs for all file operations

* Audit logging to info

* [CHG] initialization based on struct

Co-authored-by: Dmitry Chepurovskiy <[email protected]>
* [CHG] return error to client when upload to s3 fails

* [CHG] remove unused path
* Add support for mkdir/rmdir methods

* Add Prometheus metrics support

* Fix readme formatting

* Fix readme formatting

* Add support for mkdir/rmdir methods (#2)

* Add gitignore

* Lock dependencies

* Add dockerfile

* [CHG] dockerfile to use go mod

Co-authored-by: Yurii Vlasenko <[email protected]>
leosunmo commented Jan 10, 2020

This is some really impressive work! I was also looking at potentially using s3-sftp-proxy and had similar concerns. Thank you so much for this work as it seems to solve all of my problems. I hope it gets merged soon!

drakkan commented Jan 22, 2020

Hi,

maybe you would be interested in sftpgo:

https://github.com/drakkan/sftpgo/

S3 support is quite new (it has only been supported for a few days), but it should not have this issue, since it uses multipart uploads and parallel downloads to store and retrieve files from S3 without keeping them in memory.

DISCLAIMER: I'm the author

mpucholblasco (Author) commented
Hello @drakkan,
really impressive project! A lot of features, congrats!

I have a question about the process you use to interact with S3. From what I could see in the source code, you use the S3 manager (offered by AWS) and pipeat (based on your PR https://github.com/eikenb/pipeat/pull/1/files), which creates a Unix file pipe (correct me if I'm wrong). Because of that, I don't know what the memory consumption is for large files (for both upload and download). Could you please give me some information about it? Have you done tests with big files to ensure that memory consumption is not excessive? (This is the biggest problem I have right now with the current s3-sftp-proxy process for downloading files.)

Thanks!

drakkan commented Jan 23, 2020

> Hello @drakkan,
> really impressive project! A lot of features, congrats!

thanks!

> I have a question about the process you use to interact with S3. From what I could see in the source code, you use the S3 manager (offered by AWS) and pipeat (based on your PR https://github.com/eikenb/pipeat/pull/1/files), which creates a Unix file pipe (correct me if I'm wrong). Because of that, I don't know what the memory consumption is for large files (for both upload and download). Could you please give me some information about it? Have you done tests with big files to ensure that memory consumption is not excessive? (This is the biggest problem I have right now with the current s3-sftp-proxy process for downloading files.)

sftpgo uses my pipeat fork for now (until the needed changes are merged upstream)

https://github.com/drakkan/pipeat

pipeat uses unlinked files; S3 uploads/downloads are stored inside these unlinked files.

An unlinked file is a file "marked for deletion": it remains in existence until the last file descriptor referring to it is closed, and it is deleted even if the process using it crashes, so no temporary files are left behind.

The unlinked files require disk space until they are deleted, so if you download a 2 GB file you need that space available on disk, but the memory requirement is low. SFTPGo creates the unlinked files inside the user's local home directory.

pipeat is written by the same author as pkg/sftp, and it was originally written for connecting SFTP and S3.
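As a minimal, Unix-only illustration of the unlinked-file trick described above (not pipeat's or SFTPGo's actual code), the snippet below creates a temporary file, unlinks it immediately, and keeps writing through the still-open descriptor; the disk space is reclaimed automatically once the descriptor is closed, even if the process crashes.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Create a spool file, then unlink it right away ("mark for deletion").
	f, err := os.CreateTemp("", "s3spool-*")
	if err != nil {
		panic(err)
	}
	if err := os.Remove(f.Name()); err != nil {
		panic(err)
	}
	defer f.Close() // when the last descriptor closes, the space is freed

	// The open descriptor still works: data buffered here lives on disk,
	// not in RAM, and no temporary file is visible in the directory.
	if _, err := f.Write([]byte("buffered part data")); err != nil {
		panic(err)
	}
	fmt.Println("spooling through an unlinked file")
}
```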

I would really appreciate it if you could test SFTPGo for your use case and report any issues you find, thanks!


This pull request was closed.
Successfully merging this pull request may close these issues.

Support multipart uploading to decrease needed RAM for proxying big files