Large files upload #22

Closed
wants to merge 12 commits into from

Conversation

mpucholblasco commented Dec 30, 2019

We are going to use s3-sftp-proxy in our production environment and we realised that large files are kept entirely in memory, so we needed a powerful instance.

This PR allows large file uploads without having to store them in memory. Instead, it uses the S3 multipart upload feature (more information is included in the Details section of the readme). To improve performance, parallel uploads are also included on both sides (SFTP server and S3).
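To make the approach concrete, here is a minimal sketch of a streamed multipart upload using the AWS SDK for Go's s3manager package. It only illustrates the technique, it is not the code added by this PR; the package and function names, the 5 MiB part size, and the concurrency of 4 are arbitrary choices for the example.

```go
package s3proxy // hypothetical package name for this sketch

import (
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploadStream streams data coming from the SFTP client straight to S3 as a
// multipart upload, so memory use is bounded by PartSize * Concurrency
// instead of by the size of the file.
func uploadStream(sess *session.Session, bucket, key string, body io.Reader) error {
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 5 * 1024 * 1024 // 5 MiB parts (the S3 minimum part size)
		u.Concurrency = 4            // upload up to 4 parts in parallel
	})
	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   body, // read incrementally, never buffered in full
	})
	return err
}
```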

We are now going to work on parallel downloads using multipart downloads (if that turns out to be feasible).
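On the download side, a parallel multipart download could look roughly like the sketch below, again using the SDK's s3manager package. This is only an assumption about one possible approach, not code from this PR: the downloader needs an io.WriterAt so it can write ranged GETs out of order, so the example spools to a local file (the dest path is hypothetical).

```go
package s3proxy // hypothetical package name for this sketch

import (
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// downloadParallel fetches an object with several ranged GETs in flight at
// once, writing each range at its offset in a local spool file.
func downloadParallel(sess *session.Session, bucket, key, dest string) (int64, error) {
	f, err := os.Create(dest)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	downloader := s3manager.NewDownloader(sess, func(d *s3manager.Downloader) {
		d.PartSize = 5 * 1024 * 1024 // fetch the object in 5 MiB ranges
		d.Concurrency = 4            // run up to 4 ranged GETs at once
	})
	return downloader.Download(f, &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
}
```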

It also includes: #13, #17, #18, #19, #20.

Fixes #21

mpucholblasco and others added 12 commits December 23, 2019 13:59
* Added audit logging for file put

* Added audit logs for all file operations

* Audit logging to info

* [CHG] initialization based on struct

Co-authored-by: Dmitry Chepurovskiy <[email protected]>
* [CHG] return error to client when upload to s3 fails

* [CHG] remove unused path
* Add support for mkdir/rmdir methods

* Add Prometheus metrics support

* Fix readme formatting

* Fix readme formatting

* Add support for mkdir/rmdir methods (#2)

* Add gitignore

* Lock dependencies

* Add dockerfile

* [CHG] dockerfile to use go mod

Co-authored-by: Yurii Vlasenko <[email protected]>
leosunmo commented Jan 10, 2020

This is some really impressive work! I was also looking at potentially using s3-sftp-proxy and had similar concerns. Thank you so much for this work as it seems to solve all of my problems. I hope it gets merged soon!

drakkan commented Jan 22, 2020

Hi,

maybe you would be interested in sftpgo:

https://github.com/drakkan/sftpgo/

S3 support is quite new (it has only been supported for a few days), but it should not have this issue, since it uses multipart uploads and parallel downloads to store and retrieve files from S3 without keeping them in memory.

DISCLAIMER: I'm the author

mpucholblasco (Author) commented
Hello @drakkan,
really impressive project! A lot of features, congrats!

I have a question about the process you use to interact with S3. From what I could see in the source code, you use the S3 manager (offered by AWS) and pipeat (based on your PR https://github.com/eikenb/pipeat/pull/1/files), which creates a Unix file pipe (correct me if I'm wrong). Because of that, I don't know what the memory consumption is for large files (for both upload and download). Could you please give me some information about it? Have you done tests with big files to ensure that memory consumption is not excessive? (This is the biggest problem I have right now with the current s3-sftp-proxy process for downloading files.)

Thanks!

drakkan commented Jan 23, 2020

> Hello @drakkan,
> really impressive project! A lot of features, congrats!

thanks!

> I have a question about the process you use to interact with S3. From what I could see in the source code, you use the S3 manager (offered by AWS) and pipeat (based on your PR https://github.com/eikenb/pipeat/pull/1/files), which creates a Unix file pipe (correct me if I'm wrong). Because of that, I don't know what the memory consumption is for large files (for both upload and download). Could you please give me some information about it? Have you done tests with big files to ensure that memory consumption is not excessive? (This is the biggest problem I have right now with the current s3-sftp-proxy process for downloading files.)

sftpgo uses my pipeat fork for now (until the needed changes are merged upstream)

https://github.com/drakkan/pipeat

pipeat uses unlinked files; S3 uploads/downloads are stored inside these unlinked files.

An unlinked file is a file "marked for deletion": it remains in existence until the last file descriptor referring to it is closed, and it is deleted even if the process using it crashes, so no temporary files are left behind.

The unlinked files require disk space until they are deleted, so if you download a 2 GB file you need that space available on disk, but the memory requirement is low. SFTPGo creates the unlinked files inside the user's local home directory.

pipeat is written by the same author as pkg/sftp, and it was originally written for connecting SFTP and S3.
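As a minimal, Unix-only illustration of the unlinked-file trick described above (not pipeat's or SFTPGo's actual code), the snippet below creates a temporary file, unlinks it immediately, and keeps writing through the still-open descriptor; the disk space is reclaimed automatically once the descriptor is closed, even if the process crashes.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Create a spool file, then unlink it right away ("mark for deletion").
	f, err := os.CreateTemp("", "s3spool-*")
	if err != nil {
		panic(err)
	}
	if err := os.Remove(f.Name()); err != nil {
		panic(err)
	}
	defer f.Close() // when the last descriptor closes, the space is freed

	// The open descriptor still works: data buffered here lives on disk,
	// not in RAM, and no temporary file is visible in the directory.
	if _, err := f.Write([]byte("buffered part data")); err != nil {
		panic(err)
	}
	fmt.Println("spooling through an unlinked file")
}
```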

I would really appreciate it if you could test SFTPGo for your use case and report any issues you find, thanks!


This pull request was closed.
Successfully merging this pull request may close these issues.

Support multipart uploading to decrease needed RAM for proxying big files