Real AWS and Transfer-Encoding: chunked #139

makcuk opened this issue Jun 17, 2016 · 3 comments


makcuk commented Jun 17, 2016

I'm trying to prototype MarFS with AWS S3 and got something I cannot explain. My repo definition in the config is very simple:

<name>Testbed</name>
<version>1.0</version>
<mnt_top>/marfs</mnt_top>
<repo>
  <name>arc</name>
  <host>s3.amazonaws.com</host>
  <update_in_place>no</update_in_place>
  <ssl>yes</ssl>
  <access_method>S3</access_method>
  <chunk_size>2147483648</chunk_size> # 2GB
  <max_get_size>0</max_get_size> # no limit (use chunk_size)
  <max_pack_file_count>10</max_pack_file_count>
  <min_pack_file_count>10</min_pack_file_count>
  <max_pack_file_size>100</max_pack_file_size>
  <min_pack_file_size>100</min_pack_file_size>
  <pack_size>0</pack_size>

  <security_method>S3_AWS_USER</security_method>
  <enc_type>NONE</enc_type>
  <comp_type>NONE</comp_type>
  <correct_type>NONE</correct_type>
  <latency>10000</latency>
</repo>

What I see in the debug log, when I try to copy a file to /marfs/..., is that marfs_fuse tries to use a chunked upload, but AWS S3 doesn't support it (https://forums.aws.amazon.com/message.jspa?messageID=561616), and I'm getting 501 Not Implemented:

DEBUG=3 MARFSCONFIGRC=/home/vagrant/marfs.cfg ./marfs_fuse /marfs -d -f

DBG: Reading Config File ID[root]
DBG: Config File /root/.awsAuth
FUSE library version: 2.9.2
nullpath_ok: 0
nopath: 0
utime_omit_ok: 0
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56, pid: 0
INIT: 7.22
flags=0x0000f7fb
max_readahead=0x00020000
INIT: 7.19
flags=0x00000020
max_readahead=0x00020000
max_write=0x08000000
max_background=0
congestion_threshold=0
unique: 1, success, outsize: 40
unique: 2, opcode: LOOKUP (1), nodeid: 1, insize: 47, pid: 15331
LOOKUP /source
getattr /source
NODEID: 2
unique: 2, success, outsize: 144
unique: 3, opcode: LOOKUP (1), nodeid: 2, insize: 53, pid: 15331
LOOKUP /source/testfile.bin
getattr /source/testfile.bin
NODEID: 3
unique: 3, success, outsize: 144
unique: 4, opcode: OPEN (14), nodeid: 3, insize: 48, pid: 15331
open flags: 0x8001 /source/testfile.bin
here, without content-length
open[140693330726592] flags: 0x8001 /source/testfile.bin
unique: 4, success, outsize: 32
unique: 5, opcode: SETATTR (4), nodeid: 3, insize: 128, pid: 15331
truncate /source/testfile.bin 0
getattr /source/testfile.bin
unique: 5, success, outsize: 120
unique: 6, opcode: REMOVEXATTR (24), nodeid: 3, insize: 53, pid: 15331
removexattr /source/testfile.bin security.ima
unique: 6, error: -61 (No data available), outsize: 16
unique: 7, opcode: GETXATTR (22), nodeid: 3, insize: 68, pid: 15331
getxattr /source/testfile.bin security.capability 0
DBG: Request Time: Fri, 17 Jun 2016 16:26:31 +0000
DBG: StrToSign:
PUT

Fri, 17 Jun 2016 16:26:31 +0000
/xxx/arc/ver.001_003/ns.admins/F___/inode.0000001138/md_ctime.20160617_162538+0000_0/obj_ctime.20160617_162631+0000_0/unq.0/chnksz.80000000/chnkno.0
DBG: Signature: n7W5l5qBlEqjCssP7qZOmoRLaFU=
DBG: aws_curl_enter: 'aws4c.c', line 2084

  • Hostname was NOT found in DNS cache
    unique: 7, error: -61 (No data available), outsize: 16
    unique: 8, opcode: WRITE (16), nodeid: 3, insize: 131152, pid: 15331
    write[140693330726592] 131072 bytes to 0 flags: 0x8001
  • Trying 54.231.17.64...
  • Connected to s3.amazonaws.com (54.231.17.64) port 80 (#0)

    PUT /xxx/arc/ver.001_003/ns.admins/F___/inode.0000001138/md_ctime.20160617_162538+0000_0/obj_ctime.20160617_162631+0000_0/unq.0/chnksz.80000000/chnkno.0 HTTP/1.1
    Host: s3.amazonaws.com
    Accept: */*
    Date: Fri, 17 Jun 2016 16:26:31 +0000
    Authorization: AWS zzzzzz=
    Transfer-Encoding: chunked
    Expect: 100-continue

< HTTP/1.1 501 Not Implemented
< x-amz-request-id: E9B2C441E4092C6D
< x-amz-id-2: 7vs5C6DC9/RT+zzzzz=
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Fri, 17 Jun 2016 16:32:56 GMT
< Connection: close

  • Server AmazonS3 is not blacklisted
    < Server: AmazonS3
    <

  • Closing connection 0
    DBG: Return Code: 0
    DBG: aws_curl_exit: 'aws4c.c', line 2270

    unique: 8, error: -110 (Connection timed out), outsize: 16
    unique: 9, opcode: FLUSH (25), nodeid: 3, insize: 64, pid: 15331
    unique: 9, error: -38 (Function not implemented), outsize: 16
    unique: 10, opcode: RELEASE (18), nodeid: 3, insize: 64, pid: 0
    release[140693330726592] flags: 0x8001
    unique: 10, success, outsize: 16
    <Error>
      <Code>NotImplemented</Code>
      <Message>A header you provided implies functionality that is not implemented</Message>
      <Header>Transfer-Encoding</Header>
      <RequestId>E9B2C441E4092C6D</RequestId>
    </Error>

Is it a configuration issue, or does the real AWS S3 just not support this?

jti-lanl (Contributor) commented:

Hi Max,

Thanks for trying out MarFS.

For writes through fuse, the ultimate size of the object isn't known ahead of time, so we have to use chunked transfer-encoding instead of providing an explicit content-length in the PUT headers. We tested against a different S3 implementation, which does support CTE.

Based on the forum message you mentioned, it looks like we would have to add an 'x-amz-decoded-content-length' header. That sounds feasible, at first, but they also say:

Set the value to the length, in bytes, of the data to be chunked, without counting any metadata. For example, if you are uploading a 4 GB file, set the value to 4294967296.

[See http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html]

Maybe I don't understand, but that sounds like it defeats at least one purpose of CTE.
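
For what it's worth, if I'm reading the sigv4-streaming page correctly, the scheme is more than one extra header: the PUT has to be signed with SigV4 (we currently sign with V2, per the 'Authorization: AWS ...' line in your log), the body gets an 'aws-chunked' encoding with a signature per chunk, and the request headers end up looking roughly like this (header names from that doc; the values are placeholders, not anything MarFS emits today):

PUT /bucket/key HTTP/1.1
Host: s3.amazonaws.com
Content-Encoding: aws-chunked
x-amz-decoded-content-length: <size of the raw data, known up front>
x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD
Content-Length: <size of the encoded body, also known up front>
Authorization: AWS4-HMAC-SHA256 Credential=..., SignedHeaders=..., Signature=...

So the decoded length (and hence the total size) still has to be known when the request starts, which is exactly what fuse doesn't give us.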

Meanwhile, we have another tool that we use for copying files in parallel to MarFS. (https://github.com/pftool/pftool/tree/cpp, make sure to use the "cpp" branch) In this case, we do know the final size of the destination, so we can skip using CTE. We haven't been testing with S3 repos in quite a while, but this was working many months ago. If you just wanted to copy files into MarFS, this might work for you.

Thanks,
Jeff



makcuk commented Jun 21, 2016

Hi Jeff,

Thank you for your reply. I tried to build the 'cpp' branch of pftool but gave up quickly :) It looks like that branch is under active development and not all library targets are under autoconf control. Nevertheless, I used fakes3 to simulate S3 and completed my test scenarios. Testing also produced two questions:

  1. Is it possible to use another copy tool, not pftool, to perform the copy without modifying the sources? I didn't see a way to configure this.

  2. Trying to understand chunking more deeply:

For writes through fuse, the ultimate size of the object isn't known ahead of time, so we have to use chunked transfer-encoding instead of providing an explicit content-length in the PUT headers.

I see that in the context of object_stream it's possible to do a stat() call against the source file pathname, which would give a size for the PUT. Is this architecturally wrong?

jti-lanl (Contributor) commented:

Hi Max,

pftool should build reliably, but needs some context. In the MarFS_Install document, there's an illustration of the (ugly) invocation of "configure", to build pftool for MarFS.

[Just noticed: you need to run './autogen' before configuring pftool for the first time. I'll add that to the document.]

(1) You can always just copy to/from a MarFS fuse mount using 'cp' or 'rsync'. (This isn't recommended for production systems without some extra measures to protect against the consequences of someone rooting the box.) You can also run multiple such copies in parallel.

(2) pftool internally calls the libmarfs function marfs_open_at_offset(). This function shouldn't be used unless, like pftool, you have some understanding of MarFS chunking (different concept from chunked transfer-encoding). If you want to take on that responsibility, you could use that function to get MarFS to put content-lengths into PUT requests. If you want to go this route, you would probably first call get_chunksize(), giving it the total size of the file you intend to write. It will return a chunksize. Your calls to marfs_open_at_offset() should then always be at multiples of that chunksize, and the total amount of writes you do on each resulting file-handle should always add up to that chunksize (except perhaps in the final chunk, at the largest offset).
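
A rough sketch of that pattern, in C, is below. The prototypes for get_chunksize(), marfs_open_at_offset(), and the write/release calls are written from memory and are almost certainly not the exact argument lists; treat them as placeholders and check the libmarfs headers (or how pftool calls them) for the real signatures.

/* Illustrative only: chunk-aligned writes into MarFS so each PUT can
 * carry an explicit content-length instead of chunked transfer-encoding.
 * All libmarfs prototypes and types here are guesses -- see the real headers. */
#include <stddef.h>
#include <fcntl.h>

void copy_buffer_into_marfs(const char* marfs_path,
                            const char* src,
                            size_t      total_size)
{
    /* Ask MarFS what chunk-size it would use for an object of this total size. */
    size_t chunksize = get_chunksize(marfs_path, total_size);   /* placeholder signature */

    for (size_t offset = 0; offset < total_size; offset += chunksize) {

        /* Every open lands on a multiple of chunksize ...                 */
        size_t remain     = total_size - offset;
        size_t this_chunk = (remain < chunksize) ? remain : chunksize;

        MarFS_FileHandle fh;                                     /* placeholder type */
        marfs_open_at_offset(marfs_path, &fh, O_WRONLY, offset, this_chunk);

        /* ... and the writes on that handle add up to exactly this_chunk
         * (only the final chunk may be short), so the length of each
         * object part is known up front.                                  */
        marfs_write(marfs_path, src + offset, this_chunk, offset, &fh);

        marfs_release(marfs_path, &fh);
    }
}

That is roughly the shape of what pftool does internally; the point is just that opens land on chunk boundaries and each handle sees a known number of bytes.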

Thanks,
Jeff

