Commit b413c39

Merge pull request #99 from mattlqx/catalog
Add catalog functionality to prevent re-downloads
2 parents 8321380 + caf6b5b commit b413c39

4 files changed: +66 -12 lines changed

README.md

+4 -2

@@ -10,7 +10,9 @@ An Amazon Web Services account and something in S3 to fetch.

 Multi-part S3 uploads do not put the MD5 of the content in the ETag header. If x-amz-meta-digest is provided in User-Defined Metadata on the S3 Object it is processed as if it were a Digest header (RFC 3230).

-The MD5 of the local file will be checked against the MD5 from x-amz-meta-digest if it is present. It not it will check against the ETag. If there is no match or the local file is absent it will be downloaded.
+The MD5 of the local file will be checked against the MD5 from x-amz-meta-digest if it is present. If not it will check against the ETag. If there is no match or the local file is absent it will be downloaded.
+
+By default, a catalog file in Chef's cache path will be kept for all downloaded files tracking their etag and md5 at time of download. If either of these don't match, the file will be downloaded. To disable this behavior, set `node['s3_file']['use_catalog']` to `false`.

 If credentials are not provided, s3_file will attempt to use the first instance profile associated with the instance. See documentation at http://docs.aws.amazon.com/IAM/latest/UserGuide/instance-profiles.html for more on instance profiles.

@@ -45,7 +47,7 @@ Example:
     decryption_key "my SHA256 digest key"
     decrypted_file_checksum "SHA256 hex digest of decrypted file"
 end
-
+
 #MD5 and Multi-Part Upload
 s3_file compares the MD5 hash of a local file, if present, and the ETag header of the S3 object. If they do not match, then the remote object will be downloaded and notifiations will be fired.
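
The README text in this hunk describes which checksum the local file is compared against. A minimal Ruby sketch of that selection order follows. `expected_md5` and the plain `headers` hash are hypothetical names for illustration, not the cookbook's API, and the sketch assumes the `x-amz-meta-digest` value looks like `md5=<hex digest>`; the cookbook's actual parsing is not shown in this diff.

```ruby
# Hypothetical helper, not part of s3_file: pick the checksum to compare
# against, preferring x-amz-meta-digest over the ETag.
def expected_md5(headers)
  digest = headers['x-amz-meta-digest']
  if digest
    # Assumed format: comma-separated "algorithm=value" pairs, e.g. "md5=<hex>".
    pairs = digest.split(',').map { |part| part.strip.split('=', 2) }
    md5 = pairs.select { |pair| pair.size == 2 }
               .map { |algo, value| [algo.downcase, value] }
               .to_h['md5']
    return md5 if md5
  end
  # Fall back to the ETag with its surrounding quotes removed.
  headers['etag'].to_s.delete('"')
end
```

Because a multi-part upload's ETag is not the MD5 of the content, the fallback branch would only yield a usable checksum for single-part uploads.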

attributes/default.rb

+3

@@ -1,2 +1,5 @@
 default['s3_file']['mime-types']['version'] = '2.6.2'
 default['s3_file']['rest-client']['version'] = '1.7.3'
+
+# Keep a catalog of each downloaded file's etag and md5 at time of download.
+default['s3_file']['use_catalog'] = true

libraries/s3_file.rb

+24 -3

@@ -293,6 +293,15 @@ def self.verify_sha256_checksum(checksum, file)

   def self.verify_md5_checksum(checksum, file)
     s3_md5 = checksum
+    local_md5 = buffered_md5_checksum(file)
+
+    Chef::Log.debug "md5 of remote object is #{s3_md5}"
+    Chef::Log.debug "md5 of local object is #{local_md5.hexdigest}"
+
+    local_md5.hexdigest == s3_md5
+  end
+
+  def self.buffered_md5_checksum(file)
     local_md5 = Digest::MD5.new

     # buffer the checksum which should save RAM consumption
@@ -301,11 +310,23 @@ def self.verify_md5_checksum(checksum, file)
         local_md5.update buffer
       end
     end
+    local_md5
+  end

-    Chef::Log.debug "md5 of remote object is #{s3_md5}"
-    Chef::Log.debug "md5 of local object is #{local_md5.hexdigest}"
+  def self.verify_etag(etag, file)
+    catalog.fetch(file, nil) == etag
+  end

-    local_md5.hexdigest == s3_md5
+  def self.catalog_path
+    File.join(Chef::Config[:file_cache_path], 's3_file_etags.json')
+  end
+
+  def self.catalog
+    File.exist?(catalog_path) ? JSON.parse(IO.read(catalog_path)) : {}
+  end
+
+  def self.write_catalog(data)
+    File.open(catalog_path, 'w', 0644) { |f| f.write(JSON.dump(data)) }
   end

   def self.client
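
To see how the new library helpers fit together outside a Chef run, here is a small standalone sketch. `CatalogSketch`, `BLOCK_SIZE`, and the explicit `catalog_path` argument are assumptions made for illustration; the commit itself derives the path from `Chef::Config[:file_cache_path]` and returns the `Digest::MD5` object rather than a hex string.

```ruby
require 'digest'
require 'json'
require 'tmpdir'

# Hypothetical stand-in for the S3FileLib catalog helpers in this commit.
module CatalogSketch
  BLOCK_SIZE = 1024 * 1024 # assumed read size; the real constant is not shown

  # Hash the file in chunks so large files are never loaded whole into memory.
  def self.buffered_md5_hex(path)
    md5 = Digest::MD5.new
    File.open(path, 'rb') do |io|
      while (chunk = io.read(BLOCK_SIZE))
        md5.update(chunk)
      end
    end
    md5.hexdigest
  end

  # Return the parsed catalog, or an empty hash if none has been written yet.
  def self.read_catalog(catalog_path)
    File.exist?(catalog_path) ? JSON.parse(File.read(catalog_path)) : {}
  end

  # Overwrite the catalog file with the given hash, serialized as JSON.
  def self.write_catalog(catalog_path, data)
    File.open(catalog_path, 'w', 0o644) { |f| f.write(JSON.dump(data)) }
  end
end

# Usage: record one downloaded file, then read the entry back.
Dir.mktmpdir do |dir|
  target = File.join(dir, 'artifact.bin')
  File.binwrite(target, 'hello')
  catalog_path = File.join(dir, 's3_file_etags.json')

  catalog = CatalogSketch.read_catalog(catalog_path)
  catalog[target] = {
    'etag' => 'example-etag', # placeholder value, not a real S3 ETag
    'local_md5' => CatalogSketch.buffered_md5_hex(target)
  }
  CatalogSketch.write_catalog(catalog_path, catalog)
  puts CatalogSketch.read_catalog(catalog_path)[target]['local_md5']
end
```

Storing the hex string keeps the JSON round trip explicit, so the value read back from the catalog can be compared to a freshly computed hex digest with plain string equality.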

providers/default.rb

+35 -7

@@ -34,11 +34,11 @@
   end

   if ::File.exists?(new_resource.path)
+    s3_etag = S3FileLib::get_md5_from_s3(new_resource.bucket, new_resource.s3_url, remote_path, aws_access_key_id, aws_secret_access_key, token)
+
     if decryption_key.nil?
       if new_resource.decrypted_file_checksum.nil?
-        s3_md5 = S3FileLib::get_md5_from_s3(new_resource.bucket, new_resource.s3_url, remote_path, aws_access_key_id, aws_secret_access_key, token)
-
-        if S3FileLib::verify_md5_checksum(s3_md5, new_resource.path)
+        if S3FileLib::verify_md5_checksum(s3_etag, new_resource.path)
           Chef::Log.debug 'Skipping download, md5sum of local file matches file in S3.'
           download = false
         end
@@ -59,6 +59,16 @@
         end
       end
     end
+
+    # Don't download if content and etag match prior download
+    if node['s3_file']['use_catalog']
+      catalog_data = S3FileLib::catalog.fetch(new_resource.path, nil)
+      existing_file_md5 = S3FileLib::buffered_md5_checksum(new_resource.path)
+      if catalog_data && existing_file_md5 == catalog_data['local_md5'] && s3_etag == catalog_data['etag']
+        Chef::Log.debug 'Skipping download, md5 of local file and etag matches prior download.'
+        download = false
+      end
+    end
   end

   if download
@@ -78,16 +88,34 @@
         raise e
       end

-      ::FileUtils.mv(decrypted_file.path, new_resource.path)
+      downloaded_file = decrypted_file
     else
-      ::FileUtils.mv(response.file.path, new_resource.path)
+      downloaded_file = response.file
+    end
+
+    # Write etag and md5 to catalog for future reference
+    if node['s3_file']['use_catalog']
+      catalog = S3FileLib::catalog
+      catalog[new_resource.path] = {
+        'etag' => response.headers[:etag].gsub('"',''),
+        'local_md5' => S3FileLib::buffered_md5_checksum(downloaded_file.path)
+      }
+      S3FileLib::write_catalog(catalog)
+    end
+
+    # Take ownership and permissions from existing object
+    if ::File.exist?(new_resource.path)
+      stat = ::File::Stat.new(new_resource.path)
+      ::FileUtils.chown(stat.uid, stat.gid, downloaded_file)
+      ::FileUtils.chmod(stat.mode, downloaded_file)
     end
+    ::FileUtils.mv(downloaded_file.path, new_resource.path)
   end

   f = file new_resource.path do
     action :create
-    owner new_resource.owner || ENV['user']
-    group new_resource.group || ENV['user']
+    owner new_resource.owner || ENV['USER']
+    group new_resource.group || ENV['USER']
     mode new_resource.mode || '0644'
   end
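
The catalog check added to the provider can be read as a pure decision over three values. The sketch below isolates it so it can be exercised without Chef. `skip_download?` and `normalize_etag` are hypothetical names; the commit inlines this logic and compares a `Digest::MD5` object rather than a hex string.

```ruby
# Hypothetical helpers for illustration; the commit inlines this logic.

# S3 returns the ETag wrapped in double quotes; strip them before comparing.
def normalize_etag(raw_etag)
  raw_etag.to_s.delete('"')
end

# Skip the download only when a catalog entry exists AND both the local
# content hash and the remote ETag are unchanged since the prior download.
def skip_download?(catalog_entry, local_md5_hex, s3_etag)
  return false if catalog_entry.nil?

  catalog_entry['local_md5'] == local_md5_hex &&
    catalog_entry['etag'] == normalize_etag(s3_etag)
end

entry = { 'etag' => 'abc123', 'local_md5' => 'def456' }
puts skip_download?(entry, 'def456', '"abc123"') # unchanged file and object
puts skip_download?(entry, 'changed', '"abc123"') # local file was modified
puts skip_download?(nil, 'def456', '"abc123"')    # never downloaded before
```

Requiring both values to match means a locally edited file or a replaced S3 object each trigger a fresh download, which is the behavior the README change describes.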
