Commit 86d15e5

Author: Craig Hagan
Validate md5sum of downloaded s3 objects and retry on mismatch
Transient network errors, timeouts, Ruby Net::HTTP bugs, and other failures can leave S3 downloads incomplete or outright incorrect. A download now succeeds only if the MD5 of the downloaded object matches the MD5 known to S3.
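The validate-and-retry idea described above can be sketched in isolation. This is a minimal illustration, not the cookbook's code: `md5_matches?` and `download_with_md5_retry` are hypothetical names, and the block passed to the wrapper stands in for the real S3 GET. It assumes the ETag is a plain MD5 hex digest, which holds for objects uploaded in a single part but not for multipart uploads.

```ruby
require "digest/md5"
require "tempfile"

# Hypothetical helper: true when the file's MD5 hex digest equals the expected one.
def md5_matches?(expected_md5, path)
  Digest::MD5.file(path).hexdigest == expected_md5.downcase
end

# Hypothetical retry wrapper. The block stands in for the real S3 GET and must
# return [etag_header_value, path_to_downloaded_file] on each call.
def download_with_md5_retry(max_attempts = 3)
  max_attempts.times do
    etag, path = yield
    md5 = etag.delete('"')            # S3 wraps the ETag value in double quotes
    return path if md5_matches?(md5, path)
  end
  raise "unable to validate md5 checksum after #{max_attempts} attempts"
end

# Usage: the first simulated download is truncated, the second is complete.
complete  = Tempfile.new("complete");  complete.write("hello world"); complete.flush
truncated = Tempfile.new("truncated"); truncated.write("hello");      truncated.flush
etag = "\"#{Digest::MD5.hexdigest("hello world")}\""

attempt = 0
path = download_with_md5_retry(3) do
  attempt += 1
  [etag, attempt == 1 ? truncated.path : complete.path]
end
puts "validated #{path} after #{attempt} attempts"
```

Raising after the final failed attempt, rather than returning the last file, keeps a corrupt download from being mistaken for a successful one.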
1 parent 8a97160 commit 86d15e5

1 file changed (+17 -6 lines changed)

libraries/s3_file.rb

@@ -120,22 +120,33 @@ def self.get_md5_from_s3(bucket, url, path, aws_access_key_id, aws_secret_access
     get_digests_from_s3(bucket, url, path, aws_access_key_id, aws_secret_access_key, token, region)["md5"]
   end
 
-  def self.get_digests_from_s3(bucket,url,path,aws_access_key_id,aws_secret_access_key,token, region)
-    response = do_request("HEAD", url, bucket, path, aws_access_key_id, aws_secret_access_key, token, region)
 
-    etag = response.headers[:etag].gsub('"','')
-    digest = response.headers[:x_amz_meta_digest]
+  def self.get_digests_from_headers(headers)
+    etag = headers[:etag].gsub('"','')
+    digest = headers[:x_amz_meta_digest]
     digests = digest.nil? ? {} : Hash[digest.split(",").map {|a| a.split("=")}]
-
     return {"md5" => etag}.merge(digests)
   end
 
-  def self.get_from_s3(bucket, url, path, aws_access_key_id, aws_secret_access_key, token, region = nil)
+  def self.get_digests_from_s3(bucket,url,path,aws_access_key_id,aws_secret_access_key,token, region)
+    response = do_request("HEAD", url, bucket, path, aws_access_key_id, aws_secret_access_key, token, region)
+    return self.get_digests_from_headers(response.headers)
+  end
+
+  def self.get_from_s3(bucket, url, path, aws_access_key_id, aws_secret_access_key, token, region = nil, verify_md5=true)
     response = nil
     retries = 5
     for attempts in 0..retries
       begin
         response = do_request("GET", url, bucket, path, aws_access_key_id, aws_secret_access_key, token, region)
+
+        if verify_md5
+          md5 = self.get_digests_from_headers(response.headers)["md5"]
+          if not self.verify_md5_checksum(md5,response.file.path)
+            raise "unable to validate md5 checksum of downloaded object"
+          end
+        end
+
         return response
         # break
       rescue client::MovedPermanently, client::Found, client::TemporaryRedirect => e
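
To see what the extracted `get_digests_from_headers` returns, the parsing logic from the diff can be run on its own. The sketch below copies that logic into a free-standing method (`digests_from_headers` is a hypothetical name) and feeds it a plain Hash in place of the real response headers. The header values are made-up samples, and the `name=value,name=value` layout of `x-amz-meta-digest` is inferred from the split calls in the diff.

```ruby
# Standalone copy of the parsing logic shown in the diff, for illustration only.
def digests_from_headers(headers)
  etag = headers[:etag].gsub('"', '')          # strip the quotes S3 puts around the ETag
  digest = headers[:x_amz_meta_digest]         # optional user metadata header
  digests = digest.nil? ? {} : Hash[digest.split(",").map { |a| a.split("=") }]
  { "md5" => etag }.merge(digests)
end

# Sample headers (assumed values, not taken from a real S3 response).
headers = {
  etag: '"5eb63bbbe01eeed093cb22bb8f5acdc3"',
  x_amz_meta_digest: "sha256=abc123,sha1=def456"
}

digests_from_headers(headers).each do |name, value|
  puts "#{name}: #{value}"
end
```

With these inputs the result has three keys: `md5` taken from the ETag, plus `sha256` and `sha1` from the metadata header. When the metadata header is absent, only `md5` is returned. A digest value that itself contains `=` would be split into more than two pieces by `split("=")`; `split("=", 2)` is one way to keep such a value intact.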
