Fix #73 - Change cache to take header changes into account #78

TheCloudlessSky · 2015-08-06T15:49:15Z

As per #73, the current caching implementation doesn't take header changes into account. This is still a WIP until the following is done:

Sanity checks
Tests
Update docs

Cache File Format Change

The cache file format has been updated to include the headers:

{
  "<file.s3.path>": {
    "etag": "<etag>",
    "headers": {
      "header1": "header1-value",
      "header2": "header2-value"
    }
  }
}

When loading the cache file, it'll upgrade the old format to the new format (backwards compatibility).

Implementation

From what I found with the current implementation, there are two states that are affected: cache and skip. The skip state does the same work as the cache comparison, except it compares with the current headers of the actual S3 object.

The cache() stream can now take an option to include headers as part of the cache check:

.pipe(cache({ headers: true }))

For the cache state, the etag and all headers set on file.s3.headers are compared against the cached values. If a header is added, deleted or modified, the file's state is not set to cache.

For the skip state, the etag and all headers set on file.s3.headers are compared against the
normalized response headers. If a header is added, deleted or modified, the file's state is not set to skip. However, because the x-amz-acl header uses a name of a canned ACL, it can't compared since the canned ACL is transformed into a full ACL on the object. Using client.getObjectAcl() could be used, to get the full ACL and then map it back to the canned name, but that seems to difficult. So, if the x-amz-acl changes, the user should use the force option on the publish() stream.

Also, the cache headers option is ignored when doing the comparison for the skip state because skip is used regardless of caching being enabled/disabled.

Questions

The current implementation always reads from the cache file but the cache() stream isn't used. Should this be changed to not read from the cache unless the cache() option is used? It could lead to confusing behavior with the current implementation.
In .cache() doesn't take header changes into account #73 @pgherveou mentioned specifying the headers to be compared. The problem I ran into with this is what about the skip state? If the user doesn't use cache, we still should be comparing headers so that the file isn't skipped when headers change. As it stands right now, having the headers option on cache() doesn't totally make sense (since it affects the cache and skip states). Should we remove the option all together and compare all headers for both the cache and skip states?

😄

pgherveou · 2015-08-08T09:10:06Z

Hey thks for taking care of this

for your first question
I think this is a limitation of the way its currently implemented. maybe a better way would be to break down into more streams one that prepare the file with the headers, one that filter cached file, one that publish and finally one that add to the cache

not sure to fully understand the second one but go for the simplest option if this make your life easier ;)

TheCloudlessSky added 4 commits August 5, 2015 12:39

Early exit instead of long condition.

cf7737d

Move cache checking to methods.

3039801

Add lodash and save headers to cache.

afe02f1

Compare headers for caching/skipping files.

d6cac7c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #73 - Change cache to take header changes into account #78

Fix #73 - Change cache to take header changes into account #78

TheCloudlessSky commented Aug 6, 2015

pgherveou commented Aug 8, 2015

Fix #73 - Change cache to take header changes into account #78

Are you sure you want to change the base?

Fix #73 - Change cache to take header changes into account #78

Conversation

TheCloudlessSky commented Aug 6, 2015

Cache File Format Change

Implementation

Questions

pgherveou commented Aug 8, 2015