Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #73 - Change cache to take header changes into account #78

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

TheCloudlessSky
Copy link

As per #73, the current caching implementation doesn't take header changes into account. This is still a WIP until the following is done:

  • Sanity checks
  • Tests
  • Update docs

Cache File Format Change

The cache file format has been updated to include the headers:

{
  "<file.s3.path>": {
    "etag": "<etag>",
    "headers": {
      "header1": "header1-value",
      "header2": "header2-value"
    }
  }
}

When loading the cache file, it'll upgrade the old format to the new format (backwards compatibility).

Implementation

From what I found with the current implementation, there are two states that are affected: cache and skip. The skip state does the same work as the cache comparison, except it compares with the current headers of the actual S3 object.

The cache() stream can now take an option to include headers as part of the cache check:

.pipe(cache({ headers: true }))

For the cache state, the etag and all headers set on file.s3.headers are compared against the cached values. If a header is added, deleted or modified, the file's state is not set to cache.

For the skip state, the etag and all headers set on file.s3.headers are compared against the
normalized response headers. If a header is added, deleted or modified, the file's state is not set to skip. However, because the x-amz-acl header uses a name of a canned ACL, it can't compared since the canned ACL is transformed into a full ACL on the object. Using client.getObjectAcl() could be used, to get the full ACL and then map it back to the canned name, but that seems to difficult. So, if the x-amz-acl changes, the user should use the force option on the publish() stream.

Also, the cache headers option is ignored when doing the comparison for the skip state because skip is used regardless of caching being enabled/disabled.

Questions

  1. The current implementation always reads from the cache file but the cache() stream isn't used. Should this be changed to not read from the cache unless the cache() option is used? It could lead to confusing behavior with the current implementation.
  2. In .cache() doesn't take header changes into account #73 @pgherveou mentioned specifying the headers to be compared. The problem I ran into with this is what about the skip state? If the user doesn't use cache, we still should be comparing headers so that the file isn't skipped when headers change. As it stands right now, having the headers option on cache() doesn't totally make sense (since it affects the cache and skip states). Should we remove the option all together and compare all headers for both the cache and skip states?

😄

@pgherveou
Copy link
Owner

Hey thks for taking care of this

for your first question
I think this is a limitation of the way its currently implemented. maybe a better way would be to break down into more streams one that prepare the file with the headers, one that filter cached file, one that publish and finally one that add to the cache

not sure to fully understand the second one but go for the simplest option if this make your life easier ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants