HTTP Caching in v3 - Default Policy and Configuration Options #356
dillonkearns started this conversation in Ideas
I'm working on the default HTTP caching behavior, as well as configuration options, for the HTTP API in elm-pages (`BackendTask.Http`, previously called `DataSource.Http` in v2). This NPM package gives me a lot of what I need out of the box and is well-maintained: https://github.com/npm/make-fetch-happen

One design challenge I'm running into is that a lot of these semantics for caching HTTP responses are designed for clients (browsers), but when you're caching things for consumption on servers I wonder whether those semantics should change at all.
When running a local dev server it might actually be reasonable to use the default caching semantics (unless the user explicitly opts out of them in favor of more aggressive freshness revalidation), but in prod you probably want to err on the side of making sure things are fresh.
Definition of freshness for Cached HTTP Responses
What does freshness mean for an HTTP response? Servers can return response headers like `cache-control` and `expires` to explicitly set the policy for how long data stays fresh and when it needs to be revalidated. In some cases this accurately reflects how often the data can change - for example, if some data changes every hour, or every week, then the server can send headers that describe exactly when it will expire. But often servers set these values as heuristics, for example saying that you can cache something locally for 5 minutes without checking for freshness. Once the cached copy is stale, the client can use a Conditional Request to revalidate it: if the resource hasn't changed, the server skips returning the actual data (304 Not Modified) and the cached copy can be reused, avoiding the extra network transfer.

There are also implicit heuristics for freshness when a server doesn't explicitly describe it. It's up to the client to decide which heuristics to use, but the common recommendation is to consider a resource fresh for about 10% of the time since it was last modified - so a resource last modified ten days ago would be treated as fresh for roughly one day before it needs to be revalidated.
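As an illustration (a made-up exchange, not from any particular API), explicit freshness headers and a later conditional revalidation look roughly like this:

```
# Response with an explicit freshness policy: reusable for 5 minutes, then revalidate
HTTP/1.1 200 OK
Cache-Control: max-age=300
Last-Modified: Mon, 02 May 2022 10:00:00 GMT
ETag: "abc123"

# Conditional request once the cached copy is stale
GET /data HTTP/1.1
If-None-Match: "abc123"

# Nothing changed: the body is omitted and the cached copy is reused
HTTP/1.1 304 Not Modified
```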
Should heuristics be used for any defaults?
My gut feeling is that for a production build it might make sense to bypass both kinds of heuristics: the implicit ones, and the explicit policies defined by server response headers, which may themselves just be heuristics. That seems like a reasonable default to me: still allow conditional requests, but always check for freshness in production builds. The user can then modify their production HTTP caching configuration to bypass this setting if they prefer.
For the dev server and debug builds, I think it is reasonable to follow the explicit caching policy set by HTTP response headers. I am a little unsure about the implicit heuristics, though. For many of my projects, I don't want to barrage the APIs I consume (and risk getting rate-limited) while I'm doing development, and with elm-pages hot reloading of Route data it's easy for that to happen. On the other hand, it could be frustrating if users see stale data - but getting rate-limited is also likely and just as frustrating. So maybe a reasonable default here is to use the default caching heuristics (both implicit and explicit), and let users explicitly opt out of this.
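To make the space of options concrete, here is a rough sketch of the strategies discussed above as an Elm custom type, along with one possible choice of defaults per build mode. None of these names exist in elm-pages; they are purely illustrative.

```elm
-- Hypothetical names, not part of the elm-pages API: this just enumerates the
-- caching strategies discussed above.
type CacheStrategy
    = FollowHeadersAndHeuristics -- explicit headers plus the implicit ~10% heuristic
    | FollowExplicitHeadersOnly -- honor cache-control/expires, skip implicit heuristics
    | AlwaysRevalidate -- allow conditional requests, but never trust freshness
    | NeverCache -- bypass the HTTP cache entirely


-- One possible set of defaults along the lines described above.
defaultCacheStrategy : { isDevServer : Bool } -> CacheStrategy
defaultCacheStrategy { isDevServer } =
    if isDevServer then
        FollowHeadersAndHeuristics

    else
        AlwaysRevalidate
```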
Global vs. Per-Request Configuration
My initial thought is that there are 3 places to configure HTTP caching options:
The global configuration options would live as JSON keys in the `elm-pages.config.mjs` file. The per-request configuration would be part of the Elm function calls for `BackendTask.Http.request`, etc.
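As a sketch of what the per-request side could look like (the record shape follows where `BackendTask.Http.request` seems to be heading, but nothing here is final, and the cache-related field is entirely hypothetical):

```elm
import BackendTask exposing (BackendTask)
import BackendTask.Http
import FatalError exposing (FatalError)
import Json.Decode as Decode


-- Sketch of a per-request call; the field names are not final.
repoStars : BackendTask { fatal : FatalError, recoverable : BackendTask.Http.Error } Int
repoStars =
    BackendTask.Http.request
        { url = "https://api.github.com/repos/dillonkearns/elm-pages"
        , method = "GET"
        , headers = []
        , body = BackendTask.Http.emptyBody

        -- hypothetical per-request cache override, e.g.:
        -- , cacheStrategy = Just AlwaysRevalidate
        }
        (BackendTask.Http.expectJson (Decode.field "stargazers_count" Decode.int))
```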
Implications for Determinism

elm-pages v1 and v2 were focused on static site generation: you run a build step, it resolves the data for each page, and it outputs a set of files.

With v3 introducing dynamic SSR (rendering pages at request time), and being able to perform effectful operations through `BackendTask`s (called `DataSource`s in v2), the semantics have changed. Before v3, the idea was that resolving data should always be deterministic. If you hit an API, the build step would aggressively cache that HTTP response, which helped ensure that you could think of an HTTP request as a resource at a fixed point in time, not a side-effect that could be run multiple times and potentially return different data each time it is called. With v3, that has changed, because you may want to fire off a request to an API that sends an email, updates the number of views of a page, etc. (i.e., performs some side-effect in the world).
So while it's now important to be able to rely on an HTTP request being performed every time it's referenced for "unsafe" HTTP methods (methods besides GET and HEAD), I wonder if it's okay, or even desirable, to avoid performing safe GET requests more than once within a build. If several different Routes reference a GET `BackendTask` to the same endpoint in the course of an `elm-pages build` in v3, maybe some amount of HTTP caching for production builds can avoid making repeat requests in quick succession.
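Here's a sketch of that distinction, assuming `BackendTask.Http` ends up exposing helpers along these lines (the endpoints are made up, and the exact function names may differ in the final API):

```elm
import BackendTask exposing (BackendTask)
import BackendTask.Http
import FatalError exposing (FatalError)
import Json.Decode as Decode


-- A safe GET: if several Routes reference this during a single `elm-pages build`,
-- repeat occurrences could be served from the build's HTTP cache instead of
-- hitting the API again.
viewCount : BackendTask { fatal : FatalError, recoverable : BackendTask.Http.Error } Int
viewCount =
    BackendTask.Http.getJson
        "https://api.example.com/stats" -- hypothetical endpoint
        (Decode.field "views" Decode.int)


-- An "unsafe" method performs a side-effect, so it should never be deduplicated
-- or cached: it must run every time it is referenced.
recordPageView : BackendTask { fatal : FatalError, recoverable : BackendTask.Http.Error } String
recordPageView =
    BackendTask.Http.request
        { url = "https://api.example.com/page-views" -- hypothetical endpoint
        , method = "POST"
        , headers = []
        , body = BackendTask.Http.emptyBody
        }
        BackendTask.Http.expectString
```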
Feedback

I'm writing my thoughts here since there are a lot of references that are good to have linked and a lot of design decisions that I want to document. I'd also love to hear any ideas before I finalize these APIs for the v3 release. Thanks for reading!