HTTP Caching in v3 - Default Policy and Configuration Options #356
dillonkearns started this conversation in Ideas
I'm working on the default HTTP caching behavior, as well as configuration options, for the HTTP API in elm-pages (`BackendTask.Http`, previously called `DataSource.Http` in v2). This NPM package gives me a lot of what I need out of the box and is well-maintained: https://github.com/npm/make-fetch-happen

One design challenge I'm running into is that a lot of these semantics for caching HTTP responses are designed for clients (browsers), but when you're caching things for consumption on servers I wonder whether those semantics should change at all.
When running a local dev server it might actually be reasonable to use the default caching semantics (unless the user explicitly opts out of them in favor of more aggressive freshness revalidation), but in prod you probably want to err on the side of making sure things are fresh.
Definition of freshness for Cached HTTP Responses
What does freshness mean for an HTTP response? Servers can return response headers like `cache-control` and `expires` to explicitly set the policy for how long data stays fresh and when it needs to be revalidated. In some cases this accurately reflects how often the data can change - for example, if some data changes every hour, or every week, then the server can send headers that describe exactly when it will expire. But often servers set these values as heuristics, for example saying that you can cache something locally for 5 minutes without checking for freshness. Once the cached copy is stale, the client can use a Conditional Request to revalidate it: if the resource hasn't changed, the server skips returning the actual data (304 Not Modified) and the cached copy can be reused, avoiding the extra network transfer.

There are also implicit heuristics for freshness when a server doesn't explicitly describe it. It's up to the client to decide which heuristics to use, but the common recommendation is to consider a resource fresh for about 10% of the time since it was last modified - so a resource last modified ten days ago would be treated as fresh for roughly one day before it needs to be revalidated.
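As an illustration (a made-up exchange, not from any particular API), explicit freshness headers and a later conditional revalidation look roughly like this:

```
# Response with an explicit freshness policy: reusable for 5 minutes, then revalidate
HTTP/1.1 200 OK
Cache-Control: max-age=300
Last-Modified: Mon, 02 May 2022 10:00:00 GMT
ETag: "abc123"

# Conditional request once the cached copy is stale
GET /data HTTP/1.1
If-None-Match: "abc123"

# Nothing changed: the body is omitted and the cached copy is reused
HTTP/1.1 304 Not Modified
```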
Should heuristics be used for any defaults?
My gut feeling is that for a production build it might make sense to bypass both kinds of heuristics: the implicit ones, and the explicit policies defined by server response headers, which may themselves just be heuristics. That seems like a reasonable default to me: still allow conditional requests, but always check for freshness in production builds. The user can then modify their production HTTP caching configuration to bypass this setting if they prefer.
For the dev server and debug builds, I think it is reasonable to follow the explicit caching policy set by HTTP response headers. I am a little unsure about the implicit heuristics, though. For many of my projects, I don't want to barrage the APIs I consume (and risk getting rate-limited) while I'm doing development, and with elm-pages hot reloading of Route data it's easy for that to happen. On the other hand, it could be frustrating if users see stale data - but getting rate-limited is also likely and just as frustrating. So maybe a reasonable default here is to use the default caching heuristics (both implicit and explicit), and let users explicitly opt out of this.
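To make the space of options concrete, here is a rough sketch of the strategies discussed above as an Elm custom type, along with one possible choice of defaults per build mode. None of these names exist in elm-pages; they are purely illustrative.

```elm
-- Hypothetical names, not part of the elm-pages API: this just enumerates the
-- caching strategies discussed above.
type CacheStrategy
    = FollowHeadersAndHeuristics -- explicit headers plus the implicit ~10% heuristic
    | FollowExplicitHeadersOnly -- honor cache-control/expires, skip implicit heuristics
    | AlwaysRevalidate -- allow conditional requests, but never trust freshness
    | NeverCache -- bypass the HTTP cache entirely


-- One possible set of defaults along the lines described above.
defaultCacheStrategy : { isDevServer : Bool } -> CacheStrategy
defaultCacheStrategy { isDevServer } =
    if isDevServer then
        FollowHeadersAndHeuristics

    else
        AlwaysRevalidate
```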
Global vs. Per-Request Configuration
My initial thought is that there are 3 places to configure HTTP caching options:
The global configuration options would live as JSON keys in the `elm-pages.config.mjs` file. The per-request configuration would be part of the Elm function calls for `BackendTask.Http.request`, etc.
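As a sketch of what the per-request side could look like (the record shape follows where `BackendTask.Http.request` seems to be heading, but nothing here is final, and the cache-related field is entirely hypothetical):

```elm
import BackendTask exposing (BackendTask)
import BackendTask.Http
import FatalError exposing (FatalError)
import Json.Decode as Decode


-- Sketch of a per-request call; the field names are not final.
repoStars : BackendTask { fatal : FatalError, recoverable : BackendTask.Http.Error } Int
repoStars =
    BackendTask.Http.request
        { url = "https://api.github.com/repos/dillonkearns/elm-pages"
        , method = "GET"
        , headers = []
        , body = BackendTask.Http.emptyBody

        -- hypothetical per-request cache override, e.g.:
        -- , cacheStrategy = Just AlwaysRevalidate
        }
        (BackendTask.Http.expectJson (Decode.field "stargazers_count" Decode.int))
```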
Implications for Determinism

elm-pages v1 and v2 were focused on static site generation: you run a build step, it resolves the data for each page, and it outputs a set of files.

With v3 introducing dynamic SSR (rendering pages at request time), and being able to perform effectful operations through `BackendTask`s (called `DataSource`s in v2), the semantics have changed. Before v3, the idea was that resolving data should always be deterministic. If you hit an API, the build step would aggressively cache that HTTP response, which helped ensure that you could think of an HTTP request as a resource at a fixed point in time, not a side-effect that could be run multiple times and potentially return different data each time it is called. With v3, that has changed, because you may want to fire off a request to an API that sends an email, updates the number of views of a page, etc. (i.e., performs some side-effect in the world).
So while it's now important to be able to rely on an HTTP request being performed every time it's referenced for "unsafe" HTTP methods (methods besides GET and HEAD), I wonder if it's okay, or even desirable, to avoid performing safe GET requests more than once within a build. If several different Routes reference a GET `BackendTask` to the same endpoint in the course of an `elm-pages build` in v3, maybe some amount of HTTP caching for production builds can avoid making repeat requests in quick succession.
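Here's a sketch of that distinction, assuming `BackendTask.Http` ends up exposing helpers along these lines (the endpoints are made up, and the exact function names may differ in the final API):

```elm
import BackendTask exposing (BackendTask)
import BackendTask.Http
import FatalError exposing (FatalError)
import Json.Decode as Decode


-- A safe GET: if several Routes reference this during a single `elm-pages build`,
-- repeat occurrences could be served from the build's HTTP cache instead of
-- hitting the API again.
viewCount : BackendTask { fatal : FatalError, recoverable : BackendTask.Http.Error } Int
viewCount =
    BackendTask.Http.getJson
        "https://api.example.com/stats" -- hypothetical endpoint
        (Decode.field "views" Decode.int)


-- An "unsafe" method performs a side-effect, so it should never be deduplicated
-- or cached: it must run every time it is referenced.
recordPageView : BackendTask { fatal : FatalError, recoverable : BackendTask.Http.Error } String
recordPageView =
    BackendTask.Http.request
        { url = "https://api.example.com/page-views" -- hypothetical endpoint
        , method = "POST"
        , headers = []
        , body = BackendTask.Http.emptyBody
        }
        BackendTask.Http.expectString
```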
Feedback

I'm writing my thoughts here since there are a lot of references that are good to have linked and a lot of design decisions that I want to document. I'd also love to hear any ideas before I finalize these APIs for the v3 release. Thanks for reading!