
Load time optimization: download search index asynchronously; make cacheable #2700


Open · wants to merge 8 commits into master

Conversation

@xal-0 commented May 6, 2025:

If you have not visited https://docs.julialang.org for 10 minutes, there is a very noticeable delay as all of the assets hosted by GitHub are either revalidated or redownloaded. The worst offender is search_index.js, which is 4.4 MiB (1 MiB gzipped).

This PR does two things to make the situation a little better:

  • We block rendering until the search index is fully loaded, even though search.js handles loading the index asynchronously just fine. The fix is a simple matter of making the script tag async.
  • The search index becomes stale absurdly quickly. Unfortunately, GitHub Pages sets Cache-Control: max-age=600 on everything it hosts, so we're mostly stuck with this situation there. We can still make it a little better for docs hosted elsewhere by putting the search index's hash into its filename, so it can stay fresh indefinitely (a sketch of the idea follows below).
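
For illustration, a minimal sketch of the content-hashing idea, assuming SHA.jl; the variable names are hypothetical, not the PR's actual code:

using SHA

# Content-addressed filename: the hash changes exactly when the index content
# changes, so the file can be served with a very long cache lifetime.
index_js = "var documenterSearchIndex = {}"  # placeholder for the generated index
index_hash = first(bytes2hex(sha256(codeunits(index_js))), 8)
filename = "search_index.$(index_hash).js"   # e.g. "search_index.0b1c2d3e.js"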

Before, with a slow connection and the render-blocking network requests boxed in red:
[screenshot]

After:
[screenshot]

@fingolfin (Collaborator) commented:

Thank you, sounds useful!

Prettier fails (note that we use Prettier v2, not v3).

Also needs a changelog entry

@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 * Refactored `deploydocs` internals to allow dispatch on the `versions` keyword, intended as a non-public API for DocumenterVitepress which cannot use the default versioning mechanism during deployment ([#2695]).
 * Use different banners for dev and unreleased docs ([#2382], [#2682])
+* The search index now loads asynchronously, and can be cached indefinitely. ([#2702], [#2700])
Collaborator:

@xal-0 those PRs also need to be added to the list at the end of the file (make sure to pull our changes first :-) )

Member:

Note: one can run make changelog, which will update the references in CHANGELOG.md automatically.

    generate_index!(ctx, nn)
end

# Make an attempt at avoiding spurious changes to the search index hash.
Collaborator:

Would this be accurate?

Suggested change
-# Make an attempt at avoiding spurious changes to the search index hash.
+# Sort the index to avoid spurious changes to the search index hash caused
+# by differences in the order of entries
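
For illustration only, the kind of deterministic ordering being suggested might look like the following; the record type and its fields are made up here, not Documenter's actual internals:

# Toy stand-in for a search index record.
struct SearchRecord
    src::String
    title::String
end

records = [SearchRecord("b.md", "Beta"), SearchRecord("a.md", "Alpha")]

# Sorting by a stable key makes the serialized index (and therefore its hash)
# independent of the order in which pages happen to be traversed.
sort!(records; by = r -> (r.src, r.title))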

@fingolfin (Collaborator) left a comment:

Looks good to me, thank you

@@ -800,9 +799,34 @@ function render(doc::Documenter.Document, settings::HTML = HTML())
         copy_asset("themes/$(theme).css", doc)
     end

-    size_limit_successes = map(collect(keys(doc.blueprint.pages))) do page
+    function page_navnode(page)
Member:

Bikeshed, but let's move this out of the function body perhaps. I am not a huge fan of declaring functions within functions. I think it just needs to take doc::Document as the first argument?

Comment on lines +1866 to +1871
return foreach(getpage(ctx, navnode).mdast.children) do node
    rec = searchrecord(ctx, navnode, node)
    if !isnothing(rec)
        push!(ctx.search_index, rec)
    end
end
Member:

foreach doesn't return anything, so it looks a bit weird to me.

Suggested change
-return foreach(getpage(ctx, navnode).mdast.children) do node
+foreach(getpage(ctx, navnode).mdast.children) do node
     rec = searchrecord(ctx, navnode, node)
     if !isnothing(rec)
         push!(ctx.search_index, rec)
     end
 end
+return nothing

Also, as a side note: I don't mind having foreach-es, but I don't really find them particularly idiomatic. I'd just write this as for node in getpage(ctx, navnode).mdast.children ... end.
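
Spelled out with the snippet above, that for-loop version would read:

for node in getpage(ctx, navnode).mdast.children
    rec = searchrecord(ctx, navnode, node)
    if !isnothing(rec)
        push!(ctx.search_index, rec)
    end
end
return nothing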

Comment on lines +808 to +812
foreach(keys(doc.blueprint.pages)) do page
    nn, idx = page_navnode(page)
    @debug "Indexing $(page) [$(repr(idx))]"
    generate_index!(ctx, nn)
end
Member:

Mmm. I don't think this works. A lot of the pushing into .search_index happens during the DOM generation deep in the call tree, i.e. downstream from render_page. So right now a bunch of stuff is missing (e.g. docstrings).

But yea, we have a chicken-and-egg situation here, where in order to get the full search index, we need to render everything, but we don't know how to render it because we don't know the search index hash yet, since we haven't generated the index yet.

I do think that with some reorganization of the code, we could generate all the DOM first and then inject the search index file name later. One nice property of the current structure, though, is that we don't need to keep the DOM of all the pages in memory at once, and I am not sure how we could preserve that.

What I might suggest (unless you have a good, simple idea of how to tackle this) is to undo the hashing-related changes for now. There's also another issue with it: MultiDocumenter, for example, relies on the search_index.js file being where it is, so we may need to think about this change a bit more anyway. But we should definitely get the front-end loading changes in.

@mortenpi (Member) left a comment:

The hashing logic needs changes: #2700 (comment)

@fingolfin (Collaborator) commented:

I see the value of inserting the hash, but I agree with @mortenpi here: it is better to do this separately, and then differently.

A simpler solution than refactoring the rendering pipeline would be simple post-processing: after the HTML docs have been generated, compute the checksum of the (by now complete) search_index.js, then inject the hash into each HTML file, using a regex to pick out the relevant places. Yes, it is a bit of a hack, but it is very easy to implement and understand.

I believe you don't need to rename search_index.js: a URL of the form search_index.js?HASH should do the trick as well.
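
A hedged sketch of that post-processing pass, assuming SHA.jl; the function name is made up, and a plain substring replacement stands in for the regex mentioned above:

using SHA

# Hypothetical post-build step: hash the completed search_index.js, then
# rewrite every rendered HTML page to reference it with a cache-busting
# query string.
function inject_search_index_hash!(build_dir::AbstractString)
    index_hash = first(bytes2hex(sha256(read(joinpath(build_dir, "search_index.js")))), 8)
    for (root, _, files) in walkdir(build_dir), file in files
        endswith(file, ".html") || continue
        path = joinpath(root, file)
        html = read(path, String)
        write(path, replace(html, "search_index.js" => "search_index.js?v=$(index_hash)"))
    end
    return nothing
end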

@Hetarth02 (Contributor) commented:

"asynchronously loading the index"

If we do this, wouldn't users face a problem when trying to search while the index is still loading?

For example: as soon as the page loads, a user tries to search something by pressing Ctrl + /. There are two scenarios:

  1. The modal UI doesn't open (which is unlikely, since I think it's fairly independent of the search logic), and the user has to wait or try repeatedly.

  2. The modal UI opens, but searching does nothing because the index is still being loaded in the background.
