
Load time optimization: download search index asynchronously; make cacheable #2700


Open · wants to merge 8 commits into master

Conversation

@xal-0 commented May 6, 2025:

If you have not visited https://docs.julialang.org for 10 minutes, there is a very noticeable delay as all of the assets hosted by GitHub are either revalidated or redownloaded. The worst offender is search_index.js, which is 4.4 MiB (1 MiB gzipped).

This PR does two things to make the situation a little better:

  • We block rendering until the search index is fully loaded, even though search.js handles loading the index asynchronously just fine. The fix is a simple matter of making the script tag async.
  • The search index becomes stale absurdly quickly. Unfortunately, GitHub Pages sets Cache-Control: max-age=600 on everything it hosts, so we're mostly stuck with this situation there. We can still make it a little better for docs hosted elsewhere by putting the search index's hash into its filename, so it can stay fresh indefinitely (a sketch of the idea follows below).
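
For illustration, a minimal sketch of the content-hashing idea, assuming SHA.jl; the variable names are hypothetical, not the PR's actual code:

using SHA

# Content-addressed filename: the hash changes exactly when the index content
# changes, so the file can be served with a very long cache lifetime.
index_js = "var documenterSearchIndex = {}"  # placeholder for the generated index
index_hash = first(bytes2hex(sha256(codeunits(index_js))), 8)
filename = "search_index.$(index_hash).js"   # e.g. "search_index.0b1c2d3e.js"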

Before, with a slow connection and the render-blocking network requests boxed in red:
[screenshot]

After:
[screenshot]

@fingolfin (Collaborator) commented:

Thank you, sounds useful!

Prettier fails (note that we use Prettier v2, not v3).

Also needs a changelog entry

@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 * Refactored `deploydocs` internals to allow dispatch on the `versions` keyword, intended as a non-public API for DocumenterVitepress which cannot use the default versioning mechanism during deployment ([#2695]).
 * Use different banners for dev and unreleased docs ([#2382], [#2682])
+* The search index now loads asynchronously, and can be cached indefinitely. ([#2702], [#2700])
Collaborator:

@xal-0 those PRs also need to be added to the list at the end of the file (make sure to pull our changes first :-) )

Member:

Note: one can run make changelog, which will update the references in CHANGELOG.md automatically.

    generate_index!(ctx, nn)
end

# Make an attempt at avoiding spurious changes to the search index hash.
Collaborator:

Would this be accurate?

Suggested change
-# Make an attempt at avoiding spurious changes to the search index hash.
+# Sort the index to avoid spurious changes to the search index hash caused
+# by differences in the order of entries
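
For illustration only, the kind of deterministic ordering being suggested might look like the following; the record type and its fields are made up here, not Documenter's actual internals:

# Toy stand-in for a search index record.
struct SearchRecord
    src::String
    title::String
end

records = [SearchRecord("b.md", "Beta"), SearchRecord("a.md", "Alpha")]

# Sorting by a stable key makes the serialized index (and therefore its hash)
# independent of the order in which pages happen to be traversed.
sort!(records; by = r -> (r.src, r.title))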

@fingolfin (Collaborator) left a comment:

Looks good to me, thank you

@@ -800,9 +799,34 @@ function render(doc::Documenter.Document, settings::HTML = HTML())
         copy_asset("themes/$(theme).css", doc)
     end

-    size_limit_successes = map(collect(keys(doc.blueprint.pages))) do page
+    function page_navnode(page)
Member:

Bikeshed, but let's move this out of the function body perhaps. I am not a huge fan of declaring functions within functions. I think it just needs to take doc::Document as the first argument?

Comment on lines +1866 to +1871
return foreach(getpage(ctx, navnode).mdast.children) do node
    rec = searchrecord(ctx, navnode, node)
    if !isnothing(rec)
        push!(ctx.search_index, rec)
    end
end
Member:

foreach doesn't return anything, so it looks a bit weird to me.

Suggested change
-return foreach(getpage(ctx, navnode).mdast.children) do node
+foreach(getpage(ctx, navnode).mdast.children) do node
     rec = searchrecord(ctx, navnode, node)
     if !isnothing(rec)
         push!(ctx.search_index, rec)
     end
 end
+return nothing

Also, as a side note: I don't mind having foreach-es, but I don't really find them particularly idiomatic. I'd just write this as for node in getpage(ctx, navnode).mdast.children ... end.
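
Spelled out with the snippet above, that for-loop version would read:

for node in getpage(ctx, navnode).mdast.children
    rec = searchrecord(ctx, navnode, node)
    if !isnothing(rec)
        push!(ctx.search_index, rec)
    end
end
return nothing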

Comment on lines +808 to +812
foreach(keys(doc.blueprint.pages)) do page
    nn, idx = page_navnode(page)
    @debug "Indexing $(page) [$(repr(idx))]"
    generate_index!(ctx, nn)
end
Member:

Mmm. I don't think this works. A lot of the pushing into .search_index happens during the DOM generation deep in the call tree, i.e. downstream from render_page. So right now a bunch of stuff is missing (e.g. docstrings).

But yea, we have a chicken-and-egg situation here, where in order to get the full search index, we need to render everything, but we don't know how to render it because we don't know the search index hash yet, since we haven't generated the index yet.

I do think that with some reorganization of the code, we could generate all the DOM first and then inject the search index file name later. One nice property of the current structure, though, is that we don't need to keep the DOM of all the pages in memory at once, and I am not sure how we could preserve that.

What I might suggest (unless you have a good, simple idea of how to tackle this) is to undo the hashing-related changes for now. There's also another issue with it: MultiDocumenter, for example, relies on the search_index.js file being where it is, so we may need to think about this change a bit more anyway. But we should definitely get the front-end loading changes in.

@mortenpi (Member) left a comment:

The hashing logic needs changes: #2700 (comment)

@fingolfin (Collaborator) commented:

I see the value of inserting the hash, but I agree with @mortenpi here: it is better to do this separately, and then differently.

A simpler solution than refactoring the rendering pipeline would be simple post-processing: after the HTML docs have been generated, compute the checksum of the (by now complete) search_index.js, then inject the hash into each HTML file, using a regex to pick out the relevant places. Yes, it is a bit of a hack, but it is very easy to implement and understand.

I believe you don't need to rename search_index.js: a URL of the form search_index.js?HASH should do the trick as well.
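
A hedged sketch of that post-processing pass, assuming SHA.jl; the function name is made up, and a plain substring replacement stands in for the regex mentioned above:

using SHA

# Hypothetical post-build step: hash the completed search_index.js, then
# rewrite every rendered HTML page to reference it with a cache-busting
# query string.
function inject_search_index_hash!(build_dir::AbstractString)
    index_hash = first(bytes2hex(sha256(read(joinpath(build_dir, "search_index.js")))), 8)
    for (root, _, files) in walkdir(build_dir), file in files
        endswith(file, ".html") || continue
        path = joinpath(root, file)
        html = read(path, String)
        write(path, replace(html, "search_index.js" => "search_index.js?v=$(index_hash)"))
    end
    return nothing
end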

@Hetarth02 (Contributor) commented:

"asynchronously loading the index"

If we do this, wouldn't users face a problem when trying to search while the index is still loading?

For example: as soon as the page loads, a user tries to search something by pressing Ctrl + /. There are two scenarios:

  1. The modal UI doesn't open (which is unlikely, since I think it's fairly independent of the search logic), and the user has to wait or try repeatedly.

  2. The modal UI opens, but searching does nothing because the index is still being loaded in the background.
