Load time optimization: download search index asynchronously; make cacheable #2700
base: master
Conversation
Thank you, sounds useful! Prettier fails (note that we use Prettier v2, not v3). Also needs a changelog entry.
```diff
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * Refactored `deploydocs` internals to allow dispatch on the `versions` keyword, intended as a non-public API for DocumenterVitepress which cannot use the default versioning mechanism during deployment ([#2695]).
 * Use different banners for dev and unreleased docs ([#2382], [#2682])
+* The search index now loads asynchronously, and can be cached indefinitely. ([#2702], [#2700])
```
@xal-0 those PRs also need to be added to the list at the end of the file (make sure to pull our changes first :-) )
Note: one can run `make changelog`, which will update the references in `CHANGELOG.md` automatically.
```julia
    generate_index!(ctx, nn)
end

# Make an attempt at avoiding spurious changes to the search index hash.
```
Would this be accurate?
```suggestion
# Sort the index to avoid spurious changes to the search index hash caused
# by differences in the order of entries
```
Looks good to me, thank you
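For context, a minimal self-contained illustration of the idea being discussed; the record fields here are simplified stand-ins, not Documenter's actual `SearchRecord` fields. Sorting by a stable key makes the serialised index, and therefore its hash, independent of the order in which pages happened to be indexed.

```julia
# Toy records standing in for search index entries.
struct Rec
    src::String       # source page path
    fragment::String  # anchor within the page
    title::String
end

records = [Rec("b.md", "#setup", "Setup"), Rec("a.md", "#intro", "Intro")]

# Sort by a deterministic key so the serialised index does not depend on traversal order.
sort!(records; by = r -> (r.src, r.fragment, r.title))
```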
```diff
@@ -800,9 +799,34 @@ function render(doc::Documenter.Document, settings::HTML = HTML())
         copy_asset("themes/$(theme).css", doc)
     end

-    size_limit_successes = map(collect(keys(doc.blueprint.pages))) do page
+    function page_navnode(page)
```
Bikeshed, but let's move this out of the function body perhaps. I am not a huge fan of declaring functions within functions. I think it just needs to take `doc::Document` as the first argument?
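A rough sketch of what the suggested refactor might look like; the body is an assumption about what the closure does (look up a page's `NavNode` in the navigation list), not the PR's actual implementation.

```julia
# Hoist `page_navnode` to the top level and pass the Document in explicitly
# instead of closing over it.
function page_navnode(doc::Documenter.Document, page::AbstractString)
    idx = findfirst(navnode -> navnode.page == page, doc.internal.navlist)
    navnode = isnothing(idx) ? nothing : doc.internal.navlist[idx]
    # Pages that do not appear in the navigation would still need a NavNode here.
    return navnode, idx
end
```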
```julia
return foreach(getpage(ctx, navnode).mdast.children) do node
    rec = searchrecord(ctx, navnode, node)
    if !isnothing(rec)
        push!(ctx.search_index, rec)
    end
end
```
`foreach` doesn't return anything, so it looks a bit weird to me.
```suggestion
foreach(getpage(ctx, navnode).mdast.children) do node
    rec = searchrecord(ctx, navnode, node)
    if !isnothing(rec)
        push!(ctx.search_index, rec)
    end
end
return nothing
```
Also, as a side note: I don't mind having `foreach`-es, but I don't really find them particularly idiomatic. I'd just write this as `for node in getpage(ctx, navnode).mdast.children ... end`.
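For comparison, a sketch of the plain `for` loop variant, wrapped in a helper so the explicit `return nothing` is visible; the helper name and signature are illustrative, not code from the PR.

```julia
function index_page_nodes!(ctx, navnode)
    for node in getpage(ctx, navnode).mdast.children
        rec = searchrecord(ctx, navnode, node)
        if !isnothing(rec)
            push!(ctx.search_index, rec)
        end
    end
    return nothing
end
```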
```julia
foreach(keys(doc.blueprint.pages)) do page
    nn, idx = page_navnode(page)
    @debug "Indexing $(page) [$(repr(idx))]"
    generate_index!(ctx, nn)
end
```
Mmm. I don't think this works. A lot of the pushing into `.search_index` happens during the DOM generation deep in the call tree, i.e. downstream from `render_page`. So right now a bunch of stuff is missing (e.g. docstrings).

But yeah, we have a chicken-and-egg situation here: in order to get the full search index we need to render everything, but we don't know how to render it because we don't know the search index hash yet, since we haven't generated the index yet.

I do think that with some reorganization of the code we could somehow generate all the DOM first, and then inject the search index file name later. Although one nice property of the current structure is that we don't need to keep the DOM of all the pages in memory at once, and I am not sure how we could preserve that.

What I might suggest (unless you have a good, simple idea of how to tackle this) is to undo the hashing-related changes for now. There's also another issue with it: MultiDocumenter, for example, relies on the `search_index.js` file being where it is, so we may need to think about this change a bit more anyway. But we should definitely get the front-end loading changes in.
The hashing logic needs changes: #2700 (comment)
I see the value in inserting the hash, but I agree with @mortenpi here: it is better to do this separately, and then differently. A simpler solution than refactoring the rendering pipeline would be simple post-processing: after the HTML docs have been generated, compute the checksum of the (by now complete) search index and only then insert it into the generated pages. I believe you don't need to rename `search_index.js` for this, either.
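A rough sketch of that post-processing idea; the file layout, the placeholder string, and the helper name are assumptions for illustration, not Documenter's actual API. After all pages are written, hash the finished search index and substitute the hash into every generated HTML file.

```julia
using SHA

function inject_search_index_hash!(build_dir::AbstractString)
    index_path = joinpath(build_dir, "search_index.js")
    hash = bytes2hex(sha256(read(index_path)))
    for (root, _, files) in walkdir(build_dir)
        for file in files
            endswith(file, ".html") || continue
            path = joinpath(root, file)
            html = read(path, String)
            # "SEARCH_INDEX_HASH" is a hypothetical placeholder that the renderer
            # would have emitted into each page's <script> tag.
            write(path, replace(html, "SEARCH_INDEX_HASH" => hash))
        end
    end
    return hash
end
```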
If we do this, then wouldn't the user face a problem when trying to search while the index is still loading asynchronously? For example, as soon as the page loads, I as a user try to search something. The user presses …
If you have not visited https://docs.julialang.org for 10 minutes, there is a very noticeable delay as all of the assets hosted by GitHub are either validated or redownloaded. The worst offender is `search_index.js`, which is 4.4 MiB (1 MiB gzipped). This PR does two things to make the situation a little better:

1. `search.js` handles asynchronously loading the index just fine. This is a simple matter of making the script tag `async`.
2. GitHub Pages sends `Cache-Control: max-age=600` on everything it hosts, so we're mostly stuck with this situation there. We can still make it a little better for docs hosted elsewhere by putting the search index's hash into its filename, so it can stay fresh indefinitely (a rough sketch of both ideas follows this list).
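A minimal sketch of the two ideas combined, assuming a hypothetical helper name; this is not the PR's actual code. It derives a content-addressed filename for the search index and emits an async `<script>` tag for it, so the file can be served with a long cache lifetime and the browser never blocks page rendering while it downloads.

```julia
using SHA

function search_index_script_tag(index_data::Vector{UInt8})
    hash = bytes2hex(sha256(index_data))[1:16]   # a short content hash is enough here
    filename = "search_index.$(hash).js"
    # The `async` attribute lets the HTML parser continue while the index downloads.
    tag = """<script src="$(filename)" async></script>"""
    return filename, tag
end
```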

After:
