Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor docs to use ExDocs search_data model for global HexDocs indexing of Gleam packages #3904

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

diemogebhardt
Copy link
Contributor

@diemogebhardt diemogebhardt commented Nov 27, 2024

This PR is regarding issue #3811.

Data Model

The search data model json changed from SearchIndex:

[
  {
    "doc": "the name of the module or page",
    "title": "the title of the item",
    "content": "the content of the item",
    "url": "the relative link of the item, like 'module.html#title'"
  }
]

to SearchData, SearchItem, SearchItemType and SearchProgrammingLanguage:

{
  "items": [
    {
      "type": "module" | "type" | "function" | "constant" | "page",
      "parentTitle": "the name of the module or page",
      "title": "the title of the item",
      "doc": "the content of the item",
      "ref": "the relative link of the item, like 'module.html#title'"
    }
  ],
  "proglang": "gleam"
}

Notes:

  • parentTitle doesn't seem to have an equivalent in ExDocs from what I have seen so far. It is being populated by:
    • either module.name.to_string() in the context of types "module" | "type" | "function" | "constant"
    • or config.name.to_string() in the context of type "page" (README.md)
  • doc matches ExDocs, except for headline partitioning
  • ref matches ExDocs, except for arity module.html#title/arity

Neither of those is an issue for global HexDocs indexing of Gleam packages.

Output Files

As I'm still exploring how everything works in the codebase, I didn't want to change too much at once. So I added search-data.json in addition to search-data.js to keep the existing mechanism working as is for now.

search-data.js

Wrapped search_data_json in window.Gleam.initSearch({}); for use in documentation_layout.html to execute initSearch({}) in docs-js/index.js which in turn uses lunr.js.

search_data.json

Bare search_data_json for use in global HexDocs.

Copy link
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!! I've left a few small comments inline, mostly around abbreviations, which we avoid in this codebase.

It would be great to have a test for the output format also.

#[serde(rename = "type")]
type_: SearchItemType,
#[serde(rename = "parentTitle")]
parent_title: String,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a parent title? Could you add /// documentation comments for each of these fields pleaes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have a look at my comments above:

  • In one I pasted a screenshot to visualize how it's being used currently;
  • and in another asked whether you might have an idea for a better name?

Please also have a look at my Notes.

doc: String,
struct SearchData {
items: Vec<SearchItem>,
proglang: SearchProgLang,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
proglang: SearchProgLang,
programming_language: SearchProgLang,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the naming and would prefer it too. Yet it is named differently in ExDocs so I went for consistency.

See here for the summarized ExDocs data format and more Info.

So shall we use programming_language internally and have a separate function that transforms the json output of search-data.json for HexDocs indexing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

url: String,
doc: String,
#[serde(rename = "ref")]
ref_: String,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ref_: String,
reference: String,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the naming and would prefer it too. Yet it is named differently in ExDocs so I went for consistency.

See here for the summarized ExDocs data format and more Info.

So shall we use reference internally and have a separate function that transforms the json output of search-data.json for HexDocs indexing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

title: String,
content: String,
url: String,
doc: String,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
doc: String,
content: String,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the naming and would prefer it too. Yet it is named differently in ExDocs so I went for consistency.

See here for the summarized ExDocs data format and more Info.

So shall we use content internally and have a separate function that transforms the json output of search-data.json for HexDocs indexing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
#[serde(rename_all = "lowercase")]
enum SearchProgLang {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enum SearchProgLang {
enum SearchProgrammingLanguage {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the naming and would prefer it too. Yet it is named differently in ExDocs so I went for consistency.

See here for the summarized ExDocs data format and more Info.

So shall we use SearchProgrammingLanguage internally and have a separate function that transforms the json output of search-data.json for HexDocs indexing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

this.add({
id: i,
// type: entry.type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another comment here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this comment to discuss, or keep it as a reminder that we have this property in the new data model but are not making use of it yet.

const index = lunr(function () {
this.ref("id");
// this.field("type");
// this.field("parentTitle");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this unfinished?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this comment to discuss, or keep it as a reminder that we have this property in the new data model but are not making use of it yet.

@diemogebhardt
Copy link
Contributor Author

Yes, test are a good idea! :)

Lets clarify the other topics first though: whether we keep two separate data models. One for Gleam internal use and another for HexDocs indexing?

@lpil
Copy link
Member

lpil commented Dec 4, 2024

I think just one is fine, it's only used to render that JSON object.

@diemogebhardt diemogebhardt force-pushed the refactor-docs-for-hexdocs-indexing branch from fae2bc9 to ab689e5 Compare December 4, 2024 16:39
@diemogebhardt
Copy link
Contributor Author

Ok, I don't know what you want me to do. This isn't a good use for either your, nor my time.

Not sure how familiar you are with this part of the code base and when you looked into it last. From my own experience, I know that running a project while overseeing a larger code base with lots of different contributors is quite something. So I understand that revisiting parts of the code base that others have contributed to and you haven't reviewed recently takes getting used to. That's why I added context information, including screenshots in the initial post and comments regarding key changes to make it easier to follow. Do you see them or are they just visible to me?

Is there anything else I can do to communicate better?

@diemogebhardt
Copy link
Contributor Author

diemogebhardt commented Dec 6, 2024

Open Questions

  • Is consistent naming throughout the Gleam code base; and BEAM languages a goal? Options:
    • 2 Models:
      • one internal with extended names for rust and generated js
        • used in HexDocs search of Gleam projects
      • one external with abbreviated names for generated json
        • used in global HexDocs indexing of Gleam projects
    • 1 Model variant A (current implementation):
      • rust model named with abbreviated names
      • generated js and json with abbreviated names
        • used in HexDocs search of Gleam projects
        • used in global HexDocs indexing of Gleam projects
    • 1 Model variant B (your proposal?):
      • rust model named with extended names
      • generated js and json with abbreviated names
        • used in HexDocs search of Gleam projects
        • used in global HexDocs indexing of Gleam projects
    • Open a discussion with the ExDocs/HexDocs team to create a proper BEAM standard, thats actually documented and using better naming.
      • Would require them to be backwards compatible and support both during a transition period but probably feasible.
      • If this is a real option, I would volunteer to help with whatever is needed.
    • ...?
  • Extra vs Page? Extra being the ExDocs/HexDocs way.
    • If Extra, only for use in the model or also throughout the rest of the code base?
  • parent_title/parentTitle or does a better name come to mind?
    • used in HexDocs search of Gleam projects
    • NOT used in global HexDocs indexing of Gleam projects

Option: 2 Models

  • Pros:
    • Such abbreviated names all but help with understanding the code. Having separate models would allow for nice descriptive names, used Gleam internally in rust and js for HexDocs search of Gleam projects. A quick search shows all affected places, no matter whether rust or js code.
    • We don't pollute the global HexDocs indexing json with Gleam specific parent_title/parentTitle. More clarity whats used where.
  • Cons:
    • A negative is that it's more work though, as two models need to be generated/transformed.
    • Deviation from the BEAM "standard"?
/// for HexDocs search of Gleam projects
#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
struct SearchIndex {
    #[serde(rename = "parentTitle")]
    parent_title: String,
    title: String,
    content: String,
    reference: String,
}

/// for global HexDocs indexing
#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
struct SearchData {
    items: Vec<SearchItem>,
    proglang: SearchProgLang,
}
#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
struct SearchItem {
    #[serde(rename = "type")]
    type_: SearchItemType,
    title: String,
    doc: String,
    #[serde(rename = "ref")]
    ref_: String,
}
#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
#[serde(rename_all = "lowercase")]
enum SearchItemType {
    Constant,
    Function,
    Module,
    Page,
    Type,
}
#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
#[serde(rename_all = "lowercase")]
enum SearchProgLang {
    // Elixir,
    // Erlang,
    Gleam,
}

Option: 1 Model variant A (current implementation)

  • Pros:
    • Consistent
      • internal naming for rust and js used in HexDocs search of Gleam projects;
      • AND external naming for json used in global HexDocs indexing of Gleam projects;
      • making it easier to understand and navigate all affected bits throughout the code base;
      • AND being consistent with BEAM "standard" used in global HexDocs indexing.
    • ...?
  • Cons:
    • Abbreviated naming in rust, js and json. Not so nice naming in the rust bits only?
    • ...?

docs.rs

#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
struct SearchData {
    items: Vec<SearchItem>,
    proglang: SearchProgLang,
}

#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
struct SearchItem {
    #[serde(rename = "type")]
    type_: SearchItemType,
    #[serde(rename = "parentTitle")]
    parent_title: String,
    title: String,
    doc: String,
    #[serde(rename = "ref")]
    ref_: String,
}

#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
#[serde(rename_all = "lowercase")]
enum SearchItemType {
    Constant,
    Function,
    Module,
    Page,
    Type,
}

#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
#[serde(rename_all = "lowercase")]
enum SearchProgLang {
    // Elixir,
    // Erlang,
    Gleam,
}

outputs:

search-data.js

for use in compiler-core/templates/docs-js/index.js:

window.Gleam.initSearch({
  "items": [
    {
      "type": "module" | "type" | "function" | "constant" | "page",
      "parentTitle": "the name of the module or page",
      "title": "the title of the item",
      "doc": "the content of the item",
      "ref": "the relative link of the item, like 'module.html#title'"
    }
  ],
  "proglang": "gleam"
});

to generate search results in HexDocs of Gleam projects like:

image

and

search-data.json

for use in global HexDocs indexing:

{
  "items": [
    {
      "type": "module" | "type" | "function" | "constant" | "page",
      "parentTitle": "the name of the module or page",
      "title": "the title of the item",
      "doc": "the content of the item",
      "ref": "the relative link of the item, like 'module.html#title'"
    }
  ],
  "proglang": "gleam"
}

Option: 1 Model variant B

  • Pros:
    • Nicer naming in the rust bits only.
    • ...?
  • Cons:
    • Abbreviated naming in js and json.
    • Inconsistent naming between rust and js used in HexDocs search of Gleam projects making it harder to understand and navigate all affected bits throughout the code base.
    • ...?

docs.rs

#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
struct SearchData {
    items: Vec<SearchItem>,
    #[serde(rename = "proglang")]
    programming_language: SearchProgrammingLanguage,
}

#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
struct SearchItem {
    #[serde(rename = "type")]
    type_: SearchItemType,
    #[serde(rename = "parentTitle")]
    parent_title: String,
    title: String,
    #[serde(rename = "doc")]
    content: String,
    #[serde(rename = "ref")]
    reference: String,
}

#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
#[serde(rename_all = "lowercase")]
enum SearchItemType {
    Constant,
    Function,
    Module,
    Page,
    Type,
}

#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
#[serde(rename_all = "lowercase")]
enum SearchProgrammingLanguage {
    // Elixir,
    // Erlang,
    Gleam,
}

outputs: same as variant A.

@diemogebhardt diemogebhardt force-pushed the refactor-docs-for-hexdocs-indexing branch from ab689e5 to 9725204 Compare December 10, 2024 21:27
@lpil
Copy link
Member

lpil commented Dec 17, 2024

Hello! Sorry for the delay, my inbox is overwhelming!

I'm a bit confused by your notes here. Are you suggesting we propose a change upstream to Hex for them to change the format that all BEAM languages use? Personally I am happy with the format Hex is using so if the comments from my previous PR are applied then we can merge. Thank you

@diemogebhardt
Copy link
Contributor Author

I give up, @lpil – our miscommunication is too frustrating.

Above, there are two questions that are still unanswered since the initial post of this PR:

  • Extra vs Page? Extra being the ExDocs/HexDocs way.
    • If Extra, only for use in the model or also throughout the rest of the code base?
  • parent_title/parentTitle or does a better name come to mind?
    • used in HexDocs search of Gleam projects
    • NOT used in global HexDocs indexing of Gleam projects

Above, there are three options mapped out – including a draft implementation, and their pros and cons. Have you had a look at them and understand the impact of each? These are the three options to choose from:

  • Option: 2 Models
  • Option: 1 Model variant A (current implementation)
  • Option: 1 Model variant B

Feel free to close this unless you plan on addressing the above.

@lpil
Copy link
Member

lpil commented Dec 21, 2024

I would like the specific field renaming that I asked for above please 🙏 I am not looking to discuss any alternative designs presently.

If you would like to do just the tests and leave the renaming to me that would also be fine. Thank you

enum SearchProgLang {
// Elixir,
// Erlang,
Gleam,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you want to have the comments removed or kept for documentation?

#[serde(rename = "type")]
type_: SearchItemType,
#[serde(rename = "parentTitle")]
parent_title: String,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an idea of how to name this better?

See comment regarding this below.

resultDoc.appendChild(resultDocTitle);
let resultDocOrSection = resultDocTitle;
if (doc.doc != doc.title) {
if (searchItem.parentTitle != searchItem.title) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the logic to render one of these two variations:

image

  • result 1
    • has parent_title set to gleam/string_tree
    • and title set to gleam/string_tree
  • result 2
    • has parent_title set to gleam/string_tree
    • and title set to string

Constant,
Function,
Module,
Page,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all of the SearchItemTypes being used.

@ruslandoga propsed Extra in #3811 (comment) for README.md and such things.

In the sidebar of the UI and the rest of the code it's Page. As this is not used for indexing, it might make more sense to stick to SearchItemType::Page.

See comment regarding this below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found more information regarding Extras:

here: https://hexdocs.pm/ex_doc/Mix.Tasks.Docs.html#module-configuration

:extras - List of paths to additional Markdown (.md extension), Live Markdown (.livemd extension), Cheatsheets (.cheatmd extension) and plain text pages to add to the documentation. You can also specify keyword pairs to customize the generated filename, title and source file of each extra page; default: []. Example: ["README.md", "LICENSE", "CONTRIBUTING.md": [filename: "contributing", title: "Contributing", source: "CONTRIBUTING.mdx"]]

and here: https://hexdocs.pm/ex_doc/Mix.Tasks.Docs.html#module-groups

extras: [
  "guides/introduction/foo.md",
  "guides/introduction/bar.md",

  ...

  "guides/advanced/baz.md",
  "guides/advanced/bat.md"
]
groups_for_extras: [
  "Introduction": Path.wildcard("guides/introduction/*.md"),
  "Advanced": Path.wildcard("guides/advanced/*.md")
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After familiarizing myself more with HexDocs for Elixir and Erlang, I noticed that they use "Pages" as the headline in the sidebar too.

So it might make sense to stay consistent with others are doing in the BEAM ecosystem and adopt their naming: SearchItemType::Extra.

title: config.name.to_string(),
content,
url: page.path.to_string(),
doc: content,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs_pages: &[DocsPage],
    
for page in docs_pages {
    let content = fs.read(&page.source).unwrap_or_default();
    ...
}

SearchItemType::Page seems more appropriate than SearchItemType::Extra.

Or refactor everything that's Page to Extra?

See comment regarding this above.

url: String,
doc: String,
#[serde(rename = "ref")]
ref_: String,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the naming and would prefer it too. Yet it is named differently in ExDocs so I went for consistency.

See here for the summarized ExDocs data format and more Info.

So shall we use reference internally and have a separate function that transforms the json output of search-data.json for HexDocs indexing?

title: String,
content: String,
url: String,
doc: String,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the naming and would prefer it too. Yet it is named differently in ExDocs so I went for consistency.

See here for the summarized ExDocs data format and more Info.

So shall we use content internally and have a separate function that transforms the json output of search-data.json for HexDocs indexing?


#[derive(Serialize, PartialEq, Eq, PartialOrd, Ord, Clone)]
#[serde(rename_all = "lowercase")]
enum SearchProgLang {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the naming and would prefer it too. Yet it is named differently in ExDocs so I went for consistency.

See here for the summarized ExDocs data format and more Info.

So shall we use SearchProgrammingLanguage internally and have a separate function that transforms the json output of search-data.json for HexDocs indexing?

this.add({
id: i,
// type: entry.type,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this comment to discuss, or keep it as a reminder that we have this property in the new data model but are not making use of it yet.

const index = lunr(function () {
this.ref("id");
// this.field("type");
// this.field("parentTitle");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this comment to discuss, or keep it as a reminder that we have this property in the new data model but are not making use of it yet.

@diemogebhardt
Copy link
Contributor Author

Same here, @lpil – sorry! Glad we figured this out – I am relieved. You should be able to see my comments from weeks back now.

@diemogebhardt diemogebhardt force-pushed the refactor-docs-for-hexdocs-indexing branch 2 times, most recently from 3bef08f to cc0ba6f Compare December 24, 2024 01:20
@diemogebhardt
Copy link
Contributor Author

diemogebhardt commented Dec 24, 2024

Thank you!! I've left a few small comments inline, mostly around abbreviations, which we avoid in this codebase.

Renamed the abbreviations. You should be able to see all my replies to your comments from weeks ago now.

It would be great to have a test for the output format also.

Added tests to ensure that the output matches the ExDocs specification.

@diemogebhardt diemogebhardt force-pushed the refactor-docs-for-hexdocs-indexing branch from ee888b7 to c386647 Compare January 9, 2025 22:55
@diemogebhardt
Copy link
Contributor Author

diemogebhardt commented Jan 9, 2025

Rebased and resolved conflicts.

@diemogebhardt diemogebhardt force-pushed the refactor-docs-for-hexdocs-indexing branch from c386647 to 761bd5a Compare January 13, 2025 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants