Index topics and extensions and various other tweaks #11

Merged
yrodiere merged 5 commits into quarkusio:main from index-topics-extensions
Oct 11, 2023

Conversation

gsmet
Member

@gsmet gsmet commented Oct 11, 2023

No description provided.

Member

@yrodiere yrodiere left a comment

Thanks, LGTM except a doubt about the boost change for title, see below.

Comment on lines +121 to +122
// I wonder if we could use something similar to https://stackoverflow.com/a/74737474/5043585
// to have some sort of weight in the documents and prioritize some of them
Member

Seriously though, we could implement a very simple rank based on which guide is referenced by others. The Hibernate ORM guide would automatically get a higher rank than e.g. Panache or Hibernate Search, and that's what we want.
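
To make the idea concrete, here is a minimal sketch of such a rank, assuming we already know which guides each guide links to. The class name, method name and guide identifiers are hypothetical, and simply counting inbound references is a much cruder measure than a real PageRank.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rank guides by how many other guides reference them.
final class GuideRank {

    /**
     * @param outgoingLinks guide id -> ids of the guides it references
     * @return guide id -> number of distinct other guides referencing it
     */
    static Map<String, Integer> inboundCounts(Map<String, List<String>> outgoingLinks) {
        Map<String, Integer> counts = new HashMap<>();
        // Every known guide starts at zero, so unreferenced guides still get a rank.
        for (String guide : outgoingLinks.keySet()) {
            counts.put(guide, 0);
        }
        for (Map.Entry<String, List<String>> entry : outgoingLinks.entrySet()) {
            String source = entry.getKey();
            entry.getValue().stream()
                    .distinct()                               // repeated links from one guide count once
                    .filter(target -> !target.equals(source)) // ignore self-references
                    .forEach(target -> counts.merge(target, 1, Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up link graph, for illustration only.
        Map<String, List<String>> links = Map.of(
                "hibernate-orm", List.of(),
                "hibernate-orm-panache", List.of("hibernate-orm"),
                "hibernate-search", List.of("hibernate-orm", "hibernate-orm-panache"));
        System.out.println(inboundCounts(links));
    }
}
```

With that made-up graph, the ORM guide ends up with the highest count because both other guides point to it, which is the ordering described above.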

Member

Created #12

// to have some sort of weight in the documents and prioritize some of them
// problem will be to find the right balance because the weight would be always on
// another option could be to use the keyword to trick some searches
// I suspect we should make the keywords a List separated by , and a proper list in the index rather than what it is currently
Member

Not sure we can expect all documents to reliably use the proper syntax...

Regardless, in the full-text index you'll get the exact same result, whether you split the list yourself or you let the tokenizer do it. So unless you need spaces or punctuation in keywords, or you need faceting, I don't see what this would bring.
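
A rough illustration of that point, assuming a crude stand-in for the analyzer (real tokenizers are more elaborate) and hypothetical method names: for single-word keywords, splitting on commas and tokenizing the raw string produce the same terms. They only differ when each keyword is kept whole, as one would for faceting, and a keyword contains a space or punctuation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch comparing the two ways of turning a keywords string into terms.
final class KeywordTokens {

    // Option 1: split the comma-separated list ourselves and keep each keyword whole.
    static List<String> splitOnCommas(String keywords) {
        return Arrays.stream(keywords.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Option 2: crude stand-in for a full-text tokenizer, breaking on anything
    // that is not a letter or a digit.
    static List<String> tokenize(String keywords) {
        return Arrays.stream(keywords.split("[^\\p{L}\\p{N}]+"))
                .filter(s -> !s.isEmpty())
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String simple = "jpa, hibernate,orm";
        String withSpace = "rest client, jpa";
        System.out.println(splitOnCommas(simple).equals(tokenize(simple)));
        System.out.println(splitOnCommas(withSpace).equals(tokenize(withSpace)));
    }
}
```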

}

return f.bool().must(f.simpleQueryString()
.field("title").boost(10.0f)
Member

I thought topics/keywords were things we wanted to match in priority, that's why the title boost was lower.

If you want to change this, you should probably change it for title_autocomplete as well?

Member Author

Keep in mind that we had a very naive solution for search until now. So keywords were basically: "let's add some additional words that are not in the title or preamble", and nothing more.
We could change the definition but that's what it is atm.

As for topics, it might make sense to boost them more in the future, but I would suggest doing the tweaking once all the guides are indexed and we have a proper idea of what we want (and also maybe once the PageRank thing is in).

@yrodiere yrodiere enabled auto-merge October 11, 2023 15:04
@yrodiere yrodiere force-pushed the index-topics-extensions branch from 593ad82 to 9b9535b Compare October 11, 2023 15:10
gsmet added 5 commits October 11, 2023 17:13
Also adjust the scoring a bit:
- keywords should be at the same level as the title, as they are just
  additional keywords on top of the title
- when topics contains compatibility, lower the score of the document
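
As a toy illustration of these two adjustments (not how the search backend actually computes relevance), the sketch below adds an equal boost for a title match and a keywords match, then scales the score down when the document has the compatibility topic. The 10.0f title boost is the value visible in the reviewed query. The keywords boost value, the 0.5f penalty factor and all names are assumptions made for the example.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical toy scorer: additive field boosts plus a penalty on one topic.
final class ToyScorer {

    static final float TITLE_BOOST = 10.0f;         // value shown in the reviewed query
    static final float KEYWORDS_BOOST = 10.0f;      // assumed: "same level" as the title
    static final float COMPATIBILITY_FACTOR = 0.5f; // assumed penalty, purely illustrative

    static float score(String term, String title, String keywords, List<String> topics) {
        String needle = term.toLowerCase(Locale.ROOT);
        float score = 0f;
        if (title.toLowerCase(Locale.ROOT).contains(needle)) {
            score += TITLE_BOOST;
        }
        if (keywords.toLowerCase(Locale.ROOT).contains(needle)) {
            score += KEYWORDS_BOOST;
        }
        // Documents tagged with the compatibility topic are pushed down.
        if (topics.contains("compatibility")) {
            score *= COMPATIBILITY_FACTOR;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score("hibernate", "Hibernate ORM", "jpa", List.of("data")));
        System.out.println(score("hibernate", "Hibernate Reactive", "hibernate", List.of("compatibility")));
    }
}
```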

Adjust the dataset a bit to be more representative and also add a few
explanations about why the documents are popping.
(Even if the changes seem a bit counterproductive, we will need a better
way to adjust the Hibernate Reactive ranking)
@yrodiere yrodiere force-pushed the index-topics-extensions branch from 9b9535b to b22fcae Compare October 11, 2023 15:13
@yrodiere yrodiere merged commit 79b1eae into quarkusio:main Oct 11, 2023