Incomplete Results for Strings that Include Numbers #122

Open
thekenshow opened this issue Oct 26, 2021 · 9 comments

Comments

@thekenshow
Contributor

A client site uses part numbers in page titles (e.g., SPK1000) and TNTSearch isn't returning all matches when the first three characters are used.

Test case 1 is a search for "spk", which should return "spk1000" and "spk7457", but only the first appears:

[Screenshot: search results for "spk"]

A search for "spk7" returns "spk7457", which should also have appeared in the previous search:

[Screenshot: search results for "spk7"]

Test case 2 is a search for "739", which should return three results: two instances of "7393 Horn Driver" and one with "739" in the body text. Instead, it returns only the latter:

[Screenshot: search results for "739"]

A search for "7393" returns the two results that were missing above (two instances of "7393 Horn Driver"):

[Screenshot: search results for "7393"]

I thought this might be related to the stemming issue described here, but @ViliusS set me straight.

@ViliusS
Contributor

ViliusS commented Oct 26, 2021

Are you using fuzzy search? Try playing with tntsearch settings a bit to see if one of them makes a difference. That would at least help pinpoint the part of the search algorithm which is failing.

@thekenshow
Contributor Author

Tried the "739" search documented above, with index rebuild + cache clear between tests, with the following:

  • fuzzy enabled and then disabled, same result
  • phrases enabled and then disabled, same result
  • search URL ("/search") directly and using pop-up/overlay, same result
  • search type of "auto" and "auto", same result
  • stemmer enabled and disabled, same result

Here's my settings yaml:

enabled: true
search_route: /search
query_route: /s
built_in_css: true
built_in_js: true
built_in_search_page: true
enable_admin_page_events: true
search_type: auto
fuzzy: true
phrases: true
stemmer: default
display_route: true
display_hits: true
display_time: true
live_uri_update: true
limit: '20'
min: '3'
snippet: '300'
index_page_by_default: true
scheduled_index:
  enabled: false
  at: '* * * * *'
  logs: logs/tntsearch-index.out
filter:
  items:
    - [email protected]
powered_by: true
search_object_type: Grav

@ViliusS
Contributor

ViliusS commented Oct 28, 2021

I can confirm that something strange is going on with searches that include numbers. Let's say I have these two data points:
"7777777"
"777777777777"

If I search for 777, I get only the first result, and the match is not highlighted in the context.
If I search for 7777, 77777, 777777 or 7777777, I get only the first result, and it is now correctly highlighted.
If I search for anything from 77777777 to 777777777777, I correctly get only the second result.

This is most probably a bug in the TNTSearch library itself, maybe something in the BM25 implementation.
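
Note: the pattern reported above can be reproduced with a toy model. The sketch below is not TNTSearch code, and the cause it models (a partial-match lookup that keeps only the single shortest matching term) is an assumption made purely for illustration. The function names are hypothetical.

```python
def all_prefix_matches(terms, query):
    """Expected behaviour: every indexed term that starts with the query."""
    return sorted(t for t in terms if t.startswith(query))


def shortest_prefix_match(terms, query):
    """Hypothetical faulty lookup: keep only the shortest matching term.

    This is an assumed model of the observed results, not the library's logic.
    """
    matches = all_prefix_matches(terms, query)
    return sorted(matches, key=len)[:1]


# The two data points from the comment above.
terms = ["7777777", "777777777777"]

for query in ("777", "7777777", "77777777"):
    expected = all_prefix_matches(terms, query)
    observed_like = shortest_prefix_match(terms, query)
    print(query, "expected:", expected, "toy model:", observed_like)
```

Under this assumption, "777" through "7777777" yield only the shorter term, and longer queries yield only the longer term, which matches what was reported.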

@thekenshow
Contributor Author

Hmmm. Sounds like I may have to implement simple search until this gets resolved. Model numbers are the bread and butter for this client site.

@ViliusS
Contributor

ViliusS commented Oct 28, 2021

I just did a quick debug session and it is definitely a library bug, most probably in TNTIndexer, because the data is already that way in the index. You can try to debug it further yourself at https://github.com/trilbymedia/grav-plugin-tntsearch/blob/develop/vendor/teamtnt/tntsearch/src/Indexer/TNTIndexer.php or open an issue with them at https://github.com/teamtnt/tntsearch
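
Note: one way to check what actually ended up in the index is to list the stored terms for a given prefix. The sketch below assumes the index is an SQLite file containing a `wordlist` table with a `term` column. That schema is an assumption here and should be verified against the real index file before relying on it. The helper name `terms_with_prefix` is hypothetical, and the demo uses an in-memory stand-in rather than a real index.

```python
import sqlite3


def terms_with_prefix(conn, prefix):
    """Return stored terms starting with `prefix`.

    Assumes a table `wordlist` with a text column `term` (unverified schema).
    Note that % and _ inside `prefix` act as LIKE wildcards.
    """
    rows = conn.execute(
        "SELECT term FROM wordlist WHERE term LIKE ? ORDER BY term",
        (prefix + "%",),
    )
    return [row[0] for row in rows]


# Demo against an in-memory database with the assumed schema.
# For a real index, use sqlite3.connect("<path to the index file>") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wordlist (id INTEGER PRIMARY KEY, term TEXT)")
conn.executemany(
    "INSERT INTO wordlist (term) VALUES (?)",
    [("spk1000",), ("spk7457",), ("speaker",)],
)
print(terms_with_prefix(conn, "spk"))
```

If both part numbers show up as terms, the terms themselves were indexed and the problem lies in how they are linked to documents or looked up at query time.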

@thekenshow
Contributor Author

Great, thanks for all your help on this. I'll open an issue with https://github.com/teamtnt/tntsearch.

@ViliusS
Contributor

ViliusS commented Oct 28, 2021

I have fixed some highlighter issues found during my tests (teamtnt/tntsearch#256), but search with numbers is still broken.

I will try to look at the library code in a couple of days if I find some free time, and will reply on the tntsearch repo.

This issue can be closed here.

@ViliusS
Contributor

ViliusS commented Oct 30, 2021

@thekenshow I've spent some time on this and uncovered another layer of bugs. Try these two patches:
#123
#124

In order to activate fuzzy search in the library itself for your case, you will also need to:
a) apply teamtnt/tntsearch#233 with default fuzzy_no_limit option set to true,
b) and increase Levenshtein distance in the new configuration option to at least 4 or 5.
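
Note: the "at least 4 or 5" in point b) presumably follows from how far the short queries are from the full part numbers. The sketch below is a plain dynamic-programming Levenshtein distance written for illustration, not the library's implementation. It shows the distance for the query and term pairs from this thread.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


for query, term in [("spk", "spk1000"), ("spk", "spk7457"), ("739", "7393")]:
    print(query, "->", term, "distance", levenshtein(query, term))
```

Both "spk" pairs are four insertions apart, so a maximum distance below 4 cannot match them, while "739" to "7393" needs only one edit.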

Other than that, there is nothing we can do at the moment. Proper partial search needs to be implemented by someone in the library.

@thekenshow
Contributor Author

@ViliusS Thanks for diving into this, I'm back to it today and will let you know what happens.
