Remove fulltext index on posts #828

Closed

user12986714
Contributor

@user12986714 user12986714 commented Nov 21, 2020

@ArtOfCode- ArtOfCode- closed this Nov 21, 2020
@ArtOfCode-
Member

[status-declined]. If it's a LIKE search you need, tick the LIKE box. The fulltext index saves vast amounts of time and server processing for basic searches.

@user12986714
Contributor Author

user12986714 commented Nov 22, 2020

@ArtOfCode- The problem with the fulltext index is not its speed, nor its lack of support for CJK characters; it is that fulltext search does not do what people generally mean. If I search bbb ccc ddd on metasmoke, I expect posts containing the exact string bbb ccc ddd. However, fulltext search returns any post containing at least one of bbb, ccc or ddd.

Moreover, when people search for spam, they expect posts containing the string spam. In fulltext search, however, a post that contains only \bspammer\b and not \bspam\b will not be matched.
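
The spam/spammer difference can be sketched in a few lines of Python. This is only a toy model: like_match and word_match are hypothetical helper names, the sample post is made up, and neither function is metasmoke's or MySQL's actual code. The first mimics a substring (LIKE-style) search, the second the whole-word behaviour described by the \bspam\b pattern.

```python
import re

def like_match(query: str, text: str) -> bool:
    # Substring match: roughly what a LIKE '%query%' search does.
    return query.lower() in text.lower()

def word_match(query: str, text: str) -> bool:
    # Whole-word match: the query must sit between word boundaries.
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

# Made-up example post, for illustration only.
post = "Reported by a known spammer"

print(like_match("spam", post))  # True: "spam" is a substring of "spammer"
print(word_match("spam", post))  # False: "spam" never appears as its own word
```

A user who is used to the substring behaviour will reasonably read the second result as a missed post.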

Consequently, the way fulltext search operates is confusing and will surprise most users. It operates more like Google: when we search best spam in stackexchange on Google, we don't expect only pages containing exactly best spam in stackexchange; Google would, for example, also return a page titled best spam posted on stackexchange.

However, metasmoke used its old search system for years, so users expect their queries to return only posts containing exactly best spam in stackexchange. It is obvious from recent complaints about search inaccuracy that those users have not noticed the difference between how search operated in the past and how it operates now.

This is not about the fulltext index not supporting CJK; even if all our posts were in ASCII, this would still be a problem.


As a real example:

  1. If I search nullpointerexception, I get 191 results.
  2. If I search java.lang.nullpointerexception, I get 7062 results.

How can 2 yield more results than 1? Behind the scenes, the query string in 2 is tokenized into java, lang and nullpointerexception (in fact, if you search java lang nullpointerexception, you get the same 7062 results), and any post containing at least one of java, lang or nullpointerexception is returned.
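
To make the counting argument concrete, here is a minimal sketch of "tokenize, then match any token". The tokenizer rule (split on anything that is not a letter, digit or underscore), the function names and the five sample posts are all assumptions made for illustration; the real fulltext parser has its own rules, and the 191 and 7062 figures above come from metasmoke, not from this code.

```python
import re

def tokenize(text):
    # Hypothetical tokenizer: lowercase, then split on every character
    # that is not a letter, digit or underscore.
    return [t for t in re.split(r"[^0-9a-z_]+", text.lower()) if t]

def any_token_search(query, posts):
    # Return every post that shares at least one token with the query.
    wanted = set(tokenize(query))
    return [p for p in posts if wanted & set(tokenize(p))]

# Made-up posts, for illustration only.
posts = [
    "java.lang.NullPointerException at line 3",
    "NullPointerException when calling foo",
    "learn java in 21 days",
    "the lang attribute in html",
    "unrelated post",
]

print(tokenize("java.lang.nullpointerexception"))
# ['java', 'lang', 'nullpointerexception']
print(len(any_token_search("nullpointerexception", posts)))            # 2
print(len(any_token_search("java.lang.nullpointerexception", posts)))  # 4
```

Because the longer query contributes extra tokens and a post needs to match only one of them, its result set can only grow, which is the opposite of what someone typing a more specific string expects.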

@ArtOfCode-
Member

@user12986714 I'm aware of the limitations of FT index searching, but the fact remains that it's saving significant processing time for the simple searches. As I said, if it's the old behaviour you need, tick the box for it - that's why it's there. Removing the index is not the correct solution.

@tripleee
Member

Removing the full-text index may be the wrong solution, but there is a problem, and the result is that search behaves basically erratically. Having a FAQ somewhere which explains why it's broken is far from sufficient.

Perhaps an adequate fix would be to replace the LIKE button with a "Full-text search" checkbox which is enabled by default, probably with a slightly more verbose description of how it breaks search.

@user12986714
Contributor Author

Edited the FAQ to explain more about the FT index.
