kiwix-serve raises 500 error on Or search term #1104

Open
benoit74 opened this issue Jul 23, 2024 · 9 comments

@benoit74

See e.g. https://library.kiwix.org/catalog/v2/entries?start=0&count=4&lang=fra&category=other&q=Or

@veloman-yunkan
Collaborator

The problem here is that the search text is interpreted as a Xapian query in which "or" is an operator; a query consisting of that word alone has no operands and is therefore syntactically invalid. Note that if you search for "composition or rose" (without quotes), it will be interpreted as a query for either composition or rose, rather than as a query for all three words: composition, or, and rose.
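
To make the failure mode concrete, here is a deliberately simplified model in Python. It is not Xapian's parser; the operator set and the case-insensitive matching are assumptions made for illustration, and `has_dangling_operator` is a hypothetical helper name.

```python
# Simplified model only: this is NOT how Xapian parses queries.
# Assumption: these words act as binary operators, matched case-insensitively.
BINARY_OPERATORS = {"and", "or", "not", "xor"}


def has_dangling_operator(text: str) -> bool:
    """Return True if an operator word lacks a plain-word operand on either side."""
    tokens = text.split()
    for i, token in enumerate(tokens):
        if token.lower() not in BINARY_OPERATORS:
            continue
        left_ok = i > 0 and tokens[i - 1].lower() not in BINARY_OPERATORS
        right_ok = i + 1 < len(tokens) and tokens[i + 1].lower() not in BINARY_OPERATORS
        if not (left_ok and right_ok):
            return True
    return False


print(has_dangling_operator("Or"))                   # True
print(has_dangling_operator("composition or rose"))  # False
```

In this model the single word "Or" is an operator with nothing to operate on, while "composition or rose" is well formed but means something other than a three-word search.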

There are different ways to address this issue:

  1. Preprocess the search text in the front end so that words coinciding with Xapian query operators are somehow escaped (e.g. quoted or preceded with a +). The backend and the semantics of the q parameter of the /catalog/v2/entries API endpoint remain unchanged (thus it will be possible to pass advanced Xapian queries via the HTTP API).
  2. Redefine the syntax and semantics of the q parameter of the /catalog/v2/entries API endpoint and parse it accordingly in the backend. The content of the search box is passed to the endpoint as is (as in the current implementation).
  3. Document the current implementation and fix it so that a proper error is displayed to the user. The users will have to escape their queries on their own.
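
The escaping mentioned in option 1 could look roughly like the following Python sketch. The function name `escape_operator_words` and the list of operator words are assumptions for illustration, and whether quoting is enough to neutralize a word depends on how the query parser is configured, so treat this as a starting point rather than a fix.

```python
import re

# Assumption: words that the query parser may treat as operators,
# matched case-insensitively.
OPERATOR_WORDS = {"and", "or", "not", "xor", "near", "adj"}


def escape_operator_words(text: str) -> str:
    """Wrap operator-like words in double quotes so they read as plain terms."""

    def quote_if_operator(match: re.Match) -> str:
        word = match.group(0)
        return f'"{word}"' if word.lower() in OPERATOR_WORDS else word

    # Replace each whitespace-delimited word in place, keeping the original spacing.
    return re.sub(r"\S+", quote_if_operator, text)


print(escape_operator_words("Or"))                   # "Or"
print(escape_operator_words("composition or rose"))  # composition "or" rose
```

Words that the user has already quoted are left alone, because the surrounding quote characters keep them from matching the operator list.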

@veloman-yunkan
Collaborator

@kelson42 @rgaudin ping

@kelson42
Collaborator

kelson42 commented Sep 9, 2024

This issue reminds me of kiwix/kiwix-tools#440; I need to reassess all of this.

@kelson42
Collaborator

kelson42 commented Sep 10, 2024

@veloman-yunkan Solution (1) - so a kind of xapian_escaping() should be implemented, but this should be done IMHO:

  • At the reader side (so not deep in the search backend)
  • Potentially in all readers
  • Via a new public method (in libzim?)

For kiwix-serve, I'm not sure exactly how it should be done... but I guess this problem is potentially everywhere

@veloman-yunkan
Collaborator

> For kiwix-serve, I'm not sure exactly how it should be done... but I guess this problem is potentially everywhere

@kelson42 The problem is in the catalog search (rather than in ZIM search) so it should be constrained to kiwix-serve. I think that you may want to revise your feedback based on that information.

BTW, note that solution (1) won't fix the outcome of the API call from the description of this issue; rather, it would make sure that the search form doesn't generate such API requests.

@kelson42
Collaborator

kelson42 commented Sep 18, 2024

@veloman-yunkan Thank you for the clarification; it does change my opinion a bit.

  • For the online version, this escaping/sanitization should be done at the API level (the API can be used from any Kiwix reader)
  • Maybe we should also apply such escaping/sanitization to searches in the local library of readers?

I also have a hard time understanding why you think fixing the form will solve the API problem; the API can be called from anywhere, so fixing only the form seems to me to be fixing the problem in the wrong place.
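
As a rough picture of what handling this at the API level means, the Python sketch below rewrites the q parameter of an incoming request URL before it would reach the query parser, so the same treatment applies regardless of which client sent the request. This is only an illustration: kiwix-serve is not written in Python, and `sanitize_q` and the operator list are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumption: words that the query parser may treat as operators.
OPERATOR_WORDS = {"and", "or", "not", "xor", "near", "adj"}


def sanitize_q(url: str) -> str:
    """Return the URL with operator-like words in q wrapped in double quotes."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    if "q" in params:
        params["q"] = [
            " ".join(
                f'"{word}"' if word.lower() in OPERATOR_WORDS else word
                for word in value.split()
            )
            for value in params["q"]
        ]
    # Re-encode the query string; other parameters pass through unchanged.
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


url = "https://library.kiwix.org/catalog/v2/entries?start=0&count=4&lang=fra&category=other&q=Or"
print(sanitize_q(url))
```

The cost of doing this on the server is that callers can no longer send operator words with their operator meaning through this parameter.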

@veloman-yunkan
Collaborator

> I also have a hard time understanding why you think fixing the form will solve the API problem.

I never said that. My remark, on the contrary, was intended to emphasize once more that solution (1) is limited to the front end and leaves the API intact.

@veloman-yunkan
Collaborator

> • For the online version, this escaping/sanitization should be done at the API level (the API can be used from any Kiwix reader)

That's solution (2). By implementing it we will take away some power from the current operation of the q parameter of the catalog search HTTP API. Is that OK?

@kelson42
Collaborator

> > • For the online version, this escaping/sanitization should be done at the API level (the API can be used from any Kiwix reader)
>
> That's solution (2). By implementing it we will take away some power from the current operation of the q parameter of the catalog search HTTP API. Is that OK?

@veloman-yunkan Yes, indeed, so we won't be able to use Xapian reserved keywords with this specific API endpoint. Is this basically the only "bad" consequence?

3 participants