
Parser fails at double quote character #98

Open
tehtris-siem opened this issue Jul 27, 2023 · 5 comments

Comments

@tehtris-siem

Hi,

I parsed the expression Category:"Logon and expected SearchField('Category', Word('"Logon')).

However, I received luqum.exceptions.IllegalCharacterError.

I then parsed the expression Category:Logon" and expected luqum.exceptions.IllegalCharacterError.

However, I received SearchField('Category', Word('Logon"')).

from luqum.parser import parser

# No traceback
tree = parser.parse('Category:Logon"')
print(repr(tree))

# Traceback
tree = parser.parse('Category:"Logon')
print(repr(tree))

Is this expected behavior?

Shouldn't both expressions be handled the same way?

Is it related to issue #86?

@alexgarel
Member

Yes, it might be related.
I'm not sure it's easy to fix with the current parser.

@tehtris-siem
Author

Hi, any update about a potential fix of this issue?

@alexgarel
Member

Sorry, I don't have time to work on it personally right now (maybe at the beginning of 2023).

If you want to try to tackle it, I can take the time to point you in the right direction.

@ahankinson

hi @alexgarel -- I'm interested in trying to fix this issue. Any chance you could point me in the right direction? Thanks!

@alexgarel
Member

alexgarel commented Oct 18, 2024

Hi @ahankinson, thanks for your help!

It all happens in https://github.com/jurismarches/luqum/blob/master/luqum/parser.py

We use PLY (by the way, we could try to upgrade the version).

You will need to understand the basics of PLY.

Maybe it's about refining the TERM_RE regexp? But that might create ambiguity in the parser: how should "this thing" be understood, as the terms "this AND thing" or as the phrase "this thing"? I'm not sure we would be able to resolve that ambiguity in an intuitive way.
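To make the asymmetry reported above concrete, here is a toy tokenizer written with plain `re`. It is only a sketch under a loud assumption: the patterns `PHRASE` and `TERM` are made up for illustration and are not luqum's actual TERM_RE. They assume a term may not start with a bare double quote but may contain one later, which is consistent with the behavior described in this issue but has not been checked against the luqum source.

```python
import re

# Toy patterns for illustration only, NOT luqum's real regexps.
PHRASE = r'"(?:[^"\\]|\\.)*"'                  # a complete double-quoted phrase
TERM = r'(?:[^\s:"\\]|\\.)(?:[^\s:\\]|\\.)*'   # first char may not be a bare quote
TOKEN_RE = re.compile(rf'\s*(?:(?P<phrase>{PHRASE})|(?P<term>{TERM})|(?P<colon>:))')


def tokenize(text):
    """Split text into (kind, value) pairs; raise ValueError on an illegal character."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"Illegal character {text[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


print(tokenize('Category:Logon"'))      # trailing quote is absorbed into the term
print(tokenize(r'Category:\"Logon'))    # escaped leading quote is accepted
try:
    tokenize('Category:"Logon')         # bare leading quote starts a phrase that never closes
except ValueError as exc:
    print(exc)
```

In this toy, relaxing the first-character rule of `TERM` would make the unterminated input tokenize, but then an input like "this thing" could be read either as one phrase or as two terms that happen to carry quotes, which is the ambiguity mentioned above.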

So I would rather say that you have to escape the double quote in this case (but it's not the job of luqum to do that), like Category:\"Logon
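Since the escaping is left to the caller, a small pre-processing step could do it before the expression reaches the parser. The helper below is a hypothetical sketch (the name `escape_unbalanced_quotes` and its policy are assumptions, not part of luqum): when the number of unescaped double quotes is odd, it escapes all of them, so properly paired phrases in the same expression would lose their phrase meaning too.

```python
import re

# Matches an already-escaped character (left untouched) or a bare double quote.
_ESCAPE_OR_QUOTE = re.compile(r'\\.|"')


def escape_unbalanced_quotes(expression):
    """Escape every bare double quote when their count is odd.

    Hypothetical helper: expressions with balanced quotes are returned unchanged.
    """
    bare = [m for m in _ESCAPE_OR_QUOTE.finditer(expression) if m.group() == '"']
    if len(bare) % 2 == 0:
        return expression
    return _ESCAPE_OR_QUOTE.sub(
        lambda m: '\\"' if m.group() == '"' else m.group(),
        expression,
    )


print(escape_unbalanced_quotes('Category:"Logon'))
print(escape_unbalanced_quotes('Category:"Logon"'))
```

The cleaned string could then be handed to `parser.parse` from `luqum.parser`. Whether escaping every quote is acceptable depends on the application; a stricter caller might reject the input instead.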

Of course, an improvement would be to give some hint about escaping the double quote when we get an IllegalCharacterError and see this pattern in the expression (but PLY does not give much context on errors).

Note that a single quote is not a valid phrase delimiter (see the docs).
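Such a hint could also live outside the parser, in a thin wrapper around the parse call. The sketch below is an assumption-laden illustration: `parse_with_hint` and `QuoteHintError` are hypothetical names, and the wrapper takes the parse function and the error type as arguments so that it does not depend on how PLY reports errors.

```python
import re


class QuoteHintError(ValueError):
    """Hypothetical error type that carries an escaping hint."""


def count_bare_quotes(expression):
    """Count double quotes that are not preceded by an escaping backslash."""
    return sum(1 for m in re.finditer(r'\\.|"', expression) if m.group() == '"')


def parse_with_hint(parse, expression, error_type=Exception):
    """Call parse(expression); on error_type, add a hint if a quote looks unbalanced."""
    try:
        return parse(expression)
    except error_type as exc:
        if count_bare_quotes(expression) % 2 == 1:
            raise QuoteHintError(
                f'{exc} (hint: the expression has an unbalanced double quote; '
                'escape it as \\" to treat it as a literal character)'
            ) from exc
        raise  # no quote pattern detected: keep the original error
```

With luqum this might be called as `parse_with_hint(parser.parse, expression, IllegalCharacterError)`, using the `parser` object and the exception named earlier in this thread.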
