-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use elementpath for finding via XPath (2.0) #204
base: master
Are you sure you want to change the base?
Conversation
Originally using re:test() which is: boolean regexp:test(string, string, string?) http://exslt.org/regexp/functions/test/index.html In XPath 2.0 it's unnecessary to import 3rd-party stuff to use regex and the func signature matches. The trick is not to support "None" as "flags" or everything crashes. https://www.w3.org/TR/xpath-functions/#func-matches Also, this e-mail chain: https://exslt.fourthought.narkive.com/qFTvc1I2/exslt-for-xslt-2-0-a-new-life-for-exslt
Unfortunately the group lookup returns an empty list from the
|
How about this function (taken from here): def escape_string_for_xpath(s):
if '"' in s and "'" in s:
return 'concat(%s)' % ", '\"',".join('"%s"' % x for x in s.split('"'))
elif '"' in s:
return "'%s'" % s
return '"%s"' % s Then we'd modify the strings in e.g. >>> kp = PyKeePass('test4.kdbx', 'password', 'test4.key')
>>> title_xpath = '/String/Key[text()="Title"]/../Value[text()={}]/../..'
>>> title = 'quote test -> " <-'
>>> kp._xpath('//Entry' + title_xpath.format(escape_string_for_xpath(title)), cast=True, first=True)
Entry: "quote test -> " <- (None)" |
Is there a reason we should prefer elementpath to the above solution? For the sake of people packaging pykeepass, I want to avoid adding any more dependencies unless its really necessary. |
Bump. |
One of the reason is that elementpath allows usage of XPath 1.0 as well as XPath 2.0 and by using the latter we're leveraging out-of-box support for quotes and other fixes, so overall it'd be better to use already newer model which will be even easier for switching to 3.x if/when required. The other reason is that basically the escaping function is error prone and if there's any other character that XPath 1.0 doesn't like it'll break. Creating a list of unsupported characters and trying to escape them this way is rather ugly. Re the dependencies reason:
|
Big +1 for merging this |
If there are other characters besides " which are problematic, then I agree we shouldn't try to handle this ourselves. However, I'm not aware of any others characters that have this problem yet. Also it seems like this PR still needs special quote escaping. As for all the new features of XPath 2.0, I don't see a need right now, as our queries in In my opinion, we should keep this PR on the backburner until new issues arise, then take a second look. |
The problem is that
lxml
uses XPath 1.0. That version does not know how to escape those characters as those are special/forbidden. Instead, we can useelementpath
which seems to be quite popular (among other XPath 2.0 related packages), but it breaks the tests.For XPath 1.0 fixing #123 doesn't seem possible, so perhaps rewriting everything to
elementpath.select(root, path)
would work if the rest of the codebase was adjusted as well.Closes #123