-
-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[perf] Explore re-using XPathContext objects, and compiling XPath expressions #3266
Comments
I chatted with @tenderlove about this at RubyConf 2024 and I think I've got a clear path to land some of this in v1.18.0. |
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize). This PR makes the following changes: - the `XPathContext` object ... - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable - tracks namespaces and variables registered through those two methods - has a new `#reset` method that deregisters all namespaces and variables registered - has a new `#node=` method to set the current node being searched - all the `Searchable` methods ... - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context` - and that context object is `#reset` when the block returns There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution. Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file: ``` $ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: normal 3.790k i/100ms Calculating ------------------------------------- large: normal 37.556k (± 1.7%) i/s (26.63 μs/i) - 189.500k in 5.047390s ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: normal 11.726k i/100ms Calculating ------------------------------------- small: normal 113.719k (± 2.5%) i/s (8.79 μs/i) - 574.574k in 5.055648s $ ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: optimized 4.609k i/100ms Calculating ------------------------------------- large: optimized 48.107k (± 1.6%) i/s (20.79 μs/i) - 244.277k in 5.079041s Comparison: large: optimized: 48107.3 i/s large: normal: 37555.7 i/s - 1.28x slower ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: optimized 32.014k i/100ms Calculating ------------------------------------- small: optimized 319.760k (± 0.6%) i/s (3.13 μs/i) - 1.601M in 5.006140s Comparison: small: optimized: 319759.6 i/s small: normal: 113719.0 i/s - 2.81x slower ``` I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile. The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize). This PR makes the following changes: - the `XPathContext` object ... - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable - tracks namespaces and variables registered through those two methods - has a new `#reset` method that deregisters all namespaces and variables registered - has a new `#node=` method to set the current node being searched - all the `Searchable` methods ... - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context` - and that context object is `#reset` when the block returns There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution. Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file: ``` $ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: normal 3.790k i/100ms Calculating ------------------------------------- large: normal 37.556k (± 1.7%) i/s (26.63 μs/i) - 189.500k in 5.047390s ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: normal 11.726k i/100ms Calculating ------------------------------------- small: normal 113.719k (± 2.5%) i/s (8.79 μs/i) - 574.574k in 5.055648s $ ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: optimized 4.609k i/100ms Calculating ------------------------------------- large: optimized 48.107k (± 1.6%) i/s (20.79 μs/i) - 244.277k in 5.079041s Comparison: large: optimized: 48107.3 i/s large: normal: 37555.7 i/s - 1.28x slower ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: optimized 32.014k i/100ms Calculating ------------------------------------- small: optimized 319.760k (± 0.6%) i/s (3.13 μs/i) - 1.601M in 5.006140s Comparison: small: optimized: 319759.6 i/s small: normal: 113719.0 i/s - 2.81x slower ``` I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile. The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize). This PR makes the following changes: - the `XPathContext` object ... - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable - tracks namespaces and variables registered through those two methods - has a new `#reset` method that deregisters all namespaces and variables registered - has a new `#node=` method to set the current node being searched - all the `Searchable` methods ... - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context` - and that context object is `#reset` when the block returns There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution. Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file: ``` $ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: normal 3.790k i/100ms Calculating ------------------------------------- large: normal 37.556k (± 1.7%) i/s (26.63 μs/i) - 189.500k in 5.047390s ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: normal 11.726k i/100ms Calculating ------------------------------------- small: normal 113.719k (± 2.5%) i/s (8.79 μs/i) - 574.574k in 5.055648s $ ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: optimized 4.609k i/100ms Calculating ------------------------------------- large: optimized 48.107k (± 1.6%) i/s (20.79 μs/i) - 244.277k in 5.079041s Comparison: large: optimized: 48107.3 i/s large: normal: 37555.7 i/s - 1.28x slower ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: optimized 32.014k i/100ms Calculating ------------------------------------- small: optimized 319.760k (± 0.6%) i/s (3.13 μs/i) - 1.601M in 5.006140s Comparison: small: optimized: 319759.6 i/s small: normal: 113719.0 i/s - 2.81x slower ``` I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile. The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
See #3378 which implements the context re-use optimization. |
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize). This PR makes the following changes: - the `XPathContext` object ... - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable - tracks namespaces and variables registered through those two methods - has a new `#reset` method that deregisters all namespaces and variables registered - has a new `#node=` method to set the current node being searched - all the `Searchable` methods ... - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context` - and that context object is `#reset` when the block returns There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution. Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file: ``` $ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: normal 3.790k i/100ms Calculating ------------------------------------- large: normal 37.556k (± 1.7%) i/s (26.63 μs/i) - 189.500k in 5.047390s ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: normal 11.726k i/100ms Calculating ------------------------------------- small: normal 113.719k (± 2.5%) i/s (8.79 μs/i) - 574.574k in 5.055648s $ ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: optimized 4.609k i/100ms Calculating ------------------------------------- large: optimized 48.107k (± 1.6%) i/s (20.79 μs/i) - 244.277k in 5.079041s Comparison: large: optimized: 48107.3 i/s large: normal: 37555.7 i/s - 1.28x slower ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: optimized 32.014k i/100ms Calculating ------------------------------------- small: optimized 319.760k (± 0.6%) i/s (3.13 μs/i) - 1.601M in 5.006140s Comparison: small: optimized: 319759.6 i/s small: normal: 113719.0 i/s - 2.81x slower ``` I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile. The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize). This PR makes the following changes: - the `XPathContext` object ... - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable - tracks namespaces and variables registered through those two methods - has a new `#reset` method that deregisters all namespaces and variables registered - has a new `#node=` method to set the current node being searched - all the `Searchable` methods ... - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context` - and that context object is `#reset` when the block returns There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution. Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file: ``` $ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: normal 3.790k i/100ms Calculating ------------------------------------- large: normal 37.556k (± 1.7%) i/s (26.63 μs/i) - 189.500k in 5.047390s ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: normal 11.726k i/100ms Calculating ------------------------------------- small: normal 113.719k (± 2.5%) i/s (8.79 μs/i) - 574.574k in 5.055648s $ ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: optimized 4.609k i/100ms Calculating ------------------------------------- large: optimized 48.107k (± 1.6%) i/s (20.79 μs/i) - 244.277k in 5.079041s Comparison: large: optimized: 48107.3 i/s large: normal: 37555.7 i/s - 1.28x slower ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: optimized 32.014k i/100ms Calculating ------------------------------------- small: optimized 319.760k (± 0.6%) i/s (3.13 μs/i) - 1.601M in 5.006140s Comparison: small: optimized: 319759.6 i/s small: normal: 113719.0 i/s - 2.81x slower ``` I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile. The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
I've posted a separate PR that implements compiled XPath expressions: However, I'm not seeing a compelling increase in performance. I'd love if someone who thinks this feature will speed up their code would give that branch a try and show me a benchmark that would convince me it's worth it. |
**What problem is this PR intended to solve?** Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize). This PR makes the following changes: - the `XPathContext` object ... - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable - tracks namespaces and variables registered through those two methods - has a new `#reset` method that deregisters all namespaces and variables registered - has a new `#node=` method to set the current node being searched - all the `Searchable` methods ... - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context` - and that context object is `#reset` when the block returns There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution. Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file: ``` $ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: normal 3.790k i/100ms Calculating ------------------------------------- large: normal 37.556k (± 1.7%) i/s (26.63 μs/i) - 189.500k in 5.047390s ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: normal 11.726k i/100ms Calculating ------------------------------------- small: normal 113.719k (± 2.5%) i/s (8.79 μs/i) - 574.574k in 5.055648s $ ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: optimized 4.609k i/100ms Calculating ------------------------------------- large: optimized 48.107k (± 1.6%) i/s (20.79 μs/i) - 244.277k in 5.079041s Comparison: large: optimized: 48107.3 i/s large: normal: 37555.7 i/s - 1.28x slower ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: optimized 32.014k i/100ms Calculating ------------------------------------- small: optimized 319.760k (± 0.6%) i/s (3.13 μs/i) - 1.601M in 5.006140s Comparison: small: optimized: 319759.6 i/s small: normal: 113719.0 i/s - 2.81x slower ``` **What alternatives were considered?** I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile. The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes. **Have you included adequate test coverage?** Yes. **Does this change affect the behavior of either the C or the Java implementations?** CRuby only.
Two things I want to explore doing to try to improve the performance of XPath (and, transitively, CSS) searches:
For (1), we need to be a bit careful:
XPathContext#register_namespaces
)XPathContext#register_variable
)nokogiri:
prefix used for dynamic function bindingnokogiri-builtin:
prefix used for our performance-optimized builtin functionsbut the performance improvement could be significant, see this response from the current libxml2 maintainer indicating "best practice" is to keep one XPathContext per thread and re-use it.
The benchmark submitted by a user in #760 indicates a 4x(!) speedup on simple expressions by avoiding re-initializing an XPathContext object. It seems likely that the real-world speedup will be less (since cleaning up registered namespaces and variables will have some overhead), but it still seems like it would be a pretty decent speedup.
For (2), we'll need a new Ruby class to wrap the compiled expression represented by
xmlXPathCompExprPtr
, and a way to pass that into#xpath
, but that seems like relatively straightforward work. (Note this API won't be available in JRuby.)I'd like to get a rough benchmark ahead of time to see how much time this will save us, for simple and for complex expressions -- after a brief search I couldn't find any prior results here.
The text was updated successfully, but these errors were encountered: