Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perf] Explore re-using XPathContext objects, and compiling XPath expressions #3266

Closed
Tracked by #3367
flavorjones opened this issue Jul 2, 2024 · 4 comments
Closed
Tracked by #3367

Comments

@flavorjones
Copy link
Member

Two things I want to explore doing to try to improve the performance of XPath (and, transitively, CSS) searches:

  1. re-use XPathContext objects which are a little expensive to create
  2. expose libxml2's ability to compile XPath expressions

For (1), we need to be a bit careful:

  • XPathContext is not thread-safe
  • there is some state we need to set or un-set appropriately:
    • namespaces (via XPathContext#register_namespaces)
    • variables (via XPathContext#register_variable)
  • while preserving other state:
    • the nokogiri: prefix used for dynamic function binding
    • the nokogiri-builtin: prefix used for our performance-optimized builtin functions
    • the built-in xpath functions themselves

but the performance improvement could be significant, see this response from the current libxml2 maintainer indicating "best practice" is to keep one XPathContext per thread and re-use it.

The benchmark submitted by a user in #760 indicates a 4x(!) speedup on simple expressions by avoiding re-initializing an XPathContext object. It seems likely that the real-world speedup will be less (since cleaning up registered namespaces and variables will have some overhead), but it still seems like it would be a pretty decent speedup.

For (2), we'll need a new Ruby class to wrap the compiled expression represented by xmlXPathCompExprPtr, and a way to pass that into #xpath, but that seems like relatively straightforward work. (Note this API won't be available in JRuby.)

I'd like to get a rough benchmark ahead of time to see how much time this will save us, for simple and for complex expressions -- after a brief search I couldn't find any prior results here.

@flavorjones
Copy link
Member Author

flavorjones commented Dec 11, 2024

I chatted with @tenderlove about this at RubyConf 2024 and I think I've got a clear path to land some of this in v1.18.0.

flavorjones added a commit that referenced this issue Dec 14, 2024
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize).

This PR makes the following changes:

- the `XPathContext` object ...
  - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable
  - tracks namespaces and variables registered through those two methods
  - has a new `#reset` method that deregisters all namespaces and variables registered
  - has a new `#node=` method to set the current node being searched
- all the `Searchable` methods ...
  - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context`
  - and that context object is `#reset` when the block returns

There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution.

Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file:

```
$ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       large: normal     3.790k i/100ms
Calculating -------------------------------------
       large: normal     37.556k (± 1.7%) i/s   (26.63 μs/i) -    189.500k in   5.047390s
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       small: normal    11.726k i/100ms
Calculating -------------------------------------
       small: normal    113.719k (± 2.5%) i/s    (8.79 μs/i) -    574.574k in   5.055648s

$ ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    large: optimized     4.609k i/100ms
Calculating -------------------------------------
    large: optimized     48.107k (± 1.6%) i/s   (20.79 μs/i) -    244.277k in   5.079041s

Comparison:
    large: optimized:    48107.3 i/s
       large: normal:    37555.7 i/s - 1.28x  slower

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    small: optimized    32.014k i/100ms
Calculating -------------------------------------
    small: optimized    319.760k (± 0.6%) i/s    (3.13 μs/i) -      1.601M in   5.006140s

Comparison:
    small: optimized:   319759.6 i/s
       small: normal:   113719.0 i/s - 2.81x  slower
```

I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile.

The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
flavorjones added a commit that referenced this issue Dec 14, 2024
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize).

This PR makes the following changes:

- the `XPathContext` object ...
  - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable
  - tracks namespaces and variables registered through those two methods
  - has a new `#reset` method that deregisters all namespaces and variables registered
  - has a new `#node=` method to set the current node being searched
- all the `Searchable` methods ...
  - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context`
  - and that context object is `#reset` when the block returns

There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution.

Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file:

```
$ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       large: normal     3.790k i/100ms
Calculating -------------------------------------
       large: normal     37.556k (± 1.7%) i/s   (26.63 μs/i) -    189.500k in   5.047390s
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       small: normal    11.726k i/100ms
Calculating -------------------------------------
       small: normal    113.719k (± 2.5%) i/s    (8.79 μs/i) -    574.574k in   5.055648s

$ ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    large: optimized     4.609k i/100ms
Calculating -------------------------------------
    large: optimized     48.107k (± 1.6%) i/s   (20.79 μs/i) -    244.277k in   5.079041s

Comparison:
    large: optimized:    48107.3 i/s
       large: normal:    37555.7 i/s - 1.28x  slower

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    small: optimized    32.014k i/100ms
Calculating -------------------------------------
    small: optimized    319.760k (± 0.6%) i/s    (3.13 μs/i) -      1.601M in   5.006140s

Comparison:
    small: optimized:   319759.6 i/s
       small: normal:   113719.0 i/s - 2.81x  slower
```

I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile.

The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
flavorjones added a commit that referenced this issue Dec 14, 2024
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize).

This PR makes the following changes:

- the `XPathContext` object ...
  - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable
  - tracks namespaces and variables registered through those two methods
  - has a new `#reset` method that deregisters all namespaces and variables registered
  - has a new `#node=` method to set the current node being searched
- all the `Searchable` methods ...
  - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context`
  - and that context object is `#reset` when the block returns

There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution.

Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file:

```
$ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       large: normal     3.790k i/100ms
Calculating -------------------------------------
       large: normal     37.556k (± 1.7%) i/s   (26.63 μs/i) -    189.500k in   5.047390s
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       small: normal    11.726k i/100ms
Calculating -------------------------------------
       small: normal    113.719k (± 2.5%) i/s    (8.79 μs/i) -    574.574k in   5.055648s

$ ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    large: optimized     4.609k i/100ms
Calculating -------------------------------------
    large: optimized     48.107k (± 1.6%) i/s   (20.79 μs/i) -    244.277k in   5.079041s

Comparison:
    large: optimized:    48107.3 i/s
       large: normal:    37555.7 i/s - 1.28x  slower

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    small: optimized    32.014k i/100ms
Calculating -------------------------------------
    small: optimized    319.760k (± 0.6%) i/s    (3.13 μs/i) -      1.601M in   5.006140s

Comparison:
    small: optimized:   319759.6 i/s
       small: normal:   113719.0 i/s - 2.81x  slower
```

I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile.

The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
@flavorjones
Copy link
Member Author

See #3378 which implements the context re-use optimization.

flavorjones added a commit that referenced this issue Dec 14, 2024
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize).

This PR makes the following changes:

- the `XPathContext` object ...
  - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable
  - tracks namespaces and variables registered through those two methods
  - has a new `#reset` method that deregisters all namespaces and variables registered
  - has a new `#node=` method to set the current node being searched
- all the `Searchable` methods ...
  - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context`
  - and that context object is `#reset` when the block returns

There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution.

Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file:

```
$ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       large: normal     3.790k i/100ms
Calculating -------------------------------------
       large: normal     37.556k (± 1.7%) i/s   (26.63 μs/i) -    189.500k in   5.047390s
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       small: normal    11.726k i/100ms
Calculating -------------------------------------
       small: normal    113.719k (± 2.5%) i/s    (8.79 μs/i) -    574.574k in   5.055648s

$ ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    large: optimized     4.609k i/100ms
Calculating -------------------------------------
    large: optimized     48.107k (± 1.6%) i/s   (20.79 μs/i) -    244.277k in   5.079041s

Comparison:
    large: optimized:    48107.3 i/s
       large: normal:    37555.7 i/s - 1.28x  slower

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    small: optimized    32.014k i/100ms
Calculating -------------------------------------
    small: optimized    319.760k (± 0.6%) i/s    (3.13 μs/i) -      1.601M in   5.006140s

Comparison:
    small: optimized:   319759.6 i/s
       small: normal:   113719.0 i/s - 2.81x  slower
```

I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile.

The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
flavorjones added a commit that referenced this issue Dec 14, 2024
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize).

This PR makes the following changes:

- the `XPathContext` object ...
  - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable
  - tracks namespaces and variables registered through those two methods
  - has a new `#reset` method that deregisters all namespaces and variables registered
  - has a new `#node=` method to set the current node being searched
- all the `Searchable` methods ...
  - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context`
  - and that context object is `#reset` when the block returns

There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution.

Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file:

```
$ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       large: normal     3.790k i/100ms
Calculating -------------------------------------
       large: normal     37.556k (± 1.7%) i/s   (26.63 μs/i) -    189.500k in   5.047390s
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       small: normal    11.726k i/100ms
Calculating -------------------------------------
       small: normal    113.719k (± 2.5%) i/s    (8.79 μs/i) -    574.574k in   5.055648s

$ ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    large: optimized     4.609k i/100ms
Calculating -------------------------------------
    large: optimized     48.107k (± 1.6%) i/s   (20.79 μs/i) -    244.277k in   5.079041s

Comparison:
    large: optimized:    48107.3 i/s
       large: normal:    37555.7 i/s - 1.28x  slower

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    small: optimized    32.014k i/100ms
Calculating -------------------------------------
    small: optimized    319.760k (± 0.6%) i/s    (3.13 μs/i) -      1.601M in   5.006140s

Comparison:
    small: optimized:   319759.6 i/s
       small: normal:   113719.0 i/s - 2.81x  slower
```

I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile.

The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
@flavorjones
Copy link
Member Author

flavorjones commented Dec 15, 2024

I've posted a separate PR that implements compiled XPath expressions:

#3380

However, I'm not seeing a compelling increase in performance. I'd love if someone who thinks this feature will speed up their code would give that branch a try and show me a benchmark that would convince me it's worth it.

flavorjones added a commit that referenced this issue Dec 16, 2024
**What problem is this PR intended to solve?**

Issue #3266 discusses a potential optimization by re-using XPathContext
objects (which are relatively expensive to initialize).

This PR makes the following changes:

- the `XPathContext` object ...
- can accept `nil` as the value arg to `#register_ns` and
`#register_variable` to _deregister_ the namespace or variable
  - tracks namespaces and variables registered through those two methods
- has a new `#reset` method that deregisters all namespaces and
variables registered
  - has a new `#node=` method to set the current node being searched
- all the `Searchable` methods ...
- use a thread-local XPathContext object that is only available in a
block yielded by `#get_xpath_context`
  - and that context object is `#reset` when the block returns

There is an escape hatch that I will leave undocumented, which is to set
the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution.

Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb
file:

```
$ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       large: normal     3.790k i/100ms
Calculating -------------------------------------
       large: normal     37.556k (± 1.7%) i/s   (26.63 μs/i) -    189.500k in   5.047390s
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
       small: normal    11.726k i/100ms
Calculating -------------------------------------
       small: normal    113.719k (± 2.5%) i/s    (8.79 μs/i) -    574.574k in   5.055648s

$ ruby --yjit ./issues/3266-xpath-benchmark.rb
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    large: optimized     4.609k i/100ms
Calculating -------------------------------------
    large: optimized     48.107k (± 1.6%) i/s   (20.79 μs/i) -    244.277k in   5.079041s

Comparison:
    large: optimized:    48107.3 i/s
       large: normal:    37555.7 i/s - 1.28x  slower

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux]
Warming up --------------------------------------
    small: optimized    32.014k i/100ms
Calculating -------------------------------------
    small: optimized    319.760k (± 0.6%) i/s    (3.13 μs/i) -      1.601M in   5.006140s

Comparison:
    small: optimized:   319759.6 i/s
       small: normal:   113719.0 i/s - 2.81x  slower
```

**What alternatives were considered?**

I originally implemented a much simpler approach, which cleared all
registered variables and functions, however the standard XPath 1.0
functions were all also deregistered; and calling
`xmlXPathRegisterAllFunctions` to re-register them took us back to the
original performance profile.

The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and
for that we track registered variables and namespaces, and de-register
them after the query eval completes.


**Have you included adequate test coverage?**

Yes.


**Does this change affect the behavior of either the C or the Java
implementations?**

CRuby only.
@flavorjones
Copy link
Member Author

flavorjones commented Dec 16, 2024

#3378 with xpath context re-use will be in v1.18.0.

Closing this, if anyone has thoughts on compiled xpath expressions, let's continue the conversation in #3380 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant