Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eskip: improve lexer performance #2755

Merged
merged 2 commits into from
Jan 16, 2024
Merged

eskip: improve lexer performance #2755

merged 2 commits into from
Jan 16, 2024

Conversation

AlexanderYastrebov
Copy link
Member

@AlexanderYastrebov AlexanderYastrebov commented Nov 28, 2023

  • do not use fmt.Sprintf in ParseFilters and ParsePredicates
  • avoid allocations for fixed tokens
  • optimize scanWhile loop
  • add scanEscaped fast path
  • optimize scanRegexp and scanEscaped using strings.Builder
  • optimize scanWhitespace
  • optimize selectScanner using switch
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                  │   HEAD~1    │                HEAD                 │
                  │   sec/op    │   sec/op     vs base                │
ParsePredicates-8   29.88µ ± 3%   11.33µ ± 2%  -62.09% (p=0.000 n=10)

                  │    HEAD~1    │                 HEAD                 │
                  │     B/op     │     B/op      vs base                │
ParsePredicates-8   4.863Ki ± 0%   2.008Ki ± 0%  -58.71% (p=0.000 n=10)

                  │   HEAD~1    │                HEAD                │
                  │  allocs/op  │ allocs/op   vs base                │
ParsePredicates-8   198.00 ± 0%   33.00 ± 0%  -83.33% (p=0.000 n=10)

@AlexanderYastrebov AlexanderYastrebov changed the title eskip: optimize lexer performance eskip: improve lexer performance Nov 28, 2023
@szuecs
Copy link
Member

szuecs commented Nov 29, 2023

return scalars instead of token to reduce allocations

how much does this speedup?
I think the code is already ugly if you have more than 2 return values and now we have increased from 3 to 4.

@szuecs
Copy link
Member

szuecs commented Dec 1, 2023

👍

@MustafaSaber
Copy link
Member

👍

@AlexanderYastrebov
Copy link
Member Author

AlexanderYastrebov commented Dec 1, 2023

return scalars instead of token to reduce allocations

how much does this speedup?
I think the code is already ugly if you have more than 2 return values and now we have increased from 3 to 4.

Reverted to return token (benchmark results did not change). Looks like compiler can figure out that it does not need to allocate the token if its not stored anywhere, previously there was:

l.lastToken = &t
)

@szuecs
Copy link
Member

szuecs commented Dec 1, 2023

👍

@szuecs
Copy link
Member

szuecs commented Dec 1, 2023

Thanks for checking this @AlexanderYastrebov !

@AlexanderYastrebov AlexanderYastrebov added the major moderate risk, for example new API, small filter changes that have no risk like refactoring or logs label Dec 27, 2023
@sepehrdaddev
Copy link
Member

@AlexanderYastrebov seems like the fuzzer found an index out of bound in the code, pls check the workflow.
specifically the in the ParsePredicates() function.

@sepehrdaddev
Copy link
Member

sepehrdaddev commented Jan 12, 2024

ParsePredicates() panics when given ///

here is how reproduce it:

diff --git a/eskip/eskip_test.go b/eskip/eskip_test.go
index 249dced0..f9a75270 100644
--- a/eskip/eskip_test.go
+++ b/eskip/eskip_test.go
@@ -388,6 +388,9 @@ func TestPredicateParsing(t *testing.T) {
        }, {
                title: "star notation",
                input: `*`,
+       }, {
+               title: "panics",
+               input: `///`,
        }} {
                t.Run(test.title, func(t *testing.T) {
                        p, err := ParsePredicates(test.input)
   

this line is causing the panic: https://github.com/zalando/skipper/pull/2755/files#diff-c7c35e59a50fd3d4f06f22410b38b31f543ff4a988b500468920cf3e07a943f4R698

@AlexanderYastrebov
Copy link
Member Author

AlexanderYastrebov commented Jan 12, 2024

@sepehrdaddev Nice one! Did you get the /// payload from the logs

==72==ERROR: AddressSanitizer: ABRT on unknown address 0x000000000048 (pc 0x0000005e4481 bp 0x10c000633bd0 sp 0x10c000633bb8 T0)
SCARINESS: 10 (signal)
    #0 0x5e4481 in runtime.raise.abi0 runtime/sys_linux_amd64.s:154

DEDUP_TOKEN: runtime.raise.abi0
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: ABRT runtime/sys_linux_amd64.s:154 in runtime.raise.abi0
==72==ABORTING
MS: 1 CopyPart-; base unit: 994af86bef6520042698904387522e903ba5149f
0x2f,0x2f,0x2f,
///

?

@sepehrdaddev
Copy link
Member

sepehrdaddev commented Jan 12, 2024

yes, I got it from the logs (the same log you posted).

AlexanderYastrebov added a commit that referenced this pull request Jan 12, 2024
Test case from #2755 discovered by fuzzing fails.

Signed-off-by: Alexander Yastrebov <[email protected]>
@AlexanderYastrebov
Copy link
Member Author

master does not fail properly on this payload, I've created #2849

@AlexanderYastrebov AlexanderYastrebov marked this pull request as draft January 12, 2024 14:10
@sepehrdaddev
Copy link
Member

sepehrdaddev commented Jan 12, 2024

for reference, the fuzzer found multiple payloads that crash (multiple iterations of /), here is another payload that causes it to crash that I found a bit interesting since it was different than others:
"\x2f\x2f\x00\x00\x00\xe6\xfe\x00\x00\x2f\x00\x00\x00\x00\x00\x00\x00\xe6\xfe\x00\x00\x2f\x00\x00\x00\x00"

AlexanderYastrebov added a commit that referenced this pull request Jan 12, 2024
Test case from #2755 discovered by fuzzing fails.

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Jan 12, 2024
Add test cases from #2755 discovered by fuzzing.

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Jan 12, 2024
Add test cases from #2755 discovered by fuzzing.

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Jan 12, 2024
Add test cases from #2755 discovered by fuzzing.

Signed-off-by: Alexander Yastrebov <[email protected]>
* do not use fmt.Sprintf in ParseFilters and ParsePredicates
* avoid allocations for fixed tokens
* optimize scanWhile loop
* add scanEscaped fast path
* optimize scanRegexp and scanEscaped using strings.Builder
* optimize scanWhitespace
* optimize selectScanner using switch

```
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                  │   HEAD~1    │                HEAD                 │
                  │   sec/op    │   sec/op     vs base                │
ParsePredicates-8   29.88µ ± 3%   11.33µ ± 2%  -62.09% (p=0.000 n=10)

                  │    HEAD~1    │                 HEAD                 │
                  │     B/op     │     B/op      vs base                │
ParsePredicates-8   4.863Ki ± 0%   2.008Ki ± 0%  -58.71% (p=0.000 n=10)

                  │   HEAD~1    │                HEAD                │
                  │  allocs/op  │ allocs/op   vs base                │
ParsePredicates-8   198.00 ± 0%   33.00 ± 0%  -83.33% (p=0.000 n=10)
```

Signed-off-by: Alexander Yastrebov <[email protected]>
@AlexanderYastrebov AlexanderYastrebov marked this pull request as ready for review January 15, 2024 18:18
@AlexanderYastrebov
Copy link
Member Author

I've fixed the code and will followup in #2866

@sepehrdaddev
Copy link
Member

I ran the fuzzer for 30 mins and it didn't find anything so I think its safe to say that its fixed.

@sepehrdaddev
Copy link
Member

👍

1 similar comment
@szuecs
Copy link
Member

szuecs commented Jan 16, 2024

👍

@szuecs szuecs merged commit d468a59 into master Jan 16, 2024
14 checks passed
@szuecs szuecs deleted the eskip/optimize-lexer-2 branch January 16, 2024 14:37
AlexanderYastrebov added a commit that referenced this pull request Jan 16, 2024
* use plain ascii instead of unicode package
* use loop for scanSymbol
* call scan functions directly instead of selectScanner to aid inlining

```
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
                  │    HEAD~1    │                HEAD                 │
                  │    sec/op    │   sec/op     vs base                │
ParsePredicates-8   9.637µ ± 11%   8.894µ ± 4%   -7.71% (p=0.001 n=10)
Parse-8             329.1m ±  4%   272.7m ± 2%  -17.15% (p=0.000 n=10)
geomean             1.781m         1.557m       -12.56%

                  │    HEAD~1    │                HEAD                 │
                  │     B/op     │     B/op      vs base               │
ParsePredicates-8   2.008Ki ± 0%   2.008Ki ± 0%       ~ (p=1.000 n=10)
Parse-8             49.94Mi ± 0%   49.94Mi ± 0%       ~ (p=0.926 n=10)
geomean             320.4Ki        320.4Ki       -0.00%

                  │   HEAD~1    │                 HEAD                 │
                  │  allocs/op  │  allocs/op   vs base                 │
ParsePredicates-8    33.00 ± 0%    33.00 ± 0%       ~ (p=1.000 n=10) ¹
Parse-8             1.100M ± 0%   1.100M ± 0%       ~ (p=0.367 n=10)
geomean             6.025k        6.025k       +0.00%
¹ all samples are equal
```

See previous #2755

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Jan 16, 2024
* use plain ascii instead of unicode package
* use loop for scanSymbol
* call scan functions directly instead of selectScanner to aid inlining

```
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
                  │    HEAD~1    │                HEAD                 │
                  │    sec/op    │   sec/op     vs base                │
ParsePredicates-8   9.637µ ± 11%   8.894µ ± 4%   -7.71% (p=0.001 n=10)
Parse-8             329.1m ±  4%   272.7m ± 2%  -17.15% (p=0.000 n=10)
geomean             1.781m         1.557m       -12.56%

                  │    HEAD~1    │                HEAD                 │
                  │     B/op     │     B/op      vs base               │
ParsePredicates-8   2.008Ki ± 0%   2.008Ki ± 0%       ~ (p=1.000 n=10)
Parse-8             49.94Mi ± 0%   49.94Mi ± 0%       ~ (p=0.926 n=10)
geomean             320.4Ki        320.4Ki       -0.00%

                  │   HEAD~1    │                 HEAD                 │
                  │  allocs/op  │  allocs/op   vs base                 │
ParsePredicates-8    33.00 ± 0%    33.00 ± 0%       ~ (p=1.000 n=10) ¹
Parse-8             1.100M ± 0%   1.100M ± 0%       ~ (p=0.367 n=10)
geomean             6.025k        6.025k       +0.00%
¹ all samples are equal
```

See previous #2755

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Jan 17, 2024
* eskip: add BenchmarkParse

Add a benchmark for parsing 10000 routes.

Signed-off-by: Alexander Yastrebov <[email protected]>

* eskip: improve lexer performance 2

* use plain ascii instead of unicode package
* use loop for scanSymbol
* call scan functions directly instead of selectScanner to aid inlining

```
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
                  │    HEAD~1    │                HEAD                 │
                  │    sec/op    │   sec/op     vs base                │
ParsePredicates-8   9.637µ ± 11%   8.894µ ± 4%   -7.71% (p=0.001 n=10)
Parse-8             329.1m ±  4%   272.7m ± 2%  -17.15% (p=0.000 n=10)
geomean             1.781m         1.557m       -12.56%

                  │    HEAD~1    │                HEAD                 │
                  │     B/op     │     B/op      vs base               │
ParsePredicates-8   2.008Ki ± 0%   2.008Ki ± 0%       ~ (p=1.000 n=10)
Parse-8             49.94Mi ± 0%   49.94Mi ± 0%       ~ (p=0.926 n=10)
geomean             320.4Ki        320.4Ki       -0.00%

                  │   HEAD~1    │                 HEAD                 │
                  │  allocs/op  │  allocs/op   vs base                 │
ParsePredicates-8    33.00 ± 0%    33.00 ± 0%       ~ (p=1.000 n=10) ¹
Parse-8             1.100M ± 0%   1.100M ± 0%       ~ (p=0.367 n=10)
geomean             6.025k        6.025k       +0.00%
¹ all samples are equal
```

See previous #2755

Signed-off-by: Alexander Yastrebov <[email protected]>

---------

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Feb 5, 2024
Create a copy of string token value in lexer
to avoid referencing input data array.

The string value becomes a filter or predicate argument or an endpoint value.
Using such value as a long-living cache key may create a memory leak.

```
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                  │   HEAD~1    │                HEAD                │
                  │   sec/op    │   sec/op     vs base               │
ParsePredicates-8   7.656µ ± 6%   8.281µ ± 3%  +8.16% (p=0.002 n=10)
Parse-8             241.0m ± 1%   252.9m ± 6%  +4.96% (p=0.000 n=10)
geomean             1.358m        1.447m       +6.55%

                  │    HEAD~1    │                 HEAD                 │
                  │     B/op     │     B/op      vs base                │
ParsePredicates-8   1.961Ki ± 0%   2.352Ki ± 0%  +19.92% (p=0.000 n=10)
Parse-8             49.02Mi ± 0%   58.86Mi ± 0%  +20.07% (p=0.000 n=10)
geomean             313.7Ki        376.5Ki       +20.00%

                  │   HEAD~1    │                HEAD                 │
                  │  allocs/op  │  allocs/op   vs base                │
ParsePredicates-8    32.00 ± 0%    52.00 ± 0%  +62.50% (p=0.000 n=10)
Parse-8             1.080M ± 0%   1.330M ± 0%  +23.15% (p=0.000 n=10)
geomean             5.879k        8.316k       +41.46%

        │    HEAD~1    │               HEAD               │
        │   bytes/op   │   bytes/op    vs base            │
Parse-8   15.99Mi ± 0%   15.99Mi ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```

Followup on #2755

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Feb 5, 2024
Create a copy of string token value in lexer
to avoid referencing input data array.

The string value becomes a filter or predicate argument or an endpoint value.
Using such value as a long-living cache key may create a memory leak.

```
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                  │   HEAD~1    │                HEAD                │
                  │   sec/op    │   sec/op     vs base               │
ParsePredicates-8   7.656µ ± 6%   8.281µ ± 3%  +8.16% (p=0.002 n=10)
Parse-8             241.0m ± 1%   252.9m ± 6%  +4.96% (p=0.000 n=10)
geomean             1.358m        1.447m       +6.55%

                  │    HEAD~1    │                 HEAD                 │
                  │     B/op     │     B/op      vs base                │
ParsePredicates-8   1.961Ki ± 0%   2.352Ki ± 0%  +19.92% (p=0.000 n=10)
Parse-8             49.02Mi ± 0%   58.86Mi ± 0%  +20.07% (p=0.000 n=10)
geomean             313.7Ki        376.5Ki       +20.00%

                  │   HEAD~1    │                HEAD                 │
                  │  allocs/op  │  allocs/op   vs base                │
ParsePredicates-8    32.00 ± 0%    52.00 ± 0%  +62.50% (p=0.000 n=10)
Parse-8             1.080M ± 0%   1.330M ± 0%  +23.15% (p=0.000 n=10)
geomean             5.879k        8.316k       +41.46%

        │    HEAD~1    │               HEAD               │
        │   bytes/op   │   bytes/op    vs base            │
Parse-8   15.99Mi ± 0%   15.99Mi ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```

Followup on #2755

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Feb 5, 2024
Create a copy of string token value in lexer
to avoid referencing input data array.

The string value becomes a filter or predicate argument or an endpoint value.
Using such value as a long-living cache key may create a memory leak.

```
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                  │   HEAD~1    │                HEAD                │
                  │   sec/op    │   sec/op     vs base               │
ParsePredicates-8   7.656µ ± 6%   8.281µ ± 3%  +8.16% (p=0.002 n=10)
Parse-8             241.0m ± 1%   252.9m ± 6%  +4.96% (p=0.000 n=10)
geomean             1.358m        1.447m       +6.55%

                  │    HEAD~1    │                 HEAD                 │
                  │     B/op     │     B/op      vs base                │
ParsePredicates-8   1.961Ki ± 0%   2.352Ki ± 0%  +19.92% (p=0.000 n=10)
Parse-8             49.02Mi ± 0%   58.86Mi ± 0%  +20.07% (p=0.000 n=10)
geomean             313.7Ki        376.5Ki       +20.00%

                  │   HEAD~1    │                HEAD                 │
                  │  allocs/op  │  allocs/op   vs base                │
ParsePredicates-8    32.00 ± 0%    52.00 ± 0%  +62.50% (p=0.000 n=10)
Parse-8             1.080M ± 0%   1.330M ± 0%  +23.15% (p=0.000 n=10)
geomean             5.879k        8.316k       +41.46%

        │    HEAD~1    │               HEAD               │
        │   bytes/op   │   bytes/op    vs base            │
Parse-8   15.99Mi ± 0%   15.99Mi ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```

Followup on #2755

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Feb 6, 2024
Create a copy of symbol token value in lexer
to avoid referencing input data array.

The symbol value becomes a filter or predicate name or a route id value.
Using such value as a long-living cache key may create a memory leak.

```
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                  │    HEAD~1    │                 HEAD                 │
                  │    sec/op    │    sec/op     vs base                │
ParsePredicates-8   17.07µ ±  3%   18.80µ ± 18%  +10.17% (p=0.000 n=10)
Parse-8             538.9m ± 19%   545.5m ± 21%        ~ (p=0.631 n=10)
geomean             3.033m         3.203m         +5.60%

                  │    HEAD~1    │                HEAD                 │
                  │     B/op     │     B/op      vs base               │
ParsePredicates-8   2.352Ki ± 0%   2.375Ki ± 0%  +1.00% (p=0.000 n=10)
Parse-8             58.86Mi ± 0%   61.84Mi ± 0%  +5.05% (p=0.000 n=10)
geomean             376.5Ki        387.8Ki       +3.00%

                  │   HEAD~1    │                HEAD                 │
                  │  allocs/op  │  allocs/op   vs base                │
ParsePredicates-8    52.00 ± 0%    53.00 ± 0%   +1.92% (p=0.000 n=10)
Parse-8             1.330M ± 0%   1.510M ± 0%  +13.53% (p=0.000 n=10)
geomean             8.316k        8.946k        +7.57%

        │    HEAD~1    │               HEAD               │
        │   bytes/op   │   bytes/op    vs base            │
Parse-8   15.99Mi ± 0%   15.99Mi ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```

See previous #2903
Followup on #2755

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Feb 6, 2024
Create a copy of symbol token value in lexer
to avoid referencing input data array.

The symbol value becomes a filter or predicate name or a route id value.
Using such value as a long-living cache key may create a memory leak.

```
goos: linux
goarch: amd64
pkg: github.com/zalando/skipper/eskip
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                  │    HEAD~1    │                 HEAD                 │
                  │    sec/op    │    sec/op     vs base                │
ParsePredicates-8   17.07µ ±  3%   18.80µ ± 18%  +10.17% (p=0.000 n=10)
Parse-8             538.9m ± 19%   545.5m ± 21%        ~ (p=0.631 n=10)
geomean             3.033m         3.203m         +5.60%

                  │    HEAD~1    │                HEAD                 │
                  │     B/op     │     B/op      vs base               │
ParsePredicates-8   2.352Ki ± 0%   2.375Ki ± 0%  +1.00% (p=0.000 n=10)
Parse-8             58.86Mi ± 0%   61.84Mi ± 0%  +5.05% (p=0.000 n=10)
geomean             376.5Ki        387.8Ki       +3.00%

                  │   HEAD~1    │                HEAD                 │
                  │  allocs/op  │  allocs/op   vs base                │
ParsePredicates-8    52.00 ± 0%    53.00 ± 0%   +1.92% (p=0.000 n=10)
Parse-8             1.330M ± 0%   1.510M ± 0%  +13.53% (p=0.000 n=10)
geomean             8.316k        8.946k        +7.57%

        │    HEAD~1    │               HEAD               │
        │   bytes/op   │   bytes/op    vs base            │
Parse-8   15.99Mi ± 0%   15.99Mi ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```

See previous #2903
Followup on #2755

Signed-off-by: Alexander Yastrebov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major moderate risk, for example new API, small filter changes that have no risk like refactoring or logs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants