feat: improved performance of LIKE matching #1049

Cali0707 · 2024-05-01T18:56:30Z

This PR attempts to improve the performance of CESQL LIKE matches by moving away from building a regex and instead taking a simple greedy algorithm approach (which guarantees O(n+m) time complexity to consume the full string, where n is the length of the input string, and m is the length of the pattern).

This change has the added benefit of simplifying the code - we are no longer building and then evaluating a regex, simply looping over a string.

Signed-off-by: Calum Murray <[email protected]>

Cali0707 · 2024-05-03T20:10:16Z

Using the benchmarks in #1050 , I got the following results for this improvement over the original implementation:

benchmark                                                                             old ns/op     new ns/op     delta
BenchmarkTCK/Like_expression/Exact_match_(1)_parse-8                                  14758         8544          -42.11%
BenchmarkTCK/Like_expression/Exact_match_(1)_evaluate-8                               54.0          16.6          -69.32%
BenchmarkTCK/Like_expression/Exact_match_(2)_parse-8                                  12648         9543          -24.55%
BenchmarkTCK/Like_expression/Exact_match_(2)_evaluate-8                               53.9          18.3          -66.09%
BenchmarkTCK/Like_expression/Exact_match_(negate)_parse-8                             14208         9991          -29.68%
BenchmarkTCK/Like_expression/Exact_match_(negate)_evaluate-8                          62.8          24.3          -61.33%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_parse-8                          14502         10052         -30.69%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_evaluate-8                       138           19.4          -85.97%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_parse-8                          13726         9677          -29.50%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_evaluate-8                       142           20.8          -85.41%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_parse-8                          15831         9393          -40.67%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_evaluate-8                       226           27.3          -87.94%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_parse-8                          13356         9529          -28.65%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_evaluate-8                       168           18.7          -88.87%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_parse-8                          14551         8331          -42.75%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_evaluate-8                       14.9          15.4          +3.08%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_parse-8                          11908         8704          -26.91%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_evaluate-8                       14.9          13.6          -9.30%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_parse-8                          13463         9115          -32.30%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_evaluate-8                       220           25.7          -88.29%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_parse-8                          13717         8474          -38.22%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_evaluate-8                       54.1          14.2          -73.69%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_parse-8                          14725         9084          -38.31%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_evaluate-8                       14.9          16.3          +9.37%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_parse-8                          14657         8456          -42.31%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_evaluate-8                       69.3          18.2          -73.73%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_parse-8                          12694         8659          -31.79%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_evaluate-8                       15.1          16.3          +7.93%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_parse-8                          14553         9358          -35.70%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_evaluate-8                       14.9          17.3          +15.60%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_parse-8                          13478         9348          -30.64%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_evaluate-8                       70.8          18.2          -74.29%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_parse-8                          17317         9626          -44.41%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_evaluate-8                       69.3          18.2          -73.74%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_parse-8                          13191         8579          -34.96%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_evaluate-8                       41.4          14.2          -65.56%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_parse-8                 13441         9493          -29.37%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_evaluate-8              55.8          18.9          -66.10%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_parse-8                 14298         9841          -31.17%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_evaluate-8              64.1          26.6          -58.45%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_parse-8                 14786         9259          -37.38%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_evaluate-8              40.1          16.1          -59.84%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_parse-8                 13357         9316          -30.25%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_evaluate-8              14.9          16.1          +7.90%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_parse-8                 14233         9091          -36.13%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_evaluate-8              14.9          15.9          +6.22%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_parse-8                 12699         9316          -26.64%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_evaluate-8              55.4          18.7          -66.28%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_parse-8                 16482         8654          -47.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_evaluate-8              40.1          15.9          -60.44%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_parse-8                 13331         9264          -30.51%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_evaluate-8              14.9          15.9          +6.22%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_parse-8                  16142         9377          -41.91%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_evaluate-8               916           517           -43.54%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_parse-8        18162         10281         -43.39%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_evaluate-8     889           480           -46.02%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_parse-8                  14798         9458          -36.09%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_evaluate-8               125           69.2          -44.72%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_parse-8                  12326         8368          -32.11%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_evaluate-8               155           74.1          -52.23%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_parse-8                  12217         8298          -32.08%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_evaluate-8               129           74.0          -42.73%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_parse-8                 12822         9170          -28.48%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_evaluate-8              84.4          19.2          -77.18%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_parse-8                 13436         8809          -34.44%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_evaluate-8              122           20.6          -83.18%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_parse-8                 12214         10246         -16.11%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_evaluate-8              43.0          15.2          -64.67%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_parse-8                 13674         8945          -34.58%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_evaluate-8              82.5          20.3          -75.39%

benchmark                                                                             old allocs     new allocs     delta
BenchmarkTCK/Like_expression/Exact_match_(1)_parse-8                                  137            94             -31.39%
BenchmarkTCK/Like_expression/Exact_match_(1)_evaluate-8                               0              0              +0.00%
BenchmarkTCK/Like_expression/Exact_match_(2)_parse-8                                  139            94             -32.37%
BenchmarkTCK/Like_expression/Exact_match_(2)_evaluate-8                               0              0              +0.00%
BenchmarkTCK/Like_expression/Exact_match_(negate)_parse-8                             142            99             -30.28%
BenchmarkTCK/Like_expression/Exact_match_(negate)_evaluate-8                          0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_parse-8                          137            94             -31.39%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_parse-8                          136            93             -31.62%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_parse-8                          150            94             -37.33%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_parse-8                          152            94             -38.16%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_evaluate-8                       0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_parse-8                 153            99             -35.29%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_parse-8                 148            94             -36.49%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_parse-8                  162            93             -42.59%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_evaluate-8               2              2              +0.00%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_parse-8        167            98             -41.32%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_evaluate-8     2              2              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_parse-8                  136            93             -31.62%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_evaluate-8               2              2              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_parse-8                  141            94             -33.33%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_evaluate-8               2              2              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_parse-8                  137            94             -31.39%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_evaluate-8               2              2              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_parse-8                 138            91             -34.06%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_parse-8                 134            91             -32.09%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_parse-8                 138            91             -34.06%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_evaluate-8              0              0              +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_parse-8                 142            91             -35.92%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_evaluate-8              0              0              +0.00%

benchmark                                                                             old bytes     new bytes     delta
BenchmarkTCK/Like_expression/Exact_match_(1)_parse-8                                  7240          4688          -35.25%
BenchmarkTCK/Like_expression/Exact_match_(1)_evaluate-8                               0             0             +0.00%
BenchmarkTCK/Like_expression/Exact_match_(2)_parse-8                                  7376          4720          -36.01%
BenchmarkTCK/Like_expression/Exact_match_(2)_evaluate-8                               0             0             +0.00%
BenchmarkTCK/Like_expression/Exact_match_(negate)_parse-8                             7536          4984          -33.86%
BenchmarkTCK/Like_expression/Exact_match_(negate)_evaluate-8                          0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_parse-8                          9328          4712          -49.49%
BenchmarkTCK/Like_expression/Percentage_operator_(1)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_parse-8                          9328          4720          -49.40%
BenchmarkTCK/Like_expression/Percentage_operator_(2)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_parse-8                          9344          4736          -49.32%
BenchmarkTCK/Like_expression/Percentage_operator_(3)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_parse-8                          9328          4720          -49.40%
BenchmarkTCK/Like_expression/Percentage_operator_(4)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_parse-8                          7240          4688          -35.25%
BenchmarkTCK/Like_expression/Percentage_operator_(5)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_parse-8                          7224          4672          -35.33%
BenchmarkTCK/Like_expression/Percentage_operator_(6)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_parse-8                          9344          4736          -49.32%
BenchmarkTCK/Like_expression/Percentage_operator_(7)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_parse-8                          9344          4736          -49.32%
BenchmarkTCK/Like_expression/Percentage_operator_(8)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_parse-8                          8944          4712          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(1)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(2)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(3)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(4)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(5)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(6)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_parse-8                          8960          4720          -47.32%
BenchmarkTCK/Like_expression/Underscore_operator_(7)_evaluate-8                       0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_parse-8                 8416          4736          -43.73%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(1)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_parse-8                 8712          5032          -42.24%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(2)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_parse-8                 8416          4736          -43.73%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(3)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_parse-8                 8384          4712          -43.80%
BenchmarkTCK/Like_expression/Escaped_underscore_wildcards_(4)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_parse-8                 8384          4712          -43.80%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(1)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_parse-8                 8416          4736          -43.73%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(2)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_parse-8                 8416          4736          -43.73%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(3)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_parse-8                 8384          4712          -43.80%
BenchmarkTCK/Like_expression/Escaped_percentage_wildcards_(4)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_parse-8                  11912         4752          -60.11%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_evaluate-8               341           336           -1.47%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_parse-8        12208         5048          -58.65%
BenchmarkTCK/Like_expression/With_access_to_event_attributes_(negated)_evaluate-8     341           336           -1.47%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_parse-8                  7336          4672          -36.31%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(1)_evaluate-8               19            19            +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_parse-8                  7576          4680          -38.23%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(2)_evaluate-8               20            20            +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_parse-8                  7336          4680          -36.21%
BenchmarkTCK/Like_expression/With_type_coercion_from_int_(3)_evaluate-8               20            20            +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_parse-8                 7544          4648          -38.39%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(1)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_parse-8                 7632          4648          -39.10%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(2)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_parse-8                 7544          4648          -38.39%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(3)_evaluate-8              0             0             +0.00%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_parse-8                 8392          4672          -44.33%
BenchmarkTCK/Like_expression/With_type_coercion_from_bool_(4)_evaluate-8              0             0             +0.00%

@dgeorgievski are these improvements good enough for your use case? There are further performance improvements I could make if it is really needed, but they would make the code more complex so I'd prefer not to if there isn't a need

Cali0707 · 2024-05-03T20:10:33Z

cc @pierDipi @duglin

duglin · 2024-05-09T14:26:19Z

sql/v2/expression/like_expression.go

-			chunks = append(chunks, ".*")
+	for textIdx < textLen {
+		// handle escaped characters -> pattern needs to increment two places here
+		if patternIdx < patternLen-1 && pattern[patternIdx] == '\\' &&


I wonder if moving the pattern[patternIdx] == '\\' to be first would speed things up slightly since patternIdx < patternLen-1 will be true most of the time, and I think doing the check for \ is the biggest determining factor, so let's stop on that check as quickly as possible.

oh crap - never mind, I mixed up text* and pattern* variables. Although, I do wonder if you should add a check for the rest of the pattern being empty and then by-pass all of these if's.

Oh yeah, an empty pattern is an interesting case. I guess currently you would just end up in the final else clause and return false. We could definitely add a quick check at the top to check the textLen and patternLen before entering the for loop and all the if statemenets

sql/v2/expression/like_expression.go

duglin · 2024-05-09T15:42:01Z

sql/v2/expression/like_expression.go

+			lastMatchIdx = textIdx
+			patternIdx += 1
+			// greedy match didn't work, try again from the last known match
+		} else if lastWildcardIdx != -1 {


I'm probably wrong, and it's probably due to the nature of % being so greedy, but are we sure that we don't need a stack of "last known wildcard indexes" instead of just 1? I'm wondering about a case where we've seen multiple %'s and then we don't match something and we need to back-up to the first % not just the 2nd %. I can't seem to write a testcase to show the issue, hence why I'm probably wrong, but I thought I'd mention it/ask....

For me the trick to understanding this is that wildcard matching has optimal substructure: if we have already matched part of the input string against part of the pattern string, we only need to use the remaining parts of the pattern string and input string to determine if the entire string matches the pattern or not.

The only slight "exception" to the optimal substructure is that the % character can consume more than one element of the input. So, if at some point our pattern no longer matches the input text, we need to go back to the last wildcard character we saw, as well as the place we were at in the input text when we saw it and try again, consuming one character from the input text in the process.

The key is that we know that we have a valid match up until the last wildcard character we saw, and that apart from going back to that wildcard and consuming more characters from the input text there we can not otherwise move backwards in the pattern or the input text. So, storing an older index would not end up being used, since it would be in part of the input text we had already matched.

Cali0707 · 2024-05-09T18:24:14Z

@duglin @pierDipi since this code is pretty technical, I also ran it (with modifications for % -> * and _ -> ?, since the wildcard characters were different) through the Leetcode pattern matching test suite, and it passed all of their test cases too:

https://leetcode.com/problems/wildcard-matching/submissions/1253803571

We have reasonable coverage of the LIKE expression in the tck tests, but I bet LeetCode has more edge cases than we do, so this makes me feel a bit better about the accuracy of this

Signed-off-by: Calum Murray <[email protected]>

Cali0707 · 2024-05-14T16:08:23Z

cc @embano1

duglin · 2024-05-14T22:51:08Z

@duglin @pierDipi since this code is pretty technical, I also ran it (with modifications for % -> * and _ -> ?, since the wildcard characters were different) through the Leetcode pattern matching test suite, and it passed all of their test cases too:

https://leetcode.com/problems/wildcard-matching/submissions/1253803571

We have reasonable coverage of the LIKE expression in the tck tests, but I bet LeetCode has more edge cases than we do, so this makes me feel a bit better about the accuracy of this

Good thinking!

feat: improved performance of LIKE matching

17059e8

Signed-off-by: Calum Murray <[email protected]>

Cali0707 marked this pull request as ready for review May 3, 2024 20:08

Cali0707 requested a review from a team as a code owner May 3, 2024 20:08

duglin reviewed May 9, 2024

View reviewed changes

sql/v2/expression/like_expression.go Outdated Show resolved Hide resolved

duglin reviewed May 9, 2024

View reviewed changes

Cali0707 added 2 commits May 9, 2024 14:26

cleanup: moved comments

782e776

Signed-off-by: Calum Murray <[email protected]>

perf: added check for empty pattern

19fd3f2

Signed-off-by: Calum Murray <[email protected]>

duglin approved these changes May 14, 2024

View reviewed changes

duglin added 2 commits May 14, 2024 18:51

Merge branch 'main' into improved-like-matching

d8de904

Merge branch 'main' into improved-like-matching

7bc8a63

duglin merged commit 0988325 into cloudevents:main May 14, 2024
9 checks passed

Cali0707 mentioned this pull request Jul 31, 2024

Improve CESQL LIKE expression implementation #1048

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improved performance of LIKE matching #1049

feat: improved performance of LIKE matching #1049

Cali0707 commented May 1, 2024 •

edited

Loading

Cali0707 commented May 3, 2024

Cali0707 commented May 3, 2024

duglin May 9, 2024 •

edited

Loading

duglin May 9, 2024

Cali0707 May 9, 2024

duglin May 9, 2024

Cali0707 May 9, 2024

Cali0707 commented May 9, 2024

Cali0707 commented May 14, 2024

duglin commented May 14, 2024

feat: improved performance of LIKE matching #1049

feat: improved performance of LIKE matching #1049

Conversation

Cali0707 commented May 1, 2024 • edited Loading

Cali0707 commented May 3, 2024

Cali0707 commented May 3, 2024

duglin May 9, 2024 • edited Loading

Choose a reason for hiding this comment

duglin May 9, 2024

Choose a reason for hiding this comment

Cali0707 May 9, 2024

Choose a reason for hiding this comment

duglin May 9, 2024

Choose a reason for hiding this comment

Cali0707 May 9, 2024

Choose a reason for hiding this comment

Cali0707 commented May 9, 2024

Cali0707 commented May 14, 2024

duglin commented May 14, 2024

Cali0707 commented May 1, 2024 •

edited

Loading

duglin May 9, 2024 •

edited

Loading