Improve CDATA and comment parse performance #246

Merged

@naitoh naitoh commented Mar 3, 2025

## Why?

Since `<a><!a` and `<a><!a>` are malformed nodes, they do not need to be checked before comments and CDATA.
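
A minimal sketch of the ordering idea, assuming a `StringScanner`-based dispatcher: the common `<!--` and `<![CDATA[` prefixes are tested first, and the malformed case (`<!a`, `<!a>`) is only reached as the fallback. `classify_markup` and its error messages are hypothetical names made up for this illustration; this is not the actual REXML parser code.

```ruby
require "strscan"

# Hypothetical helper for illustration only (not REXML's implementation).
# Returns [kind, body] for markup starting with "<!", or [:other, nil].
def classify_markup(source)
  scanner = StringScanner.new(source)
  return [:other, nil] unless scanner.scan(/<!/)

  if scanner.scan(/--/)
    # Comment: checked first because it is a common, well-formed case.
    body = scanner.scan_until(/-->/)
    raise ArgumentError, "Unclosed comment" unless body
    [:comment, body.delete_suffix("-->")]
  elsif scanner.scan(/\[CDATA\[/)
    # CDATA section: checked second.
    body = scanner.scan_until(/\]\]>/)
    raise ArgumentError, "Unclosed CDATA section" unless body
    [:cdata, body.delete_suffix("]]>")]
  else
    # Anything else after "<!" in element content (e.g. "<!a" or "<!a>")
    # is malformed, so the error path only runs when nothing else matched.
    raise ArgumentError, "Malformed node: #{source.inspect}"
  end
end
```

With this ordering, well-formed comments and CDATA sections never pay for the malformed-node check; only inputs that are already invalid reach it.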

## Benchmark : comment (after_doctype)
```
$ benchmark-driver benchmark/parse_comment.yaml
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
       after_doctype     1.306k      5.586k        1.152k       3.569k i/s -     100.000 times in 0.076563s 0.017903s 0.086822s 0.028020s

Comparison:
                    after_doctype
               after:      5585.7 i/s
         after(YJIT):      3568.9 i/s - 1.57x  slower
              before:      1306.1 i/s - 4.28x  slower
        before(YJIT):      1151.8 i/s - 4.85x  slower
```
- YJIT=ON : 3.09x faster
- YJIT=OFF : 4.28x faster
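
The "faster" figures are the ratio of the after and before iteration rates within the same YJIT mode. A small sketch of that arithmetic, using the i/s values from the comparison above (the `speedup` helper and the `results` hash are names chosen for this example; the last digit can differ slightly from the summary depending on rounding):

```ruby
# Iterations per second copied from the after_doctype comparison above.
results = {
  "YJIT=ON"  => { before: 1151.8, after: 3568.9 },
  "YJIT=OFF" => { before: 1306.1, after: 5585.7 },
}

# Speedup is after-rate divided by before-rate for the same YJIT setting.
def speedup(before:, after:)
  after / before
end

results.each do |mode, rates|
  puts format("%s : %.2fx faster", mode, speedup(**rates))
end
```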

## Benchmark : CDATA
```
$ benchmark-driver benchmark/parse_cdata.yaml
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     1.269k      5.548k        1.053k       3.072k i/s -     100.000 times in 0.078808s 0.018026s 0.094976s 0.032553s
                 sax     1.399k      8.244k        1.220k       4.460k i/s -     100.000 times in 0.071458s 0.012130s 0.081958s 0.022422s
                pull     1.411k      8.319k        1.260k       4.806k i/s -     100.000 times in 0.070883s 0.012021s 0.079335s 0.020809s
              stream     1.420k      8.320k        1.254k       4.728k i/s -     100.000 times in 0.070406s 0.012019s 0.079738s 0.021149s

Comparison:
                              dom
               after:      5547.5 i/s
         after(YJIT):      3071.9 i/s - 1.81x  slower
              before:      1268.9 i/s - 4.37x  slower
        before(YJIT):      1052.9 i/s - 5.27x  slower

                              sax
               after:      8244.0 i/s
         after(YJIT):      4459.9 i/s - 1.85x  slower
              before:      1399.4 i/s - 5.89x  slower
        before(YJIT):      1220.1 i/s - 6.76x  slower

                             pull
               after:      8318.8 i/s
         after(YJIT):      4805.6 i/s - 1.73x  slower
              before:      1410.8 i/s - 5.90x  slower
        before(YJIT):      1260.5 i/s - 6.60x  slower

                           stream
               after:      8320.2 i/s
         after(YJIT):      4728.4 i/s - 1.76x  slower
              before:      1420.3 i/s - 5.86x  slower
        before(YJIT):      1254.1 i/s - 6.63x  slower
```
- YJIT=ON : 2.91x - 3.80x faster
- YJIT=OFF : 4.37x - 5.90x faster

@naitoh naitoh marked this pull request as ready for review March 3, 2025 13:54
@naitoh naitoh requested a review from kou March 3, 2025 13:54
@naitoh naitoh force-pushed the improve_cdata_and_comment_parse_performance branch from 22e66b2 to 00c2478 Compare March 3, 2025 22:59
@naitoh naitoh requested a review from kou March 3, 2025 23:14
Co-authored-by: Sutou Kouhei <[email protected]>
@naitoh naitoh force-pushed the improve_cdata_and_comment_parse_performance branch from 00c2478 to 463fe8d Compare March 4, 2025 09:02
@naitoh naitoh requested a review from kou March 4, 2025 09:09
@kou kou merged commit a5f31c4 into ruby:master Mar 4, 2025
67 checks passed
@naitoh naitoh deleted the improve_cdata_and_comment_parse_performance branch March 4, 2025 22:47