Skip to content

Simd json encode #120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 46 commits into
base: master
Choose a base branch
from
Open

Simd json encode #120

wants to merge 46 commits into from

Conversation

nielsdos
Copy link
Owner

@nielsdos nielsdos commented Feb 3, 2025

TODO:

  • more performant resolver (too much call overhead, use template-style code) SEMI-DONE, it's not worth going for the extra few percentages because that will create a lot of code bloat.
  • verifications
  • bitset stuff: does the widening worsen performance? (A: no)
  • benchmarks

@mvorisek
Copy link

mvorisek commented Feb 4, 2025

I have no C toolchain setup so here is my idea in human words:

do {
#if simd support
    if (len >= sizeof(__m128i)) {
        calc mask;
        if (mask is "all not-to-be-escaped") {
            pos += sizeof(__m128i);
            continue;
        }
        mask_length = sizeof(__m128i);
    } else {
        set mask to "all maybe to-be-escaped"
        mask_length = len;
    }
    
    for (i = 0; i < mask_length; i++) {
        if (mask[i] == "not-to-be-escaped") {
            pos++; // OR i += -1 + , pos += "not-to-be-escaped" length calculated from mask if faster
            continue;
        }
        
        if (pos) {
            add unescaped;
        }

#endif
        handle single "maybe to-be-escaped" character as before;
#if simd support
    }
#endif
} while (len);

#if simd support
if (pos) {
    add unescaped;
}
#endif

The keypoint is to set/use mask even for non-SIMD (no SIMD support or too short input) to be able to process the "maybe to-be-escaped" characters at one place as before. That should generate shorter code, optimal performance for "never-to-be-escaped" chracters and hopefully introduce minimal overhead for other characters.

@nielsdos
Copy link
Owner Author

nielsdos commented Feb 4, 2025

A couple of points regarding your suggestion:

  • I already use the mask to know where to put the escape characters. I do this by looping over all set bits, appending in bulk from the input what we didn't append yet, and then escaping the new character. This code is here:
    do {
    /* Note that we shift the input forward, so we have to shift the mask as well,
    * beyond the to-be-escaped character */
    int len = zend_ulong_ntz(mask);
    mask >>= len + 1;
    smart_str_appendl(buf, s, len + pos);
    pos += len;
    us = (unsigned char)s[pos];
    s += pos + 1; /* skip 'us' too */
    pos = 0;
    bool handled = php_json_printable_ascii_escape(buf, us, options);
    ZEND_ASSERT(handled == true);
    } while (mask != 0);
  • The suggestion to use the mask even for short inputs will still make a slowdown. In my testing the overhead of looping over the bits and performing the check is higher than just doing a "dumb" byte per byte loop.
  • The remaining performance "issue" is regarding the case where we have to escape (almost) everything. The performance for not needing to escape is pretty good.
  • My code would've been closer to your pseudocode if we didn't need to cater for non-printable or UTF-8 characters. This adds an extra unavoidable complexity to the control flow of the algorithm.

@nielsdos
Copy link
Owner Author

nielsdos commented Feb 5, 2025

VTune shows some DSB stalls, that I have tried to improve by changing code layout. I also see some bad speculation (machine check) and data stalls in php_json_printable_ascii_escape (as expected) that I want to investigate.

@nielsdos
Copy link
Owner Author

nielsdos commented Feb 6, 2025

The overhead of the worst case with escapes is now only around 6%, so relatively small.
This is mostly due to tweaking of the algorithm, and tightening the code layout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants