Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(decoder): avoid lots of calls to ensure data and field length/offset lookups #144

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

mkjois
Copy link

@mkjois mkjois commented Mar 23, 2023

I've made three changes (three commits) to reduce the number of calls to ensureReadable() and to the LengthOffsetCache internal maps. We've seen 2x speed improvement in TCString.decode() and 2% overall CPU usage improvement in our production workload, which at high scale results in tangible cost savings.

Just to note, our use case doesn't benefit much from lazy decoding, since we need to decode the vendor consent, vendor legitimate interest, and publisher restrictions for almost all requests, and those three seem to dominate the decoding time in our profiling measurements when TCStringV2.hashCode() is invoked upon eager decoding.

There is no reason to compute the exact same field lengths and offsets at every
iteration of these loops. At high scale, this results in tons of redundant
lookups into the `LengthOffsetCache`.
Implements `readBits64(int)` with new unit tests, and uses all variants of the
bit reading methods to implement reading into a bit set with larger chunks.

Also adds a `resultShift` parameter to make the implementation of reading vendor
consent much easier, i.e. just need to set this to 1 instead of the default 0.
Implements `readBits32(int)` with new unit tests, and eliminate half the reads
for publisher restrictions by reading two fields at a time when possible. Also
uses the new bit set reading implementation for vendor consent. All tests pass.
@mkjois
Copy link
Author

mkjois commented Mar 23, 2023

Tagging some common contributors to help review: @laktech @imayankmishra @iabmayank @srinivas81

@laktech
Copy link
Collaborator

laktech commented Mar 23, 2023

thanks! i'll be able to review in a few days.

@mkjois
Copy link
Author

mkjois commented Mar 24, 2023

I also have another change on top of this where we make sure to use primitive int in decoding everywhere instead of Integer. Looks like that can squeeze out another 0.5% overall on our production workload. I'll submit that as another PR.

@mkjois
Copy link
Author

mkjois commented Mar 24, 2023

@laktech As promised, I've made a couple more PRs on top of this to test even further optimizations:

  1. (perf(decoder): use primitive integers mkjois/iabtcf-java#1) Avoids any boxed integers for a ~25% speedup.
  2. (perf(decoder): leaner bit set with less invariant checking mkjois/iabtcf-java#2) A more speculative change to avoid lots of invariant checking in BitSet for a ~15% speedup.

Let me know what you think, when you're ready.

@laktech
Copy link
Collaborator

laktech commented Apr 10, 2023

hey, are you able to share the benchmark that you're running?

@mkjois
Copy link
Author

mkjois commented Apr 10, 2023

@laktech It wasn't a proper offline benchmark like with JMH or anything. I basically just tried it with a production-like workload as part of a much larger code path, and viewed some flame graph profiles. If you have any better JMH or other benchmarks set up, please feel free to test this out.

@mkjois
Copy link
Author

mkjois commented Apr 19, 2023

@laktech Any progress in reviewing?

@ChristopherWirt
Copy link

Nice

@lukhar
Copy link

lukhar commented May 25, 2023

Hey, any progress on reviewing this?

@mkjois
Copy link
Author

mkjois commented Jul 11, 2023

@laktech Another bump 🙏

@laktech
Copy link
Collaborator

laktech commented Aug 28, 2023

sorry i stepped away from this for a few months. i'll be reviewing this once again.

if (isRangeEntry) {
int endVendorId = bbv.readBits16(offset);
offset += FieldDefs.START_OR_ONLY_VENDOR_ID.getLength(bbv);
final int content = bbv.readBits32(offset);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is significant code here to eliminate a call to readBits16. I don't think it's worth it unless you can show a benchmark that it adds value.

@laktech
Copy link
Collaborator

laktech commented Sep 5, 2023

Overall, this looks great. I just have some concerns around the two areas I commented above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants