Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use fastdoubleparser to improve double parsing speed #54

Merged
merged 4 commits into from
Jan 10, 2023
Merged

Conversation

rudygt
Copy link
Contributor

@rudygt rudygt commented Nov 8, 2022

Description of changes:

as part of the latest updated jackson-databind that recently uptaked the fast double parser into it, we can now take advantage of this to improve our double parsing speed (that we happens very often).

more information about the performance gains from the faster double parser can be found on the original issue at FasterXML/jackson-core#577 (comment)

I can see that in my machine this change helps with the performance, so hopefully we can get some more information from other people results and decide if we want to merge.

thanks!.

Benchmark / Performance:

Original

Reading citylots2
Read 213068 events
EXACT events/sec: 242674.3
WILDCARD events/sec: 190579.6
PREFIX events/sec: 275994.8
SUFFIX events/sec: 278520.3
EQUALS_IGNORE_CASE events/sec: 247465.7
NUMERIC events/sec: 154062.2
ANYTHING-BUT events/sec: 161171.0
COMPLEX_ARRAYS events/sec: 5052.0
PARTIAL_COMBO events/sec: 109942.2
COMBO events/sec: 3000.0
Reading citylots2
Read 213068 events
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 10197
Events/sec: 20895.2
 Rules/sec: 146266.2
Before: 225.2
After: 3056.0
Per rule: -7077
Turning JSON into field-lists...
Finding Rules...
Lines: 213068, Msec: 2011
Events/sec: 105951.3
Before: 236.5
After: 1278.9
Per rule: -2605
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 1340
Events/sec: 159006.0
 Rules/sec: 579099743.3
Reading citylots2
Read 213068 events
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Matched: 52527
Lines: 213068, Msec: 11256
Events/sec: 18929.3
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 9501
Events/sec: 22425.8
 Rules/sec: 156980.9
DEEP EXACT events/sec: 14285.7

Changes

Reading citylots2
Read 213068 events
EXACT events/sec: 247465.7
WILDCARD events/sec: 184155.6
PREFIX events/sec: 282583.6
SUFFIX events/sec: 274218.8
EQUALS_IGNORE_CASE events/sec: 235434.3
NUMERIC events/sec: 154845.9
ANYTHING-BUT events/sec: 160806.0
COMPLEX_ARRAYS events/sec: 6088.0
PARTIAL_COMBO events/sec: 108874.8
COMBO events/sec: 3660.5
Reading citylots2
Read 213068 events
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 10301
Events/sec: 20684.2
 Rules/sec: 144789.4
Before: 211.9
After: 2406.3
Per rule: -5485
Turning JSON into field-lists...
Finding Rules...
Lines: 213068, Msec: 2098
Events/sec: 101557.7
Before: 245.3
After: 1352.7
Per rule: -2768
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 1381
Events/sec: 154285.3
 Rules/sec: 561907064.4
Reading citylots2
Read 213068 events
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Matched: 52527
Lines: 213068, Msec: 10224
Events/sec: 20840.0
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 9186
Events/sec: 23194.9
 Rules/sec: 162364.0
DEEP EXACT events/sec: 11111.1

@baldawar
Copy link
Collaborator

baldawar commented Nov 8, 2022

Thanks @rudygt . On

I can see that in my machine this change helps with the performance, so hopefully we can get some more information from other people results and decide if we want to merge.

I'll run this on my laptop tomorrow and confirm if I see the same results

@@ -85,7 +86,7 @@ Set<NameState> transitionOn(String valString) {
boolean fieldValueIsNumeric = false;
if (hasNumeric.get() > 0) {
try {
final double numerically = Double.parseDouble(valString);
final double numerically = parseDouble(valString);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer to follow Class.staticMethod(...) pattern here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ook, I have changed the import

@timbray
Copy link
Collaborator

timbray commented Nov 8, 2022

I found the presentation of the benchmark results in the PR a little hard to figure out. My first cut, when exploring the impact of a chance, is just to run the RealisticCityLots() benchmark. It’s not perfect but it does stress out the flattening process and is actually a pretty good test if your changes are in the Event JSON parsing.

@baldawar
Copy link
Collaborator

baldawar commented Nov 10, 2022

I found the presentation of the benchmark results in the PR a little hard to figure out.

I'm looking at making benchmarks more readable and to fail in case regressions sneak in. For this change I think NUMERIC events/sec: 154845.9 would be the place to look. Also, any other rule patterns that require numeric matching (like in original citylots and citylots2 tests)

I can see that in my machine this change helps with the performance, so hopefully we can get some more information from other people results and decide if we want to merge.

I'll run this on my laptop tomorrow and confirm if I see the same results

Looks like this doesn't lead to meaningful difference. Based on the originating paper https://arxiv.org/pdf/2101.11408.pdf seems like we'd see some benefits but I think the rest of the code makes a significant impact. Will be checking in profiler later this week.

@timbray
Copy link
Collaborator

timbray commented Nov 10, 2022

I'm looking at making benchmarks more readable and to fail in case regressions sneak in. For this change I think NUMERIC events/sec: 154845.9 would be the place to look. Also, any other rule patterns that require numeric matching (like in original citylots and citylots2 tests)

I'm curious: Why numeric? Not necessarily disagreeing but wondering why.

@rudygt
Copy link
Contributor Author

rudygt commented Nov 10, 2022

I think the change only comes into play when there are a lot of ComparableNumbers being created, during the transitions on the ByteMachine, I think calling it parsing maybe was confusing because it has nothing to do with the json parsing speed?.

@@ -85,7 +86,7 @@ Set<NameState> transitionOn(String valString) {
boolean fieldValueIsNumeric = false;
if (hasNumeric.get() > 0) {
try {
final double numerically = Double.parseDouble(valString);
final double numerically = NumberInput.parseDouble(valString);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like ComparableNumber is the bottleneck. I'm not seeing consistent improvement w this change on my laptop, even when I directly use FastDoubleParser.parseDouble(s).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same is true in Quamina BTW. That and the Sprintf.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rudygt just catching up back on this thread, do you think its' still worth merging this in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for letting it hang, I would like to do some isolated benchmarks but currently on vacation, so if we can keep it open for another 20 days or so, if not feel free to close, and I will re open if my tests reveal good results.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries. I'll leave this open. enjoy your time-off !

@rudygt
Copy link
Contributor Author

rudygt commented Dec 27, 2022

Hello again @baldawar

I think I found the issue, I was not calling the correct method in the first version of the pull.

Also I added a more formal benchmark to compare the changes and with java 11 I see good improvements.

this is the output of running ComparableNumberBenchmark in my setup, also I did a very similar empirical test as the one we used to validate the fast hex to long changes a while ago in ComparableNumberTest.benchmarkDoubleParsers

Let me know if how the numbers look like for you, and from there we could see if it makes sense to merge.

Thanks.

# JMH version: 1.36
# VM version: JDK 11.0.7, OpenJDK 64-Bit Server VM, 11.0.7+10-LTS
# VM invoker: C:\Program Files\Amazon Corretto\jdk11.0.7_10\bin\java.exe
# VM options: -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2020.1.2\lib\idea_rt.jar=52940:C:\Program Files\JetBrains\IntelliJ IDEA 2020.1.2\bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 2 s each
# Measurement: 5 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: software.amazon.event.ruler.ComparableNumberBenchmark.candidate

# Run progress: 0.00% complete, ETA 00:01:10
# Fork: 1 of 1
# Warmup Iteration   1: 13574.493 ops/s
# Warmup Iteration   2: 15092.026 ops/s
# Warmup Iteration   3: 15172.805 ops/s
# Warmup Iteration   4: 15172.074 ops/s
# Warmup Iteration   5: 15059.023 ops/s
Iteration   1: 15064.241 ops/s
Iteration   2: 15035.477 ops/s
Iteration   3: 14821.209 ops/s
Iteration   4: 14753.557 ops/s
Iteration   5: 14920.381 ops/s


Result "software.amazon.event.ruler.ComparableNumberBenchmark.candidate":
  14918.973 ±(99.9%) 515.175 ops/s [Average]
  (min, avg, max) = (14753.557, 14918.973, 15064.241), stdev = 133.789
  CI (99.9%): [14403.798, 15434.148] (assumes normal distribution)


# JMH version: 1.36
# VM version: JDK 11.0.7, OpenJDK 64-Bit Server VM, 11.0.7+10-LTS
# VM invoker: C:\Program Files\Amazon Corretto\jdk11.0.7_10\bin\java.exe
# VM options: -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2020.1.2\lib\idea_rt.jar=52940:C:\Program Files\JetBrains\IntelliJ IDEA 2020.1.2\bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 2 s each
# Measurement: 5 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: software.amazon.event.ruler.ComparableNumberBenchmark.original

# Run progress: 50.00% complete, ETA 00:00:35
# Fork: 1 of 1
# Warmup Iteration   1: 4738.090 ops/s
# Warmup Iteration   2: 5253.218 ops/s
# Warmup Iteration   3: 5112.347 ops/s
# Warmup Iteration   4: 5322.917 ops/s
# Warmup Iteration   5: 5315.814 ops/s
Iteration   1: 5355.783 ops/s
Iteration   2: 5344.274 ops/s
Iteration   3: 5506.612 ops/s
Iteration   4: 5568.696 ops/s
Iteration   5: 5572.263 ops/s


Result "software.amazon.event.ruler.ComparableNumberBenchmark.original":
  5469.526 ±(99.9%) 432.194 ops/s [Average]
  (min, avg, max) = (5344.274, 5469.526, 5572.263), stdev = 112.239
  CI (99.9%): [5037.332, 5901.719] (assumes normal distribution)


# Run complete. Total time: 00:01:11

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                             Mode  Cnt      Score     Error  Units
ComparableNumberBenchmark.candidate  thrpt    5  14918.973 ± 515.175  ops/s
ComparableNumberBenchmark.original   thrpt    5   5469.526 ± 432.194  ops/s

Process finished with exit code 0

@@ -139,4 +139,38 @@ public void epocTimestampRangeTest() throws Exception {
}
assertTrue(m.isEmpty());
}

@Test()
@Ignore()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed?

Copy link
Contributor Author

@rudygt rudygt Jan 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we dont really need this test merged, my intention was only to validate the new code was performing better, but once we make the change I dont see the need to keep this benchmark at all

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I don't see value in this test at the moment. Happy to take it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

soo I took the test out, also fixe the issue with the import and now I think we are good, Thanks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 16 to 17
@State(Scope.Benchmark)
public class ComparableNumberBenchmark {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant on merging this one because it would lead to just one benchmark being disconnected from others. Is it possible to push this into a separate branch? we can merge for this one later, when I've brought rest of the benchmark tests to JMH as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, I did backup my changes adding jmh to another branch.

@baldawar
Copy link
Collaborator

baldawar commented Jan 3, 2023

I'm fine w merging this after the comments are addressed. If there's any update to the perf numbers on the README, feel free to update them as well.

import java.util.Map;
import java.time.Duration;
import java.time.Instant;
import java.util.*;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the wildcard import?

@baldawar
Copy link
Collaborator

baldawar commented Jan 9, 2023

Gentle reminder to do a minor version bump in pom.xml as part of the PR.

@baldawar baldawar merged commit dfdfa71 into aws:main Jan 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants