Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Fast Double Parser in Jackson #7909

Merged
merged 3 commits into from
Jun 29, 2023
Merged

Conversation

mgodwan
Copy link
Member

@mgodwan mgodwan commented Jun 5, 2023

Description

Enable Jackson Features USE_FAST_DOUBLE_PARSER and USE_FAST_DOUBLE_WRITER

Related Issues

Resolves #7822

Check List

  • New functionality includes testing.
    • All tests pass
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Verified all the content types with _doc API:

  1. Smile
  2. YAML
  3. CBOR
  4. JSON

JSON Document:

{ "account_number": 1,"address": "Some Avenue", "lat": 71.38  }

YAML Document:

---
account_number: 1
address: Some Avenue
lat: 71.38

@github-actions
Copy link
Contributor

github-actions bot commented Jun 5, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Jun 5, 2023

Gradle Check (Jenkins) Run Completed with:

@dblock
Copy link
Member

dblock commented Jun 6, 2023

Let's add to the CHANGELOG something like "Improved performance of parsing ..."?

@reta
Copy link
Collaborator

reta commented Jun 6, 2023

@dblock @mgodwan I was looking into this feature and there are few "dragons" here:

I think this is a good feature to offer but may be we should be careful with using it as default?

@mgodwan
Copy link
Member Author

mgodwan commented Jun 18, 2023

@reta I ran a few benchmarks with nyc taxis where I saw the following differences in throughput without and with this change:

Code Path Indexing Throughput Avg P50 Indexing Latency
Baseline 78162.21,docs/s 876.34 ms
Candidate (Fast Double Parsing Enabled) 81743.31,docs/s (1.04x) 837.26 ms (0.956x)

These runs use JDK17 and show good improvements.

@mgodwan
Copy link
Member Author

mgodwan commented Jun 20, 2023

On the related note, we use Jackson for CBOR/Smile/Yaml (CborXContent/SmileXContent/YamlXContent), should we also switch those to USE_FAST_DOUBLE_WRITER? I think it would make sense.

I have updated the parser configuration for CborXContent, SmileXContent, and YamlXContent as well.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@reta
Copy link
Collaborator

reta commented Jun 20, 2023

@mgodwan conflicts :(

@mgodwan
Copy link
Member Author

mgodwan commented Jun 20, 2023

Resolved the conflicts.

@backslasht
Copy link
Contributor

@mgodwan - Can you please add testing notes for the non JSON formats?

Copy link
Collaborator

@reta reta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dblock are we good with making it default for main and 2.x but have an opt-out option for 2.x users (system property)?

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      3 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

@dblock
Copy link
Member

dblock commented Jun 22, 2023

@dblock are we good with making it default for main and 2.x but have an opt-out option for 2.x users (system property)?

For 3.0 (main) looks good.

For 2.x, does it actually change behavior, even on edge cases? I read the above but can't tell what the specific examples are when users would get different results. If those exist, I would say it's a no-go because it would break semver. It should be optional.

@reta
Copy link
Collaborator

reta commented Jun 22, 2023

For 2.x, does it actually change behavior, even on edge cases? I read the above but can't tell what the specific examples are when users would get different results. If those exist, I would say it's a no-go because it would break semver. It should be optional.

Correct, we could address that at backport with system property

Signed-off-by: Mohit Godwani <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.test.rest.ClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/string profiler via global ordinals}
      1 org.opensearch.gateway.RecoveryFromGatewayIT.testReuseInFileBasedPeerRecovery

@mgodwan
Copy link
Member Author

mgodwan commented Jun 22, 2023

Thanks @dblock @reta

Could you help with merging this to main?
I can take up the required changes for backport once workflow is triggered.

@reta
Copy link
Collaborator

reta commented Jun 22, 2023

Could you help with merging this to main?

@dblock sign off?

@dblock dblock merged commit 51da85c into opensearch-project:main Jun 29, 2023
@dblock
Copy link
Member

dblock commented Jun 29, 2023

I merged it, however I am still unclear what the edge cases are. I would at least add a test that demonstrates the handling of the edge case (a test that would fail if we remove this code).

@reta
Copy link
Collaborator

reta commented Jun 29, 2023

@dblock there is a difficult one (at least it seems to me):

  • we could look for conversion edge cases (still have to find some), where JDK conversion != FastDoubleConversion, that could actually change from JDK to JDK (from the discussions above)
  • we could look for conversion time edge cases (more like JMH-oriented test cases) where we assert that FastDoubleConversion should not take longer than X

Which ones you are thinking about? Or something very different?

@mgodwan
Copy link
Member Author

mgodwan commented Jul 3, 2023

we could look for conversion edge cases (still have to find some), where JDK conversion != FastDoubleConversion, that could actually change from JDK to JDK (from the discussions above)

For serialization, some cases which yield different output on jdk-17 and with jackson flag enabled are listed here.

e.g.

JDK 17 behaviour: 8.41E21 --> 8.409999999999999E21
JDK 20 behaviour: 8.41E21 --> 8.41E21
Jackson flag behaviour: 8.41E21 --> 8.41E21

I could not find any cases for JDK-20 since jdk-20 and jackson seem to be using the same algorithm for converting java double to string.

@mgodwan mgodwan deleted the fdp branch July 3, 2023 17:29
@mgodwan
Copy link
Member Author

mgodwan commented Jul 3, 2023

I merged it, however I am still unclear what the edge cases are. I would at least add a test that demonstrates the handling of the edge case (a test that would fail if we remove this code).

@dblock

  • In this PR, we have only enabled jackson flag for converting from string to double[uses USE_FAST_DOUBLE_PARSER]. This yields better performance during document/query parsing cases.
  • We have not enabled jackson implementation for double to string conversion [uses USE_FAST_DOUBLE_WRITER] since JDK-20 and jackson use the same algorithm, and we may not see any performance advantage (e.g. search query responses containing floating point numbers) with this flag in OS 3.0 which bundles with JDK 20
  • The string to double procedure does not have any edge cases we are aware of (the algorithm follows IEEE standard) while there are edge cases in highlighted in double to string are covered in the above comment.

reta pushed a commit that referenced this pull request Jul 6, 2023
…erty (#8467)

* Enable Fast Double Parser in Jackson

Signed-off-by: Mohit Godwani <[email protected]>

* Update chaneglog

Signed-off-by: Mohit Godwani <[email protected]>

* Add fast double writer behind system property

Signed-off-by: Mohit Godwani <[email protected]>

* Extract system property out to a constant

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
@dblock
Copy link
Member

dblock commented Jul 6, 2023

@mgodwan So you're saying that the two outcomes of this change are 1) improved performance, and 2) fixing a bug such as specifying 8.41E21 would produce 8.409999999999999E21? If so YOLO.

baba-devv pushed a commit to baba-devv/OpenSearch that referenced this pull request Jul 29, 2023
* Enable Fast Double Parser in Jackson

Signed-off-by: Mohit Godwani <[email protected]>

* Use Fast Double Parser in jackson

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
Signed-off-by: Mohit Godwani <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
* Enable Fast Double Parser in Jackson

Signed-off-by: Mohit Godwani <[email protected]>

* Use Fast Double Parser in jackson

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
Signed-off-by: Mohit Godwani <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance: Enable fast parsing/writing for double
4 participants