Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Denial of service when parsing a JSON object with an unexpected field that has a big number #118

Open
plokhotnyuk opened this issue Oct 20, 2022 · 10 comments

Comments

@plokhotnyuk
Copy link
Contributor

plokhotnyuk commented Oct 20, 2022

Sub-quadratic decreasing of throughput when length of the JSON object is increasing

On contemporary CPUs parsing of such JSON object with an additional field that has of 1000000 decimal digits (~1Mb) can took ~13 seconds:

[info] Benchmark                               (size)   Mode  Cnt         Score          Error  Units
[info] ExtractFieldsReading.weePickle               1  thrpt    3   3302801.285 ±  1260746.202  ops/s
[info] ExtractFieldsReading.weePickle              10  thrpt    3   3120990.432 ±   202607.161  ops/s
[info] ExtractFieldsReading.weePickle             100  thrpt    3    814496.851 ±    15939.790  ops/s
[info] ExtractFieldsReading.weePickle            1000  thrpt    3     49105.311 ±    21893.732  ops/s
[info] ExtractFieldsReading.weePickle           10000  thrpt    3       674.598 ±       44.746  ops/s
[info] ExtractFieldsReading.weePickle          100000  thrpt    3         7.022 ±        0.054  ops/s
[info] ExtractFieldsReading.weePickle         1000000  thrpt    3         0.070 ±        0.002  ops/s

Probably the root cause is in the jackson core library, but reporting hear hoping that a hot fix to just skip unwanted values can be applied on the weePickle level.

Flame graph for CPU cycles at size=1000000

image

weePickle version

1.8.0

Scala version

2.13.10

JDK version

17

Steps to reproduce

To run that benchmarks on your JDK:

  1. Install latest version of sbt and/or ensure that it already installed properly:
sbt about
  1. Clone jsoniter-scala repo:
git clone --depth 1 https://github.com/plokhotnyuk/jsoniter-scala.git
  1. Enter to the cloned directory and checkout for the specific branch:
cd jsoniter-scala
git checkout jackson-DoS-by-a-big-number
  1. Run a benchmark that reproduce the issue:
sbt clean 'jsoniter-scala-benchmarkJVM/jmh:run -wi 3 -i 3 ExtractFieldsReading.weePickle'
@plokhotnyuk
Copy link
Contributor Author

A similar issue can be reproduced using jackson-module-scala: FasterXML/jackson-module-scala#609

@htmldoug
Copy link
Contributor

Thanks for the detailed report.

I see that this branch is using jackson-core 2.14-rc2. Do you get the same results with the jackson-core 2.13.x version used by weepickle?

@htmldoug
Copy link
Contributor

From a quick glance at the source code, it seems like jackson-core 2.13 should be affected as well.

hoping that a hot fix to just skip unwanted values can be applied on the weePickle level.

weepickle is push-based, so we can't easily do that. The closest option I can think of without breaking binary compatibility would be to force all numbers through newly allocated strings, then handle number parsing on the visitor side. However, I expect that would significantly hurt common-case performance.

Ideally, I'd prefer to see a fix in jackson-core.

@htmldoug
Copy link
Contributor

Another non-ideal but slightly better workaround might be to call JsonParser.getLongValue blindly and catch the InputCoercionException without checking getNumberType first. This would at least keep primitives performing well at the cost of expected BigIntegers performing slightly worse.

@pjfanning
Copy link

pjfanning commented Oct 23, 2022

I put small improvement into jackson-core for the v2.14.0 release so that parsing very large bigints should be a little bit faster - but still has sub-quadratic performance.

I'm not sure if jackson code benefits from rewriting ParserBase.getNumberType to make the number parsing lazy.

I would suggest that you try your idea of calling getLongValue instead of getNumberType.

For me, the next priority in jackson-core is to have a limit on the number of chars in a number - FasterXML/jackson-core#815 - @cowtowncoder might have other ideas on this topic

Edit: I have added FasterXML/jackson-core#828 which might be enough to solve this - but it might be a bit late for Jackson v2.14

@cowtowncoder
Copy link

Quick note: I did merge @pjfanning's nice patch mentioned above, and it will be included in 2.14.0.
It solves the problem I did not realize we had: that getNumberType() would indeed eagerly decode BigInteger values (int and long are fine since their decoding eagerly has marginal downside, simplifying code -- but BigInteger is different).

Limits on token lengths and various limits will have to wait until 2.15, fwtw.

@pjfanning
Copy link

@htmldoug the PR I mentioned yesterday is merged and is built into latest jackson-core 2.14.0-SNAPSHOT jar

@pjfanning
Copy link

@htmldoug would it be possible to upgrade to jackson v2.14.1?

@pjfanning
Copy link

The jackson v2.14.1 changes seem to help make the performance degradation less dramatic. On my laptop, I got these numbers.

[info] Benchmark                        (size)   Mode  Cnt        Score         Error  Units
[info] ExtractFieldsReading.weePickle        1  thrpt    3  2448991.650 ±  369565.879  ops/s
[info] ExtractFieldsReading.weePickle       10  thrpt    3  2213383.858 ±  149730.109  ops/s
[info] ExtractFieldsReading.weePickle      100  thrpt    3  1729895.818 ± 1231309.852  ops/s
[info] ExtractFieldsReading.weePickle     1000  thrpt    3   628557.494 ±   26224.130  ops/s
[info] ExtractFieldsReading.weePickle    10000  thrpt    3    72562.523 ±     700.455  ops/s
[info] ExtractFieldsReading.weePickle   100000  thrpt    3     3539.489 ±     563.178  ops/s
[info] ExtractFieldsReading.weePickle  1000000  thrpt    3      225.537 ±      11.495  ops/s
[success] Total time: 211 s (03:31), completed 13 Dec 2022, 01:59:39

@htmldoug
Copy link
Contributor

@pjfanning thanks for your improvements and verifying the changes.

would it be possible to upgrade to jackson v2.14.1?

Yes, this needs to happen. I'm looking forward to it for #117 as well.

I need to redo the maven central publishing which is mostly why that hasn't happened yet. In the mean time, consumers who want the latest version can pull in the latest jackson-core as jsoniter-scala has done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants