
OutOfMemoryError when writing BigDecimal #1316

Closed
gmethvin opened this issue Jul 28, 2016 · 8 comments


gmethvin commented Jul 28, 2016

When I enable the WRITE_BIGDECIMAL_AS_PLAIN setting on Jackson 2.7.5, Jackson attempts to write out the entire number, no matter how large the exponent.

For example, the following code:

ObjectMapper mapper = new ObjectMapper().enable(JsonGenerator.Feature.WRITE_BIGDECIMAL_AS_PLAIN);
mapper.writeValueAsString(new java.math.BigDecimal("9.223372E+1010671858"));

triggers the exception:

java.lang.OutOfMemoryError: Java heap space
  at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68)
  at java.lang.StringBuilder.<init>(StringBuilder.java:101)
  at java.math.BigDecimal.toPlainString(BigDecimal.java:2964)
  at com.fasterxml.jackson.core.json.WriterBasedJsonGenerator.writeNumber(WriterBasedJsonGenerator.java:690)
  at com.fasterxml.jackson.databind.ser.std.NumberSerializer.serialize(NumberSerializer.java:45)
  at com.fasterxml.jackson.databind.ser.std.NumberSerializer.serialize(NumberSerializer.java:19)
  at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
  at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3612)
  at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2980)
  ... 23 elided
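
For context, the asymmetry is visible without triggering the OOME: the value stores only a handful of significant digits, but its hugely negative scale determines how many characters toPlainString() would have to emit. A minimal check (the class name here is purely illustrative):

import java.math.BigDecimal;

public class ScaleCheck {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("9.223372E+1010671858");
        // Only 7 significant digits are actually stored:
        System.out.println(value.precision()); // 7
        // But the scale is hugely negative, so toPlainString() would
        // have to pad with roughly a billion trailing zeroes:
        System.out.println(value.scale());     // -1010671852
    }
}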

I know that technically Jackson is doing what you're telling it to do (so if you don't feel this is an issue, feel free to close it). But it would be nice if WRITE_BIGDECIMAL_AS_PLAIN enforced a reasonable limit on the length of the number, so as not to leave users open to denial-of-service vulnerabilities.

(Actually, I think this might technically be an issue in jackson-core; let me know if I should resubmit.)

cowtowncoder (Member) commented

Hmmmh. Interesting. Yes, agreed, it would be nice to set some limit.
There is some overlap with a possibly more general limit on the size of the document to create (with writeValueAsString() and writeValueAsBytes()), which could guard against a wide variety of causes of large output. But unlike many other root causes, this one allows asymmetric attacks with small input, so I think some sort of sanity check would be sensible.

Any suggestions for a reasonable limit? 1 million? I probably would not make this configurable; users could still override the default serializer if they really want the ability to print out huge numbers.

Also: there is the choice of either throwing an exception or quietly reverting to engineering notation. Presumably this would only happen for malicious reasons or due to a bug of some kind, so an exception probably makes the most sense, to surface the issue.
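
To make the override escape hatch concrete, here is a sketch of registering a custom BigDecimal serializer that keeps unlimited plain output (the module wiring uses standard jackson-databind types; the serializer body itself is illustrative, not an endorsement):

import java.io.IOException;
import java.math.BigDecimal;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;

public class UnlimitedPlainModule {
    public static ObjectMapper mapper() {
        SimpleModule module = new SimpleModule();
        module.addSerializer(BigDecimal.class, new JsonSerializer<BigDecimal>() {
            @Override
            public void serialize(BigDecimal value, JsonGenerator gen, SerializerProvider provider)
                    throws IOException {
                // Deliberately bypasses any length limit; only safe for trusted values.
                gen.writeNumber(value.toPlainString());
            }
        });
        return new ObjectMapper().registerModule(module);
    }
}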

gmethvin (Author) commented Jul 30, 2016

@cowtowncoder I guess it really depends on how people use that setting. For me a maximum of ten zeroes would be fine, but there's no limit that's going to satisfy everyone. I see that setting mainly as a minor formatting switch to avoid writing things like 1.1E2 instead of 110; most people probably don't want it to write out a 1000-digit number, though.

cowtowncoder (Member) commented

@gmethvin Since memory usage is just one additional char (2 bytes) per digit, I don't think the limit needs to be particularly low; I was thinking something closer to a million digits, purely from a technical standpoint.
I would guess something like 1k digits would suffice for most uses, so the limit could be somewhere between those two numbers.

One possibly related question is that of BigInteger, since you could probably get from engineering notation into a BigInteger via coercion. There may, however, be more legitimate uses for really long integer values, for example in cryptography. Still, not in the millions-of-digits range, I think.
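
A sketch of that coercion path, as I understand it: jackson-databind will expand scientific notation into a BigInteger on request, so the same asymmetry applies on the read side (a small exponent is used here to keep the example harmless):

import java.math.BigInteger;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CoercionDemo {
    public static void main(String[] args) throws Exception {
        // "1e25" is 4 characters of input, expanded into a 26-digit BigInteger;
        // substitute a huge exponent and the expansion itself becomes the attack.
        BigInteger big = new ObjectMapper().readValue("1e25", BigInteger.class);
        System.out.println(big); // 10000000000000000000000000
    }
}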

It may actually make sense to add a SerializationFeature to enable checking, defaulting to enabled for safety.

gmethvin (Author) commented

@cowtowncoder If we made a SerializationFeature for safety, I would err on the lower side, maybe 100-1000 digits max. I don't see any practical benefit to printing out the whole number.

jroper (Member) commented Aug 4, 2016

If you don't want to choose an arbitrary limit, but rather one that correlates to a real-world limit, I'd put it at 20 digits: that's the longest decimal value that can be represented by a long (with unsigned you have a 20-digit number; with signed you have a 19-digit number plus a minus sign). Any larger than that and the only primitive type you could use to represent it would be a double, which has only about 16 decimal digits of precision.
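
A quick check of those digit counts, using only standard library calls:

public class DigitCounts {
    public static void main(String[] args) {
        // Unsigned 64-bit maximum is 18446744073709551615: 20 digits.
        System.out.println(Long.toUnsignedString(-1L).length());    // 20
        // Signed 64-bit maximum is 9223372036854775807: 19 digits,
        // plus a minus sign for negative values.
        System.out.println(Long.toString(Long.MAX_VALUE).length()); // 19
    }
}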

Also, correct me if I'm wrong, but it's not just about large positive exponents; large negative exponents (very small numbers) cause the same expansion.

I'd seriously question the validity of the cryptographic use case for BigDecimal: BigDecimal performs incredibly poorly for doing any actual maths, and cryptography relies heavily on high-performance mathematical calculations.

A million digits would be too high: that's 2 megabytes per value. It's entirely conceivable that some JSON containing an array of objects could have multiple fields per object where an attacker could inject such a number; if the system returns 100 objects in the array, each with 5 of these fields, and turns the result into a String, you've just caused it to attempt to create a 1 GB String.

gmethvin (Author) commented Aug 4, 2016

I totally agree a million digits is really high. 20 or something in that general magnitude seems fine to me.

cowtowncoder (Member) commented

I disagree on 20 digits: there is no reason why one could not want to express actually big (or, conversely, very small) numbers; this is what BigDecimal is for. JSON likewise does not define any limits on either magnitude or precision. So use cases with thousands of digits are not necessarily out of scope, although really big integers would more likely be represented using BigInteger.

But what I see as the main problem here is the potential for DoS attacks: pass in a short representation of a huge number and trigger its re-serialization. This is the reason I would add checks.

Given that this only applies to the specific case of forcing non-scientific notation, I think I am ok with just adding a maximum length for now. However, I am not comfortable limiting it to a really small value based on very limited knowledge of actual usage. I could accept 1000, but from my perspective there is very little upside to trying to use a low value.

So as a first step, I'll set the limit to 10000. That avoids accidental OOMEs, should be very unlikely to affect valid use cases, and can be lowered in the future if there turns out to be a need.
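
A minimal sketch of the kind of check this implies; the constant and method names are illustrative rather than Jackson's actual API (the real change was tracked in the jackson-core issue linked below):

import java.math.BigDecimal;

public class PlainWriteCheck {
    // Illustrative cap, in the spirit of the 10000 figure above.
    private static final int MAX_BIGDECIMAL_SCALE = 9999;

    static String toPlainStringChecked(BigDecimal value) {
        // The scale bounds how many zeroes toPlainString() must pad with,
        // so capping |scale| blocks the asymmetric small-input blow-up.
        int scale = value.scale();
        if (scale < -MAX_BIGDECIMAL_SCALE || scale > MAX_BIGDECIMAL_SCALE) {
            throw new IllegalArgumentException(String.format(
                    "BigDecimal scale (%d) magnitude exceeds maximum allowed (%d)",
                    scale, MAX_BIGDECIMAL_SCALE));
        }
        return value.toPlainString();
    }
}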

cowtowncoder (Member) commented

Just realized that a better place for this is jackson-core, since that's where it can be more reliably caught. So recreated as:

FasterXML/jackson-core#315
