Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TreeTraversingParser does not check int bounds #2189

Closed
saites opened this issue Nov 21, 2018 · 5 comments
Closed

TreeTraversingParser does not check int bounds #2189

saites opened this issue Nov 21, 2018 · 5 comments
Milestone

Comments

@saites
Copy link

saites commented Nov 21, 2018

Similar to #1729, TreeTraversingParser does not perform bounds checks on some JSON values bound to ints.

Using Jackson version 2.9.7, here are several comparisons generated with the following code:

  public static class IntClass {
    public int x;

    @Override
    public String toString() {
      return String.valueOf(x);
    }
  }

  ObjectMapper mapper  = new ObjectMapper();
  void readAndPrint(String _example) {
    String fromTree;
    try {
      JsonNode tree = mapper.readTree(_example);
      fromTree = mapper.readerFor(IntClass.class).readValue(tree).toString();
    } catch (IOException _e) {
      fromTree = _e.getClass().getSimpleName();
    }

    String fromString;
    try {
      fromString = mapper.readerFor(IntClass.class).readValue(_example).toString();
    } catch (IOException _e) {
      fromString = _e.getClass().getSimpleName();
    }

    System.out.printf("|%30s | %30s | %-30s|\n", _example, fromTree, fromString);
  }

  @Test
  public void compareFromTree() {
    System.out.printf("|%30s | %30s | %-30s|\n", "json input", "read from tree", "read from string");
    System.out.println("|-------------------------------|--------------------------------|-------------------------------|");
    readAndPrint("{\"x\": 0}");
    // etc.
  }
json input read from tree read from string
{"x": 0} 0 0
{"x": 10} 10 10
{"x": 1e4} 10000 10000
{"x": 1e10} 2147483647 JsonMappingException
{"x": 1e-1} 0 0
{"x": 2147483648} -2147483648 JsonMappingException
{"x": 2147483649} -2147483647 JsonMappingException
{"x": -2147483649} 2147483647 JsonMappingException
{"x": -4294967295} 1 JsonMappingException
{"x": 0.1} 0 0
{"x": 1.9} 1 1
{"x": 1.9999999999999999} 2 2
{"x": true} MismatchedInputException MismatchedInputException
{"x": {}} MismatchedInputException MismatchedInputException
{"x": []} MismatchedInputException MismatchedInputException
{"x": [0]} MismatchedInputException MismatchedInputException
{"x": "0"} 0 0
{"x": "10"} 10 10
{"x": "1e4"} InvalidFormatException InvalidFormatException
{"x": "1e10"} InvalidFormatException InvalidFormatException
{"x": "1e-1"} InvalidFormatException InvalidFormatException
{"x": "2147483648"} InvalidFormatException InvalidFormatException
{"x": "2147483649"} InvalidFormatException InvalidFormatException
{"x": "-2147483649"} InvalidFormatException InvalidFormatException
{"x": "-4294967295"} InvalidFormatException InvalidFormatException
{"x": "0.1"} InvalidFormatException InvalidFormatException
{"x": "1.9"} InvalidFormatException InvalidFormatException
{"x": "1.9999999999999999"} InvalidFormatException InvalidFormatException
{"x": "true"} InvalidFormatException InvalidFormatException
{"x": "{}"} InvalidFormatException InvalidFormatException
{"x": "[]"} InvalidFormatException InvalidFormatException
{"x": "[0]"} InvalidFormatException InvalidFormatException

Without digging further into the code, it appears if the JSON value is numeric, TreeTraversingParser silently overflows. Maybe this is expected behavior, but to me the inconsistency between reading from a non-tree (reader/string/file, etc) versus directly from a tree seems like a bug. At the very least, it makes it less convenient to do manipulations on a JSON document before binding.

I would expect an exception thrown for the all but the first three examples above, but I do understand there are use-cases for coercing values. Even so, I would expect the coercion logic to be

  • consistent between the parsers (or clearly documented otherwise)
  • consistent between quoted and unquoted values

I'm also curious about the expected behavior when converting non-integral values. Why is true MismatchedInput, but 0.1 is converted? Similarly, why are 0.1, 1e4, and 1e-1 acceptable, but not when in quotes, even though "10" and other quote integers are acceptable?

Thanks for all your hard work on this. I hope this issue doesn't come off as condescending. For our specific use case, we read the value as a tree, validating it against a schema, then using Jackson to bind the tree to an object. While it's true that we can specify type, minimum, and maximum values in the schema, it is prone to mistakes, and there's not necessarily a reason to tie the schema to the language implementation, provided things like overflow consistently result in an exception. Thus, I'm trying to better understand the expectations and limits Jackson has when using the tree parser.

@cowtowncoder
Copy link
Member

Thank you for reporting this: it is valuable, and I did not think it is condescending at all.
This area is quite complicated, partly due to conflicting expectations and wishes by users, partly due to parallel implementations and so on.

Starting with overflow: yes, I think bounds checks should work similarly and throw exception.
I think stream-based parsers can/should act as the initial "expected behavior", baseline.

Coercions: similarly, should be similar (or ideally identical).

And in general I think TreeTraversingParser should at least try to apply checks that TokenBuffer does, as the two are used in similar way for buffering / conversions.

Rules for coercions get trickier, and I wish there was a good overall answer. But I'll try to offer some perspective on likely reasons I implemented things that way.

First, coercion from integral values to booleans were allowed for interoperability (back in the day Perl did not have true and false to map to). But it is not clear that values outside of, say, 0, 1 and -1 should coerce; they do now.
But coercion from true / false to number seems iffy to me.

Coercions across integral/floating-point is something where different users have different preferences. I think int-to-float is less controversial (and less problematic too).
The other direction is up for debate. There are some settings (DeserializationFeatures) used for more/less strict coercions, but support is limited, and probably not applied everywhere.

Now... String-to-number. That is also bit controversial, in general; and in particular difference from floating-point-as-String to int. I am thinking that this is 2-part coercion and although it is not necessarily consistent, it seems like limiting could be reasonable (either coerce from floating-point OR from String, but not both). But I think it is one of those cases that grew over time, i.e. not necessarily designed as a system (well obviously... :) ).

Ok. Having said that, I would be interested in solid improvements, one piece/aspect at a time.
Main limitations here, I think, is backwards compatibility.
We can make only limited changes in 2.9.x patch releases; but there will be 2.10.0, which allows bit more room for changes. For example, bounds checks should go in 2.10.0, but not in patch, just because risk for patches is bigger (I try to keep changes to behavior to minimum, to allow "blind" upgrade of patch versions).

And then there is work for 3.0 (master), in which we can actually solve most issues.

This is why dividing improvements into separate issues may help.

I think this issue works, then, for int / long bounds checks, and other aspects can be covered by new issues?

@cowtowncoder
Copy link
Member

One more thing: reference to TokenBuffer was just a note (to myself too) that similar issues probably affect mapper.convertValue(), which uses buffer as generator first, then parser.

@saites
Copy link
Author

saites commented Nov 26, 2018

Thanks for your reply. I understand how projects like these grow over time, and since you're supporting a public API, I certainly get that you want to keep compatibility-breaking changes at a minimum.

I'm happy to hear I'm not totally misunderstanding the library use, but in a way, a bit sad that I won't have a super quick/simple fix to my specific problem. For now, I'll recommend my team use stricter schemas and convert JsonNodes back to Strings before unmarshaling, even though it's less efficient. That should ensure we're processing what we expect, though some of the schema work will require diligence and more careful testing.

I think this issue works, then, for int / long bounds checks, and other aspects can be covered by new issues?

Makes sense to me; use this issue for tracking bounds checks (presumably for 2.10.0) and track other changes elsewhere.

Is it reasonable for the various implementations to use a common base? I'm not super familiar with the code, so I hope i'm not too confused. I'm guessing the split is between streaming vs non-streaming encoders/decoders? I'd expect it to be possible to put both in a common underlying representation and perform coercions/bounds checks there. What might be nice is modular checks/coercions that can be swapped out for more or less strict checking, as desired, but this might be less reasonable from an efficiency standpoint, which is what I assume is the basis for the split in the first place.

What do you think is the best way forward for 3.0? What kind of help do you need to get there?

@cowtowncoder
Copy link
Member

Quick note: I am actually working on this now, to add tests first for TreeTraversingParser, then fixing. As per earlier notes, methods in JsonNode (like intValue()) will probably only do bounds check with 3.0, but TreeTraversingParser need not wait that.

@cowtowncoder cowtowncoder changed the title TreeTraversingParser does not check int bounds TreeTraversingParser does not check int bounds Jan 17, 2019
@cowtowncoder cowtowncoder added this to the 2.10.0 milestone Jan 17, 2019
cowtowncoder added a commit that referenced this issue Jan 17, 2019
@cowtowncoder
Copy link
Member

Ok: so, I did add bounds-checks for int and long access for TreeTraversingParser, and matching tests. So that part should now be more consistent.

Other questions wrt coercions are valid, and it might be good file a new issue. I plan on tackling some aspects regardless, but I think my main concern has to do with configurability aspects, followed by documentation. Implementation should not be as difficult once configurability for specific desired/undesired coercions is defined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants