Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add failing tests for #1173: off-by-one discrepancy in the location of reported parser errors #1175

Merged
merged 1 commit into from
Dec 23, 2023

Conversation

hal7df
Copy link
Contributor

@hal7df hal7df commented Dec 22, 2023

This adds (currently failing) test cases for the issue identified in #1173. I've tried my best at adding a plethora of different variations of invalid JSON, but it's possible I might still be missing some edge cases. I also have written this to test every variation of invalid JSON against the four parser backends identified by @cowtowncoder in the issue comments.

@hal7df
Copy link
Contributor Author

hal7df commented Dec 22, 2023

Curiously, DataInput-based parsers pass all of these tests, because they don't provide byte-/character-level position information. It's possible that they might still be susceptible to the off-by-one error if the error occurs across a line boundary, but I haven't been able to get that behavior to occur yet.

26,
24,
1,
25
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parser backends aren't consistent about the column in this case, because the exact definition of "column" seems to depend on whether the parser is operating on a byte array or a character array. I'm open to suggestions about what to do with this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Column should be exactly same for byte/char -backed input, for ASCII characters (for multi-byte characters there is indeed difference). But I think few tests use characters outside ASCII range..


byte[] inputBytes = input.getBytes(StandardCharsets.UTF_8);
feeder.feedInput(inputBytes, 0, inputBytes.length);
feeder.endOfInput();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works the way you'd expect... I think endOfInput() should not be called until all input is consumed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe I misremember this part, as test appears to work fine :)

Copy link
Member

@cowtowncoder cowtowncoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok parameterization etc make things bit hard to follow but I assume things are fine.
I do have suspicion async-parser case won't work as expected but I'll merge and see how things fare.

@cowtowncoder cowtowncoder merged commit 9fb3f21 into FasterXML:2.16 Dec 23, 2023
5 checks passed
cowtowncoder added a commit that referenced this pull request Dec 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants