Added a regex checker and fixer for offsets without colons for ZDateTime and OSDateTime #208

oeystein · 2021-03-21T16:44:46Z

WHY

See issue #131 for details

Colonless offsets in ISO 8601 strings are a common way of representing DateTimes with offsets. Unfortunately, ZonedDateTime.parse(string) and OffsetDateTime.parse(string) do not support this, which breaks Jackson compatibility.
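A minimal sketch of the problem, for reference. The class and helper names are hypothetical and only there for illustration; the only library call is the standard `OffsetDateTime.parse`.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

class ColonlessOffsetProblem {
    // Returns true if OffsetDateTime.parse accepts the text as-is.
    static boolean parses(String text) {
        try {
            OffsetDateTime.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("2000-01-01T12:00+01:00")); // offset with colon: true
        System.out.println(parses("2000-01-01T12:00+0100"));  // colonless offset: false
    }
}
```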

HOW

Uses a regex to check for offsets without colons and, if one is found, inserts a colon to make the string compatible with ZonedDateTime.parse() and OffsetDateTime.parse().
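The idea can be sketched roughly as follows. This is not necessarily the code in the PR: the class and method names are hypothetical, and the pattern is the anchored form that the review discussion further down arrives at, with the `[` escaped for Java.

```java
import java.time.OffsetDateTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonlessOffsetFix {
    // Sign plus four digits, at the end of the string or right before '['.
    private static final Pattern COLONLESS_OFFSET =
            Pattern.compile("[+-][0-9]{4}(?=\\[|$)");

    // Inserts ':' between the hour and minute digits of a colonless offset, if present.
    static String addColon(String text) {
        Matcher matcher = COLONLESS_OFFSET.matcher(text);
        if (!matcher.find()) {
            return text; // nothing to fix
        }
        // matcher.start() points at the sign; the hours are the next two characters.
        int colonIndex = matcher.start() + 3;
        return new StringBuilder(text).insert(colonIndex, ':').toString();
    }

    public static void main(String[] args) {
        String fixed = addColon("2000-01-01T12:00+0100");
        System.out.println(fixed); // 2000-01-01T12:00+01:00
        // The fixed string is now accepted by the standard parser.
        System.out.println(OffsetDateTime.parse(fixed));
    }
}
```

Strings that already carry a colon in the offset do not match the pattern and are returned unchanged.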

oeystein · 2021-03-21T17:01:51Z

~~Looks like it breaks negative time strings (I was not actually aware this was a thing ...). Need to solidify the regex.~~

Edit: Fixed by ignoring matches at the start of the string.

datetime/src/main/java/com/fasterxml/jackson/datatype/jsr310/deser/InstantDeserializer.java

kupci

Just a few minor points otherwise looks good to me.

kupci

Looks good to me, @cowtowncoder thoughts?

kupci · 2021-03-23T22:41:33Z

FYI: I naively thought it would be better to apply the regex to the end of the string, and quickly found out: not so simple. I think given the shortness of the string, and the simplicity (clean and simple) of the regex you have, it should be fine, so please disregard the (now deleted) comment I had about the regex.

oeystein · 2021-03-23T22:52:08Z

Well that explains why the comment button refused to work when I tried to reply!

And originally I had the same thought as you. I overdid the regex before figuring out that simple ought to do the trick. There are also variations of the ISO 8601 string that include the time zone at the end, as mentioned in #131, such as "2000-01-01T12:00+0100[Europe/Paris]". So any pattern anchored at the end would have to check that the offset is either at the end of the string or followed by "[".

kupci · 2021-03-24T14:33:23Z

@oeystein Thanks! Good points, the regex can get crazy quick.

I think this is getting ready, though @cowtowncoder might have some further comments. In the meantime, one thing you'll need to send in (if you have not already) is the CLA, from:

https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf

and usually the easiest way is to print, fill & sign, scan/photo, email to info at fasterxml dot com.

Thank you once again for the PR!

oeystein · 2021-03-25T08:54:24Z

@kupci Any time!

And I just sent over the contributor agreement, so the formalities should be in order now.

cowtowncoder · 2021-03-26T00:54:17Z

datetime/src/main/java/com/fasterxml/jackson/datatype/jsr310/deser/InstantDeserializer.java

+     *
+     * @since 2.13
+     */
+    protected static final Pattern ISO8601_COLONLESS_OFFSET_REGEX = Pattern.compile("(?<!^)[+-][0-9]{4}");


Quick question here: I understand that first part is to try to avoid false matches (since we cannot assume it's at the end, due to possible timezone suffix), but what does the part in parenthesis mean? :)

Also wonder if it could/should check trailing word boundary (either end-of-string, or [) -- that would anchor it correctly, I think.
Just curious, not sure if there are performance implications; I know some matching constructs can be more expensive than others.

I have been giving the regex some thought over the last few days, and I agree it's not optimal. The part in parentheses is a negative lookbehind on the start-of-string anchor: it simply prevents a match from starting at the first character of the string.
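To illustrate what the `(?<!^)` guard changes, here is a small sketch comparing the pattern with and without it on a negative-year string. The class, constant, and method names are hypothetical; only the guarded pattern itself comes from the diff above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    static final Pattern WITHOUT_GUARD = Pattern.compile("[+-][0-9]{4}");
    static final Pattern WITH_GUARD = Pattern.compile("(?<!^)[+-][0-9]{4}");

    // Index of the first match, or -1 when the pattern does not match at all.
    static int firstMatch(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        String negativeYear = "-2000-01-01T12:00+0100";
        // Without the guard, the signed year at index 0 is mistaken for an offset.
        System.out.println(firstMatch(WITHOUT_GUARD, negativeYear)); // 0
        // With the guard, the first match is the real offset.
        System.out.println(firstMatch(WITH_GUARD, negativeYear));    // 17
    }
}
```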

We can, however, just as easily use a lookahead that requires either the end of the line or a "[" right after the offset. Realistically, I don't believe the following pattern should have any major performance implications. It's the best I have been able to come up with, and the one I believe would work best.

[+-][0-9]{4}(?=\[|$)

Another solution is to only look for matches after the T (which is pretty much the only guaranteed thing in ISO 8601 DateTime formats). That should also technically work, but I have not created a regex for it yet.

@kupci I believe you also looked a bit into the performance complications of this, any comments?

That looks like the best to me. Anchoring it to the end of the string, or the bracket, will help performance a bit, compared to the original. Then, the regex engine only looks for the 4 digits, and stops.
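A quick sketch of how the anchored pattern behaves, assuming the `[` is escaped as `\\[` in the Java string literal (left unescaped, it would open a character class and the pattern would not compile). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredOffsetDemo {
    // The lookahead requires the four digits to be followed by '[' or the end of input.
    static final Pattern ANCHORED = Pattern.compile("[+-][0-9]{4}(?=\\[|$)");

    // Returns the matched colonless offset, or null when there is none.
    static String matchedOffset(String text) {
        Matcher matcher = ANCHORED.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchedOffset("2000-01-01T12:00+0100"));               // +0100
        System.out.println(matchedOffset("2000-01-01T12:00+0100[Europe/Paris]")); // +0100
        // The signed year is followed by '-', so the lookahead rejects it,
        // and the offset already has a colon, so nothing matches.
        System.out.println(matchedOffset("-2000-01-01T12:00+01:00"));             // null
    }
}
```

With this anchoring, the start-of-string guard is no longer needed, because a signed year is never directly followed by "[" or the end of the string in a full date-time value.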

Btw, for a look at how things can rapidly get out of hand with a regex that looks simple, this is an interesting read: Runaway Regular Expressions: Catastrophic Backtracking

cowtowncoder · 2021-03-26T00:54:57Z

@oeystein Thank you for contributing this (including tests!) & sending CLA -- only have one question, hoping to merge this soon.

oeystein · 2021-03-30T09:10:28Z

I posted a code comment on this a couple of days ago. Once that idea is sorted out, everything should be ready to go. Seems the comment was left hanging!

Apologies, I'm not that familiar with GitHub ... I apparently had to submit the review for that comment to go through!

oeystein · 2021-03-31T18:30:41Z

Committed the requested changes, so if everything seems good now this should be ready to be merged.

cowtowncoder · 2021-04-01T03:15:00Z

Sorry, extremely heavy week. This is near the top of the pile (https://github.com/FasterXML/jackson-future-ideas/wiki/Jackson-Work-in-Progress) so while delayed, not forgotten, and I intend to get to it this week, hopefully tomorrow.
Thank you for your patience @oeystein.

Added a regex checker and fixer for offsets without colons for ZonedDateTime and OffsetDateTime.

fd1425b

Added check at start of regex to avoid parsing plus and negative years ...

a8c9eb0

kupci reviewed Mar 22, 2021

Added additional tests for ISO8601_COLONLESS_OFFSET_REGEX expression and recommented on regex expression usage in production.

41552b6

kupci approved these changes Mar 23, 2021

kupci added the 2.13 label Mar 24, 2021

cowtowncoder reviewed Mar 26, 2021

Altered the Regex format for checking datetime offsets. Previously the string checked the whole string beyond the start, now it checks at the end or before [.

0edb120

cowtowncoder merged commit 31a46ad into FasterXML:2.13 Apr 1, 2021

cowtowncoder added this to the 2.13.0 milestone Apr 1, 2021

oeystein deleted the Support-colonless-timezones branch April 2, 2021 14:33

kupci mentioned this pull request Jun 5, 2021

Deserialize zone offset with or without colon #38

Closed
