Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a regex checker and fixer for offsets without colons for ZDateTime and OSDateTime #208

Merged
merged 4 commits into from
Apr 1, 2021

Conversation

oeystein
Copy link
Contributor

WHY

See issue #131 for details

Colonless offsets for ISO 8601s strings a common way of representing DateTimes with offsets. Unfortunately ZonedDateTime.parse(string) and OffsetDateTime.parse(string) does not support this, breaking jackson compatibility.

HOW

Uses a regex to check for offsets without colons, and if found - adds in a colon to make it compatible with ZonedDateTime.parse() and OffsetDateTime.parse()

@oeystein
Copy link
Contributor Author

oeystein commented Mar 21, 2021

.Looks like it breaks negative time strings (i was not actually aware this was a thing ... ) Need to solidify the regex expression..

Edit: Fixed with ignoring matches of start of the string.

Copy link
Member

@kupci kupci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor points otherwise looks good to me.

…and recommented on regex expression usage in production.
Copy link
Member

@kupci kupci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, @cowtowncoder thoughts?

@kupci
Copy link
Member

kupci commented Mar 23, 2021

FYI: I naively thought it would be better to apply the regex to the end of the string, and quickly found out: not so simple. I think given the shortness of the string, and the simplicity (clean and simple) of the regex you have, it should be fine, so please disregard (now deleted) comment I had about the regex.

@oeystein
Copy link
Contributor Author

Well that explains why the comment button refused to work when I tried to reply!

And originally i had the same thought as you. I overdid the regex expression before figuring out simple ought to do the trick. There are also variations of the ISO 8601 string which includes TZ on end as mentioned in #131, such as "2000-01-01T12:00+0100[Europe/Paris]". So any at the end of expression would have to check for it to be the end or followed by "[".

@kupci
Copy link
Member

kupci commented Mar 24, 2021

@oeystein Thanks! Good points, the regex can get crazy quick.

I think this is getting ready, though @cowtowncoder might have some further comments, in the meantime, one thing you'll need to send in (if not already) the CLA, from:

https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf

and usually the easiest way is to print, fill & sign, scan/photo, email to info at fasterxml dot com.

Thank you once again for the PR!

@kupci kupci added the 2.13 label Mar 24, 2021
@oeystein
Copy link
Contributor Author

@kupci Any time!

And i just sent over the contributor agreement so the formalities should be in order now.

*
* @since 2.13
*/
protected static final Pattern ISO8601_COLONLESS_OFFSET_REGEX = Pattern.compile("(?<!^)[+-][0-9]{4}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question here: I understand that first part is to try to avoid false matches (since we cannot assume it's at the end, due to possible timezone suffix), but what does the part in parenthesis mean? :)

Also wonder if it could/should check trailing word boundary (either end-of-string, or [) -- that would anchor it correctly, I think.
Just curious, not sure if there are performance implications; I know some matching constructors can be more expensive than others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been giving the regex expression some thought over the last few days and i agree its not optimal. The parenthesis just simply uses anchors to ignore checking the first letter in the the string.

We can however likewise simply use match on anchor(end of line) or boundary([) to check for line ends or "[" matches. So realistically i don't believe the following example should not have any major performance implications? It's at least the best i appear to come up with, and the one i believe would be the best.

[+-][0-9]{4}(?=[|$)

Another solution is too only look for matches after T(which is pretty much the only guaranteed thing in ISO 8601 DataTime formats), which should also technically work but not created a regex expression for this yet.

@kupci I believe you also looked a bit into the performance complications of this, any comments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like the best to me. Anchoring it to the end of the string, or the bracket, will help performance a bit, compared to the original. Then, the regex engine only looks for the 4 digits, and stops.

Btw, for a fun read on how things can rapidly get out of hand with a regex that looks simple in appearance, this is an interesting read: Runaway Regular Expressions: Catastrophic Backtracking

@cowtowncoder
Copy link
Member

@oeystein Thank you for contributing this (including tests!) & sending CLA -- only have one question, hoping to merge this soon.

@oeystein
Copy link
Contributor Author

oeystein commented Mar 30, 2021

I posted a code comment on this a couple days ago. When that idea is sorted out everything should be ready to-go. Seems the comment was left hanging!

Apologizes not that familiar with github ... i apparently had to submit review for that comment to go trough!

…e string checked the whole string beyond the start, now it checks at the end or before [.
@oeystein
Copy link
Contributor Author

Committed the requested changes, so if everything seems good now this should be ready to be merged.

@cowtowncoder
Copy link
Member

Sorry, extremely heavy week. This is near the top of the pile (https://github.com/FasterXML/jackson-future-ideas/wiki/Jackson-Work-in-Progress) so while delayed, not forgotten, and I intend to get to it this week, hopefully tomorrow.
Thank you for your patience @oeystein.

@cowtowncoder cowtowncoder merged commit 31a46ad into FasterXML:2.13 Apr 1, 2021
@cowtowncoder cowtowncoder added this to the 2.13.0 milestone Apr 1, 2021
@oeystein oeystein deleted the Support-colonless-timezones branch April 2, 2021 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants