Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DATETIME_RFC339 (sic) is not a correct regex for an RFC 3339 datetime #78

Closed
philvarner opened this issue Jun 1, 2021 · 3 comments
Closed

Comments

@philvarner
Copy link
Contributor

philvarner commented Jun 1, 2021

The datetime regex in https://github.com/stac-utils/stac-pydantic/blob/master/stac_pydantic/shared.py#L18

DATETIME_RFC339 = "%Y-%m-%dT%H:%M:%SZ"

is not correct wrt RFC 3339.

This is important because it is used in Search object validation, and incorrectly returns an error on some valid datetime values.

Copying this from my personal notes on RFC 3339 datetimes:

RFC 3339 is a profile of ISO 8601, adding these constraints:
- a complete representation of date and time (fractional seconds optional).
- requires 4-digit years
- only allows a period character to be used as the decimal point for fractional seconds
- requires the zone offset to be Z or like +00:00, while ISO8601 allows like +0000

Also note that the 'T' separator is required, and there's quite a bit of confusion over this, see https://stackoverflow.com/questions/63783868/what-are-valid-date-time-separators-in-rfc3339-strings

These are a few examples of what would be allowed for ISO8601 but not RFC 3339:

  • 1985-04-12
  • 1937-01-01T12:00:27.87+0100
  • 37-01-01T12:00:27.87+0100

Below are all valid RFC 3339 datetimes. Note the fractional seconds, Z or z as a timezone, positive and negative arbitrary offset timezones, T or any other character as a separator between date and time.

  • 1985-04-12T23:20:50.52Z
  • 1996-12-19T16:39:57-08:00
  • 1990-12-31T23:59:60Z
  • 1990-12-31T15:59:60-08:00
  • 1937-01-01T12:00:27.87+01:00
  • 1985-04-12T23:20:50.52Z
  • 1937-01-01T12:00:27.8710+01:00
  • 1937-01-01T12:00:27.8+01:00
  • 1937-01-01T12:00:27.8Z
  • 1985-04-12t23:20:50.5202020z
  • 2020-07-23T00:00:00Z
  • 2020-07-23T00:00:00.0Z
  • 2020-07-23T00:00:00.01Z
  • 2020-07-23T00:00:00.012Z
  • 2020-07-23T00:00:00.0123Z
  • 2020-07-23T00:00:00.01234Z
  • 2020-07-23T00:00:00.012345Z
  • 2020-07-23T00:00:00.000Z
  • 2020-07-23T00:00:00.000+03:00
  • 2020-07-23T00:00:00+03:00
  • 2020-07-23T00:00:00.000+03:00
  • 2020-07-23T00:00:00.000z

I think this is the correct regex for an RFC 3339 datetime:

r"^(\d\d\d\d)\-(\d\d)\-(\d\d)(T|t)(\d\d):(\d\d):(\d\d)(\.\d+)?(Z|([-+])(\d\d):(\d\d))$"

This is slightly different from the one in python-strict-rfc3339, as it allows T or t for the sep, and reverses +- and -+ so that the - doesn't need to be escaped with \-

Matching the datetime string to this will ensure it is a valid RFC 3339 (not just an ISO 8601 datetime), and then an ISO8601 parser can be used to parse it further if need be.

The built-in Python datetime library is not sufficient to parse all valid datetimes here -- notably, it doesn't parse Z as a timezone.

There are two options for this:

Additionally, hypothesis-jsonschema has support for generating dt's for testing: https://github.com/Zac-HD/hypothesis-jsonschema/blob/1c5f107230ccbd48c66d7c6693833745a598e294/src/hypothesis_jsonschema/_from_schema.py

@philvarner
Copy link
Contributor Author

Current recommendation is to use https://github.com/closeio/ciso8601 for RFC3339 parsing.

@vincentsarago
Copy link
Member

in #131 we've decided to switch to pydantic for datetime parsing/validation
this is bit more permissive.

The validation depends on https://docs.pydantic.dev/latest/api/types/#pydantic.types.AwareDatetime

if you feel that this should be changed please let us know or close this issue 🙏

@philvarner
Copy link
Contributor Author

in #131 we've decided to switch to pydantic for datetime parsing/validation this is bit more permissive.

The validation depends on https://docs.pydantic.dev/latest/api/types/#pydantic.types.AwareDatetime

if you feel that this should be changed please let us know or close this issue 🙏

That sounds good, as long as it's not both custom and wrong :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants