Skip to content

DATETIME_RFC339 (sic) is not a correct regex for an RFC 3339 datetime #78

Closed
@philvarner

Description

@philvarner

The datetime regex in https://github.com/stac-utils/stac-pydantic/blob/master/stac_pydantic/shared.py#L18

DATETIME_RFC339 = "%Y-%m-%dT%H:%M:%SZ"

is not correct wrt RFC 3339.

This is important because it is used in Search object validation, and incorrectly returns an error on some valid datetime values.

Copying this from my personal notes on RFC 3339 datetimes:

RFC 3339 is a profile of ISO 8601, adding these constraints:
- a complete representation of date and time (fractional seconds optional).
- requires 4-digit years
- only allows a period character to be used as the decimal point for fractional seconds
- requires the zone offset to be Z or like +00:00, while ISO8601 allows like +0000

Also note that the 'T' separator is required, and there's quite a bit of confusion over this, see https://stackoverflow.com/questions/63783868/what-are-valid-date-time-separators-in-rfc3339-strings

These are a few examples of what would be allowed for ISO8601 but not RFC 3339:

  • 1985-04-12
  • 1937-01-01T12:00:27.87+0100
  • 37-01-01T12:00:27.87+0100

Below are all valid RFC 3339 datetimes. Note the fractional seconds, Z or z as a timezone, positive and negative arbitrary offset timezones, T or any other character as a separator between date and time.

  • 1985-04-12T23:20:50.52Z
  • 1996-12-19T16:39:57-08:00
  • 1990-12-31T23:59:60Z
  • 1990-12-31T15:59:60-08:00
  • 1937-01-01T12:00:27.87+01:00
  • 1985-04-12T23:20:50.52Z
  • 1937-01-01T12:00:27.8710+01:00
  • 1937-01-01T12:00:27.8+01:00
  • 1937-01-01T12:00:27.8Z
  • 1985-04-12t23:20:50.5202020z
  • 2020-07-23T00:00:00Z
  • 2020-07-23T00:00:00.0Z
  • 2020-07-23T00:00:00.01Z
  • 2020-07-23T00:00:00.012Z
  • 2020-07-23T00:00:00.0123Z
  • 2020-07-23T00:00:00.01234Z
  • 2020-07-23T00:00:00.012345Z
  • 2020-07-23T00:00:00.000Z
  • 2020-07-23T00:00:00.000+03:00
  • 2020-07-23T00:00:00+03:00
  • 2020-07-23T00:00:00.000+03:00
  • 2020-07-23T00:00:00.000z

I think this is the correct regex for an RFC 3339 datetime:

r"^(\d\d\d\d)\-(\d\d)\-(\d\d)(T|t)(\d\d):(\d\d):(\d\d)(\.\d+)?(Z|([-+])(\d\d):(\d\d))$"

This is slightly different from the one in python-strict-rfc3339, as it allows T or t for the sep, and reverses +- and -+ so that the - doesn't need to be escaped with \-

Matching the datetime string to this will ensure it is a valid RFC 3339 (not just an ISO 8601 datetime), and then an ISO8601 parser can be used to parse it further if need be.

The built-in Python datetime library is not sufficient to parse all valid datetimes here -- notably, it doesn't parse Z as a timezone.

There are two options for this:

Additionally, hypothesis-jsonschema has support for generating dt's for testing: https://github.com/Zac-HD/hypothesis-jsonschema/blob/1c5f107230ccbd48c66d7c6693833745a598e294/src/hypothesis_jsonschema/_from_schema.py

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions