Change OCA-spec for RegEx format rules for DateTime #52

Closed
carlyh-micb opened this issue Mar 5, 2024 · 3 comments
Comments

@carlyh-micb
Collaborator

Suggested change to the OCA specification: give an example of a RegEx format rule for DateTime and explain how it contrasts with the ISO standard. Right now the only example uses the ISO standard, which creates significant issues downstream when date attributes aren't in ISO format.
https://github.com/agrifooddatacanada/oca-spec/tree/master/docs/specification#format-overlay

What happens when a data set has a date that isn't in ISO format for DateTime? The data may not be easy to change if it comes from an instrument.

If the format of a DateTime can only be expressed in ISO notation, then the date in the data example must have the datatype "Text", because that date cannot be expressed in ISO notation. We would then create format rules for text dates (such as a RegEx for dates with slashes) in the format overlay.
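To make the "RegEx for dates with slashes" idea concrete, here is a minimal sketch of what such a format rule could check for a Text attribute. The pattern and the function name are illustrative assumptions, not taken from the OCA spec, and the rule only checks shape and ranges (it would still accept 31/02/2024).

```python
import re

# Hypothetical format rule for a Text attribute holding DD/MM/YYYY dates.
# It validates shape and value ranges only, not calendar correctness.
SLASH_DATE = re.compile(r"(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/[0-9]{4}")


def matches_slash_date(value: str) -> bool:
    """Return True when the whole string looks like DD/MM/YYYY."""
    return SLASH_DATE.fullmatch(value) is not None
```

An ISO-formatted value such as 2024-03-05 would fail this rule, which is the mismatch described above.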

In our data verification code at ADC, DateTime is only allowed to be expressed in ISO notation, and it calls a library that converts the ISO notation into a RegEx rule for data verification.
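As a rough illustration of that conversion step, the sketch below turns a format string such as YYYY-MM-DD into a regular expression. The token table and `format_to_regex` are hypothetical and written only for this example; the library our verification code actually calls may work differently.

```python
import re

# Hypothetical token table; a real library may support more tokens.
TOKEN_PATTERNS = {
    "YYYY": r"[0-9]{4}",
    "MM": r"(?:0[1-9]|1[0-2])",
    "DD": r"(?:0[1-9]|[12][0-9]|3[01])",
    "hh": r"(?:[01][0-9]|2[0-3])",
    "mm": r"[0-5][0-9]",
    "ss": r"[0-5][0-9]",
}

# Try longer tokens first so "YYYY" wins over any shorter alternative.
_TOKEN_RE = re.compile("|".join(sorted(TOKEN_PATTERNS, key=len, reverse=True)))


def format_to_regex(fmt: str) -> re.Pattern:
    """Translate a format string such as 'YYYY-MM-DD' into a compiled regex."""
    parts = []
    pos = 0
    for match in _TOKEN_RE.finditer(fmt):
        parts.append(re.escape(fmt[pos:match.start()]))  # literal separators
        parts.append(TOKEN_PATTERNS[match.group()])
        pos = match.end()
    parts.append(re.escape(fmt[pos:]))  # trailing literal text, if any
    return re.compile("".join(parts))
```

Use `fullmatch` on the returned pattern to verify a value, for example `format_to_regex("YYYY-MM-DD").fullmatch("2024-03-05")`. The same token approach would also accept a non-ISO pattern such as DD/MM/YYYY, which is the case this issue is about.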

However, this is a pain for users because their schema changes. If they first specify the date as Text (which is burned into the capture base), then later convert the data to the ISO standard and switch the datatype to DateTime, the capture base is now different.
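A small sketch of why the type switch matters: any identifier derived from the capture base content changes when the declared type changes. The SHA-256 digest over sorted JSON below is only a stand-in chosen for illustration (it is not the identifier scheme OCA itself uses), and the attribute name is hypothetical.

```python
import hashlib
import json


def capture_base_digest(capture_base: dict) -> str:
    """Stand-in digest: SHA-256 over JSON with sorted keys (not OCA's own scheme)."""
    canonical = json.dumps(capture_base, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute name; only the declared type differs between the two.
as_text = {"attributes": {"sample_date": "Text"}}
as_datetime = {"attributes": {"sample_date": "DateTime"}}

# The two digests differ, so anything referencing the first no longer matches.
digests_differ = capture_base_digest(as_text) != capture_base_digest(as_datetime)
```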

image

@pknowl
Collaborator

pknowl commented Mar 7, 2024

@mitfik See the highlighted date format in the screenshot above. I would naturally format that as DD/MM/YYYY. However, that is not ISO 8601 compliant. Would it make sense to use a RegEx format in this case? The spec is accurate as it stands. However, we could add a note regarding non-ISO date formats. Your thoughts?

@blelump
Member

blelump commented Mar 13, 2024

Unfortunately, the format overlay has introduced ambiguities that we observe across different use cases; see #38 or #44. Formatting covers a broad range of topics and occurs in various contexts. What we have observed so far is that formatting issues arise when presenting or capturing data. Neither is related to semantics; both relate to presentation and/or business requirements. Whether the input date is DD/MM/YYYY or YYYY-MM-DD does not matter from a semantic perspective, because the common denominator is the DateTime type, which is currently (and implicitly) expressed as ISO 8601. Any further formatting (contextual tailoring) required for presentation and/or business requirements is secondary to that and must be addressed differently and separately. It is worth noting that date formatting is also culture-dependent, which adds more complexity.
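To illustrate the separation, here is a minimal sketch in which capture accepts more than one input format, storage always uses ISO 8601, and presentation re-formats the stored value. The list of accepted formats and both function names are assumptions made for this example only.

```python
from datetime import date, datetime

# Hypothetical list of capture-side formats an application might accept.
INPUT_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def to_iso_date(raw: str) -> str:
    """Parse a date captured in any accepted format and return ISO 8601 (YYYY-MM-DD)."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognised date format: {raw!r}")


def present(iso_value: str, fmt: str = "%d/%m/%Y") -> str:
    """Render a stored ISO 8601 date for display in a chosen format."""
    return date.fromisoformat(iso_value).strftime(fmt)
```

Both `to_iso_date("05/03/2024")` and `to_iso_date("2024-03-05")` yield the same stored value, so the DateTime semantics stay the same while capture and presentation formats vary.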

Invariants, mentioned in a separate issue and described in more depth here, partially address formatting. For example, the business may say that it only cares about the year, while presentation focuses on cultural differences.

Semantics + invariants + presentation constitute a trio that must always be considered when thinking of digitally stored information.

@mitfik
Contributor

mitfik commented Oct 20, 2024

The problem is resolved by enforcing ISO 8601, which captures the essence of the structural part of the semantics. The rest, as @blelump pointed out, is addressed on other layers.

@mitfik mitfik closed this as completed Oct 20, 2024