Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Year zero in ODTs, LDTs, and LDs may be disallowed #1016

Closed
wants to merge 4 commits into from

Conversation

eksortso
Copy link
Contributor

@eksortso eksortso commented Mar 1, 2024

Closes #1000. Sets the permitted year range from 1 to 9999 for the three time-date types that include years, thus explicitly disallowing year 0. In the ODT section, reasons for doing this are provided.

UPDATE: We may relax this, so that year 0 may be disallowed, though not necessarily.

@eksortso
Copy link
Contributor Author

eksortso commented Mar 2, 2024

I rewrote this to simplify it, and to get rid of the Table of Contents changes that my editor made in two places(!) without acknowledgment.

@arp242 @ChristianSi Here I explicitly state that the Gregorian calendar has no year 0. That has nothing to do with RFC 3339, as it turns out, because that spec allows a year zero, which makes many confused coders think they're dealing with the proleptic Gregorian calendar. I acknowledge that in my second point, which is that support for zero-year dates isn't good.

The last of the three example sections makes note that 0001-01-01 is the earliest valid local date. This is a signpost for users looking to use a date for a sentinel to take this one and use it.

I did not touch the ABNF. Despite our grievance with the year 0000, we want to keep the same ABNF as RFC 3339. So, the parser must invalidate dates with a year zero.

What do you think?

Copy link
Contributor

@ChristianSi ChristianSi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be shortened a bit (see my comments), looks good otherwise!

toml.md Outdated Show resolved Hide resolved
toml.md Outdated Show resolved Hide resolved
@eksortso eksortso changed the title Disallow year zero in all ODTs, LDTs, and LDs Year zero in ODTs, LDTs, and LDs may be disallowed Mar 4, 2024
@arp242
Copy link
Contributor

arp242 commented Mar 6, 2024

It seems to me that the first issue to decide is whether to allow year zero or not. Discussing the verbiage seems a bit pointless without agreement on that.

@eksortso
Copy link
Contributor Author

eksortso commented Mar 7, 2024

It seems to me that the first issue to decide is whether to allow year zero or not. Discussing the verbiage seems a bit pointless without agreement on that.

I thought that @ChristianSi made the case for not disallowing year zero on #1000. It's similar to how integers that fall outside the range of signed 64-bit values are not disallowed if the parser can handle them reliably. We should take the same approach with date-times and dates in year zero.

And based on what's already in toml-test, @arp242, I thought that you essentially agreed with the notion of not disallowing (i.e. "allowing" without encouraging) year zero. You have valid tests for year-one date-times and dates in the files edge.toml and edge.json, but no tests on year-zero values. I wrote a PR to make year zero invalid originally on that project, but I'm on the cusp of closing that PR, so that parsers that handle year zero correctly (in the sense of RFC 3339) don't get dinged by us.

I'll make changes tomorrow afternoon my time to reflect the "year zero not disallowed" perspective, unless we get serious calls to disallow 0000 even on platforms that could support it.

@eksortso
Copy link
Contributor Author

eksortso commented Mar 8, 2024

The changes are made. Yero-zero date-times and dates may be declared invalid.

@ChristianSi
Copy link
Contributor

Looks good to me!

@eksortso
Copy link
Contributor Author

@arp242 @pradyunsg Can you take a look at this and tell me what you think of it?

This PR falls under the idea that year 0000 may be disallowed, but it ought to be allowable in the same way that integers and floats outside of their guaranteed ranges are allowable if they can be done.

@arp242
Copy link
Contributor

arp242 commented Mar 12, 2024

I would prefer to either forbid it outright, or to keep it as-is and leave it up to implementations to inform users of their limitations. There's some other things that aren't 100% supported due to restrictions in the language, and I don't think we need to mention all of that here.

As I mentioned in the other PR, I'm fine with either. It's something I found when writing some tests and not really something that seems to be a huge practical issue. I wouldn't be surprised if very few or even exactly zero people have ever encountered it.

@eksortso
Copy link
Contributor Author

As I mentioned in the other PR, I'm fine with either. It's something I found when writing some tests and not really something that seems to be a huge practical issue. I wouldn't be surprised if very few or even exactly zero people have ever encountered it.

Ironically, it's not the technical issues with languages or calendars that can't handle year zero that ultimately compels me to write this PR, but rather the practical matter of choosing a sentinel date. The strongest assertion I make in my code is reflected in something I found in edge.toml: the first date that is guaranteed to exist. The date 0001-01-01 is a better sentinel than a well-formed value that could blow up when interpreted. This is practical in nature. Which is why I could not accept just leaving the matter open. We do need a PR to close this implicit issue. It will help to prevent future problems.

@arp242
Copy link
Contributor

arp242 commented Mar 12, 2024

Most people don't read the TOML spec; they "try stuff and see if it works". So a "we discourage doing [...]" will solve basically nothing in that regard, because people in languages where it works will keep writing 0000-01-01 if it works there.

So the only way to fix that is to just forbid it. Or to explicitly make year 0 work like year 1, or something like that. That is: clear unambiguous behaviour.

If anything, your change will make matters worse because some implementations where year 0 currently works will start forbidding it, whereas others won't. It will confuse things more, rather than less.

@ChristianSi
Copy link
Contributor

ChristianSi commented Mar 16, 2024

@arp242: Thanks for your comment. While I personally still think that we don't need to forbid the year 0000 (that would create just busywork for implementations where the underlying code will naturally allow it), the currently suggested wording is maybe not the best one.

For integers, we write:

Arbitrary 64-bit signed integers (from −2^63 to 2^63−1) should be accepted and handled losslessly. If an integer cannot be represented losslessly, an error must be thrown.

So we don't explicitly write much about integers outside this range, and we don't explicitly forbid implementations to accept them, nor do we recommend people to "avoid" them (except implicitly by pointing out that an error may be thrown). Maybe we could handle dates similarly, writing something like:

Any Offset Date-Time with a year from 0001 to 9999 that's valid according to RFC 3339 must be accepted. Offset Date-Times in the year 0000 maybe be rejected with an error (RFC 3339 allows them, but there is no such year in the Gregorian calendar). Years with five or more digits and negative years are not defined by RFC 3339 and should (or must?) be rejected with an error.

For Local Date-Times and Local Dates, we could then write something like:

The accepted year range is the same as for Offset Date-Times.

@arp242
Copy link
Contributor

arp242 commented Mar 17, 2024

The thing is, this isn't some practical issue people have been reporting: it's a hypothetical issue that I more or less accidentally discovered writing tests for boundary conditions (in this case: year 0, year 9999).

Almost all TOML files are read by a single application only. And even those read by more than one typically use the same underlying implementation (e.g. pyproject.toml is most often read by Python applications). On top of that datetimes are a relatively little used feature.

In short: it's not really a practical problem as such. It would be nicer to forbid it, IMHO, but it's really not that important. I wouldn't be surprised if the number of people who have actually encountered this issue is exactly zero.

But like I said, adding more text about it would just confuse people and things. "Parse as RFC3339" is easy; "forbid year 0" is easy. Anything else makes it harder because implementations now need to decide what to do.

@eksortso
Copy link
Contributor Author

@ChristianSi @arp242 Thanks to both of you for bringing this issue as far as it has come.

@ChristianSi This PR does need to be fleshed out a bit more, and I would consider a revised version that spells out the main points of your suggested edits more succinctly.

But I'm going to take another approach.

@arp242 I'm going to take your earlier stance and strictly adhere to it. Since nobody has raised any questions, we should not cause questions to be asked. I think if folks wonder why year zero doesn't work in their most-often-used TOML parser, they'll discover that the limitation isn't in TOML but in their language's implementation of year zero.

I am closing this PR. @arp242 I would like you to likewise close your PR, #1000. If the issue of year zero ever comes back to us, we will open an issue to address the matter again.

@eksortso eksortso closed this Mar 17, 2024
@eksortso eksortso deleted the i1000-noyear0 branch March 24, 2024 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants