-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update story_list() to handle indexed_date values with or without microseconds #89
Conversation
…ithout microseconds #88
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Very pythonic.
I'm pretty sure datetime.datetime.fromisoformat accepts both with and without microseconds
BUT it expects EXACTLY six digits of fractional second!
>> import datetime as dt
>> dt.datetime.fromisoformat("2024-01-02 03:04:05")
datetime.datetime(2024, 1, 2, 3, 4, 5)
>> dt.datetime.fromisoformat("2024-01-02 03:04:05.123456")
datetime.datetime(2024, 1, 2, 3, 4, 5, 123456)
>> dt.datetime.fromisoformat("2024-01-02 03:04:05.12345")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Invalid isoformat string: '2024-01-02 03:04:05.12345'
|
Oh wow nice catch @philbudne ! I didn't realize fromisoformat() did that. Do you think there's any chance of ever receiving something other than 0 or 6 digits of a fractional second? If so I'll stick with the try/except but otherwise fromisoformat() is cleaner. |
I don't know the code path in this case, BUT story-indexer uses
datetime.isoformat() throughout to format datetimes.
And it looks like "fromisoformat" accepts three digits (ms) of
fractional second, as well as six, since fromisoformat's docstring
says: "Construct a time from the output of isoformat()."
And the isoformat docstring says:
| isoformat(...)
| [sep] -> string in ISO 8601 format, YYYY-MM-DDT[HH[:MM[:SS[.mmm[uuu]]]]][+HH:MM].
| sep is used to separate the year from the time, and defaults to 'T'.
| The optional argument timespec specifies the number of additional terms
| of the time to include. Valid options are 'auto', 'hours', 'minutes',
| 'seconds', 'milliseconds' and 'microseconds'.
And looking at /usr/lib/python3.10/datetime.py, I see:
def _parse_hh_mm_ss_ff(tstr):
# Parses things of the form HH[:MM[:SS[.fff[fff]]]]
len_str = len(tstr)
which is called by _parse_isoformat_time
which is called by datetime.fromisoformat
So, in my opinion, "fromisoformat" should be a safe and clean solution.
The only reason this bit of minutia came to my attention is that I
JUST had to deal with reading date/time strings from a CSV dumped by
PostgreSQL, and it seems like psql truncates all trailing zeroes (and
the dot), which is NOT acceptable to fromisoformat.
|
sounds good I like the suggestion, added it to the most recent commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great knowledge sharing. I like simple solutions 👍🏽
In reference to #88, this PR sets up a try/except block to include the maximum available resolution of a given timestamp in indexed_date rather than truncating data that comes after the seconds.