Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DateTime elements have seconds precision, optimized for fractions #2308

Merged
merged 3 commits into from
Apr 15, 2024

Conversation

ReimarBauer
Copy link
Member

Timestamp elements have milliseconds precision in SQLite, and microsecond precision in SAP HANA and PostgreSQL.
DateTime elements have seconds precision, that was the cause we need to wait a second.

Purpose of PR?:

Fixes #2245

Does this PR introduce a breaking change?
Change to precission of Timestamps

https://cap.cloud.sap/docs/guides/databases-sqlite

Timestamp elements have milliseconds precision in SQLite, and microsecond precision in SAP HANA and PostgreSQL.
@ReimarBauer ReimarBauer requested a review from matrss March 27, 2024 19:34
@matrss
Copy link
Collaborator

matrss commented Mar 28, 2024

https://cap.cloud.sap/docs/guides/databases-sqlite

I don't see how this link (and the conclusions you took from it) are relevant. They describe datatypes in some js library that is unrelated to SQLAlchemy. SQLite does not have a TIMESTAMP type (and neither does it have a DATETIME type, datetimes are canonically stored as either ISO8601 text, a REAL or an INTEGER, more about that here: https://www.sqlite.org/datatype3.html). The DateTime type from SQLAlchemy does the right thing(TM) for SQLite, which is to store them as ISO8601 text, including microseconds (see: https://docs.sqlalchemy.org/en/20/dialects/sqlite.html#sqlalchemy.dialects.sqlite.DATETIME).

BUT: while the DateTime type from SQLAlchemy has microsecond precision for SQLite, I don't think this is always the case for other databases. At least in its docs (https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.DateTime) they recommend using dialect specific types if sub-second precision is needed. This would mean that we would have to pick an appropriate datatype for each and every database that we want to support (or maybe just do the ISO8601 text representation everywhere manually). I am not sure yet on if this is worth the effort, or if second precision is fine and sub-second precision is too much effort for too little gain.

@ReimarBauer
Copy link
Member Author

ReimarBauer commented Mar 28, 2024

SQLite supports several date and time types, including DATE, TIME, DATETIME, and TIMESTAMP. These types represent a combination of date and time or just the date or time separat

And as described DATETIME has in sqlite a precission of 1 Second. It is stored with milliseconds but not used in its functions. That is the problem.

The fix is correct

@matrss
Copy link
Collaborator

matrss commented Mar 28, 2024

SQLite supports several date and time types, including DATE, TIME, DATETIME, and TIMESTAMP.

No, it is this:

SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values:

TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC. 

Applications can choose to store dates and times in any of these formats and freely convert between formats using the built-in date and time functions.

(from https://www.sqlite.org/datatype3.html)

SQLAlchemy then introduces a DATETIME type in its SQLite dialect that uses the above described TEXT representation for datetimes. This representation, which is used by SQLAlchemy's DateTime Column type, does include microseconds. You can check that yourself:

import sqlalchemy
import sqlalchemy.orm
import datetime
from sqlalchemy import Column, Integer, DateTime, TIMESTAMP


engine = sqlalchemy.create_engine("sqlite:///demo.db")
Session = sqlalchemy.orm.sessionmaker(bind=engine)
session = Session()
Base = sqlalchemy.orm.declarative_base()


class Record(Base):
    __tablename__ = "records"

    id = Column(Integer, primary_key=True)
    dt = Column(DateTime)
    ts = Column(TIMESTAMP)


Base.metadata.create_all(engine)
now = datetime.datetime.now()
record = Record(dt=now, ts=now)
print(f"{record.id=}, {record.dt=}, {record.ts=}")
session.add(record)
session.commit()

del record

record = session.get(Record, 1)
print(f"{record.id=}, {record.dt=}, {record.ts=}")

Output:

$ python sqlalchemy_datetime_demo.py 
record.id=None, record.dt=datetime.datetime(2024, 3, 28, 12, 8, 42, 914199), record.ts=datetime.datetime(2024, 3, 28, 12, 8, 42, 914199)
record.id=1, record.dt=datetime.datetime(2024, 3, 28, 12, 8, 42, 914199), record.ts=datetime.datetime(2024, 3, 28, 12, 8, 42, 914199)

As you can see, DateTime handles microseconds perfectly fine. And due to SQLite's very lax typing TIMESTAMP does more or less the same.

This is what it looks like in the database:

sqlite> select * from records;
1|2024-03-28 12:08:42.914199|2024-03-28 12:08:42.914199
sqlite> select typeof(dt), typeof(ts) from records;
text|text

The data representation is the same and both columns have a type affinity of "text".

The only thing that differs is the actual supplied type name in the schema, but SQLite ignores that apart from determining the type affinity:

sqlite> .schema
CREATE TABLE records (
	id INTEGER NOT NULL, 
	dt DATETIME, 
	ts TIMESTAMP, 
	PRIMARY KEY (id)
);

So for SQLite there is no practical difference between the two.

The thing that I don't know is whether DateTime has sub-second precision on other databases. But then again, I haven't found a definitive source that this is the case for TIMESTAMP either. TIMESTAMP has the disadvantage that it is an "uppercase" datatype (https://docs.sqlalchemy.org/en/20/core/type_basics.html#the-uppercase-datatypes), which is rendered exactly as such in the schema. At least it seems to exist for all database backends, but if it didn't then it would produce an error, for example.

@ReimarBauer
Copy link
Member Author

Interestingly it did not crash, lets see what happens when I change the type

@ReimarBauer
Copy link
Member Author

ReimarBauer commented Mar 28, 2024

The important part for us is what it does with ordered_by by the time information. That was only in seconds and not with fractions. Looks like we can keep DateTime. And the other changes seem to fix it.

All the hints I readed before that I have also to change to TIMESTAMP, but obviously that was outdated and wrong.

@matrss
Copy link
Collaborator

matrss commented Mar 28, 2024

We are using postgres in production, right? I think we should check how the DateTime type behaves there as well (and in the future extend the CI suite to test against different databases as well).

@ReimarBauer
Copy link
Member Author

same idea, #2311

also now better understanding why the billiger de people mentioned that in their session

@ReimarBauer ReimarBauer changed the title DateTime elements have seconds precision, replaced by Timestamp DateTime elements have seconds precision, optimized for fractions Mar 28, 2024
@@ -64,7 +64,7 @@ def get_messages(self, op_id, timestamp=None):
if timestamp is None:
timestamp = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
else:
timestamp = datetime.datetime.strptime(timestamp, "%Y-%m-%d, %H:%M:%S %z")
timestamp = datetime.datetime.strptime(timestamp, "%Y-%m-%d, %H:%M:%S.%f %z")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How flexible are we with this format? Is there some requirement on how it looks or is it just important that we use the same format for serialization and parsing between client and server respectively?

I am asking because I think this could just be .isoformat and .fromisoformat respectively. See: https://docs.python.org/3/library/datetime.html#datetime.datetime.isoformat, https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat.

With those there would be no potential for a format string mismatch and we would use a standardized representation that is unambiguous.

Copy link
Member Author

@ReimarBauer ReimarBauer Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check with the timezone, isoformat would be prefered too.

a little headache I do have with the digits there, the testing setup needs 3 there. I have no idea, if the production system is happy or needs 6. This also the limitation here to milliseconds.

https://github.com/Open-MSS/MSS/pull/2308/files#diff-c721905a36cfba76833f0aa2f08229d63d2cbfa4ef0dcd2447a3e182880519b5R372

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timezones shouldn't be an issue, they are properly handled by the isoformat methods. The number of zeros in that linked code line should be 6, since that would match what the %f format option also adds (6 zero-padded digits representing microseconds).

Copy link
Member Author

@ReimarBauer ReimarBauer Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct
https://www.sqlite.org/lang_datefunc.html
In formats 4, 7, and 10, the fractional seconds value SS.SSS can have one or more digits following the decimal point. Exactly three digits are shown in the examples because only the first three digits are significant to the result, but the input string can have fewer or more than three digits and the date/time functions will still operate correctly. Similarly, format 12 is shown with 10 significant digits, but the date/time functions will really accept as many or as few digits as are necessary to represent the Julian day number.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't follow. What are formats 4, 7, 10 and 12? Are you quoting from somewhere without referencing the source?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was because on many places the fractions symbolized by 3S

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK, this applies to the date and time related functions from SQLite. I don't think SQLAlchemy uses those though (it just dumps iso 8601 formatted strings to the database and reads those back as strings before parsing them, SQLite only ever sees text and doesn't have to parse it), so I don't think any of that applies to our use here.

@@ -172,8 +171,6 @@ def test_get_all_changes(self):
with self.app.test_client():
flight_path, operation = self._create_operation(flight_path="V11")
assert self.fm.save_file(operation.id, "content1", self.user)
# we need to wait to get an updated created_at
time.sleep(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it would still need to wait a microsecond. Practically, I don't think that will ever be an issue...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing with sqlite would need a millisecond.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, microsecond is correct. As seen above the representation of DateTime in SQLite is stored with 6 decimal places, i.e. with a precision of one microsecond.

@joernu76
Copy link
Member

I reaaally dislike the fragility of this functionality.
We should use a parsing function which generously accepts well formed strings. Going to a standard such as iso8601 would be preferable.
I never understood the prevalence of strptime.

This seems to solve an immediate issue, so we can defer a robust fix to later.

@ReimarBauer
Copy link
Member Author

ReimarBauer commented Apr 15, 2024

I do also like a step by step improvement too This is currently a bug fix. The change to iso format is likly an enhancement of the whole.

Let's make that in a new PR.

#2321

@ReimarBauer ReimarBauer requested a review from joernu76 April 15, 2024 10:40
@ReimarBauer ReimarBauer merged commit e71d9f3 into Open-MSS:develop Apr 15, 2024
8 checks passed
@ReimarBauer ReimarBauer deleted the i2245 branch April 15, 2024 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

changes query needs compare .created_at with the stored milliseconds
3 participants