Skip to content

date_part returning wrong results due to overflows #14738

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gabotechs opened this issue Feb 18, 2025 · 3 comments
Open

date_part returning wrong results due to overflows #14738

gabotechs opened this issue Feb 18, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@gabotechs
Copy link
Contributor

Describe the bug

When playing with the date_part function, I see that there's ways of triggering int32 multiplication overflows that either panic on a debug build, or return the wrong number at runtime.

To Reproduce

Executing the following statement shows the behavior.

SELECT date_part('microsecond', timestamp '1970-01-01T00:40:00' - timestamp '1970-01-01T00:00:00')

DataFusion fiddle link <- returns a wrong random number

Postgres fiddle link <- returns 0

Expected behavior

The date_part function should behave the same as Postgres

Additional context

Not 100% sure, but I would say that the changes introduced in #13466 look suspicious. There, the inner calculations are using int32 types, which are easy to overflow. Special mention to this line of code:
https://github.com/gabotechs/datafusion/blob/763bd681f09d58ce285ab3a677b81291c41adfce/datafusion/functions/src/datetime/date_part.rs#L290-L290

@gabotechs gabotechs added the bug Something isn't working label Feb 18, 2025
@gabotechs
Copy link
Contributor Author

It can also be replicated with intervals:

SELECT date_part('microseconds', interval '1 hour')
-- returns -694967296, but the result should be 0

@gabotechs
Copy link
Contributor Author

It seems like even without integer overflows, the overall logic is wrong:

SELECT date_part('seconds', interval '1 hour')
-- returns 3600, but the result should be 0

@Omega359
Copy link
Contributor

It seems like even without integer overflows, the overall logic is wrong:

SELECT date_part('seconds', interval '1 hour')
-- returns 3600, but the result should be 0

Hmm. I think this may be a separate issue as indeed it doesn't follow the typical pattern for that function where

SELECT date_part('seconds', interval '1 hour 5 second');

should return 5 (the seconds element in the interval, not the overall seconds the interval covers)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants