Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python,rust,cli): add DATE function for SQL #11541

Merged
merged 5 commits into from
Oct 16, 2023

Conversation

cmdlineluser
Copy link
Contributor

https://stackoverflow.com/questions/77227895/polars-sql-context-filter-by-date-or-datetime

This was a recent question on SO and the mentioned DATE function doesn't seem to exist yet.

I was interested in seeing how polars-sql worked and thought I'd have a look.

import datetime
import polars as pl

df = pl.DataFrame({
    "date": [
        datetime.date(2021, 3, 15),
        datetime.date(2021, 3, 28),
        datetime.date(2021, 4, 4),
     ],
     "version": ["0.0.1", "0.7.3", "0.7.4"]
})

with pl.SQLContext(df=df, eager_execution=True) as ctx:
    print(ctx.execute("""SELECT *, date < DATE('2021-03-20') AS "alpha" from df"""))

# shape: (3, 3)
# ┌────────────┬─────────┬───────┐
# │ date       ┆ version ┆ alpha │
# │ ---        ┆ ---     ┆ ---   │
# │ date       ┆ str     ┆ bool  │
# ╞════════════╪═════════╪═══════╡
# │ 2021-03-15 ┆ 0.0.1   ┆ true  │
# │ 2021-03-28 ┆ 0.7.3   ┆ false │
# │ 2021-04-04 ┆ 0.7.4   ┆ false │
# └────────────┴─────────┴───────┘

Should DATETIME also be added as part of this?

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Oct 5, 2023
Copy link
Collaborator

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you also need to update the features in polars-sql

crates/polars-sql/src/functions.rs Show resolved Hide resolved
@@ -173,6 +173,10 @@ pub(crate) enum PolarsSqlFunctions {
/// ```
Round,

// Date Functions
// ----
Date,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe postgres (which we are trying to mimic where possible) calls this to_date

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see - thanks.

So should Date and date be replaced with ToDate and to_date?

Copy link
Collaborator

@alexander-beedie alexander-beedie Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe postgres (which we are trying to mimic where possible) calls this to_date

Ahhh, TO_DATE is actually a separate function that requires a second argument (a format string which is not strftime, but rather one of the "YYYY-MM-DD" type ones; we should probably skip this for now until we can map from that flavour of format string to strftime 🤔).

In this case DATE is appropriate as it valid and only needs the string - can be called like so:

SELECT DATE '2020-12-31' as d1, DATE('2222-02-22') as d2

(I just quickly checked this on a local PostgreSQL connection ;))

@cmdlineluser
Copy link
Contributor Author

@universalmind303 Ah, thanks - so .str.to_date() requires dtype-date

Do I just need to add this to [features]?

dtype-date = ["polars-lazy/dtype-date"]

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Oct 5, 2023

Adding DATE is a good workaround (and another piece of valid SQL that we will now be able to handle!), but we should also have a look at fixing the "real" issue, which is cast syntax not working here (either using cast(x as date) or x::date). Looks to be a problem with AnyValue casts from Utf8Date,Datetime 🤔

@cmdlineluser: once you finish-up DATE you should be able to match Utf8Date in fn visit_cast and apply the same logic there? Then we'd get the new syntax and fix casts ;)

@cmdlineluser
Copy link
Contributor Author

cast syntax not working here

@alexander-beedie Ah. They didn't mention the use of CAST initially.

Does that mean it's just dispatching to .cast(pl.Date) (which would work when #10517 is implemented?)

pl.select(pl.lit('2023-01-01').cast(pl.Date))

# ComputeError: strict conversion from `str` to `date` failed for column: literal, value(s) ["2023-01-01"]; if you were trying to cast Utf8 to temporal dtypes, consider using `strptime`

Although in the sql context there is no error, it just returns null.

pl.select(pl.sql_expr("CAST('2023-01-01' as DATE)"))

# shape: (1, 1)
# ┌─────────┐
# │ literal │
# │ ---     │
# │ date    │
# ╞═════════╡
# │ null    │
# └─────────┘

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Oct 7, 2023

Does that mean it's just dispatching to .cast(pl.Date) (which would work when #10517 is implemented?)

Or, perhaps slightly better, redirect to .str.to_date() / .str.to_datetime() (as suggested above, I think) 👍

@MarcoGorelli
Copy link
Collaborator

Does that mean it's just dispatching to .cast(pl.Date) (which would work when #10517 is implemented?)

Or, perhaps slightly better, redirect to .str.to_date() / .str.to_datetime() (as suggested above, I think) 👍

exactly, that can already work today and auto-detects the standard kinds of date formats which you'd probably expect here (like %Y-%m-%d)

In case it helps with motivation to get this across the finish line, there's been some interest in this on linkedin https://www.linkedin.com/posts/daniel-beach-6ab8b4132_dataengineering-polars-activity-7117879064404664321-BlB3?utm_source=share&utm_medium=member_desktop

@ritchie46
Copy link
Member

@cmdlineluser any news on this one? Would be great to merge this.

@MarcoGorelli
Copy link
Collaborator

I just added a commit to add the dtypes-date feature - is anything else missing from this PR?

@cmdlineluser
Copy link
Contributor Author

Oops - Apologies everyone, missed the notifications on this one.

Thanks for update @MarcoGorelli

Perhaps I was unclear in my wording previously as it was already dispatching to .str.to_date()

I don't think anything else is missing? I was wondering if DATETIME() should also be added as part of this as it's essentially identical.

@alexander-beedie
Copy link
Collaborator

I don't think anything else is missing? I was wondering if DATETIME() should also be added as part of this as it's essentially identical.

In that case, how about we merge this one now, and you can add DATETIME in a new PR?

@ritchie46
Copy link
Member

Can you do a rebase @cmdlineluser? Then we can merge this one.

@cmdlineluser
Copy link
Contributor Author

Apologies @ritchie46 - first time learning about rebase.

Hopefully I haven't messed that up.

Copy link
Collaborator

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost correct...looks like you just duplicated test_sql_unary_ops_8890 in the rebase

py-polars/tests/unit/sql/test_sql.py Outdated Show resolved Hide resolved
@ritchie46
Copy link
Member

Alright here goes. Thank you @cmdlineluser, happy to learn you more git tricks ;)

@ritchie46 ritchie46 merged commit 8589836 into pola-rs:main Oct 16, 2023
@ritchie46
Copy link
Member

@cmdlineluser interested in following up with the datetime funciton?

@cmdlineluser
Copy link
Contributor Author

I did take an initial look @ritchie46 but got a bit stuck.

I did not realize that .str.to_datetime() is a little different as it also requires time_unit and time_zone (although they are keyword-only args in py-polars)

fn str_to_datetime(
&self,
format: Option<String>,
time_unit: Option<Wrap<TimeUnit>>,
time_zone: Option<TimeZone>,
strict: bool,
exact: bool,
cache: bool,
ambiguous: Self,
) -> Self {
let options = StrptimeOptions {
format,
strict,
exact,
cache,
};
self.inner
.clone()
.str()
.to_datetime(
time_unit.map(|tu| tu.0),
time_zone,
options,
ambiguous.inner,
)
.into()
}

I'm not sure if/how these are supposed to be exposed via polars-sql.

From what I can tell, POSTGRES supports "named args" with => e.g. DATETIME(..., time_unit => 'ns')

The current visit_* functions in polars-sql seem to only be for positional based arguments.

@alexander-beedie alexander-beedie added the A-sql Area: Polars SQL functionality label Jan 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql Area: Polars SQL functionality enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants