docs: to_datetime: document that %Y zero-pads years #18791

Open
2 tasks done
MLpranav opened this issue Sep 17, 2024 · 2 comments
Labels
A-timeseries (Area: date/time functionality), documentation (Improvements or additions to documentation), python (Related to Python Polars)

Comments


MLpranav commented Sep 17, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
import pandas as pd
import datetime

print(pl.Series(['12-AUG-20']).str.strptime(pl.Date, '%d-%b-%Y')[0])
# datetime.date(20, 8, 12) ❌

print(pd.to_datetime('12-AUG-20', format='%d-%b-%Y'))
# ValueError ✅

print(datetime.datetime.strptime('12-AUG-20', '%d-%b-%Y'))
# ValueError ✅

###

print(pl.Series(['12-AUG-20']).str.strptime(pl.Date, '%d-%b-%y')[0])
# datetime.date(2020, 8, 12) ✅

print(pd.to_datetime('12-AUG-20', format='%d-%b-%y'))
# Timestamp('2020-08-12 00:00:00') ✅

print(datetime.datetime.strptime('12-AUG-20', '%d-%b-%y'))
# datetime.datetime(2020, 8, 12, 0, 0) ✅

Log output

No response

Issue description

When a 2-digit year is passed with the format code %Y (not %y), Polars silently zero-pads it to a 4-digit year (20 becomes 0020) instead of raising an error.

(Because no exception is raised, a try/except clause meant to handle wrongly formatted data is never triggered.)
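For illustration, here is a minimal sketch of the kind of try/except fallback this refers to, written against the standard-library `datetime` module only. The helper name `parse_day_month_year` is hypothetical and not part of any library. It relies on `datetime.strptime` raising `ValueError` for a 2-digit year under `%Y`, which is the behaviour shown in the reproducible example above; the same pattern would not fall through to the second format if the first parse silently succeeded.

```python
from datetime import date, datetime


def parse_day_month_year(text: str) -> date:
    """Parse 'DD-Mon-YYYY', falling back to 'DD-Mon-YY' on failure.

    Hypothetical helper for illustration only.
    """
    try:
        # The stdlib %Y needs four digits, so '12-AUG-20' raises ValueError here.
        return datetime.strptime(text, "%d-%b-%Y").date()
    except ValueError:
        # Fallback for 2-digit years; %y maps 20 to 2020.
        return datetime.strptime(text, "%d-%b-%y").date()


print(parse_day_month_year("12-AUG-20"))    # 2020-08-12
print(parse_day_month_year("12-AUG-2020"))  # 2020-08-12
```

With a parser that accepts `12-AUG-20` under `%Y`, the first branch would return year 20 and the fallback would never run.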

Expected behavior

An exception should be raised, similar to pandas' and datetime's behaviour.

Installed versions

--------Version info---------
Polars:              1.7.1
Index type:          UInt32
Platform:            macOS-14.6.1-arm64-arm-64bit
Python:              3.12.4 (main, Jun  6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          3.0.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               24.2.1
great_tables         <not installed>
matplotlib           3.9.1
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             3.1.2
pandas               2.2.2
pyarrow              17.0.0
pydantic             2.8.2
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           3.2.0

@MLpranav MLpranav added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels Sep 17, 2024
@MarcoGorelli
Collaborator

MarcoGorelli commented Sep 17, 2024

thanks @MLpranav for the report

chrono parses it like this

use chrono::NaiveDate;
fn main() {
    let res = NaiveDate::parse_from_str("12-AUG-20", "%d-%b-%Y");
    println!("{:?}", res);  // Ok(0020-08-12)
}

but maybe it should be documented

@MarcoGorelli MarcoGorelli added the documentation (Improvements or additions to documentation) label and removed the bug (Something isn't working) and needs triage (Awaiting prioritization by a maintainer) labels Sep 17, 2024
@MarcoGorelli MarcoGorelli changed the title from "strptime - %d%b%Y interprets 12AUG20 year as 0020 instead of throwing error" to "to_datetime: document that %Y zero-pads years" Sep 17, 2024
@MarcoGorelli MarcoGorelli changed the title from "to_datetime: document that %Y zero-pads years" to "docs: to_datetime: document that %Y zero-pads years" Sep 17, 2024
@MLpranav
Author

I would still consider this a bug, as %Y is not supposed to match 2-digit years, especially when "strict" and "exact" are left at their default value of True. It is also inconsistent with the default pandas/datetime behaviour.
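Until the behaviour is changed or documented, one possible workaround is to check the year width explicitly before parsing. The sketch below uses only the standard-library `re` module; the pattern `FOUR_DIGIT_YEAR` and the helper `require_four_digit_year` are hypothetical names that assume the `DD-Mon-YYYY` layout from the example above. It is not something Polars provides.

```python
import re

# Hypothetical pattern for 'DD-Mon-YYYY'; adjust it to the format in use.
FOUR_DIGIT_YEAR = re.compile(r"\d{1,2}-[A-Za-z]{3}-\d{4}")


def require_four_digit_year(values: list[str]) -> list[str]:
    """Return values unchanged, or raise ValueError if any lacks a 4-digit year."""
    bad = [v for v in values if FOUR_DIGIT_YEAR.fullmatch(v) is None]
    if bad:
        raise ValueError(f"expected a four-digit year, got: {bad}")
    return values


require_four_digit_year(["12-AUG-2020"])  # passes silently

try:
    require_four_digit_year(["12-AUG-20"])
except ValueError as exc:
    # The 2-digit year is rejected before any parsing happens.
    print(exc)
```

The same check could likely be done column-wise in Polars by passing an anchored version of the pattern (`^...$`) to `str.contains` before calling `str.strptime`. That is an untested suggestion, not a documented recipe.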

@MarcoGorelli MarcoGorelli added the A-timeseries (Area: date/time functionality) label Sep 17, 2024