fix(python): improved xlsx2csv defaults for read_excel #12081

Merged (2 commits) on Oct 28, 2023

@alexander-beedie (Collaborator) commented Oct 28, 2023

Closes #12052 and closes #12054. If the user hasn't explicitly set these options themselves, we should default them to sensible values - otherwise we can silently lose data and/or precision that is available in the target worksheet.

Updated defaults

These apply to the "xlsx2csv" engine, i.e. to the worksheet-to-CSV conversion that runs before we read the resulting CSV data with our own parser:

  • skip_hidden_rows = False
  • floatformat = "%f"

(The other two defaults are not a change from the existing behaviour; they just codify the expected values).
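
As a rough illustration of the "only if the user hasn't explicitly set them" rule, the defaults can be layered underneath whatever the caller passed. This is a minimal sketch, not the code from this PR: the helper name `apply_xlsx2csv_defaults` and the constant `XLSX2CSV_DEFAULTS` are hypothetical, and only the two defaults named above are included.

```python
from typing import Any, Dict, Optional

# Only the two defaults named in this PR description; the helper and
# constant names are hypothetical, chosen for illustration.
XLSX2CSV_DEFAULTS: Dict[str, Any] = {
    "skip_hidden_rows": False,  # keep rows hidden by a worksheet filter
    "floatformat": "%f",        # do not truncate to the displayed decimals
}


def apply_xlsx2csv_defaults(
    user_options: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of the user's options with any missing defaults filled in."""
    options = dict(user_options or {})
    for name, value in XLSX2CSV_DEFAULTS.items():
        # setdefault leaves an explicitly supplied value untouched
        options.setdefault(name, value)
    return options
```

With this shape, a caller who passes `{"skip_hidden_rows": True}` keeps that choice, while `floatformat` still falls back to `"%f"`.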

Additional fix

  • read_excel supports more than one engine now, but "read_csv_options" and "xlsx2csv_options" only apply when using the xlsx2csv engine; we now ensure that these options are not mis-specified alongside the openpyxl engine (will think about cleaning this up further to make it more generic later 🤔)
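
The engine check described above could look roughly like the following. This is a sketch under stated assumptions rather than the merged implementation: the function name `check_engine_options`, the use of `ValueError`, and the message wording are all hypothetical.

```python
from typing import Any, Dict, Optional


def check_engine_options(
    engine: str,
    xlsx2csv_options: Optional[Dict[str, Any]] = None,
    read_csv_options: Optional[Dict[str, Any]] = None,
) -> None:
    """Reject xlsx2csv-only options when the openpyxl engine is selected.

    Hypothetical helper; the exception type and messages are assumptions.
    """
    if engine == "openpyxl":
        if xlsx2csv_options is not None:
            raise ValueError(
                "cannot specify 'xlsx2csv_options' when engine='openpyxl'"
            )
        if read_csv_options is not None:
            raise ValueError(
                "cannot specify 'read_csv_options' when engine='openpyxl'"
            )
```

Failing early here means a mis-specified option raises an error instead of being silently ignored by an engine that never reads it.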

@github-actions bot added the fix (Bug fix) and python (Related to Python Polars) labels Oct 28, 2023
@alexander-beedie alexander-beedie force-pushed the read-excel-defaults branch 4 times, most recently from ff2af61 to eafe80d Compare October 28, 2023 11:08
@ritchie46 ritchie46 merged commit 3b6aa8f into pola-rs:main Oct 28, 2023
13 checks passed
@alexander-beedie alexander-beedie deleted the read-excel-defaults branch October 28, 2023 14:55
Development

Successfully merging this pull request may close these issues.

  • pl.read_excel() returns null for filtered rows
  • pl.read_excel only reads visible number of decimals