fix(python): improved xlsx2csv defaults for read_excel
#12081
Merged
Closes #12052 and closes #12054. If the user has not explicitly set these options, we should default them to sensible values; otherwise we can silently lose data and/or precision that is available in the target worksheet.
Updated defaults
Applies to the "xlsx2csv" engine (that is, to the conversion step that runs before we read the resulting CSV data with our own parser):
skip_hidden_rows → False, floatformat → "%f"
(The other two defaults are not a change from the existing behaviour, they just codify the expected values).
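The "only if the user hasn't set it" rule can be sketched as a dictionary overlay. This is a minimal illustration and not the code from this PR: the constant and helper names are hypothetical, and only the two defaults named above are included.

```python
# Hypothetical names for illustration; only the two defaults named in this
# description are listed here.
DEFAULT_XLSX2CSV_OPTIONS = {
    "skip_hidden_rows": False,
    "floatformat": "%f",
}


def resolve_xlsx2csv_options(user_options=None):
    """Overlay explicitly set user options on top of the defaults."""
    # Keys supplied by the user win; missing keys fall back to the defaults.
    return {**DEFAULT_XLSX2CSV_OPTIONS, **(user_options or {})}


defaults_only = resolve_xlsx2csv_options()
overridden = resolve_xlsx2csv_options({"floatformat": "%.2f"})
```

Here `defaults_only` carries both default values, while `overridden` keeps the user's `floatformat` and still picks up the `skip_hidden_rows` default.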
Additional fix
read_excel supports more than one engine now, but "read_csv_options" and "xlsx2csv_options" only apply when using the xlsx2csv engine; we now ensure that these options are not mis-specified alongside the openpyxl engine (will think about cleaning this up further to make it more generic later 🤔)
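A guard of this kind might look roughly like the sketch below. The function name, error type, and message are assumptions made for illustration, not the check actually added in this PR.

```python
def check_engine_options(engine, xlsx2csv_options=None, read_csv_options=None):
    """Reject xlsx2csv-only options when the openpyxl engine is selected.

    Hypothetical helper; the real check may differ in name and error type.
    """
    if engine == "openpyxl" and (xlsx2csv_options or read_csv_options):
        raise ValueError(
            "'xlsx2csv_options' and 'read_csv_options' only apply "
            "to the 'xlsx2csv' engine"
        )


# Accepted: the options match the engine that consumes them.
check_engine_options("xlsx2csv", xlsx2csv_options={"skip_hidden_rows": False})

# Rejected: xlsx2csv-specific options alongside the openpyxl engine.
try:
    check_engine_options("openpyxl", xlsx2csv_options={"skip_hidden_rows": False})
    rejected = False
except ValueError:
    rejected = True
```

Failing early like this surfaces the mismatch to the caller instead of silently ignoring options that the selected engine never reads.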