read_excel plus "calamine" engine issues when loading Excel data with some empty values #14174
Closed
2 tasks done
Labels
bug
Something isn't working
needs triage
Awaiting prioritization by a maintainer
python
Related to Python Polars
Checks
Reproducible example
Load the attached Excel data sample file, containing sparse data.
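A minimal sketch of the reproduction, assuming the attached files sit in the working directory and that Polars and the optional Excel backends are installed. The helper names `load_shape` and `try_engines` are hypothetical, introduced only for this illustration; `pl.read_excel(..., engine=...)` is the call the report refers to.

```python
# Attachment names come from the issue; their local location is an assumption.
SAMPLE_FILES = [
    "sample_data_blanks_instead_of_nulls.xlsx",
    "sample_data_nulls.xlsx",
]


def load_shape(path, engine):
    """Read the workbook with the given engine and return the DataFrame shape."""
    import polars as pl  # imported lazily so the helpers below work without Polars

    return pl.read_excel(path, engine=engine).shape


def try_engines(path, engines=("calamine", "openpyxl", "xlsx2csv"), loader=load_shape):
    """Try each engine in turn and record either the resulting shape or the error."""
    results = {}
    for engine in engines:
        try:
            results[engine] = f"ok, shape={loader(path, engine)}"
        except Exception as exc:  # broad on purpose: we only want to report the failure
            results[engine] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for sample in SAMPLE_FILES:
        print(sample, try_engines(sample))
```

Running this against both attachments should show, per engine, whether the file loads and with what shape, which makes the difference between the engines easy to compare.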
Log output
Issue description
sample_data_blanks_instead_of_nulls.xlsx
sample_data_nulls.xlsx
The attached Excel spreadsheets contain, for simplicity of reproduction, a total of 12 columns (A:L) with a row count of 287 (288 if you include the header). One file marks null/empty values with the "NULL" string placeholder; the other leaves those cells empty/blank instead. The data has integer, string, float/double/numeric, and date (timestamp format) columns. Some of the columns have every row populated, some do not (55/287, 38/287, etc.).
Upon loading the data using read_excel with the new engine="calamine" integration, it results in a ComputeError: Series length <# of rows without empty values> doesn't match the DataFrame height of <# total rows in the Excel spreadsheet>
I have tested the same files using the openpyxl and default xlsx2csv engines, and with both the data is read correctly.
Expected behavior
The data should be loaded correctly into memory as a DataFrame, and the datatypes inferred as happens when using the openpyxl engine.
shape: (287, 12)
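One way to check the expectation that datatypes are inferred the same way across engines is to compare the schemas of two loads. The sketch below is an illustration only: `schema_diff` is a hypothetical helper that works on plain name-to-dtype mappings, such as `dict(df.schema)` from a Polars DataFrame.

```python
def schema_diff(reference, candidate):
    """Return {column: (reference_dtype, candidate_dtype)} for every column
    whose dtype differs or that is missing from one of the two mappings."""
    diff = {}
    # Sorted union of column names so the result order is deterministic.
    for name in sorted(reference.keys() | candidate.keys()):
        ref_dtype = reference.get(name)
        cand_dtype = candidate.get(name)
        if ref_dtype != cand_dtype:
            diff[name] = (ref_dtype, cand_dtype)
    return diff


# Intended usage once both engines load the file (assumes Polars is installed):
#   import polars as pl
#   ref = pl.read_excel("sample_data_nulls.xlsx", engine="openpyxl")
#   new = pl.read_excel("sample_data_nulls.xlsx", engine="calamine")
#   print(schema_diff(dict(ref.schema), dict(new.schema)))
```

An empty result would mean both engines inferred the same column names and dtypes.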
Installed versions