You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
My company processes excel files, and we often encounter errors in the way a file is constructed. We raise specific errors in cases when the file is encrypted, when the file cannot be opened, when there are formatting errors, etc. so we can notify other team members or clients that something is wrong with their data source. I observed a Rust panic which is not caught when using pandas-calamine engine to load an excel sheet that has structural formatting errors. This is a concern because it doesn't appear there's any way to handle the exception in Python, and thus we cannot surface the right kind of error.
Not sure if this belongs as an issue with you folks or with pyO3 since the rust bindings are managed through that library...
Expected Behavior
I would expect a different kind of exception to be raised, one native to the Python environment. It doesn't matter what.
Reproducible Example
Make an excel document using Microsoft Excel. Rename the document's extension to .zip from .xlsx and extract the data as a zip file. The file contents will be extracted in a new directory. Make a modification to the xl/styles.xml file to remove the name attribute from any instance of the cellStyles tag, so it looks something like this:
-traceback
Traceback (most recent call last):
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/platform_flow_common/tasks/flows/preliminary/excel.py", line 367, in create_data_frame_sheet_file_object
excel = pd.ExcelFile(path_or_buffer=excel_file, engine=pandas_engine)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
self._reader = self._engines[engine](
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 553, in __init__
super().__init__(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 573, in __init__
self.book = self.load_workbook(self.handles.handle, engine_kwargs)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 572, in load_workbook
return load_workbook(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
reader.read()
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 299, in read
apply_stylesheet(self.archive, self.wb)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/named_styles.py", line 229, in __init__
self.name = name
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 46, in __set__
raise TypeError(msg)
TypeError: .name should be but value is
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/platform_flow_common/tasks/flows/preliminary/excel.py", line 371, in create_data_frame_sheet_file_object
excel = pd.ExcelFile(path_or_buffer=excel_file)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1567, in init
self._reader = self._engines[engine](
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 553, in init
super().init(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 573, in init
self.book = self.load_workbook(self.handles.handle, engine_kwargs)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 572, in load_workbook
return load_workbook(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
reader.read()
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 299, in read
apply_stylesheet(self.archive, self.wb)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/named_styles.py", line 229, in init
self.name = name
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 46, in set
raise TypeError(msg)
TypeError: <class 'openpyxl.styles.named_styles._NamedCellStyle'>.name should be <class 'str'> but value is <class 'NoneType'>
And later you get something like PanicException: index out of bounds: the len is 0 but the index is 0
Installed Versions
python : 3.9.16
pandas : 2.2.3
The text was updated successfully, but these errors were encountered:
Hi! PanicException is inherited from python BaseException, so you can catch it and check the exception type. PanicException can be reexported to Python, but I don't think it's a good idea, see PyO3/pyo3#3918 (comment).
Issue Description
My company processes excel files, and we often encounter errors in the way a file is constructed. We raise specific errors in cases when the file is encrypted, when the file cannot be opened, when there are formatting errors, etc. so we can notify other team members or clients that something is wrong with their data source. I observed a Rust panic which is not caught when using pandas-calamine engine to load an excel sheet that has structural formatting errors. This is a concern because it doesn't appear there's any way to handle the exception in Python, and thus we cannot surface the right kind of error.
Not sure if this belongs as an issue with you folks or with pyO3 since the rust bindings are managed through that library...
Expected Behavior
I would expect a different kind of exception to be raised, one native to the Python environment. It doesn't matter what.
Reproducible Example
<cellStyles count="1"><cellStyle xfId="0" builtinId="0" /></cellStyles></styleSheet>
Select all the files within the directory and zip them back up, renaming the output zip file to .xlsx
Attempt to load the file as an ExcelFile object in pandas using the following code:
You should get something like:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/platform_flow_common/tasks/flows/preliminary/excel.py", line 371, in create_data_frame_sheet_file_object
excel = pd.ExcelFile(path_or_buffer=excel_file)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1567, in init
self._reader = self._engines[engine](
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 553, in init
super().init(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 573, in init
self.book = self.load_workbook(self.handles.handle, engine_kwargs)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 572, in load_workbook
return load_workbook(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
reader.read()
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 299, in read
apply_stylesheet(self.archive, self.wb)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/named_styles.py", line 229, in init
self.name = name
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 46, in set
raise TypeError(msg)
TypeError: <class 'openpyxl.styles.named_styles._NamedCellStyle'>.name should be <class 'str'> but value is <class 'NoneType'>
And later you get something like PanicException: index out of bounds: the len is 0 but the index is 0
Installed Versions
python : 3.9.16
pandas : 2.2.3
The text was updated successfully, but these errors were encountered: