
Unhandled rust panic when processing some excel files #111

Open
kevinmccraneybp opened this issue Feb 10, 2025 · 1 comment

@kevinmccraneybp

Issue Description

My company processes Excel files, and we often encounter errors in the way a file is constructed. We raise specific errors when the file is encrypted, when it cannot be opened, when there are formatting errors, and so on, so that we can notify other team members or clients that something is wrong with their data source. I observed an uncaught Rust panic when using pandas' calamine engine to load an Excel sheet that has structural formatting errors. This is a concern because there doesn't appear to be any way to handle the exception in Python, so we cannot surface the right kind of error.

Not sure if this belongs as an issue with you folks or with PyO3, since the Rust bindings are managed through that library...

Expected Behavior

I would expect a different kind of exception to be raised, one native to the Python environment. It doesn't matter what.

Reproducible Example

  1. Create an Excel document using Microsoft Excel. Change the document's extension from .xlsx to .zip and extract it as a zip file. The contents will be extracted into a new directory. Edit the xl/styles.xml file to remove the name attribute from any cellStyle tag inside cellStyles, so it looks something like this:

<cellStyles count="1"><cellStyle xfId="0" builtinId="0" /></cellStyles></styleSheet>

  2. Select all the files within the directory and zip them back up, changing the output zip file's extension to .xlsx.

  3. Attempt to load the file as an ExcelFile object in pandas using the following code:

import pandas as pd

YOUR_FILE_NAME = "modified.xlsx"  # placeholder: path to the re-zipped .xlsx file

try:
	e = pd.ExcelFile(YOUR_FILE_NAME, engine="openpyxl")
except Exception as exc:
	print(exc)
	try:
		e = pd.ExcelFile(YOUR_FILE_NAME, engine="openpyxl")
	except Exception as ex:
		print(ex)

You should get something like:

Traceback (most recent call last):
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/platform_flow_common/tasks/flows/preliminary/excel.py", line 367, in create_data_frame_sheet_file_object
excel = pd.ExcelFile(path_or_buffer=excel_file, engine=pandas_engine)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
self._reader = self._engines[engine](
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 553, in __init__
super().__init__(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 573, in __init__
self.book = self.load_workbook(self.handles.handle, engine_kwargs)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 572, in load_workbook
return load_workbook(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
reader.read()
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 299, in read
apply_stylesheet(self.archive, self.wb)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/named_styles.py", line 229, in __init__
self.name = name
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 46, in __set__
raise TypeError(msg)
TypeError: <class 'openpyxl.styles.named_styles._NamedCellStyle'>.name should be <class 'str'> but value is <class 'NoneType'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/platform_flow_common/tasks/flows/preliminary/excel.py", line 371, in create_data_frame_sheet_file_object
excel = pd.ExcelFile(path_or_buffer=excel_file)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
self._reader = self._engines[engine](
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 553, in __init__
super().__init__(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 573, in __init__
self.book = self.load_workbook(self.handles.handle, engine_kwargs)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 572, in load_workbook
return load_workbook(
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
reader.read()
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 299, in read
apply_stylesheet(self.archive, self.wb)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/named_styles.py", line 229, in __init__
self.name = name
File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 46, in __set__
raise TypeError(msg)
TypeError: <class 'openpyxl.styles.named_styles._NamedCellStyle'>.name should be <class 'str'> but value is <class 'NoneType'>

And later you get something like PanicException: index out of bounds: the len is 0 but the index is 0
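
For anyone who wants to produce the malformed file without editing it by hand, the unzip, edit, and re-zip steps of the reproduction can be scripted. This is only a sketch using the standard library: the function name strip_cell_style_names is hypothetical, and it assumes xl/styles.xml is UTF-8 encoded and that the name attribute is written with double quotes.

```python
import re
import zipfile


def strip_cell_style_names(src_path, dst_path):
    """Copy an .xlsx archive, dropping name="..." from every <cellStyle> element.

    Hypothetical helper: assumes xl/styles.xml is UTF-8 and uses double-quoted
    attributes, which is what Excel normally writes.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "xl/styles.xml":
                text = data.decode("utf-8")
                # \b keeps the pattern from matching the <cellStyles> wrapper.
                text = re.sub(r'(<cellStyle\b[^>]*?)\s+name="[^"]*"', r"\1", text)
                data = text.encode("utf-8")
            # Every other archive member is copied through unchanged.
            dst.writestr(item, data)
```

Calling strip_cell_style_names("original.xlsx", "modified.xlsx") (both file names are placeholders) should leave a workbook whose cellStyle entries have no name attribute, matching the snippet shown in the steps.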

Installed Versions

python : 3.9.16
pandas : 2.2.3

@dimastbk
Owner

Hi! PanicException inherits from Python's BaseException, so you can catch it by catching BaseException and checking the exception type. PanicException could be re-exported to Python, but I don't think that's a good idea; see PyO3/pyo3#3918 (comment).
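
A minimal sketch of that approach, for illustration only. The names ExcelStructureError and call_guarding_panics are hypothetical, and matching on the class name "PanicException" is an assumption made here because the class is not importable from this package; adjust the check to whatever your environment actually exposes.

```python
class ExcelStructureError(Exception):
    """Hypothetical application-level error for structurally broken workbooks."""


def call_guarding_panics(func, *args, **kwargs):
    """Call func and convert a Rust panic into an ordinary Python Exception."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # Ordinary Python errors pass through unchanged.
        raise
    except BaseException as exc:
        # Assumption: the panic is identified by its class name, since the
        # exception class itself is not imported here.
        if type(exc).__name__ == "PanicException":
            raise ExcelStructureError(
                f"Rust panic while reading workbook: {exc}"
            ) from exc
        # KeyboardInterrupt, SystemExit, and similar must keep propagating.
        raise
```

With pandas this could be used as call_guarding_panics(pd.ExcelFile, path, engine="calamine"), after which a plain except Exception block sees the failure. Re-raising every other BaseException is deliberate, so that Ctrl-C and interpreter shutdown are not swallowed.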
