xlrd, used by pandas to read Excel files, no longer supports .xlsx Excel workbook files #183

Open
wittregr opened this issue Dec 16, 2020 · 4 comments
Labels
bug Something isn't working


@wittregr
Collaborator

Version of pMuTT
pmutt 1.2.21

Describe the bug
Version 2.0+ of xlrd no longer supports reading Excel .xlsx files, which is the default workbook format for current versions of Excel. Pandas uses xlrd to read Excel files, so reading Excel sheets with pmutt I/O fails for .xlsx workbooks.
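For anyone who wants scripts to fail with a clearer message, here is a minimal sketch of a version check. It assumes Python 3.8+ (for importlib.metadata), and the helper names are hypothetical, not part of pmutt or xlrd; it only encodes the rule above that xlrd 2.0 and later cannot read .xlsx.

```python
from importlib.metadata import PackageNotFoundError, version


def xlrd_supports_xlsx(xlrd_version: str) -> bool:
    """Return True if this xlrd version can still read .xlsx files.

    xlrd 2.0 and later only read legacy .xls workbooks, so only
    versions below 2.0 (such as 1.2.0) handle .xlsx.
    """
    major = int(xlrd_version.split(".")[0])
    return major < 2


def installed_xlrd_supports_xlsx() -> bool:
    """Check the xlrd installed in the current environment, if any."""
    try:
        return xlrd_supports_xlsx(version("xlrd"))
    except PackageNotFoundError:
        # xlrd is not installed at all, so it cannot read .xlsx either.
        return False


if __name__ == "__main__":
    print(xlrd_supports_xlsx("1.2.0"))  # True
    print(xlrd_supports_xlsx("2.0.1"))  # False
```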

To Reproduce
Run conda install xlrd (this installs v2.0.1, which does not support .xlsx files).
Use pmutt to read data from an .xlsx spreadsheet.

Additional context
Short-term workarounds:

  1. Save Excel spreadsheets using the Excel 97-2003 Workbook format. This saves them in .xls format, which xlrd 2.x should still be able to read.
  2. Install an older version of xlrd with conda install xlrd=1.2.0. There is a warning that this could introduce a security issue, but xlrd 1.2.0 will continue to read .xlsx files.
wittregr added the bug label on Dec 16, 2020
@jonlym
Member

jonlym commented Dec 16, 2020

It looks like the pandas developers suggested downgrading xlrd.

We can update the setup file to use the last working version.
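A hypothetical sketch of what that pin could look like in the dependency list of a setuptools-style setup.py. pmutt's actual setup file and its other dependencies are not shown in this thread, so the list below is an assumption.

```python
# Hypothetical excerpt from a setup.py: only the dependency list is shown,
# and entries other than the xlrd pin are placeholders.
install_requires = [
    "pandas",
    # xlrd 2.0+ dropped .xlsx support, so pin the last release that reads it.
    "xlrd==1.2.0",
]

# In a real setup.py this list would be passed to setuptools, e.g.:
# setuptools.setup(name="pmutt", install_requires=install_requires, ...)
```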

@wittregr
Collaborator Author

It might also be useful to lock the xlrd version to 1.2.0 to avoid accidentally updating it to a newer version. Add a file named "pinned" to your conda-meta folder (usually inside your Anaconda3 folder) containing the line:

xlrd ==1.2.0

This prevents conda updates from upgrading xlrd to a newer version.
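The same pin can be written from Python. This is a minimal sketch, assuming the active environment is a conda environment whose root is sys.prefix (so conda-meta sits directly under it). pin_conda_package is a hypothetical helper, not a conda API, and it refuses to create conda-meta so it cannot write into a non-conda Python install by accident.

```python
import sys
from pathlib import Path


def pin_conda_package(spec: str, prefix: str = sys.prefix) -> Path:
    """Add a pin line such as "xlrd ==1.2.0" to <prefix>/conda-meta/pinned.

    The spec is written at most once. Raises FileNotFoundError if
    <prefix>/conda-meta does not exist (i.e. prefix is not a conda env).
    """
    conda_meta = Path(prefix) / "conda-meta"
    if not conda_meta.is_dir():
        raise FileNotFoundError(f"{conda_meta} not found; is {prefix} a conda environment?")

    pinned = conda_meta / "pinned"
    lines = pinned.read_text().splitlines() if pinned.exists() else []
    # Skip the write if an identical pin is already present.
    if spec not in (line.strip() for line in lines):
        lines.append(spec)
        pinned.write_text("\n".join(lines) + "\n")
    return pinned
```

Calling pin_conda_package("xlrd ==1.2.0") from the environment you want to protect would append the same line shown above.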

@hansgilead

Another option that worked for me is to specify engine='openpyxl' in the pd.read_excel call for .xlsx (and later) spreadsheets. This shouldn't be necessary, and it adds the complexity of figuring out in advance whether the spreadsheet you are opening is .xls or .xlsx. If you expect a consistent file type, though, it is another possible workaround until pd.read_excel is fixed to pick the correct engine based on the file extension.
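A minimal sketch of picking the engine from the file extension, so callers don't have to decide in advance. The extension-to-engine mapping is an assumption (xlrd for legacy .xls, openpyxl for the Office Open XML formats), and both function names are hypothetical. Unknown extensions return None, which lets pandas fall back to its own default.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping: xlrd for legacy .xls, openpyxl for Office Open XML files.
_ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xltx": "openpyxl",
    ".xltm": "openpyxl",
}


def choose_excel_engine(path: str) -> Optional[str]:
    """Return a pandas read_excel engine name based on the file extension.

    Unknown extensions return None so pandas picks its own default.
    """
    return _ENGINE_BY_SUFFIX.get(Path(path).suffix.lower())


def read_excel_auto(path: str, **kwargs):
    """Call pandas.read_excel with an engine chosen from the extension."""
    import pandas as pd  # lazy import: choose_excel_engine works without pandas

    return pd.read_excel(path, engine=choose_excel_engine(path), **kwargs)
```

The lookup is kept separate from the pandas call so it can be checked without opening a file; read_excel_auto still requires openpyxl to be installed for .xlsx inputs.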

@jonlym
Member

jonlym commented Jan 2, 2021

That's a great suggestion, @hansgilead! Our users will probably only use .xlsx files, so this is a much more elegant solution than forcing users to install a specific version of xlrd.

@wittregr, I'll test this with a couple of our examples and make a new pull request.

3 participants