-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Csvxlsx file reader #664
Csvxlsx file reader #664
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #664 +/- ##
===========================================
- Coverage 94.64% 94.60% -0.04%
===========================================
Files 73 73
Lines 4759 4800 +41
===========================================
+ Hits 4504 4541 +37
- Misses 255 259 +4 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good. I've requested a minor change on the failure modes.
One thing - I'm not 100% sure about how the try
stack is a mix of CSV failure modes and Excel failure modes without it being clear which one is which. That seems harder to maintain. We could split them, but then you'd duplicate the FileNotFoundError, unless the code specifically tests for the file using Path().exists()
first - but that's bad practice because the file could in theory vanish between the two calls.
Should we just have separate CSV and Excel reader functions? It's more verbose but feels cleaner. Thoughts?
except Exception as err: | ||
to_raise = Exception(f"Unidentified exception opening {file}: {err}") | ||
LOGGER.critical(to_raise) | ||
raise to_raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth explicitly adding BadZipError - it's by far the most likely thing to happen and we can effectively report it as "Does not appear to be an Excel file". The general Exception can then handle anything else. I'd add a comment on that exception to say that openpyxl
has a whole bunch of assorted failure modes for corrupt/non-excel Zip files and we're not going to try and track down an exhaustive list.
You can then add an actual zip file of (e.g.) a simple text file to test that failure mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a corrupted excel file which triggers that error - I think that should cover that failure case?
Yeah, it already pained me a bit to have the duplication from |
@davidorme I pushed some changes to split up the tests and now the builds are failing. It seems like it is a problem with the poetry version in |
@sallymatson No - that's not you. For reasons I don't understand, The only problem being Can you push a change to to - name: Install Poetry
uses: abatilo/[email protected]
with:
poetry-version: 1.5.1 |
Aha! It did exist briefly, and the maintainer pulled it. abatilo/actions-poetry#88 |
@sallymatson Is this ready for re-review? If so, can you re-request review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I guess I'd also add something to the load_excel
that says at present the data is explicitly only loaded from the first sheet. We could fix that, but not now?
Description
Added a dataframe reader that uses pandas to handle csv and excel input. Wrote tests for the functionality of both. Added
openpyxl
to poetry as a dependency for reading excel files.Fixes # (issue)
Type of change
Key checklist
pre-commit
checks:$ pre-commit run -a
$ poetry run pytest
Further checks