Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pathlib object handling for Universe, SingleFrameReaderBase and Toplogy parsers (Issue #3937) #4535

Open
wants to merge 18 commits into
base: develop
Choose a base branch
from

Conversation

talagayev
Copy link
Member

@talagayev talagayev commented Mar 27, 2024

Partially Fixes #3937. The issue mentioned the addition of support and testing for pathlib objects for SingleFrameReaderBase.

Currently the SingleFrameReaderBase is able to handle both pathlib and str as input for SingleFrameReaderBase to display this, this PR is focusing on tests that display the handling of pathlib and str as input for SingleFrameReaderBase .

Changes made in this Pull Request:

  • Addition of tests for pathlib object and str input for SingleFrameReaderBase in test_gro.py and test_lammps.py

Currently the tests are as mentioned for GRO and LAMMPS cases. SingleFrameReaderBase also recognizes INPCRD, CRD, NAMDBIN and DMS if given as a single input, so tests for these cases could also be added if required.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4535.org.readthedocs.build/en/4535/

@pep8speaks
Copy link

pep8speaks commented Mar 27, 2024

Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 119:1: W293 blank line contains whitespace

Line 548:1: W293 blank line contains whitespace
Line 549:47: W291 trailing whitespace
Line 553:80: E501 line too long (86 > 79 characters)

Comment last updated at 2024-12-17 20:40:44 UTC

Copy link

github-actions bot commented Mar 27, 2024

Linter Bot Results:

Hi @talagayev! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location Outcome
main package ⚠️ Possible failure
testsuite ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/10542931892/job/29210309319


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

Copy link

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.23%. Comparing base (73acc9b) to head (c181021).
Report is 9 commits behind head on develop.

Current head c181021 differs from pull request most recent head 5dafd01

Please upload reports for the commit 5dafd01 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4535      +/-   ##
===========================================
- Coverage    93.66%   93.23%   -0.44%     
===========================================
  Files          168       12     -156     
  Lines        21248     1079   -20169     
  Branches      3917        0    -3917     
===========================================
- Hits         19902     1006   -18896     
+ Misses         888       73     -815     
+ Partials       458        0     -458     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@orbeckst
Copy link
Member

@hmacdope would you be able to look at this PR or assign to someone else, please?

@hmacdope
Copy link
Member

hmacdope commented Apr 1, 2024

Apologies for the delay @orbeckst @talagayev, was away over easter. Reviewing now.

@hmacdope hmacdope self-requested a review April 1, 2024 11:16
Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talagayev you have the right idea of what to test, and your test implementations are good!

However to achieve better coverage and to make your tests more powerful you can instead add your tests to BaseReaderTest and _SingleFrameReader that cover the base API functionality for all relevant trajectory types. This is in estsuite/MDAnalysisTests/coordinates/base.py. :)

@talagayev
Copy link
Member Author

@hmacdope

@talagayev you have the right idea of what to test, and your test implementations are good!

However to achieve better coverage and to make your tests more powerful you can instead add your tests to BaseReaderTest and _SingleFrameReader that cover the base API functionality for all relevant trajectory types. This is in estsuite/MDAnalysisTests/coordinates/base.py. :)

Ah, I think i see :) i would than adjust it so that the tests are in BaseReaderTest and _SingleFrameReader.

Does it than make also sense to move the ReaderBase tests from #3935 also in the same way in a separate PR? since currently from the tests they cover two cases, one in the testsuite/MDAnalysisTests/coordinates/test_dcd.py for PSF&DCD and testsuite/MDAnalysisTests/coordinates/test_xdr.py for GRO&XTC ?

@talagayev
Copy link
Member Author

@hmacdope
I managed to get the test locally in _SingleFrameReader and it covers the cases, but trying to add it into BaseReaderTest leads to errors such as:

AttributeError: 'PosixPath' object has no attribute 'encode'

in the case of test_dcd.py. Here it has problems with the TestDCDReader with also having the same errors with the TRRReader and XTCReader , whereas it is fine in test_gro.py with the TestGROReader. Is it possible that TestDCDReader also does access BaseReaderTest, which leads to the error, since it possibly needs a different case of handling than TestGROReader. Does it make sense to leave the test for the SingleFrameReaderBase in _SingleFrameReader and make one for MultiframeReaderTest so that both cases are divided and tested separatly in their classes?

@talagayev
Copy link
Member Author

@hmacdope
Hey Hugo, I adjusted now the tests and moved them to base.py as you suggested. As I mentioned there were the problems with the cases when i applied the tests for the single frame pathway to formats such as DCD, XTC so the ones that are not SingleFrameReader formats and thus I had to set exceptions for these cases.
Could you take a look if everything is fine now? For the checks most of them failed on the codecov stage in the end, so i am not sure if it is an error from the codecov site. Thanks in advance :)

@hmacdope
Copy link
Member

@talagayev sorry for the delay here. I will review ASAP.

@hmacdope hmacdope self-requested a review May 27, 2024 06:52
@talagayev
Copy link
Member Author

@hmacdope all good, no worries :)

@hmacdope hmacdope changed the title Addition of tests for pathlib object handling of SingleFrameReaderBase (Issue #3937) Pathlib object handling for SingleFrameReaderBase and Toplogy parsers (Issue #3937) May 27, 2024
@hmacdope
Copy link
Member

@talagayev I have pushed some changes to your branch that fix the encode issue, so we shouldn't need to exclude some readers, and also added support for Pathlib.path in Universe and the topology parsers. I have adjusted CHANGELOG and PR title to match also.

As we are now co-authors on this I can no longer fairly review, so I will defer to one of the other @MDAnalysis/coredevs. Perhaps @tylerjereddy as the author of previous Pathlib support issues and PRs #3937 (if you have some spare cycles).

@hmacdope hmacdope changed the title Pathlib object handling for SingleFrameReaderBase and Toplogy parsers (Issue #3937) Pathlib object handling for Universe, SingleFrameReaderBase and Toplogy parsers (Issue #3937) May 28, 2024
@hmacdope
Copy link
Member

Ok looks like I messed some stuff up, Ill fix.

@talagayev
Copy link
Member Author

@hmacdope Hm strange error, I try to find the error going through the pytests and running the developer version of MDAnalysis with this version and can't track down the error currently.

if isinstance(filename, NamedStream):
self.filename = filename
else:
self.filename = str(filename)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here errors if we use self.filename = str(filename) in the test_mmtf.py
The lead to errors similar to this one:

FileNotFoundError: [Errno 2] No such file or directory: '<mmtf.api.mmtf_reader.MMTFDecoder object at 0x135416780>'

I managed to fix the error by adding:
else:
self.filename = str(filename)
if "MMTF" in self.filename:
self.filename = filename

basically putting a specific condition for MMTF cases, not the cleanest approach, but maybe this helps @hmacdope fixing the error in a better way :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look more into it now, the "MMTF" is a specific case, the other errors appear mainly because of the parmed parser which has both problems with the base.py in the coordinates and topology. It is possible to use in the base.py in the coordinates:

elif hasattr(filename, 'topology'):
self.filename = filename

This helps to fix the errors, although it is quite similar as to just directly using self.filename = filename without any if/elif/else conditions. For the base.py in topology I will try to figure out if there is any option to fix it, since again having there only self.filename = filename also fixes the errors with the parmed parser

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talagayev I think the FileNotFoundError for mmtf above is because instead of a filename, sometimes a MMTFDecoder object is encountered. I think what you've got is fixing it, but it might be clearer to have a block like:

if isinstance(filename, NamedStream):
    self.filename = filename
elif 'mmtf' in str(filename.__class__):
    # mmtf case, where we've avoided a mmtf import in detection
    self.filename = filename

Where you've enumerated all the special cases explicitly in each elif branch, rather than hiding some special cases inside other branches

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, yes the option that I got worked, but was definitely not the best option, but yes was connected with mmtf and yes elif 'mmtf' in str(filename.__class__): looks much better :)

if isinstance(reader, MemoryReader):
skip_reason = "MemoryReader"
pytest.skip(f"Skipping test for Pathlib input with reason: {skip_reason}")
path = Path(reader.filename)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here adding an additional Line:
path = path.as_posix())

Fixes the following Failed Pytest in 4 cases:
AttributeError: 'PosixPath' object has no attribute 'encode'

@@ -39,6 +40,7 @@ Fixes
* Fix groups.py doctests using sphinx directives (Issue #3925, PR #4374)

Enhancements
* Handling of Pathlib.Path in SingleFrameReaderBase, Topology and Universe (Issue #3937)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit vague and doesn't actually tell me what is fixed or what is now possible. e.g. Is it that pathlib.Paths are now correctly handled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, originally it was only targeting pathlib object handling of SingleFrameReaderBase in specific cases, but with the adjustment of @hmacdope it would cover Pathlib.Paths for all cases if the errros that appear currently due to the changes in the base.py in topology and also the base.py in coordinates, which could be fixed with

elif hasattr(filename, 'topology'):
self.filename = filename

but this would just exclude the cases of topology so I am not sure if this would fix it and cover then all the Pathlib.Path cases

@orbeckst
Copy link
Member

@hmacdope would it be worthwhile reviewing again? Can this PR be pushed across the finish line?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MAINT, ENH: more pathlib support tasks
5 participants