-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic sdf format input/output #107
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One more suggestions: could you add a test with an input missing the $$$$ line? This can be done as follows (more or less):
import pytest
def test_truncated_sdf(tmpdir):
fn_short = os.path.join(tmpdir, "tooshort.sdf")
with path('iodata.test.data', 'example.sdf') as fn_sdf:
with open(fn_sdf) as fin, open(fn_short, "w") as fout:
for _ in range(20):
fout.write(next(fin))
with pytest.raises(IOError):
load_one(fn_short)
iodata/formats/sdf.py
Outdated
while True: | ||
try: | ||
words = next(lit) | ||
except ValueError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except ValueError: | |
except StopIteration: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because next(lit) is an iterator? I changed it in the commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, any iterator in Python should raise StopIteration
when calling next(iterator)
after it already produced the last element.
iodata/formats/sdf.py
Outdated
atcoords[i, 0] = float(words[0]) * angstrom | ||
atcoords[i, 1] = float(words[1]) * angstrom | ||
atcoords[i, 2] = float(words[2]) * angstrom | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this try-except block is based on the XYZ code, but the XYZ code is actually a poor example of how to use dictionaries in Python. Both should be fixed ideally, to avoid making this suggestion. In general in Python, if there is a way to get the same effect without try-except, then it is preferable for clarity (and speed). In this case, it could also be done like this:
atnum = sym2num.get(words[3].title())
if atnum is None:
atnum = int(words[3])
atnums[i] = atnum
Note that the fallback option should also have index 3. I'm not sure if SDF ever uses integers instead of symbols at index 3, so the fallback option may be redundant anyway? If the fallback is not needed, it could become one line instead:
atnum = sym2num.get(words[3].title(), 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your are right, only elements are used. I changed it to your suggestion of only line.
iodata/formats/sdf.py
Outdated
@document_load_many("SDF", ['atcoords', 'atnums', 'title']) | ||
def load_many(lit: LineIterator) -> Iterator[dict]: | ||
"""Do not edit this docstring. It will be overwritten.""" | ||
# SDF files with more molecueles are a simple concatenation of individual SDF files,' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo molecueles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, corrected
with path('iodata.test.data', 'example.sdf') as fn_sdf: | ||
mols = list(load_many(str(fn_sdf))) | ||
assert len(mols) == 2 | ||
assert mols[1].title == '24978481' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add before this line the following?
check_example(mols[0])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, good idea. Thanks!
I included all changes and left one question about the change of "ValueError" to "StopIteration" |
I just realized we have already a utility to generated truncated files. See https://github.com/theochem/iodata/blob/master/iodata/test/test_cp2k.py#L208 Could you use this? Thanks. |
This is a nice feature. Thanks to pointing it out. Its now implemented in the next commit. |
Thank you! Merging... |
Addresses partially #43 and #37