Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic sdf format input/output #107

Merged
merged 6 commits into from
May 22, 2019
Merged

Basic sdf format input/output #107

merged 6 commits into from
May 22, 2019

Conversation

evohringer
Copy link
Contributor

Addresses partially #43 and #37

Copy link
Member

@tovrstra tovrstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One more suggestions: could you add a test with an input missing the $$$$ line? This can be done as follows (more or less):

import pytest
def test_truncated_sdf(tmpdir):
    fn_short = os.path.join(tmpdir, "tooshort.sdf")
    with path('iodata.test.data', 'example.sdf') as fn_sdf:
        with open(fn_sdf) as fin, open(fn_short, "w") as fout:
            for _ in range(20):
                fout.write(next(fin))
    with pytest.raises(IOError):
        load_one(fn_short)

while True:
try:
words = next(lit)
except ValueError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
except ValueError:
except StopIteration:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because next(lit) is an iterator? I changed it in the commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, any iterator in Python should raise StopIteration when calling next(iterator) after it already produced the last element.

atcoords[i, 0] = float(words[0]) * angstrom
atcoords[i, 1] = float(words[1]) * angstrom
atcoords[i, 2] = float(words[2]) * angstrom
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this try-except block is based on the XYZ code, but the XYZ code is actually a poor example of how to use dictionaries in Python. Both should be fixed ideally, to avoid making this suggestion. In general in Python, if there is a way to get the same effect without try-except, then it is preferable for clarity (and speed). In this case, it could also be done like this:

atnum = sym2num.get(words[3].title())
if atnum is None:
    atnum = int(words[3])
atnums[i] = atnum

Note that the fallback option should also have index 3. I'm not sure if SDF ever uses integers instead of symbols at index 3, so the fallback option may be redundant anyway? If the fallback is not needed, it could become one line instead:

atnum = sym2num.get(words[3].title(), 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your are right, only elements are used. I changed it to your suggestion of only line.

@document_load_many("SDF", ['atcoords', 'atnums', 'title'])
def load_many(lit: LineIterator) -> Iterator[dict]:
"""Do not edit this docstring. It will be overwritten."""
# SDF files with more molecueles are a simple concatenation of individual SDF files,'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo molecueles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, corrected

with path('iodata.test.data', 'example.sdf') as fn_sdf:
mols = list(load_many(str(fn_sdf)))
assert len(mols) == 2
assert mols[1].title == '24978481'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add before this line the following?

check_example(mols[0])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, good idea. Thanks!

@evohringer
Copy link
Contributor Author

I included all changes and left one question about the change of "ValueError" to "StopIteration"

@tovrstra
Copy link
Member

I just realized we have already a utility to generated truncated files. See https://github.com/theochem/iodata/blob/master/iodata/test/test_cp2k.py#L208 Could you use this? Thanks.

@evohringer
Copy link
Contributor Author

This is a nice feature. Thanks to pointing it out. Its now implemented in the next commit.

@tovrstra
Copy link
Member

Thank you! Merging...

@tovrstra tovrstra merged commit 55a71fe into theochem:master May 22, 2019
@evohringer evohringer deleted the sdf_inout branch May 22, 2019 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants