Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dataclass by abstract class #176

Draft
wants to merge 37 commits into
base: main
Choose a base branch
from
Draft

Replace dataclass by abstract class #176

wants to merge 37 commits into from

Conversation

jan-janssen
Copy link
Member

@jan-janssen jan-janssen commented Jan 5, 2024

Rather than defining dataclasses with each field being a callable, this pull request uses abstract classes to define the interface. The aim of this pull request is to simplify the implementation of new parsers.

For codes like LAMMPS or ASE which already provide functions for the individual properties, writing an interface is as easy as:

class LammpsOutput(OutputStatic):
    def __init__(self, lmp):
        self._lmp = lmp

    def forces(self):
        return self._lmp.interactive_forces_getter()

    def energy(self):
        return self._lmp.interactive_energy_pot_getter()

    def stress(self):
        return self._lmp.interactive_pressures_getter()

    def volume(self):
        return self._lmp.interactive_volume_getter()

When the interface is based on writing input files and collecting output files, then a parser could be implemented as follows:

class LammpsOutput(OutputStatic):
    def __init__(self, working_directory, output):
        self._output_dict = collect_output(working_directory=working_directory, output=output)

    def forces(self):
        return self._output_dict["forces"]

    def energy(self):
        return self._output_dict["energy"]

    def stress(self):
        return self._output_dict["stress"]

    def volume(self):
        return self._output_dict["volume"]

Still if the collect_output() function already supports selective parsing by accepting a set of output keys via the output parameter than the use of the Abstract Class is unnecessary.

@jan-janssen jan-janssen added the format_black Launch the pyiron/actions black formatting routine label Jan 5, 2024
@jan-janssen jan-janssen requested a review from liamhuber January 5, 2024 22:34
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

# Conflicts:
#	atomistics/workflows/evcurve/workflow.py
#	atomistics/workflows/quasiharmonic.py
Copy link
Member

@liamhuber liamhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't be able to do a detailed line-by line until Monday (available your Tuesday), but here is a high level view: overall I think it moves in a great direction and is a good continuation of the simplifications started by introducing a dataclass. I like also delaying the instantiation of the workflow-specific output classes until the last possible minute (i.e. return on analyse) as it removes an extra layer of variables without hurting readability.

Points for change

  1. (Definitely) The most serious flaw I see on a quick skim-through is that the class naming need some more intentionality so that meaning becomes clear. In some places it feels very weird, e.g. ASEExecutor inherits from (and only from) classes called Output*.
  2. (Definitely) The base Output class should get a docstring to guide future developers on what its intent is, and what behaviour it promises/child classes should promise. Crafting this docstring might help clarify in your mind good changes for (1).
  3. (Maybe) I'm not sure, but I think it probably makes more sense to have the methods on children of Output be properties (@property;@abstractmethod;def thing(self)) instead of method calls. Downside: perhaps these calls are not always trivially cheap and () might help indicate that; Upside: these are all (rightly) nouns, and always giving back some piece of data, so a property "feels good".
  4. (Later OK) The child classes (the intermediate still-abstract ones especially) are quite verbose, and would only get more so if we tack on @property decorators too. It might be nice to use a factory pattern to tighten this up, e.g. abstract_output_factory(class_name: str, *data: str | tuple[str, type]) -> type[Output], with invocations like OutputThermalExpansion = abstract_output_factory("OutputThermalExpansion", ("temperatures", np.ndarray), ("volumes", np.ndarray). At its heart this point is really a pure re-factoring, and could be left for a follow-up PR; verbosity sucks but is not a merge-blocker.

@jan-janssen jan-janssen added format_black Launch the pyiron/actions black formatting routine and removed format_black Launch the pyiron/actions black formatting routine labels Jan 7, 2024
@jan-janssen jan-janssen removed the format_black Launch the pyiron/actions black formatting routine label Jan 7, 2024
@jan-janssen
Copy link
Member Author

@liamhuber I addressed the points (1)-(3) and would prefer to merge this now and release it as a new 0.2.0 version.

# Conflicts:
#	atomistics/calculators/ase.py
#	atomistics/calculators/lammps/calculator.py
#	atomistics/calculators/lammps/helpers.py
#	atomistics/calculators/qe.py
#	atomistics/workflows/elastic/workflow.py
#	atomistics/workflows/evcurve/debye.py
#	atomistics/workflows/evcurve/workflow.py
#	atomistics/workflows/phonons/workflow.py
#	atomistics/workflows/quasiharmonic.py
@jan-janssen jan-janssen marked this pull request as draft January 8, 2024 18:23
# Conflicts:
#	atomistics/calculators/ase.py
#	atomistics/calculators/lammps/calculator.py
#	atomistics/calculators/lammps/helpers.py
#	atomistics/calculators/lammps/output.py
#	atomistics/calculators/qe.py
#	atomistics/shared/output.py
#	atomistics/shared/thermal_expansion.py
#	atomistics/workflows/elastic/workflow.py
#	atomistics/workflows/evcurve/debye.py
#	atomistics/workflows/evcurve/workflow.py
#	atomistics/workflows/phonons/workflow.py
#	atomistics/workflows/quasiharmonic.py
@jan-janssen
Copy link
Member Author

After merging a subset of the functionality as individual pull requests, this pull requests now only contains the refactoring and conversion of the data class to an abstract class.

@pmrv
Copy link
Contributor

pmrv commented Jan 10, 2024

I've build a sample PoC for what we discussed on Monday

from abc import ABCMeta, abstractmethod

class AbstractInterface(ABCMeta):
    def __new__(cls, name, bases, namespace):
        for k in namespace.get('__annotations__', {}):
            namespace[k] = abstractmethod(lambda self: None)
        return super().__new__(cls, name, bases, namespace)

class MyInterface(metaclass=AbstractInterface):
    foo: int
    bar: float

class MyIncompleteImplementation(MyInterface):
    pass

MyIncompleteImplementation() # Error complaining not everything is implemented

class MyCompleteImplementation(MyInterface):
    @property
    def foo(self):
        return 4
    @property
    def bar(self):
        return 4.2

MyCompleteImplementation() # No Error

or in concrete terms

# atomistics/shared/output.py

import numpy.typing as npt
class OutputStatic(metaclass=AbstractInterface):
    forces: npt.NDArray[float]
    energy: float
    stress: npt.NDArray[float]
    volume: float

# atomistics/ase.py

class ASEOutput(OutputStatic):
    def __init__(self, ase_structure, ase_calculator):
        self.structure = ase_structure
        self.structure.calc = ase_calculator

    @property
    def forces(self):
        return self.structure.get_forces()

    @property
    def energy(self):
        return self.structure.get_potential_energy()

    @property
    def stress(self):
        return self.structure.get_stress(voigt=False)

    @property
    def volume(self):
        return self.structure.get_volume()

@pmrv
Copy link
Contributor

pmrv commented Jan 10, 2024

This doesn't check the types from the annotations now, and implementers have to use property explicitly, but this could be added somewhat easily, if we decide on this approach.

@jan-janssen
Copy link
Member Author

This doesn't check the types from the annotations now, and implementers have to use property explicitly, but this could be added somewhat easily, if we decide on this approach.

I like the approach, I am just wondering if we are reinventing the wheel here, maybe one of the existing data classes or alternative implementations already solves this issue. I did a small search and found quite a series of different implementations:

But I am not yet exactly sure when to use which one and if we can make a general decision for pyiron as a whole to use one or the other. At least when looking at the following nested example for data classes https://gist.github.com/ricky-lim/d4ab3ae470d331ead3875f22f6f0580f I feel that this gets ugly, but I am also not sure if the other data classes do it better.

@pmrv
Copy link
Contributor

pmrv commented Jan 21, 2024

In the last meeting we had discussed a distinction between classes that inform atomistics how to obtain standardized quantities from specific codes and classes that carry the standardized quantities and store them.

The snippet I posted is basically syntactic sugar over using ABC/abstractmethod, i.e. it ensures that sub classes correctly implement the functionality required of them while keeping the interface definition concise. So it is a tool for the first kind of classes.

The frameworks that you linked are for classes of the second kind, so imo it's very orthogonal to the snippet (or even the whole PR). I still think it's worthwhile to adapt them, but I don't have strong opinions on the framework. Just data classes would probably be the simplest, until we find hard constraints that make them nonviable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants