-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace dataclass by abstract class #176
base: main
Are you sure you want to change the base?
Conversation
…ormat # Conflicts: # atomistics/calculators/ase.py # atomistics/calculators/qe.py # atomistics/shared/thermal_expansion.py # atomistics/workflows/evcurve/debye.py # atomistics/workflows/evcurve/workflow.py # atomistics/workflows/phonons/workflow.py # atomistics/workflows/quasiharmonic.py
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
# Conflicts: # atomistics/workflows/evcurve/workflow.py # atomistics/workflows/quasiharmonic.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't be able to do a detailed line-by line until Monday (available your Tuesday), but here is a high level view: overall I think it moves in a great direction and is a good continuation of the simplifications started by introducing a dataclass. I like also delaying the instantiation of the workflow-specific output classes until the last possible minute (i.e. return on analyse) as it removes an extra layer of variables without hurting readability.
Points for change
- (Definitely) The most serious flaw I see on a quick skim-through is that the class naming need some more intentionality so that meaning becomes clear. In some places it feels very weird, e.g.
ASEExecutor
inherits from (and only from) classes calledOutput*
. - (Definitely) The base
Output
class should get a docstring to guide future developers on what its intent is, and what behaviour it promises/child classes should promise. Crafting this docstring might help clarify in your mind good changes for (1). - (Maybe) I'm not sure, but I think it probably makes more sense to have the methods on children of
Output
be properties (@property;@abstractmethod;def thing(self)
) instead of method calls. Downside: perhaps these calls are not always trivially cheap and()
might help indicate that; Upside: these are all (rightly) nouns, and always giving back some piece of data, so a property "feels good". - (Later OK) The child classes (the intermediate still-abstract ones especially) are quite verbose, and would only get more so if we tack on
@property
decorators too. It might be nice to use a factory pattern to tighten this up, e.g.abstract_output_factory(class_name: str, *data: str | tuple[str, type]) -> type[Output]
, with invocations likeOutputThermalExpansion = abstract_output_factory("OutputThermalExpansion", ("temperatures", np.ndarray), ("volumes", np.ndarray)
. At its heart this point is really a pure re-factoring, and could be left for a follow-up PR; verbosity sucks but is not a merge-blocker.
@liamhuber I addressed the points (1)-(3) and would prefer to merge this now and release it as a new |
# Conflicts: # atomistics/calculators/ase.py # atomistics/calculators/lammps/calculator.py # atomistics/calculators/lammps/helpers.py # atomistics/calculators/qe.py # atomistics/workflows/elastic/workflow.py # atomistics/workflows/evcurve/debye.py # atomistics/workflows/evcurve/workflow.py # atomistics/workflows/phonons/workflow.py # atomistics/workflows/quasiharmonic.py
# Conflicts: # atomistics/calculators/ase.py # atomistics/calculators/lammps/calculator.py # atomistics/calculators/lammps/helpers.py # atomistics/calculators/lammps/output.py # atomistics/calculators/qe.py # atomistics/shared/output.py # atomistics/shared/thermal_expansion.py # atomistics/workflows/elastic/workflow.py # atomistics/workflows/evcurve/debye.py # atomistics/workflows/evcurve/workflow.py # atomistics/workflows/phonons/workflow.py # atomistics/workflows/quasiharmonic.py
After merging a subset of the functionality as individual pull requests, this pull requests now only contains the refactoring and conversion of the data class to an abstract class. |
I've build a sample PoC for what we discussed on Monday from abc import ABCMeta, abstractmethod
class AbstractInterface(ABCMeta):
def __new__(cls, name, bases, namespace):
for k in namespace.get('__annotations__', {}):
namespace[k] = abstractmethod(lambda self: None)
return super().__new__(cls, name, bases, namespace)
class MyInterface(metaclass=AbstractInterface):
foo: int
bar: float
class MyIncompleteImplementation(MyInterface):
pass
MyIncompleteImplementation() # Error complaining not everything is implemented
class MyCompleteImplementation(MyInterface):
@property
def foo(self):
return 4
@property
def bar(self):
return 4.2
MyCompleteImplementation() # No Error or in concrete terms # atomistics/shared/output.py
import numpy.typing as npt
class OutputStatic(metaclass=AbstractInterface):
forces: npt.NDArray[float]
energy: float
stress: npt.NDArray[float]
volume: float
# atomistics/ase.py
class ASEOutput(OutputStatic):
def __init__(self, ase_structure, ase_calculator):
self.structure = ase_structure
self.structure.calc = ase_calculator
@property
def forces(self):
return self.structure.get_forces()
@property
def energy(self):
return self.structure.get_potential_energy()
@property
def stress(self):
return self.structure.get_stress(voigt=False)
@property
def volume(self):
return self.structure.get_volume() |
This doesn't check the types from the annotations now, and implementers have to use |
I like the approach, I am just wondering if we are reinventing the wheel here, maybe one of the existing data classes or alternative implementations already solves this issue. I did a small search and found quite a series of different implementations:
But I am not yet exactly sure when to use which one and if we can make a general decision for pyiron as a whole to use one or the other. At least when looking at the following nested example for data classes https://gist.github.com/ricky-lim/d4ab3ae470d331ead3875f22f6f0580f I feel that this gets ugly, but I am also not sure if the other data classes do it better. |
In the last meeting we had discussed a distinction between classes that inform The snippet I posted is basically syntactic sugar over using The frameworks that you linked are for classes of the second kind, so imo it's very orthogonal to the snippet (or even the whole PR). I still think it's worthwhile to adapt them, but I don't have strong opinions on the framework. Just data classes would probably be the simplest, until we find hard constraints that make them nonviable. |
Rather than defining dataclasses with each field being a callable, this pull request uses abstract classes to define the interface. The aim of this pull request is to simplify the implementation of new parsers.
For codes like LAMMPS or ASE which already provide functions for the individual properties, writing an interface is as easy as:
When the interface is based on writing input files and collecting output files, then a parser could be implemented as follows:
Still if the
collect_output()
function already supports selective parsing by accepting a set of output keys via theoutput
parameter than the use of the Abstract Class is unnecessary.