Replace dataclass by abstract class #176

jan-janssen · 2024-01-05T21:59:30Z

Rather than defining dataclasses with each field being a callable, this pull request uses abstract classes to define the interface. The aim of this pull request is to simplify the implementation of new parsers.

For codes like LAMMPS or ASE which already provide functions for the individual properties, writing an interface is as easy as:

class LammpsOutput(OutputStatic):
    def __init__(self, lmp):
        self._lmp = lmp

    def forces(self):
        return self._lmp.interactive_forces_getter()

    def energy(self):
        return self._lmp.interactive_energy_pot_getter()

    def stress(self):
        return self._lmp.interactive_pressures_getter()

    def volume(self):
        return self._lmp.interactive_volume_getter()

When the interface is based on writing input files and collecting output files, then a parser could be implemented as follows:

class LammpsOutput(OutputStatic):
    def __init__(self, working_directory, output):
        self._output_dict = collect_output(working_directory=working_directory, output=output)

    def forces(self):
        return self._output_dict["forces"]

    def energy(self):
        return self._output_dict["energy"]

    def stress(self):
        return self._output_dict["stress"]

    def volume(self):
        return self._output_dict["volume"]

Still if the collect_output() function already supports selective parsing by accepting a set of output keys via the output parameter than the use of the Abstract Class is unnecessary.

…ormat # Conflicts: # atomistics/calculators/ase.py # atomistics/calculators/qe.py # atomistics/shared/thermal_expansion.py # atomistics/workflows/evcurve/debye.py # atomistics/workflows/evcurve/workflow.py # atomistics/workflows/phonons/workflow.py # atomistics/workflows/quasiharmonic.py

review-notebook-app · 2024-01-06T07:05:50Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

# Conflicts: # atomistics/workflows/evcurve/workflow.py # atomistics/workflows/quasiharmonic.py

liamhuber

I won't be able to do a detailed line-by line until Monday (available your Tuesday), but here is a high level view: overall I think it moves in a great direction and is a good continuation of the simplifications started by introducing a dataclass. I like also delaying the instantiation of the workflow-specific output classes until the last possible minute (i.e. return on analyse) as it removes an extra layer of variables without hurting readability.

Points for change

(Definitely) The most serious flaw I see on a quick skim-through is that the class naming need some more intentionality so that meaning becomes clear. In some places it feels very weird, e.g. ASEExecutor inherits from (and only from) classes called Output*.
(Definitely) The base Output class should get a docstring to guide future developers on what its intent is, and what behaviour it promises/child classes should promise. Crafting this docstring might help clarify in your mind good changes for (1).
(Maybe) I'm not sure, but I think it probably makes more sense to have the methods on children of Output be properties (@property;@abstractmethod;def thing(self)) instead of method calls. Downside: perhaps these calls are not always trivially cheap and () might help indicate that; Upside: these are all (rightly) nouns, and always giving back some piece of data, so a property "feels good".
(Later OK) The child classes (the intermediate still-abstract ones especially) are quite verbose, and would only get more so if we tack on @property decorators too. It might be nice to use a factory pattern to tighten this up, e.g. abstract_output_factory(class_name: str, *data: str | tuple[str, type]) -> type[Output], with invocations like OutputThermalExpansion = abstract_output_factory("OutputThermalExpansion", ("temperatures", np.ndarray), ("volumes", np.ndarray). At its heart this point is really a pure re-factoring, and could be left for a follow-up PR; verbosity sucks but is not a merge-blocker.

jan-janssen · 2024-01-07T10:18:54Z

@liamhuber I addressed the points (1)-(3) and would prefer to merge this now and release it as a new 0.2.0 version.

# Conflicts: # atomistics/calculators/ase.py # atomistics/calculators/lammps/calculator.py # atomistics/calculators/lammps/helpers.py # atomistics/calculators/qe.py # atomistics/workflows/elastic/workflow.py # atomistics/workflows/evcurve/debye.py # atomistics/workflows/evcurve/workflow.py # atomistics/workflows/phonons/workflow.py # atomistics/workflows/quasiharmonic.py

# Conflicts: # atomistics/calculators/ase.py # atomistics/calculators/lammps/calculator.py # atomistics/calculators/lammps/helpers.py # atomistics/calculators/lammps/output.py # atomistics/calculators/qe.py # atomistics/shared/output.py # atomistics/shared/thermal_expansion.py # atomistics/workflows/elastic/workflow.py # atomistics/workflows/evcurve/debye.py # atomistics/workflows/evcurve/workflow.py # atomistics/workflows/phonons/workflow.py # atomistics/workflows/quasiharmonic.py

jan-janssen · 2024-01-08T21:56:12Z

After merging a subset of the functionality as individual pull requests, this pull requests now only contains the refactoring and conversion of the data class to an abstract class.

pmrv · 2024-01-10T11:16:33Z

I've build a sample PoC for what we discussed on Monday

from abc import ABCMeta, abstractmethod

class AbstractInterface(ABCMeta):
    def __new__(cls, name, bases, namespace):
        for k in namespace.get('__annotations__', {}):
            namespace[k] = abstractmethod(lambda self: None)
        return super().__new__(cls, name, bases, namespace)

class MyInterface(metaclass=AbstractInterface):
    foo: int
    bar: float

class MyIncompleteImplementation(MyInterface):
    pass

MyIncompleteImplementation() # Error complaining not everything is implemented

class MyCompleteImplementation(MyInterface):
    @property
    def foo(self):
        return 4
    @property
    def bar(self):
        return 4.2

MyCompleteImplementation() # No Error

or in concrete terms

# atomistics/shared/output.py

import numpy.typing as npt
class OutputStatic(metaclass=AbstractInterface):
    forces: npt.NDArray[float]
    energy: float
    stress: npt.NDArray[float]
    volume: float

# atomistics/ase.py

class ASEOutput(OutputStatic):
    def __init__(self, ase_structure, ase_calculator):
        self.structure = ase_structure
        self.structure.calc = ase_calculator

    @property
    def forces(self):
        return self.structure.get_forces()

    @property
    def energy(self):
        return self.structure.get_potential_energy()

    @property
    def stress(self):
        return self.structure.get_stress(voigt=False)

    @property
    def volume(self):
        return self.structure.get_volume()

pmrv · 2024-01-10T11:18:38Z

This doesn't check the types from the annotations now, and implementers have to use property explicitly, but this could be added somewhat easily, if we decide on this approach.

jan-janssen · 2024-01-10T12:15:38Z

This doesn't check the types from the annotations now, and implementers have to use property explicitly, but this could be added somewhat easily, if we decide on this approach.

I like the approach, I am just wondering if we are reinventing the wheel here, maybe one of the existing data classes or alternative implementations already solves this issue. I did a small search and found quite a series of different implementations:

But I am not yet exactly sure when to use which one and if we can make a general decision for pyiron as a whole to use one or the other. At least when looking at the following nested example for data classes https://gist.github.com/ricky-lim/d4ab3ae470d331ead3875f22f6f0580f I feel that this gets ugly, but I am also not sure if the other data classes do it better.

pmrv · 2024-01-21T14:32:20Z

In the last meeting we had discussed a distinction between classes that inform atomistics how to obtain standardized quantities from specific codes and classes that carry the standardized quantities and store them.

The snippet I posted is basically syntactic sugar over using ABC/abstractmethod, i.e. it ensures that sub classes correctly implement the functionality required of them while keeping the interface definition concise. So it is a tool for the first kind of classes.

The frameworks that you linked are for classes of the second kind, so imo it's very orthogonal to the snippet (or even the whole PR). I still think it's worthwhile to adapt them, but I don't have strong opinions on the framework. Just data classes would probably be the simplest, until we find hard constraints that make them nonviable.

jan-janssen and others added 17 commits January 5, 2024 15:06

Define the generic format in a separate file

f5aeeef

fix bug

c4f5197

make_output_dataclass()

8f84cbe

black formating

7a5bc1a

no more dataclass

51462e2

Initialize dataclass only shortly before usage

a3919d0

Fix LAMMPS bug

f2f6edb

define function calls directly

c70d2ed

refactor get call

cc938b4

Format black

919e34d

Merge remote-tracking branch 'origin/main' into generic_format

d0fefe3

Replace dataclass by abstract class

ffcd881

fix phonon jobs

33a183b

fit energy volume curve

de46f34

more fixes

aeae541

return tuple from get_keys()

6d2e837

jan-janssen added the format_black Launch the pyiron/actions black formatting routine label Jan 5, 2024

Format black

1acb557

jan-janssen requested a review from liamhuber January 5, 2024 22:34

jan-janssen added 7 commits January 5, 2024 23:36

fix thermal expasion

bbd6e99

Merge remote-tracking branch 'origin/abstract_class' into abstract_class

eea1e4c

fix coverage

9cd9cb9

fix function list

ab97e6c

be more explicit

ddcf101

black formatting

a6b774f

rename output parameter to output_keys

0665c4d

jan-janssen and others added 2 commits January 6, 2024 08:08

Update output.py

fc12349

fix test

f100af0

jan-janssen added 4 commits January 6, 2024 08:22

fix signature test

cb2c5c6

use type hints in abstract class

c23292a

Merge remote-tracking branch 'origin/main' into abstract_class

7650963

# Conflicts: # atomistics/workflows/evcurve/workflow.py # atomistics/workflows/quasiharmonic.py

updates

2b2681c

liamhuber reviewed Jan 7, 2024

View reviewed changes

introduce output properties and rename output classes

c183384

jan-janssen added format_black Launch the pyiron/actions black formatting routine and removed format_black Launch the pyiron/actions black formatting routine labels Jan 7, 2024

pyiron-runner and others added 2 commits January 7, 2024 09:46

Format black

d390579

fix test

d0f55ed

jan-janssen removed the format_black Launch the pyiron/actions black formatting routine label Jan 7, 2024

Add Docstrings to output class

803c889

liamhuber mentioned this pull request Jan 8, 2024

Improve abstraction for data collections (e.g. Output) #181

Open

jan-janssen mentioned this pull request Jan 8, 2024

Rename output parameter to output_keys #180

Merged

jan-janssen marked this pull request as draft January 8, 2024 18:23

This was referenced Jan 8, 2024

Rename fields() to keys() #182

Merged

Introduce get_thermal_expansion_output() function #183

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace dataclass by abstract class #176

Replace dataclass by abstract class #176

jan-janssen commented Jan 5, 2024 •

edited

Loading

review-notebook-app bot commented Jan 6, 2024

liamhuber left a comment

jan-janssen commented Jan 7, 2024

jan-janssen commented Jan 8, 2024

pmrv commented Jan 10, 2024 •

edited

Loading

pmrv commented Jan 10, 2024

jan-janssen commented Jan 10, 2024

pmrv commented Jan 21, 2024

Replace dataclass by abstract class #176

Are you sure you want to change the base?

Replace dataclass by abstract class #176

Conversation

jan-janssen commented Jan 5, 2024 • edited Loading

review-notebook-app bot commented Jan 6, 2024

liamhuber left a comment

Choose a reason for hiding this comment

jan-janssen commented Jan 7, 2024

jan-janssen commented Jan 8, 2024

pmrv commented Jan 10, 2024 • edited Loading

pmrv commented Jan 10, 2024

jan-janssen commented Jan 10, 2024

pmrv commented Jan 21, 2024

jan-janssen commented Jan 5, 2024 •

edited

Loading

pmrv commented Jan 10, 2024 •

edited

Loading