Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify Datastructures instead of Methods #60

Open
s9105947 opened this issue Feb 23, 2022 · 2 comments
Open

Specify Datastructures instead of Methods #60

s9105947 opened this issue Feb 23, 2022 · 2 comments

Comments

@s9105947
Copy link

s9105947 commented Feb 23, 2022

The README of PICMI states that "The goal of the standard is to propose a set (or dictionary) of names and definitions [...]". However, the documentation at no point specifies sets or dictionaries, it always specifies __init__() methods. Which variables are set by __init__() is entirely unspecified (and has to be looked up in the code).

For the most cases this does not matter, e.g. the attribute name of a Species is later accessible as species.name.
For other cases this is not as clear: What properties does Simulation.add_species() modify? In which way?
I am vary of just reading the code, b/c there are no guarantees made by PICMI that this code will be stable.

In addition to that the behavior of the __init__() method is not clear: Most only copy their parameters to the attributes, but some perform more operations, e.g.:

grid = picmi.Cartesian3DGrid(number_of_cells=[192, 2048, 12],
                             lower_bound=[0, 0, 0],
                             upper_bound=[4e-5, 4e-5, 2e-6],
                             lower_boundary_conditions=3 * ["open"],
                             upper_boundary_conditions=3 * ["open"])
# now grid.nx == 192
grid.nx = 1237
# How many cells does the x axis now have?

Or put another way: PICMI currently specifies a user interfaces, but the object that is passed to the implementing PIC simulation is unspecified.

There are a couple of approaches to this, I want to propose these steps:

  1. Drop all methods from PICMI
  2. Add specifications for attributes of classes
  3. Auto-generate "typical" methods (__init__(), __eq__(), __hash__(), ...)
  4. Add methods for convenience, but they only modify the previously defined attributes, e.g. add_species()
  5. optional: Provide (protected) helper functions for implementations, e.g. derive mass&charge from particle type, replace None by default values where applicable etc.

I have attached the species definition as an example how I would imagine the specification to look like & and the test case that sketches its behavior.

This example uses a typing system. For that please refer to #58 .
Semantic checks are entirely omitted. Please see #59 for that.

Implementation:

import typing


# Idea: use decorator here
class Species:
    """
    describes a particle species

    For PICMI, a "particle species" is a set of particles, initialized from a
    common parameter set.

    Fundamental properties of the particle species are defined directly in a
    species object; more complex properties are delegated to other objects
    (which are stored as attributes).
    """

    name: str = None
    """
    name of this species

    Must be unique in the simulation.

    Can be used in the output to reference this specific species.
    """

    particle_type: typing.Optional[str] = None
    """
    Particle Name

    A string describing a single particle type as specified by the openPMD
    Species Type Extension.

    An implementation may derive additional properties from this attribute.

    If the particle_type is given, charge and mass may be omitted.

    Examples: He, electron
    """

    mass: typing.Optional[float] = None
    """
    mass of a single physical particle in kg

    MUST be None if particle_type is given.
    """

    charge: typing.Optional[float] = None
    """
    charge of a single physical particle in C

    MUST be None if particle_type is given.
    """

    charge_state: typing.Optional[int] = None
    """
    charge state of atoms

    The number of missing electrons of an (otherwise) neutral atom.

    MUST be None for particles that are not atoms.
    """

    # Note: use forward type declaration (string) here
    initial_distribution: "Distribution" = None
    """
    distribution at t=0

    Describes the density in space of this species.

    To not initialize any particles of this species explicitly provide density
    which is globally zero.
    """

    density_scale: float = 1.0
    """
    factor to be applied to initial_distribution

    Before applying the density according to initial_distribution, it is
    multiplied with the density_scale.

    This is intended to enable re-use a distribution objects.

    MUST be a float >0
    """

    particle_shape: typing.Optional[str] = None
    """
    shape of the particles of this species

    MUST be one of "NGP", "linear", "quadratic", "cubic"

    Can be left at None, in which case the particle shape of the owning
    simulation object will be used.
    """

Test case:

from .species import Species

import unittest
from .distribution import UniformDistribution
import picmi
import typing


class TestSpecies(unittest.TestCase):
    def setUp(self):
        print("This test case is not exhaustive!!")
        self.uniform_density = UniformDistribution(density=1)

    def test_constructor_basic(self):
        """constructor sets attributes"""
        species = Species(name="myname", particle_type="electron",
                          initial_distribution=self.uniform_density)
        self.assertEqual("myname", species.name)
        self.assertEqual("electron", species.particle_type)
        self.assertEqual(self.uniform_density, species.initial_distribution)

    def test_defaults(self):
        """attributes have defaults"""
        # note that this is not valid from a semantic point of view, and
        # automated checks of the constructor could make this test fail
        empty_implicit = Species()
        empty_explicit = Species(name=None, particle_type=None, mass=None,
                                 charge=None, charge_state=None,
                                 initial_distribution=None, density_scale=1.0,
                                 particle_shape=None)
        self.assertEqual(empty_explicit, empty_implicit)

    def test_constructor_codespecific(self):
        """code-specific params are okay, everything else is rejected"""
        picmi.codename = "dummypic"

        custom_allowed = Species(dummypic_favorite_icecream="Vanilla",
                                 dummypiclist=["my", "list", 1])
        self.assertEqual("Vanilla", custom_allowed.dummypic_favorite_icecream)
        self.assertEqual(["my", "list", 1], custom_allowed.dummypiclist)

        with self.assertRaisesRegex(NameError, ".*irregular_name.*"):
            # note: detected even if value is None
            Species(irregular_name=None)

    def test_typechecking(self):
        """auto typeconversion/rejection"""
        # (1) types are automagically converted if possible
        species = Species(name=1)
        species.charge_state = "17"
        self.assertEqual(17, species.charge_state)
        self.assertEqual("1", species.name)

        # (2) unconvertible types are rejected
        with self.assertRaises(TypeError):
            # note this could *in theory* be converted, but is rejected b/c 0.3
            # would be lost
            Species(charge_state=1.3)

        with self.assertRaises(TypeError):
            species.particle_shape = ["NGP"]

    def test_strict_attribute_checking(self):
        """unknown attributes are rejected"""
        # maybe this comment has to be replaced by some annotation
        class DummyPICSpecies(Species):
            dummypic_var: typing.Any
        picmi.codename = "dummypic"

        species = DummyPICSpecies()
        # known names ok:
        species.dummypic_var = 123
        species.name = "abc"

        # unkown names not allowed:
        with self.assertRaisesRegex(NameError, ".*dummypic_anythingelse.*"):
            species.dummypic_anythingelse = 123
        with self.assertRaisesRegex(NameError, ".*unkown_var.*"):
            species.unkown_var = None

    def test_hash_eq_generated(self):
        """methods __eq__() and __hash__() are auto-generated"""
        # non-exhaustive!!
        s1 = Species(name="abc")
        s2 = Species(name="abc")
        self.assertTrue(s1 is not s2)
        self.assertEqual(s1, s2)
        self.assertEqual(hash(s1), hash(s2))

        s1.name = "def"
        self.assertNotEqual(s1, s2)
        self.assertNotEqual(hash(s1), hash(s2))
        s2.name = "def"
        self.assertEqual(s1, s2)
        self.assertEqual(hash(s1), hash(s2))

        # automated inheritance is prohibited (b/c children could add unkown
        # attributes) -> children should use a decorator (or sth similar) to
        # generate their own hash/eq methods
        class DummyPICSpecies(Species):
            pass
        d1 = DummyPICSpecies()
        d2 = DummyPICSpecies()
        with self.assertRaises(NotImplementedError):
            hash(d1)
        with self.assertRaises(NotImplementedError):
            d1 == d2

Notably, this would be very close to how openPMD is specified, as purely a list of attributes.
(Which would then allow separating the reference implementation and the specification more clearly, as mentioned in #3)

@dpgrote
Copy link
Member

dpgrote commented Feb 24, 2022

From the description, it sounds like most of the issue is not the PICMI code itself, but uncertainty about what data is available in the class instances and how to access it. Instead of rewriting the PICMI classes, an alternate solution is to provide documentation for the implementers describing what attributes are available and what they are (including type information). This would be a set of guidelines to follow. For instance, regarding the add_species method, the guidelines would state the the list of species set up by the user can be found in the Simulation class in the attribute species, which is a Python list. It sounds like documentation like this would go a long way toward solving the issues you bring up.

I agree that there can be an issue with have some data stored in two different ways, e.g. the number of grid cells. I have long learned that there is only so much that can be done to limit the damage when users abuse code since it is impossible to predict what users will do. In this case though, perhaps a small clean up can be done so that only one of the two alternatives is saved internally, for example only storing number_of_cells but not the individual nx etc. The interface was written allowing both options to provide flexibility for users.

@s9105947
Copy link
Author

s9105947 commented Feb 24, 2022

It sounds like documentation like this would go a long way toward solving the issues you bring up.

Yes.
Currently there are two interface to PICMI:

  • "PICMI-in": the stuff that the user writes when using PICMI ("input parameters")
  • "PICMI-out": the data structure created from that consumed by an implementation

Both are close, but subtly different. For implementations this would acceptable, but:

One of the high-regarded workflows for Smilei is referencing the input parameters when processing the output.
In this case the user will use the same data structure as the implementing simulation, i.e. PICMI-out.
Now the user writes PICMI-in, works with PICMI-out, and maybe doesn't even notice the difference until one of the subtle differences bites their back.
Reducing PICMI to a set of arguments + auxiliary, convenient functions to modify these ensures that such differences are impossible to occur.

Note: This would also make serialization very simple, which would enable the process of reading PICMI during processing of output also when not working from a single python context.

Another short argument: The constructors are currently a lot of boilerplate code. (Admittedly, currently at an somewhat acceptable level.)

I believe that such a restructuring could be fully compatible to the current usage.
If you're not opposed to the general idea I can hack together a fork for demonstration during the next week.


I have long learned that there is only so much that can be done to limit the damage when users abuse code since it is impossible to predict what users will do.

  1. I'm young, naive, and feel like that we can push this "abusive zone" pretty far out of the way. (Ask me again in 10 years if I still think that.) Though I trust your experience here :)
  2. For this specific case I'm still trying to come up with a clean solution, which does not require boilerplating properties. If I do, I'll send a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants