Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨feat: dataclasses with converters #20

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nstarman
Copy link
Member

@nstarman nstarman commented Dec 8, 2024

@mhvk, as promised 😎.
IMO this should be spun off into a separate utility package, so it can be used, e.g. for the coordinates APE. CC @jeffjennings & @adrn.

@nstarman nstarman force-pushed the feat-dataclass-with-converter branch from 56fabe8 to ae42f0c Compare December 8, 2024 23:53
@nstarman nstarman marked this pull request as ready for review December 8, 2024 23:55
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice. But I'm not quite sure whether we want to necessarily lock ourselves in too much: e.g., for python scalars, we don't really want to do np.asarray, right? Though I guess we can relax that later, and then having the ability to wrap the scalar in a light class that provides .shape() etc would be good.

Similarly, in principle one would like to worry as little as possible about the unit class (but again, can relax that later).

Probably also premature, but does this get executed only on first initialization, or also on replace? I.e., is there a performance penalty for every operation?

@nstarman
Copy link
Member Author

nstarman commented Dec 9, 2024

But I'm not quite sure whether we want to necessarily lock ourselves in too much: e.g., for python scalars, we don't really want to do np.asarray, right? Though I guess we can relax that later, and then having the ability to wrap the scalar in a light class that provides .shape() etc would be good.

Agreed. We probably want to do PythonScalar(float) rather than ndarray.

Similarly, in principle one would like to worry as little as possible about the unit class (but again, can relax that later).

Yes, the converter for the units field allows us to delegate determining what's a viable input to the unit constructor, rather than Quantity.

Probably also premature, but does this get executed only on first initialization, or also on replace? I.e., is there a performance penalty for every operation?

Yes there is. We can provide a past-path around this. e.g.

    @functools.wraps(dcls.__init__)  # give it the same signature
    def __init__(self, *args: Any, _fastpath: bool = False, **kwargs: Any) -> None:
        if _fastpath:
            return self.__init__.__wrapped__(self, *args, **kwargs)
        ...

Then the only performance penalty is in calling the wrapper init. I think we can mitigate even that function call overhead using mypyc to turn this decorator into C code.

@nstarman nstarman force-pushed the feat-dataclass-with-converter branch from ae42f0c to 14f06f7 Compare December 9, 2024 18:03
@nstarman nstarman requested a review from mhvk December 9, 2024 18:04
@nstarman
Copy link
Member Author

nstarman commented Dec 9, 2024

Done. I enabled fast-pathing.

@nstarman nstarman force-pushed the feat-dataclass-with-converter branch from 14f06f7 to cfafaff Compare December 11, 2024 06:22
@nstarman
Copy link
Member Author

@mhvk how does this look now?

@nstarman nstarman requested review from mhvk and removed request for mhvk December 30, 2024 19:48
value: Any
unit: u.UnitBase
value: Array = field(converter=_value_converter)
unit: u.UnitBase = field(converter=u.Unit)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In future, to make this completely unit-type agnostic, we would set the type to be a runtime-checkable protocol and the converter would be something like converter=lambda x: x if isinstance(x, UnitAPI) else config.get("units_backend")(x), where the user can configure a preferred units package. Or, if we keep this more closely tied to Astropy, we would have converter=lambda x: x if isinstance(x, UnitAPI) else u.Unit(x)

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be the way to go but worry that it is a bit too soon. Partially because I still think this is a bit too fast in turning python floats etc into numpy arrays -- after all, they promote differently, and also they are not annoyingly eager in converting everything else to ndarray.

But also because it is not currently strict enough: e.g., what do we do with Quantity(q, unit)? (with q either our own quantity or astropy's, or even an astropy Column) Do we bail? Do we convert, like astropy? If so, one needs access to the unit, and defining __init__ would be more appropriate.

My overall sense would be to postpone deciding on this, and first try to get things working assuming input is sane (though of course that should not stop us trying to reason things through if it isn't!).

@nstarman
Copy link
Member Author

nstarman commented Jan 4, 2025

Partially because I still think this is a bit too fast in turning python floats etc into numpy arrays -- after all, they promote differently, and also they are not annoyingly eager in converting everything else to ndarray.

Agreed about the promotion. I do think that strictly enforcing that Quantity.value is Array API will make maintenance significantly easier. Only the constructor needs to consider non Array API inputs. Everything after that is just Array API . If we merge this now I can start working on a PythonArrayScalar class that implements the API and is a thin wrapper around a Python Number. Then it will be trivial to change the value converter from x if isinstance(x, ArrayAPI) else np.array(x) to

if isinstance(x, ArrayAPI):
    return x
elif isinstance(Number):
    return PythonArrayScalar(x)
else:
    return np.array(x)

But also because it is not currently strict enough: e.g., what do we do with Quantity(q, unit)? (with q either our own quantity or astropy's, or even an astropy Column) Do we bail? Do we convert, like astropy? If so, one needs access to the unit, and defining init would be more appropriate.

I think we do what numpy does. np.ndarray can be used to construct arrays, but it expects good input and doesn't do much user handholding. If you want a more general constructor, then use np.array(...).
Let's do similarly for Quantity and quantity():

  • Quantity accepts 2 arguments, something that can be directly converted to the value, e.g. a non-quantity ArrayAPI object or Python scalar and something that can be interpreted as the unit using a unit constructor.
  • quantity() can do a lot more interpretation. quantity(Quantity) -> Self, quantity("1m") -> Quantity(PythonArrayScalar(1), Unit("m")), quantity({"value": [1], "unit": "m"}) -> Quantity(np.array([1]), Unit("m"))

Therefore I would suggest using __post_init__ or the converter= to do basic checking, e.g.

@dataclass
class Quantity
    def __post_init__(self) -> None:
        assert not isinstance(self.value, Quantity)

or

if isinstance(x, Quantity):
    raise ValueError("use quantity(...)")
elif isinstance(x, ArrayAPI):
    return x
elif isinstance(Number):
    return PythonArrayScalar(x)
else:
    return np.array(x)

IMO the lesson to learn from numpy ndarray is that we want a generalized constructor for Quantity, not that Quantity.__init__ needs to be that constructor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants