
Quantity API #23

Open
andrewgsavage opened this issue Jan 4, 2025 · 35 comments

@andrewgsavage

There has been some interest in a Quantity or Units API, similar to the array API standard.

hgrecco/pint#2101
astropy/astropy#8210
astropy/astropy#13460
astropy/astropy-APEs#91
https://pydims.github.io/pydims/developer/index.html

A Quantity API that standardises methods and attributes would make it easier to write code that supports multiple unit libraries. At present this is difficult and has led to multiple integration packages being written to support units, e.g. pint-xarray and xarray-quantity.

Although the implementations of the unit libraries differ, they share the same core concepts of a Quantity and a Unit. I echo the suggestion to create a Protocol, similar to https://github.com/nstarman/units/tree/main/src/units/api. However, I think the first version of a Quantity API should have a smaller scope to make it easier to agree on and adopt.

At the time of writing, support for the array API is in development. An initial implementation suggests that standardising the following methods would allow an implementation to be used across multiple libraries.

Quantity.__init__(value, units)
Quantity.units / .unit
Quantity.magnitude / .value
Quantity.m_as(units) / .to_value
Unit.__pow__
Unit.__add__
Unit.__sub__
Unit.__mul__
Unit.__truediv__

Unit.__sub__ is used to get a delta unit, e.g. for temperature. Quantity.__sub__ could be used instead.
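As a rough sketch of how small this surface is, the Quantity part of the list could be written as a `typing.Protocol` that library-agnostic code checks against structurally. The names follow pint's spelling (`units`, `magnitude`, `m_as`) purely as an assumption, since the final names are still to be decided; `QuantityLike`, `magnitude_in`, `ToyUnit` and `ToyQuantity` are hypothetical and not part of any library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class QuantityLike(Protocol):
    """Hypothetical protocol for the minimal Quantity surface listed above.

    Names follow pint (units, magnitude, m_as); astropy spells the same ideas
    unit, value and to_value. The final names are not decided.
    """

    units: Any
    magnitude: Any

    def m_as(self, units: Any) -> Any: ...


def magnitude_in(q: QuantityLike, units: Any) -> Any:
    """Library-agnostic helper: relies only on the protocol, not on a library."""
    return q.m_as(units)


# Toy implementation for illustration only: a unit is a name plus its size
# relative to a common base unit (no dimensional analysis).
@dataclass(frozen=True)
class ToyUnit:
    name: str
    scale: float


@dataclass
class ToyQuantity:
    magnitude: float
    units: ToyUnit

    def m_as(self, units: ToyUnit) -> float:
        return self.magnitude * self.units.scale / units.scale


m = ToyUnit("m", 1.0)
km = ToyUnit("km", 1000.0)
q = ToyQuantity(1500.0, m)
print(isinstance(q, QuantityLike))  # True: structural check, no inheritance needed
print(magnitude_in(q, km))  # 1.5
```

Any library whose quantities expose these three members would work with `magnitude_in` unchanged, which is the point of agreeing on names rather than on a shared base class.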

That's not many functions at all!
pint-pandas looks like it'd also need Unit(unit_string) and I've used Quantity.to(Unit).magnitude instead of .m_as. It also uses pint's errors and formatting but that's quite complex.

Equally, I think this could be easy for the unit libraries to implement; for example, pint would add Quantity.value that returns Quantity.magnitude, astropy would add Quantity.units that returns Quantity.unit, and so on (attribute names tbd!). There's no need to deprecate the current attributes. It would be good to change documentation to use the standardised methods to encourage their use.

I'm not against standardising other classes or methods like Systems or <<, I'd just rather do so in a future version.

@lucascolley

Slightly off-topic, but I was wondering about a potential name for a standardised project. How about pyquantity (perhaps stylised as PyQuantity)? It seems like the standardised quantity/unit API and support for functions in the array API standard could belong outside of the astropy org.

@andrewgsavage
Author

andrewgsavage commented Jan 7, 2025 via email

@lucascolley

Or QuantiPy!

Haha I did check that, but there is https://pypi.org/project/QuantiPy/

Perhaps it would be a fit for data-apis.org ?

I would be open to that, although the "standard" set by the work on the array API standard is quite high, and may be a little overkill for the task at hand. It might make more sense to get some rough prototypes together first and see how collaboration goes. Then if it looks like there are difficult decisions to make, we can look to following the data-driven approach and https://github.com/data-apis/governance/blob/main/process_document.md.

@SimonHeybrock

SimonHeybrock commented Jan 7, 2025

I feel that a Units API should be favored over a Quantity API. Or at the very least, both should exist. The motivation for this is the (downsides of) nesting. People also want named dimensions (Xarray, NamedArray, Scipp, PyDims, ...), masks, and more (vectors, bin-edge arrays, ...). If we define, say,

Quantity.__init__(value, units)

then each of the extra components such as masks or dimension names may need to be constructed and accessed in a nested manner. We'd get something like

arr = MaskedArray(NamedArray(('x',), mask), NamedArray(('x',), Quantity(np.arange(5), 'meter')))
# with value accessed as
arr.value.value.value

which is obviously a problem. One could try to add special accessors at different levels, but I think this will quickly get out of hand. I have experimented with something like this a while ago (see https://github.com/scipp/scippx and https://discuss.scientific-python.org/t/multiple-duck-array-layers-composability-implementations-and-usability-concerns/552) and my conclusion was that it probably is not a workable solution.

Thus, the approach I favor is what I have outlined in PyDims: Have API standards on the lowest level for the various components (mainly array and unit, but there may be more such as coords), then integrate all components in a single step. Now, integration in a single step does not imply that this would be a single library doing the integration. People will always have different and partially conflicting requirements so I would assume we would have multiple such libraries coexist, such as a plain Quantity library in addition to others that also add dimension names or masks.

At least this is how my thought process went. It does leave important questions open, though. For example, one would like to use SciPy and friends with units, but it is not clear to me how to achieve this.

@nstarman
Member

nstarman commented Jan 7, 2025

I agree about the need for both a Quantity and Units API and also that they should probably be separate, but developed in dialogue with one another. In the various versions I've coded up (https://github.com/nstarman/units, https://github.com/nstarman/astropy-APEs/blob/units-quantity-2.0/APE25/report.pdf) I proposed a (pseudocode with illustrative names and attributes)

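from __future__ import annotations

from typing import Any, Protocol, TypeVar

Value = TypeVar("Value")

class QuantityAPI(Protocol[Value]):
    value: Value
    unit: Any
    def to_unit(self, unit: Any) -> QuantityAPI[Value]: ...

class ArrayAPI(Protocol):
    def __array_namespace__(self, *, api_version: str | None = None) -> Any: ...

class QuantityArrayAPI(ArrayAPI, QuantityAPI[ArrayAPI], Protocol): ...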

So long as we all agree on a common Quantity API, and likewise a common Units API, then it's totally fine to have a proliferation of implementing libraries with different focuses, e.g. non-array values like python scalars or lists, or specific Q subclasses like longitudes, or masked values, or named fields. If there's eventually a Mask API then a masked Q class would be the intersection type, like QuantityArrayAPI is for the QuantityAPI and ArrayAPI.
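To make the intersection idea concrete, here is a minimal sketch with `typing.Protocol`. `MaskAPI` stands in for the hypothetical future Mask API, and the toy classes and unit table are assumptions for illustration only; the point is that an implementation satisfies the combined protocol structurally, without inheriting from any of the protocols.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable

# Toy unit table (assumption): unit name -> size in metres.
_SCALE = {"m": 1.0, "km": 1000.0}


@runtime_checkable
class QuantityAPI(Protocol):
    value: Any
    unit: Any

    def to_unit(self, unit: Any) -> QuantityAPI: ...


@runtime_checkable
class MaskAPI(Protocol):  # hypothetical future Mask API
    mask: Any


@runtime_checkable
class MaskedQuantityAPI(QuantityAPI, MaskAPI, Protocol):
    """Intersection type: satisfied by anything that meets both protocols."""


class ToyQuantity:
    def __init__(self, value: float, unit: str) -> None:
        self.value, self.unit = value, unit

    def to_unit(self, unit: str) -> ToyQuantity:
        return ToyQuantity(self.value * _SCALE[self.unit] / _SCALE[unit], unit)


class ToyMaskedQuantity:
    def __init__(self, value: float, unit: str, mask: bool) -> None:
        self.value, self.unit, self.mask = value, unit, mask

    def to_unit(self, unit: str) -> ToyMaskedQuantity:
        # Convert the value but carry the mask through unchanged.
        new_value = self.value * _SCALE[self.unit] / _SCALE[unit]
        return ToyMaskedQuantity(new_value, unit, self.mask)


plain = ToyQuantity(1500.0, "m")
masked = ToyMaskedQuantity(1500.0, "m", mask=True)
print(isinstance(plain, MaskedQuantityAPI), isinstance(masked, MaskedQuantityAPI))
# False True
```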

@lucascolley

lucascolley commented Jan 7, 2025

@SimonHeybrock at least while we are in array API compatible territory, I think there is a solution, but I haven't tried it yet. Idea:

  • in all creation functions (asarray, arange etc.) allow wrapping via a kwarg (e.g. units=..., mask=...) and accept arbitrary **kwargs
  • pass **kwargs to the level below

I think this pattern should work as long as you have array API compatible layers throughout, except maybe for the top layer, which only needs to consume standard arrays and provide the asarray interface. So I think something like this should work:

from pint_array import pint_namespace, UnitRegistry
from marray import marray_namespace
# XXX: imagine that `dask_namespace` exists
from dask.array import dask_namespace
import cupy as cp

ureg = UnitRegistry()

dacp = dask_namespace(cp)
pmdacp = pint_namespace(marray_namespace(dacp))
x = pmdacp.arange(4, chunks=2, mask=dacp.asarray([False, True, False, True]), units=ureg.pint)
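The kwargs-forwarding pattern can be tried out without dask, CuPy or pint. In this sketch all names are hypothetical and plain lists stand in for arrays (so there is no `chunks=` layer): each layer's `arange` consumes only its own keyword and passes `**kwargs` to the namespace below, and the bottom layer rejects anything left over. The printed result also shows the nested attribute access raised earlier in the thread.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


class ListNamespace:
    """Bottom layer: plain Python lists stand in for a real array library."""

    @staticmethod
    def arange(n: int, **kwargs: Any) -> list[int]:
        if kwargs:  # every keyword should have been consumed by a layer above
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
        return list(range(n))


@dataclass
class Masked:
    data: Any
    mask: Any


@dataclass
class WithUnits:
    data: Any
    units: str


def masked_namespace(inner: Any) -> Any:
    class _Namespace:
        @staticmethod
        def arange(n: int, *, mask: Any = None, **kwargs: Any) -> Masked:
            # Consume this layer's keyword; forward the rest to the layer below.
            return Masked(inner.arange(n, **kwargs), mask)

    return _Namespace


def units_namespace(inner: Any) -> Any:
    class _Namespace:
        @staticmethod
        def arange(n: int, *, units: str = "dimensionless", **kwargs: Any) -> WithUnits:
            return WithUnits(inner.arange(n, **kwargs), units)

    return _Namespace


xp = units_namespace(masked_namespace(ListNamespace))
x = xp.arange(4, mask=[False, True, False, True], units="m")
print(x)
# WithUnits(data=Masked(data=[0, 1, 2, 3], mask=[False, True, False, True]), units='m')
```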

@SimonHeybrock

@lucascolley That solves the manual initialization, but the problem goes deeper, I think. Accessing nested "value" properties, unclear or incompatible order of nesting, and creation of derived objects in binary operations.

@lucascolley

Accessing nested "value" properties

That does seem like a problem. Perhaps solvable if each array carries a .info (or something unique) attribute which can propagate these properties to the top level?
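A minimal sketch of that idea, assuming a hypothetical `Layer` wrapper (not from any library) whose `info` property merges the metadata of every layer beneath it, so that units, dims or masks are readable from the top without chains like `arr.value.value.value`. A real design would also have to decide what happens when two layers use the same key; here the outer layer simply wins.

```python
from __future__ import annotations

from typing import Any


class Layer:
    """Hypothetical wrapper: `data` is the next layer down, `own_info` its metadata."""

    def __init__(self, data: Any, **own_info: Any) -> None:
        self.data = data
        self._own_info = own_info

    @property
    def info(self) -> dict[str, Any]:
        # Take the inner layer's info (empty for a plain array), then add ours,
        # so metadata from every level is visible at the top.
        inner = getattr(self.data, "info", {})
        return {**inner, **self._own_info}


x = Layer(Layer(Layer([0.0, 1.0, 2.0], units="m"), dims=("x",)), mask=[False, True, False])
print(x.info)  # {'units': 'm', 'dims': ('x',), 'mask': [False, True, False]}
```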

unclear order of nesting

That shouldn't be a problem, at least with standard-compatible wraps. x.__array_namespace__().__name__ should describe the precise ordering of nesting.

incompatible order of nesting

Perhaps the onus could be on users to not mix incompatible namespaces? The "users" here are still library authors rather than end users.

creation of derived objects in binary operations

could you give an example?

@SimonHeybrock

creation of derived objects in binary operations

could you give an example?

As a very simple case, something we do quite often is compare two quantity-arrays to obtain a mask. That is,

mask = arr1 < arr2

should return an object that does not have a unit (or unit=None), but that has everything else (dims, coords, ...). That is, one might need to be able to remove a particular "layer" from the "stack" of array libraries.

Footnote: I don't think using unit='dimensionless' is correct for masks.
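A toy illustration of that requirement, using a hypothetical `Variable` class (not Scipp's) that carries dims, values and an optional unit: the comparison keeps the dims but returns `unit=None` rather than `'dimensionless'`, i.e. the unit "layer" is dropped from the result while everything else survives.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Variable:
    """Hypothetical labelled array: dims, values and an optional unit."""

    dims: tuple[str, ...]
    values: list[Any]
    unit: str | None = None

    def __lt__(self, other: Variable) -> Variable:
        if self.dims != other.dims:
            raise ValueError("dimension mismatch")
        if self.unit != other.unit:
            raise ValueError("unit mismatch")  # a real library might convert instead
        # Keep the dims, but the boolean result carries no unit at all.
        flags = [a < b for a, b in zip(self.values, other.values)]
        return Variable(self.dims, flags, unit=None)


arr1 = Variable(("x",), [1.0, 5.0, 3.0], "m")
arr2 = Variable(("x",), [2.0, 4.0, 3.0], "m")
mask = arr1 < arr2
print(mask)  # Variable(dims=('x',), values=[True, False, False], unit=None)
```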

@lucascolley

Footnote: I don't think using unit='dimensionless' is correct for masks.

I can see why it feels wrong, but practically, are there any problems?

@SimonHeybrock

SimonHeybrock commented Jan 7, 2025

Footnote: I don't think using unit='dimensionless' is correct for masks.

I can see why it feels wrong, but practically, are there any problems?

We (Scipp) have done that in the past but eventually decided to distinguish dimensionless from unit-less. This was not just for masks but also for (memory) index arrays, etc. Unfortunately, we did not document the full reasoning for the decision very well at the time. I think it was mainly about being able to detect user errors better. It was also weird to have units for, say, an array of strings or other Python objects, so we wanted unit-less anyway.

Edit: One example of the problems we have had, albeit a bit cryptic, reading it now: scipp/scipp#2396

@mhvk
Contributor

mhvk commented Jan 7, 2025

Astropy and Quantity2 also remove the unit when booleans are involved.

Indeed, I've argued for astropy (and may do so here) that Quantity should only take real or complex for the value - integer quantities make essentially no sense.

@neutrinoceros
Contributor

neutrinoceros commented Jan 7, 2025

integer quantities make essentially no sense.

Preach. It comes up regularly in unyt's bug tracker that allowing quantity arrays to have an int dtype breaks end applications in surprising ways, and I don't really have an argument for why this is supported in the first place.

@nstarman
Member

nstarman commented Jan 7, 2025

The Array API doesn't police how dtypes are defined / used. It has a minimum set of dtypes (https://data-apis.org/array-api/latest/API_specification/data_types.html) but many libraries define more dtypes, e.g. for ML. I think it's literally impossible / highly impractical / way too much work for us to enumerate for all python array libraries what subset of dtypes are allowed. Especially for the Quantity API I don't think this makes much sense. Individual implementing libraries can choose to take on this Sisyphean task, but for the Quantity API it's probably best to say that values are e.g. the Array API and leave it at that.

@mhvk
Contributor

mhvk commented Jan 7, 2025

Agree that integers/bools at some level is an implementation detail.

Back a bit to the main topic: I agree with @SimonHeybrock that the units API is perhaps the most important to lock in - if designed right, the Quantity API can be nearly trivial, since the units have to take care of everything complicated anyway (e.g., the unit is the right place to know what needs to happen for a call to xp.sin). Indeed, in #4 we were discussing that even unit conversion should arguably not involve a method on the Quantity, but rather something provided by the unit.

So, it seems units have to provide ways to

  1. Parse a string unit (presumably, __init__ in most cases)
  2. Convert to string (presumably, __str__)
  3. Convert to a standard system (.si? does every units package have that? Introduce __si__?)
  4. Produce a function that converts a value from one unit to another (unit.to(...) in astropy, which handles equivalencies too)
  5. Given an (Array API/numpy) function, produce a set of converters that should be applied to inputs before calling the function on values, and a resulting unit.
  6. Anything else?
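A toy sketch of items 1, 2 and 4, assuming a tiny hypothetical unit table with a single dimension label per unit (no compound units, no equivalencies). The design point is that `to` returns a converter function, which array code could then apply to raw values without knowing anything about units.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical unit table: symbol -> (size in SI units, dimension label).
_TABLE = {
    "m": (1.0, "length"),
    "km": (1000.0, "length"),
    "s": (1.0, "time"),
    "min": (60.0, "time"),
}


@dataclass(frozen=True)
class ToyUnit:
    symbol: str
    scale: float
    dimension: str

    @classmethod
    def parse(cls, text: str) -> ToyUnit:  # item 1: string -> unit
        symbol = text.strip()
        scale, dimension = _TABLE[symbol]  # KeyError for unknown units
        return cls(symbol, scale, dimension)

    def __str__(self) -> str:  # item 2: unit -> string
        return self.symbol

    def to(self, other: ToyUnit) -> Callable[[float], float]:  # item 4
        """Return a function converting values in `self` to values in `other`."""
        if self.dimension != other.dimension:
            raise ValueError(f"cannot convert {self} to {other}")
        factor = self.scale / other.scale
        return lambda value: value * factor


to_seconds = ToyUnit.parse("min").to(ToyUnit.parse("s"))
print(to_seconds(2.5))  # 150.0
```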

In astropy, (4) and (5) are dealt with somewhat ad hoc, in an interaction between Quantity.__array_{ufunc|function}__ and the units. Especially (5) doesn't necessarily make sense as a unit method, but there needs to be some way to do it. Maybe, in analogy with the array API, we can have a unit namespace that provides specific functions with predetermined names we all agree on? (In #4, @byrdie suggested a convert function, but hoped it would allow for dispatch.) That could even cover items 1-3.
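A sketch of item 5 as a free function in a hypothetical unit namespace: given a function name and the input units, it returns the converters to apply to the raw values and the unit of the result, after which the plain (unit-unaware) function runs on the converted values. The unit table, `converter` and `converters_and_result` are assumptions, and only `add`, `subtract` and `sin` are covered.

```python
from __future__ import annotations

import math
from typing import Callable

# Hypothetical unit table: symbol -> (size in SI units, dimension label).
UNITS = {
    "m": (1.0, "length"),
    "cm": (0.01, "length"),
    "rad": (1.0, "angle"),
    "deg": (math.pi / 180, "angle"),
}


def converter(frm: str, to: str) -> Callable[[float], float]:
    """Item 4: a function converting values in unit `frm` to unit `to`."""
    (s_from, d_from), (s_to, d_to) = UNITS[frm], UNITS[to]
    if d_from != d_to:
        raise ValueError(f"{frm} and {to} are not compatible")
    return lambda v: v * s_from / s_to


def converters_and_result(
    func_name: str, *units: str
) -> tuple[list[Callable[[float], float]], str]:
    """Item 5: converters to apply to the input values, plus the result unit."""
    if func_name in ("add", "subtract"):
        target = units[0]  # convert every input to the first input's unit
        return [converter(u, target) for u in units], target
    if func_name == "sin":
        return [converter(units[0], "rad")], "dimensionless"
    raise NotImplementedError(func_name)


# add(2 m, 50 cm): convert the raw values, then call the plain function.
convs, unit = converters_and_result("add", "m", "cm")
total = sum(c(v) for c, v in zip(convs, [2.0, 50.0]))
print(total, unit)  # 2.5 m
```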

p.s. One way of thinking of units is that they are part of the .dtype - for numpy ufuncs, indeed it is now possible to do it completely like that, with array methods allowing conversion before actual execution (just like in an addition of float32 and float64, the former would be upconverted). I think this actually does not end up scaling well to other array packages, but think it is useful to think of converters in that sense.

p.s.2 Back to Quantity, I'm a bit partial to trying to make a units API that defines dunder methods that are useful to Quantity (e.g., unit conversion with q << new_unit via unit.__rlshift__) rather than trying to find a consistent naming scheme, or, worse to me, every Quantity package having duplicated methods that do the same thing.

@lucascolley

I wonder what the implications would be for getting things to work downstream without support for integral/boolean dtypes. E.g. I wonder whether there is any code in SciPy which relies on the specified behaviour that asarray on a list of integers returns an array with an integral dtype: https://data-apis.org/array-api/latest/API_specification/generated/array_api.asarray.html

@nstarman
Member

nstarman commented Jan 7, 2025

Given an (Array API/numpy) function, produce a set of converters that should be applied to inputs before calling the function on values, and a resulting unit.

Related to this, I've long wanted something akin to result_type but that works on units, not dtypes.
Two patterns come to mind.

def result_unit(func_name: Literal["<func_name>"], *units_of_args: Unit) -> Unit: ...

where the literal might be a string name, enum, or something. So usage would be

result_unit("multiply", unit1, unit2)  # -> unit1 * unit2

or mirroring the Array API namespace, but for units

uns = unit.__unit_namespace__()
uns.multiply(unit1, unit2)  # -> unit1 * unit2
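Both patterns can coexist. In this sketch (all names hypothetical, including `__unit_namespace__`, which mirrors `__array_namespace__` but is not an existing protocol), a toy unit records base-unit exponents, the namespace exposes named functions, and the string-keyed `result_unit` simply dispatches through that namespace.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass(frozen=True)
class ToyUnit:
    # Sorted (base unit, exponent) pairs, e.g. (("m", 1), ("s", -1)) for m/s.
    powers: tuple[tuple[str, int], ...]

    @classmethod
    def base(cls, name: str) -> ToyUnit:
        return cls(((name, 1),))

    def _combine(self, other: ToyUnit, sign: int) -> ToyUnit:
        out = dict(self.powers)
        for name, p in other.powers:
            out[name] = out.get(name, 0) + sign * p
        return ToyUnit(tuple(sorted((k, v) for k, v in out.items() if v != 0)))

    def __mul__(self, other: ToyUnit) -> ToyUnit:
        return self._combine(other, +1)

    def __truediv__(self, other: ToyUnit) -> ToyUnit:
        return self._combine(other, -1)

    def __unit_namespace__(self) -> Any:
        return TOY_UNIT_NAMESPACE


# Namespace of named unit functions, mirroring __array_namespace__ (hypothetical).
TOY_UNIT_NAMESPACE = SimpleNamespace(
    multiply=lambda a, b: a * b,
    divide=lambda a, b: a / b,
)


def result_unit(func_name: str, *units: ToyUnit) -> ToyUnit:
    """String-keyed variant, dispatching through the first unit's namespace."""
    return getattr(units[0].__unit_namespace__(), func_name)(*units)


m, s = ToyUnit.base("m"), ToyUnit.base("s")
uns = m.__unit_namespace__()
print(uns.divide(m, s) == result_unit("divide", m, s))  # True
print(result_unit("multiply", m, m).powers)  # (('m', 2),)
```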

@lucascolley

lucascolley commented Jan 7, 2025

That list for units seems good to me. Then for quantities, it is a matter of implementing at least the methods (and operators) required by the array API standard.

@andrewgsavage
Author

andrewgsavage commented Jan 7, 2025 via email

@andrewgsavage
Author

andrewgsavage commented Jan 7, 2025 via email

@mhvk
Contributor

mhvk commented Jan 7, 2025

Yes, my SI is perhaps optional - I think of it more as a normally guaranteed-to-work way of converting a unit from one package to another.

But just like the Array API does not decide what dtypes should exist, the units API probably should not decide what types of units should exist. Indeed, in that respect, the API we are discussing here perhaps need not deal with initializing units at all, or with how they interact with each other, but just with how they interact with arrays and functions, i.e., my items 4 and 5: a standard way to get a conversion function from one unit to another, and a standard way to define what conversions need to be done for a given function, and what the result unit is.

p.s. One further reason why perhaps we should not jump to defining how units interact, is that it is not always totally obvious. You mentioned subtracting units for temperatures - we don't do that in astropy, but we have something similar for magnitudes, i.e., logarithmic units of some other units. And we can subtract magnitude units... But I think none of that has to be part of a standard units API - the real need would seem to be conversion.

@mhvk
Contributor

mhvk commented Jan 7, 2025

@andrewgsavage - in astropy, q << unit converts a quantity to the new unit (or attaches it to an array; we do also use multiplication to initialize, a * unit, but that makes a copy). We dithered quite a bit about this, also considering a | unit (which is used by the AMUSE project - yet another unit implementation just for astronomy!), but liked that << is not symmetric. I'm still a bit partial to q >> unit or q // unit as equivalent to q.to_new_unit(unit).value...
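A toy sketch of the mechanism, not astropy's implementation: Python evaluates `q << unit` by trying `type(q).__lshift__` first and, if that is missing or returns `NotImplemented`, falling back to `unit.__rlshift__(q)`. That lets the unit alone decide whether to convert an existing quantity or attach itself to a plain number. `ToyUnit` and `ToyQuantity` are hypothetical single-dimension classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyUnit:
    symbol: str
    scale: float  # size in SI base units (toy: one dimension only)

    def __rlshift__(self, other: Any) -> ToyQuantity:
        # Reached for `other << self` because neither float nor ToyQuantity
        # defines __lshift__.
        if isinstance(other, ToyQuantity):
            # Existing quantity: convert its value to this unit.
            return ToyQuantity(other.value * other.unit.scale / self.scale, self)
        # Plain number: attach the unit without converting.
        return ToyQuantity(float(other), self)


@dataclass(frozen=True)
class ToyQuantity:
    value: float
    unit: ToyUnit


m = ToyUnit("m", 1.0)
km = ToyUnit("km", 1000.0)
q = 1500.0 << m  # attach: ToyQuantity(1500.0, m)
q_km = q << km  # convert: ToyQuantity(1.5, km)
print(q_km.value, q_km.unit.symbol)  # 1.5 km
```

Because `<<` is not symmetric, `m << q` would not mean the same thing, which matches the reason given above for preferring it over `|`.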

@andrewgsavage
Author

Even when SI is defined, libraries may make different decisions as to whether angle, information, etc. are defined as base units. pint doesn't treat them as base units, so it would cancel them out when converting to base units.

I presume you don't do unit-unit because astropy doesn't have a delta_degC like pint. A subtract method is needed to work out the return unit for some operations, e.g. degC-degC = delta_degC

result_unit("subtract", unit1, unit2)  # -> unit1 - unit2

Could astropy add a subtract method that would return degC for degC-degC?
Otherwise I can't see how to get to delta_degC for pint, without using a Quantity
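One way a units API could answer this without a Quantity is for `Unit.__sub__` to encode the rule directly. In this hypothetical sketch (offsets expressed in kelvin; `ToyUnit` is neither pint's nor astropy's class), subtracting two identical offset units yields a delta unit, while a unit without an offset subtracts to itself. Mixed subtractions are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyUnit:
    name: str
    scale: float  # size of one step in kelvin
    offset: float = 0.0  # zero point in kelvin (non-zero for degC)

    def __sub__(self, other: ToyUnit) -> ToyUnit:
        if self != other:
            raise ValueError("only subtracting identical units is sketched here")
        if self.offset == 0.0:
            return self  # e.g. K - K = K: there is no offset to drop
        # The difference of two offset temperatures is an interval: the offset
        # no longer applies, so the result is a delta unit.
        return ToyUnit(f"delta_{self.name}", self.scale, 0.0)


kelvin = ToyUnit("K", 1.0)
degC = ToyUnit("degC", 1.0, 273.15)
print((degC - degC).name)  # delta_degC
```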

q << unit converts a quantity to the new unit (or attaches it to an array; we do also use multiplication to initialize, a * unit, but that makes a copy).

I am a little lost as to the benefit of <<. In both those cases you can use u.Quantity(q, unit) and it does the same thing (maybe with copy=False), but for a new user seeing Quantity() is much clearer.
With <<, I must find out what the << operator is (many users won't have used it before), realise that astropy has made a custom implementation of it, and find the docs for it. Is this worth it to save a few characters? I'm unsure.

@mhvk
Contributor

mhvk commented Jan 7, 2025

Indeed, astropy has degrees C and F, but only very partial support -- we'd need to write a new unit type that allows for offsets. As I mentioned, our logarithmic units do have __sub__. From the docs,

import astropy.units as u

dBm = u.dB(u.mW)
signal_in, signal_out = 100. * dBm, 50 * dBm
cable_loss = (signal_in - signal_out) / (100. * u.m)
signal_in, signal_out, cable_loss  
# (<Decibel 100. dB(mW)>, <Decibel 50. dB(mW)>, <Quantity 0.5 dB / m>)

better_cable_loss = 0.2 * u.dB / u.m
signal_in - better_cable_loss * 100. * u.m  
# <Decibel 80. dB(mW)>

(Of course, we really need those for magnitudes...)

Anyway, I think the bottom line is that we perhaps most of all need to define how units interact with values/arrays, converting to a unit within the same unit package, and being able to define what conversion is needed for given functions.

Also, agree that it is important to define dimensionless.

@SimonHeybrock

SimonHeybrock commented Jan 8, 2025

integer quantities make essentially no sense.

Preach. It comes up regularly in unyt's bug tracker that allowing quantity arrays to have an int dtype breaks end applications in surprising ways, and I don't really have an argument for why this is supported in the first place.

There are good reasons for supporting int actually. For example, datetimes are internally typically integers, e.g., nanoseconds since epoch. Then, subtracting two datetimes results in a time-difference, naturally yielding an int dtype with a time unit.
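NumPy's datetime types illustrate the point: `datetime64` and `timedelta64` values are stored as 64-bit integer counts of a time unit, so the difference of two datetimes is naturally an integer with a unit attached. This uses only standard NumPy behaviour; it is an illustration, not a proposal for how a Quantity API should expose it.

```python
import numpy as np

t0 = np.datetime64("2025-01-07T00:00:00")
t1 = np.datetime64("2025-01-08T00:00:00")
delta = t1 - t0

print(delta.dtype)  # timedelta64[s]
print(np.datetime_data(delta.dtype))  # ('s', 1): the "unit" of the integer
print(delta.astype(np.int64))  # 86400
```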

And aside from this, I agree that things like this should not be restricted. When I wrote above that we use bool without unit, I did not mean that it is not or should not be supported. In my experience, such restrictions are typically a bad idea: they complicate implementations, maintenance, documentation, and testing, and sooner or later get in the way in some edge case.

@phlptp

phlptp commented Jan 8, 2025

Interesting discussion. I am quite new to the world of units and Quantities (Measurements) in Python, having worked mostly in C++ on many of the same things you are talking about here, in llnl/units. This library now includes a Python wrapper, which was needed to support interoperability with other code using the C++ version. If a "standard" is agreed upon here, I would be happy to align to it to the extent possible.

A couple of things came up reading through the discussion. I am not clear what the definition of + or - would be on a pure Unit object. I don't support that operation in our library, unlike multiplication and division, which generalize to any pair of units and produce another unit. I don't know what meter - second would mean.

The other topic I haven't seen discussed is error handling: what to do when a string doesn't convert to a valid unit, a conversion is invalid, or a math operation doesn't make sense. My suspicion is that this is handled differently in the different libraries, and if one purpose of this effort is to enable higher-level operations (like arrays), then there also need to be consistent ways of signalling errors so that those libraries can handle such conditions consistently.
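One possible shape for shared error handling, sketched as an assumption rather than an agreed design: a small set of standard exception types that each library's own errors could subclass. Deriving them from built-in exceptions (`ValueError`, `TypeError`) keeps existing `except` clauses working. All class and function names below are hypothetical.

```python
class UnitError(Exception):
    """Hypothetical common base for all unit-related errors."""


class UnitParseError(UnitError, ValueError):
    """A string could not be parsed into a valid unit."""


class UnitConversionError(UnitError, ValueError):
    """Two units are not convertible (e.g. metre to second)."""


class UnitOperationError(UnitError, TypeError):
    """An operation is undefined for the given units (e.g. m - s)."""


# A library would raise its own subclass of the shared type...
class MyLibUndefinedUnitError(UnitParseError):
    pass


def parse_unit(text: str) -> str:
    known = {"m", "s", "kg"}
    if text not in known:
        raise MyLibUndefinedUnitError(f"unknown unit: {text!r}")
    return text


# ...and generic code catches the shared type without knowing the library.
try:
    parse_unit("furlongs")
except UnitParseError as exc:
    print(type(exc).__name__, exc)  # MyLibUndefinedUnitError unknown unit: 'furlongs'
```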

@andrewgsavage
Author

andrewgsavage commented Jan 8, 2025 via email

@lucascolley

@ksunden I think you may be interested in the discussion here, based on https://github.com/matplotlib/data-prototype/blob/main/examples/units.py

@nstarman
Member

nstarman commented Jan 8, 2025

Perhaps we should make a new org, e.g., quantity-dev, with 2 repos (quantity-api and units-api) and an org-level Discussion board. Then we can start hashing out a design document and start making typing.Protocol objects with the agreed-upon APIs.

@lucascolley

Perhaps we should make a new org, e.g., quantity-dev, with 2 repos (quantity-api and units-api) and an org-level Discussion board. Then we can start hashing out a design document and start making typing.Protocol objects with the agreed-upon APIs.

Sounds good to me. I've created the org and invited @nstarman @andrewgsavage @mhvk @SimonHeybrock. If anybody else is interested in working on these libraries, let me know. I've also transferred my work on the array API standard interface to https://github.com/quantity-dev/quantity-array.

@mhvk
Contributor

mhvk commented Jan 8, 2025

Thanks for doing the new org! Most useful apart from discussion will be to have a test suite that can verify conformity. I'm not sure we necessarily want to host different implementations in it, but having forks may be useful for internal testing - anyway, we can see how it goes.
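A very small sketch of what one conformance check might look like, assuming the test suite is handed a factory from the library under test and the expected conversion factor between two units. `check_conversion` and the toy implementation are hypothetical, and the pint-style names (`m_as`, `units`) are an assumption since the API names are not yet agreed. Checking a known factor, not just a round trip, matters: an implementation that inverts the factor would still round-trip correctly.

```python
from __future__ import annotations

import math
from typing import Any, Callable


def check_conversion(
    make_quantity: Callable[[float, Any], Any],
    unit_a: Any,
    unit_b: Any,
    factor: float,
    value: float = 1.25,
) -> None:
    """Hypothetical check: `value` unit_a == value * factor unit_b, and back."""
    q = make_quantity(value, unit_a)
    converted = q.m_as(unit_b)
    assert math.isclose(converted, value * factor), (converted, value * factor)
    back = make_quantity(converted, unit_b).m_as(unit_a)
    assert math.isclose(back, value), (back, value)
    assert q.units == unit_a  # converting must not mutate the original


# Toy implementation under test (illustration only).
class ToyQuantity:
    _SCALE = {"m": 1.0, "km": 1000.0, "cm": 0.01}

    def __init__(self, magnitude: float, units: str) -> None:
        self.magnitude, self.units = magnitude, units

    def m_as(self, units: str) -> float:
        return self.magnitude * self._SCALE[self.units] / self._SCALE[units]


for a, b, f in [("km", "m", 1000.0), ("m", "cm", 100.0), ("cm", "km", 1e-5)]:
    check_conversion(ToyQuantity, a, b, f)
print("all checks passed")
```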

@phlptp

phlptp commented Jan 8, 2025

I would be interested and happy to help test it out.

@lucascolley

discussion board at https://github.com/orgs/quantity-dev/discussions. I've added you as an owner @nstarman.

@neutrinoceros
Contributor

If anybody else is interested in working on these libraries, let me know.

🤚🏻

@jules-ch

Interested!


8 participants