Quantity API #23
Slightly off-topic, but I was wondering about a potential name for a standardised project. How about pyquantity (perhaps stylised as PyQuantity)? It seems like the standardised quantity/unit API and support for functions in the array API standard could belong outside of the astropy org. |
Or QuantiPy!
Yes, I agree it should live outside astropy to signal it is common across libraries. Perhaps it would be a fit for https://data-apis.org/ ?
|
Haha I did check that, but there is https://pypi.org/project/QuantiPy/
I would be open to that, although the "standard" set by the work on the array API standard is quite high, and may be a little overkill for the task at hand. It might make more sense to get some rough prototypes together first and see how collaboration goes. Then if it looks like there are difficult decisions to make, we can look to following the data-driven approach and https://github.com/data-apis/governance/blob/main/process_document.md. |
I feel that a Units API should be favored over a Quantity API. Or at the very least, both should exist. The motivation for this is the (downsides of) nesting. People also want named dimensions (Xarray, NamedArray, Scipp, PyDims, ...), masks, and more (vectors, bin-edge arrays, ...). If we define, say, Quantity.__init__(value, units) then each of the extra components such as masks or dimension names may need to be constructed and accessed in a nested manner. We'd get something like

    arr = MaskedArray(NamedArray(('x',), mask), NamedArray(('x',), Quantity(np.arange(5), 'meter')))
    # with value accessed as
    arr.value.value.value

which is obviously a problem. One could try to add special accessors at different levels, but I think this will quickly get out of hand. I have experimented with something like this a while ago (see https://github.com/scipp/scippx and https://discuss.scientific-python.org/t/multiple-duck-array-layers-composability-implementations-and-usability-concerns/552) and my conclusion was that it probably is not a workable solution.
Thus, the approach I favor is what I have outlined in
At least this is how far my thought process went. It does leave important questions open though. For example, one would like to use SciPy and friends with units but it is not clear to me how to achieve this. |
I agree about the need for both a Quantity and Units API and also that they should probably be separate, but developed in dialogue with one another. In the various versions I've coded up (https://github.com/nstarman/units, https://github.com/nstarman/astropy-APEs/blob/units-quantity-2.0/APE25/report.pdf) I proposed the following (pseudocode with illustrative names and attributes)

    class QuantityAPI(Protocol[Value]):
        value: Value
        unit: ...
        def to_unit(self, unit): ...

    class ArrayAPI(Protocol): ...

    class QuantityArrayAPI(ArrayAPI, QuantityAPI[ArrayAPIType], Protocol): ...

So long as we all agree on a common Quantity API, and likewise a common Units API, then it's totally fine to have a proliferation of implementing libraries with different focuses, e.g. non-array values like python scalars or lists, or specific Q subclasses like longitudes, or masked values, or named fields. If there's eventually a Mask API then a masked Q class would be the intersection type, like |
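To make the pseudocode above concrete, here is a minimal runnable sketch of the same idea using typing.Protocol. It is only an illustration: ToyQuantity, value_in, the string-valued unit and the _TO_METRE table are all hypothetical names invented for this example, not part of any of the proposals linked above.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar, runtime_checkable

V_co = TypeVar("V_co", covariant=True)


@runtime_checkable
class QuantityAPI(Protocol[V_co]):
    """Illustrative protocol: a value paired with a unit."""

    @property
    def value(self) -> V_co: ...

    @property
    def unit(self) -> str: ...

    def to_unit(self, unit: str) -> "QuantityAPI[V_co]": ...


# Hypothetical conversion factors to metres, for this toy only.
_TO_METRE = {"m": 1.0, "km": 1000.0, "cm": 0.01}


@dataclass(frozen=True)
class ToyQuantity:
    """A toy implementation; it never inherits from QuantityAPI."""

    value: float
    unit: str

    def to_unit(self, unit: str) -> "ToyQuantity":
        factor = _TO_METRE[self.unit] / _TO_METRE[unit]
        return ToyQuantity(self.value * factor, unit)


def value_in(q: QuantityAPI[float], unit: str) -> float:
    """Library code written against the protocol only."""
    return q.to_unit(unit).value


q = ToyQuantity(1.5, "km")
print(isinstance(q, QuantityAPI))  # structural check, no inheritance needed
print(value_in(q, "m"))
```

The point of the structural check is that an implementing library would not need to import or subclass anything from a shared package; matching attribute and method names is enough.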
@SimonHeybrock at least while we are in array API compatible territory, I think there is a solution, but I haven't tried it yet. Idea:
I think this pattern should work as long as you have array API compatible layers throughout, except maybe for the top layer, which only needs to consume standard arrays and provide the

    from pint_array import pint_namespace, UnitRegistry
    from marray import marray_namespace
    # XXX: imagine that `dask_namespace` exists
    from dask.array import dask_namespace
    import cupy as cp

    ureg = UnitRegistry()
    dacp = dask_namespace(cp)
    pmdacp = pint_namespace(marray_namespace(dacp))
    x = pmdacp.arange(4, chunks=2, mask=dacp.asarray([False, True, False, True]), units=ureg.pint)
|
@lucascolley That solves the manual initialization, but the problem goes deeper, I think. Accessing nested "value" properties, unclear or incompatible order of nesting, and creation of derived objects in binary operations. |
That does seem like a problem. Perhaps solvable if each array carries a
That shouldn't be a problem, at least with standard-compatible wraps.
Perhaps the onus could be on users to not mix incompatible namespaces? The "users" here are still library authors rather than end users.
could you give an example? |
As a very simple case, something we do quite often is compare two quantity-arrays to obtain a mask. That is, mask = arr1 < arr2 should return an object that does not have a unit (or Footnote: I don't think using |
I can see why it feels wrong, but practically, are there any problems? |
We (Scipp) have done that in the past but eventually decided to distinguish dimensionless from unit-less. This was not just for masks but also (memory) index arrays, etc. Unfortunately we were not so good at documenting the full reasoning for the decision at the time. I think it was mainly about being able to detect user errors better. It was also weird to have units for, say, an array of strings or other Python objects, so we wanted unit-less anyway. Edit: One example of the problems we have had, albeit a bit cryptic, reading it now: scipp/scipp#2396 |
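A small sketch of the dimensionless versus unit-less distinction described above, under stated assumptions: ToyArray, DIMENSIONLESS and the use of None for "no unit" are hypothetical choices for this example and are not how Scipp or any other library spells it.

```python
from dataclasses import dataclass
from typing import Optional

DIMENSIONLESS = ""  # hypothetical spelling of the dimensionless unit


@dataclass(frozen=True)
class ToyArray:
    values: tuple
    unit: Optional[str]  # None means unit-less: units do not apply at all

    def _check(self, other: "ToyArray") -> None:
        if self.unit is None or other.unit is None:
            raise TypeError("unit-less data cannot be used as a quantity")
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit!r} vs {other.unit!r}")

    def __lt__(self, other: "ToyArray") -> "ToyArray":
        # A comparison yields a mask, which carries no unit at all.
        self._check(other)
        return ToyArray(tuple(a < b for a, b in zip(self.values, other.values)), None)

    def __add__(self, other: "ToyArray") -> "ToyArray":
        self._check(other)
        return ToyArray(tuple(a + b for a, b in zip(self.values, other.values)), self.unit)


arr1 = ToyArray((1.0, 5.0), "m")
arr2 = ToyArray((2.0, 3.0), "m")
mask = arr1 < arr2
print(mask)
```

With this split, accidentally adding a mask to a length raises a TypeError, whereas a dimensionless mask would have produced a different kind of failure (or none at all, after a conversion).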
Astropy and Quantity2 also remove the unit when booleans are involved. Indeed, I've argued for astropy (and may do so here) that |
Preach. It comes up regularly in |
The Array API doesn't police how dtypes are defined / used. It has a minimum set of dtypes (https://data-apis.org/array-api/latest/API_specification/data_types.html) but many libraries define more dtypes, e.g. for ML. I think it's literally impossible / highly impractical / way too much work for us to enumerate for all python array libraries what subset of dtypes are allowed. Especially for the Quantity API I don't think this makes much sense. Individual implementing libraries can choose to take on this Sisyphean task, but for the Quantity API it's probably best to say that values are e.g. the Array API and leave it at that. |
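Agree that integers/bools at some level is an implementation detail. Back a bit to the main topic: I agree with @SimonHeybrock that the units API is perhaps the most important to lock in - if designed right, the So, it seems units have to provide ways to
1. Parse a string unit (presumably, __init__ in most cases)
2. Convert to string (presumably, __str__)
3. Convert to a standard system (.si? does every units package have that? Introduce __si__?)
4. Produce a function that converts a value from one unit to another (unit.to(...) in astropy, which handles equivalencies too)
5. Given an (Array API/numpy) function, produce a set of converters that should be applied to inputs before calling the function on values, and a resulting unit.
6. Anything else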
In astropy, (4) and (5) are dealt with somewhat ad-hoc, in an interaction between p.s. One way of thinking of units is that they are part of the p.s.2 Back to |
I wonder what the implications would be for getting things to work downstream without support for integral/boolean dtypes. E.g. I wonder whether there is any code in SciPy which relies on the specified behaviour that |
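> Given an (Array API/numpy) function, produce a set of converters that should be applied to inputs before calling the function on values, and a resulting unit.

Related to this, I've long wanted something akin to result_type but that works on units, not dtypes. Two patterns come to mind.

    result_unit(Literal["<func_name>"], *units_of_args: Unit) -> Unit: ...

where the literal might be a string name, enum, or something. So usage would be

    result_unit("multiply", unit1, unit2) -> unit1 * unit2

or mirroring the Array API namespace, but for units

    uns = unit.__unit_namespace__()
    uns.multiply(unit1, unit2) -> unit1 * unit2
|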
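As a rough illustration of the first pattern, result_unit could be a table lookup from function name to a rule on units. Everything here is an assumption made for the sketch: ToyUnit only tracks base-dimension exponents (no scale factors, so "add" requires identical units and no converters are produced), and the rule for "less" returning None is one possible convention for unit-less results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyUnit:
    """Hypothetical unit: base-dimension exponents only, no scale factors."""

    dims: tuple  # sorted (name, exponent) pairs, e.g. (("m", 1), ("s", -1))

    @classmethod
    def of(cls, **exponents: int) -> "ToyUnit":
        return cls(tuple(sorted((k, v) for k, v in exponents.items() if v != 0)))

    def __mul__(self, other: "ToyUnit") -> "ToyUnit":
        merged = dict(self.dims)
        for name, exp in other.dims:
            merged[name] = merged.get(name, 0) + exp
        return ToyUnit.of(**merged)

    def __truediv__(self, other: "ToyUnit") -> "ToyUnit":
        return self * ToyUnit.of(**{name: -exp for name, exp in other.dims})


def _same_unit(a: ToyUnit, b: ToyUnit) -> ToyUnit:
    if a != b:
        raise ValueError(f"incompatible units: {a} and {b}")
    return a


def _no_unit(a: ToyUnit, b: ToyUnit) -> None:
    _same_unit(a, b)  # operands must still agree
    return None  # comparisons give a unit-less result in this sketch


_RULES = {
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
    "add": _same_unit,
    "subtract": _same_unit,  # ignores delta units for simplicity
    "less": _no_unit,
}


def result_unit(func_name: str, *units: ToyUnit):
    try:
        rule = _RULES[func_name]
    except KeyError:
        raise NotImplementedError(f"no unit rule for {func_name!r}") from None
    return rule(*units)


metre, second = ToyUnit.of(m=1), ToyUnit.of(s=1)
print(result_unit("divide", metre, second))
```

The namespace variant would expose the same rules as attributes (uns.multiply, uns.divide, ...) instead of string keys; the lookup table is just the shortest way to show the idea.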
That list for units seems good to me. Then for quantities, it is a matter of implementing at least the methods (and operators) required by the array API standard. |
> 1. Parse a string unit (presumably, __init__ in most cases)
> 2. Convert to string (presumably, __str__)
> 3. Convert to a standard system (.si? does every units package have that? Introduce __si__?)
> 4. Produce a function that converts a value from one unit to another (unit.to(...) in astropy, which handles equivalencies too)
> 5. Given an (Array API/numpy) function, produce a set of converters that should be applied to inputs before calling the function on values, and a resulting unit.
> 6. Anything else

3. pint builds its unit registry from a unit file. A user could create a registry without si if they wanted to.
4. pint's Unit.to() converts the Unit into a Quantity then performs .to() returning a Quantity. Might need a different name for this
5. Lucas has effectively written that, but would need to swap Quantity.m_as() to Unit.to() (and pint would need changing to return a conversion function)
6. You'd also need multiplication, division, power and addition/subtraction of the units
7. pint provides errors, are these errors provided in other libraries? https://github.com/hgrecco/pint/blob/master/pint/errors.py
8. Creating a dimensionless unit
9. equality checks

Using Unit to provide conversion functions feels odd initially but I can see how it would work well!
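A sketch of what items 1, 2, 4 and 9 of the list might look like on a unit object, with the unit handing out a conversion function rather than a Quantity. The names ToyUnit, parse, converter_to, UnitError and the _REGISTRY table are hypothetical; neither pint nor astropy is claimed to expose this exact interface.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry: name -> (dimension, factor to the base unit).
_REGISTRY = {
    "m": ("length", 1.0),
    "km": ("length", 1000.0),
    "s": ("time", 1.0),
    "min": ("time", 60.0),
}


class UnitError(ValueError):
    """Raised for unknown units and impossible conversions (toy only)."""


@dataclass(frozen=True)  # frozen dataclass gives equality checks (item 9)
class ToyUnit:
    name: str
    dimension: str
    scale: float

    @classmethod
    def parse(cls, text: str) -> "ToyUnit":  # item 1
        name = text.strip()
        try:
            dimension, scale = _REGISTRY[name]
        except KeyError:
            raise UnitError(f"unknown unit {text!r}") from None
        return cls(name, dimension, scale)

    def __str__(self) -> str:  # item 2
        return self.name

    def converter_to(self, other: "ToyUnit") -> Callable[[float], float]:  # item 4
        if self.dimension != other.dimension:
            raise UnitError(f"cannot convert {self} to {other}")
        factor = self.scale / other.scale
        return lambda value: value * factor


km_to_m = ToyUnit.parse("km").converter_to(ToyUnit.parse("m"))
print(km_to_m(2.5))
```

Returning a plain function keeps the unit layer independent of whatever holds the values, which is what lets the same converter be applied to a Python float, a NumPy array, or any other array type that supports multiplication.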
|
pint doesn't use <<, instead I would use value * Unit(), Quantity(value,
unit) or Quantity.to(unit). Quantity is typically shortened to Q_ or Q as
it's used often. Q_(value, unit) feels short enough that nobody has
suggested using << !
Using value*Unit() does lead to bugs later when temperatures are used so <<
could be a welcome addition to pint.
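To illustrate why a dedicated "attach this unit" operator can be safer than multiplication for offset units, here is a toy sketch. ToyUnit, ToyQuantity and the choice to reject multiplication for offset units are assumptions made for the example, not a description of pint's or astropy's behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyUnit:
    name: str
    offset: float = 0.0  # non-zero for offset units such as degC

    def __rlshift__(self, value: float) -> "ToyQuantity":
        """value << unit: attach the unit to a bare value, no arithmetic implied."""
        return ToyQuantity(value, self)

    def __rmul__(self, value: float) -> "ToyQuantity":
        """value * unit: refused for offset units, where scaling is ambiguous."""
        if self.offset != 0.0:
            raise TypeError(f"ambiguous multiplication with offset unit {self.name}")
        return ToyQuantity(value, self)


@dataclass(frozen=True)
class ToyQuantity:
    value: float
    unit: ToyUnit


degC = ToyUnit("degC", offset=273.15)
metre = ToyUnit("m")

print(3.0 * metre)   # fine: multiplication is unambiguous here
print(20.0 << degC)  # fine: plain construction, works for any unit
```

Here `20.0 * degC` raises a TypeError, while `20.0 << degC` always means "this number, in this unit", which is the property that makes the operator attractive for temperatures.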
|
Yes, my SI is perhaps optional - I think of it more as a normally guaranteed-to-work way of converting a unit from one package to another. But just like the Array API does not decide what p.s. One further reason why perhaps we should not jump to defining how units interact is that it is not always totally obvious. You mentioned subtracting units for temperatures - we don't do that in astropy, but we have something similar for magnitudes, i.e., logarithmic units of some other units. And we can subtract magnitude units... But I think none of that has to be part of a standard units API - the real need would seem to be conversion. |
@andrewgsavage - in astropy, |
Even when SI is defined, libraries may make different decisions as to whether angle or information etc are defined as base units. pint doesn't assign them as base units so would cancel them out when converting to base units.
I presume you don't do unit-unit because astropy doesn't have a delta_degC like pint. A subtract method is needed to work out the return unit for some operations, eg degC-degC = delta_degC

    result_unit("subtract", unit1, unit2) -> unit1 - unit2

Could astropy add a subtract method that would return degC for degC-degC?
I am a little lost as to the benefit of <<. In both those cases you can use u.Quantity(q, unit) and it does the same thing (maybe with copy=False), but for a new user seeing Quantity() is much clearer. |
Indeed, astropy has degrees C and F, but very partial support -- we'd need to write a new unit type that allows for offsets. As I mentioned, our logarithmic units do have
(Of course, we really need those for magnitudes...) Anyway, I think the bottom line is that we perhaps most of all need to define how units interact with values/arrays, converting to a unit within the same unit package, and being able to define what conversion is needed for given functions. Also, agree that it is important to define dimensionless. |
There are good reasons for supporting And aside from this, I agree that things like this should not be restricted. When I wrote above that we use |
Interesting discussion. I am quite new to the world of units, Quantities (Measurements) in Python, having worked mostly in C++ to do a lot of the same things as you are talking about here in llnl/units. This library now includes a python wrapper, which was needed to support interoperability with other code using the C++ version. If a "standard" is agreed upon here I would be happy to align to it to the extent possible.
A couple things came up reading through the discussion. I am not clear what the definition of + or - would be on a pure Unit object. I don't support that operation in our library, vs multiplication and division which are generalizable to any unit producing another different unit. Whereas I don't know what meter - second might mean.
The other topic which I haven't seen is error handling: what to do when a string doesn't convert to a valid unit, or a conversion is invalid, or a math operation doesn't make sense. My suspicion is that this is handled differently in the different libraries, and if one purpose of this is to enable higher-level operations (like arrays) then there also need to be consistent ways of marking errors so that a library can handle those kinds of conditions in a consistent way. |
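One possible shape for consistent error marking is a small shared exception hierarchy that higher-level code can catch without knowing which units library raised it. This is only a sketch: UnitsError, UnitParseError, UnitConversionError, parse_unit and parse_all are hypothetical names, and nothing here reflects an agreed design.

```python
class UnitsError(Exception):
    """Hypothetical common base class for all unit-related failures."""


class UnitParseError(UnitsError):
    """A string could not be turned into a unit."""


class UnitConversionError(UnitsError):
    """A conversion between two units is not possible (raiser not shown)."""


_KNOWN = {"m", "s", "kg"}  # toy registry for the example


def parse_unit(text: str) -> str:
    name = text.strip()
    if name not in _KNOWN:
        raise UnitParseError(f"unknown unit {text!r}")
    return name


def parse_all(texts):
    """Higher-level code: catches only the shared base class."""
    parsed, failed = [], []
    for text in texts:
        try:
            parsed.append(parse_unit(text))
        except UnitsError as exc:
            failed.append((text, exc))
    return parsed, failed


parsed, failed = parse_all(["m", "furlong", " s "])
print(parsed, [text for text, _ in failed])
```

Existing libraries could keep their own exception classes and simply make them inherit from (or be registered against) the shared ones, so current user code that catches library-specific errors would keep working.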
Perhaps an alternative to subtraction is to provide a to_delta_unit()
method that would return delta units for offset units (or error if delta
units are not implemented, preventing degC-degC=degC issues), or self if
it's not an offset unit.
+1 on errors
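A minimal sketch of the to_delta_unit() idea, assuming a toy unit that records whether it has an offset and, optionally, the name of its delta counterpart. ToyUnit, offset and delta_name are hypothetical names for this example only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyUnit:
    name: str
    offset: float = 0.0  # non-zero marks an offset unit such as degC
    delta_name: Optional[str] = None  # name of the matching delta unit, if any

    def to_delta_unit(self) -> "ToyUnit":
        if self.offset == 0.0:
            return self  # ordinary units are their own delta unit
        if self.delta_name is None:
            # Refuse rather than silently returning the offset unit,
            # which would reintroduce the degC - degC = degC problem.
            raise NotImplementedError(f"no delta unit defined for {self.name}")
        return ToyUnit(self.delta_name)


metre = ToyUnit("m")
degC = ToyUnit("degC", offset=273.15, delta_name="delta_degC")
degF = ToyUnit("degF", offset=255.372)  # no delta unit registered in this toy

print(metre.to_delta_unit())
print(degC.to_delta_unit())
```

A quantity's subtraction could then call to_delta_unit() on its unit to label the result, without the unit type itself needing to define subtraction.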
|
@ksunden I think you may be interested in the discussion here, based on https://github.com/matplotlib/data-prototype/blob/main/examples/units.py |
Perhaps we should make a new org, e.g.. |
Sounds good to me. I've created the org and invited @nstarman @andrewgsavage @mhvk @SimonHeybrock. If anybody else is interested in working on these libraries, let me know. I've also transferred my work on the array API standard interface to https://github.com/quantity-dev/quantity-array. |
Thanks for doing the new org! Most useful apart from discussion will be to have a test suite that can verify conformity. I'm not sure we necessarily want to host different implementations in it, but having forks may be useful for internal testing - anyway, we can see how it goes. |
I would be interested and happy to help test it out. |
discussion board at https://github.com/orgs/quantity-dev/discussions. I've added you as an owner @nstarman. |
🤚🏻 |
interested ! |
There has been some interest in a Quantity or Units API, similar to the array API standard.
hgrecco/pint#2101
astropy/astropy#8210
astropy/astropy#13460
astropy/astropy-APEs#91
https://pydims.github.io/pydims/developer/index.html
A Quantity API that standardises methods and attributes would make it easier to write code that supports multiple unit libraries. At present this is difficult and has led to multiple integration packages being written to support units, eg pint-xarray and xarray-quantity.
Although the implementations of the unit libraries differ, they share the same core concepts of a Quantity and Unit. I echo the suggestion to create a Protocol, similar to https://github.com/nstarman/units/tree/main/src/units/api. However, I think the first version of a Quantity API should have a smaller scope to make it easier to agree on and adopt.
At time of writing, support for the array api is in development. An initial implementation suggests standardising the following methods would allow an implementation to be used across multiple libraries.
Unit.__sub__ is used to get a delta unit, eg for temperature. Quantity.__sub__ could be used instead.
That's not many functions at all!
pint-pandas looks like it'd also need Unit(unit_string), and I've used Quantity.to(Unit).magnitude instead of .m_as. It also uses pint's errors and formatting but that's quite complex.
Equally I think this could be easy for the unit libraries to implement; for example pint would add Quantity.value that returns Quantity.magnitude, astropy would add Quantity.units that returns Quantity.unit and so on (attribute names tbd!). There's no need to deprecate the current attributes. It would be good to change documentation to use the standardised methods to encourage their use.
I'm not against standardising other classes or methods like Systems or <<, I'd just rather do so in a future version.
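The aliasing idea can be sketched with two toy classes standing in for the two naming styles. PintLikeQuantity and AstropyLikeQuantity are hypothetical stand-ins written for this example, not the real pint or astropy classes, and the choice of value/unit as the shared names is just one of the options mentioned (attribute names tbd).

```python
class PintLikeQuantity:
    """Toy class using magnitude/units as its native attribute names."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    # Standardised aliases; the native attributes stay untouched.
    @property
    def value(self):
        return self.magnitude

    @property
    def unit(self):
        return self.units


class AstropyLikeQuantity:
    """Toy class using value/unit as its native attribute names."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    # Aliases in the other direction, so either spelling works.
    @property
    def magnitude(self):
        return self.value

    @property
    def units(self):
        return self.unit


def describe(q) -> str:
    """Downstream code written once, against the shared names only."""
    return f"{q.value} {q.unit}"


print(describe(PintLikeQuantity(3.0, "m")))
print(describe(AstropyLikeQuantity(3.0, "m")))
```

Because the aliases are read-only properties layered on top of the existing attributes, nothing in current user code would need to change.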