Separate the "value" from a Quantity #223

maxhutch · 2019-04-03T13:44:06Z

Consistent with my suggestion to use composition rather than inheritance for Symbol, I think there is a cleaner way of dealing with Quantities: have a Quantity store provenance, symbol type, tags, and any other metadata added in the future, while leaving the shape and numerical information in a member. If you decide to support distribution-valued quantities in the future, this will help keep the distribution class hierarchy separate, or let you adopt one from another package. This is also what we do, fwiw.

It's not entirely clear to me where units belong in this picture. Right now we treat them as metadata (in the Quantity object, not the value object), and treat non-numeric data as dimensionless (units=""). This simplifies the math a little: we can convert everything into a standard unit system, drop the units, do a bunch of math, and then convert the results back to whatever the user wants to see. We don't have access to Pint on our backend, though; if we had, we might have done things differently.
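A minimal sketch of the "convert, drop the units, compute, convert back" workflow described above. Everything here is hypothetical and for illustration only: the `TO_STANDARD` table, the `Quantity` fields, and the helper names are assumptions, not code from either project, and a real implementation would need proper dimensional analysis (e.g. via Pint).

```python
from dataclasses import dataclass

# Hypothetical conversion table: unit -> (standard unit, multiplicative factor).
TO_STANDARD = {
    "eV": ("J", 1.602176634e-19),
    "J": ("J", 1.0),
    "K": ("K", 1.0),
    "": ("", 1.0),  # dimensionless or non-numeric data
}


@dataclass(frozen=True)
class Quantity:
    """Units live here as metadata; the bare number is the 'value'."""
    value: float
    units: str = ""


def strip_units(q):
    """Return the magnitude in the standard unit system, plus that unit's name."""
    standard_unit, factor = TO_STANDARD[q.units]
    return q.value * factor, standard_unit


def attach_units(magnitude, units):
    """Convert a standard-unit magnitude back to the units the user asked for."""
    _, factor = TO_STANDARD[units]
    return Quantity(magnitude / factor, units)


# Two energies given in different units.
a = Quantity(1.0, "eV")
b = Quantity(3.204353268e-19, "J")  # the same energy as 2 eV

# Convert to the standard system, do plain-float math, convert back.
a_si, a_unit = strip_units(a)
b_si, b_unit = strip_units(b)
if a_unit != b_unit:
    raise ValueError("cannot add quantities with different dimensions")
total = attach_units(a_si + b_si, "eV")  # about 3 eV
```

The math in the middle only ever sees floats, which is the simplification; the cost is that nothing checks dimensions unless you add that check yourself.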


mkhorton · 2019-04-03T18:41:00Z

This seems reasonable to me -- thoughts everyone?

I think in this framework, units probably correctly belong with the value (thanks to pint), but are also metadata with the symbol, so there is unfortunately some duplication there.

Can I ask you to clarify what you mean by "distribution-valued quantities"? If you mean uncertainties, that comes from existing integration with https://pythonhosted.org/uncertainties/ which works "for free", whereas if you mean actual data tables, that's a bit trickier.

maxhutch · 2019-04-03T18:51:57Z

I did mean "uncertainties". That package makes the linear approximation, which fails pretty badly in my experience. If you're integrating with machine learning, the uncertainties will not be "small", and many physical models are highly non-linear (e.g. Arrhenius). mcerp and soerp might work well enough; I don't have experience with them or their approximation methods.

mkhorton · 2019-04-03T18:58:57Z

Hmm, I was thinking that something like an Arrhenius model wouldn't output a single number ("distribution") but rather a range of discrete values (e.g. for a range of temperatures). Outputting an actual distribution does seem a lot more powerful, but what would be the functional form of this? (A Python lambda?) How would you serialize such a distribution?

mcerp looks interesting -- I'm also not familiar, so would have to read up. We'd have to figure out how to integrate that though even if we did want to use it.

maxhutch · 2019-04-03T19:22:02Z

Consider the case where the activation energy is 1.0 +/- 0.05. Given an exactly known temperature, the reaction rate would be uncertain because the activation energy is uncertain. Because the uncertainty is in the argument of an exponential, the linear theory will be junky and the reaction rate won't be normally distributed. There are lots of different ways to deal with this; I don't think you need to solve that problem right now. I was just trying to highlight it as something you might run into and, when you do, separation of the "value" from the "quantity" might be helpful.
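To make the point concrete, here is a rough sketch comparing linear error propagation with plain Monte Carlo sampling for the Arrhenius rate k = A exp(-E_a / (k_B T)). The units (eV), the temperature (300 K), the unit prefactor, and the normal distribution for the activation energy are all assumptions made for illustration; this is one of the "lots of different ways" to handle the problem, not a recommendation.

```python
import math
import random
import statistics

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius(e_a, temperature, prefactor=1.0):
    """Arrhenius rate; prefactor=1.0 is an arbitrary choice for illustration."""
    return prefactor * math.exp(-e_a / (K_B * temperature))


# Assumed inputs: activation energy 1.0 +/- 0.05 (taken to be eV), T known exactly.
E_A_MEAN, E_A_STD = 1.0, 0.05
T = 300.0

# Linear propagation: sigma_k ~= |dk/dE_a| * sigma_E = k * sigma_E / (k_B * T).
k_nominal = arrhenius(E_A_MEAN, T)
sigma_linear = k_nominal * E_A_STD / (K_B * T)

# Monte Carlo: sample E_a, push each sample through the full non-linear model.
rng = random.Random(0)
samples = [arrhenius(rng.gauss(E_A_MEAN, E_A_STD), T) for _ in range(100_000)]
k_mean = statistics.fmean(samples)
k_median = statistics.median(samples)

print(f"nominal rate:        {k_nominal:.3e}")
print(f"linear 1-sigma band: {k_nominal - sigma_linear:.3e} to {k_nominal + sigma_linear:.3e}")
print(f"Monte Carlo mean:    {k_mean:.3e}")
print(f"Monte Carlo median:  {k_median:.3e}")
```

Under these assumptions the linear one-sigma band extends below zero, which is unphysical for a rate, while the sampled rates are all positive and strongly skewed (the mean sits well above the median), so the output is not well described by a normal distribution.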

mkhorton · 2019-04-03T19:24:41Z

Ah, I understand what you mean now! Thanks for the clarification. Yes, I see, a linear uncertainty would definitely be inappropriate.

clegaspi · 2019-04-03T22:10:34Z

Thank you for your feedback, Max! I agree that this seems like a reasonable change, and that units should be attached to the value rather than being part of the Quantity.

I'm envisioning Quantity as a container holding:

- a symbol object, which contains:
  - a descriptor object holding information about the required data type, shape, unit dimensionality
  - an iterable of constraint objects which apply to the symbol (perhaps this could just be included in the descriptor object)
- a value object, which contains:
  - a representation of uncertainty (numerical, ordinal)
  - a representation of units
  - a descriptor object which must match the symbol descriptor
- a provenance object
- a representation of conditions (object, iterable of objects?) like temp, pressure, etc. (future work)
- other metadata

The new quantity object would be responsible for ensuring that the value object's descriptor matches the symbol object's descriptor and that the value meets the constraints in the symbol, not entirely unlike its current functionality.
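A rough sketch of the container described above, using dataclasses. All class names, field names, and types here are hypothetical placeholders chosen for illustration, not the project's actual API; conditions are omitted since they are marked as future work.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Descriptor:
    """Required data type, shape, and unit dimensionality."""
    dtype: type
    shape: Tuple[int, ...] = ()
    dimensionality: str = ""


@dataclass(frozen=True)
class Symbol:
    name: str
    descriptor: Descriptor
    # Each constraint is a predicate on the raw magnitude.
    constraints: Sequence[Callable[[Any], bool]] = ()


@dataclass(frozen=True)
class Value:
    magnitude: Any
    descriptor: Descriptor
    units: str = ""
    uncertainty: Optional[float] = None


@dataclass
class Quantity:
    """Composes a symbol, a value, and metadata rather than inheriting from any of them."""
    symbol: Symbol
    value: Value
    provenance: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # The quantity, not the value, enforces consistency with the symbol.
        if self.value.descriptor != self.symbol.descriptor:
            raise ValueError("value descriptor does not match symbol descriptor")
        for constraint in self.symbol.constraints:
            if not constraint(self.value.magnitude):
                raise ValueError(f"value violates a constraint of {self.symbol.name}")


# Usage: a non-negative scalar energy.
scalar_energy = Descriptor(dtype=float, dimensionality="energy")
band_gap = Symbol("band_gap", scalar_energy, constraints=(lambda x: x >= 0,))
q = Quantity(band_gap, Value(1.1, scalar_energy, units="eV"), provenance="example")
```

Because `Value` is a separate member, it could later be swapped for a distribution-valued class without touching the provenance or symbol handling in `Quantity`.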

Am I picking up what you're putting down?

maxhutch · 2019-04-04T03:25:30Z

Basically. What's the utility of having a descriptor in the value object beyond what you already get from the quantity containing a symbol which itself contains a descriptor? I'm with you on everything else.

clegaspi mentioned this issue Apr 3, 2019

Symbol overhaul: Add support for enumeration-type symbols #211

Open
