Separate the "value" from a Quantity #223

Open
maxhutch opened this issue Apr 3, 2019 · 7 comments

Comments

@maxhutch

maxhutch commented Apr 3, 2019

Consistent with my suggestion for using composition rather than inheritance for Symbol, I think it is a cleaner way of dealing with Quantities: have a Quantity store provenance, symbol type, tags, and any other meta-data that is added in the future while leaving the shape and numerical information in a member. If you decide to support distribution-valued quantities in the future, this will help keep the distribution class hierarchy separate, or let you adopt one from another package. This is also what we do, fwiw.
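A minimal sketch of the composition idea, for illustration only. All class and field names here (`PointValue`, `NormalValue`, `Quantity`, and so on) are hypothetical and are not taken from the project's code; the point is only that the metadata container does not need to know which kind of value it holds.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


class Value(Protocol):
    """Hypothetical interface: anything that can report a central estimate."""

    def mean(self) -> float: ...


@dataclass(frozen=True)
class PointValue:
    """An exactly known number."""
    number: float

    def mean(self) -> float:
        return self.number


@dataclass(frozen=True)
class NormalValue:
    """A normally distributed value; other distributions could be added
    (or borrowed from another package) without touching Quantity."""
    mu: float
    sigma: float

    def mean(self) -> float:
        return self.mu


@dataclass
class Quantity:
    """Holds metadata and *has a* value, rather than *being* a value."""
    symbol: str
    value: Value
    provenance: Dict[str, Any] = field(default_factory=dict)
    tags: List[str] = field(default_factory=list)


exact = Quantity("band_gap", PointValue(1.1))
fuzzy = Quantity("band_gap", NormalValue(1.1, 0.2), tags=["ml-predicted"])
```

Both quantities expose the same metadata fields, while the value hierarchy can grow independently.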

It's not entirely clear to me where units belong in this picture. Right now we treat them as meta-data (in the Quantity object, not the value object), and treat non-numeric data as dimensionless (units=""). This simplifies the math a little: we can convert everything into a standard unit system, drop the units, do a bunch of math, and then convert them back to whatever the user wants to see. We don't have access to Pint on our backend, though; if we had, we might have done things differently.
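The "convert, drop units, compute, convert back" workflow could look roughly like the sketch below. It deliberately avoids Pint, and the conversion table and helper names are assumptions made up for this example, not anyone's actual backend.

```python
import math

# Hypothetical factors that map a unit onto a standard system (SI here).
# The empty string stands for dimensionless / non-numeric data.
TO_STANDARD = {"m": 1.0, "cm": 1e-2, "mm": 1e-3, "s": 1.0, "ms": 1e-3, "": 1.0}


def to_standard(magnitude: float, unit: str) -> float:
    """Convert to the standard system and return a bare float."""
    return magnitude * TO_STANDARD[unit]


# 1. Convert inputs into standard units and drop the units.
distance = to_standard(250.0, "cm")  # metres, as a plain float
duration = to_standard(500.0, "ms")  # seconds, as a plain float

# 2. Do the math on plain floats.
speed_si = distance / duration  # metres per second

# 3. Convert the result back to what the user wants to see (cm/ms).
speed_user = speed_si / (TO_STANDARD["cm"] / TO_STANDARD["ms"])

# Same answer as dividing the original numbers directly: 250 cm / 500 ms.
assert math.isclose(speed_user, 250.0 / 500.0)
```

The cost of this design is that the math in step 2 has no protection against dimensionally inconsistent operations; that check has to live in the Quantity layer.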

@mkhorton
Collaborator

mkhorton commented Apr 3, 2019

This seems reasonable to me -- thoughts everyone?

I think in this framework, units probably correctly belong with the value (thanks to pint), but are also metadata with the symbol, so there is unfortunately some duplication there.

Can I ask you to clarify what you mean by "distribution-valued quantities"? If you mean uncertainties, that comes from existing integration with https://pythonhosted.org/uncertainties/ which works "for free", whereas if you mean actual data tables, that's a bit trickier.

@maxhutch
Author

maxhutch commented Apr 3, 2019

I did mean "uncertainties". That package makes the linear approximation, which fails pretty badly in my experience. If you're integrating with machine learning, the uncertainties will not be "small", and many physical models are highly non-linear (e.g. Arrhenius). mcerp and soerp might work well enough; I don't have experience with them or their approximation methods.

@mkhorton
Collaborator

mkhorton commented Apr 3, 2019

Hmm, I was thinking that something like an Arrhenius model wouldn't output a single number ("distribution") but rather a range of discrete values (e.g. for a range of temperatures). Outputting an actual distribution does seem a lot more powerful, but what would be the functional form of this? (A Python lambda?) How would you serialize such a distribution?

mcerp looks interesting -- I'm also not familiar, so would have to read up. We'd have to figure out how to integrate that though even if we did want to use it.

@maxhutch
Author

maxhutch commented Apr 3, 2019

Consider the case where the activation energy is 1.0 +/- 0.05. Given an exactly known temperature, the reaction rate would be uncertain because the activation energy is uncertain. Because the uncertainty is in the argument of an exponential, the linear theory will be junky and the reaction rate won't be normally distributed. There are lots of different ways to deal with this; I don't think you need to solve that problem right now. I was just trying to highlight it as something you might run into and, when you do, separation of the "value" from the "quantity" might be helpful.
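To make the Arrhenius example concrete, here is a rough numerical sketch comparing first-order (linear) propagation with plain Monte Carlo sampling. The units (eV), the temperature of 300 K, and the unit prefactor are assumptions chosen for illustration; the comment above only specifies 1.0 +/- 0.05.

```python
import math
import random
import statistics

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius(ea: float, temperature: float, prefactor: float = 1.0) -> float:
    """Reaction rate k = A * exp(-Ea / (k_B * T))."""
    return prefactor * math.exp(-ea / (K_B * temperature))


EA_MEAN, EA_SIGMA = 1.0, 0.05  # activation energy, assumed to be in eV
T = 300.0                      # exactly known temperature in K (assumed)

k0 = arrhenius(EA_MEAN, T)  # rate at the nominal activation energy

# Linear propagation: sigma_k ~= |dk/dEa| * sigma_Ea = k0 * sigma_Ea / (k_B * T).
# The relative sigma comes out near 1.9, so "k0 - sigma_k" is already negative,
# which is unphysical for a rate.
linear_rel_sigma = EA_SIGMA / (K_B * T)

# Monte Carlo: sample Ea from a normal distribution and push each sample
# through the model. The resulting rate is log-normal, not normal.
rng = random.Random(0)
samples = [arrhenius(rng.gauss(EA_MEAN, EA_SIGMA), T) for _ in range(100_000)]

mc_mean_ratio = statistics.fmean(samples) / k0     # well above 1: heavy right tail
mc_median_ratio = statistics.median(samples) / k0  # close to 1

print(f"linear relative sigma: {linear_rel_sigma:.2f}")
print(f"MC mean / k0:   {mc_mean_ratio:.2f}")
print(f"MC median / k0: {mc_median_ratio:.2f}")
```

The mean and median of the sampled rate differ by a large factor, which a symmetric "k0 +/- sigma" summary cannot represent.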

@mkhorton
Collaborator

mkhorton commented Apr 3, 2019

Ah, I understand what you mean now! Thanks for the clarification. Yes, I see, a linear uncertainty would definitely be inappropriate.

@clegaspi
Contributor

clegaspi commented Apr 3, 2019

Thank you for your feedback, Max! I agree that this seems like a reasonable change, and that units should be attached to the value rather than stored as part of the Quantity.

I'm envisioning Quantity as a container holding:

  • a symbol object, which contains:
    • a descriptor object holding information about the required data type, shape, unit dimensionality
    • an iterable of constraint objects which apply to the symbol (perhaps this could just be included in the descriptor object)
  • a value object, which contains:
    • a representation of uncertainty (numerical, ordinal)
    • a representation of units
    • a descriptor object which must match the symbol descriptor
  • a provenance object
  • a representation of conditions (object, iterable of objects?) like temp, pressure, etc. (future work)
  • other metadata

The new quantity object would be responsible for ensuring that the value object's descriptor matches the symbol's descriptor and that the value meets the symbol's constraints, not entirely unlike its current functionality.
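The outline above might be sketched as follows. This is only an illustration of the proposed layout: every class, field, and the representation of constraints as plain callables is hypothetical, and conditions and other metadata are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Descriptor:
    """Required data type, shape, and unit dimensionality."""
    dtype: type
    shape: Tuple[int, ...]
    dimensionality: str  # e.g. "[energy]"


@dataclass(frozen=True)
class Symbol:
    name: str
    descriptor: Descriptor
    # Each constraint is a predicate on the value's magnitude (an assumption).
    constraints: Tuple[Callable[[Any], bool], ...] = ()


@dataclass(frozen=True)
class Value:
    magnitude: Any
    units: str
    descriptor: Descriptor
    uncertainty: Optional[float] = None


@dataclass
class Quantity:
    symbol: Symbol
    value: Value
    provenance: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The container, not the value, enforces descriptor and constraints.
        if self.value.descriptor != self.symbol.descriptor:
            raise ValueError(f"descriptor mismatch for {self.symbol.name}")
        for constraint in self.symbol.constraints:
            if not constraint(self.value.magnitude):
                raise ValueError(f"constraint violated for {self.symbol.name}")


scalar_energy = Descriptor(float, (), "[energy]")
band_gap = Symbol("band_gap", scalar_energy, constraints=(lambda x: x >= 0,))

ok = Quantity(band_gap, Value(1.1, "eV", scalar_energy))
```

Constructing a `Quantity` with a negative magnitude, or with a value whose descriptor differs from the symbol's, raises `ValueError` in this sketch.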

Am I picking up what you're putting down?

@maxhutch
Author

maxhutch commented Apr 4, 2019

Basically. What's the utility of having a descriptor in the value object beyond what you already get from the quantity containing a symbol which itself contains a descriptor? I'm with you on everything else.


3 participants