Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lack of precision can cause incorrect conversions #380

Open
JBorrow opened this issue Feb 27, 2023 · 9 comments
Open

Lack of precision can cause incorrect conversions #380

JBorrow opened this issue Feb 27, 2023 · 9 comments

Comments

@JBorrow
Copy link
Contributor

JBorrow commented Feb 27, 2023

  • unyt version: 2.9.5 (tested 2.9.0 too)
  • Python version: 3.9.7
  • Operating System: MacOS

Description

When working with data in float32 around 1e10 (far from precision limit), unyt incorrectly converts units:

What I Did


import unyt
import numpy as np

arr = unyt.unyt_array(
    (10 ** (np.random.rand(128) * 10 - 5)).astype(np.float32), units="73833316470723*Msun/5000"
)

print("Base")
print("%g" % arr[arr > 1e13 * unyt.Msun].min().to("Msun"))
print("Float64")
print("%g" % arr[arr.astype(np.float64) > 1e13 * unyt.Msun].min().to("Msun"))
print("Value")
print("%g" % arr[arr.to("Msun").v > 1e13].min().to("Msun"))
print("Post-conversion")
arr.convert_to_units("Msun")
print("%g" % arr[arr > 1e13 * unyt.Msun].min().to("Msun"))

Output

Base
261694
Float64
1.0247e+13
Value
1.0247e+13
Post-conversion
/Users/jborrow/.pyenv/versions/3.9.7/lib/python3.9/site-packages/unyt/array.py:1902: RuntimeWarning: overflow encountered in multiply
  inp1 = np.asarray(inp1, dtype=new_dtype) * conv
1.82121e+08

These units might seem weird, but they are pretty much what just comes out of typical h and a-dependent 1e10 Msun units... As the 'value' version shows, this should work!

@JBorrow
Copy link
Contributor Author

JBorrow commented Feb 27, 2023

The thing here that worries me is that 1e13 msun in those units is ~677, far, far, far from the precision limit in the array!

@ngoldbaum
Copy link
Member

ngoldbaum commented Feb 27, 2023

Maybe you want to use a UnitSystem?

In [27]: us = unyt.unit_systems.UnitSystem(name="my_unit_system", mass_unit=73833316470723*unyt.Msun/5000, length_unit='m', time_unit='s')

In [28]: m = (1e13*unyt.Msun).to(us['mass'])

In [29]: (arr[arr > m].min()).to('Msun')
Out[29]: unyt_quantity(1.3776737e+13, dtype=float32, units='Msun')

unyt.Msun is stored in MKS units, if you need physical constants in a different unit system, you might find unyt.unit_systems.add_constants and unyt.unit_systems.add_symbols to be useful for creating containers for unit objects that have a custom unit system.

@JBorrow
Copy link
Contributor Author

JBorrow commented Feb 27, 2023

I see that this works, but I am not sure why?

There is enough precision 'in the system' here to do all unit conversions... Why do I get an array full of 1s when I do the comparison?

@ngoldbaum
Copy link
Member

The issue is that unyt.Msun is in kg, so unyt tries to convert arr to kg to match, and that causes an overflow.

Why do I get an array full of 1s when I do the comparison?

Not sure what this is referring to.

@JBorrow
Copy link
Contributor Author

JBorrow commented Feb 27, 2023

Ah so everything is converted to kg, and then compared? This is not ideal, you're right.

@MatthieuSchaller
Copy link

Would it make sense to convert the LHS and RHS to the largest common denominator of the same unit (here mass) to do the comparison?

@ngoldbaum
Copy link
Member

Maybe? There would likely be a performance cost to do that check. Currently no effort is made to avoid overflows when doing these sorts of conversions.

@JBorrow
Copy link
Contributor Author

JBorrow commented Feb 27, 2023

Surely it would be cheaper? As you only have to convert one of the items rather than both?

Maybe? There would likely be a performance cost to do that check. Currently no effort is made to avoid overflows when doing these sorts of conversions.

@christopherlovell
Copy link

christopherlovell commented Sep 6, 2023

This has also become an issue for us on another project using galactic scale (> 1e10 Msun) masses. The solution was to use the galactic base class directly on the units themselves, rather than converting the composite

import numpy as np
import unyt as u

x = np.array([1e14]).astype(np.float32)
u.unyt_array(x * u.Msun)

> [inf] kg

u.unyt_array(x, units=u.Msun.in_base('galactic'))

> [1.e+14] 1.0*Msun

For Josh's example above, you need to consistently use this whenever the units are referenced

import unyt
import numpy as np

arr = unyt.unyt_array(
    (10 ** (np.random.rand(128) * 10 - 5)).astype(np.float32),
    units=73833316470723*u.Msun.in_base('galactic')/5000
)

print("Base")
print("%g" % arr[arr > 1e13 * unyt.Msun.in_base('galactic')].min().to(u.Msun.in_base('galactic')))
print("Float64")
print("%g" % arr[arr.astype(np.float64) > 1e13 * unyt.Msun].min().to(u.Msun.in_base('galactic')))
print("Value")
print("%g" % arr[arr.to("Msun").v > 1e13].min().to(u.Msun.in_base('galactic')))
print("Post-conversion")
arr.convert_to_units(u.Msun.in_base('galactic'))
print("%g" % arr[arr > 1e13 * unyt.Msun.in_base('galactic')].min().to("Msun"))

Output:

Base
1.46087e+13
Float64
1.46087e+13
Value
1.46087e+13
Post-conversion
1.46087e+13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants