Support for NA_units_ #163

billdenney · 2018-07-25T02:49:07Z

I just had a bug in my code, but it was hard to track down due to an unclear error.

What I just did was, in essence, the following:

> set_units(NA_real_, NA_character_, mode="standard")
Error: is.language(x) is not TRUE

I thought that I was getting the error because I wasn't using mode="standard", but it really was due to the NA_character_.

I have two thoughts:

If both the x and value options to set_units are NA, could NA be returned? (That could easily be a bad idea for several reasons, and I don't have a strong opinion if the answer is a resounding no.)
Could the Error: is.language(x) is not TRUE error be clarified? I thought I was making it clear that I wasn't sending in a language object when I gave NA_character_, so the error didn't make sense to me. (Admittedly, I wasn't intentionally sending in NA_character_.)

A simple fix for option 2 above could possibly be changing the stopifnot(is.character(x), length(x) == 1) at the top of as_units.character to stopifnot(is.character(x), length(x) == 1, is.na(x)).

The text was updated successfully, but these errors were encountered:

billdenney · 2018-07-25T02:53:45Z

As I was fuzzing the interface after the above error, I found that this error could also be clarified:

set_units(NA_real_, NA, mode="standard")
Error in if (any(unclass(value) != 1)) warning(paste("numeric value",  : 
  missing value where TRUE/FALSE needed

Enchufa2 · 2018-07-25T07:59:12Z

IMO, a missing value should be valid; a missing unit... has no meaning. So I would like to get an error, but definitively a more informative one. NAs and NULLs are tricky when it comes to if statements. Thanks for spotting it!

edzer · 2018-07-28T14:33:51Z

(sorry, committed this in a branch; will soon be merged)

billdenney · 2018-08-01T13:33:55Z

As I'm working with more real data, I'm finding that this is a common issue for me. I will relatively often have a vector where there are instances of combined NA as both the magnitude and the units.

What do you think about defining a new constant, NA_units_ (similar to what is done in lubridate for dates).

In effect, NA_units_ would allow for NA as the value for the unit part while normal units would not allow that. It would require that the magnitude part is NA. All mathematical operations (+, -, *, /, and the % relatives) with NA_units_ would return NA_units_.

Enchufa2 · 2018-08-01T13:52:49Z

Could you provide an example? Are there any instances of a non-NA number with a NA unit? Because, otherwise, we could simply support NA in Ops.units by returning NA if any operand is NA.

billdenney · 2018-08-01T14:18:44Z

What I'm working with right now is clinical laboratory data in a clinical study. When a test isn't done for some reason (for example if the patient didn't come in for a scheduled visit), then the value and units for the test are explicitly missing (like d_missing below).

Immediately, I'm just trying to setup the data, and it looks like the following:

d_no_missing <-
  data.frame(Patient_ID=1:3,
             Lab_Name=c("Glucose", "Cholesterol", "Cholesterol"),
             Lab_Value=c(5, 100, 200),
             Lab_Units=c("mmol/L", "mg/dL", "mg/dL"),
             stringsAsFactors=FALSE)
d_missing <- d_no_missing
d_missing[2, c("Lab_Value", "Lab_Units")] <- NA

d_no_missing$Value_with_Units <-
  mixed_units(d_no_missing$Lab_Value,
              d_no_missing$Lab_Units,
              mode="standard")
d_missing$Value_with_Units <-
  mixed_units(d_missing$Lab_Value,
              d_missing$Lab_Units,
              mode="standard")

Enchufa2 · 2018-08-01T15:13:42Z

I see. Then one path would be to enable missing units. It requires some rework of the functions that manage the units attribute, but everything else would work seamlessly. Another path would be to return plain NAs and make units work with them. Let's see what @edzer thinks.

edzer · 2018-08-04T11:17:14Z

Great thought. I see the convenience for your use case, but I'm not convinced we should require for NA_units_ units that the magnitude is only allowed to be NA. I see numeric values in units computations as actual values with implicitly missing units. If we allow for NA_units_,

> set_units(1, km/h) * 2
2 [km/h]

should actually return 2 [NA].

Enchufa2 · 2018-08-04T12:16:30Z

I'm not convinced we should require for NA_units_ units that the magnitude is only allowed to be NA.

Me neither. If NA_units_ is allowed, we should go for the whole pack, and allow this to be just another (missing) unit. But

set_units(1, km/h) * 2 should actually return 2 [NA].

I'm not convinced about this. It is quite a change. Where is the boundary between unitless and missing units? It is not clear to me. I wouldn't change the behaviour based on a fuzzy boundary just to support a corner case. Moreover, unitless is a units concept, but missing is an R concept; so unitless is more general for this topic. Therefore, I would default to unitless (current behaviour) unless NA is explicitly specified.

In other words, for me, 2 (isolated, by default) has no units. Unless I have more information; unless I know that there is some unit, but I don't know which one it is, so I explicitly should have to specify set_units(2, NA).

edzer · 2018-08-04T12:29:07Z

I agree that would be quite a change, and I was thinking about making this optional. This

> set_units(1, km/h) * 1
1 [km/h]
> set_units(1, km/h) + 1
Error in Ops.units(set_units(1, km/h), 1) : 
  both operands of the expression should be "units" objects

however annoys me. unitless is a unit; no unit implies a missing unit. Having the option to catch (and break on) the case where plain numbers are elevated to unitless would be good. See here.

Enchufa2 · 2018-08-04T12:44:57Z

unitless is a programming construct that is needed to deal with a programming issue: the case of losing units as a result of operations with units that cancel out. So it is not a unit, conceptually; it literally means no units.

Missing implies different things. Missing means that you know, actually, that there is some unit associated with that data, but it is not known or went missing for some reason. It is quite different to me.

Enchufa2 · 2018-08-04T12:46:20Z

Oh, and I think that the error from set_units(1, km/h) + 1 should be the same as what you get from set_units(1, km/h) + set_units(1, 1), i.e.: Error: cannot convert 1 into km/h.

billdenney · 2018-08-04T15:25:23Z

The conversation morphed a bit. I think that there are a few different items here:

Values that have no units (like the number 2 as the example, 2 * 1 km = 2 km).
Values with an unknown magnitude (of NA_real_) and known units, like NA km.
Values with an unknown magnitude and unknown units.

The first is supported in the current library as is, and it should not change in my opinion.

The second is also already supported in the current library, too, and in my opinion, it should not change.

The third is not supported in the current library, and it supports several operations— mainly around the combination of mixed units. The example I gave above is my typical use case. Another example use case is c() with a mixed_units object and NA_real_ (or probably any NA).

I don’t think that it’s sensible to have a non-missing magnitude and an NA unit. For example, set_units(2, NA, mode=“standard”) seems like it should be an error, as it currently is. (There could be a use case that I’m not thinking of, but to me, that is what numeric classes are for.)

Enchufa2 · 2018-08-21T12:41:25Z

Automatically closed by 8c6bda8, which doesn't allow missing units, so I'm reopening this.

paulponcet · 2019-04-30T17:26:01Z

I would like to add the following use case to support the implementation of something like NA_unit_. If I read data in a table in a database, the table may (unfortunately, but that is real life) not be well documented, hence I may know the physical meaning of each column and not know the effective unit of each column. In that case, I would like to specify a missing unit for these columns, while waiting for more information from the database holder.

paulponcet · 2020-03-21T17:14:07Z

Dear package maintainers, do you see any chance that this proposal be accepted sooner or later? I would see this notion of missing unit NA_unit as a valuable feature for this package, and it seems to me that the implementation would not be too difficult.

In my view it should first be clear that:

"unitless" and "missing unit" are two different things, as highlighted in some comments above.
a vector with a missing unit can actually be unitless.

Now, say we have a vector x (unitless, or with a non-missing unit, or with a missing unit) and a vector y with missing unit. Then:

x * y, y * x, x/y, y/x, x^y, y^x should have a missing unit;
x + y and x - y should not be allowed, since without a known unit we don't know whether x and y have the same physical dimension.

Enchufa2 · 2020-03-23T09:04:24Z

This is still open for two main reasons:

The implementation is not so straightforward because udunits doesn't support missing units, so they need to be handled as a separate case in R, before pushing stuff to udunits. So it requires some care and additional testing to avoid breaking current behaviour.
Re-reading the thread, I'd say we didn't reach consensus.

However, it's easier to gather consensus around something that works, so we could use a PR to refine the proposal. :)

billdenney · 2021-10-13T13:47:13Z

I'm coming back to this issue as I'm working to develop some additional unit conversion tools for laboratory data. I ahve a thought about implementation. What if we define a unit during package load called "NA_units_" as a new base unit:

While that would not require a specific magnitude, it would be a simple way to accomplish the goal of a unit value that is NA. Would that be an acceptable method? The downside I see to the above is that it would be confusing because there would be a unit named "NA_units_" and a constant defined named "NA_units_", but it would still work as expected, as far as I can see:

The next question would be: How should we handle calls to is.na()? Currently, only the magnitude part appears to be examined. Should we define is.na.units() to look for the units to be NA_units_ in addition to an NA magnitude? I'd suggest that we simply keep using the magnitude check and ignore the units part for is.na().

library(units)
#> udunits database from C:/Users/Bill Denney/Documents/R/win-library/4.1/units/share/udunits/udunits2.xml
install_unit(symbol="NA_units_")
NA_units_ <- set_units(NA_real_, "NA_units_", mode="standard")

set_units(1, "NA_units_")
#> 1 [NA_units_]
set_units(2, NA_units_)
#> 2 [NA_units_]

^{Created on 2021-10-13 by the reprex package (v2.0.1)}

edzer closed this as completed in 8c6bda8 Aug 21, 2018

Enchufa2 reopened this Aug 21, 2018

edzer added the enhancement label Sep 11, 2018

Enchufa2 mentioned this issue Apr 30, 2019

Allow missing unit? #203

Closed

Enchufa2 changed the title ~~Unclear Error With Invalid Input to set_units~~ Support for NA_units_ Mar 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for NA_units_ #163

Support for NA_units_ #163

billdenney commented Jul 25, 2018 •

edited

Loading

billdenney commented Jul 25, 2018

Enchufa2 commented Jul 25, 2018 •

edited

Loading

edzer commented Jul 28, 2018

billdenney commented Aug 1, 2018

Enchufa2 commented Aug 1, 2018

billdenney commented Aug 1, 2018

Enchufa2 commented Aug 1, 2018

edzer commented Aug 4, 2018

Enchufa2 commented Aug 4, 2018

edzer commented Aug 4, 2018

Enchufa2 commented Aug 4, 2018

Enchufa2 commented Aug 4, 2018

billdenney commented Aug 4, 2018 •

edited

Loading

Enchufa2 commented Aug 21, 2018

paulponcet commented Apr 30, 2019

paulponcet commented Mar 21, 2020 •

edited

Loading

Enchufa2 commented Mar 23, 2020

billdenney commented Oct 13, 2021 •

edited by Enchufa2

Loading

Support for NA_units_ #163

Support for NA_units_ #163

Comments

billdenney commented Jul 25, 2018 • edited Loading

billdenney commented Jul 25, 2018

Enchufa2 commented Jul 25, 2018 • edited Loading

edzer commented Jul 28, 2018

billdenney commented Aug 1, 2018

Enchufa2 commented Aug 1, 2018

billdenney commented Aug 1, 2018

Enchufa2 commented Aug 1, 2018

edzer commented Aug 4, 2018

Enchufa2 commented Aug 4, 2018

edzer commented Aug 4, 2018

Enchufa2 commented Aug 4, 2018

Enchufa2 commented Aug 4, 2018

billdenney commented Aug 4, 2018 • edited Loading

Enchufa2 commented Aug 21, 2018

paulponcet commented Apr 30, 2019

paulponcet commented Mar 21, 2020 • edited Loading

Enchufa2 commented Mar 23, 2020

billdenney commented Oct 13, 2021 • edited by Enchufa2 Loading

billdenney commented Jul 25, 2018 •

edited

Loading

Enchufa2 commented Jul 25, 2018 •

edited

Loading

billdenney commented Aug 4, 2018 •

edited

Loading

paulponcet commented Mar 21, 2020 •

edited

Loading

billdenney commented Oct 13, 2021 •

edited by Enchufa2

Loading