deprecate install_conversion_constant #89

edzer · 2018-01-18T11:07:56Z

It is clear from #71, #84, and my experiments in the udunits branch that user-defined units are better handled by the udunits2 C library than by R. I removed the R layer of installing user-defined units; this removes the flexibility to define any R function for conversion, which has the advantage that we don't have to verify that functions are meaningful (monotonous), which we didn't. Also, the udunits2 C api does give, I think, all meaningful operations, which we can now directly interface.

I propose to drop install_conversion_constant in favor of install_conversion_scale, which complements install_conversion_offset. The problem with _constant is that it doesn't make clear whether the constant refers to a scaling or an addition of a constant; _scale and _offset are used in the C API.

I also propose the new semantics:

install_conversion_scale("m", "thousandm", 1000)

creates thousandm where X thousandm equal 1000 X m. This is exactly opposite to what install_conversion_constant now does. Also, thousandm should not already be bound to a unit, and m should be an existing unit.

Similarly:

install_conversion_offset("K", "my_temp_C", 273.15)

creates the new my_temp_C as values in K + 273.15.

The function profile is (from,to,value), and returns to=from*value or to=from+value.

The text was updated successfully, but these errors were encountered:

Enchufa2 · 2018-01-18T12:04:59Z

Are you planning to drop the udunits2 dependency in the next release?

edzer · 2018-01-18T12:06:47Z

If you and @t-kalinowski both agree this is a good idea, yes.

Enchufa2 · 2018-01-18T12:54:16Z

Completely agree. I would like to redesign the C/C++ part, but I may not have time depending on your roadmap for the next release. Anyway, as it is, it works, so the redesign can wait.

edzer · 2018-01-18T14:03:03Z

Agreed; something possibly more urgent now: I managed to get memory leaks to zero, but still see, when units (udunits branch) is loaded:

edzer@gin-pebesma:~/git$ R --no-restore --no-save -q -d valgrind
==24394== Memcheck, a memory error detector
==24394== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==24394== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==24394== Command: /usr/lib/R/bin/exec/R --no-restore --no-save -q
==24394== 
> library(units)

Attaching package: ‘units’

==24394== Conditional jump or move depends on uninitialised value(s)
==24394==    at 0x40300A9: ???
==24394==    by 0x142FE99F: ???
==24394==    by 0x142FE99F: ???
==24394==    by 0x142FE9A2: ???
==24394==    by 0xFFEFF8FCF: ???
==24394==  Uninitialised value was created by a heap allocation
==24394==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24394==    by 0x4F7D2E3: ??? (in /usr/lib/R/lib/libR.so)
==24394==    by 0x4F7ECC1: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==24394==    by 0x4F2A73A: Rf_mkCharLenCE (in /usr/lib/R/lib/libR.so)
==24394==    by 0x4F808C7: Rf_install (in /usr/lib/R/lib/libR.so)
==24394==    by 0x4F81004: ??? (in /usr/lib/R/lib/libR.so)
==24394==    by 0x4F73167: setup_Rmainloop (in /usr/lib/R/lib/libR.so)
==24394==    by 0x4F74E28: Rf_mainloop (in /usr/lib/R/lib/libR.so)
==24394==    by 0x10887A: main (in /usr/lib/R/bin/exec/R)
==24394== 
The following object is masked from ‘package:base’:

    %*%

and have no clue what to do to find out what is causing this.

Enchufa2 · 2018-01-18T14:32:30Z

Interesting. Taking a look.

edzer · 2018-01-18T14:41:31Z

ac7bce0 simplifies the XPtr structure; neither that, nor removing .onLoad resolve the valgrind error.

Enchufa2 · 2018-01-18T15:44:58Z

I'd say this is an R problem, because I see the same report with the master branch, where there is no compiled code.

edzer · 2018-01-19T23:15:39Z

All CRAN versions before 0.5-0 ("overhaul"), or any other packages I'm involved in don't have it.

t-kalinowski · 2018-01-19T23:21:35Z

I'll try to post a longer response later this weekend.
My first thought is that it's related to the top-level environment that was introduced
here

Enchufa2 · 2018-01-20T00:15:41Z

The errors package reports the same on load. The thing in common is the matmult operator: I've removed it, reinstalled, and the conditional jump disappears.

Enchufa2 · 2018-01-20T00:19:21Z

@edzer Oh, you already found it. :) I'd say this confirms that this is an R problem. Is suppressing support for matrix multiplication the best approach?

edzer · 2018-01-20T08:31:09Z

I guess it was a false positive that would have gone away when I had built R instrumented with valgrind (see here), since CRAN results also doesn't list it.

I uncommented %*% for now but we can add it back later. I'm not fond of overwriting base functions and of the warning messages it generates, an alternative could be to introduce %u*% or so; I also don't know whether the way we now have a single unit for a complete matrix is what we want - thinking of covariance matrices.

edzer · 2018-01-20T09:27:28Z

Confirmed: when I build R after ./configure --with-valgrind-instrumentation=2 then it doesn't give the valgrind error we see above.

t-kalinowski · 2018-01-21T01:58:11Z

Hi @edzer,

Glad the memory leak was diagnosed (and fixed? or it never really existed?).

Responding to the original issue, I really like the idea of removing install_conversion_constant() (and, i'm guessing install_conversion_function()?. However, I don't yet have an opinion about what the best user facing API to replace those functions should be.

I downloaded the udunits branch and I played around with it. I'm excited about the functionality but ambivalent about the interface. I suspect that rather than making a new family of functions install_conversion_{scale, offset, (and in time:..invert,raise,log}, feels both cluttered (lots of functions) and limiting because it doesn't allow for any composition (e.g, if you want to apply a scale and an offset, you need an intermediate "tmp" unit). Ultimately, I think that the cleanest interface might be to roll this all into install_symbolic_unit(), and accept a compact lambda syntax for defining units. I'm imagining something like

install_symbolic_unit(Kelvin = degree_C - 273.15)
install_symbolic_unit(Newton = m*kg / s^2)

With the example above, there are still quite a few details to work out, but I'm hoping for something that simple.

In order to get started on this, what do you think about making a set of functions internal to package:units that are thin wrappers around the udunits C API (with the naming convetion from package:udunits maintained). Scanning through the API these jump out as being useful primitives:

ud.new.dimensionless.unit()
ud.new.base.unit() # i'm guessing this will help solve the ud.are.convertable("cell", "rad") problem?

I think having this set of R functions that accept and return pointers to a ut_unit functions would be powerful.

ud.scale()
ud.offset()
ud.invert()
ud.raise()
ud.root()
ud.log()

and a set of functions to go from pointer <--> character

ud.parse()
ud.format()  # also, this looks like it'll be nice to replace 
# as.character.units(), allowing to select output in unit symbols or names
ud.map.unit.to.name() or ud.map.name.to.unit() # I don't get the difference

With this set of primitives, the examples above might be something like this:

install_symbolic_unit(Kelvin = degree_C - 273.15) # would in essence do: 
ud.parse("degree_C @ 273.15") %>% ud.map.unit.to.name("Kelvin")

install_symbolic_unit(Newton = m*kg / s^2)  # would in essence do: 
ud.parse("m*kg / s^2") %>% ud.map.unit.to.name("Newton")

t-kalinowski · 2018-01-21T02:02:47Z

Actually, I wonder if we even need the full set of primitives I listed above. We could probably just get by with:

ud.parse(chr) # returns a pointer to a ut_unit
ud.map.unit.to.name(ut_unit_ptr, chr) 
ud.new.base.unit(chr)
ud.new.dimensionless.unit(chr)

ud.parse should allow us to offload all potential uses of the ud.scale, ud.offset, ud.invert... to the C library.

t-kalinowski · 2018-01-21T04:12:24Z

so, I think this is much simpler than I originally though if we depend on ut_parse()
I now get:

> ptr <- R_ut_parse("degC @ 274")
> R_ut_map_name_to_unit("mykelvin", ptr)
<pointer: 0x4991420>
> 
> set_units(5, mykelvin) %>% set_units(degC)
279 °C
> 
> 
> ptr <- R_ut_parse("m*kg / s^2")
> R_ut_map_name_to_unit("myNewton", ptr)
<pointer: 0x5594030>
> set_units(5, myNewton) + set_units(5, Newton)
10 N

my corresponding cpp function is very simple:

// [[Rcpp::export]]
XPtrUT R_ut_map_name_to_unit( CharacterVector name, SEXP inunit) {
  ut_unit *unit = ut_unwrap(inunit);
  if (ut_map_name_to_unit(name[0], enc, unit) != UT_SUCCESS)
    handle_error("R_ut_add_scale");
  return ut_wrap(unit);
}

t-kalinowski · 2018-01-21T14:16:47Z

Here is a quick mockup of a potential implementation and interface with this approach.
Basic and most common usage examples:

install_symbolic_unit("cell") # new base unit by default
install_symbolic_unit("CFU", "cell * 4")

I'm hoping that any valid syntax accepted by the C library will work and we can just reproduce this table in the documentation: https://www.unidata.ucar.edu/software/udunits/udunits-2.0.4/udunits2lib.html#Examples
e.g., install_symbolic_unit("CFU", "4 cell")

# more complex examples with name and symbol aliases
install_symbolic_unit(name = c("cell", "cells"))
install_symbolic_unit(name = "colony_forming_unit",
                      def = "4 cells",
                      symbol = c("CFU", "cfu"))


# more complex unit definition
install_symbolic_unit("my_degC")
install_symbolic_unit("my_degF", "(my_degC * 9/5) + 32", "myF")

implementation mockup:

is_nullable_scalar_character <- function(x) {
    is.null(x) || is.character(x) && identical(length(x), 1L)
}
is_nullable_character <- function(x) {
    is.null(x) || is.character(x)
}


install_symbolic_unit(symbol = NULL, def = NULL, name = NULL) {

  stopifnot(is_nullable_scalar_character(def),
            is_nullable_character(symbol),
            is_nullable_character(name))

  if(is.null(def)) {
    ut_unit <- R_ut_new_base_unit()
  } else if (identical(def, "unitless")) {
    ut_unit <- R_ut_new_unitless_unit()
  } else {
    # convert offsets to valid udunits syntax that uses '@'
    def <- gsub("+", "@", def, fixed = TRUE) 
    def <- gsub("([^+*/-]\\s*)(-)", "\\1@ -", def, fixed = TRUE) 
    # this is just need to be more robust
    # perhaps some more processing for log() -> lg()/ln()/lb()
    stopifnot(R_ut_is_parseable(def))
    ut_unit <- R_ut_parse(def)
  }

  if(is.null(name) && is.null(symbol))
    stop("name or symbol must be specified")

  for (nm in seq_along(name))
    R_ut_map_unit_to_name(ut_unit, nm)

  for (sym in seq_along(symbol))
    R_ut_map_unit_to_symbol(ut_unit, sym)

  invisible(TRUE)
}

Supporting both symbols and names will enable supporting a more powerful
as.character.units() and format.units() powered by R_ut_format

edzer · 2018-01-21T17:37:17Z

@t-kalinowski yes, the memory "error" was a valgrind false positive.

I had only started with simple examples of what you can do with the udunits C pointer, but it is indeed pretty powerful. Some ideas:

have a shallow Rcpp interface, and have all the functions ut_someting interfaced by an R_ut_something with corresponding function parameter type and order. At the C++ level, the functions then need to do the necessary error handling.
have a single constructor function for creating symbolic_units objects that adds the external pointer to the ut_unit object, so we never have to reparse.
have a print method that adds (optionally?) the R_ut_format output, or prints only that format, for verification.
have a larger test suite that verifies that our R representation and the udunits internal representation are semantically identical.

An option would be to get away with the R representation altogether, but see #86.

I have been thinking about R_ut_new_base_unit for new units we don't want to be convertable with unitless (like g/g?), but it feels a bit like a hack. Maybe a good one?

t-kalinowski · 2018-01-21T17:55:06Z

All good idea, although I'm not sure the 2nd bullet point is worth it since parsing is quick enough, and implementing a cache is often more headache than it's worth (see frequently cited quote).

Just a quick response to the last point: as I've been mulling over on #91, I think the cleanest way to deal with units that are ratios composed of the same base unit, is to require users to create custom unique units that will inhibit unwanted unit simplification or conversion. for example:

install_symbolic_unit("g_arsenic")
install_symbolic_unit("kg_soil")
make_units(g_arsenic / kg_soil)

edzer · 2018-01-21T21:29:43Z

I think it is not acceptable when units does not properly cope with g/kg, mm/m etc. units 0.4-6 gives

> library(units)
> set_units(1:10, g/kg)
Units: 1
 [1] 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010

which is the best we could do given that symbolic_units doesn't carry a multiplying constant.

t-kalinowski · 2018-01-21T22:43:18Z

I find 0.4-6 behavior dangerous, as it can lead to unexpected and wrong behavior. Consider this use case: user has mass of arsenic and mass of soil and wants to calculate a concentration:

soil_mass <- set_units(1, kg)
arsenic_mass <- set_units(10, ug)
(concentration <- soil_mass / arsenic_mass)
# 1e+08 1

The dangerous part is that any subsequent calculations with concentration will be meaningless and probably wrong.
Consider the simple case of converting to another compatible unit
ug_arsenic / kg soil -> g arsenic / kg soil

set_units(concentration, g/kg) # user intending grams Arsenic / kg soil
# 1e+11 (g/kg)

Or, consider using concentration to calculate the total arsenic mass in a different mass of soil

arsenic_mass <- concentration * set_units(5, kg)
set_units(arsenic_mass, ug) # should be 50 ug Arsenic
# 5e+17 ug

The only way around this issue is to prevent unit simplification. However, there are many cases where you do want the units to simplify. Even in the example above, multiplying concentration * set_units(5, kg), you want kg to cancel out the denominator (kg soil), but not the numerator (As mass).
Unfortunately, without introducing new base units, there is no way for the package to infer when the user want units simplified, and when not.
Also, without introducing base units, there is lots of mental tracking requiring of the user to remember which mass unit corresponds to which compound.

The more I think about it, the more I think the way to solve this problem is through the example I posted above, using install_symbolic_unit()

edzer · 2018-01-21T22:59:19Z

What you want is a new feature, what I want is to remove a bug introduced in 0.5-0. Let's do that first.

With this implementation, the symbolic units are preserved so they can be printed correctly, and assigning a unit like "mg/kg" no longer changes the numeric value of the target vector

close #89: new install_unit and remove_unit

edzer added a commit that referenced this issue Jan 19, 2018

fixes the valgrind error, #89

44faf0c

edzer added a commit that referenced this issue Jan 20, 2018

put %*% back in place; #89

20dc8d4

t-kalinowski mentioned this issue Jan 21, 2018

printing of units #86

Open

edzer added a commit that referenced this issue Jan 21, 2018

rename ud.* into ud_* ; add R_ut_map_name_to_unit; #89

114c339

edzer added a commit that referenced this issue Jan 22, 2018

should fix the g/kg regression, #89, #91

fb215d5

mailund mentioned this issue Feb 28, 2018

Conversions and user-defined units #123

Closed

Enchufa2 mentioned this issue Mar 21, 2018

How to custom-convert kg m-2 s-1 to mm/day #129

Closed

Enchufa2 added the enhancement label Nov 7, 2020

Enchufa2 added a commit that referenced this issue Nov 22, 2020

close #89: new install_unit and remove_unit

e21c2a2

edzer closed this as completed in 6a30598 Dec 15, 2020

edzer added a commit that referenced this issue Dec 15, 2020

Merge pull request #261 from r-quantities/feature/install

6686942

close #89: new install_unit and remove_unit

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deprecate install_conversion_constant #89

deprecate install_conversion_constant #89

edzer commented Jan 18, 2018

Enchufa2 commented Jan 18, 2018

edzer commented Jan 18, 2018

Enchufa2 commented Jan 18, 2018

edzer commented Jan 18, 2018

Enchufa2 commented Jan 18, 2018

edzer commented Jan 18, 2018

Enchufa2 commented Jan 18, 2018

edzer commented Jan 19, 2018

t-kalinowski commented Jan 19, 2018 •

edited

Loading

Enchufa2 commented Jan 20, 2018

Enchufa2 commented Jan 20, 2018

edzer commented Jan 20, 2018

edzer commented Jan 20, 2018

t-kalinowski commented Jan 21, 2018

t-kalinowski commented Jan 21, 2018

t-kalinowski commented Jan 21, 2018

t-kalinowski commented Jan 21, 2018 •

edited

Loading

edzer commented Jan 21, 2018 •

edited

Loading

t-kalinowski commented Jan 21, 2018 •

edited

Loading

edzer commented Jan 21, 2018

t-kalinowski commented Jan 21, 2018

edzer commented Jan 21, 2018

deprecate install_conversion_constant #89

deprecate install_conversion_constant #89

Comments

edzer commented Jan 18, 2018

Enchufa2 commented Jan 18, 2018

edzer commented Jan 18, 2018

Enchufa2 commented Jan 18, 2018

edzer commented Jan 18, 2018

Enchufa2 commented Jan 18, 2018

edzer commented Jan 18, 2018

Enchufa2 commented Jan 18, 2018

edzer commented Jan 19, 2018

t-kalinowski commented Jan 19, 2018 • edited Loading

Enchufa2 commented Jan 20, 2018

Enchufa2 commented Jan 20, 2018

edzer commented Jan 20, 2018

edzer commented Jan 20, 2018

t-kalinowski commented Jan 21, 2018

t-kalinowski commented Jan 21, 2018

t-kalinowski commented Jan 21, 2018

t-kalinowski commented Jan 21, 2018 • edited Loading

edzer commented Jan 21, 2018 • edited Loading

t-kalinowski commented Jan 21, 2018 • edited Loading

edzer commented Jan 21, 2018

t-kalinowski commented Jan 21, 2018

edzer commented Jan 21, 2018

t-kalinowski commented Jan 19, 2018 •

edited

Loading

t-kalinowski commented Jan 21, 2018 •

edited

Loading

edzer commented Jan 21, 2018 •

edited

Loading

t-kalinowski commented Jan 21, 2018 •

edited

Loading