-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deprecate install_conversion_constant #89
Comments
Are you planning to drop the udunits2 dependency in the next release? |
If you and @t-kalinowski both agree this is a good idea, yes. |
Completely agree. I would like to redesign the C/C++ part, but I may not have time depending on your roadmap for the next release. Anyway, as it is, it works, so the redesign can wait. |
Agreed; something possibly more urgent now: I managed to get memory leaks to zero, but still see, when units (udunits branch) is loaded:
and have no clue what to do to find out what is causing this. |
Interesting. Taking a look. |
ac7bce0 simplifies the XPtr structure; neither that, nor removing .onLoad resolve the valgrind error. |
I'd say this is an R problem, because I see the same report with the master branch, where there is no compiled code. |
All CRAN versions before 0.5-0 ("overhaul"), or any other packages I'm involved in don't have it. |
I'll try to post a longer response later this weekend. |
The |
@edzer Oh, you already found it. :) I'd say this confirms that this is an R problem. Is suppressing support for matrix multiplication the best approach? |
I guess it was a false positive that would have gone away when I had built R instrumented with valgrind (see here), since CRAN results also doesn't list it. I uncommented |
Confirmed: when I build R after |
Hi @edzer, Glad the memory leak was diagnosed (and fixed? or it never really existed?). Responding to the original issue, I really like the idea of removing I downloaded the udunits branch and I played around with it. I'm excited about the functionality but ambivalent about the interface. I suspect that rather than making a new family of functions
With the example above, there are still quite a few details to work out, but I'm hoping for something that simple. In order to get started on this, what do you think about making a set of functions internal to package:units that are thin wrappers around the udunits C API (with the naming convetion from package:udunits maintained). Scanning through the API these jump out as being useful primitives: ud.new.dimensionless.unit()
ud.new.base.unit() # i'm guessing this will help solve the ud.are.convertable("cell", "rad") problem? I think having this set of R functions that accept and return pointers to a ud.scale()
ud.offset()
ud.invert()
ud.raise()
ud.root()
ud.log() and a set of functions to go from pointer <--> character ud.parse()
ud.format() # also, this looks like it'll be nice to replace
# as.character.units(), allowing to select output in unit symbols or names
ud.map.unit.to.name() or ud.map.name.to.unit() # I don't get the difference With this set of primitives, the examples above might be something like this:
|
Actually, I wonder if we even need the full set of primitives I listed above. We could probably just get by with:
|
so, I think this is much simpler than I originally though if we depend on ut_parse() > ptr <- R_ut_parse("degC @ 274")
> R_ut_map_name_to_unit("mykelvin", ptr)
<pointer: 0x4991420>
>
> set_units(5, mykelvin) %>% set_units(degC)
279 °C
>
>
> ptr <- R_ut_parse("m*kg / s^2")
> R_ut_map_name_to_unit("myNewton", ptr)
<pointer: 0x5594030>
> set_units(5, myNewton) + set_units(5, Newton)
10 N my corresponding cpp function is very simple: // [[Rcpp::export]]
XPtrUT R_ut_map_name_to_unit( CharacterVector name, SEXP inunit) {
ut_unit *unit = ut_unwrap(inunit);
if (ut_map_name_to_unit(name[0], enc, unit) != UT_SUCCESS)
handle_error("R_ut_add_scale");
return ut_wrap(unit);
} |
Here is a quick mockup of a potential implementation and interface with this approach. install_symbolic_unit("cell") # new base unit by default
install_symbolic_unit("CFU", "cell * 4") I'm hoping that any valid syntax accepted by the C library will work and we can just reproduce this table in the documentation: https://www.unidata.ucar.edu/software/udunits/udunits-2.0.4/udunits2lib.html#Examples # more complex examples with name and symbol aliases
install_symbolic_unit(name = c("cell", "cells"))
install_symbolic_unit(name = "colony_forming_unit",
def = "4 cells",
symbol = c("CFU", "cfu"))
# more complex unit definition
install_symbolic_unit("my_degC")
install_symbolic_unit("my_degF", "(my_degC * 9/5) + 32", "myF") implementation mockup: is_nullable_scalar_character <- function(x) {
is.null(x) || is.character(x) && identical(length(x), 1L)
}
is_nullable_character <- function(x) {
is.null(x) || is.character(x)
}
install_symbolic_unit(symbol = NULL, def = NULL, name = NULL) {
stopifnot(is_nullable_scalar_character(def),
is_nullable_character(symbol),
is_nullable_character(name))
if(is.null(def)) {
ut_unit <- R_ut_new_base_unit()
} else if (identical(def, "unitless")) {
ut_unit <- R_ut_new_unitless_unit()
} else {
# convert offsets to valid udunits syntax that uses '@'
def <- gsub("+", "@", def, fixed = TRUE)
def <- gsub("([^+*/-]\\s*)(-)", "\\1@ -", def, fixed = TRUE)
# this is just need to be more robust
# perhaps some more processing for log() -> lg()/ln()/lb()
stopifnot(R_ut_is_parseable(def))
ut_unit <- R_ut_parse(def)
}
if(is.null(name) && is.null(symbol))
stop("name or symbol must be specified")
for (nm in seq_along(name))
R_ut_map_unit_to_name(ut_unit, nm)
for (sym in seq_along(symbol))
R_ut_map_unit_to_symbol(ut_unit, sym)
invisible(TRUE)
} Supporting both symbols and names will enable supporting a more powerful |
@t-kalinowski yes, the memory "error" was a valgrind false positive. I had only started with simple examples of what you can do with the udunits C pointer, but it is indeed pretty powerful. Some ideas:
An option would be to get away with the R representation altogether, but see #86. I have been thinking about |
All good idea, although I'm not sure the 2nd bullet point is worth it since parsing is quick enough, and implementing a cache is often more headache than it's worth (see frequently cited quote). Just a quick response to the last point: as I've been mulling over on #91, I think the cleanest way to deal with units that are ratios composed of the same base unit, is to require users to create custom unique units that will inhibit unwanted unit simplification or conversion. for example: install_symbolic_unit("g_arsenic")
install_symbolic_unit("kg_soil")
make_units(g_arsenic / kg_soil) |
I think it is not acceptable when > library(units)
> set_units(1:10, g/kg)
Units: 1
[1] 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010 which is the best we could do given that |
I find 0.4-6 behavior dangerous, as it can lead to unexpected and wrong behavior. Consider this use case: user has mass of arsenic and mass of soil and wants to calculate a concentration:
The dangerous part is that any subsequent calculations with
Or, consider using
The only way around this issue is to prevent unit simplification. However, there are many cases where you do want the units to simplify. Even in the example above, multiplying The more I think about it, the more I think the way to solve this problem is through the example I posted above, using |
What you want is a new feature, what I want is to remove a bug introduced in 0.5-0. Let's do that first. |
With this implementation, the symbolic units are preserved so they can be printed correctly, and assigning a unit like "mg/kg" no longer changes the numeric value of the target vector
close #89: new install_unit and remove_unit
It is clear from #71, #84, and my experiments in the udunits branch that user-defined units are better handled by the udunits2 C library than by R. I removed the R layer of installing user-defined units; this removes the flexibility to define any R function for conversion, which has the advantage that we don't have to verify that functions are meaningful (monotonous), which we didn't. Also, the udunits2 C api does give, I think, all meaningful operations, which we can now directly interface.
I propose to drop
install_conversion_constant
in favor ofinstall_conversion_scale
, which complementsinstall_conversion_offset
. The problem with_constant
is that it doesn't make clear whether the constant refers to a scaling or an addition of a constant;_scale
and_offset
are used in the C API.I also propose the new semantics:
creates
thousandm
where Xthousandm
equal 1000 Xm
. This is exactly opposite to whatinstall_conversion_constant
now does. Also,thousandm
should not already be bound to a unit, andm
should be an existing unit.Similarly:
creates the new
my_temp_C
as values inK
+ 273.15.The function profile is
(from,to,value)
, and returns to=from*value or to=from+value.The text was updated successfully, but these errors were encountered: