Automatic conversion of fundamental units to derived variables #132

Open
jamarav opened this issue Apr 25, 2018 · 19 comments

Comments

@jamarav

jamarav commented Apr 25, 2018

Good afternoon. I'm starting to use the units library (I think it's a very useful tool).
I have a couple of questions about it.
I'm writing a small script to evaluate the performance of refrigeration compressors.
I have the following problem:
I evaluate the compressor efficiency with the following expression:

compressor_efficiency<-(mref*Dhs)/Wcomp

Before this calculation I define the following variables:

mref<-set_units(vector_data_mref, 'kg/s')
Dhs<-set_units(vector_data_Dhs, 'J/(kg)')
Wcomp<-set_units(vector_data_Wcomp, 'W')

When I evaluate the compressor efficiency:
compressor_efficiency<-(mref*Dhs)/Wcomp
The units are:

> compressor_efficiency
0.5833333 J/s/W

This unit is dimensionless. Is there any way to tell the library to interpret J/s as W internally? The result would be:

> compressor_efficiency
0.5833333 1

Another question is the following:
When I define a new variable, entropy:
entropy<-set_units(vector_data_entropy, 'J/(kg*K)')
The printed result is:

> entropy
2175.70 J/K/kg

Is there an option to print it in the following form?

entropy
2175.70 J/(kg*K)

I think it is much clearer this way, with the numerator and denominator separated.

Finally, it is not very important but I would like to know if there is an option to print the units with parentheses:

> Power_input
500 (W)

Thanks in advance

@Enchufa2
Member

There are several issues here (please, open separate threads the next time).

When I evaluate the compressor efficiency [...] This unit is dimensionless.

This is an open issue (see e.g. #123). The thing is that units maintains a unit representation at the R level to be able to customise formatting and so on. However, simplification is not properly implemented at this level. On the other hand, udunits has a binary representation with proper simplification, but then you also have to rely on udunits to format units, which is neither ideal nor flexible. For example, with the udunits branch:

units:::R_ut_format(units:::R_ut_parse("J/s/W"))
#> [1] "1"

but also (from #123):

units:::R_ut_format(units:::R_ut_parse("mile/gallon"))
#> [1] "425143.683171079 m⁻²"

There are ongoing efforts to move things (e.g., user-defined units) to the C part, because udunits already manages all the hard work, but there is a trade-off when we consider, as I said, flexibility to format and print units, for instance.

One solution for this would be to provide a function simplify_units that would parse the R representation into udunits, but we would still have to sort out how to parse the result back into R.

For now, you could add the following after your computations:

units(compressor_efficiency) <- 1

This will convert the efficiency to unitless, or fail with an error if units were misused in previous steps.
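A minimal sketch of this workaround, assuming the units package is installed. The input values (0.02 kg/s, 35000 J/kg, 1200 W) are made up for illustration and are not the original poster's data. The second half shows the failure mode mentioned above: when a quantity carries the wrong unit, the assignment raises an error instead of silently returning a number.

```r
library(units)

# Illustrative values only (assumed, not taken from the thread)
mref  <- set_units(0.02,  "kg/s", mode = "standard")
Dhs   <- set_units(35000, "J/kg", mode = "standard")
Wcomp <- set_units(1200,  "W",    mode = "standard")

compressor_efficiency <- (mref * Dhs) / Wcomp
units(compressor_efficiency) <- 1   # accepted: the leftover unit is dimensionless

# Same calculation, but the power is mistakenly given in J instead of W
Wcomp_wrong <- set_units(1200, "J", mode = "standard")
bad <- (mref * Dhs) / Wcomp_wrong   # the leftover unit is now 1/s

# Capture the error instead of stopping the script
result <- tryCatch({ units(bad) <- 1; bad }, error = function(e) e)
inherits(result, "error")           # TRUE: 1/s cannot be converted to 1
```

Used this way, the assignment doubles as a cheap dimensional check at the end of a calculation.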

Is there an option to print it in the following form?

entropy
2175.70 J/(kg*K)

I don't think so. But:

units_options(negative_power=TRUE)
as_units("J/K/kg")
#> 1 J*K^-1*kg^-1

Finally, it is not very important but I would like to know if there is an option to print the units with parentheses:

Power_input
500 (W)

There is another option for this (group, see ?units_options), but it is currently applied to plots only. It may be extended to general formatting. @edzer thoughts?

@edzer
Member

edzer commented Jun 29, 2018

Units appear now more consistently as e.g. 500 [W] where you can change the [ ] with units_options(group = c("(", ")")).
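A short sketch of the group option described here, assuming a version of units that applies it to printing. The value 500 W is the example from the original question; capture.output() is used only so the printed line can be inspected.

```r
library(units)

x <- set_units(500, "W", mode = "standard")

units_options(group = c("(", ")"))   # wrap the unit in parentheses
printed <- capture.output(print(x))
printed

units_options(group = c("[", "]"))   # restore the square-bracket default
```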

I'm in favour of making more aggressive simplification possible; I need to look into how we could do this.

@edzer
Member

edzer commented Jun 30, 2018

This function

to_si <- function(x) { 
  u_str = as.character(units(x))
  u = units:::R_ut_parse(u_str)
  ft = units:::R_ut_format(u, ascii = TRUE)
  new = as_units(strsplit(ft, " ")[[1]][2])
  set_units(x, new, mode = "standard")
}

converts to SI units. Shall we use that in case the user actively sets option simplify to TRUE? @Enchufa2 @t-kalinowski

> to_si(set_units(1, gallon/mile))
2.352146e-06 [m^2]
> to_si(set_units(1, gallon*mile))
6.09203 [m^4]

@t-kalinowski
Contributor

t-kalinowski commented Jun 30, 2018 via email

@Enchufa2
Member

I agree with @t-kalinowski. I also think that there may be cases in which someone may want to simplify some things and not others, convert to SI some things and not others. So it's nice to have these features as options, but having them as functions would be useful too.

@edzer
Member

edzer commented Jul 1, 2018

The thing is that we have the opportunity to use simplify = TRUE for this. With simplify = NA as the current default, setting units_options(simplify = TRUE) only influences what happens when units are set on a numeric:

> units_options(simplify = TRUE)
> set_units(1, mg/kg)
1e-06 [1]
> units_options(simplify = NA)
> set_units(1, mg/kg)
1 [mg/kg]
> units_options(simplify = FALSE)
> set_units(1, mg/kg)
1 [mg/kg]

Further simplification is now always done by the package, by comparing symbols. We could branch this further and

  • leave the symbolic simplification as the default, with simplify = NA
  • use simplify = TRUE to do simplification to SI

To me, this sounds the simplest and most elegant approach.

@Enchufa2
Member

Enchufa2 commented Jul 1, 2018

We have:

> units_options(simplify = TRUE)
> set_units(1, "gallon*in/dgallon")
10 in
> units_options(simplify = NA)
> set_units(1, "gallon*in/dgallon")
1 gallon*in/dgallon
> units_options(simplify = FALSE)
> set_units(1, "gallon*in/dgallon")
1 gallon*in/dgallon

So you mean that the first result should be m, the second one should be in, and the third one should be the same? I'm still not convinced, because converting to SI is more than a simplification, it's, well, a conversion. It could be misleading for the user.

I'm not convinced either about simplify=NA. What does it mean? Missing simplification? Then, it should be equivalent to simplify=FALSE, so why not simplify=FALSE by default?

Another thing you could do is to export to_si and simplify and document them together. Then you can explain there that

  • simplify=FALSE (which should be the default IMHO) does nothing.
  • simplify=TRUE applies the simplify function.
  • simplify=some_function applies that function, e.g., simplify=to_si.
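
One way to read this proposal is as a small dispatch on the value of the option. The sketch below is hypothetical: resolve_simplifier and its arguments are illustrative names, not part of the units API, and plain functions on numbers stand in for real simplification functions.

```r
# Pick the function to apply, given the value of a `simplify` option.
resolve_simplifier <- function(simplify, default_fn) {
  if (is.function(simplify)) return(simplify)   # user-supplied, e.g. to_si
  if (isTRUE(simplify)) return(default_fn)      # the built-in simplification
  identity                                      # FALSE (or NA): do nothing
}

# Stand-ins for real simplification functions
halve  <- function(x) x / 2
negate <- function(x) -x

resolve_simplifier(FALSE,  halve)(10)   # 10
resolve_simplifier(TRUE,   halve)(10)   # 5
resolve_simplifier(negate, halve)(10)   # -10
```

Treating NA like FALSE here is a choice made for the sketch; the thread leaves the meaning of NA open.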

@edzer
Member

edzer commented Jul 1, 2018

Thanks, valid point about in and conversion to m.

units_options(simplify = FALSE) now turns all simplification off:

> units_options(simplify = FALSE)
> u
2 [m/s]
> u * 1/u
1 [m*s/m/s]

we have NA for the combination of

  • not simplifying on set_units(1, mg/kg)
  • simplifying on the cases like the one above

Maybe then add an option, say, convert_to_SI, which when TRUE takes over all symbolic stuff by converting to base SI units?

@Enchufa2
Member

Enchufa2 commented Jul 1, 2018

Another option is what @t-kalinowski proposed, and I think it's fine.

Regarding the name, what about convert_to_base instead? The user could potentially uninstall all SI base units and install CGS units, for example. Then the conversion would be to CGS, not SI. The documentation may reflect that, by default, this "base" is SI units.

edzer added a commit that referenced this issue Jul 1, 2018
@edzer
Member

edzer commented Jul 1, 2018

OK, we now have

> library(units)
udunits system database from /usr/share/xml/udunits
> units_options(convert_to_base=TRUE)
> set_units(1, gallon/km)
1 [gallon/km]
> set_units(1, gallon/km) * 1 # calls .simplify_units
3.785412e-06 [m^2]

where we convert to base when we simplify. Is that the right place, or should this not happen directly in set_units?

@Enchufa2
Member

Enchufa2 commented Jul 1, 2018

Mmmh, if I set the global option to TRUE, I would expect that the conversion happens always, i.e.:

> library(units)
udunits system database from /usr/share/xml/udunits
> units_options(convert_to_base=TRUE)
> set_units(1, gallon/km)
3.785412e-06 [m^2]
> set_units(1, gallon/km) * 1 # calls .simplify_units
3.785412e-06 [m^2]

That's why I was stressing the need to export simplification functions, including this new to_base, because the user may want to simplify a few results while keeping the global options to FALSE.

@edzer
Member

edzer commented Sep 11, 2018

So, I guess this issue can be closed?

@Enchufa2
Member

What do you think about my concern? Now, if convert_to_base=TRUE, set_units(1, gallon/km) * 1 converts to base but set_units(1, gallon/km) alone does not. I would expect an automatic conversion in both cases.

@edzer
Member

edzer commented Sep 11, 2018

OK, I'll leave this open; needs a lot more love & patience to get this convert_to_base running.

@jamarav
Author

jamarav commented Feb 13, 2024

Well, apparently I've bumped into an issue that I opened a few years ago. What a coincidence!

I was actually looking for the convert_to_base functionality reported by @edzer.

I think it could be very interesting. Perhaps it is not necessary to consider it as a general option, but as a simple function that we can call in case of need. In the future, if necessary, it could be included as a general option.

I don't have experience with the use of udunits from C and I suppose that as you say there is the possibility to install different base systems.

I have simply taken the function reported a few years ago by @edzer and modified it a bit.

I have run several tests and, from what I understood, the string returned when capturing the units can take four forms. For example:

"W" #[1]
"0.001 m" #[2]
"K @ 273.15" #[3]
"0.001 K @ 273150" #[4]

With this in mind the function would be:

convert_to_base <- function(x) {
  u_str = base::as.character(base::units(x))
  u = units:::R_ut_parse(u_str)
  ft = units:::R_ut_format(u, ascii = TRUE)

  ft = base::strsplit(x = ft, split = " @ ")[[1]][1]
  ft = base::strsplit(x = ft, split = " ")[[1]]
  ft = ft[length(ft)]

  new = as_units(ft)

  set_units(x, new, mode = "standard")
}
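
The string handling in this function can be checked on its own, without udunits. The sketch below applies the same two splits to the four example strings listed above and compares the result with taking the second space-separated token, which is what the earlier to_si function does. extract_unit_token is an illustrative name, not a function from the units package.

```r
# Keep only the unit symbols from a udunits-formatted string.
extract_unit_token <- function(ft) {
  # Drop an offset such as " @ 273.15", if present
  no_offset <- strsplit(ft, " @ ", fixed = TRUE)[[1]][1]
  # Drop a leading scale factor such as "0.001 " by keeping the last token
  tokens <- strsplit(no_offset, " ", fixed = TRUE)[[1]]
  tokens[length(tokens)]
}

examples <- c("W", "0.001 m", "K @ 273.15", "0.001 K @ 273150")

last_token <- vapply(examples, extract_unit_token, character(1),
                     USE.NAMES = FALSE)
last_token    # "W" "m" "K" "K"

# Taking the second token only works when a scale factor is present
second_token <- vapply(examples, function(s) strsplit(s, " ")[[1]][2],
                       character(1), USE.NAMES = FALSE)
second_token  # NA "m" "@" "K"
```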

I think it could be very interesting to include it as a function available to the user. We would have a quick way to be able to convert to SI in case of need.

@jamarav
Author

jamarav commented Feb 13, 2024

Well, it seems that the function I reported above gives wrong results in some cases. For example:

convert_to_base <- function(x) {
  R_ut_parse = utils::getFromNamespace("R_ut_parse", "units")
  R_ut_format = utils::getFromNamespace("R_ut_format", "units")
  
  u_str = as.character(base::units(x))
  u = R_ut_parse(u_str)
  ft = R_ut_format(u, ascii = TRUE)
  
  ft = strsplit(x = ft, split = " @ ")[[1]][1]
  ft = strsplit(x = ft, split = " ")[[1]]
  ft = ft[length(ft)]
  
  new = units::as_units(ft)
  
  units::set_units(x, new, mode = "standard")
}

x<-set_units(25, "g/mol")

x %>% convert_to_base()

# 40 [1/kg.mol]

I suppose the solution will be simple, but I don't know the internal function that takes care of these problems. Any idea?
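
A plausible reading, stated as an assumption rather than a confirmed diagnosis: the token taken from the udunits output writes multiplication as "." and exponents as trailing digits (a form such as kg.mol-1 is assumed here from the result printed above), and as_units() does not read that form the way udunits wrote it. The sketch below rewrites such a token with explicit "*" and "^". udunits_ascii_to_expr is an illustrative name, and the regular expression only covers symbols made of letters and underscores.

```r
# Rewrite a udunits ASCII token with explicit operators.
udunits_ascii_to_expr <- function(token) {
  # "mol-1" -> "mol^-1", "m2" -> "m^2"
  with_powers <- gsub("([A-Za-z_]+)(-?[0-9]+)", "\\1^\\2", token)
  # "kg.mol^-1" -> "kg*mol^-1"
  gsub(".", "*", with_powers, fixed = TRUE)
}

udunits_ascii_to_expr("kg.mol-1")   # "kg*mol^-1"
udunits_ascii_to_expr("m2.s-2")     # "m^2*s^-2"
udunits_ascii_to_expr("K")          # "K"
```

The rewritten string would then be passed to as_units() in place of the raw token; whether that fully resolves the g/mol case has not been checked here.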

@t-kalinowski
Contributor

Perhaps something like this:

library(units)

convert_to_base <- function(x) {
  canonicalize <- function(s) {
    s |> 
      R_ut_parse() |> 
      R_ut_format(TRUE, TRUE, TRUE) |>
      gsub(" ", " * ", x = _)
  }
  
  u <- units(x)
  u <- sprintf(
    "( %s ) / ( %s )", 
    canonicalize(u$numerator), 
    canonicalize(u$denominator)
  )
  
  # message(u)
  u <- as_units(str2lang(u))
  u <- u / as.numeric(u)
  # message(class(u))
  # str(unclass(u))

  units(x) <- u
  x
}

environment(convert_to_base) <- asNamespace("units")

x <- set_units(25, "g/mol")
convert_to_base(x)
#> 0.025 [kg/mol]

set_units(25, ug/mol) |> convert_to_base()
#> 2.5e-08 [kg/mol]
set_units(25, mg/mol) |> convert_to_base()
#> 2.5e-05 [kg/mol]
set_units(25, g/mol) |> convert_to_base()
#> 0.025 [kg/mol]
set_units(25, kg/mol) |> convert_to_base()
#> 25 [kg/mol]

@jamarav
Author

jamarav commented Feb 15, 2024

Thank you very much @t-kalinowski for the suggestions. The code you reported has helped me a lot. Unfortunately, the problem is perhaps a bit more complicated because of the freedom the user has. Your code, for example, strictly requires that neither the numerator nor the denominator is character(0). I have been doing some tests these days and have implemented the following function, which I think covers all cases.

convert_to_base <- function(x, simplify = T, merge_num_den = F) {
  R_ut_parse <- utils::getFromNamespace("R_ut_parse", "units")
  R_ut_format <- utils::getFromNamespace("R_ut_format", "units")

  u_strBase <- function(u_str, spfy = T) {
    u_new <- u_str |>
      R_ut_parse() |>
      R_ut_format(names = F, definition = T, ascii = T)

    u_new <- strsplit(x = u_new, split = " @ ")[[1]][1]
    u_new <- strsplit(x = u_new, split = " ")[[1]]
    u_new <- u_new[length(u_new)]

    if (spfy) {
      u_new <- u_new |>
        R_ut_parse() |>
        R_ut_format(names = F, definition = F, ascii = T)
    }

    u_new <- u_new |>
      gsub(".", " ", fixed = T, x = _)

    return(u_new)
  }


  u <- base::units(x)

  u <- sapply(u, function(i) paste0(i, collapse = "*", recycle0 = T))
  u[u == ""] <- "1"

  u["numerator"] <- sprintf("(%s)", u["numerator"])
  u["denominator"] <- sprintf("(%s)", u["denominator"])

  if (merge_num_den) u <- paste(u, collapse = "/")

  u_base <- sapply(u, function(j) u_strBase(u_str = j, spfy = simplify))

  if (merge_num_den) {
    u_base <- sprintf("(%s)", u_base)
  } else {
    unitless <- (u_base == "1")

    u_base["numerator"] <- sprintf("(%s)", u_base["numerator"])
    u_base["denominator"] <- sprintf("(%s)-1", u_base["denominator"])

    u_base <- u_base[!unitless]
    u_base <- paste(u_base, collapse = " ")
  }

  units::set_units(x, u_base, mode = "standard", implicit_exponents = T)
}

The basic operation would be as follows:
A units object is passed to the function, which reads its units and builds a vector u that keeps numerator and denominator separate. The u_strBase function then converts each part to base units. In my tests, setting names = F in R_ut_format gave a simpler output format that is easy to rework before applying the conversion with set_units. The most important setting for converting to base units, though, is definition = T. The output of R_ut_format also has to be split to isolate the part of the string that holds only the units. Once that part is captured, the reformatting is simple: with names = F, you only have to replace the multiplication sign "." with a space. Finally, I concatenate the numerator and denominator units and call set_units with implicit_exponents = T.
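
The final reformatting and concatenation step can be illustrated without udunits. The sketch below takes two already-converted strings, replaces "." with a space, and joins them in the same "(numerator) (denominator)-1" form that the function builds. assemble_unit_string is an illustrative name, the input strings are assumed examples of the udunits output, and returning "1" when both parts are unitless is an addition made for this sketch.

```r
# Join base-unit strings for numerator and denominator into one string
# written with implicit exponents.
assemble_unit_string <- function(num, den) {
  # udunits writes multiplication as "."; implicit exponents expect a space
  num <- gsub(".", " ", num, fixed = TRUE)
  den <- gsub(".", " ", den, fixed = TRUE)

  parts <- c(numerator = num, denominator = den)
  unitless <- parts == "1"

  parts["numerator"]   <- sprintf("(%s)", parts["numerator"])
  parts["denominator"] <- sprintf("(%s)-1", parts["denominator"])

  parts <- parts[!unitless]                 # drop parts that are just "1"
  if (length(parts) == 0) return("1")       # fully unitless input
  paste(parts, collapse = " ")
}

assemble_unit_string("kg.m2.s-2", "kg")   # "(kg m2 s-2) (kg)-1"
assemble_unit_string("K", "1")            # "(K)"
assemble_unit_string("1", "s")            # "(s)-1"
```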

I also implemented some other functionality. convert_to_base has two optional arguments: simplify and merge_num_den.

The simplify option enables a second call to R_ut_format. I found that by chaining two calls to R_ut_format, we can:

  • Convert to base units with the first call (definition = T in R_ut_format).
  • Perform a simplification in the second call to get a more friendly format in the units (definition = F in R_ut_format). That is why I have left simplify=T as default.

The second option, merge_num_den, merges numerator and denominator before calling u_strBase. This is only useful in certain cases, such as converting kJ/s to W. This is how I first wrote the function, but after some testing I found it much more consistent to let R_ut_format simplify the numerator and the denominator separately. That is why I have left merge_num_den=F as the default: it helps in some cases and gives worse results in others. A clear example is enthalpy (kJ/kg): simplifying without distinguishing numerator from denominator gives "Gy" (gray, J/kg), which makes little sense in my case when talking about enthalpies.

@edzer @Enchufa2 and @t-kalinowski, I hope this will help you implement the convert_to_base function and include it in future package versions. I think the function I report is working consistently, but I am open to any suggestions for improvement.

Finally, here are some tests I have carried out on this function:

u <- "kJ/kg"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 32000 [J/kg]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 32000 [Gy]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 32000 [kg*m^2/kg/s^2]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 32000 [m^2/s^2]

u <- "fahrenheit"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 273.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 273.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 273.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 273.15 [K]

u <- "celsius"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 305.15 [K]

u <- "degree_C"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 305.15 [K]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 305.15 [K]

u <- "kJ/(kg*fahrenheit)"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 57600 [J/K/kg]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 57600 [m^2/K/s^2]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 57600 [kg*m^2/K/kg/s^2]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 57600 [m^2/K/s^2]

u <- "J/s"
set_units(32, u, mode = "standard") |> convert_to_base()
#> 32 [J/s]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T)
#> 32 [W]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = F, simplify = F)
#> 32 [kg*m^2/s^3]
set_units(32, u, mode = "standard") |> convert_to_base(merge_num_den = T, simplify = F)
#> 32 [kg*m^2/s^3]

@Enchufa2
Member

This definitely helps. Thanks all for the discussion and prototypes. I'll try to find some time to put things together. But this will be during the next half-term, because I'm a bit overloaded now.
