Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SUGGESTION: Support exporting all types of variables names (not just basic ones) #88

Closed
HenrikBengtsson opened this issue Feb 11, 2017 · 5 comments

Comments

@HenrikBengtsson
Copy link

Background

It's not possible to export objects with other than "basic" names (= "according to R's variable naming rules"). For example,

library("batchtools")
reg <- makeRegistry(NA)
objs <- list(a = 1, `b` = 2, `c_{i}` = 3)
str(objs)
batchExport(export = objs, reg = reg)

gives

Error in source("export-non-standard-names.R", echo = TRUE) : 
  Assertion on 'export' failed: Vector must be named according to R's variable naming rules.

Enter a frame number, or 0 to exit   

1: batchExport(export = objs, reg = reg)
2: assertList(export, names = "strict")
3: makeAssertion(x, res, .var.name, add)
4: mstop("Assertion on '%s' failed: %s.", var.name, res)

User's can somewhat workaround this by exporting as batchExport(export = list(my_exports = objs)) and then add something like mapply(names(exports), exports, FUN = assign, envir = .GlobalEnv) but that's tedious, error prone, and not very efficient.

Simple suggestion: Reference / define "R's variable naming rules"

It's probably not fully clear to the user what "R's variable naming rules" is. I quickly tried to find a formal reference, but "all" I found was from ?make.names:

"A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words."

If that's your definition, I think it would be useful to reference that in from ?checkmate::assertList but also from ?batchtools::batchExport.

I think it would also be helpful to give some examples of valid and non-valid names in ?batchtools::batchExport. This would lower the risk for run-time surprises.

Actual suggestion: Why not support all variable names?

One might want to export objects with "non-standard variable names", e.g. locally defined infix or operator functions, e.g. `%my_op%` or c_{i}. Although these are unusual, they are certainly not unlikely.

As far as I understand it, the main reason for the current naming restriction is that the name of the exported object is restricted to what can be represented in a filename (roughly POSIX minus some symbols). This comes from loadRegistryDependencies():

  path = file.path(x$file.dir, "exports")
  fns = list.files(path, pattern = "\\.rds$")
  if (length(fns) > 0L) {
    ee = .GlobalEnv
    Map(function(name, fn) {
      assign(x = name, value = readRDS(fn), envir = ee)
    }, name = basename(stri_sub(fns, to = -5L)), fn = file.path(path, fns))
  }

However, if the name would be encoded in the actual filename, but instead be stored in the saved object, then this wouldn't be a limitation, e.g.

  path = file.path(x$file.dir, "exports")
  pns = list.files(path, pattern = "\\.rds$", all.files = TRUE, full.names = TRUE)
  if (length(pns) > 0L) {
    ee = .GlobalEnv
    for (pn in pns) {
      object <- readRDS(pn)
      assign(x = object$name, value = object$value, envir = ee)
      object <- NULL ## Fastest version of rm(list = "object")
    }
  }

I understand the history / legacy / convenience of having file names reflect the variable name, but is it really needed. Could you use a unique hash code for filename instead (as it looks like you're doing in the logs(?) directory. Alternatively, they could be incremental indices, or do something like base::make.names(names, unique = TRUE) allowing for some information in the filenames.

See also

This looks identical to what's going on in BatchJobs, cf. tudo-r/BatchJobs#93

PS. Much of my feedback is now coming in as I've started to implement the future.batchtools wrapper and running initial tests on batchtools via that framework. Please, treat my issues / feedback / suggestions as what they are and feel free to discuss and say you disagree. You're actions / reply should not be the same regardless of me being a JOSS reviewer or not. I'm pretty sure you're doing just that, but I wanted to make it explicit it does not affect the JOSS review.

@mllg
Copy link
Owner

mllg commented Feb 13, 2017

You are right, I am too restrictive regarding variable names. I always forget about the infix operators.

The reasoning for not storing the variable inside the RDS is that you have to read all files just to get the names, i.e. each time you call batchExport.

make.names() is not bijective, I would need something like a base32 encoder/decoder.

I guess I'll just store the real names in the registry and use a random string for the file names.

@mllg
Copy link
Owner

mllg commented Feb 13, 2017

I've changed my mind. I think I will go with base32. See here: #89

@HenrikBengtsson
Copy link
Author

This is good news.

The reasoning for not storing the variable inside the RDS is that you have to read all files just to get the names, i.e. each time you call batchExport.

I didn't consider that batchtools might need to see what's already exported.

FYI, I've used utils::URLencode() / utils::URLdecode() in future.BatchJobs as a workaround for a long time now and it's been working very well. Haven't had any issues. My point is, I think your approach will work without surprises.

@mllg
Copy link
Owner

mllg commented Feb 15, 2017

base32 encoding is merged. Should now work with all object names.

@mllg mllg closed this as completed Feb 15, 2017
@HenrikBengtsson
Copy link
Author

Awesome. I've verified this with makeClustersFunctionInteractive(external = TRUE) and makeClustersFunctionTORQUE().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants