Attaching `bootnet` generates a `.Random.seed` in the `.GlobalEnv` #82
After some more debugging I think I found the commit that introduces the problem: b10799f. Installing the package at this commit, i.e., running `remotes::install_github("SachaEpskamp/bootnet", ref = "b10799ffb8a801189e8cd3e7a090621d2016d817")`, reproduces the issue. At the previous commit (i.e., 93cdb31) things seem to be fine. In fact, this seed is generated by attaching the `snow` package.
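As a side check (a sketch, using `stats` as a stand-in for the package under inspection): merely loading a namespace should not create `.Random.seed`, while any RNG call does, which is how a commit that touches the RNG at load time can be spotted.

```r
# Loading a namespace alone should not touch the RNG; calling any RNG
# function creates `.Random.seed` in the global environment.
loadNamespace("stats")  # stand-in for the package being inspected
seed_after_load <- exists(".Random.seed", envir = globalenv())
invisible(stats::rnorm(1))  # any RNG call materialises the seed
seed_after_rng <- exists(".Random.seed", envir = globalenv())
cat("after load:", seed_after_load, "- after RNG call:", seed_after_rng, "\n")
```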
Hi Mihai, Thanks for all this! So the issue lies with the snow package and not the bootnet package? I changed to using the snow package because the parallel computing led to crashes on Mac before, I think. But snow is an old package, so I should indeed update this. Do you think it would be better to revert back to using the parallel package? Best, Sacha
Hi Sacha, Indeed, the issue is with the `snow` package. To work around it, in my own packages I use a small self-contained wrapper around the `parallel` package. You can see the self-contained wrapper here, and here is a toy example of how I use it:

# Some variables.
data <- matrix(rnorm(9), 3, 3)
# Create backend instance.
backend <- Backend$new()
# Start the cluster.
# If the type is not provided, it is inferred based on the OS.
# The number of cores is selected such that at least one core is always left free.
# Upon creation the cluster is always cleared to ensure nothing unintentional is
# copied (e.g., when forking).
backend$start(cores = 2, type = "psock")
# Export variables to a cluster.
backend$export(variables = c("data"), environment = .GlobalEnv)
# Inspect what variables are on the cluster.
backend$inspect()
# Evaluate an arbitrary expression on the cluster.
backend$evaluate(expression = { data^2 })
# Clear the cluster.
backend$clear()
# To check that the cluster has been cleaned.
backend$inspect()
# Run tasks on the cluster in an `sapply` fashion.
backend$sapply(x = data[, 1], fun = function(x) { x^2 })
# Run tasks on the cluster in an `apply` fashion.
backend$apply(x = data, margin = 2, fun = function(x) { x^2 })
# Adopt a cluster that was created externally.
# It will fail if there is already an active cluster registered with the backend.
backend$adopt(cluster = parallel::makePSOCKcluster(2))
# Close it.
# If the cluster is not stopped, when the `backend` instance is removed during
# the garbage collection the cluster is also automatically stopped.
backend$stop()
# Try to adopt again now that the previous cluster is closed.
backend$adopt(cluster = parallel::makePSOCKcluster(2))
# Now the cluster type is switched from `psock` or `fork` to `adopted`.
backend$type
# Check that it also works with the adopted cluster.
backend$evaluate(expression = { rnorm(3) })
# The following fields can be accessed.
# Is there an active cluster registered with the backend?
backend$active
# How many nodes?
backend$cores
# What type?
backend$type
# The `parallel` cluster object that can be used with the `parallel` functions.
backend$cluster
# Stop the cluster.
backend$stop()
# The fields are reset upon cluster stop.
backend$active
backend$cluster
backend$type
backend$cores

In my functions, I actually use it as follows:

# Simulate sequentially.
simulate <- function(data) {
sapply(data, function(x) { Sys.sleep(0.5); return(x^2) })
}
# Simulate in parallel.
simulate_parallel <- function(data, backend) {
backend$sapply(data, function(x) { Sys.sleep(0.5); return(x^2) })
}
# Let's say this is the exported function in the `NAMESPACE`.
simulation <- function(data, cores = NULL, backend_type = NULL) {
# Decide whether it is necessary to create a parallel backend.
use_backend <- !is.null(cores) && cores > 1
# Prepare backend if necessary.
if (use_backend) {
# Create backend instance.
backend <- Backend$new()
# Start it.
backend$start(cores, type = backend_type)
# Run the task.
result <- simulate_parallel(data, backend)
# Close the backend.
backend$stop()
# Otherwise just run the task sequentially.
} else {
result <- simulate(data)
}
return(result)
}
# Data.
set.seed(1)
data <- rnorm(10)
# Sequential.
simulation(data)
# Parallel.
simulation(data, 5)

You don't need the backend for the sequential case, of course. But I like the wrapper because it keeps all the cluster handling in one place. I hope this makes sense!
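For comparison, the same sequential/parallel guard can be written with `parallel` directly and no wrapper at all; `simulation_plain` is an illustrative name, and `on.exit()` ensures the workers are released even on error.

```r
# A wrapper-free sketch of the same guard, using `parallel` directly.
# `on.exit()` makes sure the cluster is stopped even if an error occurs.
simulation_plain <- function(data, cores = NULL) {
  if (!is.null(cores) && cores > 1) {
    cl <- parallel::makePSOCKcluster(cores)
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parSapply(cl, data, function(x) x^2)
  } else {
    sapply(data, function(x) x^2)
  }
}
```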
Hi Mihai, Thank you for your insights here and sorry for the late reply. I notice indeed that now every time I start RStudio I get the `.Random.seed` object in my global environment. I am reluctant to change the dependency on `snow`, though.
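As a side note, the stray object can simply be deleted; R recreates `.Random.seed` the next time the RNG is used, so removing it is harmless (a small sketch):

```r
# Drop the stray seed from the global environment; it is recreated
# automatically on the next RNG call.
invisible(runif(1))  # ensure the seed exists for this demo
rm(".Random.seed", envir = globalenv())
exists(".Random.seed", envir = globalenv())  # now FALSE
```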
Hi Sacha, I recall encountering an issue with creating the cluster a while back. Long story short, the cluster was failing to create the worker processes under some conditions; see:

Lines 533 to 536 in b10799f
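One defensive pattern for this failure mode (a sketch, not the actual `bootnet` code; `safe_sapply` is an illustrative name) is to fall back to sequential execution when the worker processes cannot be created:

```r
# Fall back to sequential execution if the cluster cannot be created.
safe_sapply <- function(x, fun, cores = 2) {
  cl <- tryCatch(parallel::makePSOCKcluster(cores), error = function(e) NULL)
  if (is.null(cl)) {
    return(sapply(x, fun))  # no workers available: run sequentially
  }
  on.exit(parallel::stopCluster(cl), add = TRUE)
  parallel::parSapply(cl, x, fun)
}
```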
My guess would be that something similar happened here as well. Indeed, we can try switching to the `parallel` package and see how it goes.
Hi Mihai, I see, thanks for the info! I already expected there was some bug involved there.
Also, in
Sure thing! I can also, of course, help with that.
Note for clarity:
I am using `bootnet` in a package and when creating a `PSOCK` cluster I noticed that the child processes are populated with a `.Random.seed`.

After a lot of debugging, I tracked this down to `bootnet` being attached to the `R` session. Consider the following code in a file called `script.R`:

Running:

Rscript script.R

yields:

It seems that after `bootnet` is loaded the `.GlobalEnv` is polluted with a `.Random.seed` object. I am not sure this is intentional and, if it is, whether invoking the `RNG` at package load is sensible. In my particular case, this resulted in a hard-to-debug scenario when dealing with seeds on `PSOCK` clusters.
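Since the exact `script.R` is not quoted here, a minimal check along these lines would reproduce the report (the `requireNamespace` guard is my addition so the snippet also runs where `bootnet` is not installed):

```r
# Hypothetical reconstruction of a minimal `script.R`: report whether
# `.Random.seed` exists before and after attaching `bootnet`.
before <- exists(".Random.seed", envir = globalenv())
if (requireNamespace("bootnet", quietly = TRUE)) {
  suppressMessages(library(bootnet))
}
after <- exists(".Random.seed", envir = globalenv())
cat("before:", before, "- after:", after, "\n")
```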