Add support for parabar and other performance and documentation improvements #7

Open
wants to merge 50 commits into
base: master

@mihaiconstantin mihaiconstantin commented Oct 26, 2024

This PR adds support for parabar and can close #5 and SachaEpskamp/bootnet#82 (the latter once the changes are implemented in bootnet).

There are quite a few changes, including breaking changes, but I think they are worth it. Briefly, the main changes are:

  • Added support for parabar backends.
  • Improved the package documentation and the examples in the README.md.
  • Improved the performance (see the Performance Note below).
  • Added a silly package logo to README.md.
  • Added vignettes/supercomputer.Rmd discussing how parSim can be used on the Lisa cluster.

In terms of parSim arguments, there are a few improvements as well:

  • packages
    • This is a new argument that can be used to load the necessary packages on the parallel backend before the task is executed. This removes the need to pollute the simulation expression with library and require calls, which would end up being executed at every single repetition.
  • exclude
    • Updated the exclude argument to work with unquoted expressions. For instance, one can now write exclude = sample_size == 100 | beta == 0 or even more sophisticated expressions such as exclude = sample_size %in% c(50, 250) | beta == 0. Internally this is now handled using base R substitution and evaluation (i.e., I have no idea how to do this in dplyr and the previous implementation got deprecated).
  • save
    • Merged the previous write and name arguments into a single, clearer argument called save. As indicated in the documentation, this argument takes a logical or string value controlling whether and where to save the results to a text file. If save = TRUE, the results are saved to a temporary file (i.e., created using base::tempfile). If save is a string, the results are saved to the specified file. If save = FALSE, the results are not saved. Upon saving the results, a message is printed to the console indicating the location of the saved file.
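To illustrate the kind of base R substitution and evaluation mentioned for exclude, here is a minimal sketch. It is not the code from this PR: the helper name drop_excluded and the design data frame are hypothetical, and only base R functions (substitute, eval, expand.grid) are used.

```r
# Hypothetical helper (not parSim's actual implementation): drop the rows of
# a design grid that match an unquoted logical expression.
drop_excluded <- function(conditions, exclude) {
    # Capture the expression as written by the caller, without evaluating it.
    expr <- substitute(exclude)

    # Evaluate it with the columns of `conditions` in scope, falling back to
    # the caller's environment for any other symbols.
    matches <- eval(expr, envir = conditions, enclos = parent.frame())

    # Keep the rows for which the expression is not TRUE (NA counts as keep).
    conditions[!(matches %in% TRUE), , drop = FALSE]
}

# Hypothetical design: every combination of sample size and effect size.
design <- expand.grid(sample_size = c(50, 100, 250), beta = c(0, 0.5))

# Unquoted expression, in the same style as the examples above.
kept <- drop_excluded(design, sample_size %in% c(50, 250) | beta == 0)
print(kept)
```

In this sketch only the condition with sample_size = 100 and beta = 0.5 remains, because every other row matches the exclusion expression.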

Some additional perks:

  • Users of parSim now also have access to the parabar::configure_bar function, which enables them to configure the progress bar however they see fit, or switch between the R built-in progress bar and the more modern one provided by the progress package.
  • The default branch at mihaiconstantin/parSim maintains a fork that uses roxygen2 for generating the .Rd files. This PR does not include roxygen2.

Performance Note

Please note that I dropped the data.table version. Instead, I rewrote parSim with an eye on performance. When executing the task in parallel, the current parSim implementation is roughly 66% faster than the previous implementation (i.e., it takes about a third of the time). This number is based on the README.md example executed in parallel with four cores, 100 replications (i.e., pertaining to the simulation conditions), and progress tracking:

Unit: seconds
 expr      min       lq     mean   median       uq      max neval
  new 1.214629 1.239405 1.270970 1.258756 1.294768 1.647161   100
  old 3.678963 3.729840 3.787505 3.760879 3.802689 4.240032   100
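For reference, the 66% figure can be recovered from the mean column of the benchmark output. The snippet below is only a small arithmetic check; the variable names are made up for illustration.

```r
# Mean runtimes in seconds, copied from the benchmark output above.
mean_new <- 1.270970
mean_old <- 3.787505

# Relative reduction in mean runtime of the new implementation.
reduction <- (mean_old - mean_new) / mean_old

# Speedup factor, i.e., how many times faster the new implementation runs.
speedup <- mean_old / mean_new

print(round(c(reduction = reduction, speedup = speedup), 2))
```

The reduction comes out at about 0.66 and the speedup at just under 3, so the new implementation needs roughly a third of the time of the old one.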

The progress bar was configured as follows:

configure_bar(
    type = "modern",
    format = "[:bar] [:percent] [:elapsed]",
    show_after = 0.15
)

Based on my tests, the current implementation is also marginally faster when executing tasks sequentially, with or without a progress bar.

Finally, please note that I bumped the version in DESCRIPTION from 0.1.4 to 0.2.0. This is to avoid errors in the CRAN checks because the current version on github.com/SachaEpskamp/parSim is 0.1.4, which is inconsistent with the CRAN release that is at 0.1.5.

Successfully merging this pull request may close these issues.

Base parallelization on the parabar package