-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Future for package developers and temporary overriding of plans #450
Comments
Thanks - this is useful and I encourage everyone to join in a think hard about this problem. I should clarify that I've got this, and other limitations on my mind carrying them with me when I work on the future framework, i.e. the current constraints are not set in stone and I hope we'll get to expand on what can be done at some point. However, these things just take a very long time to mature and require feedback from lots of developers, end-users, and their use-cases. Even if it looks like there's an obvious quick solution in front of you, it might not be the best one because it might bring us into a dead-end (e.g. the progressr updates is a good example of that) Some quick comments:
I agree. Basically, the current I think that a framework for specifying "resources" that a future requires could be a way forward here (#172). I can also imagine optional or preferred resources, e.g. a future my declare that it prefers to run a file system where some big genomic reference file is already downloaded/cached but it could still do without (e.g. it can download it again if it runs elsewhere). With "resources", we could imagine the user being able to declare multiple, alternative future backends that code can pick from. If the code requires "localhost", then for instance remote backends are immediately ruled out, leaving say "sequential", "multicore", "multisession", "callr", and "batchtools_local" as the options.
I haven't had time to digest this so cannot say much right now, but I agree that whatever the solution should be it needs to be standardized so that end-user can feel familiar with it. Back to your {styler} per se. Is the above really a problem with this type of tools-for-tools package? Isn't it mainly used occasionally by developers during development, i.e. it's unlikely that it'll clash with a user running genomic research on an HPC scheduler? Or, are there use-cases where {styler} is in constant use in the background, e.g. R GUIs? (FWIW, this far I've only used it for cleaning up legacy code) |
Thanks for your detailed answer.
Yeah, this was also an issue with styler, as the files from the local file system need to be available (r-lib/styler#277 (comment)). The idea about preferred backends also sounds good. Probably redundant to say here but I guess it also makes sense how these challenges are addressed in other programming languages and what their communities have learned from thinking about the problem and implementing solutions.
{styler} probably won't have the tools-for-tools problem (yet). It is suggested by 13 and imported by 7 CRAN packages, although I haven't looked into how styler is used in these. Maybe I should - and check which of these packages also have {future} in the recursive dependencies and how it is used. I would say most common use case is the RStudio Addin for interactive styling of the active file (where parallelisation does not make sense because of startup costs), periodical styling with |
It's an almost 3 year long standing feature request in r-lib/stlyler to offer parallelisation and I have been very conservative on this and rejected implementations in r-lib/styler#682 and r-lib/styler#617, mainly because of your recommendations for package developers in
?future::plan()
. I also read up on #263 and #274. I agree that there should be clear rules on how to use future strategies to avoid lack of control of the end user and a 'parallel chaos' with an increasing adoption of the future framework. I think one thing to consider in this discussion is that requirements for different tasks are widely different. A package for processing genomic sequences has a different user base and different scaling requirements than a tool like styler. For styler, 99% just want to have results quicker, they probably don't even know anything about future, remote workers or similar. And they should not have to. So I appreciate your note in?future::plan()
:The problem with this solution is that there is no way for users (package developer or end user) of code that contains the above workaround to override the strategy anymore. Which is exactly what we wanted to prevent in the first place. For this reason, I suggest to extend the recommendation and give the user a way to override this with an env variable. Of course every package developer could think of their own way to override, but wouldn't it be great to have a spec for this? Along these lines:
Maybe package authors should document also that they are temporarily setting a future plan in
?pkg1::future_strategy
or similar. Not sure I have enough context here for naming conventions and similar, but I guess you get the main idea. What do you think?The text was updated successfully, but these errors were encountered: