Feature: Configuration to set maximum parallelization of :parallel runner #414

mattbrictson · 2017-12-20T04:27:43Z

There have been multiple requests to set an upper limit on the number of git operations that are executed in parallel in the default Capistrano git strategy. Similarly, users are also asking for a limit on the number of parallel bundle install executions in capistrano-bundler.

What these tasks have in common is that they all use the default :parallel runner provided by SSHKit. When using Capistrano to deploy to a large number of servers, firing off these operations to all servers in parallel can overload shared resources like a git server or private gem repository.

Rather than implement rate limiting for each SCM, capistrano-bundler, etc., I feel like a more general solution should be provided by SSHKit itself.

My proposal would be to change the implementation of the :parallel runner to essentially be a subclass of the :groups runner, except with defaults of wait: 0 and limit: INFINITY. Then, if a user wants to limit the amount of parallelization, they could simply do this:

# Limit the default :parallel runner to 10 threads
SSHKit.config.default_runner_config = { limit: 10 }

If sharing implementation and configuration keys between :parallel and :groups is too confusing, then perhaps the :parallel runner could use a different configuration key (but to the same effect):

# Limit the default :parallel runner to 10 threads
SSHKit.config.default_runner_config = { threads: 10 }
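
As a rough illustration of the proposal above, here is a minimal sketch of a runner that behaves like :groups but defaults to wait: 0 and no limit. It is not SSHKit's actual implementation; the class name LimitedParallelRunner and its constructor signature are hypothetical and exist only for this example.

```ruby
# Hypothetical sketch, not SSHKit code: a batch runner whose defaults
# (no limit, no wait) make it behave like a plain parallel runner.
class LimitedParallelRunner
  def initialize(hosts, limit: Float::INFINITY, wait: 0, &block)
    @hosts = hosts
    @limit = limit
    @wait = wait
    @block = block
  end

  # Runs the block once per host, with at most `limit` threads alive at a time.
  def execute
    batches.each_with_index do |batch, index|
      # Pause between consecutive batches only when a wait was requested.
      sleep(@wait) if index.positive? && @wait.positive?
      threads = batch.map { |host| Thread.new(host) { |h| @block.call(h) } }
      threads.each(&:join)
    end
  end

  private

  # With no limit, every host lands in a single batch (fully parallel).
  def batches
    return [@hosts] if @limit == Float::INFINITY

    @hosts.each_slice(@limit).to_a
  end
end

# Usage: five hosts, never more than two running at once.
runner = LimitedParallelRunner.new(%w[web1 web2 web3 web4 web5], limit: 2) do |host|
  puts "git fetch on #{host}"
end
runner.execute
```

Batching with a join per batch is the simplest way to cap concurrency, at the cost of waiting for the slowest host in each batch before starting the next one.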

Thoughts?

grzegorzblaszczyk · 2017-12-20T12:00:46Z

@mattbrictson
Configuration like:

# Limit the default :parallel runner to 10 threads
SSHKit.config.default_runner_config = { limit: 10 }

affects every command sent to the servers. I wanted to limit only git-related tasks, so that only the SCM plugin is throttled and the rest of the deployment process is not slowed down.

will-in-wi · 2017-12-20T13:35:19Z

I think we do have an issue where it is needed to slow down a class of operation, but not all operations. For example, in a Rails deploy, if we just slow down Git operations we add a lot less overall runtime as compared to slowing down everything including asset precompilation and such.

I wonder whether it makes any sense to do something like tagging specific groups, e.g.:

on release_roles(:all), type: %i[scm bundle precompilation] do; end

And then allow the user to limit by type…

SSHKit.config.default_runner_config = { type: { scm: { limit: 10 } } }
# OR
SSHKit.config.typed_runner_config[:scm] = { limit: 10 }
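
To make the tagging idea above concrete, here is a minimal sketch of how runner options could be resolved from a block's type tags. Neither a type: option nor per-type runner config exists in SSHKit; the constants and the runner_config_for helper are hypothetical names used only for illustration.

```ruby
# Hypothetical sketch: resolve runner options from the tags given to an `on` block.
DEFAULT_RUNNER_CONFIG = { in: :parallel }.freeze

# Per-type overrides, as a user might configure them.
TYPED_RUNNER_CONFIG = {
  scm: { in: :groups, limit: 10 }
}.freeze

# Returns the runner options for a block tagged with the given types.
# The first tag that has a typed config wins; untagged blocks get the default.
def runner_config_for(types)
  tag = Array(types).find { |type| TYPED_RUNNER_CONFIG.key?(type) }
  tag ? DEFAULT_RUNNER_CONFIG.merge(TYPED_RUNNER_CONFIG[tag]) : DEFAULT_RUNNER_CONFIG
end

runner_config_for(%i[scm bundle])  # => { in: :groups, limit: 10 }
runner_config_for(:precompilation) # => { in: :parallel }
```

Only blocks tagged :scm are throttled here, so asset precompilation and other untagged work keep running fully in parallel.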

mattbrictson · 2017-12-20T16:40:44Z

> I think we do have an issue where it is needed to slow down a class of operation, but not all operations.

Good point. Unfortunately this kind of puts us back where we started, in that we would have to modify every task that could potentially run into this type of problem (e.g. all SCM tasks, bundle install tasks). If we have to modify all these tasks anyway, I'd rather do it in the style of capistrano/capistrano#1957 than introduce a completely new concept like type.

Perhaps the design of SSHKit/Capistrano is not such that we can easily address this in a common way without significant modifications.

Any ideas, @leehambley?

leehambley · 2017-12-20T18:28:41Z

> Any ideas, @leehambley?

I'm wondering if we can use some annotations on the classes (Rake::Task in this case) and tag them somehow, which would allow us to pick the "ideal" or the "contended resource" executor backend. Either way, it's a tricky problem, and I'm not aware of any other software broadly in our category (Ansible, Chef, etc.) having anything we could draw on for inspiration.

It'd be a pretty invasive change in any case, but it does keep coming up...

mattbrictson · 2017-12-21T02:50:04Z

You're right, this does come up a lot. Conceptually, users think about execution in terms of Rake tasks and would like to configure things at that level. For example:

Run the bundler:install task only on the :app role; or
Run the git:* tasks in :groups; or
Run db:migrate on primary(:db)

But in reality, this configuration is done in the on block. So there is a slight disconnect between the mental model and the code that actually has to be written.

As a result, whenever someone wants to alter the execution behavior (i.e. where the commands are executed, or whether they run in parallel or in groups), they essentially have no choice but to reimplement the entire task. Alternatively, the task has to be written in such a way that all anticipated execution customizations can be controlled via Capistrano variables, like :git_max_concurrent_connections, :git_wait_interval, :bundle_roles, :bundle_servers, etc.

I guess I am just restating what we all already know, but this is what I am wrestling with when trying to come up with a good solution that fits into the current design.

Another possibility would be to establish a convention that there is a standard set of configuration variables for each set of tasks to control execution behavior: :[feature]_roles and :[feature]_execution_options. For git, that would look like:

namespace :git do
  task :wrapper do
    on fetch(:git_roles), fetch(:git_execution_options) do
      # ...
    end
  end
end

# defaults
set_if_empty :git_roles, -> { release_roles(:all) }
set_if_empty :git_execution_options, {}

# example customization
set :git_execution_options, { in: :groups, limit: 10, wait: 2 }

But it might be too late in the development of Capistrano and its many plugins to introduce such a convention, and while it does offer a lot of fine-grained control, the concepts might overwhelm new users.

mattbrictson · 2017-12-28T16:40:10Z

If we are happy with this style then I'll ask for a revision of capistrano/capistrano#1957 to use it and get that merged in. Perhaps we can do some quick PRs to implement the same for SVN and Hg as well.

👇

on release_roles(fetch(:git_roles)), fetch(:git_execution_options) do
  # ...
end

# defaults
set_if_empty :git_roles, :all
set_if_empty :git_execution_options, {}

:[feature]_roles is a convention already used in many other projects, so I think that will be somewhat familiar to people already using Capistrano. :[feature]_execution_options would be a new convention.

mattbrictson · 2017-12-28T16:41:54Z

Or maybe it should be :git_runner_config instead of :git_execution_options in order to be consistent with SSHKit's default_runner_config terminology?

on release_roles(fetch(:git_roles)), fetch(:git_runner_config) do
  # ...
end

# defaults
set_if_empty :git_roles, :all
set_if_empty :git_runner_config, {}

leehambley · 2017-12-28T19:58:36Z

Hey @mattbrictson thanks for making a strong suggestion.

I'm not sure I'm keen on the solution, but I don't really have anything better to suggest. Given that the user thinks of these things at the Rake task level, I'd like to modify the Rake API, or set something on the Rake::Task that is yielded to the block.

For lack of a better example:

namespace :git do
  task :something do |t|
    t.contended_resource = true
     # ...
  end
end

This would also keep the option for doing Rake::Task['git:something'].contended_resource = true for "modifying the tasks later".

I don't really care if we have a simple bool flag as above, or something like "runner options" as you suggest. I would prefer to keep it very simple, like a hint to the system that we can choose to interpret in our own way, rather than adding another toolkit for tuning behaviour. Hence my preference for a simple "uses contended resource" flag, which would perhaps make us rein in the parallelism slightly.

Just food for thought, anyway. Even in my simple-bool proposal, we'd still need a set of params somewhere that dictates what the flag means. That's where your idea and mine would align: we could get the "contended resource run opts" from the settings hash?

mattbrictson · 2017-12-28T20:49:12Z

@leehambley thanks for the example, that helps me understand what you are going for.

Do you have some ideas on how we can establish a link between the on block and the task where it was declared? SSHKit is not aware of tasks (this is a Capistrano concern), so by the time it comes to execute the on block with the backend, that important context has been lost.

Also, is this what you have in mind for usage?

Rake::Task["git:check"].contended_resource = true
Rake::Task["git:clone"].contended_resource = true
Rake::Task["git:update"].contended_resource = true
set :contended_resource_runner_config, { in: :groups, limit: 10, wait: 2 }
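
The flag-plus-settings idea discussed in the last few comments can be sketched in plain Rake as follows. This is only an illustration under stated assumptions: contended_resource is not an existing Rake or Capistrano attribute, and the ContendedResource module, the CONTENDED_RUNNER_CONFIG constant (standing in for the Capistrano variable) and the runner_config_for helper are hypothetical names.

```ruby
require "rake"

# Hypothetical extension: lets any Rake task be flagged as using a contended resource.
module ContendedResource
  attr_writer :contended_resource

  def contended_resource?
    @contended_resource == true
  end
end
Rake::Task.include(ContendedResource)

# Stand-in for a setting such as :contended_resource_runner_config.
CONTENDED_RUNNER_CONFIG = { in: :groups, limit: 10, wait: 2 }.freeze

# Picks runner options for a task: throttled if flagged, default (empty) otherwise.
def runner_config_for(task)
  task.contended_resource? ? CONTENDED_RUNNER_CONFIG : {}
end

# Define two placeholder tasks and flag only the git one.
Rake::Task.define_task("git:check")
Rake::Task.define_task("deploy:log_revision")
Rake::Task["git:check"].contended_resource = true

runner_config_for(Rake::Task["git:check"])           # => { in: :groups, limit: 10, wait: 2 }
runner_config_for(Rake::Task["deploy:log_revision"]) # => {}
```

The sketch covers only the flag and the lookup; it does not address how an on block would learn which task it is running in, which is the open question raised above.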

leehambley · 2017-12-28T22:31:46Z

> Do you have some ideas on how we can establish a link between the on block and the task where it was declared?

Sure, actually. I thought that since our on comes from our dsl.rb (and hands off to SSHKit), we could do something there; I don't know what exactly.

I'd imagined that, since Rake's execute is the secret sauce that runs the captured block, we might do what the various types of Rake task (file, task, etc.) do to differentiate themselves: override execute so that it uses a different on shim depending on the task config.

I haven't tried any of this out - but I suspect it ought to work:

https://github.com/ruby/rake/blob/06381f62847b32b04db0362c174426ca5299c63f/lib/rake/task.rb#L242-L252 defines the base. You can see where the actions are called; I think at that point they're already bound/etc., and I don't know if it's too late to change the way the action resolves the on method.

Fun discovery, the first 100 line spike of Rake: https://github.com/ruby/rake/blob/93e55a4ef1dbaee42f0f355f86d837c4e2551fc1/doc/proto_rake.rdoc#L99
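
One way the execute override described above might be explored is to record the running task around Rake::Task#execute, so that an on shim could look it up. This is an untested idea in the thread, and the sketch below is only one possible reading of it: the CurrentTaskTracking module, the :current_rake_task thread-local key and the current_rake_task helper are all hypothetical.

```ruby
require "rake"

# Hypothetical sketch: remember which task is executing so that code called
# from the task body (such as an `on` shim) can find it.
module CurrentTaskTracking
  def execute(args = nil)
    previous = Thread.current[:current_rake_task]
    Thread.current[:current_rake_task] = self
    super
  ensure
    # Restore the outer task (or nil) even if the task body raises.
    Thread.current[:current_rake_task] = previous
  end
end
Rake::Task.prepend(CurrentTaskTracking)

# Returns the task whose body is currently running on this thread, if any.
def current_rake_task
  Thread.current[:current_rake_task]
end

Rake::Task.define_task("git:check") do
  puts "running inside #{current_rake_task.name}"
end
Rake::Task["git:check"].invoke
```

A shim could then inspect current_rake_task and pick different runner options before handing off to SSHKit. The value is thread-local, so it is not visible from threads spawned inside the task body unless it is passed along explicitly.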

mattbrictson added discuss! new feature labels Dec 20, 2017

mattbrictson mentioned this issue Dec 20, 2017

(#1058) Update git.rake - configurable max concurrent connections capistrano/capistrano#1957

Closed
