[docs] add tutorial on parallelism (#3231)
odow authored Feb 22, 2023
1 parent 6a422d3 commit 00064c3
Showing 3 changed files with 316 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -2,6 +2,7 @@ build/
 latex_build/
 src/tutorials/*/*.md
 !src/tutorials/*/introduction.md
+!src/tutorials/algorithms/parallelism.md
 src/moi/
 src/tutorials/linear/transp.txt
 src/release_notes.md
7 changes: 6 additions & 1 deletion docs/make.jl
@@ -58,7 +58,11 @@ end

 function _literate_directory(dir)
     for filename in _file_list(dir, dir, ".md")
-        if !endswith(filename, "introduction.md")
+        if endswith(filename, "introduction.md")
+            continue
+        elseif endswith(filename, "parallelism.md")
+            continue
+        else
             rm(filename)
         end
     end
@@ -168,6 +172,7 @@ const _PAGES = [
 "tutorials/algorithms/benders_decomposition.md",
 "tutorials/algorithms/cutting_stock_column_generation.md",
 "tutorials/algorithms/tsp_lazy_constraints.md",
+"tutorials/algorithms/parallelism.md",
 ],
 "Applications" => [
 "tutorials/applications/power_systems.md",
309 changes: 309 additions & 0 deletions docs/src/tutorials/algorithms/parallelism.md
@@ -0,0 +1,309 @@
# Parallelism

The purpose of this tutorial is to give a brief overview of parallelism in
Julia as it pertains to JuMP, and to explain some of the things to be aware of
when writing parallel algorithms involving JuMP models.

## Multi-threading and Distributed computing

There are two main types of parallelism in Julia:

1. Multi-threading
2. Distributed computing

In multi-threading, multiple tasks are run in a single Julia process and share
the same memory. In distributed computing, tasks are run in multiple Julia
processes with independent memory spaces. This can include processes across
multiple physical machines, such as in a high-performance computing cluster.

Choosing and understanding the type of parallelism you are using is important
because the code you write for each type is different, and there are different
limitations and benefits to each approach. However, the best choice is highly
problem dependent, so you may want to experiment with both approaches to
determine what works for your situation.

### Multi-threading

To use multi-threading with Julia, you must either start Julia with the
command line flag `--threads=N`, or you must set the `JULIA_NUM_THREADS`
environment variable before launching Julia. For this documentation, we set
the environment variable to:

````julia
julia> ENV["JULIA_NUM_THREADS"]
"4"
````

You can check how many threads are available using:

````julia
julia> Threads.nthreads()
4
````

The easiest way to use multi-threading in Julia is by placing the
`Threads.@threads` macro in front of a `for`-loop:

````julia
julia> @time begin
           ids = Int[]
           Threads.@threads for i in 1:Threads.nthreads()
               global ids
               push!(ids, Threads.threadid())
               sleep(1.0)
           end
       end
  1.050600 seconds (47.75 k allocations: 3.093 MiB, 4.40% compilation time)
````

This for-loop sleeps for `1` second on each iteration. Thus, if it had
executed sequentially, it should have taken the same number of seconds as
there are threads available. Instead, it took only 1 second, showing that the
iterations were executed simultaneously. We can verify this by checking the
`Threads.threadid()` of the thread that executed each iteration:

````julia
julia> ids
4-element Vector{Int64}:
1
4
2
3
````
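
A detail to be aware of: `push!` onto a vector that is shared across threads is not thread-safe, so the example above is only illustrative. If you need to collect results from multiple threads into a shared container, one option is to guard the update with a lock. Here is a minimal sketch (the lock-based pattern and the `my_lock` name are ours, not part of this tutorial):

```julia
ids = Int[]
my_lock = ReentrantLock()
Threads.@threads for i in 1:Threads.nthreads()
    sleep(1.0)
    # Hold the lock while mutating the shared vector, so that updates from
    # different threads cannot interleave.
    lock(my_lock) do
        push!(ids, Threads.threadid())
    end
end
```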

!!! tip
    For more information, read the Julia documentation
    [Multi-Threading](https://docs.julialang.org/en/v1/manual/multi-threading/#man-multithreading).

### Distributed computing

To use distributed computing with Julia, use the `Distributed` package:

````julia
julia> import Distributed
````

As with multi-threading, we need to tell Julia how many worker processes to
add. We can do this either by starting Julia with the `-p N` command line
argument, or by calling `Distributed.addprocs` from a running session:

````julia
julia> import Pkg

julia> project = Pkg.project();

julia> workers = Distributed.addprocs(4; exeflags = "--project=$(project.path)")
4-element Vector{Int64}:
2
3
4
5
````

!!! warning
    Not loading the parent environment with `--project` is a common mistake.

The added processes are "worker" processes that we can use to perform
computations. They are orchestrated by the process with id `1`. You can check
which process the code is currently running on using `Distributed.myid()`:

````julia
julia> Distributed.myid()
1
````
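
Two related functions (shown here as a small sketch, not part of the original text) are `Distributed.nprocs`, which counts every process including the master, and `Distributed.workers`, which lists only the worker ids:

```julia
Distributed.nprocs()   # 5 in this session: 1 master plus 4 workers
Distributed.workers()  # [2, 3, 4, 5]
```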

As a general rule, to get maximum performance you should add as many processes
as you have logical cores available.
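
One way to check the number of logical cores from within Julia (this aside is ours, not part of the tutorial) is:

```julia
# The number of logical cores on this machine, which is a reasonable upper
# bound on the number of worker processes to add.
Sys.CPU_THREADS
```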

Unlike the `for`-loop approach of multi-threading, distributed computing
extends the Julia `map` function to a "parallel-map" function,
`Distributed.pmap`. For each element in the list of arguments to map over,
Julia will copy the element to an idle worker process and evaluate the
function with that element as the input argument.

````julia
julia> function hard_work(i::Int)
           sleep(1.0)
           return Distributed.myid()
       end
hard_work (generic function with 1 method)

julia> Distributed.pmap(hard_work, 1:4)
ERROR: On worker 2:
UndefVarError: #hard_work not defined
Stacktrace:
[...]
````

Unfortunately, if you try this code directly, you will get an error message
that says `On worker 2: UndefVarError: hard_work not defined`. The error is
thrown because, although process `1` knows what the `hard_work` function is,
the worker processes do not.

To fix the error, we need to use `Distributed.@everywhere`, which evaluates
the code on every process:

````julia
julia> Distributed.@everywhere begin
           function hard_work(i::Int)
               sleep(1.0)
               return Distributed.myid()
           end
       end
````

Now if we run `pmap`, we see that it took only 1 second instead of 4, and that
it executed on each of the worker processes:

````julia
julia> @time ids = Distributed.pmap(hard_work, 1:4)
1.202006 seconds (216.39 k allocations: 13.301 MiB, 4.07% compilation time)
4-element Vector{Int64}:
2
3
5
4
````

!!! tip
    For more information, read the Julia documentation
    [Distributed Computing](https://docs.julialang.org/en/v1/manual/distributed-computing/).

## Using parallelism the wrong way

### When building a JuMP model

For large problems, building the model in JuMP can be a bottleneck, and you
may consider trying to write code that builds the model in parallel, for
example, by wrapping a `for`-loop that adds constraints with
`Threads.@threads`.

Unfortunately, you cannot build a JuMP model in parallel, and attempting to do
so may error or produce incorrect results.

In most cases, we find that the bottleneck is not JuMP, but how you are
constructing the problem data, and that, with some changes, the model can be
built so that it is no longer the bottleneck in the solution process.
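
As a purely hypothetical illustration (none of these names appear in this tutorial), aggregating the data once before building the constraints is often much faster than scanning the raw data inside every constraint:

```julia
using JuMP

# Hypothetical data: 10,000 (customer, demand) records for 100 customers.
records = [(rand(1:100), rand()) for _ in 1:10_000]

model = Model()
@variable(model, x[1:100] >= 0)

# Slow pattern: scanning `records` inside every constraint rebuilds the sum
# 100 times, so the build time is dominated by data handling, not by JuMP:
#
#   @constraint(model, [c in 1:100], x[c] >= sum(d for (i, d) in records if i == c))

# Faster pattern: aggregate the data once, then build the constraints.
demand = zeros(100)
for (c, d) in records
    demand[c] += d
end
@constraint(model, [c in 1:100], x[c] >= demand[c])
```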

!!! tip
    Looking for help to make your code run faster? Ask for help on the
    [community forum](https://discourse.julialang.org/c/domain/opt/13). Make
    sure to include a reproducible example of your code.

### With a single JuMP model

A common approach people try is to use parallelism with a single JuMP model.
For example, to optimize a model over multiple right-hand side vectors, you
may try:

```julia
using JuMP
import HiGHS
model = Model(HiGHS.Optimizer)
set_silent(model)
@variable(model, x)
@objective(model, Min, x)
solutions = Float64[]
Threads.@threads for i in 1:10
    set_lower_bound(x, i)
    optimize!(model)
    push!(solutions, objective_value(model))
end
```

This will not work, and attempting to do so may error, crash Julia or produce
incorrect results.

## Using parallelism the right way

To use parallelism with JuMP, the simplest rule to remember is that each
worker must have its own instance of a JuMP model.

### With multi-threading

With multi-threading, create a new instance of `model` in each iteration of
the for-loop:

````julia
julia> using JuMP

julia> import HiGHS

julia> solutions = Float64[]
Float64[]

julia> Threads.@threads for i in 1:10
           model = Model(HiGHS.Optimizer)
           set_silent(model)
           @variable(model, x)
           @objective(model, Min, x)
           set_lower_bound(x, i)
           optimize!(model)
           push!(solutions, objective_value(model))
       end

julia> solutions
8-element Vector{Float64}:
9.0
1.0
7.0
4.0
8.0
5.0
3.0
6.0
````
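
As in the multi-threading example earlier, note that `push!` onto a vector shared across threads is not thread-safe. A common alternative, sketched below (it is not part of the original example), is to pre-allocate the vector of results and have each iteration write to its own index:

```julia
using JuMP
import HiGHS

solutions = zeros(10)
Threads.@threads for i in 1:10
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x)
    @objective(model, Min, x)
    set_lower_bound(x, i)
    optimize!(model)
    # Each iteration writes to its own slot, so no two threads ever mutate
    # the same element of `solutions`.
    solutions[i] = objective_value(model)
end
```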

### With distributed computing

With distributed computing, remember to evaluate all of the code on all of the
processes using `Distributed.@everywhere`, and then write a function which
creates a new instance of the model on every evaluation:

````julia
julia> Distributed.@everywhere begin
           using JuMP
           import HiGHS
       end

julia> Distributed.@everywhere begin
           function solve_model_with_right_hand_side(i)
               model = Model(HiGHS.Optimizer)
               set_silent(model)
               @variable(model, x)
               @objective(model, Min, x)
               set_lower_bound(x, i)
               optimize!(model)
               return objective_value(model)
           end
       end

julia> solutions = Distributed.pmap(solve_model_with_right_hand_side, 1:10)
10-element Vector{Float64}:
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
````
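
When the distributed computation is finished, the worker processes can optionally be released. A one-line sketch, using the `workers` vector returned by `Distributed.addprocs` above:

```julia
# Shut down the worker processes; only the master process with id 1 remains.
Distributed.rmprocs(workers)
```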

## Other types of parallelism

### GPU

JuMP does not support GPU programming, and few solvers support execution on a
GPU.

### Parallelism within the solver

Many solvers use parallelism internally. For example, commercial solvers like
[Gurobi](https://github.com/jump-dev/Gurobi.jl) and [CPLEX](https://github.com/jump-dev/CPLEX.jl)
both parallelize the search in branch-and-bound. Solvers supporting internal
parallelism will typically support the [`MOI.NumberOfThreads`](@ref) attribute,
which you can set using [`set_optimizer_attribute`](@ref).
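
For example, here is a sketch of setting the attribute, assuming a solver that supports it, such as Gurobi; the value `4` is arbitrary:

```julia
using JuMP
import Gurobi

model = Model(Gurobi.Optimizer)
# Ask the solver to use at most four threads for its internal parallelism.
set_optimizer_attribute(model, MOI.NumberOfThreads(), 4)
```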
